From an architectural standpoint, big data is the biggest thing to hit the data center in decades, says Jack Norris, chief marketing officer for MapR, which offers an enterprise-grade Hadoop platform. And it pokes holes in traditional assumptions and methods.
“The separation of compute and storage, how data is handled and processed … all of those things are being reconsidered,” Norris explains. “A lot of it is based on speed. Companies are figuring out how to do business as it happens.”
This means making recommendations and offering special deals before a customer checks out, for example. Or identifying cybersecurity threats before sensitive data has been compromised. Or anticipating maintenance needs before equipment and heavy machinery go down.
Conventional approaches, which typically require data sets from different systems to be refined and normalized before they can be used, make this difficult, if not impossible. With new data platforms that support distributed, fault-tolerant processing, however, organizations can re-imagine how they gather, store, and take advantage of internal and external data sources.
“Companies no longer have the luxury of time to scrub and transform their data,” says Norris. “They need an underlying, distributed data layer like Hadoop, which can handle traditional, mission-critical data as well as the massive amount of unstructured data being generated every second of every day.”
But a flexible, scalable data layer isn’t enough, he adds. It must be combined with an equally capable application layer and hardware platform that all work in concert.
“Architecture matters,” Norris insists.
He points to the combination of MapR’s Hadoop Distribution and the Cisco Unified Computing System™ (Cisco UCS®), powered by Intel® Xeon® processors, as an ideal architecture for big data. As the architecture scales to hundreds or even thousands of nodes, which is common in big data deployments, the Cisco® Application Centric Infrastructure (Cisco ACI™) extends policy models across networks, servers, storage, security, and services.
“You’re only as good as your weakest link,” says Norris. “You can’t have latency in the network, and you can’t have a shaky computing or storage foundation. With MapR, [Cisco] ACI, and [Cisco] UCS, organizations have a transformative big data platform that complements existing systems.”
As these new architectural paradigms become more commonplace, IT practitioners are being forced to evolve—and move faster.
“IT architects have been trained for years and years to slow down, understand every application, and define every database schema. Because if they got it wrong, they had to re-architect everything. Even if they got it right, changes in the business required major updates,” says Norris. “Hadoop is the opposite. There are no forced schemas, and there is a ton of flexibility. Things can be spun up in a day that would have previously taken weeks.”
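To make the "no forced schemas" point concrete, here is a minimal sketch of the schema-on-read idea in plain Python. It is an illustration only, not MapR or Hadoop code: the sample events and the names `RAW_EVENTS`, `read_with_schema`, and `shopper_actions` are all hypothetical. Records of different shapes are stored as-is, and each query decides at read time which fields it cares about.

```python
import json

# Hypothetical raw events, stored as-is with no upfront schema.
# Clickstream, order, and log records sit side by side.
RAW_EVENTS = [
    '{"user": "a17", "action": "view", "sku": "X-100"}',
    '{"user": "b42", "action": "checkout", "total": 59.90}',
    '{"host": "web-3", "level": "WARN", "msg": "slow response"}',
]


def read_with_schema(raw_lines, fields):
    """Project each raw record onto the fields a given query asks for.

    A record that lacks a field yields None for it instead of being
    rejected when the data is loaded.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


def shopper_actions(raw_lines):
    """One view of the data: keep only records that carry a user field."""
    rows = read_with_schema(raw_lines, ["user", "action"])
    return [row for row in rows if row["user"] is not None]


if __name__ == "__main__":
    for row in shopper_actions(RAW_EVENTS):
        print(row)
```

A second query, such as one interested only in `host` and `level`, could reuse `read_with_schema` over the same raw records without any change to how they were stored.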
Working this way requires new thinking, new skills, and new approaches. Norris suggests starting small: deploy a big data platform and experiment with a single, defined use case. Common starting points include:
- Offloading workloads from an enterprise data warehouse to reduce costs
- Pulling new data into existing applications for better customer service
- Analyzing log files to improve information security
“Deploy the platform, start to fill it with data, do a proof of concept, and expand over time,” Norris recommends. “Whatever the use case, big data is incredibly transformational and will tangibly impact a business and its customers. The competitive advantage that it can deliver isn’t hype; it’s real. Hadoop users are proving this time and again.”
And it all starts with the architecture.
“Get the architecture right, and everything else will be faster, easier, and more successful,” says Norris.