Where a company starts with big data isn’t necessarily where it will end up. As with any advanced technology, the learning and capability curve can take years.
“Every organization is at a different stage of big data maturity,” says Pandit Prasad, manager of Hadoop and open analytics systems at IBM.
The first stage, he explains, is typically focused on cost reduction, supplementing or replacing expensive enterprise data warehouse (EDW) systems with commodity hardware. Many companies will leverage their new hardware environment to create a data lake—the second stage—that supports data governance, cleansing, and matching.
“Most companies are in these first two stages,” Prasad says. “It’s a matter of building foundational capabilities and bringing information assets together. But to get the most out of big data, those capabilities must be turned over to the business.”
This handoff is the third and arguably hardest stage of a company’s big data evolution, and it must be planned for from the very beginning.
“We always recommend starting small,” says Prasad, “but you have to have a long-term vision and the tools to support it.”
An entire suite of integrated tools
There are different tools for different stages of big data maturity. Companies focused on cost reduction and EDW offloading, for example, may have little need for sophisticated data science and analytics tools—but they may eventually.
“As your big data capabilities and objectives evolve, you don’t want to find yourself with incongruent tools,” Prasad says. “It’s a huge amount of work to integrate new components that don’t align with your existing tools. And you certainly don’t want to redesign your entire stack just to take advantage of something like Spark 2.0 or new machine learning languages and libraries.”
This underscores the need for enterprise-grade platforms that come with an entire suite of integrated tools.
- IBM’s Open Platform and BigInsights leverage Hadoop and Spark to support each stage of a company’s big data evolution.
- The platform is currently being integrated with the Intel® Xeon® processor-based Cisco Unified Computing System™ for world-class performance and manageability.
“Our philosophy is to provide a stable and reliable Hadoop and Spark data platform while supplementing that platform with complementary systems and premium components to provide the enterprise-class functionality clients expect for their mission-critical data,” says Prasad.
In doing so, IBM is helping its clients focus on their data—not their tools.