After years of hype and development, big data has crossed the proverbial chasm from trendy concept to bona fide business enabler. According to Gartner¹, 73 percent of organizations have invested or plan to invest in big data in the next two years. But because big data is still relatively new, many of these investments are focused on pilots, proofs of concept (POCs), and early deployment projects.
“Most companies are keenly interested in big data,” says Dave Kloempken, global sales director for data center systems at Cisco, “but there’s a lot of confusion about how to get started and how to show value. It’s a process to find the right use cases, but once customers do, the deployments expand rapidly.”
“Companies that get past the initial hurdles are seeing great results,” adds Brandon Draeger, director of business development for big data solutions at Intel®. “Research from Bain and Company has shown that companies successfully leveraging big data are not just more profitable, but also better able to understand and meet their customers’ needs.”
The three Vs: volume, variety, velocity
The explosion of computing devices and applications is at the heart of today’s big data fervor. The growing volume, variety, and velocity of modern data sets, often called the “three Vs,” are not only a challenge for organizations but also an incredible opportunity. Once captured, organized, and analyzed, big data can yield differentiating insights that translate into business value.
But it’s not that simple. The proliferation of unstructured data is now greatly outpacing the structured and refined information typically found in an enterprise data warehouse. And these conventional database environments were never designed for the “three Vs,” making them a poor fit for big data purposes.
“Traditional data warehouses can’t accommodate unstructured data and can be extremely expensive to scale,” says Kloempken. “That’s why Hadoop has emerged so quickly.”
“It’s really about using the right tools for the job,” says Draeger. “Ninety percent of new data being created is semi-structured or unstructured, and the total is expected to reach 40 zettabytes by 2020. Conservatively, that is about 5.2 terabytes for every person on earth.”
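The per-person figure is simple division, and it only works out if the total is measured in zettabytes (10^21 bytes). The sketch below checks it under one assumption that is not in the article: a world population of roughly 7.7 billion around 2020. The function name `per_capita_terabytes` is hypothetical, chosen for illustration, and decimal (SI) units are used throughout.

```python
# Back-of-the-envelope check of the per-person data figure.
# Assumption (not from the article): world population of about 7.7 billion.
EXABYTE = 10**18     # decimal (SI) units
ZETTABYTE = 10**21
TERABYTE = 10**12


def per_capita_terabytes(total_bytes: float, population: float) -> float:
    """Return the average data volume per person, in terabytes."""
    return total_bytes / population / TERABYTE


population = 7.7e9
print(f"{per_capita_terabytes(40 * ZETTABYTE, population):.1f} TB per person")
print(f"{per_capita_terabytes(40 * EXABYTE, population):.4f} TB per person")
```

With 40 zettabytes the result rounds to 5.2 TB per person; the same calculation with 40 exabytes gives about 0.0052 TB (roughly 5 GB), a thousand times smaller.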
The rise of Hadoop
An open-source software project that enables distributed, fault-tolerant processing of large data sets across clusters of servers, Hadoop is quickly rising to the big data challenge. Because it was designed for unstructured data, Hadoop can combine conventional reports with a wide variety of data sets, from web logs and sensors to Facebook and Twitter, all on a single platform. It’s also 20 to 50 times less expensive than traditional methods of data management and storage.
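Hadoop’s classic batch engine, MapReduce, splits a job into map tasks that run in parallel on the nodes holding each block of data, a shuffle that groups intermediate results by key, and reduce tasks that aggregate each group. The single-process Python sketch below imitates that flow to count page hits in web-log lines. It is an illustration only: the log format (“METHOD PATH STATUS”) and all function names are hypothetical, and it does not model HDFS, networking, or fault tolerance (on a real cluster, failed tasks are re-executed on other nodes).

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (path, 1) for a hypothetical 'METHOD PATH STATUS' log line."""
    parts = line.split()
    if len(parts) >= 2:  # skip malformed lines
        yield parts[1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Aggregate all values for one key."""
    return key, sum(values)


def page_hits(splits: list[list[str]]) -> dict[str, int]:
    # Each inner list stands in for a block of the log stored on one node.
    # On a cluster, map tasks would run in parallel per block; here they run sequentially.
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


splits = [["GET /index 200", "GET /cart 200"], ["GET /index 404"]]
print(page_hits(splits))
```

Keeping map and reduce as pure functions over key/value pairs is what lets a framework run many copies of them independently and simply rerun any that fail.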
“Data warehouses aren’t going away. They’re still better suited for many important, repeatable needs, like operational reporting,” says Kloempken. “But because of its cost, flexibility, and scalability, Hadoop is extremely attractive for a number of newer data sources and use cases.”
These use cases don’t always start with big data intent, says Bob Fosina, big data software and solutions specialist at Cisco. Many organizations adopt Hadoop for cost-saving purposes and then use the platform for big data experimentation.
“Companies are learning that they can use Hadoop to offload their ETL [extract, transform, load] processes, or pull out some of the stale data that is crowding their data warehouse, or reduce the cost of mission-critical software licensing,” Fosina explains. “There are a number of compelling opportunities to reduce storage and licensing costs while at the same time creating a platform for big data POCs.”
These “big data on-ramps,” as Fosina likes to call them, are turning Hadoop into a mainstream data management and storage platform. And adoption will only increase as countless pilot projects blossom into full-blown production deployments.
Whether Hadoop is deployed for immediate cost savings, long-term business transformation, or both, Kloempken recommends starting small with big data. Data warehouse optimization is often the first step, laying the foundation for operational analytics and business insights that grow over time.
“Once you have the platform, you can define a use case, pull in a few data sources, and conduct a pilot project,” says Kloempken. “And that will invariably lead to other use cases. It tends to snowball.”
And for good reason: according to the MIT Sloan Center for Digital Business², data-driven enterprises outperform industry peers by up to six percent and are up to 26 percent more profitable.