Seeing Beneath the Surface with Post-Hadoop Big Data
Not long ago, the New York Times published a fascinating article about a rug designer named Luke Irwin who lives in Wiltshire, England. Irwin needed to run some electrical cables under his yard. While digging the trench, his contractor revealed an intricate mosaic floor of red, blue, and white tiles just 18 inches down. That’s how Irwin learned that his family home was built on top of a luxurious villa that was inhabited by upper-class Romans between A.D. 175 and 220.
Comprising an estimated 20 to 25 rooms on the ground floor alone, the Irwin site is one of the richest Roman-era archaeological discoveries in recent history. According to the Times, the heritage organization Historic England called the find “’unparalleled in recent years,’ in part because the remains of the villa, with its outbuildings, were so undisturbed.” With just a little digging, Irwin had uncovered a trove of nearly unprecedented value lying just beneath the surface.
Irwin’s story may be interesting, but what does it have to do with network traffic data? The answer is rooted in the experience of Kentik’s founders, who’ve spent decades building and operating some of the world’s biggest and most complex networks. They know firsthand that network teams typically carry around a vast reservoir of technical and institutional knowledge in their heads. But the value of that knowledge often remains buried because even experienced organizations have traditionally lacked the timely, comprehensive information required to yield actionable insights. Until now, the tools available to generate and access such information have been limited at best. At Kentik, we believe deeply in the power of post-Hadoop Big Data to address those limitations, making rich data readily accessible not only to engineering and operations, but also to wider areas of the organization.
Access to rich data matters in part because it enables insights that can make routine tasks far faster and more accurate. But information can also power innovation — not just seemingly unattainable innovation with a capital “I,” like flying cars, but also continuous incremental improvement in the operation of a digital business. Data-driven insights can reduce costs, achieving huge efficiencies over time. They can also improve network performance, laying the foundation for improved user experience, new features that weren’t previously feasible, and new revenue streams. The result is to boost user/customer satisfaction, make a business more competitive, and increase profits. (I wrote previously about this kind of potential in Moneyball Your Network.) At the same time, access to rich data makes network teams happy because it empowers them to go beyond drudgery, driving the business forward with passion, excellence, and creativity.
While this scenario sounds idyllic, it’s unfortunately not the reality for most network teams today. Like the pre-dig Irwin family, surrounded by buried riches, too many network organizations are separated from the true value of their network data by legacy limitations on the collection, storage, and analysis of flow records (e.g. NetFlow) and other network traffic data like BGP and GeoIP. And too many network managers and operators are trapped in a whack-a-mole existence, with insufficient data to make decisions and insufficient tools and resources to close the gap.
Built on appliances, text files, or SQL databases, traditional network traffic analysis systems reduce rich, raw data to a few indexed tables, discarding most details in the process. Limited, slow, and costly, they’re too shallow to get you even 18 inches down, as it were, to the true value of your network data. Sure, you can get some pretty graphs of summary views, but without real analytical depth. For the practitioners who have to operate, engineer, and improve service delivery, shallow data is a bit of a curse.
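To make the cost of that pre-aggregation concrete, here's a minimal sketch in Python. The flow-record fields and values are illustrative stand-ins for NetFlow-style data, not any vendor's actual schema; the point is only that once raw records are rolled up into a summary table and discarded, questions the summary wasn't designed for become unanswerable.

```python
from collections import defaultdict

# Hypothetical raw flow records (NetFlow-style): each retains full detail.
raw_flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "192.0.2.7",    "dst_port": 443, "bytes": 5200},
    {"src_ip": "10.0.0.2", "dst_ip": "192.0.2.7",    "dst_port": 443, "bytes": 1800},
    {"src_ip": "10.0.0.1", "dst_ip": "198.51.100.3", "dst_port": 53,  "bytes": 300},
]

# A legacy system might keep only a pre-aggregated summary table,
# e.g. total bytes per destination port:
summary = defaultdict(int)
for flow in raw_flows:
    summary[flow["dst_port"]] += flow["bytes"]

print(dict(summary))  # {443: 7000, 53: 300}

# The summary answers "how much traffic per port?" -- but a question like
# "which source sent the most traffic to 192.0.2.7?" is unanswerable once
# the raw records are gone. Retaining the raw flows keeps every ad-hoc
# question on the table:
top = max((f for f in raw_flows if f["dst_ip"] == "192.0.2.7"),
          key=lambda f: f["bytes"])
print(top["src_ip"])  # 10.0.0.1
```

The same trade-off applies whether the summary lives in an appliance, a text file, or a SQL rollup table: the pretty graph survives, but the detail behind it does not.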
The alternative to these old-school systems has been Hadoop-based Big Data approaches. Some (think MapReduce) are prohibitively slow for operational use. Others (Spark, ELK) are prohibitively costly when you add up what it takes to get both raw data ingest and ad-hoc analytics in operational time frames. And that doesn’t include the cost, in the OSS case, of building and maintaining your own user-friendly interface for analytics. Without it, the utility of your system is limited to a tiny cadre of expert users. You put in a lot of hard work, capital, and operational expense, but you shut out the broader set of users that would enable you to get a meaningful return on your investment. So while building a Big Data system on your own may seem like a promising solution, in reality it can be a scary (business) proposition.
Kentik exists to enable customers to unearth the value of their network data. That’s a job that requires the retention of massive volumes of raw data, the ability to instantly dig deep into details, and the flexibility of unconstrained data exploration. No stingy, limited indexes, no fragile BI data cubes. Instead we give you the freedom to perform any ad-hoc query on any subset of your data and the speed to get results in a few seconds or less. We give you fast time-to-value, getting you from sign-up to traffic visibility in fifteen minutes or less — without installing software or deploying massive on-premises machines. And we give you an affordable datastore that you can leverage via REST or SQL APIs for use by 3rd-party systems for DDoS mitigation or business intelligence. So with Kentik Detect you won’t be left looking at just the surface of your network data, wondering what unrealized business value lies buried below.
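As a rough illustration of what driving such a datastore from a third-party system might look like, here's a short Python sketch that builds a SQL query payload for an analytics API. Everything here is a hedged assumption for illustration: the table name, column names, and payload shape are hypothetical, not Kentik's documented schema or endpoints; consult the vendor's API documentation for the real interface.

```python
import json


def build_sql_query(start, end, limit=100):
    """Build a hypothetical JSON payload carrying a SQL query for a
    flow datastore's API. Table and column names ("flows", "src_as",
    "ctimestamp") are illustrative assumptions, not an actual schema."""
    sql = (
        "SELECT src_as, SUM(bytes) AS total_bytes "
        "FROM flows "
        f"WHERE ctimestamp BETWEEN '{start}' AND '{end}' "
        "GROUP BY src_as ORDER BY total_bytes DESC "
        f"LIMIT {limit}"
    )
    return json.dumps({"query": sql})


# Top traffic sources by origin AS for a one-hour window:
payload = build_sql_query("2016-06-01 00:00", "2016-06-01 01:00", limit=10)
print(payload)

# A client would then POST this payload to the datastore's SQL endpoint
# (e.g. with requests.post(url, headers=auth_headers, data=payload));
# the endpoint URL and authentication details depend on the vendor's API.
```

The same pattern generalizes to other integrations the post mentions: a DDoS mitigation system or BI tool issues its own ad-hoc queries against the raw data rather than waiting on canned reports.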
Ready to learn more about Kentik Detect? Read how we handle queries against huge volumes of traffic in this blog post on designing for database fairness. Register to download the Kentik Data Engine white paper. Or see for yourself what you can do with Kentik by signing up for a free trial. And if you’re inspired to get involved, we’re hiring!