Kentik - Network Observability
Case Study

New Relic boosts network performance and digital experience with Kentik

Overview

The world’s best engineering teams rely on New Relic to visualize, analyze and troubleshoot their software. The company’s observability platform, New Relic One, relies on cloud services to help customers scale quickly. However, if a downstream network provider has a service disruption, New Relic’s platform performance could be affected and its customers could quickly feel the impact. With Kentik Synthetics, New Relic gains proactive insights about network performance and can ensure its customers have a reliable digital experience.

Situation

New Relic provides its 17,000 global customers with a full-stack observability platform delivered as SaaS. Developer and operations teams that use New Relic One respond faster, optimize better and build more perfect software. As a result, enterprises in a broad range of industries gain increased confidence that their customers, partners and suppliers have the best possible digital experience.

New Relic processes customer data in six U.S. data centers and one in Europe, with two PoPs, and uses public cloud services, including AWS, to ingest application telemetry data and deliver the analysis its customers need to improve their software and services.

But if customers use New Relic to ensure that their businesses run smoothly, how does New Relic ensure the same can be said for its own infrastructure? That’s one of the challenges faced by the team responsible for running the data centers and network connections at the heart of New Relic’s services.

“We use our own software to monitor our operations,” says April Carter, the senior software engineering manager at New Relic. “That works well for our internal processes, but over time we came to a point where we didn’t have as much visibility as we wanted into the network connections that link us with our customers.”

Carter says that, increasingly, those network connections can be cause for concern. “We occasionally see ‘hiccups’ within AWS Direct Connect or some downstream provider, and those have a direct effect on the experience our customers have. If there’s a problem with ingest through the cloud, then all kinds of things can go wrong.”

Pedro Carvalho, a senior network engineer at New Relic, adds, “One of the big challenges we find in a hybrid environment is troubleshooting the network. It’s very hard to find and diagnose network connections unless you have a tool that can see things from the outside.”

Carter, Carvalho and their team focus entirely on reliability. “Our journey through the cloud has taught us a hard truth: There’s a world of complexity in a hybrid environment, especially when it comes to the network,” says Carter.

Solution

The need for network observability led New Relic to evaluate providers with the technical depth and range of capabilities to satisfy its high standards for reliability. Top of mind was a synthetic monitoring solution to complement real-time network visibility. This would allow Carter, Carvalho and the team to use synthetic network traffic to determine whether the service delivered to New Relic’s customers is functioning (or will function) optimally as conditions change.

“We first looked at another well-known synthetic monitoring vendor,” Carter says. “But they didn’t have a data center-caliber product. So, we also looked at Kentik, and found that they had the complete package of what we needed: network observability that includes synthetic testing for data center and edge, in addition to flow monitoring. It’s an entire suite for the network.”

“When it comes to synthetics, it’s challenging to monitor ourselves. We can’t use our own product to test the network, because if it’s down then everything is down,” says Carvalho. “We applied Kentik Synthetics right away.”

Results

“Kentik helps us diagnose network-related problems much faster,” Carvalho says. “The ability to dive in and put a trace on every connection is a huge benefit. And the network visualization is impressive, too.”

Improving digital experience

New Relic is well aware that network problems can impact its reputation with customers. “Even when it’s not our fault, it’s a ding on New Relic if there’s any interruption or slowdown in a link to the cloud. So, we want those interruptions to be as brief as possible or, of course, to be avoided in the first place,” says Carter.

“Kentik Synthetics gives us the detailed information we need to identify any problems, and either head them off or resolve them quickly,” Carvalho notes. “For example, just the other day we were seeing some packet loss with one of our service providers, and we were able to very quickly identify the issue and address it before anyone noticed.”

An issue that might have taken an hour or more to resolve with previous tools can now be resolved in 10-15 minutes with Kentik.

Clarifying supplier issues

An ongoing challenge in managing a network-based digital service is identifying the source(s) of performance issues and holding parties accountable. Carter jokingly refers to the process as “mean time to innocence,” a nod to the reluctance of network and access providers to own up to issues.

“Like all enterprises, we occasionally experience packet loss between our network and the cloud, and we usually have to prove it. Now, we have the proof,” she adds. “Kentik gives us the ability to diagnose issues, and if it’s vendor-related, we can instantly tell that vendor exactly what links are affected and when.”

Streamlining operations

New Relic was an early user of Kentik Firehose, which provides a full range of network observability data directly from Kentik’s network telemetry data platform. This adds an unmatched level of “network awareness” to the monitoring tools available to IT managers.

The data provided by Kentik is used by several teams within New Relic’s IT department, including network operations, traffic and routing, product management and software development. Each of these teams gains added insight into its area of focus because of the highly granular, real-time information provided by Kentik Firehose.

Operations teams also benefit from the high level of integration between Kentik and New Relic One, the primary dashboard for New Relic services. “Even though Kentik has an excellent UI, our teams can still take advantage of the Kentik data in a format they’re already familiar with, without learning a new interface,” Carvalho says. “And the Kubernetes dashboard from Kentik means that even our container fabric team can apply insights from the network to the work they do.”

Kentik is also incorporated into internal processes at New Relic. The team uses measurements populated with Kentik data to set service-level objectives (SLOs) and service-level indicators (SLIs). They also employ Kentik during what Carter calls “game days,” when the team purposefully disrupts IT operations to understand the response.
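
As a rough illustration of how synthetic test measurements could feed an SLI and be checked against an SLO, here is a minimal Python sketch. The record shape (`ProbeResult`), the 1% packet-loss threshold and the 99.9% target are all hypothetical values chosen for illustration; they are not Kentik’s data format or New Relic’s actual objectives.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One synthetic test measurement (hypothetical shape, not a Kentik schema)."""
    packets_sent: int
    packets_lost: int


def availability_sli(results, max_loss_ratio=0.01):
    """Return the fraction of probes whose packet loss is at or below max_loss_ratio."""
    if not results:
        raise ValueError("no probe results to evaluate")
    good = sum(
        1
        for r in results
        if r.packets_sent > 0 and r.packets_lost / r.packets_sent <= max_loss_ratio
    )
    return good / len(results)


def meets_slo(sli, target=0.999):
    """Compare the measured SLI with the SLO target (assumed 99.9% here)."""
    return sli >= target


# Example: 998 clean probes and 2 probes with 5% packet loss.
results = [ProbeResult(100, 0)] * 998 + [ProbeResult(100, 5)] * 2
sli = availability_sli(results)
print(f"SLI={sli:.4f}, SLO met: {meets_slo(sli)}")
```

In this sketch the SLI is the share of "good" probes, so the two lossy probes pull it to 99.8%, just under the assumed 99.9% objective.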

“Kentik shows me exactly what’s happening throughout the network, so I won’t miss anything because someone doesn’t notice it. That’s very valuable,” adds Carter.

Having worked with a variety of network-monitoring and mapping tools over the years, Carter says she is impressed with the thoroughness and ease-of-use of Kentik. “I didn’t have to do anything to get a full and complete network map out of Kentik. Once the data is there, it builds itself.”

Carvalho adds that Kentik enables him and the team to use fewer network-monitoring tools. “All the tools we have were developed in-house,” he says. “Kentik is the only tool I was willing to pay for; it’s that good.”

Key takeaways

Kentik’s insights on external network connections are seamlessly integrated into New Relic’s own application-monitoring tools. This allows the New Relic team to move quickly when issues arise, or even prevent problems from occurring at all. Kentik’s detailed information also gives New Relic the data it needs to have meaningful conversations with cloud and network providers and to settle responsibility for problems that have occurred.

The time it takes to diagnose network issues with the potential to disrupt services to customers is sharply reduced. An issue that might have taken New Relic an hour or more to resolve with previous tools can now be resolved in 10-15 minutes with Kentik.

Ultimately, the actionable performance and reliability insights from Kentik help New Relic ensure its customers have a great digital experience.

Category

  • Software company

Challenge

  • Periodic network interruptions threatened customer experience
  • Lack of visibility hindered network diagnosis and resolution

Solution

  • Kentik provides a comprehensive, real-time view of performance throughout all networks, both private and cloud

Results

  • Reduces time to diagnose network-related problems
  • Improves digital experience for customers
  • Streamlines operations