A Guide to Managing Large-Scale Enterprise Infrastructure

Phil Gervasi, Director of Tech Evangelism

Learn all about the most common challenges enterprises face when it comes to managing large-scale infrastructures and how Kentik’s network observability platform can help.

Large enterprises, with their extensive network infrastructures, are the backbone of the digital age. However, managing these sprawling networks is challenging, particularly when it comes to cost management, resource scalability, and performance monitoring.

In this guide, you’ll learn about the issues large enterprises face when it comes to managing their network infrastructure and how you can efficiently manage it with Kentik.

Kentik provides large-scale enterprises with the visibility and insights they need to manage their complex networks. It collects and analyzes data from traditional on-premises network devices, security appliances, cloud and SaaS providers, containers, and the public internet itself. Additionally, Kentik augments this hard data with business context to enrich the underlying unified data repository. This provides a comprehensive view of network traffic and performance. Kentik also uses machine learning (ML) to identify anomalies and potential problems and then suggests ways to fix them.

Common challenges in large-scale infrastructure management

Enterprises operating massive software-defined wide area network (SD-WAN), campus, or cloud network infrastructures often encounter the following issues:

  • Scaling challenges: As businesses grow, their networks expand rapidly, often integrating more tightly with the internet, adopting advanced technologies such as network abstractions, and employing complex delivery workflows. The challenge lies in ensuring that the network infrastructure can scale seamlessly to accommodate the influx of new devices, users, and technologies.
  • Difficulty meeting stringent service level agreements (SLAs): SLAs for large-scale networks are typically very demanding, with high uptime, performance, and reliability requirements, and meeting them at scale can be daunting. The complexity of large-scale networks makes it difficult to identify and troubleshoot the root cause of performance issues, and the lack of visibility into network performance and resource utilization makes it difficult to manage SLAs proactively.
  • Inefficient resource allocation and cost management: Balancing the allocation of resources across various network components while keeping costs in check is a continuous challenge. Lack of visibility into resource utilization makes it difficult to allocate resources efficiently, and difficulty in forecasting future demand can lead to overprovisioning or underprovisioning.
  • Complex interconnectivity and integration challenges: Integrating diverse components, services, and technologies into a cohesive network architecture can lead to complexity and integration bottlenecks. Large-scale networks often integrate a wide range of components, services, and technologies from different vendors, which may have very different control plane mechanisms and proprietary components (e.g., integrating SD-WAN devices from multiple vendors or setting up a variety of business applications, including enterprise resource planning or ERP, customer relationship management or CRM, and software-as-a-service or SaaS applications).
  • Monitoring and performance optimization: Continuous monitoring and optimization of network performance are essential but demanding tasks. Meeting evolving demands while preempting issues like latency, packet loss, and congestion necessitates vigilant oversight and proactive optimization efforts. Large-scale networks generate a massive amount of data, which can be challenging to collect, store, and analyze.

Effectively tackling these challenges is crucial for maintaining network resilience and operational efficiency.

How Kentik helps manage large-scale enterprise infrastructure

If you’re looking for a solution for collecting and analyzing vast amounts of very different data, Kentik is the answer.

Kentik integrates with other systems in a variety of ways:

  • Application programming interface (API): Kentik provides a comprehensive set of APIs that are flexible and easy to use, allowing you to integrate it with a wide range of systems, including ticketing and alerting systems, management tools, and security solutions. You can also use Kentik APIs to automate workflows (e.g., creating a ticket in your ITSM system when Kentik detects a network outage).
  • Webhooks: Kentik can send webhooks to notify other systems of events, such as network outages or performance issues. Webhooks are simpler to set up and use than the APIs or streaming data.
  • Streaming data: Kentik can stream network data to other systems, such as data warehouses and security analytics platforms. Streaming allows you to analyze network data in real time, which is useful for quickly identifying and responding to problems.

The integration method you choose depends on your specific needs. If you need a flexible and powerful way to integrate Kentik with other systems, then you should use the APIs. If you’re looking for the simplest solution, webhooks are the way to go. Or, if you need to analyze network data in real time, you should use streaming data. All three integration methods are scalable and can meet the demands of even the largest enterprises.
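To make the webhook-to-ticket workflow mentioned above more concrete, here is a minimal Python sketch of the receiving side. It is not Kentik's API or its documented webhook schema: the payload fields (`alert_name`, `severity`, `device`, `state`), the priority mapping, and the ticket shape are all assumptions made for illustration, and a real integration would follow the schema of the notification channel and the ITSM system in use.

```python
import json
from typing import Optional

# Hypothetical mapping from alert severity to ITSM ticket priority.
SEVERITY_TO_PRIORITY = {"critical": "P1", "major": "P2", "minor": "P3"}


def webhook_to_ticket(raw_body: str) -> Optional[dict]:
    """Turn an alert notification body into a ticket request.

    Returns None when the notification is not an active alert (for example,
    a cleared alert), so that no ticket is opened for it.
    """
    event = json.loads(raw_body)
    if event.get("state") != "active":
        return None
    severity = event.get("severity", "unknown")
    return {
        "title": f"[{severity.upper()}] {event.get('alert_name', 'Unnamed alert')}",
        "priority": SEVERITY_TO_PRIORITY.get(severity, "P4"),
        "description": f"Reported for device {event.get('device', 'unknown')}.",
    }


# Example payload with assumed field names, not a documented schema.
sample_body = json.dumps({
    "alert_name": "Transit link down",
    "severity": "critical",
    "device": "edge-router-1",
    "state": "active",
})
```

Calling `webhook_to_ticket(sample_body)` yields a dictionary that a small HTTP handler could forward to the ticketing system's own API.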

Kentik use cases

Here are some specific examples of how Kentik can be integrated with other systems:

  • Kentik can be integrated with alerting, ticketing, ChatOps, and network management tools to provide a unified view of network performance and resource utilization. This can help network administrators identify and troubleshoot problems quickly and effectively.
  • Kentik can be integrated with security solutions to help enterprises protect their networks from cyberattacks. For example, Kentik can be integrated with intrusion detection systems (IDS) and intrusion prevention systems (IPS) to provide real-time visibility into network traffic and security threats.
  • Kentik can also detect a DDoS attack and integrate with a third-party mitigation provider. Using a workflow of alerts and API integrations, Kentik can then kick off a mitigation activity.

By integrating Kentik with other systems, businesses can get the most out of their investment in Kentik and improve their overall network management capabilities:

Infrastructure diagram

In the rest of this article, we’ll explore how Kentik can assist large enterprises using a real-life scenario of a large e-commerce company that relies heavily on cloud services to run its online store. In this scenario, the company has a complex network with multiple cloud providers, and it needs to ensure that its network is always up and running to meet the demands of its customers.

When it comes to large-scale enterprise infrastructure, the reliability of external service providers and vendors plays a pivotal role in ensuring seamless network performance. Enterprises like the e-commerce company described above rely heavily on external cloud, SaaS, and SASE service providers or vendors to meet their SLAs.

SLAs form the foundation for ensuring the smooth operation of mission-critical applications and services, defining expectations for network performance metrics, including latency, packet loss, uptime, and bandwidth availability. Enforcing SLAs is crucial as it prevents service disruptions, financial losses, and damage to a company’s reputation.

How Kentik can help

The Kentik Network Observability Platform equips enterprises with powerful tools, such as alerting and reporting tools, and a service level management (SLM) module that proactively manages and enforces SLAs.

For instance, to monitor SLA compliance in our e-commerce example, Kentik can continuously collect and analyze link performance data, including metrics like bandwidth utilization, latency, jitter, and packet loss. Additionally, enterprises can track the real-time performance of their service provider links, ensuring that SLAs are met:

Sample code snippet for monitoring SLA compliance
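The compliance check described above can be sketched roughly as follows. This Python example does not call Kentik; it only shows the comparison logic on data that has already been collected. The `LinkSample` and `SlaTargets` types, their field names, and the idea of averaging over a window are assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class LinkSample:
    """One measurement of a provider link (hypothetical record shape)."""
    latency_ms: float
    jitter_ms: float
    packet_loss_pct: float


@dataclass
class SlaTargets:
    """Upper limits agreed in the SLA (illustrative fields only)."""
    latency_ms: float
    jitter_ms: float
    packet_loss_pct: float


def sla_violations(samples: List[LinkSample], targets: SlaTargets) -> List[str]:
    """Return the metrics whose average over the window exceeds its SLA limit."""
    violations = []
    for metric in ("latency_ms", "jitter_ms", "packet_loss_pct"):
        average = mean(getattr(s, metric) for s in samples)
        if average > getattr(targets, metric):
            violations.append(metric)
    return violations
```

An empty result means the window was compliant on all three metrics; a contract that defines compliance per sample or per percentile would need a different aggregation.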

Additionally, to set up alerts, Kentik allows users to define alerting thresholds based on specific SLA metrics, serving as early warning systems. When link performance falls below predefined SLA thresholds, Kentik triggers immediate alerts, notifying IT teams or administrators:

Sample code snippet for setting up alerts
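One way to picture threshold-based alerting is the sketch below. It is a generic illustration, not Kentik's alert policy format: the function names, the two severity levels, and the rule that a threshold must be exceeded for several consecutive samples before alerting (to avoid reacting to a single spike) are all assumptions.

```python
from typing import List, Optional


def breach_streak(values: List[float], threshold: float) -> int:
    """Count how many of the most recent samples in a row exceed the threshold."""
    streak = 0
    for value in reversed(values):
        if value <= threshold:
            break
        streak += 1
    return streak


def alert_level(values: List[float], warning: float, critical: float,
                min_consecutive: int = 3) -> Optional[str]:
    """Return "critical", "warning", or None for a series of metric samples."""
    if breach_streak(values, critical) >= min_consecutive:
        return "critical"
    if breach_streak(values, warning) >= min_consecutive:
        return "warning"
    return None
```

For a latency series in milliseconds, `alert_level([20, 55, 60, 58], warning=50, critical=80)` reports a warning because the last three samples all exceed the warning threshold.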

To generate SLA reports, Kentik provides robust reporting tools. These reports are valuable for auditing service provider performance, meeting contractual commitments, and providing documentation for stakeholders:

Sample code snippet for generating SLA reports
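As a simple illustration of what an SLA report computes, the following sketch derives availability per provider from downtime totals and compares it to a target. The input tuple layout, the provider names, and the report wording are hypothetical, and the downtime figures would in practice come from exported monitoring data.

```python
from typing import List, Tuple


def availability_pct(total_minutes: float, downtime_minutes: float) -> float:
    """Availability as a percentage of the reporting period."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_report(records: List[Tuple[str, float, float]], target_pct: float) -> List[str]:
    """Build one report line per (provider, total_minutes, downtime_minutes) record."""
    lines = []
    for provider, total, down in records:
        pct = availability_pct(total, down)
        status = "MET" if pct >= target_pct else "MISSED"
        lines.append(f"{provider}: {pct:.3f}% availability, SLA {status}")
    return lines


# A 30-day month has 43,200 minutes; provider names are placeholders.
MONTH_MINUTES = 30 * 24 * 60
report = sla_report(
    [("provider-a", MONTH_MINUTES, 20), ("provider-b", MONTH_MINUTES, 90)],
    target_pct=99.9,
)
```

With a 99.9% target, 20 minutes of downtime in the month stays within the SLA, while 90 minutes does not.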

Track transit connectivity costs

Cost management is important for any business, especially when managing extensive network infrastructures. One of the ways Kentik can help enterprises optimize their network expenditure is by meticulously tracking transit connectivity costs.

Here’s why tracking transit connectivity costs is crucial:

  • Optimizes network expenditure: Large enterprises often allocate substantial resources to network connectivity. Tracking transit connectivity costs allows organizations to optimize their budget allocation and prevent overspending on underutilized resources.
  • Efficient resource allocation: With accurate cost tracking, organizations can allocate network costs to specific departments, projects, or services based on their actual usage patterns. This level of granularity empowers enterprises to make informed decisions about resource allocation.
  • Identifies cost-saving opportunities: Kentik’s data-driven insights enable enterprises to uncover cost-saving opportunities. By identifying underutilized links, inefficient routing, or redundant resources, organizations can make informed decisions to reduce expenses and maximize the efficiency of their network infrastructure.

How Kentik can help

Kentik offers a real-time cost analysis feature that provides granular insights into transit connectivity costs.

For instance, Kentik can continuously collect and analyze data on network traffic and transit costs for the e-commerce company’s cloud infrastructure, providing real-time visibility into expenditure. Kentik offers insights and suggestions based on the data it collects and can also identify and alert users to unusual traffic patterns and other network anomalies. Users can access this data through Kentik’s platform, which includes intuitive dashboards, AI-driven insights, and reports for monitoring costs effectively:

Sample code snippet for real-time cost monitoring
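To show the kind of arithmetic behind transit cost tracking, here is a sketch that assumes the widely used 95th-percentile (burstable) billing model, in which the top 5% of traffic samples in the billing period are discarded and the highest remaining sample is billed. Whether a given contract uses this model, the per-Mbps price, and the commit level are all assumptions; the function names are illustrative and not part of any Kentik API.

```python
from typing import List


def percentile_95(samples_mbps: List[float]) -> float:
    """Return the 95th-percentile sample using the nearest-rank method.

    Expects a non-empty list of traffic samples (for example, 5-minute averages).
    """
    ordered = sorted(samples_mbps)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) with integer math
    return ordered[rank - 1]


def monthly_transit_cost(samples_mbps: List[float], price_per_mbps: float,
                         commit_mbps: float = 0) -> float:
    """Bill the larger of the 95th-percentile rate and the committed rate."""
    billable_mbps = max(percentile_95(samples_mbps), commit_mbps)
    return billable_mbps * price_per_mbps
```

Tracking this figure during the month, rather than waiting for the invoice, is what makes it possible to notice when a burst is about to raise the billable rate.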

When it comes to cost allocation, the e-commerce company can use Kentik’s platform to allocate network costs or budget to specific entities within the organization, such as product departments, development teams, projects, or services, based on the actual network usage patterns of these entities. This feature provides a transparent view of which areas of the organization are incurring costs and helps optimize budget allocation:

Sample code snippet for cost allocation
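Usage-based allocation comes down to a proportional split, which the sketch below illustrates. The entity names and the choice of transferred volume as the usage measure are assumptions; the function is a generic example rather than a Kentik feature.

```python
from typing import Dict


def allocate_cost(total_cost: float, usage_by_entity: Dict[str, float]) -> Dict[str, float]:
    """Split a shared network bill in proportion to each entity's usage."""
    total_usage = sum(usage_by_entity.values())
    if total_usage == 0:
        raise ValueError("No usage recorded; cannot allocate proportionally.")
    return {
        entity: round(total_cost * usage / total_usage, 2)
        for entity, usage in usage_by_entity.items()
    }


# Hypothetical monthly usage in terabytes per team.
shares = allocate_cost(1000.0, {"checkout": 600, "search": 300, "internal-tools": 100})
```

Because each share is rounded to two decimals, the shares can differ from the total by a cent or so when the split is uneven; a production version would assign the remainder explicitly.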

Additionally, by leveraging Kentik’s data-driven insights, the e-commerce company can identify cost-saving opportunities within its cloud infrastructure. By analyzing network traffic patterns, the platform can pinpoint areas where resources are underutilized or where inefficient routing may lead to unnecessary costs:

Sample code snippet for identifying cost-saving opportunities
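A basic version of the underutilization check might look like the following sketch. The 20% cutoff, the link names, and the `(peak, capacity)` input layout are assumptions chosen for illustration; links that are idle by design, such as backup circuits, would need to be excluded before acting on the result.

```python
from typing import Dict, List, Tuple


def underutilized_links(links: Dict[str, Tuple[float, float]],
                        max_utilization: float = 0.2) -> List[str]:
    """Return names of links whose peak utilization is below the cutoff.

    `links` maps a link name to (peak_mbps, capacity_mbps).
    """
    return sorted(
        name
        for name, (peak_mbps, capacity_mbps) in links.items()
        if peak_mbps / capacity_mbps < max_utilization
    )


# Hypothetical peak traffic and capacity per transit link.
candidates = underutilized_links({
    "transit-a": (800, 1000),
    "transit-b": (90, 1000),
    "transit-c": (10, 10000),
})
```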

Migrate and replicate workloads across data centers and to the cloud

Enterprises often find themselves in situations where migrating or replicating workloads becomes a necessity.

Why and when enterprises should do this

Load balancing may be needed to redistribute workloads among data centers or cloud instances to ensure optimal resource utilization and prevent bottlenecks.

Planning for unexpected events is also crucial. Replicating workloads to remote data centers or the cloud provides a failover solution, ensuring business continuity in the face of disasters. And as businesses grow or experience spikes in demand, they may need to replicate workloads to accommodate increased traffic or resource requirements.

How Kentik can help

The Kentik Network Observability Platform offers a robust suite of tools and capabilities that facilitate seamless workload migration and replication. Kentik supports on-premises private data centers and public cloud environments, and it can collect and analyze data from a wide range of devices, including physical and virtual network devices, security appliances, load balancers, SD-WAN solutions, and containers. It can also collect data from cloud-based services like Amazon Web Services (AWS), Azure, and Google Cloud Platform.

The following are a few of the ways Kentik can assist in this critical process:

When it comes to network health monitoring, Kentik continuously monitors the health and performance of the entire network infrastructure, including links, routers, switches, and the path itself between the two points in the data migration. This real-time monitoring provides a comprehensive view of the network’s status, ensuring it remains stable and responsive.

For instance, if an e-commerce company is replicating workloads to a cloud provider, Kentik helps ensure that the network connectivity to the cloud remains reliable throughout the process:

Sample code snippet for network health monitoring
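One simple way to judge whether connectivity stays reliable during a migration is to compare current measurements with a baseline taken beforehand. The sketch below uses a mean-plus-standard-deviations rule as a stand-in; it is an assumption for illustration, not the anomaly detection Kentik itself applies, and the tolerance of three standard deviations is arbitrary.

```python
from statistics import mean, pstdev
from typing import List


def is_degraded(baseline_ms: List[float], current_ms: float,
                tolerance: float = 3.0) -> bool:
    """True when current latency exceeds the baseline mean by more than
    `tolerance` standard deviations of the baseline."""
    limit = mean(baseline_ms) + tolerance * pstdev(baseline_ms)
    return current_ms > limit


# Hypothetical pre-migration latency to the cloud provider, in milliseconds.
baseline = [20.0, 22.0, 21.0, 19.0, 23.0]
```

With this baseline the limit works out to roughly 25 ms, so a reading of 30 ms during the migration would be flagged while 24 ms would not.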

Additionally, Kentik’s advanced traffic analysis capabilities allow organizations to gain in-depth insights into data flows and dependencies between applications and services. This visibility is invaluable for planning and executing migrations. For example, when replicating critical e-commerce applications, a company can use Kentik to understand how this transfer affects network traffic and ensure that user experience remains unaffected.

Kentik also offers several additional security and compliance features that can be helpful during data migrations:

  • Network encryption: Kentik encrypts data in transit and at rest, protecting it from unauthorized access.
  • Role-based access control (RBAC): Kentik RBAC allows you to control who has access to your data and what they can do with it. This helps to ensure that only authorized users can access your data.
  • Audit logging: Kentik audit logging tracks all activities on your account so you can see who’s accessed your data and what they’ve done with it. This helps you to detect and investigate any suspicious activity.

Sample code snippet for traffic analysis
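To illustrate how flow data reveals dependencies between applications, the sketch below totals bytes per source and destination pair and ranks the heaviest pairs. The `(src_app, dst_app, bytes)` record layout and the application names are assumptions; real flow records would first need to be mapped from IP addresses to applications.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Flow = Tuple[str, str, int]  # (source app, destination app, bytes transferred)


def top_dependencies(flows: Iterable[Flow], n: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n application pairs that exchange the most traffic."""
    totals: Counter = Counter()
    for src_app, dst_app, nbytes in flows:
        totals[(src_app, dst_app)] += nbytes
    return totals.most_common(n)


# Hypothetical flow records for an online store.
sample_flows = [
    ("web", "db", 500),
    ("web", "cache", 200),
    ("web", "db", 300),
    ("batch", "db", 100),
]
```

Pairs near the top of the list are the ones most likely to suffer if only one side of the dependency is moved, since their traffic would then cross the migration path.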

In regard to resource optimization, Kentik’s vigilant network monitoring during migration enables organizations to optimize resource allocation in real time. By ensuring that the right resources are available when needed, Kentik helps guarantee a smooth and efficient workload transfer while maintaining performance levels.

For instance, if a company replicates its online store during a busy shopping season, Kentik helps allocate sufficient resources to handle the increased load effectively:

Sample code snippet for resource optimization
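A back-of-the-envelope capacity check for a scenario like this could look as follows. The growth factor, the 25% headroom, and the function names are assumptions for illustration; the observed peak would come from monitoring data.

```python
def required_capacity_mbps(current_peak_mbps: float, expected_growth: float,
                           headroom: float = 0.25) -> float:
    """Capacity needed to carry the expected peak plus a safety margin.

    `expected_growth` is a multiplier (2.0 means traffic is expected to double).
    """
    return current_peak_mbps * expected_growth * (1 + headroom)


def needs_upgrade(capacity_mbps: float, current_peak_mbps: float,
                  expected_growth: float, headroom: float = 0.25) -> bool:
    """True when the link cannot carry the projected peak with headroom."""
    return required_capacity_mbps(current_peak_mbps, expected_growth, headroom) > capacity_mbps
```

For a link peaking at 400 Mbps that is expected to double during the shopping season, the projected requirement with 25% headroom is 1,000 Mbps, so a 900 Mbps link would need an upgrade while a 1 Gbps link would just suffice.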

Negotiate peering relationships

Negotiating peering relationships is crucial for an e-commerce company’s network management. Here’s why and when they should engage in this practice:

  • Efficient traffic routing: Peering relationships are fundamental for ensuring efficient traffic routing. Because the e-commerce company’s online store serves customers globally, the company must negotiate and maintain these relationships to minimize latency and provide a seamless shopping experience.
  • Cost-effective data transfer: By optimizing peering relationships, the company can lower the costs associated with data transfer, especially as it handles large volumes of customer data and transactions.

How Kentik can help

Kentik provides valuable insights into network traffic patterns, enabling enterprises to assess the performance and impact of peering relationships. Here’s how Kentik can assist:

  • Traffic analysis: Kentik’s platform allows e-commerce companies to analyze traffic patterns and identify the volume of data exchanged with peering partners. This information is critical for understanding which partners contribute most to their traffic.
  • Performance assessment: By monitoring traffic, latency, and packet loss, Kentik helps companies assess their peering relationship performance. This data allows an e-commerce company to make informed decisions about the effectiveness of their peering agreements, ensuring that customer transactions occur smoothly and without delays.
  • Route optimization: Kentik provides data on routing efficiency, helping the e-commerce company optimize traffic routing to ensure the best possible performance and cost savings. For instance, if they have multiple peering partners, Kentik can help them choose the most efficient routes for data transfer.

Kentik empowers enterprises to negotiate and maintain peering relationships confidently, ensuring their network traffic flows efficiently and cost-effectively.

Code for negotiating peering relationships
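As an illustration of how traffic data can inform peering decisions, the sketch below ranks the networks that receive the most traffic over paid transit, since those are the natural candidates for a direct peering conversation. The 100 Mbps cutoff and the input layout are assumptions, and the AS numbers are taken from the range reserved for documentation (64496 to 64511), not real networks.

```python
from typing import Dict, List, Tuple


def peering_candidates(transit_mbps_by_asn: Dict[int, float],
                       min_mbps: float = 100) -> List[Tuple[int, float]]:
    """Return (ASN, Mbps) pairs at or above the cutoff, largest first."""
    eligible = [
        (asn, mbps) for asn, mbps in transit_mbps_by_asn.items() if mbps >= min_mbps
    ]
    return sorted(eligible, key=lambda pair: pair[1], reverse=True)


# Hypothetical traffic currently delivered to each destination AS via transit.
ranked = peering_candidates({64496: 850, 64497: 40, 64498: 300})
```

Volume alone does not settle the question: the performance of the current path and the cost of reaching a shared exchange point also matter when deciding whom to approach.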


Managing large-scale enterprise infrastructure is complex, but it’s essential for staying competitive in the digital era. Kentik empowers data center engineers and network professionals to tackle network management challenges effectively.

In this article, you learned about some of the most common challenges enterprises face when it comes to managing their large-scale infrastructures and how Kentik’s network observability platform can help.

Kentik’s deep network monitoring, troubleshooting, capacity planning, and digital experience monitoring capabilities make it a powerful tool for large enterprises. By embracing network observability, enterprises can proactively manage their networks, ensuring they meet SLAs, optimize costs, and remain agile in a rapidly evolving digital landscape.
