
NPM, encryption, and the challenges ahead: Part 1 of 2

Michael Patterson


Summary

Encryption and cloud adoption are creating hurdles for network performance monitoring vendors. Solutions must evolve or die.


It’s interesting to observe how encryption and network performance monitoring (NPM) have evolved over time. When I first entered the networking industry right out of college, many applications sent passwords over the network in clear text, unencrypted. Since just about everyone’s PC was wired back to a repeater (i.e., not a switch), we could observe each other’s traffic with free packet analyzers and laugh. Once you saw a person’s password to any given application, you knew they were generally using the same one for all of their other applications — email, the ticketing system, the FTP and Novell servers, etc. Well, that didn’t last long.

Encrypted passwords came along, as did token authentication. Then came TLS, HTTPS, and SNMPv3, and the trend continues. It’s almost as if someone out there in the ether is determined to end all passive or pervasive “unwanted” monitoring.

When companies started outsourcing the hosting of their websites to the likes of Akamai and AWS, network teams quickly learned that many work-related and non-work-related applications shared the same IP address. This made it impossible to accurately monitor latency and packet loss for some applications.

Some NPM vendors started pairing DNS lookup records with flow data in order to separate business applications from non-business applications hosted on the same IP address. The problem is that many companies have several DNS servers spread out in far-reaching locations, and not all DNS vendors allow access to the logs. Getting all the DNS logs is often impractical.
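As a rough illustration of the pairing idea, the sketch below labels a flow with the hostname the same client most recently resolved to the flow's destination IP. The record layouts (`DnsAnswer`, `Flow`), the 60-second grace period, and all sample addresses and hostnames are assumptions made up for this example; real DNS logs and flow exports vary by vendor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DnsAnswer:
    ts: float          # when the answer was logged (seconds)
    client_ip: str     # host that asked the question
    hostname: str      # name that was looked up
    resolved_ip: str   # address returned to the client
    ttl: int           # time-to-live of the answer (seconds)


@dataclass
class Flow:
    ts: float          # flow start time (seconds)
    src_ip: str
    dst_ip: str
    byte_count: int


def label_flow(flow: Flow, dns_log: List[DnsAnswer], grace: float = 60.0) -> Optional[str]:
    """Return the hostname the client most recently resolved to flow.dst_ip."""
    best = None
    for ans in dns_log:
        if ans.client_ip != flow.src_ip or ans.resolved_ip != flow.dst_ip:
            continue
        # The lookup must precede the flow and still be (roughly) valid.
        if ans.ts <= flow.ts <= ans.ts + ans.ttl + grace:
            if best is None or ans.ts > best.ts:
                best = ans
    return best.hostname if best else None


# Hypothetical data: two applications hosted behind the same shared IP.
dns_log = [
    DnsAnswer(100.0, "10.0.0.5", "crm.example.com", "203.0.113.10", 300),
    DnsAnswer(105.0, "10.0.0.9", "video.example.net", "203.0.113.10", 300),
]
flows = [
    Flow(110.0, "10.0.0.5", "203.0.113.10", 48_000),
    Flow(120.0, "10.0.0.9", "203.0.113.10", 9_500_000),
    Flow(130.0, "10.0.0.7", "203.0.113.10", 1_200),  # no lookup was logged
]

for f in flows:
    print(f.src_ip, "->", f.dst_ip, label_flow(f, dns_log))
```

The third flow stays unlabeled because no matching lookup was captured, which is exactly the gap that appears when some DNS servers are out of reach.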

These problems have, of course, created opportunities in the NPM market. The changes have forced vendors to evolve their tools in order to keep NetOps teams informed on how applications are performing for the end users.

What is NPM?

As a refresher, NPM solutions provide historical and real-time predictive views on the availability and performance of the network and the applications that travel over the network. Traditionally, this is done using flow analysis, SNMP, packet capture and other forms of infrastructure telemetry. However, as explained above, these techniques are often not helpful when trying to measure SaaS application performance.

Next-generation NPM solutions ingest new forms of telemetry, which allow them to overcome their dependency on older collection methods. The NPM goal is still to improve the overall end-user experience with their applications. But this goal gets harder as more encryption is introduced and more services move to the cloud. Because of this, NPM solutions must evolve.

In this short video, Kentik CEO Avi Freedman discusses the many types of network telemetry data and integrations that are important to improving the value of network performance monitoring, and network observability in general:

This video is a brief excerpt from "5 Problems Your Current Network Monitoring Can’t Solve (That Network Observability Can)"—you can watch the entire presentation here.

Is Google hijacking DNS with DoH?

Probably one of the more controversial security developments in the pervasive monitoring space is DNS over HTTPS (DoH). Google has service providers and enterprises in an uproar over its intention to have the Chrome web browser circumvent local DNS servers. This migration will allow Google to know all the websites its users are visiting, but it breaks many enterprise security and policy platforms that depend on collecting DNS lookups. NPM platforms that rely on DNS logs will face a serious challenge if Google gets its way. Firefox already supports DoH by default.
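To make the mechanics concrete, here is a minimal sketch of how a DoH GET request is formed under RFC 8484: the ordinary DNS query is serialized in wire format, base64url-encoded without padding, and carried in the `dns` parameter of an HTTPS URL. The resolver URL `https://dns.example.net/dns-query` is a placeholder, not a real service, and nothing is sent over the network here.

```python
import base64
import struct


def build_dns_query(hostname: str, qtype: int = 1, query_id: int = 0) -> bytes:
    """Serialize a single-question DNS query (class IN) in wire format."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # qtype 1 = A record
    return header + question


def doh_get_url(resolver_url: str, hostname: str) -> str:
    """Build the URL a client would fetch for a DoH GET lookup."""
    wire = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


# Placeholder resolver URL, assumed for illustration only.
url = doh_get_url("https://dns.example.net/dns-query", "www.example.com")
print(url)
```

On the wire, this lookup is just another HTTPS request to the resolver, so a local DNS server or a passive DNS log collector never sees the name being queried.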

Consider SD-WAN as another example of DoH causing problems. The SD-WAN controller grants permission to connections based on the domain name (e.g., facebook.com) being visited. If Google has its druthers, a lot of SD-WAN vendors are going to be in a tough spot and will have to figure out another way to enforce policy. Some have stated that it could be done by passively monitoring for the Server Name Indication (SNI) in the initial HTTPS connection handshake, but maybe not for long. There is a movement to encrypt that as well.
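The SNI approach can be sketched as follows: the hostname sits unencrypted in the server_name extension of the TLS ClientHello, so a passive monitor can walk the message and read it. The parser below is a simplified illustration that assumes the whole ClientHello arrives in one TLS record; `build_client_hello` is a hypothetical helper that fabricates a stripped-down test message, not a complete handshake a real server would accept.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from a TLS ClientHello record, if present."""
    try:
        # Record type 0x16 = handshake; handshake type 0x01 = ClientHello.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                    # skip record and handshake headers
        pos += 2 + 32                  # client_version + random
        pos += 1 + record[pos]         # session_id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len          # cipher_suites
        pos += 1 + record[pos]         # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:          # server_name extension
                # list length (2), name_type (1), name length (2), name
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                if name_type == 0:     # 0 = host_name
                    return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def build_client_hello(hostname: str) -> bytes:
    """Fabricate a minimal ClientHello carrying only an SNI extension (test data)."""
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni_data = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0, len(sni_data)) + sni_data
    body = (b"\x03\x03" + bytes(32)                      # version + random
            + b"\x00"                                    # empty session_id
            + struct.pack("!H", 2) + b"\x13\x01"         # one cipher suite
            + b"\x01\x00"                                # null compression
            + struct.pack("!H", len(extensions)) + extensions)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("facebook.com")))
```

If the server name is encrypted as well, this extension no longer exposes the hostname and a parser like this one has nothing to read.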

Google isn’t the only one forcing changes in the NPM space. The Internet Engineering Task Force (IETF) is also working toward changes that will impact pervasive monitoring.

The IETF declares “pervasive monitoring is an attack”

Check it out for yourself in RFC 7258. The IETF community’s technical assessment is that pervasive monitoring (PM) is an attack on the privacy of internet users and organizations. If you’ve never read an RFC, read this one, as it’s only three pages of actual content. Read page two or just check this out:

The term “attack” is used here in a technical sense that differs somewhat from common English usage. In common English usage, an attack is an aggressive action perpetrated by an opponent, intended to enforce the opponent’s will on the attacked party. The term is used here to refer to behavior that subverts the intent of communicating parties without the agreement of those parties. An attack may change the content of the communication, record the content or external characteristics of the communication, or through correlation with other communication events, reveal information the parties did not intend to be revealed.

Other NPM challenges

The NPM market faces several other challenges that have been brought about by industry changes.

  • HTTP is the new TCP. Fifteen years ago, NetOps could start up a packet analyzer and see all kinds of different unencrypted protocols such as FTP, SMTP, POP3, HTTP, and DNS. These protocols helped NPM tools identify applications. Today, many downloads are done over HTTPS, and the same holds true for checking email. It looks like DNS is heading in the HTTPS direction as well. Start up a packet or flow analyzer and make note of the most popular protocol. It’s encrypted HTTPS, and encrypted traffic means limited visibility. In ten years, QUIC, which runs over UDP, will likely become the protocol of choice for most websites, especially if Google encourages it as it did with HTTPS back in 2015. That was the year Google announced that its search engine would play favorites with websites supporting HTTPS. In less than a year, most companies (~70%) serious about market competition made the switch. Increased HTTPS traffic made it difficult for routers to provide insights beyond traditional NetFlow v5 metrics. Flow collectors and packet capture probes are finding it more difficult to provide the traditional insight they have been serving up for years.

  • Migration to the cloud has also meant that much less data is staying on-premises. Due to the global pandemic, many employees are working from home and connecting only to third-party SaaS applications to do their jobs. This means none of their traffic ever touches the on-premises network! As a result, the big investments made in powerful NPM appliances installed in the data center for passive “pervasive” monitoring are becoming less useful.

  • With more applications natively built in the cloud, the increased use of containers and microservices has fundamentally changed the visibility and flow of network traffic. Because of this, monitoring tools are forced to search out alternative methods to see into container-to-container traffic. To date, one hope is to let NPM tools use APIs to recover some of that visibility, since the cost of exporting flow data has been prohibitive for many companies.

  • NPM tools are becoming commoditized. Some vendors are noticing that the differences between their products are becoming less significant. For some, the largest differentiator is a scalable collection rate, which is expensive and difficult to achieve. All vendors tout impressive collection rates, but few can deliver.
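The first bullet above suggests starting a flow analyzer and noting the most popular protocol. A minimal sketch of that tally is shown here, assuming flow records have already been reduced to (IP protocol number, destination port, byte count) tuples. The sample numbers are invented purely for illustration and are not measurements.

```python
from collections import Counter

PROTO_NAMES = {6: "TCP", 17: "UDP"}  # IANA protocol numbers


def traffic_share(flows):
    """Return [((protocol, dst_port), percent_of_bytes), ...], largest first."""
    totals = Counter()
    for proto, dst_port, byte_count in flows:
        totals[(PROTO_NAMES.get(proto, str(proto)), dst_port)] += byte_count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [(key, 100.0 * count / grand_total) for key, count in totals.most_common()]


# Invented sample: (protocol, destination port, bytes)
sample_flows = [
    (6, 443, 600),   # HTTPS over TCP
    (17, 443, 200),  # QUIC over UDP
    (6, 443, 100),
    (17, 53, 60),    # plain DNS
    (6, 25, 40),     # SMTP
]

shares = traffic_share(sample_flows)
for (proto, port), pct in shares:
    print(f"{proto}/{port}: {pct:.1f}%")
```

Port 443 tells the tool that the traffic is encrypted web traffic, but nothing about which application is inside it, which is the visibility gap the bullet describes.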

Customers looking for a network performance monitoring solution need to have a clear understanding of the traffic patterns of their organization. These questions need to be answered before purchasing a solution:

  • What percentage of employees are working from home?
  • What percentage of employee traffic is headed to SaaS applications maintained by third parties (Salesforce, Google Docs, Slack, Asana, Gmail, etc.)?

Based on the answers to the above, customers may find themselves excluding a lot of NPM vendors because SNMP polling, collecting DNS logs and packet capture aren’t going to cut it.

Stay tuned for part 2 of this blog post where we’ll discuss how NPM is evolving.
