Telemetry Now  |  Season 1 - Episode 3  |  December 13, 2022

What does machine learning have to do with network visibility?


Is data science, and specifically machine learning, just network industry marketecture, or do the processes and workflows of ML actually solve real problems for network engineers working in the trenches? In this episode of Telemetry Now, Estefan Ortiz, Ph.D., joins us to talk about what ML has to do with network visibility and the truth of what it can do to solve real problems in networking.


Key Takeaways

  • [00:39 - 03:10] Introduction to Estefan Ortiz
  • [03:15 - 04:27] The definition of data science
  • [04:30 - 06:38] Why the rise in discussions about data science across industries?
  • [06:39 - 09:52] A desire to solve networking problems in new ways, and how data and the types we use can help
  • [09:53 - 10:38] Machine learning, applied statistics, and figuring out the problem you're trying to solve
  • [10:57 - 13:41] Is this a solution looking for a problem, and solving for time series data
  • [13:41 - 17:16] Detecting patterns in problem solving, actionable insights tied to operational data
  • [17:16 - 18:57] An iterative approach to problem solving with different processes and trial and error



Transcript

This is Telemetry Now, and I'm your host, Phil Gervasi. And trigger warning: we will be talking about machine learning in this episode. Now, I'm joking. Well, I'm actually not joking about the machine learning part. I'm joking about the trigger warning.

So joining me is Estefan Ortiz, a senior data scientist with Kentik, and an expert in data science, machine learning in particular. So what we're gonna do is talk about how data science is being applied to networking today, specifically network visibility. Yes.

We'll also touch on whether or not this is all just hype. So ultimately, that's the goal here today: keep it real, keep it honest, and learn what we're actually doing with ML in network visibility. Let's get started.

So, Estefan, it's great to have you here today. And I do appreciate that you took some time out of your schedule to talk. Now, I know you're from Texas.

Right? But you went to grad school in Hawaii. Is that right?

I did. Yeah. Yeah. It's good to be here too.

Yeah. So I finished up undergrad at Saint Mary's University in San Antonio.

And I thought, oh, let's see if I can go to grad school in a really nice, you know, area. I thought, why not Hawaii? And so I picked an EE program that was, you know, strong in control theory and strong in what I was looking for at the time, which was error control coding. And Hawaii was. So I went out there.

Oh, man. Two thousand three, I believe.

If I recall, two thousand three. And then I worked on a master's, graduated in two thousand six, and then I decided to stay out there until about twenty ten or so, at which point I decided to go and pursue a PhD at the University of Notre Dame.

That's pretty amazing. So I have to imagine it was really awesome living in Hawaii for those years.

It was. It was great. It was, at times, hard to concentrate on school, wanting to kinda get out and learn to surf, be out in nature and hike and whatnot. So it was a lot of fun.

Okay.

So I went to graduate school in Albany, New York, which, some people like to call the Hawaii of New England.

That is not true. I made that up. No, nobody calls Albany that. In fact, we have bumper stickers around town (I don't live in the city of Albany, but in the area) that say, keep Albany boring.

Believe me, it is. So anyway, before we get into it, you gave me a little bit about your background.

But what specifically do you do as a data scientist? And what is your doctorate in, exactly? You mentioned EE, so I assume that's what it is.

Yeah. Well, sort of. So I started off in EE. I received a master's in electrical engineering at the University of Hawaii, and then I went to Notre Dame to pursue computer science and engineering.

So I received another master's there and then a PhD.

My focus was broadly computer vision, specifically biometrics, with a focus on iris recognition and iris detection, so to speak.

Very cool. Very interesting background, I gotta say. But I do wanna get into it now. So before we really unpack what data science is all about and how we apply it, can we establish a foundational definition of what data science is?

Good question.

So I guess the way that I see it is that data science is somewhere in between, like, applied statistics and kind of software development. I forget the person who said it best, but data scientists tend to be better at software development than most applied statisticians, and better at applied stats than most software developers.

So it's kind of that middle ground.

But, I guess, what motivates me, and I think maybe a lot of data scientists, is being able to sift through and look through data to pull out specific insights that are actionable for a given problem. Okay.

Yeah. And so I think in the network engineering space, it's: can you detect, you know, interesting things in, say, volumetric data over time? And then once you detect it, can you do something interesting with it?

That's really cool. Now, I mean, I've been a network engineer for fifteen years or so, and this conversation, the application of data science to networking, is relatively new. Now, I know technically I'm sure there were people at places like MIT and Stanford doing it for years and years. But generally speaking, it's a relatively new conversation in the field. Do you have any idea why? Why is it that we're only now starting to apply the methods and workflows and processes of data science to this industry, to networking?

Yeah. So, I mean, speaking in general, kind of generic, terms from what I've seen in other places that I've worked, sometimes it's just an adoption issue. Part of it is being able to express the things that machine learning or data science can do when compared to what's already being used. And so I think maybe the slow adoption is there because the field is new and making the case for a given area is difficult, network engineering being one of those places.

Yeah.

And so I think that's one of the reasons. The other reason is, I think, like other areas, you wanna make sure that you're expressing the ability of a given, say, trained model in a way that gives some explanatory behavior of the underlying system. And oftentimes that's very difficult. It's difficult because either the model that you've chosen is complex and it's difficult to explain things correctly from it.

Or the underlying dataset isn't, you know, as clear to those that are interested. Basically, what's the data that goes into this model? I want to know what the factors are and whether it's right. And being able to explain, you know, what data goes in.

I think it's been, you know, an ongoing process to let everyone know how things were built, I guess is the right way to put it, or how things were estimated.

I think one of the things, too, is that the past few years, maybe more than just a few, there seems to have been, at least in my naive perspective,

kind of like a hockey stick, an exponential growth of complexity in networking. You know, when you add all the overlays that we have now that we didn't when I first started. When I was first starting in networking, things were very simplistic. There was a WAN edge and there was some stuff going on in your LAN.

The complexity was, like, my wireless is acting funny. Today, it's crazy. There's so much stuff going on. And I think, yeah.

I think that's probably one of the things that's lending to this desire to solve that problem in a new way. How do we solve this problem of visibility, or of configuration management, or whatever it happens to be? Is that a problem that you're seeing with this industry?

The type of data that we have, network telemetry is what we call it internally at Kentik, but is that an issue, the type, the kind of data that we're using?

It is. Yeah. So it's being able to, I guess, make sense of the complex datasets that are there, and not just a single dataset, but multiple datasets. So it's being able to correlate, you know, things in a meaningful fashion, at scale, really. Like, it's nice to be able to do a small analysis on your laptop, but it's a whole different ball game whenever you're trying to, you know, put in place a somewhat real-time system that does the same thing across millions of data points, across thousands of devices, across thousands of interfaces. And I think the scale of it all makes it very difficult to do things correctly and then to explain the models and the results in a meaningful fashion.

So, yeah, the underlying data, the underlying complex network topology, really plays into that as well, and what we can sample from it. Yep.

Yeah, I have to imagine, because you're not collecting just one data type. I mean, it is very diverse, the data types and formats, and, you know, I was just looking at some examples of how scaling is done for utilization. And I'm like, well, that must be a huge part of what you have to do considering the diversity of data that we have. Right?

Yeah. And then I think the other piece that plays into it, if you're doing ML classification systems, is that trying to capture knowledge through, say, labeled datasets is another whole issue to address. Like, you wanna make sure that it's a quality labeled set.

You wanna make sure that you're capturing subject matter experts' knowledge correctly within that dataset. And you want to make sure that you can at least present a good, I guess, stratified set, so to speak, so that you can see the edge cases in the classification-type models, whether it's something simple, this network is doing well versus this network is doing poorly, or, yes, this is a known outlier and it's not something that's, you know, expected from a given set.

Now, you just mentioned machine learning a moment ago. Is that really what we're talking about when we say data science and how it's applied to networking?

That's a good question. So I feel like it's on a spectrum. Right? So, I mean, yeah. For me, it typically starts with applied stats.

And, you know, how far can you get with modeling things in terms of, you know, distributions?

And then, you know, once that starts to hit its limits, can we use more flexible models that add a little bit more complexity, say, you know, classification-type models, or unsupervised learning if we're trying to do discovery of, I guess, patterns within the dataset?

Right. Okay.

Yeah. And so for me, it all behaves like it's on the same kind of spectrum. It just depends on what problem you're trying to address.

Well, I guess that sort of begs the question: what are we trying to solve here?

You know, I mean, we're talking about some pretty cool stuff, and I have so many questions about correlation and causal relationships and strong correlation versus weak. So much stuff that I wanna ask you about. But what are we actually trying to solve? Honestly, a lot of the industry is looking at this and saying, is this just a solution looking for a problem?

Yeah. I would say, I guess, in my day-to-day, what I look at is mostly time series data. Like, how do things behave over time? And so I am trying to at least push, you know, what we currently do, and, you can correct me, what the industry kinda does, with these kinds of detection heuristics where you're looking at some mean and some standard deviation, and you're asking, how different is this new quantity relative to those two statistical measures? Am I far from the mean given some underlying variance?

And so there are ways that we can extend that to incorporate, you know, time series behavior. The one that quickly comes to mind is being able to capture, say, seasonal patterns or periodic patterns, so that if there is a peak, you can ask the question: is this peak known? Does it occur every day at eight AM, so I expect those values to be high? And so outliers may not be the same there, whereas maybe it's in a trough, where for whatever reason the network isn't as busy. So it's being able to take the same concepts that you use for current outlier detection but incorporate a time component and say, this is my conditional expectation for this time of day.

This is the center point, and this is the variability, for that time of day at least. So it's not so sensitive to these kinds of expected variations throughout the day or throughout the week or month, however we capture seasonal patterns.
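As a rough sketch of the time-conditioned baseline Estefan describes, and not Kentik's actual implementation, the idea might look something like this in Python; the 'bits_per_sec' column name and the threshold are hypothetical:

import pandas as pd

def seasonal_outliers(df, z_threshold=3.0):
    # df is assumed to have a DatetimeIndex and a 'bits_per_sec' column.
    # Instead of one global mean and standard deviation, compute a
    # conditional baseline per hour of day, then flag points that sit far
    # from the expectation for that hour.
    hour = df.index.hour
    baseline = df.groupby(hour)['bits_per_sec'].agg(['mean', 'std'])
    mu = baseline['mean'].reindex(hour).to_numpy()
    sigma = baseline['std'].reindex(hour).to_numpy() + 1e-9
    z = (df['bits_per_sec'].to_numpy() - mu) / sigma
    return df[abs(z) > z_threshold]

The same grouping could be extended to hour-of-week or day-of-month buckets, depending on how the seasonal patterns mentioned above are captured.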

And I guess that's what we mean by insight, then. It's something beyond just looking at the interface statistics. This is going beyond that and doing some sort of, not interpretation, but inference.

Right?

Yeah. I would add to that that an insight is both that predictability, that forecasting ability, but also, like, the action that's associated with it. So it's always been, hey, I've detected something cool, which, you know, from an ML or data science perspective is awesome, but then it's, what do we do with that? And I think that's where a lot of the subject matter expertise comes into play.

Like, can we present the scenario and say, we've detected something, now go fix this interface, or it's a problem with this link, or we're having problems reaching this point, but so are ten other people or ten other companies. Yeah. So for me, it's both: the interesting thing that you can detect, but also the call to action that drives it.

So we're talking about ingesting a ton of information, doing, you know, statistical analysis, perhaps using some ML algorithms, all this really cool stuff, but, ultimately, it's so that an engineer can go fix an interface. Yeah. Whatever. Right? No, but I'm thinking.

It is, though.

Maybe you can make the network run better and applications get delivered properly, well, perform properly, and all that.

Right?

Yeah. No.

I laugh because the pattern seems to be consistent from back... What do you mean?

From back when I used to work on aircraft health monitoring at the University of Hawaii, it was the same problem. Can we look at operational data? Can we tie it to maintenance information? And then use that to say, hey, we've detected something wrong with the plane, what do we do with it? Like, how do we act, so to speak?

Yeah, that makes sense. And that's that whole idea of an actionable insight, not just an insight. I mean, something that you mentioned was, you know, you're collecting a bunch of data, right, and you're able to find some sort of pattern, but then you present it to a subject matter expert and they're like, yeah, who cares? At Networking Field Day 29 a few weeks ago, last month, I made the comment that, you know, let's say you're analyzing this telemetry that's coming in and you see you have this four hundred gig interface in your data center, which is not an uncommon bandwidth amount.

And it's plugging away at one meg per second, not one gig, but one meg. It's a tiny, tiny fraction. And then you see it jump to ten megs, which is a statistically significant increase. Right.

Yet it is so small that, really, to a subject matter expert who knows networking, looking at that, they know immediately it has no bearing on application performance, no bearing at all. Maybe it's something to check out, and you put up, like, a warning that something's going on, but it's really not mission critical.

And I remember a former colleague of mine talked about it this way. He said it was the difference between weird and bad. It's weird, but it's not really a bad thing. So what do we do with it? That's the quality of the insight. Right?

Yeah. And so for me, when you discuss it, or when I hear the same thing, like, hey, you've detected something interesting, but it's not significant, I try to internalize that and say, well, how do I map that "it's not interesting" part to some sort of quantity or some sort of algorithm? Whether it's doing something simple like an effect size that tries to translate that into some mathematical behavior: we've detected the outlier, but relative to its magnitude change, so to speak, it doesn't mean much.

Right. Or relative to the device itself. Like, you know, you have a, I forget if it was a forty gig or a four hundred gig, bandwidth, so relative to that, you know, threshold of that measure.

Is it significant? And so being able to map that to something that we can incorporate in the algorithmic process is awesome. And so that would be, you know, the rule-based approach, not rule-based exactly, but incorporating that rule or rule of thumb. Or being able to take that and go back and look at the data and have, you know, myself or others label it, saying, yes, these are insignificant, these are significant.

Use those labels now to feed back through the algorithm and say, let's improve upon this, because we've shown it to interested folks and they don't think it's very interesting. So now let's close that feedback loop and use that information.
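To illustrate the weird-versus-bad distinction in code terms, and again only as a sketch with hypothetical names and thresholds, one could pair the statistical test with an effect-size check against link capacity:

def classify_change(baseline_bps, observed_bps, baseline_std_bps, capacity_bps,
                    z_threshold=3.0, min_capacity_fraction=0.01):
    # Separate statistical surprise from operational significance: a jump can
    # be many standard deviations away from the baseline and still be a
    # trivial fraction of what the link can carry.
    delta = abs(observed_bps - baseline_bps)
    z = delta / max(baseline_std_bps, 1e-9)
    capacity_fraction = delta / capacity_bps
    if z > z_threshold and capacity_fraction > min_capacity_fraction:
        return "bad"      # unusual and big enough to matter
    if z > z_threshold:
        return "weird"    # unusual, but operationally insignificant
    return "normal"

# Phil's four hundred gig example: one meg jumping to ten megs is a huge
# z-score but only about 0.002 percent of capacity, so it comes back
# "weird" rather than "bad".
# classify_change(1e6, 10e6, 0.2e6, 400e9)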

It sounds like the process of using all of these methods and processes of data science, from a high level, is very iterative. It's not like you slap an algorithm on it and you've got all the answers, and now all your engineers are happy because all their problems are solved. Right? It sounds like it's a constant process of trial, maybe trial and error.

I hate to use that term because, you know, as a scientist, I'm sure you don't wanna, you know. Yeah. You want black and white answers, but it's a process, right, of getting things more accurate, more meaningful. Is that right?

Yeah. Sophisticated trial and error. Kind of, yeah, relying on the scientific method a little bit.

But, yeah. Very good.

Very good. Very good. But you mentioned that, and there's a whole field that's starting to crop up behind just what you said. Like, there's MLOps.

Similar to DevOps, where they're doing the monitoring of these systems to say, you know, this one's drifting away from some set point, let's go ahead and retrain. Or we've got feedback from the field that says these insights aren't very good; how do we bring that and feed it back into either the original dataset or the modeling portions of the algorithm?

So, yeah.
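As a loose illustration of the drift monitoring Estefan mentions, and not any particular MLOps tool, a minimal check might compare the distribution of a model input in recent data against the distribution it was trained on; the feature values and the 0.2 cutoff are hypothetical:

import numpy as np

def input_drift_score(training_values, recent_values, bins=20):
    # Population-stability-style check: bucket the training-time values,
    # then see how differently recent values fall into those same buckets.
    # A large score is a hint that the model may need retraining.
    edges = np.histogram_bin_edges(training_values, bins=bins)
    train_counts, _ = np.histogram(training_values, bins=edges)
    recent_counts, _ = np.histogram(recent_values, bins=edges)
    train_pct = (train_counts + 1) / (train_counts.sum() + bins)    # smoothed
    recent_pct = (recent_counts + 1) / (recent_counts.sum() + bins)
    return float(np.sum((recent_pct - train_pct) * np.log(recent_pct / train_pct)))

# e.g. if input_drift_score(train_bps, last_week_bps) > 0.2, schedule retraining
# or route the feedback into relabeling, as described above.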

We've been pretty high level today. I really wanted to get an idea of this concept of data science and why and how we apply it to networking, but so much of the stuff that you said, you know, I have so many questions that I wanna follow up on. Like, for example, you talked about anomalies earlier, and something I struggled with in my career was, you know, getting a platform in front of me that's firing off alerts for anomalies and they're all false positives. I'm like, that's not an issue.

And I end up not trusting the tool. How do we deal with that? You know, there's a lot of questions that I have for you. But just because of time, you know, I do wanna close this out now.

So it has been really a great episode talking about data science and machine learning and, you know, getting to unpack some of the real meat behind what the industry is doing right now. So, thank you, Estefan, for joining me today. I really do appreciate it. And before we go, if folks wanna, you know, reach out to you online, maybe ask a question or have a comment,

how can they get in touch with you?

Sure. They can send me an email at my email address, eortiz at kentik dot com. That's e o r t i z at kentik dot com.

Okay, great. And you can find me on Twitter at network underscore phil, and you can search my name on LinkedIn. And until next time, thanks very much.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS? Well, you're in the right place! Telemetry Now is the podcast for you! Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.