Telemetry Now  |  Season 2 - Episode 51  |  July 17, 2025

Getting Started with Building AI Applications for IT Operations


 
Ryan Booth, network engineer and AI developer, joins Philip Gervasi to explore the practical realities of building AI applications specifically for network and IT operations. They talk about the importance of starting with clear business objectives, understanding data engineering essentials, managing model evaluation, and bridging knowledge gaps between infrastructure and data teams. Ryan shares his insights about the need for iterative learning, using simple projects to gain practical skills, and recognizing how AI can effectively solve real ITOps and NetOps challenges.

Transcript

A lot of folks are now sort of getting past that initial hype and really even speculation about using AI in IT operations.

And, you know, the industry has been using more classical AI in the form of ML and data science workflows for years, but there is a lot of new interest in building AI applications specifically for network engineers to use to run their environments better somehow.

But as anyone who's worked in any kind of software development role for any length of time would know, even a beginner, this dream of building applications, especially AI applications, is very different from the reality of building them when the rubber meets the road, and you have to deal with data and missing data and disparate systems and different evaluation criteria and then different teams all working together.

So today, we have Ryan Booth, who's a returning guest, a friend of the podcast, a friend of mine, and someone whose opinion I really value because he's been both a very senior network engineer operating at the CCIE level and a software developer specifically working in this realm of AI for years now.

And today, that's exactly what we're gonna talk about, building AI applications for IT and more specifically network operations, and what that means when you actually start putting the pieces together.

My name is Philip Gervasi, and this is Telemetry Now.

Hey, Ryan. Welcome back to the podcast. It's great to have you on again. This is like the third or fourth time that you've been on, and I'm grateful for that because you always have so much insight. And I love how you bring, you know, multiple worlds together. So I wanna start off with a personal question if that's okay.

Yeah. Absolutely. Thanks for having me again, Phil. It's always great talking to you. These conversations are amazing. So let's have another one.

Yeah. So I know you're a CCIE, and you were a big infrastructure guy at a very high level. And you have been eyeball deep in software development, AI in a very real sense, in a very technically deep sense for years now. But do you ever miss being like a traditional network engineer?

Ironically, I know I have those moments where I'm in a deep level of frustration with whatever I'm building or whatever I'm doing, and the complexity level is through the roof, and that's contributing to it. And I have those frustrating moments where I'm like, dang, I really miss those days where I was just configuring VLANs on switch interfaces from support tickets all day long.

So I have those.

But honestly, I really enjoy just seeing everything, touching everything, and being around so many different areas that it's just fun to get involved with networking when networking pops around and comes around again. It kind of brings up those roots. But I don't necessarily miss it on a daily basis.

It's just part of the life now. Does that help? Is that a good answer?

It does help. Well, I mean, I can relate to you in the sense that I don't deal with traditional networking as regularly as I used to, especially when I was a VAR engineer going from project to project and, you know, racking and stacking.

But certainly, I do miss certain aspects like fixing things and having something that's a little bit more deterministic, where you set up your routing and then you know why things are going where they're going and it's predictable, which is not always the case with AI. But certainly, I do miss that aspect. But I think one of the reasons that I brought it up is because, for me, I am talking to so many folks about AI initiatives, and I say that with a level of ambiguity on purpose because folks don't really know what they're asking when they say something like that. Phil, I got this initiative from my C-level, from my VP, to use AI in twenty twenty five and into twenty twenty six, and I don't know exactly what that means.

Can you help me? And I'm like, I don't know. What do you mean by that?

And I'm talking to infrastructure folks, maybe, you know, people that lead networking or people that lead kinda IT more broadly.

And so, you know, I ask that question because I'm seeing these worlds come together very much. Folks that have never touched networking, maybe they were working in MLOps or traditional software development, or they're data engineers, data scientists, things like that, that are very interested in helping folks in the networking space but have no networking background. And then inversely, folks in networking or in infrastructure more broadly that are like, well, we're going to use AI, but they don't know any of that other stuff that I just mentioned, the data science, the data engineering, the MLOps kind of stuff. And so these two worlds coming together, it's this interesting new paradigm.

And, you know, you embody that because you have that experience from a high level in both worlds.

But one thing that I wanted to start us off with today was this idea that, you know, people don't quite understand what they're getting into on the infrastructure side. Right? Let's stick with networking. Yeah. They don't quite understand what they're getting into. And I say this from experience, from talking to person after person.

When they say, I wanna get into it, or, you know, I have an AI initiative for twenty twenty five, they don't quite understand. I think they usually think, we wanna employ some sort of a chatbot and have it deployed everywhere and make everybody's life easier. I'm like, those are lovely goals, but my goodness, those are ambiguous. And then, of course, where do I start? So what has been your experience in that, as you talk to folks not just on the purely AI side, the software dev side, but also mixing in that infrastructure component?

Yeah. It's one of those things you'll hear repeated out there all the time, but I feel it's absolutely the truth.

The quickest time to start working with it, to start playing with it, to start understanding it was yesterday.

And not necessarily you need to be productive with it today or tomorrow or in a week or in a year. You don't.

But it gets to the point where when you see this stuff coming up in meetings or you see it as a possible solution to to solve your problem or whatever you're after, it gets into that situation where you don't know what you don't know. So as network engineers, we have our skill set, and each and every one of us have built up a different set of skills in our toolbox, and we have to understand how to approach a project like this. And when we don't know what it takes to do AI or we don't know what it takes to make this happen or how to stably run it in production, we have to make guesses. We have to go out. We have to educate ourselves and do the best we can or seek help or things like that.

And so it really does come down to, if you want to move faster with it or you want to be able to do more and leverage more AI, you got to work with it more. You got to start doing more projects. You got to see where you can plug it in and figure out how to work with it.

A good analogy or at least one I'm going to try right now. We'll see if it's good. Yeah. You know, one of the things that I believe is the strongest skill you learn from obtaining your CCIE is not necessarily the ability to type commands into the CLI crazy fast. It's your ability to recognize problems or design issues or engineering issues before they happen.

And it's being able to see this type of architecture, and OSPF is doing this type of broadcast mode, and it's doing this architecture over here, and that's going to cause this type of broadcast failure because of blah blah blah. We instinctively know that, and as we hone our skills through CCNA, CCNP, all the way up the network stack, we get better at that.

AI, software development, server administration, any of that stuff, it's all the same thing. You just got to start playing with it. You got to start working with it. You got to get familiar with where to plug it in. And then once you start doing that, you get better at it.

Yeah, there is some value to just building a simple end to end system as well.

Yes.

You know, and it doesn't need to be complex. A simple end to end system in your basement is a great way to start. If you're a network engineer out there listening and you're like, you know, there's an AI initiative coming down the pike and I don't know where to start — I mean, you could go just talk to a bunch of consultants and then rely on their slide decks and then do your thing and trust what they say is correct. But you should learn this stuff. And even building a simple end to end system where you touch a little bit on data pipelines, a little bit on selecting models, a little bit on evaluating models, stuff that you don't see in the blog posts right now. Nobody talks about model evaluation.

Nobody talks about — I'm trying to think of the technical terms, I'll skip it because I can't think of them right now. But nobody talks about how you incorporate some of these things into your current CI/CD pipelines and workflows. And I think that's where we are now.

We've been talking about AI, you and me, but also the industry, for maybe two years now.

And here we are where a lot of folks are saying, are we heading into the next AI winter? Because we're getting this trough of disillusionment going on where there's no real value and benefit. So I'm trying to measure my ROI. The time, the money, the level of effort to get into this AI initiative for networking is very, very, very high. And then what do I get out of it? I don't see the benefit.

Part of me wonders if we're heading into the AI winter, but we are starting this conversation. So I think engineers out there can start to get a grasp on those components that are necessary for AI to actually work, including understanding where it fits, like you said, where the value is, what problems it can truly solve. Because listen, if you're looking at some problems out there and you're like, where can I use AI, that might be the wrong way to do it.

I mean, that's not bad. I do it. I do it. But identifying the problems and then saying, all right, what is the solution?

What is the best solution? Maybe it's just a quick runbook that you create in fifteen minutes. Do you really need to have an entire data pipeline feeding an AI model? Yeah, I mean, it would be cool, but not always necessary.

So I think that's the place to start. And then once you see those things that can be automated, where a prediction might be able to help you, or you can, with the whole LLM wrappers today, democratize information and make querying data easy, whatever works, then you can start to build upon that and choose the right components to make that overall workflow work.

Ryan, I think one of the biggest pieces missing is that workflow. I think people look at the end state, the cool little app, and so, you know, whether they're network engineers or leaders in the networking departments and organizations of their companies, they're looking at the cool chatbot, the end result, the LLM wrapper around some stuff. Yep. Probably not even a local LLM. It's probably something like OpenAI, where they're hooking it to GPT.

And they don't know how to get there. It's like, I want that wrapper. I want that chatbot. Mhmm.

It's like, okay.

Well, you know, do you have any data engineers on staff?

What kind of telemetry are we ingesting? What do you actually wanna — you know, are you looking to predict? Because we don't need to do any of that LLM stuff for prediction. You know, that kind of thing.

Yeah.

You hit on a number of really cool things there and a number of topics that I — Well, it's because it's so top of mind.

I talk about this stuff, like, day in and day out with folks. I gotta tell you, Ryan, a lot of the time, the conversation is, woah. Woah. Woah.

Hold on. Hold your horses, my man. Like, you want the chatbot, but you can't do that right now with your environment and your staff and your resources. You know?

Right.

So for that one right there — the thing that I've learned: I've been doing a ton of AI development. I've really been pushing AI to its boundaries on how far it can go with complex workflows, handling PRDs or requirement documents to one hundred percent accuracy, figuring out how to make that happen. There's been a lot of stuff here, and the technology in itself is amazing. There's just a lot to it, and there's a lot of value behind it.

But it's learning how you work best with it.

I think I've killed so many feature branches in all the development stuff I've been doing over the past six to eight months because, me personally, I led the AI down the wrong path, or I got frustrated with it and sent it off somewhere else, but whatever.

But it really is about learning how you interact with it. And when I say it, that's a very, very genericized term: anything that has AI built into it. It could be ChatGPT. It could be your coding application. It can be Word that has Gemini built in to assist. Whatever you're using.

And I really think so. One of my strongest stances right now for anybody out there is we might go through a bubble with AI. Most likely, everybody will stop talking about it. The VC money will pull out.

It'll start normalizing. It's going to happen. It's a bubble just like anything else. But what's happening right now is all of the technology is being put into place.

Everything that limits LLM and transformer-based models, or even ML-based models, is being augmented. You're seeing frameworks come around that smooth over the rough edges. You're seeing stuff like MCPs and tool calling and all these other features come in that smooth over all those gaps that LLMs and transformers have.

So right now, as we're getting to that point, the people are building the stuff that is going to allow us to build what we need. And now what I challenge everybody is we got to look at our world and say, everything can be different. Nothing has to be how it is right now. AI can come in and hopefully help us augment how we're doing workflows, augment how we're approaching certain things, heck even augment how you're approaching personal projects of your own.

And then it's a matter of how can I do this better or how can I do this different, and can AI help me with it?

We're gonna see a lot of changes in the industry, we're gonna see a lot of changes in the economy across the world. And I think it's a matter of who figures out how to do it best with the tools they have in front of them.

So I really encourage it. You know, it's been the path that I've seen so many people go through for success: you start playing, you start building, and don't limit yourself on what you build and where you build, and just start going. And it is. It's a matter of you learn. You might start off simple.

You build a simple newsletter that you kick out once a week that sends out details of cool things that you saw all week. It's a simple newsletter, but it can be handled one hundred percent by AI. And so if you work through that workflow, maybe the first time, second iteration of it, it's a little clunky, it's not as good. Third, fourth, fifth iteration, as you go through it, you start really understanding this.

And then the next thing that happens is, when you come into work one week and they're discussing a project that needs to do some pipeline stuff, you actually have a little bit of knowledge in it. You've actually done some stuff, and you can speak intelligently to a point.
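To make that newsletter idea concrete, here is a minimal sketch of what such a weekly workflow might look like. It's only an illustration: the call_llm() helper, the links.json file, and the prompt are hypothetical stand-ins for whichever model, API, and storage you actually use.

```python
# Hypothetical sketch of an AI-assisted weekly newsletter.
import json
from datetime import date

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (OpenAI, a local model, etc.)."""
    return "(model output would go here)"

def build_newsletter(links_file: str = "links.json") -> str:
    # links.json: a list of {"url": ..., "note": ...} items saved during the week
    with open(links_file) as f:
        items = json.load(f)

    bullet_list = "\n".join(f"- {i['url']}: {i['note']}" for i in items)
    prompt = (
        "Write a short, friendly weekly newsletter summarizing these links. "
        "Group related items and keep it under 300 words.\n\n" + bullet_list
    )
    return f"Newsletter for {date.today()}\n\n" + call_llm(prompt)

# Usage: drop your saved links into links.json, wire call_llm() to a real model,
# and schedule build_newsletter() once a week.
```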

Yeah, yeah, absolutely. I agree. And that's why I am in favor of building that relatively straightforward and simple end to end system, regardless of what it is, which is what you just described.

And I think that, in my opinion, you start simple with just unstructured data like you're talking about, which is text. So you're talking about doing things with large language models, and a newsletter was your example.

But certainly in the infrastructure world, you could start with very structured data and go that route, and it's not a text based thing, so you're making some simple predictions, or maybe some kind of simple pipeline where it can help you with some basic capacity planning, whatever it happens to be.

Just focus on that one thing, that one type of data, and build on top of that. Because — Mhmm.

You know, what happens is once you have that kinda nailed down and you have a basic understanding, then you can start to build upon it. Alright. So how do I do this with two different types of data? Because the reality is in production, especially in a large enterprise environment, you're not dealing with just flow records.

You're dealing with a lot. So how do I expand this to look at multiple types of data? Well, you have the foundation of having built something. And then it's going to be more than two types of data, three types of data.

How do I differentiate between data that is historical and then real time? What do I do with data in flight?

I'm really leaning heavy into the whole data engineering thing, Ryan, because that's been so top of mind for me personally.

Yes. When I talk to folks, mostly in the networking space, but I've been talking to some cyber people and more broad IT generalists, the AI conversation does tend to go toward data engineering and data science very quickly, because we talk about, all right, where is your data? What kind of data? Okay, we get your goal.

Fine. How do we get there? And we start talking about the workflow, and just pseudo code. Let's jot it down.

Mhmm. And it turns into, well, I don't have access to that database, or I don't know what data cleaning means. You know, how do you handle missing rows? Simple stuff, man.

Simple stuff — when I say simple, let me take that back. Gotta walk that back. It's simple to say, simple in theory, but it can be much more complex in practice. But, you know, simple like data cleaning and taking care of missing values and things like that, roll ups because it's historical, whatever.

That's like eighty percent of a project. And then finally, you have something that's clean and quality data that you can feed your model and then do the cool stuff that you want at the end. That's been a lot of conversation.

Let's walk through that. I have a lot of these podcasts, and I talk to a lot of infrastructure people on this very subject, but we always gloss over what data is, and the sanitization process that seems super scary, and it's crazy easy. So let's step through that and let people actually understand that.

Well, hold on a second, man. You just said that it's crazy easy, and I do have to disagree with you there.

It's easy in concept, but — Exactly.

It can be — yeah, that's a career unto itself, right?

It absolutely is, and I don't want to gloss over the job role that this is and how complex data can be. But if we're looking at it from approaching it from a simple standpoint or approaching it from a learning and getting started standpoint, this stuff is easy.

It's very simple stuff. It's just learning what the term is and what it's doing.

So one thing is sanitizing your data. You want to make sure that your complete table or your set of data has all of the elements that it's expected to have, and it's in the structure that you expect it to be in. You run into problems when, say, a property is missing, or it's a null value when it should always be an int, an integer, stuff like that. You also have cases where not every row or data element has the required properties that it should.

Or it has some, but it doesn't have others. So there's a lot of sanitization stuff. Now to visualize this, people think of data, and so we start thinking about all this data that flies around servers and goes all over the place. But very much so, if you're thinking network telemetry, you know what your data looks like.

It could be as simple as a routing table. Show IP OSPF, and you get data. That's all data.

And so it's just a matter of working with the data: pull it into an Excel spreadsheet, clean the data up so it has complete names or it has the accurate information it needs. And then that is the sanitization process, all those little elements that need to be tweaked. Now with that, do you need to have good data to start? Hell no.

Take whatever you get, throw it into a data set or into a spreadsheet, CSV, whatever you're comfortable with, and start doing. Because what happens is all those sanitization issues that you overlook, they get presented to you as bugs. They get presented to you as issues. And so as you start working through stuff, you start recognizing, oh, hey, this is failing because when it hits the fifty seventh row on the iteration, that one's actually missing the password column.

Well, it's because it's a user that never logged in. So how do we handle that? How do we sanitize those so it doesn't stop here? So those are the data management skills, and I'm gonna say it.

We all have to work on these. We all have to start working and building on them.
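As a rough illustration of the sanitization steps Ryan is describing, a minimal pandas sketch might look something like this. The CSV file and column names are hypothetical, and real telemetry will need more care than this.

```python
# Minimal data-cleaning sketch with pandas; file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("interface_telemetry.csv")  # e.g. exported telemetry or a show-command dump

# Drop rows missing the fields the rest of the pipeline requires.
df = df.dropna(subset=["device", "interface", "timestamp"])

# Enforce the types you expect: a null where an int should be surfaces here.
df["in_octets"] = pd.to_numeric(df["in_octets"], errors="coerce").fillna(0).astype(int)
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
df = df.dropna(subset=["timestamp"])

# Normalize messy labels so "Gi0/1" and "GigabitEthernet0/1" don't become two rows' worth of devices.
df["interface"] = df["interface"].str.strip().str.replace(r"^Gi(?=\d)", "GigabitEthernet", regex=True)

# Roll historical data up to hourly averages, which is often all a first model needs.
hourly = (
    df.set_index("timestamp")
      .groupby(["device", "interface"])["in_octets"]
      .resample("1h")
      .mean()
      .reset_index()
)
print(hourly.head())
```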

The best thing for me that I've seen across the board is, as I'm working through stuff, I have Perplexity, a Perplexity window, sitting right next to me.

And every single time I hit something that I don't know or I'm uncomfortable with or I don't know if my solution is the right solution, I ask it. How do I handle this data? Or this error is happening. How do I do this better?

Constantly having a professional assistant right there stepping me through it has just been invaluable in my growth.

Yeah, I agree. And for me, before getting into the term AI and how we're using it now, which largely refers to large language models and the whole realm of that, including what's happening now with agents and things — prior to that, for maybe three or four years, twenty seventeen, twenty eighteen, right after I got interested in network automation, I was very interested in machine learning, which is a type of artificial intelligence in the sense that there's no explicit programming or code.

In machine learning, the program learns from the data, and it operates more autonomously. Obviously, there's some supervision involved with supervised learning and other things. But in any case, that was my foundation. And so when you say things like data cleaning and interpolation and things like that, what I'm thinking to myself is, how do I clean that data as well as possible so I can make a prediction that I can trust, this prediction of what latency is going to be like every Friday night, identifying patterns in the data so there's something useful.
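For a toy version of that kind of prediction, assuming you already have a cleaned table of latency samples, a sketch with scikit-learn might be as small as this. The file and column names are hypothetical.

```python
# Toy latency-pattern model: predict latency from hour-of-week.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("latency_samples.csv", parse_dates=["timestamp"])  # hypothetical cleaned data

# Feature: where in the week the sample falls (0 = Monday 00:00 ... 167 = Sunday 23:00).
df["hour_of_week"] = df["timestamp"].dt.dayofweek * 24 + df["timestamp"].dt.hour
X, y = df[["hour_of_week"]], df["latency_ms"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)

print("MAE (ms):", mean_absolute_error(y_test, model.predict(X_test)))

# "What does Friday 8 p.m. usually look like?"  Friday = dayofweek 4, hour 20.
friday_evening = pd.DataFrame({"hour_of_week": [4 * 24 + 20]})
print("Predicted Friday 8 p.m. latency:", model.predict(friday_evening)[0])
```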

And so the entire data pipeline, it underpins everything that you do, whether it is traditional MLOps, which is kind of what I'm referring to, or something a little bit more contemporary with large language models and developing things like that, your data pipeline underpins all of that, one hundred percent.

And if you start with ingesting really clean data, that makes the data cleaning and sanitization process easier, for sure.

But that's all part of it in that workflow. That's why it's interesting because now I'm talking to folks, again, primarily networking folks, primarily IT generalists. And once I bring up those kind of things, they're like, well, we don't have data engineers on our team. They're a completely different team. I don't even know who they are in my organization. I could probably look them up in an org chart. That's the level of connectivity.

And so believe it or not, it's not a technical problem. But I say that tongue in cheek. Believe it or not, of course we believe it. We've been in IT long enough to know that people can be the biggest source of the issue.

This silo between infrastructure and your data folks is like a complete and utter roadblock stop sign for any movement forward.

And I'm talking to folks like, how do you bridge those gaps?

What, are you going to hire data engineers for your network team? Maybe, depending on the organization. So that's something that I'm seeing: this disparate, this dichotomous relationship between data folks and infrastructure folks that don't know each other.

And that's always been a problem we've had in IT. Yeah, exactly. I don't know if that problem's ever going to go away just because of the nature of how complex everything is.

But we have to learn to work with it. And I think this is a perfect example of where we apply AI. Not necessarily solving the end problem we're trying to get to, but if you can streamline how your organization operates through projects or through feature development, like something to take your telemetry data to the next level internally, then yeah, start considering some AI stuff. So some good examples: one of the classic things that always causes problems in the enterprise world, especially in tech, is we spend all this effort and all this time to pull all the smart people into a single meeting in a room over the course of however long.

And we sit there and we hash out and we whiteboard and we wave our hands virtually in the air presenting what network stacks look like and yada yada. And we come out with these plans and these outlines and these decisions that were made and all these things that everybody thinks they agree on. And then we all go to our respective corners, and we start building what we think we should be building or doing what we should be doing. And then it really gets crazy when everything comes together, and it's a mess.

But you can take stuff like AI, and it's as simple as, in those design meetings and in those discussions, you know, have it just record the entire transcript.

And then from the transcript, build out all the documentation, all the requirements, all the outlines for everything. So by the time they're back from lunch, everything that was discussed before lunch is already in a pretty package that's ready to go into production, and it can be reviewed from there in front of everybody.

That's a great use case for, like, actual — And more so, you get into accelerating fast through operations and through feature sets or feature builds.

Who says that by the time you finish that whole design session, you can't have a proof of concept product built before the next day by an AI agent?

I argue that you can do that with enough effort. And so you don't have to go back to your individual groups and start building your individual thing.

An AI bot that works under the product team puts all those ideas together and builds out a simple proof of concept that everybody reviews at the end of the week.

So these types of things are the things that can help our groups accelerate, that can help our teams accelerate, that help our business accelerate, that are going in directions that we're not used to thinking in.

And these types of workflows and these types of changes live everywhere under every single damn rock in the company.

I love that use case because you basically said, oh, you have a problem implementing your AI initiative because you have disparate teams? Well, why don't you just use AI to fix your problem with your AI initiative? Yes. It's kind of meta, but I love that use case.

That's great. And I use some of those tools now, but not to the extent, to the level, that you did. Considering that I'm not in operations, I wouldn't anyway. But that's really neat.

One of the biggest things that I feel AI brings to the table is this: the Internet and technology as we know it up until now is really what brought data in front of everybody. Everybody has access to all data now.

I would agree with that.

But AI brings the skill set to do something with that data to everybody.

You don't have to have an advanced doctorate degree to do statistical analysis of blog posts across the entire internet anymore. You just need a dude who's interested in that question. And as long as they leverage AI to build it, they can build — any of us can build — whatever the hell we want.

And that is a massive, massive shift in the human race, in my opinion, and I think we should embrace that.

Yeah. And I would even say that we can differentiate today between a contemporary AI engineer, which is a term that's being thrown around all the time — and a lot of folks laugh at it saying, oh, you're just a prompt person, you know, whatever — but there's a difference between a traditional ML engineer or a software developer and this new concept of a very contemporary AI engineer, because they're doing the things that you just described.

Whether it's using — I'm gonna use the term vibe coding, which we discussed in a different podcast — but they're using vibe coding in a secure and safe and effective manner, with all those safeguards in place, of course. And they're using that, along with the team that they're working with, as you described, to very, very quickly create AI apps built largely on foundation models. So whereas an ML engineer in years past, not that long ago even, would be focused on selecting models, training models, and then model evaluation.

And then of course, serving up that model to the data scientist who's going to do some stuff with it.

For an AI engineer in twenty twenty five, that's all done. You're selecting the best models, ensuring there's some evaluation.

And then you're building your applications on top of that, and you're focused on that piece. And in my mind, that's the difference between the two. Because for a time, folks were saying that they were synonymous, and they cringed at the term AI engineer. But it's pretty much what you just described.

I think that's one role.

So, you know, for anybody who's listening to this and kinda, like, cringing — oh, I don't want that life — there's all sorts of other roles too. Like, you don't have to be the AI guy. You don't have to be the guy who knows how to build models that respond with amazing results. You don't have to be the girl who builds advanced workflows in Python or applications that solve your team's problems. It can be as simple as being the person that puts down on paper the tribal knowledge that's in everybody's head.

That's a simple one. When we have to do this workflow, here are the seven steps we have to go through, and here's the official documentation that doesn't cover any of it. Build that out, because that's the first step to automating it. And we've had this discussion in the networking industry and infrastructure world for at least fifteen years now.

You've got to build out your workflows before you can automate them. And so just that knowledge, just that tribal knowledge is extremely valuable. And I think a lot of us, all of us are specialists in something. We all have a strong specialist skill set in some area.

Some of us have it in multiple. But we will need to leverage that skill set to help the data scientists, to help the workflow developers, to help the software people that don't know it as well as we do build that stuff. And in return, as you're helping them understand that stuff, they can also be helping you learn to build this stuff and learn the AI. And so there's a give and take relationship there, and that collaboration is important as well.

Yeah, and of course, that doesn't mean you don't still have to build that end to end system that we were talking about earlier, right, Ryan? I still think you should do that, so you can have kind of a basic understanding of how these pieces work together. And in that, you still bring your level of expertise in your sphere, in your realm.

That's how I see this. As the rubber meets the road for AI, it really is having at least a fundamental understanding of all of these components.

Then understanding — and that's including the data pipelines, that's including how to evaluate things, that's including the business side of it, which we haven't touched on. But if you're sitting there as a director level person somewhere trying to figure out, what does an AI initiative look like, I need to do this this year — well, come at it from, what are the business level KPIs that we want this model to produce?

Because you might come up with some technical KPIs from your engineering team, and it's phenomenal. And it's like, look how accurate this model is. But it doesn't really mean anything to the business. Like, who cares?

That's great. It's accurate. So mapping that. So there's a lot of those skills that are going to be absolutely vital to a successful AI initiative.

And this is where the rubber meets the road. Do you understand the basics of data engineering? You don't need to be a data engineer. You don't need to be a data scientist. You don't need to understand at such a deep level how deep learning works and how the various neurons and nodes and all that stuff, what they're doing discretely at each level.

But what you do need, I believe, again, when the rubber meets the road of trying to deploy something useful to the business, is a basic understanding of all those parts, how they come together, and then how that is relevant to the business, how that is relevant to the organization. And what you're describing, Ryan, is bringing your piece to that puzzle, your part of that entire ecosystem. I don't think that unicorn, that person that knows everything about every part of the pipeline, including the business, really exists. I mean, in reality, we are talking about different roles: data engineers, data scientists, ML engineers, infrastructure experts, business analysts that are specialized in business intelligence systems, right?

Those are all different roles and different specialty areas. So that's the reality. I think that there's a danger, though, for some that I've been speaking to that want to jump to the POC without some understanding of those pieces, and then the POC turns into this ambiguous thing where we don't really know what the value of the POC is, and it's never ending. It just kind of keeps going because there's no close date or, well, more relevant to this conversation, there's no understanding of what it was supposed to produce in the first place.

I'm going to play devil's advocate. I don't disagree with you there. And it goes back to basic design and engineering practices. But I'm going to challenge that. If I can screw something up with minimal effort in five minutes, why should I care? Because I can just delete it and move on.

Okay, that's fair.

My whole note — so let me throw out an example, okay? Everybody's going to love this one. I hope you do.

So I'm working through my application, and I got to build a bunch of third party integrations with social media platforms. And it's basically to allow the customers to do their thing with the social media platforms. It's OAuth authentication and yada yada. So I was adding one of these new platforms. I'm going to leave names out of it so we don't have to get into the hairy mess of legal and yada yada. But I went through this whole workflow that I had established to validate it. And it was a workflow that took about four weeks to build one of these client integrations.

And I had streamlined it down with AI to about two days at that point in time. And so it was relatively good. And so I went to add a new platform, and I was like, let me see how well this workflow can go. How can it do its job? Let's kick it off.

Two and a half hours later, it's done.

And I'm like, this is amazing. I start going through everything. I start validating. Everything looks great. There's a couple of things here or there that need to be tweaked on.

A few hours later, things are going, but it doesn't work. The integration's not there at all. I can't figure it out. The AI can't figure it out.

We waste all sorts of time. And so I take a step back after I've now wasted probably about a day on this guy. And I start digging. The client that I was adding, the social media platform, doesn't even have a publicly exposed API.

They have no API. Okay. It's all behind the scenes and they intentionally did it that way. They don't want access to their social platforms directly.

So I just had an AI spend half a day, three quarters of a day wasting my time building a client that doesn't work, and then telling me all along it looks great. It does everything it should. And essentially, it just kind of blew over the whole thing, and I had to scrap the whole project.

In the normal world, that would have wasted me probably about a month, maybe two weeks' worth of time, if I would have skipped that validation step. But half a day? I have some egg on my face. I kind of feel like I'm an idiot, and I scrap it and move on.

Don't be afraid to. If you feel like jumping into something and you've got a strong why — why do I wanna do this?

Then do it.

Throw at it. You know, if you waste a couple hours or you waste, you know, three dollars' worth of tokens, that's no big deal anymore. But you're right. If you're on a professional project where failure could cause some serious problems, then you need to scope this out better. You can't just vibe code it and YOLO your life away.

Yeah. To be fair, two days of a large team's worth of time and salary and productivity is a big number for some. Right? Absolutely. When you think about a large enterprise organization. But I do agree with the spirit of what you're saying, of course, because that is something that I employ for myself.

Just try something out. Oh my goodness. And I've never thought about it in the way that you just put it, but that's happened to me so many times. I remember just six or eight months ago, I was in Containerlab trying to generate a bunch of streaming telemetry and some flow data, things like that, for my environment.

I had this big spine leaf architecture, and things just weren't working. I spent, like, every day after work for, like, two weeks. It still doesn't work. And, you know, whatever.

You know, it's done. But I did learn through the process, so there's that as well. I learned quite a bit through the process, actually. Yeah.

Yeah. I learned a lot about SR Linux, which I didn't really know at the time. So that's one positive. Right.

But also, I learned a lot more about InfluxDB, about handling real time telemetry. Even though none of the stuff really worked in the end, I did learn through that process. But again, you know, if you're an enterprise organization and you're on an IT team, it's not always time to be learning. It's production level stuff. So again, when the rubber meets the road for an engineer and you're trying to build an application for your network, an AI application for your network, having all that background is important.

Having all that background of like, yeah, it never worked, but my goodness, I did understand how a Kafka bus works for real time telemetry, things like that. And so you bring that to the table when it's time to build your AI app for your network team.

So in that sense, I completely agree. Of course, then when it is time to build that app and it is production time, it's, I think, a little bit more structured and probably a lot more akin to very traditional software development as well, right? Yeah. I would say so.

And I agree. There's two things that I feel are key if you're getting into that and you're serious about it. Like, you have to have either a proper lab environment or a proper dev environment that you can safely explore in. We all know why we need a lab, and we all know why we need to stage changes before we do them. The software engineering industry does it one hundred times better than we do in infrastructure.

But it's a different world, too. But software developers, they're very comfortable with their local environments and what they can do in it. And that's a critical piece to doing stuff like this, is having that proper environment.

Yeah. I think it's starting to change a little bit. I mean, there's a cost and a level of effort to changing and adopting that mindset. But there are tools out there to create a digital twin of your environment, to create the software version of your environment, whether it is something like Eve or ContainerLab or whatever it happens to be.

And it's usually not perfect. It's not hardware. I get it. I get it. But you do what you can, and it's better than doing nothing.

I think that there are folks that have really embraced that.

But again, for folks that are like, all right, I want to build my AI app for my networking environment. You know, where do I start?

I think, you know — again, I've said this term, like, several times — where the rubber meets the road is, what are you trying to solve? What is the actual business requirement here? And that's, to me, standard, what we've been talking about for software for years. It's what we've been talking about in networking for years. Why are you implementing this protocol for no particular reason?

So starting there and then understanding what those technical and business KPIs are, that's going to drive a lot of that architecture design, a lot of that workflow design.

Yeah. Yeah. And that's exactly it.

I would even throw in, as a very next step, or a step that starts working in parallel with others right after that: you have your SMEs, you have your skilled day to day workers that are in the trenches, start building eval sets for your AI.

So basically, one thing that I see with every single POC and every single new team that I've worked with is they build this awesome AI system, or they build this POC that proves its point.

And then I sit down and talk with the engineering team to move it to the next step, and they're all scared to touch it, or they're all scared of how it's going to change if they swap this model. They don't have confidence in it like they are used to in normal software projects. And I think this is across the board, not just software developers. But we don't have a strong way to test AI. We know that generative AI will respond differently with every single message, and it's not going to ensure the same response every single time.

So the software engineers are uncomfortable with testing it or validating it, which makes them uncomfortable with supporting it and owning it and building on it. And so one of the first things you need to be able to do is give your users and give your teams a baseline of its performance.

When we accept it, it has a standard baseline of a sixty eight percent acceptance rate of the responses.

And that just takes stuff like you have your SMEs building the evaluation sets. And those evaluation sets are basically, when I ask this AI this question, it's going to return this response.

And you just build out as many of those questions as you can. And so some of these data sets I see grow upwards of hundreds of questions entered by users manually. But I've also seen it be synthesized, and you have data sets of upwards of tens and hundreds of thousands of questions and responses.

Once you have these and you have your business goals aligned with them, you can then start running these evals regularly to understand how your team is improving or how your product is improving.

This helps a lot of teams and a lot of people build confidence in building AI. And so I do suggest this is one of those very quick steps to take right after the why.
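A starter eval set really can be that simple: question and expected-answer pairs plus a loop that scores the system against them. Here is a hedged sketch, where ask_ai() stands in for whatever app or model is under test and the JSON file name is hypothetical.

```python
# Minimal eval harness sketch; ask_ai() is a stand-in for the system under test.
import json

def ask_ai(question: str) -> str:
    """Placeholder: call your chatbot, agent, or model here."""
    return "(model answer would go here)"

def passes(expected: str, actual: str) -> bool:
    # Start dumb: substring match. Swap in regex checks or an LLM-as-a-judge later.
    return expected.lower() in actual.lower()

def run_evals(path: str = "eval_set.json") -> float:
    # eval_set.json: [{"question": "...", "expected": "..."}, ...] written by your SMEs
    with open(path) as f:
        cases = json.load(f)
    results = [passes(c["expected"], ask_ai(c["question"])) for c in cases]
    rate = sum(results) / len(results)
    print(f"{sum(results)}/{len(results)} passed ({rate:.0%} acceptance rate)")
    return rate

# Run run_evals() on every model or prompt change to track the acceptance-rate baseline over time.
```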

Yeah. And there are ways to automate that, right? I mean, I haven't used this method, but I'm seeing folks using LLMs as a judge. That's the phrase, where you're talking about using other large language models to evaluate your AI app and your large language model's result or output.

Yep.

And it's more than just the overall output, too. Like in software development, where you have your checkpoints and stages and you have your evaluations there, you need to include that in your AI application design as well. So in networking, if you're using a large language model to do some tool calling, maybe there's some agents involved, do you have the mechanism in place, the observability, to know, hey, is the tool that I want to be called actually being called, or is it using something else? And so those things are very, very important to your overall design of your AI app. So we're talking about data engineering. We're talking about the difference between foundation models or a local model or something like that.
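For the tool-calling observability point, even a thin logging wrapper gets you most of the way. This is a framework-agnostic sketch; the tool functions are hypothetical examples, not any particular agent library's API.

```python
# Sketch: wrap agent tools so every call is logged and can be checked in evals.
import functools
import logging

logging.basicConfig(level=logging.INFO)
tool_calls: list[str] = []  # inspect this after a run, or ship it to your telemetry stack

def observed_tool(func):
    """Decorator that records which tool was called and with what arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tool_calls.append(func.__name__)
        logging.info("tool called: %s args=%s kwargs=%s", func.__name__, args, kwargs)
        return func(*args, **kwargs)
    return wrapper

@observed_tool
def get_bgp_neighbors(device: str) -> str:       # hypothetical tool
    return f"neighbors for {device}"

@observed_tool
def get_interface_counters(device: str) -> str:  # hypothetical tool
    return f"counters for {device}"

# After the agent runs, assert that the tool you expected was actually the one used.
get_bgp_neighbors("edge-router-1")
assert "get_bgp_neighbors" in tool_calls, "expected the BGP tool to be called"
```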

But let me go back to that point you just made. I want your opinion there. You talked about how the large language model is not deterministic. It's probabilistic, and so you're going to get a different response every time. Do you think that that's a problem? I mean, we could talk about RAG and other mitigation mechanisms for reducing hallucinations and all that kind of stuff. But do you think that that's a problem in building AI apps specifically for folks that are using hard metrics, like in infrastructure and in networking?

It's an interesting question.

I mean, we can use it for sentiment analysis of our ticketing system, of emails. So for our unstructured data, image stuff, you can go down that road, and it's not traditional ML, where you're doing predictive analysis with logistic regression or something like that, or maybe linear, but whatever.

But strictly for using your LLM as, you know, the brain of your agentic workflow, if it's not deterministic, like purely deterministic, is it going to eventually be detrimental to the functioning of my AI app?

Yeah, yeah, yeah.

It could.

And you need to understand when it hits that point.

I think, as I mentioned earlier, we're smoothing over a lot of the rough edges, and we're filling in gaps with technologies that aren't necessarily optimized. But ultimately, what we're doing as humans right now is figuring out how to plug AI into our current setup, our current workflow. But AI honestly needs to be approached differently, and we should be learning to build new things inside of AI. And I think both are actually happening, so we got to handle both for a while.

But what I'm seeing as a trend is we're doing the same thing to AI that we did with humans as human workers. We need to start focusing in its scope, and we need to start narrowing in its responsibility.

Because we've learned with AI, just like humans, the wider the scope of responsibility you have, the more likely you are to cause issues in any area of that scope. And so what we're doing — and this is a lot of what agentic AI is bringing to the table, especially with MCPs — is you start separating tasks up. And so just like you would a workflow, as you add bullet points to a workflow, just kind of assume every bullet point's going to be an agent.

It does its specific focus. And some agents will do better where they have a larger focus, and some will do better where they have a more narrow focus.

It just kind of depends.

And so the workflows that are out there right now, stuff like LangGraph and LlamaIndex, which are the biggest, most popular ones on the market, I would say — they're handling that. They're handling that communication between agents. So you have a simple agent that all it does is it goes in there and adds docstrings to every API endpoint it can find.

So that agent then returns its evaluation, and a higher level supervisor agent makes sure that it did what it's supposed to do, and then moves on. So I think we're breaking that down, and it's getting closer, or more focused, in scope, and that's the trend we're on.
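Framework APIs change quickly, so rather than pin this to LangGraph or LlamaIndex specifically, here is a framework-agnostic sketch of the worker-plus-supervisor pattern Ryan describes, with call_llm() standing in for whichever model or framework you actually use.

```python
# Framework-agnostic sketch of a narrow worker agent plus a supervisor check.
def call_llm(prompt: str) -> str:
    """Placeholder for whatever model or agent framework you actually use."""
    raise NotImplementedError("Wire this up to your model of choice.")

def docstring_agent(source_code: str) -> str:
    """Narrow worker: its only job is adding docstrings."""
    return call_llm(
        "Add a one-line docstring to every function in this code. "
        "Change nothing else.\n\n" + source_code
    )

def supervisor_agent(original: str, modified: str) -> bool:
    """Higher-level check: did the worker do what it was supposed to do, and only that?"""
    verdict = call_llm(
        "Compare ORIGINAL and MODIFIED. Answer YES only if the sole change is "
        f"added docstrings.\n\nORIGINAL:\n{original}\n\nMODIFIED:\n{modified}"
    )
    return verdict.strip().upper().startswith("YES")

def run_workflow(source_code: str) -> str:
    modified = docstring_agent(source_code)
    if not supervisor_agent(source_code, modified):
        raise RuntimeError("Supervisor rejected the change; keep the original.")
    return modified
```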

But I think it's going to take a brand new type of AI model or AI approach that will shift everything.

Something else besides transformers will have to come along. But now I'm looking into my crystal ball and — Yeah.

But I mean, the spirit of what you're saying is that it's the world beyond the large language model, that as we're really looking at creating AI applications for really any industry, but going back to networking and infrastructure where we have this need for dealing with structured data and hard metrics and things like that, it's a world beyond just a large language model and slapping on a foundation model. What you referred to, what you alluded to, was the idea that foundation models, a GPT or a Claude, are fantastic, huge models, large generalist models. And so I'm seeing this where folks are much more interested in using smaller models, very small targeted models that have been fine tuned or trained from the beginning, pre-trained, on a specific vertical, specific industry, specific context.

So they're highly effective. And I don't mean highly effective as in model overfitting. I mean highly effective as in they are really, really good in the context of networking or whatever it happens to be.

And so that's one. And what I found is that that reduces the incidence of problems with the probabilistic nature of LLMs, number one. Number two, the models are much smaller, so they're less expensive. You can even run them locally if you choose.

And if not, they're far less resource intensive. You just think about the number of parameters that you need for a much smaller model that's targeted for an industry. Instead of a trillion parameters, maybe you can get away with a couple hundred thousand. I don't know, I'm making up numbers, but that changes the dynamic as well.

So as you're thinking about AI applications and building something for your environment, whether it's networking or something else, all of these things have to be considered. You don't need to default to like a GPT or any other foundation model. There are other options. And maybe it's not local because you don't want to run things locally, but there are other options that might be much better suited for your environment.

And that kind of leads into the architecture point again. It's thinking about data pipelines, thinking about how to clean that data, thinking about security, thinking about all these things, but also thinking about which models work best for you, and then evaluating why.

Let me jump into this one.

You know, one of the things that I've really been kicking around, especially with what I just said about, you know, us really limiting the scope of agents.

It does open up the door, like you said, to having smaller agents. Like the biggest problem with LLMs is their size. And if we're telling an LLM that I need you to structure this JSON object for me, it's kinda like wasting ninety nine percent of its resources. So if we build smaller models, how far do we go with that? So I bring up the point that we really focus in just like we do with humans, but let's go a step further. Take it to how we used to do the pets versus cattle discussion when containers came around.

You don't need one model that, you know, you call it a specific name and you feed and love and treat it, you know, in the corner of your data center and it does everything for you. You need one thousand of them. You have one of them that is amazing at doing BGP. You have another one that is dead on with multicast. You have another one that's amazing with virtualization or containerization.

And then all of these models come together into a workflow, and you establish a workflow and pull in which models come in where. It's a mixture-of-experts type model.

Yep. Mhmm.

And then you're getting into this notion that we can have thousands of models across a single workflow, and then one or two key models that see the entire workflow and handle it step by step. I'd be curious — if we really focus models down to that type of level, like all you do is work with a NumPy array, that's it, that's all you do, or all you do is show runs on a Cisco switch.

Stuff like that, does it add value? Do we add value there?

Does it help, or is it actually going backwards? I'm curious about that one. So if anybody has a take on that one, I'd love to hear it.

Yeah. Do you remember — well, I don't remember this in the sense that I was working in the 1980s. I was a kid, so I don't remember it in that sense. But I remember learning after the fact about expert systems in the 1980s, which were an evolution of artificial intelligence where the system, the computer, was programmed with the explicit knowledge from some subject matter expert, a doctor or a physician, that kind of person. And so the expert system was really good at answering questions, in a question and answer format; it just knew all that stuff about that specific realm. It didn't know anything about anything else, and it didn't really know anything anyway. It was programmed with explicit information.

I feel like that's kind of how agents are, to an extent. They're not, in the sense that we do have the large language model that adds the dynamic component and all of that in the brain.

But we keep boiling it down to saying, this is what you do, Mr. Agent, and this is your specific task, your specific knowledge area. And in that sense, it does remind me of expert systems, but in a much bigger context of multiple expert systems all handling things individually, reducing the incidence of error, but under this umbrella, this orchestration framework.

Yeah, that's the power.

And I think that's where, when you're thinking about your AI app, sure, we wanna get there, but we probably wanna start much smaller.

Oh, yeah. And, you know, we gloss over the orchestration piece, and we've been focusing in on the agents a lot in what we've been talking about.

But the probability of what fails is the complete opposite. The workflow is what's gonna fail you every single time. The agents usually do pretty dang good at what they do.

And so it really is. And there's a lot of work this industry needs right now around the orchestration and the management of these workflows, but there's also a lot of people and a lot of money flowing into this area right now too.

So Yeah.

That might evolve into one of the primary focuses of this new AI engineer. Right? Yeah. This new version.

That's an interesting — I'm gonna throw this out there for anybody who's still listening intently and is interested in some of this stuff.

There's a tool out there called Taskmaster AI. If you haven't heard of it, it's basically a project management tool and an AI agent. It does an amazing job with a lot of my exploratory efforts in handling a lot of these complex workflows.

So if you're looking for something to probably improve in that space, Taskmaster is a good place to start.

Awesome. Yeah. Very cool. I haven't heard of that one, so I'll check it out. Yeah.

Yeah. Yeah. I mean, the goal is to do something useful.

This isn't a science experiment for most people anymore.

So — No. Yep, yep. So with the advent of these various technologies, all related to artificial intelligence, built upon the foundation of seventy, eighty, ninety years of artificial intelligence development, how do we use this to improve our lives?

And for us as engineers, how do we use this to improve application delivery and our IT operations workflows or whatever else it happens to be? And how do we actually build that application? That's the question. It's not necessarily just throwing some Python together and you have a chatbot.

There's so much more opportunity, so much more potential. But it does require having a much broader understanding of all these components and how they fit together. So in that sense, I'm looking forward to it. And, you know, I have my own never ending POCs for sure.

But — Yep. Yeah. I love it. I love it.

Yeah. Myself, the way I've always kind of approached this, if anybody's interested, is I always start with something that is not related to my professional career. I find something in my personal life. When I first jumped into Python development, the first thing I did was a stock tracker that tracked my stocks.

Really? Nothing to do with networking at all. Yeah. When you force yourself out of those same ruts you've always been in, you force your mind to start thinking differently.

Mhmm. But you also learn this stuff from a different perspective, and almost a better perspective sometimes. And then when you circle back and tie it into your work stuff like networking, you come in with a lot more knowledge of what you can do with those tools. So I encourage that approach, but honestly, whatever works for you, I would never knock.

Yep. For sure.

Well, Ryan, I think it's time to wrap it up. And I always appreciate you coming on and talking about, well, today, AI, but really, we've had some really good conversations about other areas as well, and I look forward to the next one. So for now, I encourage you to just dive right in, and I assume, Ryan, you encourage the same. Dive right in, vibe code your way into a simple app, get started, and don't be scared. So until next time. Thanks so much for listening. Bye bye.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS? Well, you're in the right place! Telemetry Now is the podcast for you! Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.