Machine Learning and AI: The Superhero Solution for IT Operations
At last week’s ONUG Spring 2018 event in San Francisco, I moderated a panel discussion on re-tooling IT operations with machine learning (ML) and AI. The panelists provided a view “from the trenches,” sharing insights into how their organizations are applying ML and AI today, each in different operational domains, but with a common theme of overcoming the challenge of managing operations at scale.
- Harmen Van Der Linde, Global Head of CitiManagement Tools at Citigroup: His organization is responsible for the delivery of infrastructure software deployment automation and monitoring solutions. This involves managing highly dynamic, cloud-scale infrastructure in which events happen faster than human operators can respond, leaving them continuously flooded with alerts and alarms. Harmen’s team has applied statistical analysis of time series data to model network behavior and is using linear regression to analyze trends and predict future behavior.
- Keith Shinn, SVP of Service Experience and Insights at Fidelity Investments: He manages a global team that is applying data-driven user experience principles to service delivery. Keith’s team wanted to improve business processes and better serve customers at Fidelity’s nationwide investor centers. They had a large-scale event correlation infrastructure in place but switched from polling to streaming, supported by a Kafka data pipeline. The team then used machine learning algorithms to analyze time series data and generate insights relevant to Fidelity’s business objectives.
- Bryan Larish, Director of Technology at Verizon: He leads a team tasked with the simple goal of using ML and AI to make Verizon’s network run even better. Easier said than done, given that Verizon’s mobile and fixed-line networks are among the largest in the world! The Verizon team started with statistical analysis of time series data, which produced operational improvements over existing methods. ML algorithms were able to derive useful correlations from the massive volume of KPI metrics collected from the network. Notably, Verizon is also starting to use deep learning neural networks to optimize network performance, leveraging that technology’s pattern-matching capabilities.
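To make the linear regression approach mentioned above more concrete, here is a minimal sketch of fitting a trend line to a time series and projecting it forward. It is not any panelist's actual implementation: the function names `fit_trend` and `forecast`, the link utilization figures, and the assumption of evenly spaced samples are all hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def fit_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of values against their sample index.

    Assumes samples are evenly spaced in time (an illustrative assumption).
    Returns (slope, intercept) of the fitted line.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2              # mean of indices 0..n-1
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values: Sequence[float], steps_ahead: int) -> float:
    """Extrapolate the fitted line steps_ahead samples past the last one."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical link utilization (%) sampled at regular intervals.
utilization = [40.0, 42.0, 44.0, 46.0, 48.0]
slope, intercept = fit_trend(utilization)
predicted = forecast(utilization, steps_ahead=3)
print(f"trend: {slope:+.2f} per sample, forecast: {predicted:.1f}")
```

A straight-line fit like this only captures steady growth or decline. Real operational data usually has daily and weekly cycles that a production model would need to account for.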
Common Themes from the Panel
All three panelists stressed the need to have a clear understanding of your organization’s business objectives because these will determine how you source and curate the data that will be collected and analyzed. They all recommended starting with the low-hanging fruit, statistical analysis of time series data, which can yield immediate operational efficiencies.
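As one illustration of what basic statistical analysis of time series data can look like, the sketch below flags samples that deviate sharply from a rolling baseline. The panel did not name a specific method, so the rolling z-score technique, the function name `zscore_anomalies`, the window size, the threshold, and the latency values are all assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_anomalies(
    values: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            continue  # flat history: z-score is undefined, so skip
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical latency samples (ms) with one obvious spike.
latency_ms = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
flagged = zscore_anomalies(latency_ms)
print("anomalous sample indices:", flagged)
```

Compared with a fixed alert threshold, a rolling baseline adapts as normal behavior drifts, which can help reduce the flood of alerts the panelists described.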
With a wealth of ML and AI technology in the public domain, it came as no surprise that each organization relied heavily on open source software across the entire ML and AI software stack. However, while open source tools are freely available, the people who know how to use these tools are generally not. Therefore, each organization had to hire additional staff with relevant expertise in ML and AI, and the message was to be prepared to make a similar investment.
Because ML and AI eliminate time-consuming, labor-intensive tasks, there is legitimate concern that they will also eliminate jobs. However, Verizon’s Bryan Larish offered a different take. He used an analogy inspired by the Marvel Comics character Tony Stark, who is transformed into a superhero with special powers while wearing his Iron Man suit. Verizon intends to make its network run better by augmenting the capabilities of its operations teams with ML and AI, transforming ordinary operators into a legion of extraordinary Iron Men. Hard to argue with that!