An AIOPS assessment provides a framework for identifying automation candidates by working with individual operations teams to understand where they spend the most time. By collecting process-specific data and operational data (tickets, events, and logs), you can apply ML models to test various improvement targets. For example, you might investigate the impact of event clustering on your current incident volumes. The deliverables might look like the following:
When outlining the strategy, the project team should identify a set of key guiding principles that establish a strong AIOPS foundation and aligned automation goals, for example:
- We want to learn relationships between events and relationships between logs over time to better correlate and prioritize them for notification, minimizing the need to construct fixed correlation rules;
- We want to learn what the normal behavior of systems looks like so that we notify operations only when abnormal things are happening, thereby reducing noise while minimizing the need to build and maintain thresholds;
- We want to be able to predict Incidents from patterns in abnormal behavior (via anomalies, events, and logs) to group and prioritize operational data presentation (notification and ticketing);
- We want to be able to reinforce each ‘learner’ with feedback from human operators in order to fine-tune the learner and teach it new things, so that systems get better naturally through human interaction, without the need for data science or software development for optimization;
- We want to be able to detect configuration changes that have not yet been captured in an approved change request;
- We want to be able to identify problems (from repetitive incidents) early in the problem lifecycle;
- We want to provide a path to near-immediate Root Cause Isolation, especially for Incidents with High Impact.
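To make the second principle more concrete, here is a minimal sketch of learning "normal" from recent history instead of maintaining a fixed threshold. It is an illustration only: the function name `find_anomalies`, the `window` and `z_threshold` parameters, and the sample CPU readings are all hypothetical, and a trailing z-score is a deliberately simple stand-in for whatever ML model an assessment would actually evaluate.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, z_threshold=3.0):
    """Return indices of points that deviate strongly from a trailing baseline.

    Hypothetical helper: the baseline is the previous `window` samples,
    so "normal" is learned from the data rather than set by hand.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change at all as abnormal.
            if values[i] != mu:
                anomalies.append(i)
            continue
        # z-score: how many standard deviations the point sits from the baseline mean.
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU utilisation samples (percent); index 7 is an obvious spike.
cpu = [41, 43, 42, 44, 42, 43, 41, 95, 42, 44]
print(find_anomalies(cpu))
```

With these sample values, only the spike at index 7 is flagged, so operations would be notified once rather than on every small fluctuation. Note that the spike then inflates the baseline for the next few samples; a more robust approach might use the median and median absolute deviation, or exclude flagged points from the window.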