Amazon Lookout for Metrics uses machine learning (ML) to automatically detect and diagnose anomalies (outliers from the norm) without requiring any prior ML experience. Amazon CloudWatch provides you with actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health.
This post demonstrates how you can seamlessly connect to your data in CloudWatch to set up a highly accurate anomaly detector across metrics, dimensions, and namespaces of your choice using Lookout for Metrics. The solution allows you to set up a continuous anomaly detector and optionally set up alerts to receive notifications when anomalies occur.
The following diagram shows the architecture of our continuous detection system.
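To implement our solution, we complete the following high-level steps: create an anomaly detector, add CloudWatch as the dataset and define the metrics to monitor, activate the detector, optionally create an alert, and review the detected anomalies.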
The dataset used for this post comes from an Amazon API Gateway-based service whose APIs emit metrics such as Latency, 4XXError, 5XXError, and Count (the request count), all available through CloudWatch.
To create your anomaly detector, complete the following steps:
After you create the anomaly detector, a banner appears that confirms its creation. You can then add a dataset to your newly created detector.
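If you prefer to script this step instead of using the console, the following is a minimal sketch that assumes the AWS SDK for Python (boto3) and its `lookoutmetrics` client. The helper names, detector name, and description are hypothetical; the request fields follow the `CreateAnomalyDetector` API as we understand it, so check them against the current API reference before relying on them.

```python
# Detection intervals accepted by the AnomalyDetectorFrequency field.
ALLOWED_FREQUENCIES = {"PT5M", "PT10M", "PT1H", "P1D"}


def build_detector_request(name, description, frequency="PT5M"):
    """Build the keyword arguments for create_anomaly_detector (hypothetical helper)."""
    if frequency not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency}")
    return {
        "AnomalyDetectorName": name,
        "AnomalyDetectorDescription": description,
        "AnomalyDetectorConfig": {"AnomalyDetectorFrequency": frequency},
    }


def create_detector(request):
    """Send the request; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("lookoutmetrics")
    return client.create_anomaly_detector(**request)["AnomalyDetectorArn"]


# Hypothetical name and description, used for illustration only.
detector_request = build_detector_request(
    "api-gateway-detector", "Detects anomalies in API Gateway metrics", "PT5M"
)
```

Keeping the request builder separate from the call makes it easy to review or unit test the payload before anything is created in your account.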
Lookout for Metrics supports multiple data sources. For this post, we use CloudWatch.
We now define the relevant CloudWatch metrics.
Lookout for Metrics automatically populates this list with all the available namespaces for your account.
Lookout for Metrics makes this easy for you by pre-populating the available dimensions for a given namespace.
These metrics should also be associated with the same namespace.
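The namespace, dimension, and metric selections above can also be expressed as a `CreateMetricSet` request. The following sketch assumes boto3 and the API Gateway metrics used in this post; the helper name, dataset name, ARNs, and the choice of aggregation per metric are our own assumptions for illustration. Note how every metric is attached to the same namespace.

```python
def build_metric_set_request(detector_arn, role_arn, namespace,
                             metric_aggregations, dimensions, frequency="PT5M"):
    """Build the keyword arguments for create_metric_set (hypothetical helper).

    metric_aggregations maps each CloudWatch metric name to "AVG" or "SUM".
    """
    for metric, aggregation in metric_aggregations.items():
        if aggregation not in {"AVG", "SUM"}:
            raise ValueError(f"unsupported aggregation for {metric}: {aggregation}")
    return {
        "AnomalyDetectorArn": detector_arn,
        "MetricSetName": "api-gateway-metrics",  # hypothetical dataset name
        "MetricSetFrequency": frequency,
        # Every metric shares the single namespace passed in.
        "MetricList": [
            {"MetricName": metric, "AggregationFunction": aggregation,
             "Namespace": namespace}
            for metric, aggregation in metric_aggregations.items()
        ],
        "DimensionList": list(dimensions),
        # The role must allow Lookout for Metrics to read from CloudWatch.
        "MetricSource": {"CloudWatchConfig": {"RoleArn": role_arn}},
    }


# Placeholder ARNs; replace them with values from your own account.
metric_set_request = build_metric_set_request(
    detector_arn="arn:aws:lookoutmetrics:us-east-1:123456789012:AnomalyDetector:api-gateway-detector",
    role_arn="arn:aws:iam::123456789012:role/LookoutMetricsCloudWatchRole",
    namespace="AWS/ApiGateway",
    metric_aggregations={"Latency": "AVG", "4XXError": "SUM",
                         "5XXError": "SUM", "Count": "SUM"},
    dimensions=["ApiName", "Stage"],
)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("lookoutmetrics").create_metric_set(**metric_set_request)
```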
Now that the dataset is created, we activate the detector.
A message appears to confirm that the detector is activating.
At any time before or after you activate the detector, you can create an alert.
For this post, we use Amazon SNS.
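As a rough programmatic equivalent, an SNS alert can be described with a `CreateAlert` request. This sketch assumes boto3; the helper name, alert name, threshold of 70, and all ARNs are placeholders we made up for illustration.

```python
def build_sns_alert_request(detector_arn, topic_arn, role_arn, name, threshold):
    """Build the keyword arguments for create_alert with an SNS action (hypothetical helper)."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    return {
        "AlertName": name,
        "AnomalyDetectorArn": detector_arn,
        # Only anomalies scoring at or above this value trigger the alert.
        "AlertSensitivityThreshold": threshold,
        "Action": {
            "SNSConfiguration": {
                # The role must allow Lookout for Metrics to publish to the topic.
                "RoleArn": role_arn,
                "SnsTopicArn": topic_arn,
            }
        },
    }


# Placeholder ARNs and names; replace them with values from your own account.
alert_request = build_sns_alert_request(
    detector_arn="arn:aws:lookoutmetrics:us-east-1:123456789012:AnomalyDetector:api-gateway-detector",
    topic_arn="arn:aws:sns:us-east-1:123456789012:api-anomaly-alerts",
    role_arn="arn:aws:iam::123456789012:role/LookoutMetricsSnsRole",
    name="api-gateway-high-severity",
    threshold=70,
)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("lookoutmetrics").create_alert(**alert_request)
```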
When the anomaly detector is active, you can use the Detector log tab on the detector details page to review the detector runs that have been performed by Lookout for Metrics.
You can also choose View anomalies on the detector details page to manually inspect anomalies that may have been detected.
On the Anomalies page, you can adjust the severity score threshold on the threshold dial to filter anomalies above a given score.
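The effect of the threshold dial can be illustrated with a few lines of plain Python. This is only a sketch of the filtering idea, not the console's implementation: the records below are made-up samples shaped loosely like the anomaly group summaries the service returns, and we assume that a score equal to the threshold is kept.

```python
def filter_by_severity(groups, threshold):
    """Return anomaly groups scoring at or above threshold, highest score first."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    kept = [g for g in groups if g["AnomalyGroupScore"] >= threshold]
    return sorted(kept, key=lambda g: g["AnomalyGroupScore"], reverse=True)


# Made-up sample records for illustration; the IDs and most scores are invented.
sample_groups = [
    {"AnomalyGroupId": "group-a", "PrimaryMetricName": "Latency", "AnomalyGroupScore": 86.0},
    {"AnomalyGroupId": "group-b", "PrimaryMetricName": "4XXError", "AnomalyGroupScore": 42.0},
    {"AnomalyGroupId": "group-c", "PrimaryMetricName": "5XXError", "AnomalyGroupScore": 71.0},
]

high_severity = filter_by_severity(sample_groups, 70)
print([g["AnomalyGroupId"] for g in high_severity])
```

Raising the threshold narrows the list to the most severe groups, which is useful when a detector surfaces more anomalies than you can triage.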
When detecting an anomaly, Lookout for Metrics helps you focus on what matters most by assigning a severity score to aid prioritization. To help you find the root cause, it intelligently groups anomalies that may be related to the same incident and summarizes the different sources of impact.
In the following screenshot, the anomaly in latency on June 7 at 20:00 GMT had a severity score of 86, indicating a high-severity anomaly that needs immediate attention. The impact analysis also tells you that the primary API impacted was ListMetricSets.
Lookout for Metrics also allows you to provide real-time feedback on the relevance of the detected anomalies, which enables a powerful human-in-the-loop mechanism. This information is fed back to the anomaly detection model to improve its accuracy continuously, in near-real time.
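Feedback can also be submitted programmatically. The following sketch assumes boto3 and the `PutFeedback` API; the helper name and the identifiers are placeholders, and in practice you would take the anomaly group ID and time series ID from the service's responses.

```python
def build_feedback_request(detector_arn, anomaly_group_id, time_series_id, is_anomaly):
    """Build the keyword arguments for put_feedback (hypothetical helper)."""
    return {
        "AnomalyDetectorArn": detector_arn,
        "AnomalyGroupTimeSeriesFeedback": {
            "AnomalyGroupId": anomaly_group_id,
            "TimeSeriesId": time_series_id,
            # True confirms the anomaly was relevant; False marks it as not relevant.
            "IsAnomaly": bool(is_anomaly),
        },
    }


# Placeholder identifiers for illustration only.
feedback_request = build_feedback_request(
    detector_arn="arn:aws:lookoutmetrics:us-east-1:123456789012:AnomalyDetector:api-gateway-detector",
    anomaly_group_id="example-anomaly-group-id",
    time_series_id="example-time-series-id",
    is_anomaly=True,
)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("lookoutmetrics").put_feedback(**feedback_request)
```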
In this post, we showed how to connect Lookout for Metrics to your data in CloudWatch and set up an anomaly detector across the metrics, dimensions, and namespaces of your choice, with optional alerts and feedback on detected anomalies.
To get started with this capability, see Using Amazon CloudWatch with Lookout for Metrics. You can use this capability in all Regions where Lookout for Metrics is publicly available. For more information about Region availability, see AWS Regional Services.
Ankita Verma is the Product Lead for Amazon Lookout for Metrics. Her current focus is helping businesses make data-driven decisions using AI and ML. Outside of AWS, she is a fitness enthusiast, and loves mentoring budding product managers and entrepreneurs in her free time. She also publishes a weekly product management newsletter called The Product Mentors on Substack.
Raj Vippagunta is a Senior SDE at AWS AI Services. He uses his vast experience in large-scale distributed systems and his passion for machine learning to build practical service offerings in the AI space. He has helped build various solutions for AWS and Amazon. In his spare time, he likes reading books and watching travel and cuisine vlogs from across the world.