Unpack Azure SRE: The Key To Network Operations With Powerful Services

In this blog post, I walk through how to get started with Azure SRE Agent and demonstrate how agentic assistance can be applied to real networking operational scenarios. The post covers SRE Agent fundamentals, followed by a live demo showing how Azure networking services can be operationalised with SRE Agent workflows to reduce operational toil.

What is an SRE Agent in Azure?

SRE Agent provides an AI-driven platform that connects observability tools, incident management platforms, and source code repositories to automate workflows end-to-end. This service enhances site reliability engineering by introducing automation and intelligence. It helps reduce manual work, improve system uptime, and ensure consistent operational results. The agent works with Azure services and other external systems to perform operational tasks with minimal human input.

SRE Agent can manage all Azure services via Azure CLI and REST APIs, including:

  1. Compute services: Virtual machines, App Service, AKS, Azure Functions
  2. Storage services: Blob storage, file shares, managed disks
  3. Networking services: Virtual networks, load balancers, application gateways, NSGs
  4. Database services: Azure SQL Database, Cosmos DB, PostgreSQL, MySQL
  5. Monitoring: Azure Monitor, Log Analytics, Application Insights
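
To make the REST API route concrete, here is a minimal Python sketch that builds the Azure Resource Manager URL for listing one of the networking resource types named above. It only illustrates the shape of such a request; it is not how SRE Agent itself is implemented. The function name `list_url`, the service keys, and the subscription and resource group values are hypothetical, and the `api-version` value is an assumption that you should check against the current Microsoft.Network documentation.

```python
from urllib.parse import urlencode

# Azure Resource Manager (ARM) public endpoint.
ARM_ENDPOINT = "https://management.azure.com"

# Assumed API version for Microsoft.Network; verify before real use.
API_VERSION = "2023-09-01"

# ARM resource type names for the networking services listed above.
NETWORK_RESOURCE_TYPES = {
    "virtual_networks": "virtualNetworks",
    "load_balancers": "loadBalancers",
    "application_gateways": "applicationGateways",
    "nsgs": "networkSecurityGroups",
}


def list_url(subscription_id: str, resource_group: str, service: str) -> str:
    """Build the ARM URL that lists one networking resource type in a resource group."""
    resource_type = NETWORK_RESOURCE_TYPES[service]
    path = (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/{resource_type}"
    )
    return f"{ARM_ENDPOINT}{path}?{urlencode({'api-version': API_VERSION})}"


if __name__ == "__main__":
    # Placeholder subscription ID and resource group, for illustration only.
    print(list_url("00000000-0000-0000-0000-000000000000", "rg-network-demo", "nsgs"))
```

A GET request to a URL like this, sent with a valid bearer token, returns the matching resources as JSON. The sketch stops at building the URL so that it runs without any credentials.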

How does the SRE Agent work?

SRE Agent combines expert knowledge of Azure with the ability to customise. It can automatically manage Azure resources for specific services and set smart defaults for common tasks. At the same time, you can add your own knowledge and custom runbooks and connect to tools and data sources, such as monitoring platforms.

The agent uses various automation tools, including:

  • Built-in Azure knowledge: It has a preconfigured understanding of Azure services and uses efficient operational methods.
  • Custom runbooks: You can define runbooks that run Azure CLI commands and REST API calls for any Azure service.
  • Subagent extensibility: You can create specialised agents for specific services, such as VMs, databases, or networking.
  • External integrations: It can connect to monitoring systems, incident management tools, and source control.

This flexibility allows the SRE Agent to fit your needs and adapt to your entire Azure infrastructure.
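
As a rough illustration of the kind of Azure CLI steps a networking runbook might contain, the Python sketch below assembles two read-only `az network nsg` commands and, if the Azure CLI is installed and logged in, runs them. The helper names (`nsg_list_command`, `nsg_rule_list_command`, `run_step`) and the resource names are hypothetical, and the actual runbook format used by SRE Agent is not shown here.

```python
import json
import shutil
import subprocess
from typing import Any, Optional


def nsg_list_command(resource_group: str) -> list[str]:
    """Azure CLI arguments that list the NSGs in a resource group."""
    return ["az", "network", "nsg", "list",
            "--resource-group", resource_group, "--output", "json"]


def nsg_rule_list_command(resource_group: str, nsg_name: str) -> list[str]:
    """Azure CLI arguments that list the rules of a single NSG."""
    return ["az", "network", "nsg", "rule", "list",
            "--resource-group", resource_group,
            "--nsg-name", nsg_name, "--output", "json"]


def run_step(command: list[str]) -> Optional[Any]:
    """Run one step and parse its JSON output; return None if the CLI is missing."""
    executable = shutil.which(command[0])
    if executable is None:
        return None
    result = subprocess.run([executable, *command[1:]],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    for step in (nsg_list_command("rg-network-demo"),
                 nsg_rule_list_command("rg-network-demo", "nsg-web")):
        print(" ".join(step))
```

Keeping each step as a plain argument list makes it easy to review what will run before anything touches a production subscription.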

How do you create and use the Azure SRE Agent?

I have created a video that covers agent creation, managed identity access permissions, and how to use the SRE Agent with networking services.

The YouTube link:

The SRE Agent can be integrated into the incident response plan for your cloud resources. It can also automate recurring tasks against those resources. For example, you can ask it to send you the status of a specific production application every day at 6 am.
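
To pin down what "every day at 6 am" means as a schedule, the short Python sketch below shows the equivalent five-field cron expression and computes the next run time from a given moment. It is only an illustration of the schedule. How SRE Agent itself stores or configures scheduled tasks is not covered here, and the names `DAILY_6AM_CRON` and `next_daily_run` are hypothetical.

```python
from datetime import datetime, time, timedelta

# Standard cron fields: minute, hour, day of month, month, day of week.
DAILY_6AM_CRON = "0 6 * * *"


def next_daily_run(now: datetime, run_at: time = time(6, 0)) -> datetime:
    """Return the next occurrence of run_at strictly after now (same timezone as now)."""
    candidate = datetime.combine(now.date(), run_at, tzinfo=now.tzinfo)
    if candidate <= now:
        # Today's slot has already passed, so move to tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(DAILY_6AM_CRON)
    print(next_daily_run(datetime.now()))
```

Note that the result depends on the timezone of `now`, so a daily status report should state which timezone "6 am" refers to.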

Azure SRE Agent integrates with your ecosystem through:

  • Monitoring and observability: Azure Monitor, Application Insights, Log Analytics, Grafana
  • Incident management: Azure Monitor Alerts, PagerDuty, ServiceNow
  • Source control and CI/CD: GitHub, Azure DevOps
  • Data sources: Azure Data Explorer clusters, Model Context Protocol servers

I hope this post helped you learn about the Azure SRE Agent and how it can be combined with Azure networking services for monitoring.