Transform Your AKS Networking Experience with the Innovative Container Network Insights Agent!

In the ever-evolving landscape of cloud-native applications, managing Kubernetes clusters can be complex. When it comes to troubleshooting network issues within Azure Kubernetes Service (AKS), time is of the essence. Azure has introduced the Container Network Insights Agent, an agentic AI tool that simplifies and accelerates network troubleshooting. In this post, we’ll explore how it can change your approach to managing AKS networks.

The Container Network Insights Agent is an AI-powered tool for identifying and resolving network issues in your AKS clusters. You describe an issue, such as DNS problems, dropped packets, unreachable services, or blocked traffic. The agent then gathers information from the cluster and gives you a detailed report that includes the cause of the problem and how to fix it.

Unlike tools that only check the Kubernetes layer, this agent also examines host-level network data through a Linux networking plugin. It checks things like network interface card (NIC) statistics, packet counts, and potential bottlenecks. This helps uncover hard-to-find issues, such as dropped packets or hardware overload.

The agent runs as a web app in your AKS cluster and is accessible in your web browser. It provides insights, analysis, and recommendations. You can then review what it finds and make any suggested changes yourself.

The Container Network Insights Agent gets this visibility through two main components:

  • AKS MCP Server: The agent integrates with the AKS MCP (Model Context Protocol) server, which provides a secure interface for running diagnostic commands with familiar tools such as kubectl, Cilium, and Hubble. No custom scripts or integrations are needed.
  • Linux Networking Plugin: For deeper diagnostics, the agent gathers kernel-level telemetry from cluster nodes, including NIC ring buffer stats and packet counters. This allows it to identify issues such as packet drops and network saturation that surface-level metrics might overlook.
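To give a feel for the packet counters mentioned above, here is a minimal Python sketch that parses text in the format of /proc/net/dev, the per-interface counter file that Linux exposes on every node. This is not the agent's code, and how the Linux Networking plugin actually collects its data is not described here. The sample values and the helper names parse_net_dev and interfaces_with_drops are invented for illustration.

```python
# Illustrative only: not the agent's implementation.
# Parses text in the format of /proc/net/dev. The numbers below are made up.

NET_DEV_SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  104200    1200    0    0    0     0          0         0   104200    1200    0    0    0     0       0          0
  eth0: 9876543   81000    0  405    3     0          0         0  5432100   64000    0    0    0     0       0          0
"""


def parse_net_dev(text):
    """Return {interface: {"rx_packets", "rx_drop", "tx_packets", "tx_drop"}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, _, counters = line.partition(":")
        fields = [int(value) for value in counters.split()]
        if len(fields) < 16:
            continue  # skip blank or truncated lines
        stats[name.strip()] = {
            "rx_packets": fields[1],
            "rx_drop": fields[3],
            "tx_packets": fields[9],
            "tx_drop": fields[11],
        }
    return stats


def interfaces_with_drops(stats):
    """List the interfaces that report any receive or transmit drops."""
    return [
        name
        for name, counters in stats.items()
        if counters["rx_drop"] > 0 or counters["tx_drop"] > 0
    ]


if __name__ == "__main__":
    print(interfaces_with_drops(parse_net_dev(NET_DEV_SAMPLE)))
```

On a real node you would read the file itself instead of the sample string. Counters like these are cumulative since boot, so comparing two readings taken a few seconds apart is usually more telling than a single snapshot.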

The Container Network Insights Agent runs as a pod in your AKS cluster, and you access it through a web browser over HTTPS. Inside the cluster, the agent runs diagnostic commands through the AKS MCP server and connects to five data sources through dedicated plugins:

  1. Kubernetes API Server: This plugin queries pods, services, nodes, network policies, and other cluster resources using kubectl via the AKS MCP server.
  2. CoreDNS: This plugin collects DNS health status and metrics.
  3. Cilium Agent: This plugin checks Cilium network policies and endpoint states using the AKS MCP server with the Kubernetes Networking plugin.
  4. Hubble: This plugin monitors live network flows and detects dropped traffic via the AKS MCP server using the Kubernetes Networking plugin.
  5. Node Network Stack: This plugin gathers host-level network statistics, including RX/TX buffers, ring buffer state, and softnet counters, using the Linux Networking plugin.
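The softnet counters in the last item come from the kernel's per-CPU packet processing statistics, which Linux exposes in /proc/net/softnet_stat as rows of hexadecimal values. The sketch below, in Python, reads the first three columns (packets processed, packets dropped, and time-squeeze events). It is an illustration of the kind of data involved, not the plugin's implementation. The sample rows and the function name parse_softnet_stat are invented.

```python
# Illustrative only: not the agent's implementation.
# Parses text in the format of /proc/net/softnet_stat (one row per CPU,
# hexadecimal columns). The sample rows are made up.

SOFTNET_SAMPLE = """\
0004a3f1 00000000 00000012 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0001b2c4 0000002a 00000103 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000001
"""


def parse_softnet_stat(text):
    """Return one dict per row with the first three counters as integers."""
    rows = []
    for row_index, line in enumerate(text.splitlines()):
        fields = [int(value, 16) for value in line.split()]
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        rows.append(
            {
                "row": row_index,
                "processed": fields[0],     # packets handled by this CPU
                "dropped": fields[1],       # packets dropped (backlog queue full)
                "time_squeeze": fields[2],  # polling ran out of budget or time
            }
        )
    return rows


if __name__ == "__main__":
    for row in parse_softnet_stat(SOFTNET_SAMPLE):
        if row["dropped"] or row["time_squeeze"]:
            print(row)
```

The number of columns varies between kernel versions, which is why the sketch only relies on the first three.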

The agent communicates with Azure OpenAI Service. It sends your natural-language query and the gathered diagnostic evidence for analysis. In return, it receives clear diagnostic insights.

The Container Network Insights Agent complements existing observability stacks: it enhances diagnosis without replacing continuous metrics or dashboards. It collects kernel-level network statistics and diagnostics, which lets it pinpoint issues that surface-level metrics miss.

The agent correlates evidence across multiple layers to uncover complex networking incidents, ensuring a comprehensive understanding of root causes. Its findings are structured and auditable, allowing for deterministic investigations and reproducible results. Additionally, it provides remediation guidance, empowering operators to make informed decisions based on the analysis.

The diagnostic workflow consists of four steps:

  1. Classify: The agent identifies the issue category (e.g., DNS, connectivity, network policy, service routing, or packet drops) based on your description.
  2. Collect evidence: The agent runs diagnostic commands on your cluster via the AKS MCP server, using kubectl, cilium, and hubble. Each diagnostic category has its own evidence-collection workflow that automatically gathers the relevant data.
  3. Analyse: The agent evaluates the collected evidence to distinguish between healthy signals and anomalies. All conclusions are drawn from actual command output, not from speculation.
  4. Report: You will receive a structured report that includes:
       • A summary of the issue and its current status.
       • An evidence table detailing each check, its result, and whether it passed or failed.
       • An analysis of what is functioning properly and what is not.
       • Identification of the root cause with specific citations of evidence.
       • Exact commands to resolve the issue and verify the fix.
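The shape of this workflow can be sketched in a few lines of Python. Everything here is hypothetical: the keyword table, the Check and Report classes, and the sample evidence are invented for illustration, and the real agent classifies issues with an AI model rather than keyword matching. The point is only to show how a classified issue and a pass/fail evidence table fit together in a structured report.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table standing in for the agent's AI-based classifier.
CATEGORY_KEYWORDS = {
    "dns": ("dns", "resolve", "nxdomain"),
    "network policy": ("policy", "blocked", "denied"),
    "packet drops": ("drop", "loss"),
    "service routing": ("service", "endpoint", "clusterip"),
}


@dataclass
class Check:
    """One row of the evidence table."""
    name: str
    result: str
    passed: bool


@dataclass
class Report:
    """A simplified stand-in for the structured report."""
    summary: str
    category: str
    evidence: list = field(default_factory=list)

    def failed_checks(self):
        return [check for check in self.evidence if not check.passed]

    def render(self):
        lines = [f"Summary: {self.summary}", f"Category: {self.category}", "Evidence:"]
        for check in self.evidence:
            status = "PASS" if check.passed else "FAIL"
            lines.append(f"  [{status}] {check.name}: {check.result}")
        return "\n".join(lines)


def classify(description):
    """Step 1: pick the first category whose keywords appear in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "connectivity"  # fallback when nothing more specific matches


if __name__ == "__main__":
    issue = "Pods cannot resolve external DNS names"
    report = Report(
        summary=issue,
        category=classify(issue),
        evidence=[  # invented sample evidence, not real command output
            Check("CoreDNS pods ready", "2/2 Running", True),
            Check("DNS lookup from test pod", "timeout", False),
        ],
    )
    print(report.render())
```

Keeping each check as its own record with an explicit pass/fail flag is what makes a report like this easy to audit: every conclusion can point back to a specific row.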

In short, the Container Network Insights Agent analyses symptoms by:

  • Classifying issues and planning tailored investigations.
  • Collecting evidence from the AKS MCP server and its Linux networking plugin, covering DNS, service routing, network policies, Cilium, and node statistics.
  • Identifying failures across layers and their impacts.
  • Delivering structured reports with pass/fail evidence, root cause analysis, and remediation guidance.

The agent focuses on AKS networking issues, such as DNS failures, packet drops, and connectivity problems. It does not modify workloads or configurations; all guidance is advisory, leaving the decision to apply it up to you.

Conclusion:

The Container Network Insights Agent is more than just a troubleshooting tool; it’s a game-changer in how we approach network management in AKS. By streamlining the troubleshooting process, it frees teams to focus on innovation rather than getting bogged down in complexity. Embrace this agentic AI and transform your cloud infrastructure management today!

To read more about how to use the agent, follow the Microsoft step-by-step guide.