Choosing a CNI Plugin#

The Container Network Interface (CNI) plugin is one of the most consequential infrastructure decisions in a Kubernetes cluster. It determines how pods get IP addresses, how traffic flows between them, whether network policies are enforced, and what observability you get into network behavior. Changing CNI after deployment is painful – it typically requires draining and rebuilding nodes, or rebuilding the cluster entirely. Choose carefully up front.

What CNI Plugins Do#

Every Kubernetes cluster needs a CNI plugin to handle three responsibilities:

  1. IP address management (IPAM): Assign each pod a unique IP address from a configured range.
  2. Pod-to-pod routing: Ensure any pod can reach any other pod by IP, across nodes, without NAT (the Kubernetes networking model requirement).
  3. Network policy enforcement: Implement NetworkPolicy resources that control which pods can communicate with which other pods.

The CNI plugin runs as a DaemonSet (one instance per node) and configures the host networking stack. Some use iptables, some use eBPF, some use kernel routing tables, and some use overlay networks (VXLAN, Geneve) to encapsulate traffic.
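
On each node, the container runtime discovers the CNI plugin through a configuration file in /etc/cni/net.d/ and invokes it for every pod sandbox it creates. The file below is an illustrative sketch using the reference bridge and host-local plugins, just to show the shape of that configuration; in practice the file is generated by whichever CNI you install and will look different:

{
  "cniVersion": "1.0.0",
  "name": "example-pod-network",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.244.1.0/24",
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}

Here the bridge plugin wires each pod into a Linux bridge on the node, and host-local IPAM hands out addresses from a per-node /24 carved out of the cluster's pod CIDR.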

Decision Criteria#

| Criteria | Why It Matters |
| --- | --- |
| Network policy support | Without it, all pod-to-pod traffic is allowed. Required for production security posture. |
| L7 policy support | L3/L4 policies filter on IP and port. L7 policies filter on HTTP method, path, headers – much more granular. |
| Performance | Overlay networks add encapsulation overhead. eBPF-based solutions bypass iptables for better throughput and latency. |
| Encryption | WireGuard or IPsec for encrypting pod-to-pod traffic in transit. Important for compliance and multi-tenant clusters. |
| Observability | Built-in flow logs, DNS visibility, and network topology maps reduce debugging time. |
| Managed K8s compatibility | Some CNIs are pre-installed or required by managed services (EKS, GKE, AKS). |
| Operational complexity | Simpler plugins have fewer failure modes. More capable plugins have more knobs to tune. |

Comparison Table#

| Feature | Flannel | Calico | Cilium | AWS VPC CNI | Azure CNI | GKE Native |
| --- | --- | --- | --- | --- | --- | --- |
| Network policy (L3/L4) | No | Yes | Yes | Requires addon | Yes (with Azure NPM or Calico) | Yes (with GKE Dataplane V2/Cilium) |
| Network policy (L7) | No | Limited (with Envoy) | Yes (native) | No | No | Yes (via Cilium) |
| eBPF dataplane | No | Optional | Yes (default) | No | No | Yes (Dataplane V2) |
| Overlay mode | VXLAN | VXLAN, IPIP, WireGuard | VXLAN, Geneve, native | No (VPC routing) | No (VNet routing) | No (VPC routing) |
| Native routing (no overlay) | No | Yes (BGP) | Yes | Yes | Yes | Yes |
| Encryption | No | WireGuard | WireGuard, IPsec | No | No | No |
| Built-in observability | No | Basic flow logs | Yes (Hubble: flow logs, DNS, service map) | VPC Flow Logs | NSG Flow Logs | GKE Dataplane V2 flow logs |
| Service mesh capabilities | No | No | Yes (Cilium Service Mesh) | No | No | No |
| Multi-cluster networking | No | Limited (BGP peering; federation in Calico Enterprise) | Yes (Cluster Mesh) | Transit Gateway | VNet Peering | GKE Multi-cluster |
| Maturity | Very mature | Very mature | Mature (CNCF Graduated) | Mature | Mature | Mature |
| Complexity | Very low | Medium | Medium-High | Low (managed) | Low (managed) | Low (managed) |

Flannel#

Flannel is the simplest CNI plugin. It creates a VXLAN overlay network, assigns pod IPs, and routes traffic. That is all it does. It does not implement network policies.
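
Flannel's entire cluster-wide configuration amounts to a pod CIDR and a backend type, stored in a small ConfigMap. The sketch below is representative; the ConfigMap's name and namespace vary with the manifest version you deploy:

# Flannel network configuration (abridged) -- one pod CIDR, one VXLAN backend
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
data:
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }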

Choose Flannel when:

  • You are running a development or test cluster where network policies are not needed.
  • You are using minikube, kind, or k3s for local development.
  • Simplicity is the highest priority and you will never need to restrict pod-to-pod traffic.
  • You want the smallest operational footprint possible.

Do not choose Flannel when:

  • You need network policies now or might need them in the future. Migrating away from Flannel to add network policy support later is a cluster rebuild.
  • You need encryption, observability, or performance beyond what a basic VXLAN overlay provides.

Calico#

Calico is one of the most widely deployed CNI plugins in production. It supports both overlay (VXLAN, IPIP) and native routing (BGP) modes. It provides full L3/L4 network policy support, and optionally uses eBPF for the dataplane instead of iptables.

Choose Calico when:

  • You need network policy enforcement in production.
  • You are running on bare metal and want BGP-based routing for high performance without overlay overhead.
  • You need broad compatibility – Calico works on every cloud, bare metal, and most managed Kubernetes offerings.
  • You want a proven, battle-tested solution with extensive documentation and community support.
  • You need WireGuard encryption for pod-to-pod traffic.

Limitations:

  • L7 policy support requires deploying Envoy alongside Calico, adding complexity.
  • Observability is basic compared to Cilium’s Hubble.
  • BGP configuration on bare metal requires understanding of network topology.
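
With Calico, L3/L4 rules are typically written as standard Kubernetes NetworkPolicy resources (Calico also ships its own extended policy CRDs), for example:
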
# Kubernetes NetworkPolicy enforced by Calico -- deny all other ingress to api-server, allow the frontend namespace on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: frontend
    ports:
    - protocol: TCP
      port: 8080
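
Enabling Calico's WireGuard encryption is a single-field change to the cluster's FelixConfiguration. The sketch below assumes the node kernels have WireGuard support; apply it with calicoctl, or with kubectl if the Calico API server is installed:

# Turn on node-to-node WireGuard encryption for pod traffic
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  wireguardEnabled: true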

Cilium#

Cilium is an eBPF-based CNI that moves networking logic into the Linux kernel. This means no iptables rules, lower latency, and the ability to make policy decisions at L7 (HTTP, gRPC, Kafka, DNS). Cilium includes Hubble, a built-in observability platform that provides real-time flow visibility, DNS query logs, and service dependency maps.

Choose Cilium when:

  • You need L7 network policies (e.g., allow GET /api/users but deny DELETE /api/users).
  • Network observability is a requirement – Hubble provides deep visibility without deploying separate tools.
  • You want high-performance networking without iptables overhead.
  • You are evaluating service mesh capabilities and want to avoid deploying a separate mesh (Cilium can handle mutual TLS, load balancing, and traffic management).
  • You need transparent encryption (WireGuard) without application changes.
  • You are building a new cluster and want to invest in the direction the ecosystem is moving.

Limitations:

  • Requires a Linux kernel version >= 4.19 (5.10+ recommended for full feature set). Most modern distributions satisfy this.
  • Higher operational complexity than Calico for basic use cases.
  • More resource consumption on each node (the eBPF agent and Hubble relay add overhead).
  • If you only need L3/L4 policies, Cilium’s additional capabilities are unused complexity.
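
L7 rules are expressed with Cilium's own CiliumNetworkPolicy CRD rather than the standard NetworkPolicy resource:
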
# Cilium L7 NetworkPolicy -- allow only GET requests to /api/public
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-l7-policy
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: GET
          path: "/api/public.*"

Cloud-Native CNI#

Each major cloud provider offers a CNI that integrates directly with the cloud’s networking layer. Pods receive IP addresses from the cloud VPC/VNet, eliminating overlay networks entirely.

  • AWS VPC CNI: Each pod gets a real VPC IP address. Enables security groups for pods, VPC flow logs, and native AWS networking integration.
  • Azure CNI: Pods get Azure VNet IP addresses. Integrates with NSGs and Azure networking.
  • GKE Dataplane V2: GKE’s native networking layer, built on Cilium. Provides eBPF-based networking and network policy enforcement.

Choose Cloud-Native CNI when:

  • You are using a managed Kubernetes service and want the tightest integration with cloud networking.
  • You need pods to be directly addressable from other cloud resources (VMs, Lambda functions, RDS).
  • You want the cloud provider to manage CNI upgrades and compatibility.
  • You have no cross-cloud or hybrid requirements.

Limitations:

  • IP address exhaustion: AWS VPC CNI consumes VPC IP addresses for every pod. In large clusters, this can exhaust subnet capacity. Plan your VPC CIDR ranges carefully (a mitigation sketch follows this list).
  • Vendor lock-in – your network configuration is not portable across clouds.
  • Feature availability depends on the cloud provider’s release cycle.
  • Network policy support varies: AWS VPC CNI requires Calico or a separate policy engine; Azure CNI requires Azure NPM or Calico; GKE Dataplane V2 includes Cilium natively.
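
One mitigation for the IP exhaustion issue is the AWS VPC CNI's prefix delegation mode, which assigns /28 prefixes to nodes instead of individual addresses. The commands below are a sketch; check the current AWS documentation for CNI version and instance-type prerequisites before enabling it:

# Enable prefix delegation on the aws-node DaemonSet (AWS VPC CNI)
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
kubectl set env daemonset aws-node -n kube-system WARM_PREFIX_TARGET=1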

Migration Difficulty#

Switching CNI plugins is one of the hardest operational changes in Kubernetes. The CNI is deeply integrated into the node networking stack – every pod IP, every route, and every policy rule is managed by the CNI agent.

Typical migration approaches:

  • Cluster rebuild: Create a new cluster with the target CNI and migrate workloads. This is the cleanest approach and usually the least risky despite the apparent overhead.
  • Node-by-node replacement: Cordon and drain each node, reconfigure the CNI on the replacement node, and uncordon it (see the command sketch after this list). This is technically possible but fragile and time-consuming.
  • In-place swap (not recommended): Removing one CNI and installing another on live nodes risks network partitions and pod connectivity loss.
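
For the node-by-node approach, the per-node workflow is the usual cordon-and-drain cycle. A minimal sketch with a hypothetical node name:

# Move workloads off a node before touching its CNI, then bring it back
kubectl cordon node-1
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# ...rebuild or reconfigure the node with the target CNI, then:
kubectl uncordon node-1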

The high migration cost is why the initial CNI choice matters disproportionately. If there is any chance you will need network policies, do not start with Flannel.

Common Mistakes#

Choosing Flannel and needing network policies later. This is the most common regret. Flannel is appealing for its simplicity, but when a security audit or compliance requirement demands network segmentation, you are facing a cluster rebuild.

Ignoring IP address planning with cloud-native CNI. AWS VPC CNI can consume thousands of VPC IPs in a large cluster. Without proper subnet sizing (use /16 or larger for pod subnets), you will hit IP exhaustion and pods will fail to schedule.

Over-engineering with Cilium for simple use cases. If your requirements are L3/L4 network policies and standard routing, Calico delivers that with less complexity. Choose Cilium when you genuinely need L7 policies, Hubble observability, or eBPF performance.

Choose X When – Summary#

| Scenario | Recommended CNI |
| --- | --- |
| Development/test clusters, no policy needs | Flannel |
| Production clusters needing network policies | Calico |
| Bare metal with BGP routing | Calico |
| L7 network policies required | Cilium |
| Network observability is a priority | Cilium (Hubble) |
| High-performance, iptables-free dataplane | Cilium |
| Managed EKS, native AWS integration | AWS VPC CNI + Calico (for policies) |
| Managed GKE, native Google integration | GKE Dataplane V2 (Cilium-based) |
| Managed AKS, native Azure integration | Azure CNI + Calico (for policies) |
| New cluster, future-proofing | Cilium (ecosystem direction, CNCF Graduated) |
| Simplest production-ready option | Calico |

If you are starting a new production cluster today and have no strong constraints, Calico is the safe default with the broadest compatibility. If you want to invest in the direction the ecosystem is heading and can accept slightly higher operational complexity, Cilium is the forward-looking choice. Use Flannel only for throwaway development environments.