# Kubernetes DNS Deep Dive: CoreDNS, ndots, and Debugging Resolution Failures
DNS problems are responsible for a disproportionate number of Kubernetes debugging sessions. The symptoms are always vague – timeouts, connection refused, “could not resolve host” – and the root causes range from CoreDNS being down to a misunderstood setting called ndots.
## How Pod DNS Resolution Works
When a pod makes a DNS query, it goes through the following chain:
- The application calls `getaddrinfo()` or equivalent.
- The system resolver reads `/etc/resolv.conf` inside the pod.
- The query goes to the nameserver specified in `resolv.conf`, which is CoreDNS (reachable via the `kube-dns` Service in `kube-system`).
- CoreDNS resolves the name, either from its internal zone (for cluster services) or by forwarding to upstream DNS.
Every pod's `/etc/resolv.conf` looks something like this:

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

The nameserver IP is the ClusterIP of the `kube-dns` Service. The search domains let you use short names like `my-service` instead of `my-service.default.svc.cluster.local`.
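For example, from a pod in the `default` namespace, these two queries resolve to the same Service (a quick check; `my-service` is an assumed Service name):

```sh
# Both hit CoreDNS; the short name works because the search list
# appends default.svc.cluster.local before querying.
nslookup my-service
nslookup my-service.default.svc.cluster.local
```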
## ndots:5 – Why It Matters

The `ndots` option controls when the resolver appends search domains. With `ndots:5`, any name with fewer than 5 dots is treated as a "relative" name and gets the search domains appended first.
When a pod queries `api.example.com` (2 dots, fewer than 5):

- First tries: `api.example.com.default.svc.cluster.local` – fails
- Then tries: `api.example.com.svc.cluster.local` – fails
- Then tries: `api.example.com.cluster.local` – fails
- Finally tries: `api.example.com` as an absolute name – succeeds
That is 4 DNS queries instead of 1 for every external hostname resolution. For high-traffic services calling external APIs, this multiplies DNS load and adds latency.
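You can watch this expansion happen with dig's `+search` and `+showsearch` flags, which honour the pod's `resolv.conf` search list and `ndots` value and print each intermediate attempt (run from any pod with dig installed, such as the agnhost debug image used later):

```sh
# Expect NXDOMAIN for the three cluster.local candidates, then an
# answer for api.example.com queried as an absolute name.
dig +search +showsearch api.example.com
```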
Fix for specific pods:
```yaml
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
```

Setting `ndots:2` means names with 2 or more dots (like `api.example.com`) are tried as absolute names first. Cluster-internal names like `my-service` (0 dots) or `my-service.production` (1 dot) still get search domains appended.
Alternative – use trailing dots: A name ending with a dot is always treated as absolute, bypassing search domain expansion entirely:
```
# In application config, use:
api.example.com.   # trailing dot = FQDN, no search domain appended
```

## CoreDNS
CoreDNS runs as a Deployment in `kube-system`. Check its health:

```sh
# Are the pods running?
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns

# Is the service reachable?
kubectl get svc kube-dns -n kube-system
```

The CoreDNS configuration is stored in a ConfigMap:
```sh
kubectl get configmap coredns -n kube-system -o yaml
```

A typical Corefile looks like:
```
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
```

The `kubernetes` plugin handles `cluster.local` lookups. The `forward` plugin sends everything else to the node's upstream DNS servers.
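To route a specific zone somewhere other than the default upstream (for example an internal corporate domain), the usual approach is an extra server block with its own `forward` directive. A sketch, where `corp.example.com` and `10.0.0.53` are placeholders:

```
# Added alongside the .:53 block in the coredns ConfigMap
# (kubectl -n kube-system edit configmap coredns). The reload plugin
# shown above picks up the change once the ConfigMap propagates.
corp.example.com:53 {
    errors
    cache 30
    forward . 10.0.0.53
}
```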
## CoreDNS Crashlooping
If CoreDNS is crashlooping, check the logs first. Common causes:
- Loop detection. If CoreDNS forwards to itself (the node's `/etc/resolv.conf` points to `127.0.0.53` on systemd-resolved systems), the `loop` plugin detects it and kills the pod. Fix by configuring the `forward` plugin to point to a real upstream DNS server instead of `/etc/resolv.conf`:

  ```
  forward . 8.8.8.8 8.8.4.4
  ```

- Resource limits. Under high query load, CoreDNS can OOM. Check `kubectl describe pod` for `OOMKilled`. Increase memory limits in the CoreDNS Deployment (see the sketch after this list).
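For the resource-limit case, one quick way to raise the memory limit is `kubectl set resources` (a sketch; the Deployment is named `coredns` in most installations, and `256Mi` is an arbitrary starting point):

```sh
kubectl -n kube-system set resources deployment coredns --limits=memory=256Mi
```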
## Debugging DNS from Inside a Pod
Quick one-off test:
```sh
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
```

If this returns the ClusterIP of the `kubernetes` API Service, DNS is working. If it times out, CoreDNS is either down or unreachable.
Detailed debugging with a long-lived pod:
```sh
kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/agnhost:2.39 \
  --restart=Never --command -- sleep 3600

kubectl exec -it dnsutils -- nslookup kubernetes.default
kubectl exec -it dnsutils -- dig +search my-service.production.svc.cluster.local
kubectl exec -it dnsutils -- cat /etc/resolv.conf
```

The agnhost image includes `nslookup`, `dig`, and other network tools. You can also `kubectl exec` into an existing application pod if it has DNS tools available.
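Even when the application image has no DNS tools, `getent hosts` is available in most glibc-based images and goes through the same libc resolver path (and `resolv.conf`) the application itself uses. A sketch, assuming a pod named `my-app`:

```sh
# getent uses the libc resolver, so it sees the same search domains
# and ndots behaviour as the application, unlike dig which resolves on its own.
kubectl exec -it my-app -- getent hosts my-service.production
```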
## DNS Policy and Custom Configuration
The `dnsPolicy` field controls how `/etc/resolv.conf` is populated:

- `ClusterFirst` (default): Uses CoreDNS. Cluster names resolve. External names forward through CoreDNS.
- `Default`: Uses the node's DNS directly, bypassing CoreDNS. Cluster service names will NOT resolve.
- `ClusterFirstWithHostNet`: Same as `ClusterFirst`, but for pods using `hostNetwork: true` (see the example after this list).
- `None`: Completely custom. You must provide all DNS config via `dnsConfig`.
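For example, a pod on the host network that still needs to resolve cluster Services must opt in explicitly (a minimal sketch; names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-agent
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  # without this, the pod inherits the node's resolv.conf
  containers:
    - name: agent
      image: busybox:1.36
      command: ["sleep", "3600"]
```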
A common mistake: setting `dnsPolicy: Default` thinking it means "the default behavior." It does not. The actual default is `ClusterFirst`.
To add custom nameservers or tune DNS without overriding cluster DNS:
```yaml
spec:
  dnsPolicy: ClusterFirst
  dnsConfig:
    nameservers:
      - 10.0.0.53        # additional nameserver
    searches:
      - mycompany.local  # additional search domain
    options:
      - name: ndots
        value: "2"
```

With `dnsPolicy: ClusterFirst`, the `dnsConfig` entries are merged with the cluster DNS settings rather than replacing them.
## Quick DNS Debugging Checklist
- Check `/etc/resolv.conf` in the pod. Is the nameserver correct? Are search domains present?
- Test cluster DNS: `nslookup kubernetes.default`. If this fails, CoreDNS is the problem.
- Test external DNS: `nslookup google.com`. If this fails but cluster DNS works, the CoreDNS `forward` plugin is misconfigured.
- Check CoreDNS pods: `kubectl get pods -n kube-system -l k8s-app=kube-dns`. Are they Running?
- Check CoreDNS logs: `kubectl logs -n kube-system -l k8s-app=kube-dns`. Look for loop detection or upstream errors.
- Check `dnsPolicy`: Is it accidentally set to `Default` instead of `ClusterFirst`?
- Check network policies: Is egress to `kube-system` on port 53 allowed? (An example policy follows.)
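If a default-deny egress policy is in place, DNS needs an explicit allowance. A minimal sketch (the namespace and policy names are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production        # assumed application namespace
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```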