# Kubernetes DNS Deep Dive: CoreDNS, ndots, and Debugging Resolution Failures
DNS problems are responsible for a disproportionate number of Kubernetes debugging sessions. The symptoms are always vague – timeouts, connection refused, “could not resolve host” – and the root causes range from CoreDNS being down to a misunderstood setting called ndots.
## How Pod DNS Resolution Works
When a pod makes a DNS query, it goes through the following chain:
- The application calls `getaddrinfo()` or equivalent.
- The system resolver reads `/etc/resolv.conf` inside the pod.
- The query goes to the nameserver specified in `resolv.conf`, which is CoreDNS (reachable via the `kube-dns` Service in `kube-system`).
- CoreDNS resolves the name, either from its internal zone (for cluster services) or by forwarding to upstream DNS.
Every pod's `/etc/resolv.conf` looks something like this:

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

The nameserver IP is the ClusterIP of the `kube-dns` Service. The search domains let you use short names like `my-service` instead of `my-service.default.svc.cluster.local`.
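For example, from a pod in the `default` namespace, these two queries resolve to the same Service (a quick check; `my-service` is an assumed Service name):

```sh
# Both hit CoreDNS; the short name works because the search list
# appends default.svc.cluster.local before querying.
nslookup my-service
nslookup my-service.default.svc.cluster.local
```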
## ndots:5 – Why It Matters

The `ndots` option controls when the resolver appends search domains. With `ndots:5`, any name with fewer than 5 dots is treated as a "relative" name and gets the search domains appended first.
When a pod queries `api.example.com` (2 dots, fewer than 5):

- First tries: `api.example.com.default.svc.cluster.local` – fails
- Then tries: `api.example.com.svc.cluster.local` – fails
- Then tries: `api.example.com.cluster.local` – fails
- Finally tries: `api.example.com` as an absolute name – succeeds
That is 4 DNS queries instead of 1 for every external hostname resolution. For high-traffic services calling external APIs, this multiplies DNS load and adds latency.
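You can watch this expansion happen with dig's `+search` and `+showsearch` flags, which honour the pod's `resolv.conf` search list and `ndots` value and print each intermediate attempt (run from any pod with dig installed, such as the agnhost debug image used later):

```sh
# Expect NXDOMAIN for the three cluster.local candidates, then an
# answer for api.example.com queried as an absolute name.
dig +search +showsearch api.example.com
```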
Fix for specific pods:
```yaml
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
```

Setting `ndots:2` means names with 2 or more dots (like `api.example.com`) are tried as absolute names first. Cluster-internal names like `my-service` (0 dots) or `my-service.production` (1 dot) still get search domains appended.
Alternative – use trailing dots: A name ending with a dot is always treated as absolute, bypassing search domain expansion entirely:
```
# In application config, use:
api.example.com.   # trailing dot = FQDN, no search domain appended
```

## CoreDNS
CoreDNS runs as a Deployment in `kube-system`. Check its health:

```sh
# Are the pods running?
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns

# Is the service reachable?
kubectl get svc kube-dns -n kube-system
```

The CoreDNS configuration is stored in a ConfigMap:
```sh
kubectl get configmap coredns -n kube-system -o yaml
```

A typical Corefile looks like:
```
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf {
        max_concurrent 1000
    }
    cache 30
    loop
    reload
    loadbalance
}
```

The `kubernetes` plugin handles `cluster.local` lookups. The `forward` plugin sends everything else to the node's upstream DNS servers.
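To route a specific zone somewhere other than the default upstream (for example an internal corporate domain), the usual approach is an extra server block with its own `forward` directive. A sketch, where `corp.example.com` and `10.0.0.53` are placeholders:

```
# Added alongside the .:53 block in the coredns ConfigMap
# (kubectl -n kube-system edit configmap coredns). The reload plugin
# shown above picks up the change once the ConfigMap propagates.
corp.example.com:53 {
    errors
    cache 30
    forward . 10.0.0.53
}
```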
## CoreDNS Crashlooping
If CoreDNS is crashlooping, check the logs first. Common causes:
- Loop detection. If CoreDNS forwards to itself (the node's `/etc/resolv.conf` points to `127.0.0.53` on systemd-resolved systems), the `loop` plugin detects it and kills the pod. Fix by configuring the `forward` plugin to point to a real upstream DNS server instead of `/etc/resolv.conf`:

  ```
  forward . 8.8.8.8 8.8.4.4
  ```

- Resource limits. Under high query load, CoreDNS can OOM. Check `kubectl describe pod` for `OOMKilled`. Increase memory limits in the CoreDNS Deployment (see the sketch after this list).
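For the resource-limit case, one quick way to raise the memory limit is `kubectl set resources` (a sketch; the Deployment is named `coredns` in most installations, and `256Mi` is an arbitrary starting point):

```sh
kubectl -n kube-system set resources deployment coredns --limits=memory=256Mi
```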
## Debugging DNS from Inside a Pod
Quick one-off test:
```sh
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- nslookup kubernetes.default
```

If this returns the ClusterIP of the `kubernetes` API Service, DNS is working. If it times out, CoreDNS is either down or unreachable.
Detailed debugging with a long-lived pod:
```sh
kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/agnhost:2.39 \
  --restart=Never --command -- sleep 3600

kubectl exec -it dnsutils -- nslookup kubernetes.default
kubectl exec -it dnsutils -- dig +search my-service.production.svc.cluster.local
kubectl exec -it dnsutils -- cat /etc/resolv.conf
```

The agnhost image includes `nslookup`, `dig`, and other network tools. You can also `kubectl exec` into an existing application pod if it has DNS tools available.
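Even when the application image has no DNS tools, `getent hosts` is available in most glibc-based images and goes through the same libc resolver path (and `resolv.conf`) the application itself uses. A sketch, assuming a pod named `my-app`:

```sh
# getent uses the libc resolver, so it sees the same search domains
# and ndots behaviour as the application, unlike dig which resolves on its own.
kubectl exec -it my-app -- getent hosts my-service.production
```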
## DNS Policy and Custom Configuration
The `dnsPolicy` field controls how `/etc/resolv.conf` is populated:

- `ClusterFirst` (default): Uses CoreDNS. Cluster names resolve. External names forward through CoreDNS.
- `Default`: Uses the node's DNS directly, bypassing CoreDNS. Cluster service names will NOT resolve.
- `ClusterFirstWithHostNet`: Same as `ClusterFirst`, but for pods using `hostNetwork: true` (see the example after this list).
- `None`: Completely custom. You must provide all DNS config via `dnsConfig`.
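For example, a pod on the host network that still needs to resolve cluster Services must opt in explicitly (a minimal sketch; names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-agent
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet  # without this, the pod inherits the node's resolv.conf
  containers:
    - name: agent
      image: busybox:1.36
      command: ["sleep", "3600"]
```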
A common mistake: setting `dnsPolicy: Default` thinking it means "the default behavior." It does not. The actual default is `ClusterFirst`.
To add custom nameservers or tune DNS without overriding cluster DNS:
```yaml
spec:
  dnsPolicy: ClusterFirst
  dnsConfig:
    nameservers:
      - 10.0.0.53        # additional nameserver
    searches:
      - mycompany.local  # additional search domain
    options:
      - name: ndots
        value: "2"
```

With `dnsPolicy: ClusterFirst`, the `dnsConfig` entries are merged with the cluster DNS settings rather than replacing them.
## Quick DNS Debugging Checklist
- Check `/etc/resolv.conf` in the pod. Is the nameserver correct? Are search domains present?
- Test cluster DNS: `nslookup kubernetes.default`. If this fails, CoreDNS is the problem.
- Test external DNS: `nslookup google.com`. If this fails but cluster DNS works, the CoreDNS `forward` plugin is misconfigured.
- Check CoreDNS pods: `kubectl get pods -n kube-system -l k8s-app=kube-dns`. Are they Running?
- Check CoreDNS logs: `kubectl logs -n kube-system -l k8s-app=kube-dns`. Look for loop detection or upstream errors.
- Check `dnsPolicy`: Is it accidentally set to `Default` instead of `ClusterFirst`?
- Check network policies: Is egress to `kube-system` on port 53 allowed? (An example policy follows.)
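If a default-deny egress policy is in place, DNS needs an explicit allowance. A minimal sketch (the namespace and policy names are assumptions):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production        # assumed application namespace
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```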