Scenario: Debugging Kubernetes Network Connectivity End-to-End#
The report comes in as it always does: “my application can’t reach another service.” This is one of the most common and most frustrating categories of Kubernetes issues because the networking stack has multiple layers, and the symptom (timeout, connection refused, 502) tells you almost nothing about which layer is broken.
This scenario walks through a systematic diagnostic process, starting from the symptom and narrowing down to the root cause. Follow these steps in order. Each step either identifies the problem or eliminates a layer from the investigation.
Step 1 – Clarify the Symptom#
Before touching kubectl, get precise answers to three questions:
What is the exact error?
- “Connection refused” – something is listening on the wrong port, or nothing is listening at all
- “Connection timed out” – packets are being dropped (firewall, network policy, or routing issue)
- “Name resolution failed” or “could not resolve host” – DNS problem
- “HTTP 502 Bad Gateway” or “503 Service Unavailable” – the proxy (ingress controller or service mesh) cannot reach the backend
What is the source and destination?
- Source: which pod (name, namespace, node) is making the request?
- Destination: which Service name, namespace, and port is it trying to reach?
- Is this pod-to-Service, pod-to-pod, or external-to-ingress?
What changed?
- Was this working before? If yes: what was deployed, restarted, or reconfigured since it last worked?
- New network policy? New deployment? Node replacement? Cluster upgrade?
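If nobody can answer that for certain, the cluster usually can. A quick sketch of where to look; the Deployment name is a placeholder for whatever workload backs the destination Service:
# Recent events in the destination namespace, oldest first
kubectl get events -n <target-namespace> --sort-by=.metadata.creationTimestamp
# Rollout history of the backing workload (assumes it is a Deployment)
kubectl rollout history deployment/<deployment-name> -n <target-namespace>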
Step 2 – DNS Resolution#
DNS is the most common source of connectivity failures. Test it first.
Exec into the source pod and run a DNS lookup:
kubectl exec -it <source-pod> -n <source-namespace> -- nslookup <service-name>.<target-namespace>.svc.cluster.local
If the pod does not have nslookup, try dig, host, or getent hosts:
kubectl exec -it <source-pod> -n <source-namespace> -- getent hosts <service-name>.<target-namespace>.svc.cluster.local
If the pod has no DNS tools at all, use an ephemeral debug container:
kubectl debug -it <source-pod> -n <source-namespace> --image=busybox:1.36 --target=<container-name> -- nslookup <service-name>.<target-namespace>.svc.cluster.local
If DNS resolution fails:
Check that CoreDNS is running:
kubectl get pods -n kube-system -l k8s-app=kube-dns
If CoreDNS pods are not Running/Ready, check their logs:
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50Common CoreDNS failures: OOMKilled (increase memory limits), CrashLoopBackOff (check Corefile for syntax errors), or stuck in Pending (node resource pressure).
Verify the Service actually exists:
kubectl get svc <service-name> -n <target-namespace>
If the Service does not exist, that is your answer. DNS cannot resolve a name that has no corresponding Service object.
Check the pod’s /etc/resolv.conf to confirm it points to the CoreDNS ClusterIP:
kubectl exec <source-pod> -n <source-namespace> -- cat /etc/resolv.conf
The nameserver should match the ClusterIP of the kube-dns Service (typically 10.96.0.10 or similar).
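To get the value to compare against, read the ClusterIP off the kube-dns Service (the Service keeps that name even on clusters that run CoreDNS):
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'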
If DNS resolution succeeds, note the ClusterIP returned and proceed to Step 3.
Step 3 – Service to Endpoints#
A Service routes traffic to pods via Endpoints. If the Service has no endpoints, traffic goes nowhere.
kubectl get endpoints <service-name> -n <target-namespace>
If the Endpoints list is empty, the Service selector does not match any Ready pods. Compare the selector with actual pod labels:
# Check what the Service selector is
kubectl describe svc <service-name> -n <target-namespace> | grep Selector
# List pods matching that selector
kubectl get pods -n <target-namespace> -l <selector-key>=<selector-value>
Common causes of empty endpoints:
- Selector mismatch: the Service selector says app: my-service but the pods have app: myservice (missing hyphen)
- No Ready pods: pods exist but their readiness probes are failing, so they are not added to the Endpoints
- Wrong namespace: the Service and the pods are in different namespaces (Services only select pods in the same namespace)
Check pod readiness:
kubectl get pods -n <target-namespace> -l <selector-key>=<selector-value> -o wide
If pods show 0/1 Ready, check their readiness probe:
kubectl describe pod <dest-pod> -n <target-namespace> | grep -A 10 "Readiness"
If endpoints exist, note the pod IPs and ports listed. These are the actual backends receiving traffic.
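When pods are stuck at 0/1 Ready, the kubelet also records every probe failure as an event, which is often quicker to scan than the full describe output. A sketch; Unhealthy is the standard event reason for failed probes:
kubectl get events -n <target-namespace> --field-selector involvedObject.name=<dest-pod>,reason=Unhealthy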
Step 4 – Direct Pod-to-Pod Connectivity#
Bypass the Service layer entirely and test direct connectivity from the source pod to a destination pod IP.
# Get destination pod IP from endpoints
kubectl get endpoints <service-name> -n <target-namespace> -o jsonpath='{.subsets[0].addresses[0].ip}'
# Test direct connectivity
kubectl exec -it <source-pod> -n <source-namespace> -- curl -v http://<dest-pod-ip>:<container-port>/healthz
If curl is not available:
kubectl exec -it <source-pod> -n <source-namespace> -- wget -qO- http://<dest-pod-ip>:<container-port>/healthz
If no HTTP tools are available, test TCP connectivity:
kubectl exec -it <source-pod> -n <source-namespace> -- bash -c "echo > /dev/tcp/<dest-pod-ip>/<container-port>"
(The /dev/tcp redirection is a bash feature; it will not work in minimal shells such as busybox sh.)
If direct connectivity fails: the problem is at the network level. Proceed to Step 5 (Network Policies) and Step 7 (Node-level).
If direct connectivity works: the problem is in the Service layer. Proceed to Step 6 (Service port mapping).
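To double-check that the break really is in the Service layer, run the same probe against the Service name from the same source pod and compare with the direct pod result (a sketch; the /healthz path and <service-port> are placeholders):
kubectl exec -it <source-pod> -n <source-namespace> -- curl -sv http://<service-name>.<target-namespace>.svc.cluster.local:<service-port>/healthz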
Step 5 – Network Policies#
Network policies are the most common cause of “it was working and now it’s not” scenarios, especially after a team adds a default-deny policy.
Check for network policies in both the source and destination namespaces:
kubectl get networkpolicies -n <source-namespace>
kubectl get networkpolicies -n <target-namespace>
If network policies exist, inspect them:
kubectl describe networkpolicy <policy-name> -n <target-namespace>
A default-deny ingress policy blocks all incoming traffic to pods in the namespace unless an explicit allow rule exists:
# This policy blocks EVERYTHING inbound
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: target-namespace
spec:
  podSelector: {}
  policyTypes:
  - Ingress
To allow traffic from the source namespace, add an ingress rule:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-source
  namespace: target-namespace
spec:
  podSelector:
    matchLabels:
      app: my-service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: source-namespace
    - podSelector:
        matchLabels:
          app: source-app
    ports:
    - protocol: TCP
      port: 8080
Common network policy mistakes:
- Forgetting DNS: A default-deny egress policy blocks DNS (UDP port 53) unless explicitly allowed. The symptom is DNS resolution failures, which look identical to CoreDNS being down.
# Allow DNS egress -- required if you have default-deny egress
ingress: []
egress:
- to:
  - namespaceSelector: {}
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
- AND vs OR in selectors: In a single from rule, namespaceSelector and podSelector at the same level are ANDed (must match both). Separate from entries are ORed (match either). This is a frequent source of confusion.
- Wrong port: The policy allows TCP 80 but the container listens on 8080.
- Missing label on namespace: The namespaceSelector requires a label on the source namespace. Kubernetes 1.21+ automatically adds kubernetes.io/metadata.name, but custom labels must be added manually (a quick check is shown below).
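To confirm the source namespace actually carries the label your policy selects on, list its labels (a minimal check, no assumptions beyond the namespace name):
kubectl get namespace <source-namespace> --show-labels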
Step 6 – Service Port Mapping#
If direct pod connectivity works but Service-based access fails, the port mapping is wrong.
kubectl describe svc <service-name> -n <target-namespace>
Check these fields:
- Port: the port the Service listens on (what clients connect to)
- TargetPort: the port on the pod where traffic is forwarded (must match what the container is actually listening on)
- Endpoints: should show pod IPs with the target port
A common misconfiguration:
# The Service exposes port 80 but forwards to port 80 on the pod
# However, the container actually listens on port 8080
spec:
  ports:
  - port: 80
    targetPort: 80   # Should be 8080
Verify what the container is actually listening on:
kubectl exec <dest-pod> -n <target-namespace> -- ss -tlnp
# or
kubectl exec <dest-pod> -n <target-namespace> -- netstat -tlnp
The output shows which ports have processes listening. If the process listens on 8080 but the Service targetPort is 80, fix the Service definition.
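A quick way to put the two values side by side is jsonpath. Note that containerPort in the pod spec is purely informational, so the ss/netstat output above remains the source of truth:
# What the Service forwards to
kubectl get svc <service-name> -n <target-namespace> -o jsonpath='{.spec.ports[0].targetPort}'
# What the first container declares (the process must actually listen there)
kubectl get pod <dest-pod> -n <target-namespace> -o jsonpath='{.spec.containers[0].ports[0].containerPort}'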
Also check if the pod is binding to 0.0.0.0 (all interfaces) versus 127.0.0.1 (localhost only). A container that binds to localhost will reject connections from the Kubernetes network:
kubectl exec <dest-pod> -n <target-namespace> -- ss -tlnp | grep 8080
# Good: *:8080 or 0.0.0.0:8080
# Bad: 127.0.0.1:8080
Step 7 – Node-Level Networking#
If pod-to-pod connectivity fails and network policies are not the cause, the problem is in the cluster networking layer.
Same-node connectivity#
If both pods are on the same node and still cannot communicate, the CNI plugin on that node likely has a problem:
# Check which nodes the pods are on
kubectl get pods -o wide -n <source-namespace> <source-pod>
kubectl get pods -o wide -n <target-namespace> <dest-pod>
Cross-node connectivity#
If pods on different nodes cannot communicate, check:
CNI plugin health:
# For Calico
kubectl get pods -n kube-system -l k8s-app=calico-node
# For Cilium
kubectl get pods -n kube-system -l k8s-app=cilium
# For Flannel
kubectl get pods -n kube-system -l app=flannel
If CNI pods are not Running, check their logs for errors.
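For example, with Calico (substitute the label for your CNI from the commands above):
kubectl logs -n kube-system -l k8s-app=calico-node --tail=50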
kube-proxy:
kube-proxy maintains iptables or IPVS rules that implement Service routing. If kube-proxy is down, Services do not work even though direct pod-to-pod connectivity may work.
kubectl get pods -n kube-system -l k8s-app=kube-proxy
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20
Check that iptables rules for the Service exist:
# On the node (via SSH or debug pod)
iptables -t nat -L KUBE-SERVICES | grep <service-name>
If the Service is missing from iptables, kube-proxy is not syncing rules for that Service. Check kube-proxy logs for errors.
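If kube-proxy runs in IPVS mode rather than iptables, the equivalent check uses ipvsadm (assuming it is installed on the node); the Service ClusterIP should appear as a virtual server with the pod IPs behind it:
# On the node (via SSH or debug pod)
ipvsadm -Ln | grep -A 3 <cluster-ip>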
Step 8 – Ingress-Level (External Traffic)#
If the connectivity issue is from external clients to the cluster, the problem may be in the Ingress layer.
Check the Ingress resource:
kubectl describe ingress <ingress-name> -n <namespace>
Verify the backend Service and port are correct:
kubectl get ingress <ingress-name> -n <namespace> -o jsonpath='{.spec.rules[0].http.paths[0].backend}'
Check the ingress controller logs:
# For nginx ingress controller
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=50
Look for upstream connection errors like connect() failed (111: Connection refused) or upstream timed out.
Test TLS:
curl -v https://<hostname>
# Check certificate subject, issuer, and expiry
If the certificate is expired or does not match the hostname, the TLS handshake fails before the request reaches the application.
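You can also inspect the serving certificate straight from the TLS Secret the Ingress references; <tls-secret> here is a placeholder for whatever secretName appears in the Ingress spec:
kubectl get secret <tls-secret> -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -issuer -dates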
Check cloud load balancer health checks:
# AWS
aws elbv2 describe-target-health --target-group-arn <arn>
# GCP
gcloud compute backend-services get-health <backend-service> --global
If health checks are failing, the load balancer stops sending traffic to the cluster even though everything inside the cluster is working.
Decision Tree Summary#
Connectivity failure
|
+-- DNS lookup fails?
| +-- CoreDNS running? No -> fix CoreDNS
| +-- Service exists? No -> create Service
| +-- resolv.conf correct? No -> check dnsPolicy in pod spec
|
+-- DNS works, Service has no Endpoints?
| +-- Selector matches pods? No -> fix selector
| +-- Pods ready? No -> fix readiness probes
|
+-- Endpoints exist, direct pod-to-pod fails?
| +-- Network policy blocking? Yes -> add allow rule
| +-- CNI plugin healthy? No -> fix CNI
| +-- Cross-node? Check kube-proxy and node routes
|
+-- Direct pod-to-pod works, Service access fails?
| +-- TargetPort matches container port? No -> fix Service
| +-- Container binding to 0.0.0.0? No -> fix app bind address
|
+-- Everything internal works, external fails?
+-- Ingress backend correct? No -> fix Ingress
+-- Ingress controller healthy? No -> check controller logs
+-- TLS certificate valid? No -> renew certificate
+-- Load balancer health check passing? No -> fix health check
Quick Reference Commands#
# Full diagnostic sequence -- run these in order
# 1. DNS
kubectl exec -it <pod> -n <ns> -- nslookup <svc>.<target-ns>.svc.cluster.local
# 2. Service and Endpoints
kubectl get svc,endpoints <svc> -n <target-ns>
# 3. Pod readiness
kubectl get pods -n <target-ns> -l <selector> -o wide
# 4. Direct connectivity
kubectl exec -it <pod> -n <ns> -- curl -s -o /dev/null -w "%{http_code}" http://<pod-ip>:<port>/healthz
# 5. Network policies
kubectl get networkpolicies -n <target-ns> -o yaml
# 6. Port verification
kubectl exec <dest-pod> -n <target-ns> -- ss -tlnp
# 7. CNI and kube-proxy
kubectl get pods -n kube-system -l k8s-app=calico-node   # substitute your CNI's label (see Step 7)
kubectl get pods -n kube-system -l k8s-app=kube-proxy
# 8. Ingress
kubectl describe ingress <name> -n <ns>
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=30
Work through these steps in order. Most connectivity issues resolve at Step 2 (DNS), Step 3 (missing endpoints), or Step 5 (network policies). Steps 7 and 8 are rarely needed but essential when the simpler explanations do not apply.