Zero Trust Architecture#
Zero trust means no implicit trust. A request from inside the corporate network is treated with the same suspicion as a request from the public internet. Every request must prove who it is, what it is allowed to do, and that it is coming from a healthy device or service — regardless of network location.
This is not a product you buy. It is an architectural approach that requires changes to authentication, authorization, network design, and monitoring.
Core Principles#
1. Verify Explicitly#
Every request is authenticated and authorized based on all available data: identity, device health, location, resource sensitivity, and behavioral patterns. There is no “trusted zone” where requests skip verification.
In practice this means:
- Every API call carries credentials (tokens, certificates, or both).
- Every service validates those credentials before processing the request.
- Network position does not grant access. Being inside the VPN or the same Kubernetes cluster does not bypass authentication.
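A minimal sketch of the rule above, assuming a hypothetical HMAC-signed token format and a demo shared secret (neither comes from any real product). The point is only that the check runs on every request and never consults the caller's network address:

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical demo key. A real deployment would verify tokens issued by an
# identity provider or rely on mTLS certificates instead of a shared secret.
SECRET = b"demo-only-secret"


def _sign(body: str) -> str:
    """Hex HMAC-SHA256 signature over the encoded claims."""
    return hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()


def issue_token(subject: str, ttl_seconds: float, now: Optional[float] = None) -> str:
    """Return '<base64 claims>.<hex signature>' carrying an expiry claim."""
    now = time.time() if now is None else now
    claims = json.dumps({"sub": subject, "exp": now + ttl_seconds})
    body = base64.urlsafe_b64encode(claims.encode()).decode()
    return f"{body}.{_sign(body)}"


def authenticate(token: Optional[str], source_ip: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the caller's subject, or None if the credential is absent or invalid.

    source_ip is accepted but never consulted: network position grants nothing.
    """
    if not token or "." not in token:
        return None
    body, signature = token.rsplit(".", 1)
    if not hmac.compare_digest(signature.encode(), _sign(body).encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    now = time.time() if now is None else now
    return claims["sub"] if now < claims["exp"] else None
```

A request from a private address with no token is rejected, and a request from a public address with a valid token is accepted. The source address plays no part in either outcome.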
2. Least Privilege Access#
Grant the minimum permissions needed for the task. Permissions are scoped by:
- Identity — Which user or service.
- Resource — Which specific data or endpoint.
- Action — Read, write, delete, admin.
- Time — Short-lived tokens, just-in-time access.
- Context — Device posture, location, risk score.
Overly broad permissions are a common enabler of privilege escalation. An application with read access to every database table hands that same access to any attacker who compromises it, including data the application never needed.
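The five scoping dimensions can be sketched as a single grant record. The `Grant` type, its field names, and the 0 to 100 risk scale are illustrative assumptions, not part of any specific authorization system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    identity: str        # which user or service
    resource: str        # which specific data or endpoint
    actions: frozenset   # e.g. frozenset({"read"})
    expires_at: float    # epoch seconds; grants are short-lived by design
    max_risk: int        # highest acceptable risk score (hypothetical 0-100 scale)


def is_allowed(grants, identity, resource, action, now, risk_score) -> bool:
    """Allow only if a single grant matches on all five dimensions."""
    return any(
        g.identity == identity
        and g.resource == resource
        and action in g.actions
        and now < g.expires_at
        and risk_score <= g.max_risk
        for g in grants
    )


GRANTS = [
    Grant("order-processor", "db/orders", frozenset({"read", "write"}), 2000.0, 50),
]
```

Anything not covered by a grant is denied, so a service that can write to `db/orders` still cannot read any other table.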
3. Assume Breach#
Design systems assuming an attacker is already inside the network. This changes how you architect:
- Encrypt all internal traffic (mTLS between services).
- Segment networks so a compromised service cannot reach everything.
- Log everything for forensic analysis.
- Detect lateral movement by monitoring unusual access patterns.
- Limit blast radius by isolating services and data stores.
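One simple way to look for lateral movement is to compare observed service-to-service calls against a known baseline. This is a rough sketch: the service names and the shape of the log records are assumptions, and real detection would also weigh time, volume, and identity.

```python
def unexpected_edges(baseline, observed):
    """Return observed (source, destination) pairs that are absent from the baseline."""
    return sorted(set(observed) - set(baseline))


# Hypothetical dependency map built while observing normal traffic.
BASELINE = {
    ("order-processor", "payment-api"),
    ("payment-api", "postgres"),
}

# Hypothetical pairs extracted from access logs.
OBSERVED = [
    ("order-processor", "payment-api"),
    ("payment-api", "postgres"),
    ("order-processor", "postgres"),  # never seen before: worth an alert
]
```

Calling `unexpected_edges(BASELINE, OBSERVED)` flags only the new pair, which is the kind of signal that warrants investigation when a compromised service starts probing its neighbors.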
From Perimeter Security to Zero Trust#
The Perimeter Model (What We Are Replacing)#
Internet ──→ Firewall ──→ Internal Network (trusted)
                          ├── Service A ──→ Service B (no auth)
                          ├── Database (accessible from internal)
                          └── Admin panel (accessible from VPN)
Once past the firewall, everything trusts everything. An attacker who compromises one service has lateral access to all internal services and databases.
The Zero Trust Model#
Internet ──→ Identity-Aware Proxy ──→ Service A
                                      │
                                      ├── mTLS + authz ──→ Service B
                                      ├── mTLS + authz ──→ Database (scoped access)
                                      └── Denied ──→ Admin panel (wrong identity)
Internal Network:
  Service C ──→ mTLS + authz ──→ Service D
      │
      └── Denied ──→ Service E (no policy allows it)
Every connection requires identity proof and authorization, whether it originates from the internet or from the next pod in the same namespace.
Identity: The Foundation#
Zero trust replaces network location with identity as the access control primitive. Identity must be strong, verifiable, and tied to the specific entity making the request.
Service Identity with SPIFFE#
SPIFFE (Secure Production Identity Framework for Everyone) provides a standard for service identity:
spiffe://example.com/service/payment-api
spiffe://example.com/service/order-processor
spiffe://example.com/service/user-service
Each service gets a SPIFFE ID and a short-lived X.509 certificate (SVID) that proves it. SPIRE (the SPIFFE Runtime Environment) automates certificate issuance and rotation.
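A SPIFFE ID is a URI made of a trust domain and a workload path. The sketch below splits one with the standard library. The validation is deliberately minimal and does not implement every rule in the SPIFFE specification; `parse_spiffe_id` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id: str) -> tuple:
    """Split a SPIFFE ID into (trust_domain, path), rejecting obviously invalid input."""
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    if not parts.netloc:
        raise ValueError("missing trust domain")
    if parts.query or parts.fragment:
        raise ValueError("query and fragment components are not allowed")
    return parts.netloc, parts.path
```

A verifier would typically check the trust domain first, then match the path against the identities a policy allows.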
# SPIRE Controller Manager resource: assigns the payment-api SPIFFE ID to pods labeled app=payment-api
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: payment-api
spec:
  spiffeIDTemplate: "spiffe://example.com/service/payment-api"
  podSelector:
    matchLabels:
      app: payment-api
  dnsNameTemplates:
    - "payment-api.{{ .PodMeta.Namespace }}.svc.cluster.local"
Service Mesh Identity#
Istio and Linkerd implement service identity using mTLS certificates issued per-pod:
Istio identity: spiffe://cluster.local/ns/my-app/sa/payment-api
The service mesh handles certificate issuance, rotation, and mTLS negotiation automatically. Authorization policies reference these identities:
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payment-api-policy
  namespace: my-app
spec:
  selector:
    matchLabels:
      app: payment-api
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/my-app/sa/order-processor"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/api/v1/payments"]
Only order-processor can call payment-api, and only the POST method on the payments endpoint. Everything else is denied.
User Identity#
For human users, zero trust requires strong authentication:
- Single Sign-On (SSO) with a central identity provider (Okta, Google Workspace, or Microsoft Entra ID, formerly Azure AD).
- Multi-Factor Authentication (MFA) on all access to sensitive resources.
- Short-lived sessions with re-authentication for elevated operations.
- Device posture checks — is the device managed, encrypted, up to date?
Microsegmentation#
Traditional networks use VLANs and subnets for segmentation. Microsegmentation creates fine-grained boundaries around individual services or workloads.
Kubernetes Network Policies#
Network policies are the simplest form of microsegmentation in Kubernetes:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-api
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: payment-api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: order-processor
      ports:
        - port: 8080
          protocol: TCP
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
          protocol: TCP
    - to:  # DNS
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
This policy says: payment-api can only receive traffic from order-processor on port 8080, and can only send traffic to postgres on port 5432 (plus DNS). All other traffic is blocked.
Default Deny#
The first step in microsegmentation is denying all traffic by default:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: my-app
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Then explicitly allow only the traffic that should exist. This is the network equivalent of least privilege.
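The default-deny-plus-allowlist logic can be modeled in a few lines. This is a toy model for reasoning about intended flows, not a reimplementation of how a CNI plugin evaluates NetworkPolicy objects. The `Flow` type is a hypothetical name, and the workload labels mirror the examples above.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src: str              # source workload label
    dst: str              # destination workload label
    port: int
    protocol: str = "TCP"


# Explicit allowlist: the only traffic that should exist.
ALLOWED_FLOWS = {
    Flow("order-processor", "payment-api", 8080),
    Flow("payment-api", "postgres", 5432),
}


def is_flow_allowed(flow: Flow, allowed=ALLOWED_FLOWS) -> bool:
    """Default deny: a flow passes only if it is explicitly listed."""
    return flow in allowed
```

Keeping such an allowlist next to the YAML makes it easy to review whether each policy matches a dependency someone actually intended.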
Service Mesh Segmentation#
Network policies operate at L3/L4 (IP and port). Service mesh policies operate at L7 (HTTP method, path, headers):
# Istio: allow only GET /api/v1/products from frontend
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: product-api-policy
spec:
  selector:
    matchLabels:
      app: product-api
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/my-app/sa/frontend"]
      to:
        - operation:
            methods: ["GET"]
            paths: ["/api/v1/products", "/api/v1/products/*"]
L7 policies are more precise. A network policy can allow traffic from frontend to product-api on port 8080. A service mesh policy can further restrict it to GET requests on specific paths.
Use both layers. Network policies prevent unauthorized network access. Service mesh policies enforce application-level authorization.
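The layering can be sketched as two checks in sequence: an L4 allowlist, then an L7 rule. The path matcher follows the exact, prefix (`/x/*`), suffix (`*/x`), and presence (`*`) styles that Istio documents for string fields. Everything else here, including the function and constant names, is an illustrative assumption.

```python
def path_matches(pattern: str, path: str) -> bool:
    """Exact, prefix ('/x/*'), suffix ('*/x'), or presence ('*') match."""
    if pattern == "*":
        return bool(path)
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return path.endswith(pattern[1:])
    return path == pattern


# Layer 1: which workloads may connect, and on which port.
L4_ALLOWED = {("frontend", "product-api", 8080)}

# Layer 2: what an authenticated caller may do once connected.
L7_RULE = {
    "principals": {"cluster.local/ns/my-app/sa/frontend"},
    "methods": {"GET"},
    "paths": ["/api/v1/products", "/api/v1/products/*"],
}


def request_allowed(src, dst, port, principal, method, path) -> bool:
    """A request must pass the network check first, then the application check."""
    if (src, dst, port) not in L4_ALLOWED:
        return False
    return (
        principal in L7_RULE["principals"]
        and method in L7_RULE["methods"]
        and any(path_matches(p, path) for p in L7_RULE["paths"])
    )
```

A connection that passes the L4 check can still be refused at L7, for example a POST from frontend to the products endpoint.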
Policy Enforcement#
Open Policy Agent (OPA)#
OPA is a general-purpose policy engine that evaluates access decisions using Rego policies:
# policy.rego
package authz

# Rego v1 syntax (the default from OPA 1.0): rule bodies require the `if` keyword
import rego.v1

default allow := false

# Allow if the caller's service identity matches an allowed caller
allow if {
    input.source.principal == "cluster.local/ns/my-app/sa/order-processor"
    input.destination.service == "payment-api"
    input.request.method == "POST"
}

# Allow health checks from anywhere
allow if {
    input.request.path == "/healthz"
    input.request.method == "GET"
}
OPA integrates with Envoy (as an external authorizer), Kubernetes (as an admission controller via Gatekeeper), and application code (as a library or sidecar).
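When OPA runs as a sidecar, application code can ask it for a decision over its REST Data API. The sketch below assumes OPA is reachable at a base URL such as `http://localhost:8181` and that the policy lives in package `authz` with a rule named `allow`. The helper names are hypothetical. It fails closed: any error or undefined result is treated as a denial.

```python
import json
import urllib.request


def opa_query_url(base_url: str, package: str = "authz", rule: str = "allow") -> str:
    """Build the Data API URL for a rule, e.g. <base>/v1/data/authz/allow."""
    return f"{base_url.rstrip('/')}/v1/data/{package}/{rule}"


def decision_from_response(body: dict) -> bool:
    """Fail closed: a missing or non-true 'result' means deny."""
    return body.get("result") is True


def check(base_url: str, input_doc: dict, timeout: float = 2.0) -> bool:
    """POST the input document to OPA and return the allow/deny decision."""
    request = urllib.request.Request(
        opa_query_url(base_url),
        data=json.dumps({"input": input_doc}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return decision_from_response(json.load(response))
    except (OSError, ValueError):
        # Unreachable OPA or malformed JSON: deny rather than guess.
        return False
```

Treating an unreachable policy engine as a denial trades availability for safety. Some teams prefer a cached last-known decision, which is a design choice this sketch leaves out.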
Policy as Code#
Store policies in Git alongside application code:
policies/
  authz/
    payment-api.rego
    order-processor.rego
  network/
    default-deny.yaml
    payment-api.yaml
  admission/
    pod-security.rego
Review policies in pull requests. Test policies in CI. Deploy policies through the same GitOps pipeline as applications. This makes security changes auditable and reversible.
BeyondCorp: Zero Trust for User Access#
Google’s BeyondCorp model replaces VPNs with identity-aware proxies:
Traditional:
  User → VPN → Internal network → Application (trusted because VPN)
BeyondCorp:
  User → Identity-Aware Proxy → Application
              │
              ├── Verify user identity (SSO + MFA)
              ├── Check device posture (managed? encrypted? patched?)
              ├── Evaluate access policy (role + resource + context)
              └── Allow or deny per-request
Identity-Aware Proxy Implementation#
User IAP / Access Proxy Application
│ │ │
│── Request ──────────────→│ │
│ │── Check auth cookie ─────────│
│ │ (redirect to IDP if none) │
│ │ │
│ │── Evaluate policy ───────────│
│ │ (user + device + resource) │
│ │ │
│ │── Forward with identity ────→│
│ │ (X-Forwarded-User, etc.) │
│ │ │
│←── Response ─────────────│←─────────────────────────────│
Products implementing this pattern: Google IAP, Cloudflare Access, Pomerium, Ory Oathkeeper, OAuth2 Proxy.
Example with Cloudflare Access:
# Protect an internal application
# (illustrative outline of the policy, not literal Cloudflare configuration syntax)
Application:
  name: Internal Dashboard
  domain: dashboard.example.com
  type: self_hosted
Policy:
  name: engineering-team
  decision: allow
  include:
    - email_domain: example.com
  require:
    - group: engineering
    - device_posture:
        - disk_encryption: true
        - os_version_min: "14.0"
Users access dashboard.example.com directly, with no VPN. Cloudflare Access verifies their identity, group membership, and device posture before forwarding the request.
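The per-request decision an identity-aware proxy makes can be sketched as a pure function over user and device attributes. The field names mirror the example policy above. The dictionary shapes and helper names are assumptions for illustration, not any vendor's API.

```python
def version_tuple(version: str, width: int = 3) -> tuple:
    """Turn '14.0' into (14, 0, 0) so versions compare numerically, not as strings."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


POLICY = {
    "email_domain": "example.com",
    "group": "engineering",
    "os_version_min": "14.0",
}


def evaluate_access(user: dict, device: dict, policy: dict = POLICY) -> bool:
    """Allow only if identity, group membership, and device posture all pass."""
    domain_ok = user["email"].lower().endswith("@" + policy["email_domain"])
    group_ok = policy["group"] in user["groups"]
    posture_ok = (
        device["disk_encryption"]
        and version_tuple(device["os_version"]) >= version_tuple(policy["os_version_min"])
    )
    return domain_ok and group_ok and posture_ok
```

Comparing versions as tuples matters: a plain string comparison would rank "9.5" above "14.0".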
Implementation Steps#
Moving to zero trust is incremental. Do not attempt a big-bang migration.
Phase 1: Visibility#
Before enforcing policies, understand your current traffic patterns:
- Deploy a service mesh in permissive mode (Istio with PERMISSIVE mTLS).
- Enable access logging on all services.
- Map service dependencies: which services talk to which.
- Identify sensitive data flows: where PII, financial data, and credentials move.
Phase 2: Identity#
Establish strong identity for all entities:
- Enable mTLS in the service mesh (switch to STRICT mode).
- Implement SSO + MFA for all user-facing access.
- Replace VPN-only access with an identity-aware proxy for internal tools.
- Issue service accounts and API keys with specific scopes instead of shared credentials.
Phase 3: Segmentation#
Restrict traffic to only what is needed:
- Apply default-deny network policies in all namespaces.
- Create explicit allow policies for each service’s required communications.
- Add service mesh authorization policies for L7 control.
- Restrict database access to specific services (not the entire namespace).
Phase 4: Continuous Verification#
Move from point-in-time checks to continuous evaluation:
- Monitor for anomalous access patterns (unusual source, time, volume).
- Re-evaluate access decisions on context changes (device posture degrades, location changes).
- Implement just-in-time access for privileged operations (admin access granted for 1 hour, then revoked).
- Automate policy testing and deployment through CI/CD.
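Just-in-time access and continuous re-evaluation can be combined in one check that runs on every privileged request rather than once at login. The `JitGrant` type and the one-hour default are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JitGrant:
    user: str
    role: str
    granted_at: float            # epoch seconds when access was approved
    ttl_seconds: float = 3600.0  # access lapses after one hour

    def is_active(self, now: float, device_compliant: bool) -> bool:
        """Re-evaluated per request: expiry or degraded posture ends access."""
        return device_compliant and now < self.granted_at + self.ttl_seconds


grant = JitGrant(user="dana", role="admin", granted_at=10_000.0)
```

Because the grant expires on its own, nobody has to remember to revoke it, and a device that falls out of compliance loses access before the hour is up.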
Common Mistakes#
- Treating zero trust as a product purchase. No single vendor delivers zero trust. It is an architecture that requires changes across identity, network, application, and monitoring layers.
- Deploying microsegmentation without understanding traffic flows first. Default-deny in a namespace you do not understand breaks applications. Observe first, then enforce.
- Relying on network policies alone. Network policies enforce IP and port rules. They do not authenticate callers or authorize actions. Combine with service mesh policies and application-level auth.
- Exempting internal services from authentication. The most common zero trust failure is “but that service is internal.” Internal services are exactly where lateral movement happens after an initial compromise.
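- Not testing policies before enforcement. Deploy policies in audit/dry-run mode first. Istio’s PERMISSIVE mTLS mode shows which clients still connect in plaintext, while Istio’s experimental dry-run annotation for authorization policies (istio.io/dry-run) and OPA’s decision logging let you see what would be blocked without actually blocking it.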