GitOps for Kubernetes
GitOps is a deployment model where git is the source of truth for your cluster’s desired state. A controller running inside the cluster watches a git repository and continuously reconciles the live state to match what is declared in git. When you want to change something, you commit to git. The controller detects the change and applies it.
This replaces kubectl apply from laptops and CI pipelines with a pull-based model where the cluster pulls its own configuration. The benefits are an audit trail in git history, easy rollback via git revert, and drift detection when someone makes manual changes.
Pull vs Push
Push model (traditional CI/CD): A CI pipeline runs kubectl apply or helm upgrade after building an image. The pipeline needs cluster credentials. If the pipeline fails mid-deploy, the cluster may be in a partial state.
Pull model (GitOps): A controller inside the cluster watches git and applies changes. No external system needs cluster credentials. If the controller restarts, it simply re-reads git and reconciles. The cluster converges to the declared state regardless of transient failures.
The pull model is more secure (credentials stay in-cluster) and more resilient (self-healing on drift). The push model is simpler to start with and integrates naturally with existing CI pipelines.
ArgoCD vs Flux
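Both are CNCF projects that implement GitOps. The choice mostly comes down to whether you want a UI-driven tool that can manage many clusters from one place (ArgoCD) or a set of composable, CLI-driven controllers installed per cluster (Flux).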
ArgoCD
ArgoCD centers on the Application custom resource. Each Application points to a git repo path and a target cluster/namespace:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: order-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/company/k8s-config.git
    targetRevision: main
    path: apps/order-service/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 3
      backoff:
        duration: 5s
        factor: 2
Key characteristics:
- Web UI: ArgoCD has a rich web interface showing sync status, resource trees, diff views, and log access. This is its biggest differentiator.
- Multi-cluster from a single instance: One ArgoCD installation can manage applications across many clusters.
- Sync waves and hooks: Control the order of resource application with annotations. Apply CRDs before the resources that use them.
- Supports Helm, Kustomize, Jsonnet, and plain YAML.
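The sync waves and hooks mentioned in the list above are driven by annotations on ordinary resources. The following is a minimal sketch, not a recommended setup: the Job name, image, and `./migrate` command are hypothetical, as is the ConfigMap content. Only the `argocd.argoproj.io/*` annotation keys are ArgoCD's own.

```yaml
# Hypothetical migration Job that ArgoCD runs before applying the main manifests.
apiVersion: batch/v1
kind: Job
metadata:
  name: order-service-migrate            # illustrative name
  annotations:
    argocd.argoproj.io/hook: PreSync                       # run before the sync phase
    argocd.argoproj.io/hook-delete-policy: HookSucceeded   # clean up after success
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: company/order-service:v2.3.1   # assumed image
          command: ["./migrate"]                # assumed entrypoint
---
# Resources in lower-numbered waves are applied first.
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-service-config             # illustrative name
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # applied before the default wave 0
data:
  LOG_LEVEL: info
```

Wave values are strings containing integers; resources without the annotation default to wave 0.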
Flux
Flux uses a set of controllers, each handling a specific concern:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/company/k8s-config.git
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: order-service
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  path: ./apps/order-service/overlays/production
  prune: true
  targetNamespace: production
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: order-service
      namespace: production
Key characteristics:
- CLI-focused: Flux is managed through the flux CLI and Kubernetes manifests. No built-in UI (third-party dashboards exist).
- Per-cluster installation: Each cluster runs its own Flux instance, watching the same or different repos.
- Native Helm controller: Flux’s HelmRelease CRD manages Helm releases declaratively, including values from ConfigMaps or secrets.
- Native SOPS support: Decrypt secrets in git as part of the reconciliation pipeline.
- Composable controllers: GitRepository, Kustomization, HelmRelease, ImagePolicy – each is independent and can be combined.
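To make the HelmRelease item above concrete, here is a sketch of a release that merges values from a ConfigMap with inline values. It assumes a Flux version that serves the `helm.toolkit.fluxcd.io/v2` API (older installations use a beta version of the same kind), and the ConfigMap name `ingress-nginx-values` is hypothetical.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 1h
  url: https://kubernetes.github.io/ingress-nginx
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: flux-system
spec:
  interval: 10m
  targetNamespace: ingress-nginx
  install:
    createNamespace: true
  chart:
    spec:
      chart: ingress-nginx
      version: "4.x"                   # semver range; pin tighter if you prefer
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
  valuesFrom:
    - kind: ConfigMap
      name: ingress-nginx-values       # hypothetical ConfigMap in flux-system
      valuesKey: values.yaml
  values:                              # inline values are merged last
    controller:
      replicaCount: 2
```

Because the release is just another manifest in git, changing chart version or values follows the same commit-and-reconcile flow as everything else.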
Repository Structure
Monorepo
All Kubernetes manifests in a single repository, separated by directory:
k8s-config/
  clusters/
    staging/
      kustomization.yaml        # points to apps with staging overlays
    production/
      kustomization.yaml        # points to apps with production overlays
  apps/
    order-service/
      base/
        deployment.yaml
        service.yaml
        kustomization.yaml
      overlays/
        staging/
          kustomization.yaml    # patch: 1 replica, staging image tag
        production/
          kustomization.yaml    # patch: 5 replicas, production image tag
    payment-service/
      base/
      overlays/
  infrastructure/
    cert-manager/
    ingress-nginx/
    monitoring/
Advantages: single source of truth, easy to search across all services, atomic commits across services. Disadvantages: large repos become slow, and access control is coarse (everyone can see everything).
Polyrepo
Each team or service owns its own config repo:
team-orders/k8s-config/      # order-service, inventory-service
team-payments/k8s-config/    # payment-service, billing-service
platform/k8s-config/         # infrastructure, shared services
The GitOps controller watches multiple repositories. This scales better for large organizations with distinct team ownership but makes cross-cutting changes harder.
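With ArgoCD, one way to express that per-team ownership is an AppProject that limits which repositories and namespaces a team's Applications may use. This is a sketch under assumptions: the GitHub URL for the team repo and the `orders-*` namespace pattern are hypothetical and not taken from the layout above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-orders
  namespace: argocd
spec:
  description: Applications owned by the orders team
  sourceRepos:
    - https://github.com/team-orders/k8s-config.git   # hypothetical repo URL
  destinations:
    - server: https://kubernetes.default.svc
      namespace: orders-*                             # assumed namespace naming scheme
```

Applications that set `project: team-orders` can then only deploy from that repository into matching namespaces.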
App Repo + Config Repo
Application code lives in one repository. Kubernetes manifests live in a separate config repo. The CI pipeline in the app repo builds the image, pushes it, and then opens a PR or commits the new image tag to the config repo:
# App repo: company/order-service
src/
Dockerfile
.github/workflows/build.yml    # builds image, updates config repo
# Config repo: company/k8s-config
apps/order-service/
  base/deployment.yaml         # image tag updated by CI
This separates concerns cleanly. Application developers do not need access to cluster configuration. The config repo becomes the deployment audit trail.
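A rough sketch of what the build workflow in the app repo could look like follows. Several things here are assumptions rather than part of the setup described above: the registry host `registry.example.com`, the `CONFIG_REPO_TOKEN` secret, the bot identity, the presence of `kustomize` on the runner, and the choice to bump the tag in the staging overlay with `kustomize edit set image` instead of editing base/deployment.yaml directly.

```yaml
# .github/workflows/build.yml in company/order-service (illustrative sketch)
name: build-and-bump
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          # Assumes a registry login step has already run.
          docker build -t registry.example.com/order-service:${GITHUB_SHA} .
          docker push registry.example.com/order-service:${GITHUB_SHA}
      - name: Check out the config repo
        uses: actions/checkout@v4
        with:
          repository: company/k8s-config
          token: ${{ secrets.CONFIG_REPO_TOKEN }}   # hypothetical secret with write access
          path: k8s-config
      - name: Commit the new image tag
        working-directory: k8s-config/apps/order-service/overlays/staging
        run: |
          # Assumes kustomize is available on the runner.
          kustomize edit set image order-service=registry.example.com/order-service:${GITHUB_SHA}
          git config user.name "ci-bot"               # hypothetical bot identity
          git config user.email "ci-bot@example.com"
          git commit -am "order-service: deploy ${GITHUB_SHA}"
          git push
```

The pipeline never talks to the cluster. Its only output is a commit in the config repo, which the GitOps controller then picks up.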
Environment Promotion
Directory-Based with Kustomize
The most common pattern. A base/ directory contains the canonical manifests. Overlay directories patch per-environment values:
# apps/order-service/base/kustomization.yaml
resources:
  - deployment.yaml
  - service.yaml
# apps/order-service/overlays/staging/kustomization.yaml
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: order-service
    patch: |
      - op: replace
        path: /spec/replicas
        value: 1
images:
  - name: order-service
    newTag: "abc123-staging"
# apps/order-service/overlays/production/kustomization.yaml
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: order-service
    patch: |
      - op: replace
        path: /spec/replicas
        value: 5
images:
  - name: order-service
    newTag: "v2.3.1"
Promotion from staging to production means updating the image tag in the production overlay – a simple commit or PR.
PR-Based Promotion
Merge to main deploys to staging automatically. Promotion to production requires a PR to a release branch (or a specific path in the same branch). The PR serves as the approval gate. Reviewers verify the staging deployment is healthy before approving.
Branch-based promotion (a different branch per environment) is generally discouraged because the branches drift apart over time, which leads to merge conflicts and environments that no longer share a common base.
Secrets in GitOps
Plaintext secrets must never be committed to git. There are three common approaches:
Sealed Secrets
Sealed Secrets uses asymmetric encryption. You encrypt secrets with the cluster’s public key using kubeseal. Only the Sealed Secrets controller in the cluster can decrypt them:
# Encrypt a secret
kubectl create secret generic db-creds \
  --from-literal=password=hunter2 \
  --dry-run=client -o yaml | \
  kubeseal --format yaml > sealed-db-creds.yaml
# sealed-db-creds.yaml -- safe to commit
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-creds
spec:
  encryptedData:
    password: AgBy3i4OJSWK+PiTySYZZA9rO...
Sealed Secrets are cluster-specific. A secret sealed for cluster A cannot be unsealed by cluster B. This is a security feature but adds complexity for multi-cluster setups.
SOPS with age or KMS
SOPS (originally a Mozilla project, now maintained under the CNCF) encrypts individual values within YAML files. Flux has native SOPS support:
# secrets.yaml (encrypted in-place, safe to commit)
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
stringData:
  password: ENC[AES256_GCM,data:abc123...,iv:...,tag:...]
SOPS supports age (local keys), AWS KMS, GCP KMS, and Azure Key Vault for key management. Flux decrypts SOPS-encrypted files during reconciliation.
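Decryption is switched on per Flux Kustomization. The sketch below reuses the order-service Kustomization from earlier and assumes an age private key has been stored in a Secret named `sops-age` in the `flux-system` namespace (the Secret name is an assumption; the key inside it is expected to end in `.agekey`).

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: order-service
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  path: ./apps/order-service/overlays/production
  prune: true
  decryption:
    provider: sops
    secretRef:
      name: sops-age     # assumed Secret holding the age private key
```

With a cloud KMS the `secretRef` is typically unnecessary, because the controller authenticates through the cluster's workload identity instead.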
External Secrets Operator
No secrets in git at all. The External Secrets Operator reads secrets from an external source (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault) and creates Kubernetes Secrets:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-creds
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: db-creds
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/production/database
        property: password
The ExternalSecret resource is safe to commit. It contains no secret data – only a reference to where the secret lives. This is the most secure option for production but requires an external secrets management system.
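The `vault-backend` store referenced by the ExternalSecret has to be defined as well. A sketch of a matching ClusterSecretStore is shown here, assuming Vault's KV v2 engine mounted at `secret` and Kubernetes auth. The Vault address, role name, and service account are hypothetical.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: https://vault.example.com:8200   # hypothetical Vault address
      path: secret                             # KV mount path
      version: v2
      auth:
        kubernetes:
          mountPath: kubernetes
          role: external-secrets               # hypothetical Vault role
          serviceAccountRef:
            name: external-secrets             # assumed service account
            namespace: external-secrets
```

Like the ExternalSecret, this manifest holds only connection details and is safe to keep in git.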
Drift Detection and Remediation
Drift occurs when someone runs kubectl apply, kubectl edit, or kubectl scale directly against the cluster, bypassing git.
ArgoCD: Shows the application as OutOfSync in the UI and API. If selfHeal: true is set in the sync policy, ArgoCD reverts the manual change automatically. If not, it shows the diff and waits for a manual sync.
Flux: Reconciles automatically on its interval (typically 1-5 minutes). Manual changes are overwritten on the next reconciliation. There is no “show me the diff” UI – the reconciliation is automatic.
Both approaches have tradeoffs. ArgoCD’s explicit sync gives visibility but requires action. Flux’s automatic reconciliation is simpler but can surprise operators who make deliberate manual changes during an incident.
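For the incident case, Flux lets you pause reconciliation so that deliberate manual changes are not overwritten. A minimal sketch, reusing the order-service Kustomization from earlier:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: order-service
  namespace: flux-system
spec:
  suspend: true          # Flux skips this Kustomization until set back to false
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  path: ./apps/order-service/overlays/production
  prune: true
```

Running `flux suspend kustomization order-service` sets the same field without a commit, and `flux resume kustomization order-service` clears it. Either way, remember to resume afterwards and to bring any manual fix back into git.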
Image Update Automation
Instead of manually updating image tags in git, automate it:
Flux: The ImageRepository CRD scans a container registry for new tags, the ImagePolicy CRD selects the latest tag that matches a policy, and an ImageUpdateAutomation resource commits the selected tag to the git repository:
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: order-service
spec:
  imageRepositoryRef:
    name: order-service
  policy:
    semver:
      range: ">=2.0.0 <3.0.0"
Flux commits the new image tag to the git repo, creating a full audit trail.
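The ImagePolicy above needs two companions to produce commits: an ImageRepository to scan and an ImageUpdateAutomation to write back. The sketch below assumes the `v1beta2` image APIs (the served version depends on your Flux release), a hypothetical registry host, a hypothetical bot identity, and a GitRepository whose credentials allow pushes.

```yaml
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: order-service
  namespace: flux-system
spec:
  image: registry.example.com/order-service   # hypothetical registry path
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: k8s-config
  namespace: flux-system
spec:
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot                        # hypothetical bot identity
        email: fluxcdbot@example.com
      messageTemplate: "chore: update order-service image"
    push:
      branch: main
  update:
    path: ./apps
    strategy: Setters
---
# Fragment of apps/order-service/overlays/production/kustomization.yaml.
# The marker comment tells Flux which field to rewrite.
images:
  - name: order-service
    newTag: v2.3.1 # {"$imagepolicy": "flux-system:order-service:tag"}
```

Pushing to a separate branch instead of `main` turns each automated update into something you can review as a PR.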
ArgoCD Image Updater: A separate component that watches registries and updates ArgoCD Application resources or commits to git.
Multi-Cluster GitOps
Managing many clusters from git:
ArgoCD ApplicationSets: Generate Applications dynamically from a template:
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: order-service
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
  template:
    metadata:
      name: 'order-service-{{name}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/company/k8s-config.git
        targetRevision: main
        path: 'apps/order-service/overlays/{{metadata.labels.env}}'
      destination:
        server: '{{server}}'
        namespace: production
This generates one Application per cluster matching the label selector. Add a new production cluster, and it automatically gets order-service deployed.
Flux: Each cluster runs its own Flux instance. Use a shared git repo with per-cluster Kustomization paths (clusters/us-east-1/, clusters/eu-west-1/). Shared infrastructure goes in a common path that all clusters reference.
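A sketch of what one cluster's directory might contain under that layout follows. The file names, the `clusters/us-east-1/apps` path, and the `cluster_name` variable are assumptions made for illustration.

```yaml
# clusters/us-east-1/infrastructure.yaml (hypothetical file name)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  path: ./infrastructure              # common path shared by every cluster
  prune: true
  postBuild:
    substitute:
      cluster_name: us-east-1         # replaces ${cluster_name} in the shared manifests
---
# clusters/us-east-1/apps.yaml (hypothetical file name)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  path: ./clusters/us-east-1/apps     # assumed cluster-specific directory
  prune: true
```

Each cluster's Flux instance is bootstrapped against its own `clusters/<name>/` directory, so adding a cluster means adding a directory.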
Common Gotchas
CRD ordering: If your application uses custom resources, the CRDs must be applied before the resources that reference them. In ArgoCD, use sync waves (argocd.argoproj.io/sync-wave: "-1" on CRDs). In Flux, use dependsOn to ensure the CRD Kustomization is applied before the application Kustomization.
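As a sketch of the Flux side, the application Kustomization below waits for a Kustomization that installs cert-manager and its CRDs. It assumes order-service uses cert-manager resources, which the text above does not state, and it reuses paths from the monorepo layout shown earlier.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  path: ./infrastructure/cert-manager
  prune: true
  wait: true               # ready only once the applied resources are healthy
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: order-service
  namespace: flux-system
spec:
  dependsOn:
    - name: cert-manager   # not applied until cert-manager reports ready
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: k8s-config
  path: ./apps/order-service/overlays/production
  prune: true
```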
Helm release state drift: Helm stores release state in Kubernetes secrets. When a GitOps controller manages Helm releases, manual helm upgrade commands create conflicting state. The controller’s next reconciliation may fail or produce unexpected results. The rule is simple: if GitOps manages a Helm release, never run helm upgrade manually.
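Repository access and authentication: GitOps controllers need read access to your git repositories, plus write access if they commit image updates. Prefer read-only deploy keys (SSH) over personal access tokens, which are tied to a user account and usually carry broader scope. Rotate credentials periodically.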
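With Flux, a deploy key is attached to the source through a Secret reference. In this sketch the Secret name `k8s-config-deploy-key` is hypothetical; Flux expects such a Secret to contain the `identity`, `identity.pub`, and `known_hosts` entries.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-config
  namespace: flux-system
spec:
  interval: 1m
  url: ssh://git@github.com/company/k8s-config.git   # SSH URL instead of HTTPS
  ref:
    branch: main
  secretRef:
    name: k8s-config-deploy-key                      # hypothetical Secret with the deploy key
```

The Secret itself should be created out of band or through one of the secret-management approaches described earlier, never committed in plaintext.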
Large repositories and sync performance: GitOps controllers fetch your repo on every reconciliation interval. Monorepos with large binary files or deep history slow down the sync loop. Use shallow clones where the controller supports them, keep binary artifacts out of the config repo, and set reasonable reconciliation intervals (1-5 minutes, not 10 seconds).