ArgoCD Multi-Cluster Management#
A single ArgoCD instance can manage deployments across dozens of Kubernetes clusters. This is one of ArgoCD’s strongest features and the standard approach for organizations with multiple environments, regions, or cloud providers.
Hub-Spoke Architecture#
The standard multi-cluster pattern runs ArgoCD on one “hub” cluster that deploys to multiple “spoke” clusters:
Hub Cluster (management)
├── ArgoCD control plane
├── Application/ApplicationSet definitions
├── RBAC policies
└── Cluster credentials (Secrets)
│
├──→ Spoke Cluster: dev (us-east-1)
├──→ Spoke Cluster: staging (us-west-2)
├──→ Spoke Cluster: prod-us (us-east-1)
├──→ Spoke Cluster: prod-eu (eu-west-1)
└──→ Spoke Cluster: prod-apac (ap-southeast-1)

ArgoCD on the hub cluster connects to each spoke cluster’s API server to apply manifests and check health. The spoke clusters do not need ArgoCD installed.
Why Hub-Spoke#
- Single pane of glass. All applications across all clusters are visible and manageable from one ArgoCD UI and CLI.
- Centralized RBAC. One set of policies controls who can deploy what to which cluster.
- Consistent configuration. ApplicationSets can deploy the same stack to every cluster with per-cluster overrides.
- Simpler operations. One ArgoCD instance to upgrade, back up, and monitor instead of one per cluster.
When to Use Multiple ArgoCD Instances#
Hub-spoke has limits. Consider separate ArgoCD instances per cluster when:
- Network isolation is required. The hub cannot reach spoke API servers (air-gapped environments, strict network policies).
- Blast radius concerns. An ArgoCD outage on the hub stops deployments to every cluster. Separate instances limit the impact.
- Regulatory requirements. Some compliance frameworks require deployment tooling to run within the same trust boundary as the workloads.
- Scale. A single ArgoCD instance managing thousands of applications across many clusters can hit performance limits in the application controller.
A common hybrid pattern is one ArgoCD per region or per trust boundary, each managing a subset of clusters.
Registering External Clusters#
Using the CLI#
The simplest method uses the ArgoCD CLI, which reads your kubeconfig:
# Make sure your kubeconfig has the target cluster context
kubectl config get-contexts
# Register the cluster
argocd cluster add prod-us-east-1 --name prod-us
# Verify
argocd cluster list

This creates a ServiceAccount and ClusterRoleBinding on the target cluster and stores the credentials as a Secret in the ArgoCD namespace. The default role is cluster-admin, which ArgoCD needs to manage all resource types.
Using Declarative Secrets#
For GitOps-managed cluster registration, define cluster secrets directly:
apiVersion: v1
kind: Secret
metadata:
  name: prod-us-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: prod-us
  server: https://k8s-api.prod-us.example.com
  config: |
    {
      "bearerToken": "<service-account-token>",
      "tlsClientConfig": {
        "insecure": false,
        "caData": "<base64-encoded-ca-cert>"
      }
    }

This approach lets you manage cluster registrations in Git (encrypted with Sealed Secrets or External Secrets Operator), but you must create the ServiceAccount and RBAC on the target cluster separately.
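For the Sealed Secrets route, a rough sketch of what the committed object might look like (the encrypted values are placeholders for real kubeseal output; note that the cluster label goes on the template so it ends up on the unsealed Secret):

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: prod-us-cluster
  namespace: argocd
spec:
  encryptedData:
    name: AgBy...        # placeholder: kubeseal-encrypted value for each key
    server: AgCt...
    config: AgDk...
  template:
    metadata:
      labels:
        argocd.argoproj.io/secret-type: cluster   # applied to the generated Secret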
ServiceAccount on the Target Cluster#
Create a dedicated ServiceAccount for ArgoCD on each spoke cluster:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: argocd-manager
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: argocd-manager
  namespace: kube-system

For least-privilege, replace cluster-admin with a custom ClusterRole that only allows the resource types ArgoCD needs to manage. This is more work to maintain but limits exposure if the ArgoCD credentials are compromised.
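As a starting point, a restricted ClusterRole might look like the sketch below; the resource list is illustrative and must cover whatever your applications actually deploy. Keep in mind that ArgoCD’s cluster cache watches many resource types by default, so a tightly scoped role is usually paired with resource.inclusions/resource.exclusions in argocd-cm.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: argocd-manager-restricted
rules:
- apiGroups: [""]
  resources: ["namespaces", "services", "configmaps", "secrets", "serviceaccounts"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Extend with networking.k8s.io, batch, your CRD groups, etc. as workloads require.

Bind this role in the ClusterRoleBinding above in place of cluster-admin.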
AWS EKS Clusters#
EKS uses IAM for authentication. Create an IAM role that ArgoCD can assume and map it in the EKS aws-auth ConfigMap:
stringData:
  name: prod-eks
  server: https://ABCDEF1234.gr7.us-east-1.eks.amazonaws.com
  config: |
    {
      "awsAuthConfig": {
        "clusterName": "prod-eks",
        "roleARN": "arn:aws:iam::123456789012:role/argocd-manager"
      }
    }

The ArgoCD application controller pods need IAM credentials (via IRSA or instance profile) that can assume the target role.
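On the target EKS cluster, the aws-auth mapping might look roughly like this (role and group names follow the example above; mapping into system:masters mirrors cluster-admin, so substitute a group bound to a least-privilege ClusterRole where appropriate):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/argocd-manager
      username: argocd-manager
      groups:
        - system:masters   # or a group bound to a restricted ClusterRole

Newer EKS clusters can grant the same access with EKS access entries instead of editing aws-auth.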
Cluster Labels#
Add labels to cluster secrets for use with ApplicationSet cluster generators:
metadata:
  labels:
    argocd.argoproj.io/secret-type: cluster
    env: production
    region: us-east-1
    cloud: aws

These labels are what ApplicationSet cluster generators use for matchLabels selectors.
Fleet Deployments with ApplicationSets#
Deploy to All Production Clusters#
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: monitoring
  namespace: argocd
spec:
  generators:
  - clusters:
      selector:
        matchLabels:
          env: production
  template:
    metadata:
      name: 'monitoring-{{name}}'
    spec:
      project: infrastructure
      source:
        repoURL: https://github.com/myorg/gitops-config.git
        targetRevision: main
        path: monitoring/base
      destination:
        server: '{{server}}'
        namespace: monitoring
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
        - CreateNamespace=true

Every cluster with env: production gets the monitoring stack. Register a new production cluster with that label and the monitoring stack deploys automatically.
Per-Cluster Overrides#
Use the matrix generator to combine clusters with per-cluster values from Git:
gitops-config/
  monitoring/
    base/
      kustomization.yaml
      prometheus-values.yaml
    overlays/
      prod-us/
        kustomization.yaml    # patches for US cluster
      prod-eu/
        kustomization.yaml    # patches for EU cluster (different retention, etc.)

spec:
  generators:
  - matrix:
      generators:
      - clusters:
          selector:
            matchLabels:
              env: production
      - git:
          repoURL: https://github.com/myorg/gitops-config.git
          revision: main
          directories:
          - path: monitoring/overlays/*
  template:
    metadata:
      name: 'monitoring-{{name}}-{{path.basename}}'
    spec:
      project: infrastructure
      source:
        repoURL: https://github.com/myorg/gitops-config.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: '{{server}}'
        namespace: monitoring

This creates one Application per cluster-overlay combination, deploying environment-specific monitoring configs. The Application name includes both the cluster name and the overlay so the generated names do not collide. If each cluster should only receive its own overlay, a plain cluster generator with path: 'monitoring/overlays/{{name}}' (overlay directories named after the clusters) is simpler than a matrix.
Cluster-Scoped Resources#
Some resources (Namespaces, ClusterRoles, CRDs) are cluster-scoped and need special handling. AppProjects control which cluster-scoped resources applications can create:
spec:
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  - group: rbac.authorization.k8s.io
    kind: ClusterRole
  - group: rbac.authorization.k8s.io
    kind: ClusterRoleBinding

Without this whitelist, applications in the project cannot create cluster-scoped resources even if ArgoCD has the RBAC to do so (the built-in default project permits everything).
Network Connectivity#
The hub cluster’s ArgoCD pods must be able to reach each spoke cluster’s Kubernetes API server (typically port 443 or 6443).
Same VPC / VNet#
If hub and spoke clusters are in the same network, the API servers are directly reachable via internal DNS or IP. No special networking needed.
Cross-VPC / Cross-Cloud#
For clusters in different VPCs, regions, or clouds:
- VPC Peering — Direct network link between two VPCs. Simple but does not scale beyond a handful of connections.
- Transit Gateway (AWS) / Virtual WAN (Azure) — Hub-and-spoke network topology at the cloud layer.
- WireGuard / Tailscale mesh — Overlay network connecting clusters across cloud boundaries. Works well for multi-cloud setups.
- Public API endpoint with IP allowlisting — EKS, AKS, and GKE can expose API servers publicly. Restrict access to the hub cluster’s egress IPs.
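For the last option on EKS, a hedged example (assuming the hub cluster egresses through a NAT gateway at 203.0.113.5 and the spoke is the prod-eks cluster from earlier):

aws eks update-cluster-config \
  --name prod-eks \
  --region us-east-1 \
  --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="203.0.113.5/32",endpointPrivateAccess=true

Private endpoint access stays enabled so traffic from inside the spoke VPC keeps reaching the API server without going through the restricted public endpoint.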
Credential Refresh#
Cluster credentials can expire. For EKS with IRSA, token refresh is automatic. For static bearer tokens, set up a rotation process. Monitor the ArgoCD cluster connection status:
argocd cluster list
# Look for connection status errors
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller --tail=50 | grep -i "error.*cluster"
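For the static-token rotation mentioned above, note that Kubernetes 1.24+ no longer auto-creates ServiceAccount token Secrets. One way to mint a long-lived token on the spoke cluster (names match the argocd-manager example earlier) is a token Secret:

apiVersion: v1
kind: Secret
metadata:
  name: argocd-manager-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: argocd-manager
type: kubernetes.io/service-account-token

Read it back with kubectl -n kube-system get secret argocd-manager-token -o jsonpath='{.data.token}' | base64 -d, and rotate by deleting and recreating the Secret, then updating the cluster secret on the hub.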
Scaling Considerations#
Application Controller Sharding#
For large fleets, shard the application controller so each replica manages a subset of clusters:
controller:
  replicas: 3
  env:
  - name: ARGOCD_CONTROLLER_REPLICAS
    value: "3"

ArgoCD distributes clusters across controller replicas using a hash ring. Each replica only reconciles the applications targeting its assigned clusters.
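If the hash-ring assignment lands a particularly heavy cluster on an already-busy replica, the cluster secret also accepts a shard field to pin it to a specific controller shard (the sketch reuses the prod-us secret from earlier):

stringData:
  name: prod-us
  server: https://k8s-api.prod-us.example.com
  shard: "1"   # pin this cluster to controller shard 1 instead of the hash-ring default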
Repo Server Caching#
The repo server clones repositories and renders manifests. With many applications pulling from the same repos, increase the repo server replicas and give slow Helm or Kustomize renders more time to finish:
repoServer:
  replicas: 3
  env:
  - name: ARGOCD_EXEC_TIMEOUT
    value: "180"

Rate Limiting#
ArgoCD reconciles all applications periodically (default 3 minutes). For large fleets, this creates bursts of API calls to spoke clusters. If most changes arrive via Git webhooks anyway, lengthen the global reconciliation interval in the argocd-cm ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  timeout.reconciliation: "300s"

Common Mistakes#
- Using cluster-admin on spoke clusters without considering blast radius. A compromised ArgoCD instance with cluster-admin on every production cluster is a catastrophic scenario. Use least-privilege ServiceAccounts and restrict what ArgoCD projects can deploy.
- Not labeling clusters consistently. ApplicationSet cluster generators depend on labels. Inconsistent labeling means inconsistent deployments. Define a labeling schema early (env, region, cloud, team) and enforce it.
- Ignoring API server connectivity before registering clusters. Register a cluster, create applications, and then discover the network path does not exist. Test connectivity first:
kubectl --context=hub exec -n argocd deployment/argocd-server -- curl -k https://<spoke-api-server>/healthz
- Running one ArgoCD for hundreds of clusters without sharding. The application controller becomes a bottleneck. Enable sharding when managing more than 20-30 clusters or 500+ applications.
- Not monitoring cluster credential expiry. A silently expired token means ArgoCD cannot sync to that cluster. Set up alerts for cluster connection failures.