Running 7 Helm-Managed Services on One Kubernetes Cluster: A Cross-Cutting Survey

A single-node Kubernetes cluster running seven Helm-managed services concurrently — Gitea, Mattermost, PostgreSQL, kube-prometheus-stack, Jenkins, Temporal, and NATS — looks tractable on paper. The charts are all upstream-maintained. The hardware is modest but adequate. The operational reality is that zero of the seven ran cleanly on out-of-the-box values. Every chart needed at least one customization to coexist with the others, and several needed substantial rewrites of the helm-values surface. This survey catalogs what those customizations are, why each was necessary, and what the common failure modes look like across the fleet.

The frame is “production-shape services on a small cluster” rather than “production cluster, scaled down.” The trade-offs are different. HA dependencies that are free on a multi-node cluster are pure overhead on a single node. Backup discipline that is automatic in a managed offering is a per-service homework assignment. Resource budgets that nobody thinks about at scale dominate every chart-version decision.

A survey of 7 helm-managed services

The table is the article’s anchor. Every column is load-bearing.

| # | Service | Helm chart | Chart version | ARM64-friendly? | Custom image needed? | Backup mechanism | Per-service customization required | Major gotchas hit |
|---|---|---|---|---|---|---|---|---|
| 1 | Gitea | `gitea-charts/gitea` | 12.5.3 | yes (rootless image is multi-arch) | no | dedicated cron-driven repo backup script | disable bundled redis-cluster + redis + postgresql + postgresql-ha; point at external PG; admin user inline in values | bundled HA deps light up by default and exhaust resources unless explicitly disabled |
| 2 | Mattermost | `mattermost/mattermost-team-edition` | 6.6.96 | NO: upstream is amd64-only | YES: built locally from the MM ARM64 binary tarball; tag `your-registry/mattermost-team-edition:10.5.0-arm64` | none (PV snapshot only) | external PG via env vars + `configJSON`; bot accounts + access tokens enabled; `SiteURL` fixed | upstream image lacks an ARM64 manifest; QEMU fallback runs but the Go runtime crashes (`lfstack.push`); must bake own image |
| 3 | PostgreSQL | `bitnami/postgresql` | 18.6.2 (multiple revisions) | yes | no | none yet (backlog gap) | `primary.initdb.scripts` entry `init-all-dbs.sql` creates 4 databases + 4 service roles on first boot; `architecture: standalone` | standalone + initdb is the multi-tenant pattern; PG15+ schema permissions trap |
| 4 | kube-prometheus-stack | `prometheus-community/kube-prometheus-stack` | 84.3.0 | yes | no | none (recreate from values) | disable `kubeEtcd` + `kubeControllerManager` (minikube static pods, no scrape endpoint); disable Grafana persistence (emptyDir for dev); disable `NodeClockNotSynchronising` + `AlertmanagerClusterCrashlooping` default rules; full alertmanager routing config inline | `mattermost_configs` receiver has no `title:` field, and the operator's config validator silently rejects it; chart-default rules produce single-node false positives; `--reuse-values` silently carries stale values forward |
| 5 | Jenkins | `jenkins/jenkins` | 5.9.18 | yes | YES: `your-registry/jenkins:latest` with plugins pre-baked via `jenkins-plugin-cli` | PV snapshot only | `controller.installPlugins: []` to skip the broken plugin-copy step; JCasC inline (Gitea server, credentials, shared library, organization folder, kubernetes cloud); Docker socket hostPath mount; runs as root for socket access | helm chart’s `apply_config.sh` has a broken `yes n \| cp -i ...` line that crashes the pod when `installPlugins` is non-empty; pre-baking is the only stable path |
| 6 | Temporal | `temporalio/temporal` | 0.74.0 | yes | no | none (state in PG) | `configMapsToMount: "sprig"` + `setConfigFilePath: true` (chart default `dockerize` leaves `{{ .Env.* }}` unrendered); external PG with separate `temporal` + `temporal_visibility` DBs; disable bundled cassandra/mysql/postgresql/elasticsearch/prometheus/grafana | chart 0.74.0 ships server image 1.30.3, which uses sprig, not dockerize, so default values produce a `Cassandra.Hosts: zero value` error; chart 1.x restructures persistence keys (pinned to 0.74.0) |
| 7 | NATS | `nats/nats` | 2.12.6 | yes | no | none (ephemeral pub/sub) | `config.cluster.enabled: false`; `config.jetstream.enabled: false` (no persistence); single replica; tight memory cap | resource block lives under `container.merge.resources`, not top-level `resources`; easy to miss |

Every customization in the rightmost column is mandatory for that chart to coexist with the others on a single node. The chart versions are pinned because the gotchas are version-specific — Temporal 1.x restructures persistence keys, the Jenkins apply_config.sh bug surfaces in 5.x, and the kube-prometheus-stack alertmanager schema validation tightened in the 80.x series. Re-verify against the chart you actually install.

Cross-cutting patterns

Every chart needs at least one customization

Of the seven surveyed, zero ran cleanly on default values. The reasons cluster into five categories:

- HA dependencies defaulting on: Gitea bundles redis-cluster + redis + postgresql + postgresql-ha. Mattermost bundles MySQL. Temporal bundles cassandra + mysql + elasticsearch + prometheus + grafana. Each one consumes 256-512 MiB of requests before doing any useful work.
- Resource defaults too aggressive for a shared single node: kube-prometheus-stack at defaults requests over 4 GiB by itself. Jenkins requests 2 GiB. The helm-default `resources:` blocks across these seven charts sum to over 16 GiB of requests, so the cluster won’t schedule a single application pod.
- Architecture mismatch: Mattermost ships only amd64. QEMU user-mode emulation starts the binary, but the Go runtime crashes on `lfstack.push`. The fix is rebuilding from the ARM64 binary tarball; see building ARM64 container images when upstream doesn’t ship them and Kubernetes on Apple Silicon setup gotchas.
- Chart bugs and broken install paths: Jenkins’s plugin install in `apply_config.sh` crashes the pod. Temporal 0.74.0 ships a server image that uses sprig templates while the chart’s defaults assume dockerize. Both require workarounds; neither is fixable from the outside.
- Single-node false-positive alerts: kube-prometheus-stack ships rules for VM clock drift and alertmanager cluster crashlooping that fire constantly on a single-node minikube cluster. They have to be disabled at install time or alert fatigue arrives within hours.

The one-line takeaway: plan a values file before you `helm install`. Installing on helm defaults and tuning afterwards guarantees a wedge state on a small cluster.

Helm defaults assume multi-node production

Every “free” HA dependency in a helm chart is a memory tax on a single-node cluster. Bundled redis is free on a 5-node cluster because nothing else needs that 256 MiB. On a single node sharing memory with PostgreSQL, Prometheus, Grafana, the application pods themselves, and the kubelet, that 256 MiB has to come from somewhere. The shape of the trade-off is identical for every chart surveyed: the bundled dependency is convenient, defaults to on, and is the first thing to disable.

The principle generalizes: read the values.yaml top to bottom before installing any chart on a constrained cluster. Look for enabled: true on anything labeled redis, postgresql, mysql, cassandra, elasticsearch, or prometheus. Most of them want to be off.
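That read-through can be partly mechanized. A minimal sketch, assuming the chart’s values have already been loaded into a Python dict (for example from `helm show values <chart>` parsed with a YAML library, not shown here). The function name and the suspect list are illustrative; substring matching also catches variants like `redis-cluster` and `postgresql-ha`.

```python
from typing import Any, Iterator

# Dependency names called out above. Substring matching also catches variants
# such as "redis-cluster" and "postgresql-ha"; treat hits as items to review.
SUSPECT_DEPENDENCIES = ("redis", "postgresql", "mysql", "cassandra", "elasticsearch", "prometheus")


def enabled_bundled_dependencies(values: dict[str, Any], path: str = "") -> Iterator[str]:
    """Yield dotted paths of suspect subcharts whose `enabled` flag is true."""
    for key, child in values.items():
        if not isinstance(child, dict):
            continue
        child_path = f"{path}.{key}" if path else str(key)
        name = str(key).lower()
        if child.get("enabled") is True and any(dep in name for dep in SUSPECT_DEPENDENCIES):
            yield child_path
        # Recurse, since some charts nest dependency toggles below the top level.
        yield from enabled_bundled_dependencies(child, child_path)
```

Every hit is a candidate, not a verdict: kube-prometheus-stack’s own `prometheus` key will match, and that one should stay on.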

Resource sizing is a budget, not a target

Helm `resources:` blocks across these seven charts default to numbers that assume infinite headroom. The actual budget is finite and shared. The discipline is to set requests and limits per service that sum to less than the cluster’s allocatable memory, with headroom for application workloads. See the resource budgeting section below for concrete numbers.

Authentication has three layers

The seven services together use three distinct credential patterns, and treating them uniformly leads to mistakes:

- Helm-values inline admin credentials (Gitea, Jenkins, Grafana). Convenient, but they leak into git history. Fine for dev clusters with a `<dev-password>` placeholder; use `existingSecret` references for any cluster reachable from outside localhost.
- Per-service service users (PostgreSQL roles, Mattermost user, Gitea user). Set via initdb scripts or chart `configJSON`. They survive helm upgrades; don’t change them unless you intend to.
- Per-app tokens (Gitea API tokens, Mattermost bot tokens, Slack/Mattermost webhook URLs). Always live in K8s Secrets, mounted via `secrets:` (the alertmanager pattern) or env-from-secret. Never inline them in helm-values, even in dev: they tend to outlive the dev cluster.
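The “never inline” rule is easy to enforce with a pre-commit style check. A minimal sketch, assuming values are loaded into a dict; the key patterns are assumptions that will need tuning per chart, and `existing*` keys are skipped because they name a Secret rather than hold one. Anything written as `<...>` counts as a placeholder, matching the `<dev-password>` convention.

```python
import re
from typing import Any, Iterator

# Illustrative key patterns; tune them for the charts in use.
CREDENTIAL_KEY = re.compile(r"password|token|secret|webhook", re.IGNORECASE)
# Values written as <something>, like <dev-password>, count as placeholders.
PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def inline_credentials(values: Any, path: str = "") -> Iterator[str]:
    """Yield dotted paths where a credential-looking key holds a literal string."""
    if isinstance(values, dict):
        for key, child in values.items():
            child_path = f"{path}.{key}" if path else str(key)
            name = str(key)
            is_reference = name.lower().startswith("existing")  # e.g. existingSecret
            if (
                isinstance(child, str)
                and child
                and CREDENTIAL_KEY.search(name)
                and not is_reference
                and not PLACEHOLDER.match(child)
            ):
                yield child_path
            else:
                yield from inline_credentials(child, child_path)
    elif isinstance(values, list):
        for index, item in enumerate(values):
            yield from inline_credentials(item, f"{path}[{index}]")
```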

Chart structure varies — read it before customizing

Each chart organizes its values differently, and the difference matters when overriding:

- NATS puts container resources under `container.merge.resources`, not top-level `resources`. Setting the top-level key does nothing.
- Mattermost uses both `extraEnvVars` AND `configJSON` for the same SQL settings. Both must agree or the pod refuses to start.
- Temporal nests every server component (frontend, history, matching, worker) with its own `replicaCount` and `resources`. Setting one doesn’t set the others.
- kube-prometheus-stack has `prometheus.prometheusSpec.resources` (the Prometheus pod the operator creates from the Prometheus custom resource) AND `prometheusOperator.resources` (the operator pod itself). They are separate budgets.

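`helm get values <release>` after install is the only reliable way to confirm which overrides the release actually recorded; whether the chart reads a given key still has to be checked in the rendered pod spec. See helm gotchas: reuse-values, revisions, rollback for why this matters.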
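The comparison can be scripted. A sketch, assuming the release’s values are fetched with `helm get values <release> -n <namespace> -o json` and the intended overrides are listed as dotted paths (a hypothetical convention; the function names are illustrative). It reports any path whose recorded value differs from the intended one or is missing entirely.

```python
import json
from typing import Any

_MISSING = object()


def lookup(values: Any, dotted_path: str) -> Any:
    """Follow a dotted key path such as 'container.merge.resources'."""
    node = values
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def unrecorded_overrides(expected: dict[str, Any], helm_values_json: str) -> list[str]:
    """Return dotted paths whose recorded value differs from the intended one (or is absent)."""
    recorded = json.loads(helm_values_json)
    return [path for path, want in expected.items() if lookup(recorded, path) != want]
```

This only proves what the release recorded. A key the chart ignores (the NATS top-level `resources` trap) is recorded faithfully, so confirm structure-sensitive overrides in `kubectl get pod <pod> -o yaml` as well.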

Single-node-specific overrides

A consistent set of overrides applies across charts when the target is single-node:

- Disable `kubeEtcd` and `kubeControllerManager` scrapes (minikube runs them as static pods with no scrape endpoint).
- Disable the `NodeClockNotSynchronising` rule (minikube/Docker Desktop VMs drift constantly; the alert is a false positive).
- Disable `AlertmanagerClusterCrashlooping` (a single replica means no cluster, so the rule fires forever).
- Set `imagePullPolicy: Never` for any locally-built image.
- Disable `initChownData` for Grafana; minikube hostPath PVs don’t need it.
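One way to carry the checklist forward is a small overrides file kept next to the main values. A sketch that emits the kube-prometheus-stack items as JSON; the key paths are assumptions based on that chart’s values layout (`defaultRules.disabled`, `grafana.initChownData`) and should be confirmed with `helm show values` for the pinned version. The per-chart `imagePullPolicy: Never` item is left out.

```python
import json

# Single-node overrides from the checklist above, for kube-prometheus-stack.
# Key paths are assumptions based on that chart's values layout; confirm them
# with `helm show values prometheus-community/kube-prometheus-stack` for your version.
SINGLE_NODE_OVERRIDES = {
    "kubeEtcd": {"enabled": False},
    "kubeControllerManager": {"enabled": False},
    "defaultRules": {
        "disabled": {
            "NodeClockNotSynchronising": True,
            "AlertmanagerClusterCrashlooping": True,
        }
    },
    "grafana": {
        "initChownData": {"enabled": False},
        "persistence": {"enabled": False},
    },
}


def to_values_json(values: dict) -> str:
    """Serialize values as JSON; JSON is valid YAML, so helm accepts it via -f."""
    return json.dumps(values, indent=2, sort_keys=True) + "\n"
```

Write the output to something like `single-node.json` and pass it after the main file (`-f values.yaml -f single-node.json`); with multiple `-f` flags, the rightmost file wins on conflicting keys.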

These are not generic best practices; they’re single-node-specific. Carry them forward as a checklist for any chart added to the same cluster.

Resource budgeting under a memory cap

Pick a memory cap that matches the hardware. The example below uses 24 GiB — appropriate for a Mac mini-class workstation running Docker Desktop. The arithmetic is the same for any cap.

```
service              requests (cpu / mem)   limits (mem)
gitea                100m / 128Mi           512Mi
mattermost           250m / 512Mi           1Gi
postgresql           250m / 512Mi           2Gi
prometheus           200m / 512Mi           1Gi
grafana              100m / 128Mi           512Mi
alertmanager          50m /  64Mi           256Mi
prom-operator        100m / 256Mi           512Mi
jenkins              250m / 1Gi             2Gi
temporal (4 svc)     400m / 768Mi (sum)     1.5Gi (sum)
nats                  50m /  64Mi           128Mi
TOTAL requests:    ~1750m CPU / ~3.9 GiB memory (3968Mi)
TOTAL limits:      ~9.4 GiB memory (9600Mi)
```

The ~3.9 GiB requests sum is what the scheduler reserves before any application workload arrives. With a 24 GiB cap, that leaves roughly 20 GiB for application pods plus minikube node overhead, enough headroom for a meaningful workload. Going to helm defaults on every chart blows past 16 GiB of requests alone, before any application pod is scheduled. The cluster wedges.
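The totals are worth recomputing whenever a line in the table changes. A sketch that handles only the quantity forms the table uses (millicores, `Mi`, `Gi`); full Kubernetes quantity parsing covers more suffixes, and the function names are illustrative. For the table above it yields 1750m CPU, 3968 MiB of memory requests and 9600 MiB of limits, leaving 20608 MiB (about 20 GiB) of a 24 GiB cap.

```python
# (cpu request, memory request, memory limit) per service, copied from the table above.
BUDGET = {
    "gitea": ("100m", "128Mi", "512Mi"),
    "mattermost": ("250m", "512Mi", "1Gi"),
    "postgresql": ("250m", "512Mi", "2Gi"),
    "prometheus": ("200m", "512Mi", "1Gi"),
    "grafana": ("100m", "128Mi", "512Mi"),
    "alertmanager": ("50m", "64Mi", "256Mi"),
    "prom-operator": ("100m", "256Mi", "512Mi"),
    "jenkins": ("250m", "1Gi", "2Gi"),
    "temporal (4 svc)": ("400m", "768Mi", "1.5Gi"),
    "nats": ("50m", "64Mi", "128Mi"),
}


def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as '250m' or '2' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity: str) -> int:
    """Parse a binary memory quantity such as '512Mi' or '1.5Gi' into MiB."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def budget_totals(budget: dict[str, tuple[str, str, str]]) -> tuple[int, int, int]:
    """Return (CPU request millicores, memory request MiB, memory limit MiB)."""
    cpu = sum(cpu_millicores(c) for c, _, _ in budget.values())
    requests = sum(memory_mib(r) for _, r, _ in budget.values())
    limits = sum(memory_mib(lim) for _, _, lim in budget.values())
    return cpu, requests, limits


def request_headroom_mib(budget: dict[str, tuple[str, str, str]], cap_gib: float) -> int:
    """Memory left under the cap after every request is reserved."""
    _, requests, _ = budget_totals(budget)
    return int(cap_gib * 1024) - requests
```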

The arithmetic forces three decisions early:

- Requests must be tight. The number that gets reserved is requests, not limits. Default requests are usually conservative for production and wasteful for single-node. Halve them and watch behavior under load before halving again.
- Limits should reflect peak, not average. PostgreSQL at idle uses 100 MiB; under a backfill query it uses 1.5 GiB. The limits slot exists to allow that peak without OOMKilling.
- Multi-tenant > N instances. A single Bitnami PostgreSQL with initdb creating four databases + four roles uses 512 MiB. Four chart-bundled PostgreSQL instances use 4 × 512 MiB. The math forces consolidation.

The multi-tenant PG pattern looks like:

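```sql
-- primary.initdb.scripts: init-all-dbs.sql (excerpt)
-- Run by psql on first boot. CREATE DATABASE cannot run inside a DO block,
-- hence the SELECT ... \gexec form (\gexec is a psql meta-command).
SELECT 'CREATE DATABASE temporal'
  WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'temporal')\gexec
-- ...repeated for temporal_visibility, mattermost, gitea, etc.

DO $$
BEGIN
  IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = 'temporal') THEN
    CREATE ROLE temporal WITH LOGIN PASSWORD '<dev-password>';
  END IF;
  -- ...repeated per service
END $$;

GRANT ALL PRIVILEGES ON DATABASE temporal TO temporal;
-- PG15+ no longer grants CREATE on the public schema to every role;
-- making the service role the database owner restores it.
ALTER DATABASE temporal OWNER TO temporal;
-- ...repeated
```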
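Because the excerpt repeats the same statements per service, generating them from one mapping keeps databases and roles from drifting apart. A sketch, assuming a hypothetical database-to-role mapping (the role list here is illustrative, not the fleet’s exact four). Creating each database with `OWNER` set to its service role also covers the PG15+ public-schema change, since the database owner keeps CREATE on `public`.

```python
# Hypothetical mapping of database name -> owning service role.
SERVICE_DATABASES = {
    "temporal": "temporal",
    "temporal_visibility": "temporal",
    "mattermost": "mattermost",
    "gitea": "gitea",
}


def init_sql(databases: dict[str, str], password: str = "<dev-password>") -> str:
    """Render an idempotent initdb script: create each role once, then each database owned by it.

    Names are interpolated unquoted, so only pass fixed, trusted identifiers.
    """
    lines = ["DO $$", "BEGIN"]
    for role in sorted(set(databases.values())):
        lines += [
            f"  IF NOT EXISTS (SELECT FROM pg_roles WHERE rolname = '{role}') THEN",
            f"    CREATE ROLE {role} WITH LOGIN PASSWORD '{password}';",
            "  END IF;",
        ]
    lines.append("END $$;")
    for database, owner in databases.items():
        # CREATE DATABASE cannot run inside DO, so emit it via psql's \gexec.
        lines += [
            f"SELECT 'CREATE DATABASE {database} OWNER {owner}'",
            f"  WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = '{database}')\\gexec",
        ]
    return "\n".join(lines) + "\n"
```

Roles come first here because `OWNER` must name an existing role. Outside a dev cluster, real passwords belong in a Secret rather than in the rendered script.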

The trade-off accepted: shared PostgreSQL is also the single point of failure. That’s acceptable for dev/homelab and unacceptable for any production posture. Plan the migration to per-service or HA PG before the cluster carries production traffic.

Common failure modes and what they tell you

The same five failure signatures recur across the surveyed charts. Recognizing the signature shortcuts diagnosis.

CrashLoopBackOff

```
Back-off restarting failed container <name> in pod <pod>
```

Three common causes, each with a distinctive log signature:

Image arch mismatch (Mattermost-class). The pod starts. The runtime aborts deep in Go’s lock-free stack:

```
runtime: failed to create new OS thread (have N already; errno=22)
fatal error: lfstack.push
```

There is no fix from the outside. Build a native ARM64 image and reference it.

OOMKilled. The limit is too low for steady-state:

```
State: Terminated, Reason: OOMKilled, ExitCode: 137
```

kubectl describe pod confirms the reason. Either raise the limit or find what’s consuming the unexpected memory. PostgreSQL after a schema-change run, Jenkins after a long build queue, Prometheus after a series-cardinality spike — all common culprits.

Config error from a chart bug. Jenkins shows:

```
apply_config.sh: line N: cp: cannot stat ...
```

This is the broken plugin-copy path triggered by a non-empty `installPlugins`. Fix: empty the list and pre-bake plugins into the image. Temporal shows:

```
Persistence.DataStores[default].Cassandra.Hosts: zero value
```

This is the chart-default dockerize config leaving `{{ .Env.CASSANDRA_HOSTS }}` unrendered, because the server image uses sprig. Fix: `configMapsToMount: "sprig"` and `setConfigFilePath: true`.
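Since these causes differ mainly in their log text, a first-pass triage can be a lookup table. A sketch; the hint strings paraphrase this section and the function name is hypothetical. Feed it the output of `kubectl logs <pod> --previous` and `kubectl describe pod <pod>`.

```python
# Log signatures quoted above, paired with the fix each one points to.
# Substring matching makes this a first hint, not a diagnosis.
SIGNATURES = [
    ("lfstack.push", "image architecture mismatch: build and reference a native arm64 image"),
    ("OOMKilled", "limit below steady-state usage: raise it or find the memory consumer"),
    ("apply_config.sh", "Jenkins plugin-copy bug: empty installPlugins, pre-bake plugins"),
    ("Cassandra.Hosts: zero value", "Temporal template not rendered: switch to sprig config mounting"),
]


def triage(text: str) -> list[str]:
    """Return the hint for every known signature found in pod logs or `kubectl describe` output."""
    return [hint for needle, hint in SIGNATURES if needle in text]
```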

ImagePullBackOff and ErrImageNeverPull

```
Failed to pull image "your-registry/mattermost-team-edition:10.5.0-arm64": ... not found
```

For locally-built images on minikube, two causes dominate:

- `imagePullPolicy` not set to `Never` (or `IfNotPresent`). Kubernetes tries the registry, gets nothing, fails.
- `eval $(minikube docker-env)` was not run before `docker build`. The image landed in the host Docker daemon, not minikube’s; running `docker images` in each context shows which daemon has it. With `imagePullPolicy: Never`, this case surfaces as `ErrImageNeverPull` instead of a pull failure.
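The first cause can be caught before deploying. When `imagePullPolicy` is omitted, Kubernetes defaults it to `Always` for `:latest` or untagged images and to `IfNotPresent` otherwise, which is how an image like `your-registry/jenkins:latest` ends up being pulled from a registry that never had it. A sketch over a pod spec dict (for instance the `spec.template.spec` of a rendered Deployment); the local-image prefix and function names are assumptions.

```python
from typing import Any, Optional

# Hypothetical prefix for images built inside minikube's Docker daemon.
LOCAL_IMAGE_PREFIXES = ("your-registry/",)


def effective_pull_policy(image: str, policy: Optional[str]) -> str:
    """Apply Kubernetes' default: Always for ':latest' or untagged images, else IfNotPresent."""
    if policy:
        return policy
    last_segment = image.rsplit("/", 1)[-1]  # skip a registry host:port
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
    return "Always" if tag == "latest" else "IfNotPresent"


def pull_policy_problems(pod_spec: dict[str, Any]) -> list[str]:
    """Flag locally built images that Kubernetes would try to pull from a registry."""
    problems = []
    for container in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        image = container.get("image", "")
        if not image.startswith(LOCAL_IMAGE_PREFIXES):
            continue
        policy = effective_pull_policy(image, container.get("imagePullPolicy"))
        if policy == "Always":
            problems.append(f"{container.get('name', '?')}: {image} resolves to imagePullPolicy=Always")
    return problems
```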

Pending pods, no schedulable node

```
0/1 nodes are available: 1 Insufficient memory.
```

Single node + every chart at helm-default = wedge state. Diagnose with:

```shell
kubectl describe nodes | grep -A 10 "Allocated resources"
```

The fix is to lower requests, not raise the cap. Raising the cap pushes the same problem out by one dependency.

Operator silent rejection of alertmanager config

The smoking gun is in the operator logs, not the alertmanager logs. Alerts never deliver, alertmanager looks healthy, the chart shows deployed. The operator is rejecting the config:

```
Sync error: failed to apply alertmanager config: unknown field "title" in mattermost_configs
```

The alertmanager pod runs the previous valid config and accepts no updates. The fix is to remove the offending field — title does not exist in mattermost_configs; fold any title into the text: body. See Prometheus stack alertmanager operations for the deeper dive.
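The fold can be applied mechanically before the config goes into the chart’s alertmanager config values. A minimal sketch, assuming the standard Alertmanager layout of a `receivers` list whose entries hold per-integration `*_configs` lists; the function name is hypothetical. It returns a copy, so the original and fixed configs can be diffed.

```python
import copy
from typing import Any


def fold_mattermost_titles(alertmanager_config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy where each mattermost_configs entry has its `title` merged into `text`."""
    fixed = copy.deepcopy(alertmanager_config)
    for receiver in fixed.get("receivers", []):
        for entry in receiver.get("mattermost_configs", []):
            title = entry.pop("title", None)
            if title:
                body = entry.get("text", "")
                entry["text"] = f"{title}\n{body}" if body else title
    return fixed
```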

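`helm upgrade --reuse-values` silently keeping stale values

```shell
helm upgrade <release> ... --reuse-values -f values.yaml   # RISKY: -f is merged onto the last release's values
helm upgrade <release> ... -f values.yaml                  # SAFER: chart defaults + values.yaml
```

No warning printed. No error. With `--reuse-values`, Helm starts from the previous release’s values instead of the chart’s defaults and merges `-f` on top, so a key deleted from values.yaml keeps its old value and defaults added in a newer chart version can be skipped. The release redeploys with the stale values in place. Always verify with: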

```shell
helm get values <release> -n <namespace>
```

This trap accounts for a disproportionate share of “I changed the values and nothing happened” debugging sessions. See helm gotchas: reuse-values, revisions, rollback.

Backup discipline as a per-service problem

Backup posture across the seven services is uneven, and the unevenness is itself worth naming.

| Service | Backup status |
|---|---|
| Gitea | dedicated cron-driven script (best in fleet) |
| PostgreSQL | gap: no scheduled dumps; PV snapshot only |
| Mattermost | gap: file uploads on PV, no off-cluster copy |
| Jenkins | gap: `JENKINS_HOME` on PV; plugins are re-bakeable but jobs are not |
| Prometheus | acceptable: stack recreated from values; TSDB history is expendable |
| Temporal | partial: workflow state in PG (covered when PG is) |
| NATS | n/a: ephemeral |

The PostgreSQL gap is the most consequential. Three of the other six services (Temporal, Mattermost, Gitea), plus the application workloads, keep their state in the shared PostgreSQL. A PG loss takes Temporal workflow history, Mattermost messages, Gitea metadata, and application data with it. The best-case backup posture across the fleet is exactly as good as PostgreSQL’s, and PG has no scheduled dumps yet.
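Closing that gap starts with scheduled logical dumps. A sketch of the commands a nightly job might run: one custom-format `pg_dump` per database, plus `pg_dumpall --globals-only` for the roles, which `pg_dump` does not include. Host, user, output directory and the function name are placeholders; run each list with `subprocess.run(cmd, check=True)`, supply credentials through `PGPASSWORD` or a `.pgpass` file, and copy the output off-cluster, since a dump on the same PV shares its failure domain.

```python
from datetime import datetime, timezone
from typing import Optional


def pg_backup_commands(
    databases: list[str], host: str, user: str, out_dir: str, now: Optional[datetime] = None
) -> list[list[str]]:
    """Build argv lists: one custom-format pg_dump per database, plus roles via pg_dumpall."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    commands = [
        ["pg_dump", "--host", host, "--username", user, "--format", "custom",
         "--file", f"{out_dir}/{db}-{stamp}.dump", db]
        for db in databases
    ]
    # Roles are cluster-wide and not included in pg_dump output.
    commands.append(["pg_dumpall", "--host", host, "--username", user,
                     "--globals-only", "--file", f"{out_dir}/globals-{stamp}.sql"])
    return commands
```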

The general lesson: backups are a per-service discipline, not a per-cluster one. “We snapshot the volumes” papers over the question. Per-service it becomes “what is the recovery procedure for THIS service’s state?” The PV-snapshot answer rarely survives that translation. See single-node Kubernetes disaster recovery for the recovery-procedure side of the same problem.

When to vendor your own image

Five of the seven services run upstream images. Two — Mattermost and Jenkins — required vendoring. The decision pattern:

| Service | Choice | Why |
|---|---|---|
| Gitea | upstream (rootless) | publishes ARM64; rootless avoids permission grief on hostPath PVs |
| Mattermost | vendor own | no ARM64 image upstream; QEMU emulation crashes the Go runtime; only path is a rebuild from the binary tarball |
| PostgreSQL | upstream (Bitnami) | publishes multi-arch; chart is mature; standalone mode well-supported |
| kube-prometheus-stack | upstream | massive chart with deep CRD coupling; forking would mean fork-forever |
| Jenkins | vendor own | bundled plugin-install step in `apply_config.sh` is broken; pre-baking via `jenkins-plugin-cli` is upstream-recommended for prod anyway |
| Temporal | upstream (pinned to 0.74.0) | chart works after the `configMapsToMount: sprig` flip; 1.x major restructure deferred |
| NATS | upstream | small, simple, just works |

The vendor-own decision criterion has two halves: (a) upstream doesn’t ship the architecture you need, OR (b) the chart’s runtime install path is broken in a way that’s not fixable from the outside. Mattermost is case (a). Jenkins is case (b). Both produce a Dockerfile that’s measured in tens of lines, not hundreds:

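```dockerfile
# Mattermost ARM64 (sketch)
FROM ubuntu:22.04
ARG MM_VERSION=10.5.0
# The ubuntu base image ships without curl or CA certificates
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://releases.mattermost.com/${MM_VERSION}/mattermost-${MM_VERSION}-linux-arm64.tar.gz \
    | tar xz -C /opt/
# ...user, entrypoint, etc.
```

```dockerfile
# Jenkins with pre-baked plugins
FROM jenkins/jenkins:lts
COPY plugins.txt /usr/share/jenkins/ref/
RUN jenkins-plugin-cli --plugin-file /usr/share/jenkins/ref/plugins.txt
```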

The decision NOT to fork the helm chart matters as much as the decision to vendor the image. Every service except Mattermost and Jenkins fits in fewer than 60 lines of values.yaml. Forking trades a 50-line values file for a chart you now maintain. Chart-version drift outpaces a fork’s value within two or three upstream releases.
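Half (a) of the vendor-own criterion can be checked before anything is installed. `docker manifest inspect <image>` prints the image’s manifest list (or OCI index) as JSON, with one entry per published platform. A sketch that reads that JSON; the function name is hypothetical. A single-platform image has no `manifests` array, so the function returns `False` there, meaning “not confirmed” rather than “known missing”.

```python
import json


def publishes_platform(manifest_json: str, os_name: str = "linux", arch: str = "arm64") -> bool:
    """True if a manifest list / OCI index names the requested platform.

    A single-platform manifest has no `manifests` array, so it returns False:
    the platform is unconfirmed rather than known to be missing.
    """
    document = json.loads(manifest_json)
    for entry in document.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return True
    return False
```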

Anti-patterns

A handful of patterns recur often enough to be worth naming as anti-patterns:

- “Use the helm chart’s bundled PostgreSQL.” Fine for one service. Deadly across seven. Multi-tenant single PG with initdb creating per-service databases halves storage requests and gives a single backup target.
- “Set `--reuse-values` so the previous settings carry over.” Keys deleted from values.yaml silently survive. Always verify with `helm get values`.
- “Skip the ARM64 check, QEMU will handle it.” Works for shell utilities. Fails on Go binaries. The crash signature is `lfstack.push` deep in the Go runtime; there is no application-level fix.
- “Install Jenkins plugins at runtime via the helm chart’s `installPlugins:`.” The chart’s `apply_config.sh` is broken. Pre-bake plugins into the image.
- “Trust the alertmanager config validator.” The operator silently rejects unknown fields. Verify by tailing operator logs after every config change.
- “Helm-default `resources:` are sane defaults.” They’re sane for production multi-node clusters. On a single node they sum to a wedge state.

Quotable lessons

- Every Helm chart needs at least one customization on a single-node cluster. Plan a values file before you `helm install`.
- Helm defaults are written for production multi-node clusters. On a small cluster every “free” HA dependency is a memory tax.
- If a service publishes no ARM64 image, you’ll be vendoring your own. There is no QEMU shortcut for Go binaries.
- A multi-tenant PostgreSQL with initdb scripts beats N bundled PG instances, cutting PostgreSQL memory cost by roughly a factor of N.
- Backups are a per-service discipline, not a per-cluster one. Track each service’s plan separately or it slips.
- When `helm upgrade` doesn’t take effect, check `helm get values` first. `--reuse-values` silently carries the previous release’s values forward.
- Pre-bake Jenkins plugins. The Helm chart’s runtime install path is fragile and reduces every deploy to a coin flip.

Where this article fits

This is the meta-survey: seven services side-by-side, the cross-cutting patterns that show up only when they’re operated together, and the failure modes that span charts. For per-service depth:

- Self-hosting Gitea on Kubernetes: chart 1, the rootless image and external-PG pattern.
- Building ARM64 container images when upstream doesn’t ship them: chart 2, the Mattermost custom-image build.
- Prometheus stack alertmanager operations: chart 4, the alertmanager routing and validator-rejection trap.
- Helm gotchas: reuse-values, revisions, rollback: the cross-cutting helm operational patterns.
- Kubernetes on Apple Silicon setup gotchas: the substrate this whole survey runs on.
- Single-node Kubernetes disaster recovery: the backup-and-recovery posture the gap analysis above demands.

Read this first to understand the shape of the problem; read the per-service articles when a specific chart needs depth.