Folder Structure Strategy#

Grafana folders organize dashboards and control access through permissions. The folder structure you choose determines how teams find dashboards and who can edit them. Three patterns work in practice, each suited to a different organizational shape.

By Team#

When teams own distinct services and rarely need cross-team dashboards:

Platform/
  Node Overview
  Kubernetes Cluster
  Networking
Backend/
  API Gateway
  User Service
  Payment Service
Frontend/
  Web Vitals
  CDN Performance
Data/
  Kafka Pipelines
  ETL Jobs
  Data Quality

Each team gets Editor access to their folder and Viewer access to everything else. This works well when ownership boundaries are clear.

By Environment Layer#

When dashboards naturally group by the layer of the stack they monitor:

Infrastructure/
  Node Exporter
  Disk I/O
  Network Interfaces
Kubernetes/
  Cluster Overview
  Namespace Resources
  Pod Lifecycle
Applications/
  Service RED Metrics
  gRPC Performance
  Background Jobs
Business/
  Revenue Metrics
  User Signups
  Feature Adoption

This pattern works when a platform team owns infrastructure and Kubernetes dashboards, while product teams own application and business dashboards.

By Service (Microservices)#

For microservice architectures where each service team manages their own observability:

auth-service/
  Overview
  Detailed Latency
  Error Breakdown
checkout-service/
  Overview
  Payment Provider Latency
  Cart Abandonment
inventory-service/
  Overview
  Stock Levels
  Supplier API Health

Each service folder maps to a team’s on-call responsibility. When paged, the engineer opens their service folder and has everything they need.

RBAC and Permissions#

Built-in Roles#

Grafana has three organization-level roles:

  • Viewer: Can see dashboards and query data sources but cannot save changes or create dashboards.
  • Editor: Can create and modify dashboards, create folders, and manage alert rules.
  • Admin: Full control including user management, data source configuration, and organization settings.

Permission Hierarchy#

Permissions cascade: Organization level sets the floor, folder level can elevate, and dashboard level can elevate further. You cannot restrict below the org-level grant.

Org Role: Viewer (everyone can view everything)
  Platform/ folder: Editor for platform-team
    Specific dashboard: Admin for platform-lead
  Backend/ folder: Editor for backend-team
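The cascade rule amounts to taking the maximum grant across levels. A toy model (the constants mirror the API's permission codes; the function is an illustration, not Grafana's implementation):

```python
# Toy model of Grafana's permission cascade: the effective permission at any
# level is the maximum of the grants above it -- a lower level can elevate,
# never restrict.
VIEWER, EDITOR, ADMIN = 1, 2, 4

def effective_permission(org, folder=None, dashboard=None):
    """Return the highest grant among org, folder, and dashboard levels."""
    return max(g for g in (org, folder, dashboard) if g is not None)

# Org-wide Viewer, folder grants Editor to platform-team:
assert effective_permission(VIEWER, folder=EDITOR) == EDITOR
# A folder "grant" below the org floor has no effect:
assert effective_permission(EDITOR, folder=VIEWER) == EDITOR
```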

Configure folder permissions through the Grafana UI (folder settings) or the API:

# Grant Editor access to a team on a folder via the API
# (note: this call replaces the folder's existing permission list)
curl -X POST http://localhost:3000/api/folders/platform/permissions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -d '{
    "items": [
      {"teamId": 3, "permission": 2}
    ]
  }'
# permission: 1 = Viewer, 2 = Editor, 4 = Admin

Service Accounts#

Service accounts provide API access for automation without tying credentials to a human user. Create them for CI/CD pipelines, provisioning scripts, and external integrations.

# Create a service account
curl -X POST http://localhost:3000/api/serviceaccounts \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"name": "ci-provisioner", "role": "Editor"}'

# Create a token for the service account (the id comes from the create response)
curl -X POST http://localhost:3000/api/serviceaccounts/1/tokens \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"name": "ci-token"}'

Team-Based Access#

Create teams that mirror your organizational structure, then assign folder permissions to teams rather than individuals:

# Create a team
curl -X POST http://localhost:3000/api/teams \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"name": "backend-team"}'

# Add a user to the team
curl -X POST http://localhost:3000/api/teams/1/members \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"userId": 5}'

When someone joins or leaves a team, updating team membership automatically adjusts their dashboard access across all folders.
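The join/leave flow can be scripted as a reconciliation: compute the difference between a desired roster and the current membership, then issue the corresponding API calls. A sketch with the API calls left as comments (the user IDs and roster source are illustrative):

```python
# Sketch: reconcile a Grafana team's membership against a desired roster.
# Only the set arithmetic is shown; the API calls are noted in comments.
def membership_diff(desired_user_ids, current_user_ids):
    """Return (to_add, to_remove) sets of user IDs for a team."""
    desired, current = set(desired_user_ids), set(current_user_ids)
    return desired - current, current - desired

to_add, to_remove = membership_diff(desired_user_ids={5, 7, 9},
                                    current_user_ids={5, 6})
# For each id in to_add:    POST   /api/teams/:teamId/members
# For each id in to_remove: DELETE /api/teams/:teamId/members/:userId
assert to_add == {7, 9} and to_remove == {6}
```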

Dashboard Provisioning#

File-Based Provisioning#

Grafana watches a provisioning directory and loads dashboards from JSON files on disk. Configure the provider:

# grafana/provisioning/dashboards/default.yml
apiVersion: 1
providers:
  - name: default
    orgId: 1
    folder: "Infrastructure"
    type: file
    disableDeletion: true
    editable: false        # prevents UI edits to provisioned dashboards
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards/infrastructure
      # Alternative: foldersFromFilesStructure: true derives folders from the
      # directory layout, but it requires the `folder` option above to be empty

Setting editable: false makes provisioned dashboards read-only in the UI. This prevents drift between the source-of-truth files and what Grafana displays.

ConfigMap Sidecar in Kubernetes#

With kube-prometheus-stack, a sidecar container watches for ConfigMaps with a specific label and loads their JSON content as dashboards:

# Helm values for kube-prometheus-stack
grafana:
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      labelValue: "1"
      folder: /tmp/dashboards
      searchNamespace: ALL    # watch all namespaces
      folderAnnotation: grafana_folder  # use annotation to set folder

Deploy a dashboard as a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: api-gateway-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
  annotations:
    grafana_folder: "Applications"
data:
  api-gateway.json: |-
    {
      "title": "API Gateway",
      "uid": "api-gateway-001",
      "panels": [...]
    }

The grafana_folder annotation controls which Grafana folder the dashboard appears in. Without it, the dashboard lands in the General folder.

Terraform Provider#

Manage dashboards as infrastructure with the Grafana Terraform provider:

resource "grafana_folder" "platform" {
  title = "Platform"
}

resource "grafana_dashboard" "node_overview" {
  folder      = grafana_folder.platform.id
  config_json = file("${path.module}/dashboards/node-overview.json")
}

resource "grafana_folder_permission" "platform_editors" {
  folder_uid = grafana_folder.platform.uid

  permissions {
    team_id    = grafana_team.platform.id
    permission = "Editor"
  }

  permissions {
    role       = "Viewer"
    permission = "Viewer"
  }
}

Grafonnet (Jsonnet)#

Generate dashboards programmatically when you have many similar services that need consistent panels:

local grafana = import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet';

local serviceOverview(name) =
  grafana.dashboard.new('%s Overview' % name)
  + grafana.dashboard.withUid('%s-overview' % std.asciiLower(name))
  + grafana.dashboard.withPanels([
    grafana.panel.timeSeries.new('Request Rate')
    + grafana.panel.timeSeries.queryOptions.withTargets([
      grafana.query.prometheus.new('Prometheus',
        'sum(rate(http_requests_total{service="%s"}[5m]))' % name)
    ])
    + grafana.panel.timeSeries.standardOptions.withUnit('reqps')
    + grafana.panel.timeSeries.gridPos.withW(12) + grafana.panel.timeSeries.gridPos.withH(8),

    grafana.panel.timeSeries.new('Error Rate')
    + grafana.panel.timeSeries.queryOptions.withTargets([
      grafana.query.prometheus.new('Prometheus',
        'sum(rate(http_requests_total{service="%s",status_code=~"5.."}[5m])) / sum(rate(http_requests_total{service="%s"}[5m]))' % [name, name])
    ])
    + grafana.panel.timeSeries.standardOptions.withUnit('percentunit')
    + grafana.panel.timeSeries.gridPos.withW(12) + grafana.panel.timeSeries.gridPos.withH(8) + grafana.panel.timeSeries.gridPos.withX(12),
  ]);

{
  'auth-service.json': serviceOverview('auth-service'),
  'checkout-service.json': serviceOverview('checkout-service'),
  'inventory-service.json': serviceOverview('inventory-service'),
}

Build with: jsonnet -J vendor -m output/ dashboards.jsonnet

Dashboard Versioning and Lifecycle#

Built-in Versioning#

Grafana stores a version history for every dashboard save. Access it through the dashboard settings (gear icon) under “Versions.” You can diff any two versions and restore a previous one. This is a safety net, not a source-of-truth strategy.

GitOps Workflow#

The production-grade approach stores dashboard JSON in Git and provisions from there:

  1. Developer modifies a dashboard in Grafana UI (on a non-provisioned copy or a dev instance).
  2. Export the dashboard JSON via the API or the “Share” menu.
  3. Commit the JSON to the Git repository under the appropriate folder.
  4. CI/CD pipeline deploys the JSON as a ConfigMap (Kubernetes) or copies it to the provisioning directory.
  5. Grafana picks up the change automatically.

# Export a dashboard by UID (null the instance-specific id before committing)
curl -s "http://localhost:3000/api/dashboards/uid/api-gateway-001" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" | jq '.dashboard | .id = null' > dashboards/api-gateway.json

Handling Drift#

When dashboards are provisioned from Git but someone edits one in the UI, the UI copy diverges. Two strategies:

  • Provisioned dashboards (recommended): Set editable: false in the provisioning config. The dashboard is read-only in the UI. Changes must go through Git.
  • Reconciliation: Run a periodic job that exports all dashboards from Grafana and diffs against Git. Flag any drift for review.
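The reconciliation check reduces to comparing a live export against the Git copy after stripping the fields Grafana rewrites on every save. A minimal sketch (the volatile-field list is an assumption; extend it for your Grafana version):

```python
import json

# Compare an exported dashboard against the Git copy, ignoring fields that
# Grafana mutates on save (instance id, save counter, variable iteration).
VOLATILE_FIELDS = ("id", "version", "iteration")

def normalized(dashboard: dict) -> str:
    d = {k: v for k, v in dashboard.items() if k not in VOLATILE_FIELDS}
    return json.dumps(d, sort_keys=True)

def has_drifted(exported: dict, in_git: dict) -> bool:
    return normalized(exported) != normalized(in_git)

git_copy = {"uid": "api-gateway-001", "title": "API Gateway", "panels": []}
live = {**git_copy, "version": 7, "title": "API Gateway (edited)"}
assert has_drifted(live, git_copy)                            # UI edit changed the title
assert not has_drifted({**git_copy, "version": 8}, git_copy)  # version alone is ignored
```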

Dashboard Maintenance#

Detecting Broken Queries#

After metric renames, label changes, or Prometheus upgrades, panels silently break and show “No data.” Proactively detect these:

# Step 1: enumerate all dashboard UIDs via the search API
curl -s "http://localhost:3000/api/search?type=dash-db&limit=100" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" | jq -r '.[].uid'

Automate the check by fetching each dashboard, extracting every panel’s expression, querying it against Prometheus, and flagging any that return empty results.
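The flagging loop itself is small. A sketch with the Prometheus call injected as a function so the logic stands alone (`query_fn` would wrap a call to Prometheus’s `/api/v1/query` endpoint; the metric names are invented):

```python
# Run each panel expression through a query function and collect the panels
# whose query returns no series at all.
def find_broken_panels(panel_exprs, query_fn):
    """Return (dashboard_uid, panel_title) pairs whose query returns nothing."""
    return [(uid, title) for uid, title, expr in panel_exprs
            if not query_fn(expr)]

# Fake Prometheus: only one metric still exists after a rename.
fake = lambda expr: [{"value": 1}] if "http_requests_total" in expr else []
panels = [
    ("api-gw", "Request Rate", 'rate(http_requests_total[5m])'),
    ("api-gw", "Old Metric",   'rate(http_request_count[5m])'),  # renamed away
]
assert find_broken_panels(panels, fake) == [("api-gw", "Old Metric")]
```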

Identifying Orphaned Dashboards#

Dashboards nobody views clutter the folder tree, slow down everyone who scrolls past them, and add maintenance burden without providing value.

If you have usage insights enabled (a Grafana Enterprise/Cloud feature), query the backing SQLite or PostgreSQL database. The usage-stats table and columns below are illustrative; check the schema for your Grafana version:

SELECT d.title, d.uid, d.updated
FROM dashboard d
LEFT JOIN dashboard_usage_stats ds ON d.id = ds.dashboard_id
WHERE ds.views_last_30_days IS NULL OR ds.views_last_30_days = 0
ORDER BY d.updated ASC;

Dashboard Standards#

Establish conventions that every dashboard follows:

  • Variable names: Always use $namespace, $pod, $node, $cluster. Never use $ns or $hostname.
  • Time ranges: Default to “Last 1 hour” for operational dashboards, “Last 24 hours” for capacity dashboards.
  • Color scheme: Green for healthy, yellow for warning, red for critical. Use the same thresholds across all dashboards.
  • Row organization: Put the most important (user-facing) panels at the top. Infrastructure details go in collapsed rows at the bottom.
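Conventions are only useful if they are enforced. A minimal lint sketch that scans a dashboard’s template variables for banned names (the banned-name map encodes the bullets above; the dashboard shape is the standard Grafana JSON model):

```python
# Flag template variables that violate the naming convention.
BANNED = {"ns": "namespace", "hostname": "node"}

def lint_variables(dashboard: dict):
    """Return 'rename $x -> $y' messages for non-conforming variable names."""
    variables = dashboard.get("templating", {}).get("list", [])
    return [f"rename ${v['name']} -> ${BANNED[v['name']]}"
            for v in variables if v.get("name") in BANNED]

dash = {"templating": {"list": [{"name": "ns"}, {"name": "pod"}]}}
assert lint_variables(dash) == ["rename $ns -> $namespace"]
```

Run a check like this in CI against every JSON file in the dashboards repository so violations are caught before they are provisioned.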

Multi-Tenancy#

Grafana Organizations#

For hard multi-tenancy (separate teams or customers that must not see each other’s data), use Grafana Organizations. Each org has its own set of dashboards, data sources, users, and teams.

# Create a new organization
curl -X POST http://localhost:3000/api/orgs \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{"name": "Team Alpha"}'

Each organization can have its own Prometheus data source pointing to a different Prometheus instance or using different basic auth credentials with Thanos/Mimir multi-tenancy.
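One way to wire this up is to provision each organization’s Prometheus data source with a tenant header that the Mimir or Thanos gateway enforces. A sketch; the gateway URL and tenant ID are placeholders, and the header name assumes Mimir’s default:

```yaml
# grafana/provisioning/datasources/team-alpha.yml (provisioned into org 2)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    orgId: 2
    url: http://mimir-gateway.monitoring:8080/prometheus
    jsonData:
      httpHeaderName1: X-Scope-OrgID   # tenant header Mimir expects
    secureJsonData:
      httpHeaderValue1: team-alpha     # tenant ID (placeholder)
```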

Row-Level Security with Variables#

For soft multi-tenancy (teams share a Grafana instance but see only their data), use dashboard variables that filter by a tenant label:

-- Variable query: only show namespaces owned by the logged-in user
-- (Grafana exposes ${__user.login}, not team membership, so map users to
-- namespaces through a label convention such as label_owner)
label_values(kube_namespace_labels{label_owner="${__user.login}"}, namespace)

In practice, enforce this by setting the namespace variable’s default to the team’s namespace and removing the “All” option. This is not a security boundary – a determined user can modify the query – but it prevents accidental cross-team data access.

Alerting: Grafana vs Alertmanager#

Use Alertmanager when alerts are purely Prometheus-based. Alertmanager provides gossip-based HA deduplication, a mature routing tree, and integration with the Prometheus ecosystem.

Use Grafana Alerting when you need to:

  • Alert on Loki log queries (e.g., error log count exceeds threshold).
  • Combine conditions across multiple data sources (e.g., Prometheus metric AND Loki log pattern).
  • Alert on data sources that Alertmanager cannot evaluate (Elasticsearch, CloudWatch, InfluxDB).

Avoid duplicating the same alert condition in both systems. If a Prometheus alerting rule already exists for a condition, do not recreate it in Grafana.

Backup and Recovery#

Database Backup#

Grafana stores dashboard metadata, user accounts, org settings, annotations, and alert state in its database (SQLite by default, PostgreSQL or MySQL for production).

# SQLite backup (if using default)
sqlite3 /var/lib/grafana/grafana.db ".backup /backups/grafana-$(date +%Y%m%d).db"

# PostgreSQL backup
pg_dump -h grafana-db.internal -U grafana grafana_db > /backups/grafana-$(date +%Y%m%d).sql

Dashboard Export Script#

Export all dashboards as JSON for disaster recovery or migration:

#!/bin/bash
GRAFANA_URL="http://localhost:3000"
API_KEY="your-api-key"
OUTPUT_DIR="./grafana-backup/dashboards"

mkdir -p "$OUTPUT_DIR"

# Get all dashboard UIDs
uids=$(curl -s "${GRAFANA_URL}/api/search?type=dash-db&limit=5000" \
  -H "Authorization: Bearer ${API_KEY}" | jq -r '.[].uid')

for uid in $uids; do
  dashboard=$(curl -s "${GRAFANA_URL}/api/dashboards/uid/${uid}" \
    -H "Authorization: Bearer ${API_KEY}")
  folder=$(echo "$dashboard" | jq -r '.meta.folderTitle // "General"')
  title=$(echo "$dashboard" | jq -r '.dashboard.title' | tr '/' '_')  # '/' would break the path

  mkdir -p "${OUTPUT_DIR}/${folder}"
  echo "$dashboard" | jq '.dashboard' > "${OUTPUT_DIR}/${folder}/${title}.json"
  echo "Exported: ${folder}/${title}"
done

Run this script on a schedule (daily via cron or a Kubernetes CronJob) and push the output to a Git repository for version-controlled backups.