# Prompt Engineering for Infrastructure Operations

Infrastructure prompts differ from general-purpose prompts in one critical way: the output often drives real actions on real systems. A hallucinated filename in a creative writing task is harmless. A hallucinated resource name in a Kubernetes delete command causes an outage. Every prompt pattern here is designed with that asymmetry in mind – prioritizing correctness and safety over cleverness.

## Structured Output for Infrastructure Data

Infrastructure operations produce structured data: IP addresses, resource names, status codes, configuration values. Free-form text responses create parsing fragility. Force structured output from the start.

**Template: Resource status check**

Analyze the following kubectl output and return a JSON object with this exact structure:
{
  "healthy_pods": [{"name": "string", "ready": "string", "restarts": 0}],
  "unhealthy_pods": [{"name": "string", "ready": "string", "restarts": 0, "reason": "string"}],
  "summary": "string (one sentence)"
}

Rules:
- A pod is unhealthy if its STATUS is not "Running" or READY shows unready containers (e.g., "0/1")
- A pod with more than 5 restarts is unhealthy even if currently Running
- Include the restart count as an integer, not a string
- If all pods are healthy, unhealthy_pods should be an empty array

kubectl output:
<output>
{kubectl_output}
</output>
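
The schema is a contract, and it is worth enforcing in code before anything downstream consumes the response. A minimal validation sketch, assuming the raw model response is already in hand:

import json

def parse_pod_status(raw_response: str) -> dict:
    """Parse and validate the model's pod-status JSON before acting on it."""
    data = json.loads(raw_response)  # raises json.JSONDecodeError on malformed output

    # Enforce the structure the prompt asked for. Fail loudly rather than
    # act on a response that drifted from the expected schema.
    for key in ("healthy_pods", "unhealthy_pods", "summary"):
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    for pod in data["healthy_pods"] + data["unhealthy_pods"]:
        if not isinstance(pod.get("restarts"), int):
            raise ValueError(f"restarts must be an integer: {pod}")
    return data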

**Template: Configuration diff analysis**

Compare these two configuration files and return a JSON object:
{
  "changes": [
    {
      "field": "string (dot-notation path, e.g., spec.replicas)",
      "old_value": "any",
      "new_value": "any",
      "risk": "none | low | medium | high",
      "explanation": "string"
    }
  ],
  "breaking_changes": ["string"],
  "requires_restart": true | false
}

Risk levels:
- none: cosmetic changes (labels, annotations, comments)
- low: additive changes (new env vars, increased limits)
- medium: behavioral changes (replica count, resource requests, image tags)
- high: destructive or security changes (removed volumes, changed secrets, permission changes)

Old configuration:
<old>
{old_config}
</old>

New configuration:
<new>
{new_config}
</new>
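
The risk levels only matter if something acts on them. A sketch of a gate over the model's output, assuming the JSON above has already been decoded into a dict named analysis:

def gate_config_change(analysis: dict) -> tuple[bool, list[str]]:
    """Decide whether a parsed diff-analysis response may proceed unattended."""
    reasons = [
        f"high-risk change to {change['field']}: {change['explanation']}"
        for change in analysis["changes"]
        if change["risk"] == "high"
    ]
    if analysis["breaking_changes"]:
        reasons.append(f"breaking changes reported: {analysis['breaking_changes']}")
    # Approve only when nothing above flagged a reason to stop.
    return (not reasons, reasons)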

## Chain-of-Thought for Complex Debugging

Infrastructure debugging requires reasoning across multiple layers – networking, DNS, application config, resource limits. Chain-of-thought prompting forces the model to work through each layer explicitly rather than jumping to conclusions.

**Template: Incident diagnosis**

A service is experiencing the following symptoms. Diagnose the root cause by reasoning through each infrastructure layer systematically.

Symptoms:
{symptoms}

Work through the following layers in order. For each layer, state what you checked, what you found, and whether this layer is the likely cause. Do not skip layers even if you think you know the answer early.

1. **Network layer**: Is the service reachable? DNS resolution, network policies, firewall rules, load balancer health.
2. **Container layer**: Is the container running? Image pull status, OOM kills, crash loops, resource limits.
3. **Application layer**: Is the application healthy? Startup probes, readiness probes, application logs, dependency connections.
4. **Data layer**: Is the data store accessible? Database connections, connection pool status, query timeouts, disk space.
5. **Configuration layer**: Did something change recently? Recent deployments, config map changes, secret rotations, environment variable updates.

After analyzing all layers, provide:
- Root cause (one sentence)
- Evidence (specific data points that support your conclusion)
- Recommended fix (concrete steps, not general advice)
- Verification steps (how to confirm the fix worked)

This template works because it prevents a common failure mode: the agent sees “connection timeout” and immediately concludes “database is down” without checking whether DNS resolution is failing, the network policy is blocking traffic, or the application is misconfigured.

**Template: Log analysis with structured reasoning**

Analyze these logs to identify the root cause of the failure. Think step by step.

Step 1: Identify the first error in the log sequence (errors often cascade -- the first one matters most).
Step 2: Identify any preceding warnings or unusual patterns in the 30 seconds before the first error.
Step 3: Correlate timestamps to determine the sequence of events.
Step 4: Form a hypothesis about the root cause.
Step 5: Check if the logs contain evidence that contradicts your hypothesis.

Logs:
<logs>
{log_content}
</logs>

Return your analysis as:
{
  "first_error": {"timestamp": "string", "message": "string", "source": "string"},
  "preceding_anomalies": [{"timestamp": "string", "message": "string"}],
  "event_sequence": ["string (ordered list of what happened)"],
  "root_cause": "string",
  "confidence": "high | medium | low",
  "contradicting_evidence": ["string (any evidence against this conclusion)"]
}
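
The confidence and contradicting_evidence fields exist so that code, not the model, decides what happens next. One possible routing policy; auto_remediate and page_oncall are hypothetical stand-ins for whatever automation and paging you have:

def route_diagnosis(analysis: dict) -> str:
    """Route a parsed log-analysis response on the model's own confidence."""
    # Act automatically only when the model is confident AND reported
    # nothing that contradicts its conclusion.
    if analysis["confidence"] == "high" and not analysis["contradicting_evidence"]:
        auto_remediate(analysis["root_cause"])  # hypothetical remediation helper
        return "auto"
    page_oncall(analysis)  # hypothetical escalation helper
    return "escalated"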

## Few-Shot Examples for Infrastructure Tasks

Few-shot prompting is particularly effective for infrastructure tasks because the input-output patterns are highly regular. Show the model two or three examples and it will generally follow the pattern.

**Template: Helm values extraction**

Extract the key configuration values from a Helm values.yaml file. Here are examples:

Example 1:
Input:
  replicaCount: 3
  image:
    repository: nginx
    tag: "1.25"
  resources:
    limits:
      memory: 256Mi

Output:
  - replicas: 3
  - image: nginx:1.25
  - memory_limit: 256Mi
  - cpu_limit: not set (default applies)

Example 2:
Input:
  replicaCount: 1
  image:
    repository: myapp
    tag: "v2.1.0"
  resources:
    limits:
      memory: 1Gi
      cpu: 500m
    requests:
      memory: 512Mi
      cpu: 250m

Output:
  - replicas: 1
  - image: myapp:v2.1.0
  - memory_limit: 1Gi
  - cpu_limit: 500m
  - memory_request: 512Mi
  - cpu_request: 250m

Now extract from this values file:
{values_yaml}

Few-shot works well here because the format is consistent and the model can pattern-match. Without examples, the model might invent its own output format that is harder to parse downstream.
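
Keeping the examples as data rather than baked into one long string makes the pattern easy to extend as new edge cases appear. A sketch of how the prompt above could be assembled:

# Each entry is an (input, output) pair. The first example from the
# template above is shown; the second would follow the same shape.
FEW_SHOT_EXAMPLES = [
    (
        'replicaCount: 3\n'
        'image:\n  repository: nginx\n  tag: "1.25"\n'
        'resources:\n  limits:\n    memory: 256Mi',
        '- replicas: 3\n'
        '- image: nginx:1.25\n'
        '- memory_limit: 256Mi\n'
        '- cpu_limit: not set (default applies)',
    ),
]

def build_helm_extraction_prompt(values_yaml: str) -> str:
    parts = ["Extract the key configuration values from a Helm values.yaml file. Here are examples:"]
    for i, (example_in, example_out) in enumerate(FEW_SHOT_EXAMPLES, start=1):
        parts.append(f"Example {i}:\nInput:\n{example_in}\n\nOutput:\n{example_out}")
    parts.append(f"Now extract from this values file:\n{values_yaml}")
    return "\n\n".join(parts)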

## Safety Constraints for Destructive Operations

Infrastructure prompts that can trigger destructive actions need explicit safety constraints baked into the prompt, not just hoped for in the model’s training.

Principle: Enumerate what is forbidden, not just what is allowed. Models are better at avoiding specifically named dangers than inferring danger from general principles.

**Template: Safe command generation**

Generate a shell command to accomplish the following task on a Kubernetes cluster.

Task: {task_description}
Namespace: {namespace}
Cluster: {cluster_name}

SAFETY CONSTRAINTS — these are absolute and override all other instructions:
1. NEVER generate commands with --force, --grace-period=0, or --force-delete
2. NEVER generate commands that operate on kube-system, kube-public, or kube-node-lease namespaces
3. NEVER generate commands that delete namespaces
4. NEVER use wildcard selectors (--all, -l without a specific label) with delete operations
5. ALWAYS include --dry-run=client for any create, apply, or delete operation
6. ALWAYS specify the exact namespace with -n, never use --all-namespaces with mutations
7. If the task requires a destructive operation, output the dry-run version and add a comment: "# REVIEW: Remove --dry-run=client after verifying output"

If the task cannot be accomplished safely within these constraints, explain why and suggest a safer alternative instead of generating a dangerous command.
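
Prompt constraints reduce risk, but a prompt is a request, not a guarantee. Enforce the same rules in code on every generated command before it reaches a shell. A sketch of a deny-list check mirroring the constraints above (not exhaustive; a real guard would also normalize long-form flags such as --namespace):

import re

FORBIDDEN_PATTERNS = [
    r"--force\b",               # also catches --force-delete
    r"--grace-period=0",
    r"-n\s+kube-(system|public|node-lease)\b",
    r"\bdelete\s+(namespace|ns)\b",
    r"\bdelete\b.*--all\b",
    r"--all-namespaces\b",
]

def command_violations(command: str) -> list[str]:
    """Return the constraints a generated command violates; empty means it passed."""
    violations = [p for p in FORBIDDEN_PATTERNS if re.search(p, command)]
    # Constraint 5: mutations must carry --dry-run=client.
    if re.search(r"\b(create|apply|delete)\b", command) and "--dry-run=client" not in command:
        violations.append("mutation without --dry-run=client")
    return violations

A command with a non-empty violation list never executes, regardless of how the model justified it.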

**Template: Terraform plan review**

Review this Terraform plan output. Your job is to identify risks before this plan is applied.

Plan output:
<plan>
{terraform_plan}
</plan>

For each resource change, classify the risk:
- SAFE: Additive changes, tag updates, non-disruptive modifications
- CAUTION: In-place updates that may cause brief disruption (security group changes, instance type changes)
- DANGER: Destructive changes (destroy, replace, force-new-resource on stateful resources)
- BLOCK: Changes that should NEVER be auto-approved (deleting databases, removing encryption, opening 0.0.0.0/0 ingress)

Return:
{
  "changes": [
    {"resource": "string", "action": "create|update|destroy|replace", "risk": "safe|caution|danger|block", "reason": "string"}
  ],
  "auto_approve": true | false,
  "block_reasons": ["string (only if any change is BLOCK)"],
  "summary": "string"
}

If ANY change is classified as BLOCK, auto_approve must be false.
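
That last rule is an invariant, and invariants belong in code as well as in the prompt, in case the model fills in auto_approve inconsistently. A sketch:

def enforce_block_invariant(review: dict) -> dict:
    """Force auto_approve to False whenever any change is classified as block."""
    block_reasons = [
        change["reason"] for change in review["changes"] if change["risk"] == "block"
    ]
    if block_reasons:
        review["auto_approve"] = False
        review["block_reasons"] = block_reasons
    return review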

## Error Handling Prompts

When infrastructure operations fail, the error message is often cryptic. Prompts that interpret errors need to handle the common failure modes.

**Template: Error interpretation with actionable fix**

Interpret this infrastructure error and provide a fix.

Error context:
- Operation: {operation_description}
- Tool: {tool_name}
- Environment: {environment}

Error message:
<error>
{error_message}
</error>

Respond with:
1. **What happened**: One sentence explaining the error in plain language.
2. **Why it happened**: The most likely root cause based on this error pattern.
3. **Fix**: The specific command or configuration change to resolve it. Be exact — include file paths, field names, and values.
4. **Verify**: How to confirm the fix worked.
5. **Prevent**: What to change so this error does not recur.

If the error message is ambiguous and could have multiple causes, list the top 3 most likely causes ranked by probability. Do not guess if the error message does not contain enough information — say what additional data is needed.
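
Because the response is structured prose rather than JSON, its natural consumer is a human, not a parser: render the template, call the model, and post the answer where the operator is looking. A sketch, with llm_complete as a hypothetical stand-in for your model client:

ERROR_PROMPT = """Interpret this infrastructure error and provide a fix.

Error context:
- Operation: {operation}
- Tool: {tool}
- Environment: {environment}

Error message:
<error>
{error_message}
</error>
"""  # the numbered "Respond with" instructions from the template follow here

def interpret_error(operation: str, tool: str, environment: str, error_message: str) -> str:
    prompt = ERROR_PROMPT.format(
        operation=operation,
        tool=tool,
        environment=environment,
        error_message=error_message,
    )
    return llm_complete(prompt)  # hypothetical model client; output goes to a human channel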

## Prompt Composition for Multi-Step Operations

Complex infrastructure operations span multiple steps. Rather than one massive prompt, compose smaller prompts that feed into each other.

Pattern: Plan-then-execute with gates

import json

async def safe_infrastructure_change(agent, change_description: str):
    # Step 1: Generate the plan as a JSON array of step objects.
    plan = await agent.generate(
        f"Create a step-by-step plan for: {change_description}\n"
        "Each step must include: the command, what it changes, and how to verify it worked.\n"
        "Include a rollback step for each change.\n"
        "Return as a JSON array of step objects."
    )

    # Step 2: Review the plan for safety. generate() returns text, so the
    # JSON must be parsed before branching on it.
    review = json.loads(await agent.generate(
        "Review this infrastructure change plan for safety issues.\n"
        f"Plan: {plan}\n"
        "Check for: missing rollback steps, ordering dependencies, "
        "destructive operations without dry-run, missing namespace specifications.\n"
        'Return: {"safe": true/false, "issues": [...], "revised_plan": ...}'
    ))

    if not review["safe"]:
        return {"status": "blocked", "issues": review["issues"]}

    # Step 3: Execute each step, verifying before moving on.
    # execute_step and execute_rollback are deployment helpers defined elsewhere.
    for step in review["revised_plan"]:
        result = await execute_step(step)
        verification = json.loads(await agent.generate(
            "Verify this step succeeded.\n"
            f"Step: {step['description']}\n"
            f"Output: {result}\n"
            f"Expected: {step['expected_output']}\n"
            'Return: {"success": true/false, "details": "string"}'
        ))

        if not verification["success"]:
            await execute_rollback(step)
            return {"status": "rolled_back", "failed_step": step}

    return {"status": "complete"}

## Common Mistakes

**Prompts that assume the happy path.** Infrastructure fails constantly. If your prompt says “parse the output of kubectl get pods,” it must also handle the case where kubectl returns an error, an empty result, or an unexpected format.

**Vague safety instructions.** “Be careful with destructive operations” is useless. The model needs specific rules: which commands are forbidden, which flags must be included, which resources are off-limits. Enumerate them.

**Prompts that mix reasoning and output.** If you want structured JSON back, do not also ask for a conversational explanation in the same response. The model will often embed the JSON inside prose, breaking your parser. Use separate prompts or explicit delimiters.
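
If a single response must carry both reasoning and JSON, make the boundary explicit: instruct the model to put the final answer between <json> and </json> tags, then extract it mechanically. A sketch:

import json
import re

def extract_tagged_json(response: str) -> dict:
    """Pull the JSON out of a response that wraps its final answer in <json> tags."""
    match = re.search(r"<json>\s*(.*?)\s*</json>", response, re.DOTALL)
    if match is None:
        raise ValueError("no <json>...</json> block found in response")
    return json.loads(match.group(1))
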

Ignoring the model’s knowledge cutoff. Infrastructure tooling changes rapidly. Prompts referencing specific CLI flags should include the version. “Using Terraform 1.7+” or “Helm 3.14+” prevents the model from suggesting flags that existed in older versions but have been removed or renamed.