Structured Output Patterns#
Agents need structured data from LLMs – not free-form text with JSON somewhere inside it. When an agent asks a model to classify a bug as critical/medium/low and gets back a paragraph explaining the classification, the agent cannot act on it programmatically. Structured output is the bridge between LLM reasoning and deterministic code.
Three Approaches#
JSON Mode#
The simplest approach. Tell the API to return valid JSON and describe the shape you want in the prompt.
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": """Analyze this error log and return JSON with this exact structure:
{"severity": "critical|high|medium|low", "category": "string", "summary": "string", "actionable": true|false}
Error log: Connection refused to database at 10.0.1.5:5432 after 30s timeout""",
    }],
)

result = json.loads(response.choices[0].message.content)

JSON mode guarantees syntactically valid JSON. It does not guarantee the JSON matches your schema. The model might return {"answer": "it's a critical error"} instead of the structure you asked for. You must validate.
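A quick shape check before the agent acts on result catches that failure mode; the field names below mirror the prompt above, and the validation pipeline later in this section does the same job with better error reporting.

```python
REQUIRED_KEYS = {"severity", "category", "summary", "actionable"}

# JSON mode guarantees valid JSON, not your shape, so check before acting on it
missing = REQUIRED_KEYS - result.keys()
if missing:
    raise ValueError(f"Model response missing fields: {sorted(missing)}")
if result["severity"] not in {"critical", "high", "medium", "low"}:
    raise ValueError(f"Unexpected severity value: {result['severity']!r}")
```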
Function Calling / Tool Use#
Define the output shape as a function schema. The model returns structured arguments that match the schema.
tools = [{
    "type": "function",
    "function": {
        "name": "report_analysis",
        "description": "Report the analysis results",
        "parameters": {
            "type": "object",
            "properties": {
                "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
                "category": {"type": "string"},
                "summary": {"type": "string", "maxLength": 200},
                "actionable": {"type": "boolean"},
            },
            "required": ["severity", "category", "summary", "actionable"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "report_analysis"}},
    messages=[{"role": "user", "content": f"Analyze this error: {error_log}"}],
)

args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)

Function calling provides stronger schema adherence than JSON mode. The model is trained to emit arguments matching the declared schema. Enums, required fields, and type constraints are generally respected.
Structured Outputs (Strict Mode)#
OpenAI and Anthropic both offer strict structured output modes that guarantee schema compliance at the API level – the response will always match your schema or the request will fail.
# OpenAI strict mode
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "error_analysis",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
                    "category": {"type": "string"},
                    "summary": {"type": "string"},
                    "actionable": {"type": "boolean"},
                },
                "required": ["severity", "category", "summary", "actionable"],
                "additionalProperties": False,
            },
        },
    },
    messages=[{"role": "user", "content": f"Analyze this error: {error_log}"}],
)

Strict mode is the most reliable option when available. It constrains the model's token generation to only produce schema-compliant output. The tradeoff: schema restrictions apply (no optional fields in some implementations, limited nesting depth).
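Anthropic's equivalent is typically forced tool use: declare your schema as a tool's input_schema and require the model to call it. A minimal sketch, assuming the anthropic Python SDK and the same error_log variable (the model name is just a tool-capable example):

```python
import anthropic

anthropic_client = anthropic.Anthropic()

response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumption: any tool-capable Claude model works
    max_tokens=1024,
    tools=[{
        "name": "report_analysis",
        "description": "Report the analysis results",
        "input_schema": {
            "type": "object",
            "properties": {
                "severity": {"type": "string", "enum": ["critical", "high", "medium", "low"]},
                "category": {"type": "string"},
                "summary": {"type": "string"},
                "actionable": {"type": "boolean"},
            },
            "required": ["severity", "category", "summary", "actionable"],
        },
    }],
    tool_choice={"type": "tool", "name": "report_analysis"},  # force the structured call
    messages=[{"role": "user", "content": f"Analyze this error: {error_log}"}],
)

# The structured arguments arrive as the input of a tool_use content block
args = next(block.input for block in response.content if block.type == "tool_use")
```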
Validation Pipeline#
Even with strict mode, build a validation layer. Providers change behavior, models get updated, and edge cases appear in production that did not appear in testing.
import json
from typing import Literal

from pydantic import BaseModel, ValidationError, field_validator


class OutputParsingError(Exception):
    """Raised when the LLM response cannot be parsed into the expected schema."""


class ErrorAnalysis(BaseModel):
    severity: Literal["critical", "high", "medium", "low"]
    category: str
    summary: str
    actionable: bool

    @field_validator("summary")
    @classmethod
    def summary_not_empty(cls, v: str) -> str:
        if len(v.strip()) < 10:
            raise ValueError("Summary too short to be useful")
        return v.strip()


def parse_llm_response(raw: str) -> ErrorAnalysis:
    try:
        data = json.loads(raw)
        return ErrorAnalysis(**data)
    except json.JSONDecodeError as e:
        raise OutputParsingError(f"Invalid JSON: {e}")
    except ValidationError as e:
        raise OutputParsingError(f"Schema violation: {e}")

In TypeScript, Zod fills the same role:
import { z } from "zod";

const ErrorAnalysis = z.object({
  severity: z.enum(["critical", "high", "medium", "low"]),
  category: z.string().min(1),
  summary: z.string().min(10),
  actionable: z.boolean(),
});

function parseLLMResponse(raw: string) {
  const data = JSON.parse(raw);
  return ErrorAnalysis.parse(data);
}

Handling Malformed Output#
When parsing fails, retry with the error message included in context. This gives the model a concrete signal about what went wrong.
async def get_structured_output(prompt: str, schema: type[BaseModel], max_retries: int = 2):
    # Assumes `client` is an async client (openai.AsyncOpenAI()).
    messages = [{"role": "user", "content": prompt}]
    for attempt in range(max_retries + 1):
        response = await client.chat.completions.create(
            model="gpt-4o",
            response_format={"type": "json_object"},
            messages=messages,
        )
        raw = response.choices[0].message.content
        try:
            return schema.model_validate_json(raw)
        except ValidationError as e:
            if attempt == max_retries:
                raise
            # Feed the error back so the model can correct itself
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"That output failed validation:\n{e}\n\nPlease fix and return valid JSON.",
            })

This retry-with-context pattern succeeds on the second attempt roughly 90% of the time for well-defined schemas. The model sees its own broken output and the specific validation error, which is usually enough to self-correct.
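Wired up to the ErrorAnalysis model from the validation section, usage looks like this (the prompt wording is illustrative, and the call must run inside an async context):

```python
analysis = await get_structured_output(
    "Return JSON with keys severity, category, summary, actionable for this error: "
    f"{error_log}",
    ErrorAnalysis,
)
print(analysis.severity, analysis.actionable)
```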
Provider Comparison#
| Feature | OpenAI | Anthropic | Local (Ollama) | |
|---|---|---|---|---|
| JSON mode | Yes | Yes | Yes | Model-dependent |
| Function calling | Yes | Yes (tool use) | Yes | Limited |
| Strict schema | Yes (strict: true) | Yes (tool use) | Limited | No |
| Enum enforcement | Strong | Strong | Moderate | Weak |
| Nested objects | Full support | Full support | Full support | Inconsistent |
For local models, structured output reliability varies widely. Smaller models (7B) frequently break schema constraints. Two mitigations: use function calling when available (the grammar-constrained decoding in llama.cpp and vLLM enforces schemas at the token level), or use a lenient parser that extracts JSON from mixed text output.
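A lenient parser can be as simple as scanning for the first balanced JSON object in the model's text; a best-effort sketch (the helper name is mine, not a library function):

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Best-effort: find the first balanced {...} span in mixed text and parse it."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # balanced span but not valid JSON; try the next brace
        start = text.find("{", start + 1)
    raise ValueError("No JSON object found in model output")
```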
Schema Design Tips#
Keep schemas flat. Deeply nested objects increase the chance of malformed output. If you need nested data, consider splitting into multiple LLM calls.
Use enums liberally. "severity": {"enum": ["critical", "high", "medium", "low"]} is far more reliable than "severity": {"type": "string"} with a prompt instruction to use one of four values.
Make descriptions work double duty. The description field in your schema is part of the prompt the model sees. Use it to clarify edge cases: "actionable": {"type": "boolean", "description": "True if a developer should take action. False for informational-only or expected errors."}.
Avoid optional fields when possible. Optional fields create ambiguity – did the model skip the field because it is not applicable, or because it forgot? Use explicit null values or sentinel strings like "not_applicable" so you can distinguish intentional omission from accidental omission.
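In JSON Schema terms, that tip means keeping the field required but making it nullable, or reserving a sentinel value in an enum. A sketch of both variants (the field names are illustrative):

```python
# Both keys stay in "required", so a missing key is always a schema violation
# rather than an ambiguous omission.
nullable_and_sentinel = {
    "type": "object",
    "properties": {
        # Required but nullable: null means "explicitly not determinable"
        "root_cause": {"type": ["string", "null"]},
        # Sentinel value instead of leaving the field out
        "owner_team": {"type": "string", "enum": ["platform", "payments", "not_applicable"]},
    },
    "required": ["root_cause", "owner_team"],
    "additionalProperties": False,
}
```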