
AI Integration · Development

LLM Structured Outputs in 2026: Reliable JSON Without the Parser Nightmares

Getting a language model to return valid, schema-conforming JSON is harder than it looks. Here's what works in production, from native structured output APIs to library-level validation.

Anurag Verma


7 min read


The first time you ask an LLM to return JSON, it usually works. The hundredth time, you find the edge cases: a trailing comma, a key spelled differently than you specified, a markdown code fence wrapped around what should be a raw object, or a model that decides to explain itself before the JSON starts.

These failures are annoying in development. In production, they silently break data pipelines, crash parsers, and cause user-facing errors that are hard to reproduce because the model won’t always make the same mistake twice.
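Two of those failure modes are easy to reproduce with nothing but the standard library parser (the strings here are hand-constructed examples):

```python
import json

# A trailing comma, and a markdown code fence wrapped around
# what should be a raw object -- both break json.loads outright.
raw_trailing_comma = '{"name": "Blue Widget", "price": 14.99,}'
raw_code_fence = '```json\n{"name": "Blue Widget", "price": 14.99}\n```'

for raw in (raw_trailing_comma, raw_code_fence):
    try:
        json.loads(raw)
        print("parsed")
    except json.JSONDecodeError as e:
        print(f"parse failed: {e.msg}")
```

Both strings raise `json.JSONDecodeError`, which is exactly what lands in your logs at 3 a.m. if there's no validation layer.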

Structured outputs have matured significantly in the past year. Here’s what actually works.

Why LLMs Struggle With Schema Adherence

Language models generate text token by token, sampling from a probability distribution at each step. There’s nothing in that process that intrinsically prevents a model from generating "price": "ten dollars" when your schema says "price" should be a number.

Instruction following helps — a prompt like “return valid JSON with this schema” is surprisingly effective — but it’s probabilistic. A well-prompted GPT-4o or Claude will comply most of the time, and “most of the time” is a problem when downstream code has no fallback.

The solutions fall into three approaches:

  1. Constrained generation (guaranteed valid output, provider-side)
  2. Validation with retry (application-level, works with any model)
  3. Schema-aware libraries (abstracts the retry logic)

Constrained Generation

Several providers now support native structured output that enforces schema adherence at the generation layer. The model physically cannot produce a token sequence that would violate the schema.

OpenAI Structured Outputs

OpenAI’s structured outputs (available on gpt-4o and newer) accept a JSON Schema and guarantee compliant output:

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class ProductExtraction(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available in warehouse'"}
    ],
    response_format=ProductExtraction,
)

product = response.choices[0].message.parsed
print(product.name, product.price, product.in_stock)

The .parsed attribute gives you a Pydantic model directly. No JSON parsing, no validation step. If the model returns something that violates the schema, the SDK raises an error rather than silently returning malformed data.

Anthropic Tool Use for Structured Extraction

Claude doesn’t have a structured_output mode in the same form, but tool use reliably produces schema-conforming data because the model must fill in a typed function call:

import anthropic
import json

client = anthropic.Anthropic()

tools = [{
    "name": "extract_product",
    "description": "Extract product information from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"},
            "category": {"type": "string"}
        },
        "required": ["name", "price", "in_stock", "category"]
    }
}]

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=tools,
    tool_choice={"type": "tool", "name": "extract_product"},
    messages=[
        {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available in warehouse'"}
    ]
)

tool_input = response.content[0].input
print(tool_input)
# {'name': 'Blue Widget', 'price': 14.99, 'in_stock': True, 'category': 'hardware'}

tool_choice: {"type": "tool", "name": "extract_product"} forces the model to fill in that specific tool call. Combined with a well-defined input schema, this is reliable enough for production use without retry logic.
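Even so, a belt-and-suspenders validation pass before trusting the data costs almost nothing. A minimal sketch with the jsonschema library, reusing the same input_schema (the tool_input dict below stands in for response.content[0].input from the call above):

```python
from jsonschema import validate, ValidationError

input_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
        "category": {"type": "string"},
    },
    "required": ["name", "price", "in_stock", "category"],
}

# Stand-in for response.content[0].input from the call above
tool_input = {"name": "Blue Widget", "price": 14.99, "in_stock": True, "category": "hardware"}

try:
    validate(instance=tool_input, schema=input_schema)
except ValidationError as e:
    raise ValueError(f"Tool input violated schema: {e.message}") from e
```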

Validation With Retry

When you can’t use constrained generation (self-hosted models, providers without structured output support, models too small to follow complex schemas reliably), validation with retry is the fallback:

import json
from jsonschema import validate, ValidationError
import time

PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"}
    },
    "required": ["name", "price", "in_stock"]
}

def extract_json(text: str) -> dict:
    # Strip markdown code fences if present
    text = text.strip()
    if text.startswith("```"):
        lines = text.split("\n")
        # Drop the opening fence, and the closing fence if there is one
        if lines[-1].strip().startswith("```"):
            lines = lines[:-1]
        text = "\n".join(lines[1:])
    return json.loads(text)

def get_structured_output(prompt: str, schema: dict, max_retries: int = 3) -> dict:
    # call_llm(prompt) -> str is a placeholder for whatever provider call you use
    validation_hint = f"\n\nReturn ONLY valid JSON matching this schema:\n{json.dumps(schema, indent=2)}"

    for attempt in range(max_retries):
        response_text = call_llm(prompt + validation_hint)
        
        try:
            data = extract_json(response_text)
            validate(instance=data, schema=schema)
            return data
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt < max_retries - 1:
                prompt += f"\n\nPrevious attempt failed: {str(e)[:200]}. Try again."
                time.sleep(0.5 * (attempt + 1))
            else:
                raise ValueError(f"Failed to get valid output after {max_retries} attempts") from e

The key detail: append the specific error to the retry prompt. “Your previous response failed JSON validation: Missing required property ‘price’” gets a better correction than a generic “try again.”
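jsonschema makes this easy: the ValidationError’s message attribute is already a readable sentence you can drop straight into the retry prompt. A small self-contained illustration:

```python
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name", "price"],
}

response_from_model = {"name": "Blue Widget"}  # the model forgot "price"

try:
    validate(instance=response_from_model, schema=schema)
except ValidationError as e:
    # e.message reads: 'price' is a required property
    retry_prompt = f"Your previous response failed JSON validation: {e.message}. Try again."
    print(retry_prompt)
```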

The instructor Library

instructor (by Jason Liu) is the most widely used library for this pattern. It wraps OpenAI, Anthropic, and several other providers with automatic validation and retry using Pydantic models:

import instructor
from anthropic import Anthropic
from pydantic import BaseModel, field_validator

client = instructor.from_anthropic(Anthropic())

class ProductExtraction(BaseModel):
    name: str
    price: float
    in_stock: bool
    category: str
    
    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, v):
        if v < 0:
            raise ValueError("Price cannot be negative")
        return v

product, completion = client.chat.completions.create_with_completion(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract product info: 'Blue Widget, $14.99, available'"}
    ],
    response_model=ProductExtraction,
)

instructor handles the retry logic, the prompt injection of the schema, and the Pydantic validation. It also provides usage stats on the completion object so you can track token costs across retries.

The library supports streaming with partial validation, which is useful for long extraction tasks where you want to start processing before the full response arrives.

Schema Design for Better Results

How you define the schema affects compliance rate, not just what gets validated.

Use specific types with descriptions:

from pydantic import BaseModel, Field

# Vague — the model doesn't know what format to use
class Bad(BaseModel):
    date: str  # "January 5th", "2026-01-05", "01/05/26" — all valid strings

# Unambiguous
class Good(BaseModel):
    date: str = Field(description="ISO 8601 date string, e.g. '2026-01-05'")

Break complex nested objects into smaller schemas. A schema with 15 required fields and 3 levels of nesting will see more failures than one with 5 fields. If you need complex data, extract it in stages: first the top-level structure, then nested details for each item.
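The staged approach can be sketched with two small Pydantic schemas (the invoice names here are hypothetical, chosen to match the example below): stage one captures only the top-level structure plus the raw text of each nested item, and stage two runs once per item with its own flat schema.

```python
from pydantic import BaseModel, Field

# Stage 1: top-level structure only -- small enough that models rarely miss
class InvoiceHeader(BaseModel):
    vendor: str
    invoice_number: str
    line_item_texts: list[str] = Field(
        description="Verbatim text of each line item, to be parsed in stage 2"
    )

# Stage 2: one extraction call per entry in line_item_texts
class LineItemDetail(BaseModel):
    description: str
    quantity: float
    unit_price: float
```

Each individual call now carries a 3-4 field schema instead of one deeply nested one, which is where compliance rates drop.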

Add descriptions to fields:

from pydantic import BaseModel, Field

class InvoiceLineItem(BaseModel):
    description: str = Field(description="Short description of the item or service")
    quantity: float = Field(description="Number of units, can be fractional for hourly work")
    unit_price: float = Field(description="Price per unit in USD, not the line total")
    total: float = Field(description="quantity * unit_price")

The description shows up in the generated JSON Schema and gets included in the prompt. Models read it and conform to it.
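You can verify this locally, assuming Pydantic v2: model_json_schema() emits each Field description into the generated JSON Schema. A quick check, reusing the Good model from above:

```python
from pydantic import BaseModel, Field

class Good(BaseModel):
    date: str = Field(description="ISO 8601 date string, e.g. '2026-01-05'")

schema = Good.model_json_schema()
print(schema["properties"]["date"]["description"])
```

Whatever you write in the description is exactly what the model sees, so treat it as prompt text, not documentation.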

When to Use Each Approach

Situation                          Best option
---------------------------------  ------------------------------------------
OpenAI models, schema matters      Native structured outputs
Anthropic models, schema matters   Tool use with forced tool_choice
Any model, moderate complexity     instructor library
Self-hosted or fine-tuned model    Validation with retry
Simple extraction, high volume     One-shot prompt + JSON parse + cheap retry
Complex nested schema              Stage it: extract in multiple passes

Common Failure Patterns

The explainer: The model writes “Here is the JSON you requested:” before the JSON. Strip preamble before parsing. Constrained generation prevents this; prompting-only approaches do not.

The approximator: When asked for a number, the model returns "approximately 50" or "~50". Add a field description: “Return as a plain number with no text, e.g. 50”.

The null evader: The model returns an empty string "" instead of null for missing optional fields. Use Optional[str] = None in Pydantic and let instructor catch the validation failure.

The hallucinating enumerator: When you have an enum field (e.g. category from a fixed list), the model invents a category that doesn’t exist. Use Literal["A", "B", "C"] as the type so validation rejects unknown values.
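The last two patterns are cheap to verify locally, assuming Pydantic v2: Optional[str] = None makes the null intent explicit, and Literal turns a hallucinated category into a hard validation error instead of bad data downstream:

```python
from typing import Literal, Optional
from pydantic import BaseModel, ValidationError

class Product(BaseModel):
    category: Literal["hardware", "software", "services"]
    notes: Optional[str] = None  # explicit null, not an empty string

# A valid category passes, and the missing optional field stays None
print(Product(category="hardware").notes)

# A hallucinated category is rejected at the validation layer
try:
    Product(category="widgets")
except ValidationError:
    print("rejected hallucinated category")
```

With instructor in the loop, that ValidationError is exactly what triggers a retry with the error text included.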

Structured outputs have gotten reliable enough that the parse-and-hope approach from a few years ago is not the right default anymore. For any production feature that depends on structured data from an LLM, use one of the constraint-based approaches and validate with Pydantic. The retry cost is a small fraction of what bad parses cost at scale.
