JSON mode is a setting offered by major LLM APIs (OpenAI, Anthropic, Google, and most major providers) that constrains the model to return only valid JSON. It eliminates the most common failure mode of "respond as JSON" prompts — the model wrapping the JSON in conversational prose ("Here is the JSON you requested:") — and dramatically reduces parse errors.
JSON mode comes in two flavours in 2026:
- Loose JSON mode — the model is biased to return valid JSON but not constrained to any specific schema. The output will parse but the structure is up to the prompt to specify.
- Strict structured output — the model is constrained at the decoding level to follow a specific JSON schema. The output is guaranteed to validate against the schema.
OpenAI distinguishes these as response_format set to type json_object (loose) versus type json_schema with a supplied json_schema object (strict). Anthropic uses tool calling with a schema as the strict path. Google offers both modes via Vertex.
When to use which:
- Strict structured output — anytime the JSON is going to feed downstream code. The guarantee that parsing will succeed is worth the small additional latency.
- Loose JSON mode — for prototyping, exploration, or when the schema genuinely varies per request. Add validation downstream.
- Plain prompting with "respond as JSON" — only for one-off scripts and notebooks; not production-grade.
The practical mechanics for a US engineer in 2026:
- Define the schema in code with Zod (TypeScript) or Pydantic (Python).
- Generate the JSON schema from the type definition.
- Pass it to the LLM through whatever provider's structured output mode.
- Parse the response back through the same Zod/Pydantic schema for type safety.
- Catch and retry on the rare validation failure with a cleaner prompt.
Structured output has reshaped how production LLM apps are built. Three years ago, parsing LLM JSON involved regex hacks and prayer. In 2026, structured output is reliable enough that LLMs are routinely used as deterministic-ish data-extraction services in pipelines that feed databases, search indexes and analytics warehouses. The remaining limitations are around extremely complex schemas (deeply nested or recursive types) and very long outputs (where token limits cut in), neither of which structured output can magically solve.
For a US team shipping LLM features, the operational rule is: structured output by default, always parse on the consumer side, always have a fallback. Free-form text output should be the exception, used only when humans are the consumer and prose actually serves the use case.