Workflow: Guardrails AI Structured Data Generation
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Structured_Data, Validation |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
End-to-end process for generating validated structured JSON data from LLMs using Pydantic schema definitions and Guardrails' multiple structured output strategies.
Description
This workflow covers how to use Guardrails to coerce LLM outputs into well-typed, validated structured data. Users define their desired output shape as a Pydantic BaseModel, then create a Guard using Guard.for_pydantic. The Guard ensures the LLM response conforms to the schema through one of several strategies: OpenAI-compatible function/tool calling, prompt engineering with JSON schema suffixes, constrained decoding for HuggingFace models, JSON mode, or strict JSON mode. Validators can be attached directly to Pydantic Field definitions for per-field validation. The output is a parsed, validated Python dictionary matching the Pydantic model.
Usage
Execute this workflow when you need an LLM to return data in a specific, predictable structure rather than free-form text. Typical use cases include extracting entities from unstructured text into typed objects, generating form data, building data pipelines that consume LLM output, or any scenario where downstream code expects a specific JSON schema. This workflow is especially valuable when the LLM output must be programmatically processed rather than displayed directly to users.
Execution Steps
Step 1: Define Pydantic Output Model
Create a Pydantic BaseModel class that describes the structure, types, and constraints of the desired LLM output. Each field includes a type annotation and a Field descriptor with a description that helps guide the LLM. Validators from Guardrails Hub can be attached directly to fields for per-field validation.
Key considerations:
- Field descriptions are included in the LLM prompt and influence output quality
- Nested models, lists, and optional fields are supported
- Validators attached to fields run automatically during Guard validation
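A minimal sketch of Step 1. The model and field names below are illustrative, not from the source; the Field descriptions are what Guardrails folds into the prompt to steer the LLM.

```python
# Illustrative Pydantic output model for a receipt-extraction task.
# Field descriptions guide the LLM; nested models, lists, and optional
# fields are all supported, as noted above.
from typing import List, Optional
from pydantic import BaseModel, Field

class LineItem(BaseModel):
    name: str = Field(description="Product name as written on the receipt")
    price: float = Field(description="Unit price in USD")

class Receipt(BaseModel):
    vendor: str = Field(description="Name of the store or vendor")
    items: List[LineItem] = Field(description="All purchased line items")
    total: Optional[float] = Field(
        default=None, description="Grand total, if printed on the receipt"
    )
```

Guardrails Hub validators can additionally be attached to individual fields so they run during Guard validation.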
Step 2: Create Guard from Pydantic Model
Instantiate a Guard using the Guard.for_pydantic factory method, passing the Pydantic model as the output_class. This configures the Guard to generate and validate structured output matching the model schema. Optionally specify an output_formatter for constrained decoding with HuggingFace models.
Key considerations:
- Guard.for_pydantic automatically generates the JSON schema from the Pydantic model
- The output_formatter parameter enables constrained decoding (e.g., "jsonformer" for HuggingFace)
- Additional validators can be chained with .use() after creation
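A sketch of Step 2, assuming the guardrails-ai package is installed; the Person model is illustrative. This is not run here, since it requires the library at import time.

```python
# Requires: pip install guardrails-ai
from pydantic import BaseModel, Field
from guardrails import Guard

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")

# Guard.for_pydantic derives the JSON schema from the model automatically.
guard = Guard.for_pydantic(output_class=Person)

# For a local HuggingFace pipeline, constrained decoding can be enabled
# instead via the output_formatter parameter:
# guard = Guard.for_pydantic(output_class=Person, output_formatter="jsonformer")
```

Further validators can be chained onto the resulting guard with .use() before it is invoked.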
Step 3: Select Structured Output Strategy
Choose the appropriate strategy for obtaining structured JSON from the target LLM. The options are: function/tool calling (for OpenAI-compatible models), prompt engineering with JSON schema suffix templates, constrained decoding (for HuggingFace pipelines), JSON mode (via the response_format parameter), or strict JSON mode (passing an explicit JSON schema via response_format). The choice depends on the LLM provider's capabilities.
Key considerations:
- Function calling provides the most reliable structured output for supported models
- Prompt engineering with gr.complete_json_suffix_v2 or v3 works with any model
- Constrained decoding guarantees valid JSON structure but only works with local HuggingFace models
- Strict JSON mode is the newest approach and provides schema-level enforcement
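The decision above can be sketched as a plain-Python precedence rule; the capability flags and strategy labels are illustrative shorthand for the options listed, not a Guardrails API.

```python
# Map an LLM's capabilities to a structured-output strategy, following the
# precedence suggested above: constrained decoding for local HF pipelines,
# then function calling, then strict JSON mode, then JSON mode, with the
# prompt-engineering suffix as the universal fallback.
def pick_strategy(supports_tools: bool, supports_json_schema: bool,
                  supports_json_mode: bool, is_local_hf: bool) -> str:
    if is_local_hf:
        return "constrained_decoding"  # guarantees valid JSON, local models only
    if supports_tools:
        return "function_calling"      # most reliable for supported models
    if supports_json_schema:
        return "strict_json_mode"      # schema-level enforcement
    if supports_json_mode:
        return "json_mode"             # response_format={"type": "json_object"}
    return "prompt_engineering"        # JSON-schema suffix works with any model
```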
Step 4: Execute Guard with LLM
Invoke the Guard callable with the LLM model, messages, and any strategy-specific parameters (tools, response_format, prompt_params). The Guard calls the LLM, parses the structured response, validates it against the Pydantic schema and any attached validators, and returns a ValidationOutcome with the parsed output.
Key considerations:
- Use prompt_params to inject dynamic values into prompt templates
- The Guard handles JSON parsing, type coercion, and schema validation automatically
- If validation fails with reask action, the Guard re-prompts the LLM with error details
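A sketch of Step 4 for an OpenAI-compatible model using the prompt-engineering strategy. It assumes guardrails-ai is installed and an API key is configured, so it is not run here; the model name, message text, and Person schema are all illustrative.

```python
# Requires: pip install guardrails-ai, plus an OPENAI_API_KEY in the env.
from pydantic import BaseModel, Field
from guardrails import Guard

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")

guard = Guard.for_pydantic(output_class=Person)

# prompt_params fills ${text}; ${gr.complete_json_suffix_v2} appends the
# JSON-schema instructions to the prompt.
outcome = guard(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Extract the person from: ${text} ${gr.complete_json_suffix_v2}",
    }],
    prompt_params={"text": "Ada Lovelace, 36, London."},
)
print(outcome.validation_passed, outcome.validated_output)
```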
Step 5: Consume Validated Output
Access the validated_output from the ValidationOutcome, which is a Python dictionary matching the Pydantic model structure. The output has been type-checked, validated against all field-level and guard-level validators, and had any corrective actions applied. It can be directly used by downstream application logic or deserialized into the Pydantic model instance.
Key considerations:
- validated_output is a dict, not a Pydantic model instance; cast if needed
- If validation fails and the on_fail action is something other than exception, validated_output may contain None for the failed fields
- The validation_passed boolean indicates overall success
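A sketch of Step 5. To keep it runnable without an LLM, FakeOutcome is a hypothetical stand-in exposing the two ValidationOutcome attributes used here; the Person model is illustrative.

```python
# Consume the validated output: check validation_passed, then cast the plain
# dict back into the Pydantic model for typed attribute access downstream.
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")

class FakeOutcome:
    """Stand-in for guardrails' ValidationOutcome (illustrative)."""
    validation_passed = True
    validated_output = {"name": "Ada Lovelace", "age": 36}

outcome = FakeOutcome()

if outcome.validation_passed:
    # validated_output is a dict, not a model instance; cast if needed.
    person = Person(**outcome.validated_output)
    print(person.name, person.age)
else:
    # With on_fail actions other than exception, inspect for None fields here.
    pass
```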