Workflow: Guardrails AI Structured Data Generation
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Structured_Data, Validation |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
End-to-end process for generating validated structured JSON data from LLMs using Pydantic schema definitions and Guardrails' multiple structured output strategies.
Description
This workflow covers how to use Guardrails to coerce LLM outputs into well-typed, validated structured data. Users define their desired output shape as a Pydantic BaseModel, then create a Guard using Guard.for_pydantic. The Guard ensures the LLM response conforms to the schema through one of several strategies: OpenAI-compatible function/tool calling, prompt engineering with JSON schema suffixes, constrained decoding for HuggingFace models, JSON mode, or strict JSON mode. Validators can be attached directly to Pydantic Field definitions for per-field validation. The output is a parsed, validated Python dictionary matching the Pydantic model.
Usage
Execute this workflow when you need an LLM to return data in a specific, predictable structure rather than free-form text. Typical use cases include extracting entities from unstructured text into typed objects, generating form data, building data pipelines that consume LLM output, or any scenario where downstream code expects a specific JSON schema. This workflow is especially valuable when the LLM output must be programmatically processed rather than displayed directly to users.
Execution Steps
Step 1: Define Pydantic Output Model
Create a Pydantic BaseModel class that describes the structure, types, and constraints of the desired LLM output. Each field includes a type annotation and a Field descriptor with a description that helps guide the LLM. Validators from Guardrails Hub can be attached directly to fields for per-field validation.
Key considerations:
- Field descriptions are included in the LLM prompt and influence output quality
- Nested models, lists, and optional fields are supported
- Validators attached to fields run automatically during Guard validation
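A minimal sketch of Step 1. The model and field names below are illustrative, not from the source; the Field descriptions are what Guardrails folds into the prompt to steer the LLM.

```python
# Illustrative Pydantic output model for a receipt-extraction task.
# Field descriptions guide the LLM; nested models, lists, and optional
# fields are all supported, as noted above.
from typing import List, Optional
from pydantic import BaseModel, Field

class LineItem(BaseModel):
    name: str = Field(description="Product name as written on the receipt")
    price: float = Field(description="Unit price in USD")

class Receipt(BaseModel):
    vendor: str = Field(description="Name of the store or vendor")
    items: List[LineItem] = Field(description="All purchased line items")
    total: Optional[float] = Field(
        default=None, description="Grand total, if printed on the receipt"
    )
```

Guardrails Hub validators can additionally be attached to individual fields so they run during Guard validation.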
Step 2: Create Guard from Pydantic Model
Instantiate a Guard using the Guard.for_pydantic factory method, passing the Pydantic model as the output_class. This configures the Guard to generate and validate structured output matching the model schema. Optionally specify an output_formatter for constrained decoding with HuggingFace models.
Key considerations:
- Guard.for_pydantic automatically generates the JSON schema from the Pydantic model
- The output_formatter parameter enables constrained decoding (e.g., "jsonformer" for HuggingFace)
- Additional validators can be chained with .use() after creation
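A sketch of Step 2, assuming the guardrails-ai package is installed; the Person model is illustrative. This is not run here, since it requires the library at import time.

```python
# Requires: pip install guardrails-ai
from pydantic import BaseModel, Field
from guardrails import Guard

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")

# Guard.for_pydantic derives the JSON schema from the model automatically.
guard = Guard.for_pydantic(output_class=Person)

# For a local HuggingFace pipeline, constrained decoding can be enabled
# instead via the output_formatter parameter:
# guard = Guard.for_pydantic(output_class=Person, output_formatter="jsonformer")
```

Further validators can be chained onto the resulting guard with .use() before it is invoked.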
Step 3: Select Structured Output Strategy
Choose the appropriate strategy for obtaining structured JSON from the target LLM. The options are: function/tool calling (for OpenAI-compatible models), prompt engineering with JSON schema suffix templates, constrained decoding (for HuggingFace pipelines), JSON mode (via the response_format parameter), or strict JSON mode (passing an explicit JSON schema via response_format). The choice depends on the LLM provider's capabilities.
Key considerations:
- Function calling provides the most reliable structured output for supported models
- Prompt engineering with gr.complete_json_suffix_v2 or v3 works with any model
- Constrained decoding guarantees valid JSON structure but only works with local HuggingFace models
- Strict JSON mode is the newest approach and provides schema-level enforcement
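The decision above can be sketched as a plain-Python precedence rule; the capability flags and strategy labels are illustrative shorthand for the options listed, not a Guardrails API.

```python
# Map an LLM's capabilities to a structured-output strategy, following the
# precedence suggested above: constrained decoding for local HF pipelines,
# then function calling, then strict JSON mode, then JSON mode, with the
# prompt-engineering suffix as the universal fallback.
def pick_strategy(supports_tools: bool, supports_json_schema: bool,
                  supports_json_mode: bool, is_local_hf: bool) -> str:
    if is_local_hf:
        return "constrained_decoding"  # guarantees valid JSON, local models only
    if supports_tools:
        return "function_calling"      # most reliable for supported models
    if supports_json_schema:
        return "strict_json_mode"      # schema-level enforcement
    if supports_json_mode:
        return "json_mode"             # response_format={"type": "json_object"}
    return "prompt_engineering"        # JSON-schema suffix works with any model
```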
Step 4: Execute Guard with LLM
Invoke the Guard callable with the LLM model, messages, and any strategy-specific parameters (tools, response_format, prompt_params). The Guard calls the LLM, parses the structured response, validates it against the Pydantic schema and any attached validators, and returns a ValidationOutcome with the parsed output.
Key considerations:
- Use prompt_params to inject dynamic values into prompt templates
- The Guard handles JSON parsing, type coercion, and schema validation automatically
- If validation fails with reask action, the Guard re-prompts the LLM with error details
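A sketch of Step 4 for an OpenAI-compatible model using the prompt-engineering strategy. It assumes guardrails-ai is installed and an API key is configured, so it is not run here; the model name, message text, and Person schema are all illustrative.

```python
# Requires: pip install guardrails-ai, plus an OPENAI_API_KEY in the env.
from pydantic import BaseModel, Field
from guardrails import Guard

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")

guard = Guard.for_pydantic(output_class=Person)

# prompt_params fills ${text}; ${gr.complete_json_suffix_v2} appends the
# JSON-schema instructions to the prompt.
outcome = guard(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Extract the person from: ${text} ${gr.complete_json_suffix_v2}",
    }],
    prompt_params={"text": "Ada Lovelace, 36, London."},
)
print(outcome.validation_passed, outcome.validated_output)
```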
Step 5: Consume Validated Output
Access the validated_output from the ValidationOutcome, which is a Python dictionary matching the Pydantic model structure. The output has been type-checked, validated against all field-level and guard-level validators, and had any corrective actions applied. It can be directly used by downstream application logic or deserialized into the Pydantic model instance.
Key considerations:
- validated_output is a dict, not a Pydantic model instance; cast if needed
- If validation fails and the on_fail action is something other than exception, validated_output may contain None for the failed fields
- The validation_passed boolean indicates overall success
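A sketch of Step 5. To keep it runnable without an LLM, FakeOutcome is a hypothetical stand-in exposing the two ValidationOutcome attributes used here; the Person model is illustrative.

```python
# Consume the validated output: check validation_passed, then cast the plain
# dict back into the Pydantic model for typed attribute access downstream.
from pydantic import BaseModel, Field

class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")

class FakeOutcome:
    """Stand-in for guardrails' ValidationOutcome (illustrative)."""
    validation_passed = True
    validated_output = {"name": "Ada Lovelace", "age": 36}

outcome = FakeOutcome()

if outcome.validation_passed:
    # validated_output is a dict, not a model instance; cast if needed.
    person = Person(**outcome.validated_output)
    print(person.name, person.age)
else:
    # With on_fail actions other than exception, inspect for None fields here.
    pass
```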