schemas, pydantic, and validation — making the model return real data
Free-form text is not data
The first time you ask Claude to "extract the user's email from this
support ticket," it returns exactly what you wanted: maya@promptdojo.dev.
The second time, it returns "Sure! The email is maya@promptdojo.dev." The
third time, "The email address you're looking for is maya@promptdojo.dev."
Three different shapes. Your downstream code expected a bare string. Now it either has to regex-extract the email out of natural language every time, or fail.
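Here is a minimal sketch of the regex route, run against those three replies, to show why it's a trap. The pattern \S+@\S+ is a deliberately naive assumption. A tighter pattern would fix this particular case, but every fix is a guess about the next phrasing the model invents. Here the sentence-ending period rides along with the address, so one email comes back in two different shapes.

```python
import re
from typing import Optional

# The three replies from the example above.
replies = [
    "maya@promptdojo.dev",
    "Sure! The email is maya@promptdojo.dev.",
    "The email address you're looking for is maya@promptdojo.dev.",
]

# Deliberately naive: "some non-space characters, an @, more non-space characters".
NAIVE_EMAIL = re.compile(r"\S+@\S+")


def extract_email(reply: str) -> Optional[str]:
    """Pull the first email-looking token out of free-form text."""
    match = NAIVE_EMAIL.search(reply)
    return match.group(0) if match else None


for reply in replies:
    print(repr(extract_email(reply)))
```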
This is why production AI features that hand model output to code use structured output: you tell the model exactly what JSON shape to return, and you validate the response when it comes back.
The pattern AI assistants write every time:
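```python
import json

import anthropic
from pydantic import BaseModel, Field


class Ticket(BaseModel):
    email: str
    severity: int = Field(ge=1, le=5)  # 1-5, now enforced at validation time
    summary: str


client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
schema = json.dumps(Ticket.model_json_schema())  # JSON Schema generated from the model
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": "Extract the ticket fields. Reply with only a JSON object "
                   f"matching this schema: {schema}\n\nTicket: <ticket text>",
    }],
)
raw = response.content[0].text
ticket = Ticket.model_validate_json(raw)  # parse + validate in one call
print(ticket.email)
```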
Three pieces. The schema (the BaseModel class), the prompt (asking for
JSON), and the validation step (model_validate_json). All three matter.
Skip the schema and you're back to regex-on-natural-language. Skip the
validation and you'll find out about the model's hallucinated field at
3am from a NoneType error in production.
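Here is a minimal sketch of that 3am failure. It assumes the model returned valid JSON but named a field contact_email instead of email (a hypothetical slip). Reading the dict with .get() defers the crash to wherever the value is finally used. An explicit check at the boundary fails immediately, with a message that names the field. The helpers email_domain and require_fields are illustrative.

```python
import json

# Hypothetical model reply: valid JSON, wrong field name.
raw = '{"contact_email": "maya@promptdojo.dev", "severity": 3, "summary": "Login loop"}'
data = json.loads(raw)

# Unvalidated: .get() quietly returns None instead of complaining...
email = data.get("email")


def email_domain(address):
    # ...so the crash surfaces later, far from its cause.
    return address.split("@")[1]


try:
    email_domain(email)
except AttributeError as err:
    print("late failure:", err)


def require_fields(payload, fields):
    """Fail at the boundary, naming every missing field."""
    missing = [name for name in fields if name not in payload]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return payload


try:
    require_fields(data, ["email", "severity", "summary"])
except ValueError as err:
    print("early failure:", err)
```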
Browser note: Pydantic isn't bundled with Pyodide. We'll use plain
dict validation here (same logic, just spelled out) so you can read and write the pattern. Switching to BaseModel later is a two-line change.
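Here is a minimal sketch of what plain-dict validation might look like for the Ticket shape. The names TICKET_FIELDS and validate_ticket are illustrative, not taken from the lesson's editor code. The checks mirror the schema: required fields, field types, and the 1-5 severity range from the comment. It collects every problem rather than stopping at the first, which is handy if you later feed the errors back to the model.

```python
import json
from typing import Any, Dict, List, Tuple

# Expected shape: field name -> Python type (mirrors the Ticket model).
TICKET_FIELDS = {"email": str, "severity": int, "summary": str}


def validate_ticket(raw: str) -> Tuple[Dict[str, Any], List[str]]:
    """Parse raw model output and check it against TICKET_FIELDS.

    Returns (data, errors); an empty errors list means the response is usable.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return {}, [f"not valid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return {}, ["expected a JSON object"]

    errors = []
    for name, expected in TICKET_FIELDS.items():
        value = data.get(name)
        if name not in data:
            errors.append(f"{name}: missing")
        # bool is a subclass of int in Python, so rule it out explicitly.
        elif not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")

    severity = data.get("severity")
    if isinstance(severity, int) and not isinstance(severity, bool) and not 1 <= severity <= 5:
        errors.append("severity: must be between 1 and 5")
    return data, errors


print(validate_ticket('{"email": "maya@promptdojo.dev", "severity": "high", "summary": "Login loop"}'))
```

One difference from the BaseModel version: Pydantic's default (lax) mode coerces a numeric string such as "3" into an int, while this sketch rejects it.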
Where AI specifically gets this wrong
- Trusting the model on first try. Models lie. They drop required fields, return strings where you wanted ints, and invent enum values you never defined. Validate every response.
- Forgetting response_format / tool use. On OpenAI, the modern canonical way is Structured Outputs: response_format={"type": "json_schema", "json_schema": {...}}. With "strict": true set inside the json_schema object, the API guarantees the response conforms to your schema. The older {"type": "json_object"} mode only guarantees valid JSON, not your shape. On Anthropic you typically use a tool definition (or the newer output_format parameter). Without one, the model often wraps its JSON in prose.
- Catching ValidationError too broadly. When Pydantic rejects a response, you usually want to retry with the error message sent back to the model, not silently fall through.
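Here is a sketch of the request-side settings the structured-output bullet describes. They are written as plain dicts so the code runs without an API key. The tool name record_ticket, the description, and the schema itself are assumptions. The keys (tools, input_schema, and tool_choice for Anthropic; response_format with json_schema and strict for OpenAI) are the request fields those APIs use. Check the current references before relying on the exact shapes.

```python
import json

# JSON Schema for the ticket. The 1-5 severity range stays in your own
# validation, since support for range keywords varies by provider and mode.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "email": {"type": "string"},
        "severity": {"type": "integer"},
        "summary": {"type": "string"},
    },
    "required": ["email", "severity", "summary"],
    "additionalProperties": False,
}


def anthropic_tool_kwargs(schema: dict) -> dict:
    """Extra messages.create() kwargs that force a single tool call."""
    return {
        "tools": [{
            "name": "record_ticket",  # hypothetical tool name
            "description": "Record the fields extracted from a support ticket.",
            "input_schema": schema,
        }],
        # Forcing this tool means the reply is a tool_use block, not prose.
        "tool_choice": {"type": "tool", "name": "record_ticket"},
    }


def openai_response_format(schema: dict) -> dict:
    """response_format value for OpenAI Structured Outputs (strict mode)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": "ticket", "strict": True, "schema": schema},
    }


print(json.dumps(anthropic_tool_kwargs(TICKET_SCHEMA)["tool_choice"]))
```

With the Anthropic SDK you would pass these as client.messages.create(..., **anthropic_tool_kwargs(TICKET_SCHEMA)). You would then read the fields from the input attribute of the tool_use content block. It arrives as a dict, and you still validate it.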
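Here is a minimal sketch of the retry-with-feedback loop from the last bullet. To keep it runnable offline, the model is a hypothetical ask(messages) callable that returns the reply text; in real code it would wrap client.messages.create. The check function stands in for model_validate_json. ExtractionFailed, max_attempts, and the feedback wording are all illustrative choices.

```python
import json
from typing import Callable, Dict, List

REQUIRED = ("email", "severity", "summary")


class ExtractionFailed(Exception):
    """Raised when the model never produces a valid response."""


def check(raw: str) -> List[str]:
    """Return the problems with raw; an empty list means it is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    return [f"missing field: {name}" for name in REQUIRED if name not in data]


def extract_with_retry(
    ask: Callable[[List[Dict[str, str]]], str],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    messages = [{"role": "user", "content": prompt}]
    errors: List[str] = []
    for _ in range(max_attempts):
        raw = ask(messages)
        errors = check(raw)
        if not errors:
            return json.loads(raw)
        # Send the specific errors back instead of silently moving on.
        messages += [
            {"role": "assistant", "content": raw},
            {"role": "user", "content": "Your JSON was rejected: "
             + "; ".join(errors) + ". Reply with corrected JSON only."},
        ]
    raise ExtractionFailed(f"gave up after {max_attempts} attempts: {errors}")


# Hypothetical model that forgets a field once, then fixes it.
replies = iter([
    '{"email": "maya@promptdojo.dev", "summary": "Login loop"}',
    '{"email": "maya@promptdojo.dev", "severity": 3, "summary": "Login loop"}',
])
ticket = extract_with_retry(lambda messages: next(replies), "Extract the ticket as JSON: <ticket text>")
print(ticket["severity"])
```

Capping attempts and raising, rather than returning whatever came back last, keeps a persistent failure loud instead of letting half-parsed data leak downstream.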
Run the editor. We extract a name and validate the shape by hand.