Pydantic v2: The Data Validation Layer Your Python APIs Actually Need
Pydantic v2 shipped with a Rust-based core that made validation 5-50x faster. But the performance is only part of it — the v2 model redesign changes how you structure validation logic for complex APIs.
Anurag Verma
Pydantic is the data validation library that most Python developers use without thinking about it. FastAPI uses it for request and response models. SQLModel uses it for database schemas. LangChain uses it for tool definitions. If you’re writing Python APIs in 2026, you’re almost certainly running Pydantic v2 already — and there’s a reasonable chance you’re using about 20% of what it can do.
This is not a migration guide (the Pydantic team has a good one). This is a guide to the parts of v2 that change how you structure validation logic in production code, and that most developers only discover after hitting a problem they didn’t know how to solve.
The Core Model
The basics haven’t changed much from v1:
# EmailStr needs the optional email-validator package: pip install "pydantic[email]"
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, EmailStr

class User(BaseModel):
    id: int
    email: EmailStr
    name: str
    created_at: datetime
    is_active: bool = True
    bio: Optional[str] = None

# Validation runs on instantiation; the ISO string is parsed into a datetime
user = User(id=1, email="user@example.com", name="Jane", created_at="2024-01-01T00:00:00Z")

# Access validated data
print(user.email)  # user@example.com
print(user.is_active)  # True
print(user.model_dump())  # Dict representation
print(user.model_json_schema())  # JSON Schema
What’s new is the performance. v2’s core is written in Rust, in the pydantic-core library. Depending on the model, validation that took around 5 microseconds in v1 can take closer to 0.5 microseconds in v2. For an API handling thousands of requests per second, that adds up.
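The User example only shows valid input. Here is a minimal sketch of the failure path. EmailStr and bio are dropped so it runs without the optional email-validator dependency, and the validation_errors helper is hypothetical. Pydantic collects every field error into a single ValidationError, and in its default lax mode it coerces compatible input such as the string "1" into an int:

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    created_at: datetime
    is_active: bool = True


def validation_errors(data: dict) -> list[tuple[tuple, str]]:
    """Hypothetical helper: (location, error type) pairs, or [] if the data is valid."""
    try:
        User.model_validate(data)
    except ValidationError as exc:
        # exc.errors() reports every failing field, not just the first one
        return [(err["loc"], err["type"]) for err in exc.errors()]
    return []


# "abc" is not an integer and created_at is missing: both are reported together
problems = validation_errors({"id": "abc", "name": "Jane"})
print(problems)

# Lax mode coerces compatible input: "1" becomes 1, the ISO string becomes a datetime
coerced = User(id="1", name="Jane", created_at="2024-01-01T00:00:00Z")
print(coerced.id, coerced.created_at.year)
```

Returning the error list instead of letting the exception escape is just a convenience for inspection; in an API you would normally let the framework turn the ValidationError into a 422 response.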
Field Validation: The Right Way to Add Constraints
v2 encourages attaching constraints to reusable Annotated types instead of repeating Field arguments on every field. This is more composable:
from decimal import Decimal
from typing import Annotated

from pydantic import BaseModel, Field

# Annotated approach: constraints live in the type, not the field definition
PositivePrice = Annotated[Decimal, Field(gt=0, decimal_places=2)]
ShortString = Annotated[str, Field(min_length=1, max_length=100)]
# Allow-list of plain characters; <, >, &, and quotes are rejected, so no HTML gets through
SafeHtml = Annotated[str, Field(pattern=r'^[a-zA-Z0-9\s.,!?-]+$')]

class Product(BaseModel):
    name: ShortString
    price: PositivePrice
    description: SafeHtml
    sku: Annotated[str, Field(pattern=r'^[A-Z]{3}-\d{6}$')]  # e.g. ABC-123456

# Reuse the type anywhere
class OrderItem(BaseModel):
    product_name: ShortString
    quantity: Annotated[int, Field(gt=0, le=100)]
    unit_price: PositivePrice
The advantage of Annotated is that you define the constraint once and reuse the type. Without it, you’d repeat min_length=1, max_length=100 on every field that needs them; here you define ShortString once.
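To see the reusable types at work, here is a small sketch that feeds bad input to OrderItem and also validates PositivePrice on its own through TypeAdapter, with no model involved. The error_types helper is hypothetical, added only to make the reported error types easy to inspect:

```python
from decimal import Decimal
from typing import Annotated

from pydantic import BaseModel, Field, TypeAdapter, ValidationError

# Same reusable aliases as in the Product/OrderItem example
PositivePrice = Annotated[Decimal, Field(gt=0, decimal_places=2)]
ShortString = Annotated[str, Field(min_length=1, max_length=100)]


class OrderItem(BaseModel):
    product_name: ShortString
    quantity: Annotated[int, Field(gt=0, le=100)]
    unit_price: PositivePrice


# An Annotated alias can also be validated on its own, without any model
price_adapter = TypeAdapter(PositivePrice)
price = price_adapter.validate_python("19.99")


def error_types(data: dict) -> list[str]:
    """Hypothetical helper: the error type reported for each failing field."""
    try:
        OrderItem.model_validate(data)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


# Empty name, zero quantity and a third decimal place each break one constraint
bad_item_errors = error_types({"product_name": "", "quantity": 0, "unit_price": "19.999"})
```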
Custom Validators
When built-in constraints aren’t enough, you need custom validators. v2 changed the API significantly:
import re
from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

from pydantic import BaseModel, field_validator, model_validator

class SignupRequest(BaseModel):
    username: str
    password: str
    password_confirm: str
    phone: str

    @field_validator('username')
    @classmethod
    def username_valid(cls, v: str) -> str:
        v = v.lower()  # Normalize first, so "Jane_Doe" is accepted as "jane_doe"
        if not re.fullmatch(r'[a-z0-9_]{3,20}', v):
            raise ValueError('username must be 3-20 characters: letters, numbers, underscores only')
        return v

    @field_validator('phone')
    @classmethod
    def phone_valid(cls, v: str) -> str:
        # Strip common formatting, then check for E.164: "+" followed by digits
        cleaned = re.sub(r'[\s\-\(\)]', '', v)
        if not re.fullmatch(r'\+\d{10,15}', cleaned):
            raise ValueError('phone must be in E.164 format: +1234567890')
        return cleaned

    @model_validator(mode='after')
    def passwords_match(self) -> Self:
        if self.password != self.password_confirm:
            raise ValueError('passwords do not match')
        return self
Key differences from v1:
- @field_validator replaces @validator; Pydantic’s docs stack @classmethod directly beneath it
- @model_validator(mode='after') runs after all field validation and gives access to self for cross-field checks
- Field validators must return the (possibly modified) value
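v1 validators often took a values argument to read fields declared earlier. A minimal sketch of the v2 equivalent uses the ValidationInfo argument and its data attribute (the PasswordChange model is hypothetical). Unlike the model_validator above, this attaches the error to the password_confirm field itself:

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class PasswordChange(BaseModel):
    password: str
    password_confirm: str

    @field_validator('password_confirm')
    @classmethod
    def matches_password(cls, v: str, info: ValidationInfo) -> str:
        # info.data holds the fields validated so far, in declaration order;
        # "password" is absent from it if that field already failed validation
        if 'password' in info.data and v != info.data['password']:
            raise ValueError('passwords do not match')
        return v


ok = PasswordChange(password='s3cret!', password_confirm='s3cret!')

try:
    PasswordChange(password='s3cret!', password_confirm='typo')
    mismatch_loc = None
except ValidationError as exc:
    mismatch_loc = exc.errors()[0]['loc']
```

This only works because password is declared before password_confirm; for checks that need every field, the mode='after' model validator is the safer choice.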
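The mode='before' variant on model_validator runs before field parsing and receives the raw input. That input is not guaranteed to be a dict (it may be a model instance or another object), so check its type before transforming it:
from typing import Any

from pydantic import BaseModel, model_validator

class SignupRequest(BaseModel):
    # ... fields and validators as above ...

    @model_validator(mode='before')
    @classmethod
    def normalize_input(cls, data: Any) -> Any:
        # Strip all string values before field validation; pass non-dict input through
        if isinstance(data, dict):
            return {k: v.strip() if isinstance(v, str) else v for k, v in data.items()}
        return data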
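field_validator accepts the same mode='before' option, which is handy when a single field needs raw-input coercion rather than the whole payload. A short sketch, assuming a hypothetical SearchQuery model that should accept tags either as a list or as a comma-separated string:

```python
from typing import Any

from pydantic import BaseModel, field_validator


class SearchQuery(BaseModel):
    tags: list[str]

    @field_validator("tags", mode="before")
    @classmethod
    def split_comma_string(cls, v: Any) -> Any:
        # Runs before type validation, so v is the raw input.
        # Turn "a, b" into ["a", "b"]; leave lists (or anything else) for normal validation.
        if isinstance(v, str):
            return [part.strip() for part in v.split(",") if part.strip()]
        return v


from_string = SearchQuery(tags="python, pydantic,")
from_list = SearchQuery(tags=["fastapi"])
```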
Nested Models and Configuration
Pydantic handles nested validation automatically:
from pydantic import BaseModel, ConfigDict

# OrderItem is the model defined in the Annotated example above

class Address(BaseModel):
    street: str
    city: str
    country: str = "IN"
    postal_code: str

class Order(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,  # Strip whitespace from all strings
        str_to_lower=False,
        validate_assignment=True,  # Re-validate when attributes are set
        frozen=False,  # Allow mutation
    )

    order_id: str
    items: list[OrderItem]  # Built-in generics like list[...] need Python 3.9+
    shipping_address: Address
    billing_address: Address | None = None  # "X | None" syntax needs Python 3.10+

    def effective_billing_address(self) -> Address:
        return self.billing_address or self.shipping_address
validate_assignment=True is useful for models that get updated after creation: order.order_id = "new-id" is validated against the field’s type, constraints, and any field validators (and stripped, given str_strip_whitespace=True) instead of being stored unchecked.
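A quick sketch of what that looks like in practice, using a trimmed-down, hypothetical version of the Order model above with a couple of constraints added so there is something to violate:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Order(BaseModel):
    # Hypothetical, trimmed-down Order with constraints added for illustration
    model_config = ConfigDict(str_strip_whitespace=True, validate_assignment=True)

    order_id: str = Field(min_length=1)
    quantity: int = Field(gt=0)


order = Order(order_id="  ORD-1  ", quantity=2)

# Assignment goes through the same validation, including whitespace stripping
order.order_id = " ORD-2 "

# An invalid assignment raises and leaves the previous value in place
try:
    order.quantity = 0
    rejected = False
except ValidationError:
    rejected = True
```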
Discriminated Unions for Event Schemas
One of the most useful v2 features for API design is discriminated unions. When you have multiple event types and need to deserialize them correctly based on a type field:
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field

class OrderCreatedEvent(BaseModel):
    event_type: Literal["order.created"]
    order_id: str
    customer_id: str
    total_amount: float

class OrderShippedEvent(BaseModel):
    event_type: Literal["order.shipped"]
    order_id: str
    tracking_number: str
    carrier: str

class OrderCancelledEvent(BaseModel):
    event_type: Literal["order.cancelled"]
    order_id: str
    reason: str
    refund_amount: float | None = None

WebhookEvent = Annotated[
    Union[OrderCreatedEvent, OrderShippedEvent, OrderCancelledEvent],
    Field(discriminator='event_type'),
]

class WebhookPayload(BaseModel):
    event: WebhookEvent
    timestamp: str
    signature: str

# Pydantic picks the right model based on event_type
raw = {
    "event": {
        "event_type": "order.shipped",
        "order_id": "ORD-123",
        "tracking_number": "1Z999AA10123456784",
        "carrier": "UPS",
    },
    "timestamp": "2026-06-13T10:00:00Z",
    "signature": "sha256=abc...",
}

payload = WebhookPayload.model_validate(raw)
print(type(payload.event))  # <class '__main__.OrderShippedEvent'> when run as a script
print(payload.event.carrier)  # UPS
Without discriminated unions, you’d write a parser that tries each type in order, or a factory function with isinstance checks. Pydantic handles it cleanly and reports validation errors with the correct field context.
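To make the error-reporting point concrete, here is a sketch that validates a two-member version of the event union directly with TypeAdapter (no wrapping payload model). The first_error helper is hypothetical. An unknown tag fails immediately, and a known tag only reports errors from the matching model:

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class OrderShippedEvent(BaseModel):
    event_type: Literal["order.shipped"]
    order_id: str
    tracking_number: str


class OrderCancelledEvent(BaseModel):
    event_type: Literal["order.cancelled"]
    order_id: str
    reason: str


WebhookEvent = Annotated[
    Union[OrderShippedEvent, OrderCancelledEvent],
    Field(discriminator="event_type"),
]

# TypeAdapter validates the union directly, without a wrapping payload model
event_adapter = TypeAdapter(WebhookEvent)


def first_error(data: dict) -> dict:
    """Hypothetical helper: the first error Pydantic reports for invalid data."""
    try:
        event_adapter.validate_python(data)
    except ValidationError as exc:
        return exc.errors()[0]
    return {}


cancelled = event_adapter.validate_python(
    {"event_type": "order.cancelled", "order_id": "ORD-9", "reason": "duplicate"}
)
# An unknown tag is rejected up front instead of being tried against every member
unknown_tag = first_error({"event_type": "order.refunded", "order_id": "ORD-9"})
# A known tag reports errors only from the matching model
missing_field = first_error({"event_type": "order.shipped", "order_id": "ORD-9"})
```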
Serialization Control
v2 gives you fine-grained control over what gets serialized and how:
from datetime import datetime

from pydantic import BaseModel, Field, field_serializer

class UserResponse(BaseModel):
    id: int
    email: str
    created_at: datetime
    # exclude=True keeps this out of model_dump() and model_dump_json() alike;
    # overriding model_dump instead would miss other paths such as model_dump_json()
    password_hash: str = Field(exclude=True)
    internal_notes: str  # Serialize only in some contexts

    @field_serializer('created_at')
    def serialize_created_at(self, dt: datetime) -> str:
        return dt.isoformat()

user = UserResponse(
    id=1,
    email="user@example.com",
    created_at=datetime.now(),
    password_hash="$2b$12$...",
    internal_notes="VIP customer",
)

# Safe for API responses: password_hash is excluded
print(user.model_dump())
For API responses where different callers get different fields, use model_dump(include=...) or model_dump(exclude=...) at the call site rather than on the model. Keeping serialization logic in the model gets messy when different consumers need different views.
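A minimal sketch of that split: the field that must never leave the service is marked with Field(exclude=True) on the model, and each caller picks its own view at the call site. The view names are illustrative only, and mode="json" is used so the datetime comes out as a JSON-ready string:

```python
from datetime import datetime, timezone

from pydantic import BaseModel, Field


class UserResponse(BaseModel):
    id: int
    email: str
    created_at: datetime
    password_hash: str = Field(exclude=True)  # never serialized, in any view
    internal_notes: str


user = UserResponse(
    id=1,
    email="user@example.com",
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    password_hash="$2b$12$...",
    internal_notes="VIP customer",
)

# Each caller chooses its view at the call site
public_view = user.model_dump(mode="json", exclude={"internal_notes"})
admin_view = user.model_dump(mode="json")
summary_view = user.model_dump(include={"id", "email"})
```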
Performance Profiling
v2’s performance improvements are real, but the gains are most visible when you’re deserializing large lists:
import json
import time

from pydantic import BaseModel, TypeAdapter

class Item(BaseModel):
    id: int
    name: str
    value: float

# Generate test data
items_json = json.dumps([{"id": i, "name": f"item-{i}", "value": i * 1.5} for i in range(10000)])
items_data = json.loads(items_json)

# TypeAdapter for list validation (faster than [Item.model_validate(x) for x in data])
item_list_adapter = TypeAdapter(list[Item])

start = time.perf_counter()
items = item_list_adapter.validate_python(items_data)
elapsed = time.perf_counter() - start
print(f"Validated {len(items)} items in {elapsed*1000:.1f}ms")
TypeAdapter is the right tool for validating lists and other generic types without creating a wrapper model. It validates the whole list in a single call into pydantic-core, which avoids the per-item Python overhead of calling Item.model_validate in a loop.
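When the data arrives as JSON text, the intermediate json.loads step can be skipped as well. A sketch using TypeAdapter.validate_json, which parses and validates in one pass, plus dump_json for the way back. The two-item payload is made up for illustration:

```python
from pydantic import BaseModel, TypeAdapter


class Item(BaseModel):
    id: int
    name: str
    value: float


# Build the adapter once (e.g. at module level) and reuse it:
# constructing a TypeAdapter builds a new validator each time
item_list_adapter = TypeAdapter(list[Item])

raw_json = '[{"id": 1, "name": "item-1", "value": 1.5}, {"id": 2, "name": "item-2", "value": 3.0}]'

# validate_json parses and validates in one step, without a separate json.loads()
items = item_list_adapter.validate_json(raw_json)

# dump_json serializes the whole list back to JSON bytes
round_trip = item_list_adapter.dump_json(items)
```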
What v2 Breaks From v1
If you’re maintaining legacy code, three common v1 patterns that require migration:
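@validator becomes @field_validator: the old decorator still works in v2 but is deprecated. pre=True becomes mode='before', the values argument is replaced by a ValidationInfo argument whose .data holds previously validated fields, and Pydantic’s docs stack @classmethod beneath the new decorator.
orm_mode = True becomes from_attributes = True: the inner Config class becomes model_config = ConfigDict(...), and .from_orm(obj) becomes .model_validate(obj).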
.dict() and .json() become .model_dump() and .model_dump_json(): The old methods still work in v2 but emit deprecation warnings.
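Here is a compact before/after sketch of the migrations listed above, with the v1 spelling kept in comments. UserRow is a hypothetical stand-in for an ORM row object that exposes its data as attributes rather than dict keys:

```python
from pydantic import BaseModel, ConfigDict


class UserRow:
    """Hypothetical stand-in for an ORM row: data exposed as attributes, not dict keys."""

    def __init__(self, id: int, name: str):
        self.id = id
        self.name = name


class User(BaseModel):
    # v1: class Config: orm_mode = True
    model_config = ConfigDict(from_attributes=True)

    id: int
    name: str


# v1: User.from_orm(row)
user = User.model_validate(UserRow(id=1, name="Jane"))

# v1: user.dict() and user.json()
as_dict = user.model_dump()
as_json = user.model_dump_json()
```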
The migration is mostly mechanical for simple models. Complex models with custom validators and Config classes take more attention, but the Pydantic migration docs are thorough.
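For new code in 2026, none of this matters: you start with v2 patterns. For existing code, run your CI test suite with deprecation warnings promoted to errors (for example python -W error::DeprecationWarning, pytest -W error::DeprecationWarning, or PYTHONWARNINGS=error::DeprecationWarning) to surface every deprecated usage before it becomes a problem in a future major version. Pydantic’s deprecation warnings subclass DeprecationWarning, so this setting catches them.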
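The same effect can be scoped to a single block of code with the standard warnings module. Here is a sketch that filters only on Pydantic’s PydanticDeprecatedSince20 category (from pydantic.warnings) and shows a v1-style .dict() call being turned into an error. The deprecated_call_fails helper is hypothetical:

```python
import warnings

from pydantic import BaseModel
from pydantic.warnings import PydanticDeprecatedSince20


class User(BaseModel):
    id: int


def deprecated_call_fails() -> bool:
    """Hypothetical helper: True if the v1-style .dict() call is turned into an error."""
    with warnings.catch_warnings():
        # Like -W error, but only for Pydantic's v2 deprecations and only inside this block
        warnings.simplefilter("error", PydanticDeprecatedSince20)
        try:
            User(id=1).dict()
        except PydanticDeprecatedSince20:
            return True
    return False


fails = deprecated_call_fails()
```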
The most useful thing Pydantic does isn’t the validation itself — it’s the automatic JSON Schema generation. Every model you define can produce its own schema, which you can use to generate OpenAPI docs, validate against in tests, or send to a frontend for form generation. That’s a lot of derived value from keeping your data models in one place.
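As a small illustration of that derived value, constraints declared with Field show up directly as JSON Schema keywords. A minimal sketch with a hypothetical Product model (float is used for price to keep the schema simple):

```python
from typing import Annotated

from pydantic import BaseModel, Field


class Product(BaseModel):
    name: Annotated[str, Field(min_length=1, max_length=100)]
    price: Annotated[float, Field(gt=0)]
    in_stock: bool = True


# min_length/max_length become minLength/maxLength, gt becomes exclusiveMinimum,
# and fields without defaults are listed under "required"
schema = Product.model_json_schema()
```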