Web Development · Python

Pydantic v2: The Data Validation Layer Your Python APIs Actually Need

Pydantic v2 shipped with a Rust-based core that made validation 5-50x faster. But the performance is only part of it — the v2 model redesign changes how you structure validation logic for complex APIs.

Anurag Verma

Pydantic is the data validation library that most Python developers use without thinking about it. FastAPI uses it for request and response models. SQLModel uses it for database schemas. LangChain uses it for tool definitions. If you’re writing Python APIs in 2026, you’re almost certainly running Pydantic v2 already — and there’s a reasonable chance you’re using about 20% of what it can do.

This is not a migration guide (the Pydantic team has a good one). This is a guide to the parts of v2 that change how you structure validation logic in production code, and that most developers only discover after hitting a problem they didn’t know how to solve.

The Core Model

The basics haven’t changed much from v1:

from pydantic import BaseModel, EmailStr  # EmailStr needs the optional email-validator package (pip install "pydantic[email]")
from datetime import datetime
from typing import Optional

class User(BaseModel):
    id: int
    email: EmailStr
    name: str
    created_at: datetime
    is_active: bool = True
    bio: Optional[str] = None

# Validation runs on instantiation
user = User(id=1, email="user@example.com", name="Jane", created_at="2024-01-01T00:00:00Z")

# Access validated data
print(user.email)        # user@example.com
print(user.is_active)    # True
print(user.model_dump()) # Dict representation
print(user.model_json_schema()) # JSON Schema

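What’s new is the performance. v2’s core is written in Rust and lives in the pydantic-core library, which is where the 5-50x speedup comes from. As a rough illustration, a validation that took around 5 microseconds in v1 might take around 0.5 microseconds in v2; the exact ratio depends on the model. For an API handling thousands of requests per second, this matters.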
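Because validation runs on instantiation, bad input surfaces as a ValidationError at the point where the model is built. The sketch below uses a hypothetical Account model (not from the example above, and without EmailStr so it needs no extra package) to show lax-mode coercion and the structured error list:

```python
from pydantic import BaseModel, ValidationError

class Account(BaseModel):  # hypothetical model for illustration
    id: int
    name: str
    is_active: bool = True

# Lax mode coerces compatible input: the string "42" becomes the int 42
coerced = Account(id="42", name="Jane")

try:
    Account(id="not-a-number", name="Jane")
except ValidationError as exc:
    # errors() returns a list of dicts with the field location and error type
    errors = exc.errors()
    for err in errors:
        print(err["loc"], err["type"])
```

Each entry in errors() carries a loc tuple and a machine-readable type, which is usually easier to map onto API error responses than the exception message.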

Field Validation: The Right Way to Add Constraints

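v2 encourages putting constraints in Annotated types rather than repeating Field arguments on every field. Assigning Field(...) as a default still works, but the Annotated form is more composable: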

from pydantic import BaseModel, Field
from typing import Annotated
from decimal import Decimal

# Annotated approach — constraints live in the type, not the field definition
PositivePrice = Annotated[Decimal, Field(gt=0, decimal_places=2)]
ShortString = Annotated[str, Field(min_length=1, max_length=100)]
# Allowlist of plain-text characters; angle brackets and other markup are rejected
SafeHtml = Annotated[str, Field(pattern=r'^[a-zA-Z0-9\s.,!?-]+$')]

class Product(BaseModel):
    name: ShortString
    price: PositivePrice
    description: SafeHtml
    sku: Annotated[str, Field(pattern=r'^[A-Z]{3}-\d{6}$')]

# Reuse the type anywhere
class OrderItem(BaseModel):
    product_name: ShortString
    quantity: Annotated[int, Field(gt=0, le=100)]
    unit_price: PositivePrice

The advantage of Annotated is that you define the constraint once and reuse the type. In v1, you’d repeat min_length=1, max_length=100 on every field. In v2, you define ShortString once.
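Annotated types also stack, so a reusable type can carry both a built-in constraint and a custom check. A minimal sketch, assuming a hypothetical CleanShortString type and Category model that are not part of the example above:

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, Field, ValidationError

def _no_double_spaces(value: str) -> str:
    # Hypothetical extra rule layered on top of the length constraint
    if "  " in value:
        raise ValueError("must not contain consecutive spaces")
    return value

ShortString = Annotated[str, Field(min_length=1, max_length=100)]
# Nesting Annotated keeps the length limits and adds the custom check after them
CleanShortString = Annotated[ShortString, AfterValidator(_no_double_spaces)]

class Category(BaseModel):  # hypothetical model for illustration
    title: CleanShortString

ok = Category(title="Kitchen tools")

try:
    Category(title="")
except ValidationError as exc:
    too_short = exc.errors()[0]["type"]

try:
    Category(title="Kitchen  tools")
except ValidationError as exc:
    bad_spacing = exc.errors()[0]["type"]
```

The length constraint runs first, so the empty string never reaches the custom function.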

Custom Validators

When built-in constraints aren’t enough, you need custom validators. v2 changed the API significantly:

from pydantic import BaseModel, field_validator, model_validator
from typing import Self  # Python 3.11+; on older versions use typing_extensions.Self
import re

class SignupRequest(BaseModel):
    username: str
    password: str
    password_confirm: str
    phone: str

    @field_validator('username')
    @classmethod
    def username_valid(cls, v: str) -> str:
        v = v.lower()  # Normalize first so mixed-case input is accepted
        if not re.fullmatch(r'[a-z0-9_]{3,20}', v):
            raise ValueError('username must be 3-20 characters: letters, numbers, underscores only')
        return v

    @field_validator('phone')
    @classmethod
    def phone_valid(cls, v: str) -> str:
        # Strip formatting, validate E.164
        cleaned = re.sub(r'[\s\-\(\)]', '', v)
        if not re.match(r'^\+\d{10,15}$', cleaned):
            raise ValueError('phone must be in E.164 format: +1234567890')
        return cleaned

    @model_validator(mode='after')
    def passwords_match(self) -> Self:
        if self.password != self.password_confirm:
            raise ValueError('passwords do not match')
        return self

Key differences from v1:

  • @field_validator replaces @validator; stack @classmethod beneath it, which the Pydantic docs recommend so type checkers treat the method correctly
  • @model_validator(mode='after') runs after all field validation and gives access to self for cross-field checks
  • Return the (possibly modified) value from field validators
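To see how the two validator kinds report failures, here is a small sketch with a hypothetical Booking model (not from the signup example). It returns the class name as a string annotation instead of Self so it also runs on Python versions before 3.11:

```python
from datetime import date

from pydantic import BaseModel, ValidationError, field_validator, model_validator

class Booking(BaseModel):  # hypothetical model for illustration
    guest: str
    check_in: date
    check_out: date

    @field_validator("guest")
    @classmethod
    def guest_not_blank(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("guest must not be blank")
        return v  # the returned value replaces the input

    @model_validator(mode="after")
    def dates_in_order(self) -> "Booking":
        # Runs only after every field has validated successfully
        if self.check_out <= self.check_in:
            raise ValueError("check_out must be after check_in")
        return self

booking = Booking(guest="  Jane ", check_in="2026-01-10", check_out="2026-01-12")

try:
    Booking(guest="Jane", check_in="2026-01-12", check_out="2026-01-10")
except ValidationError as exc:
    cross_field_error = exc.errors()[0]
```

Errors raised from a model validator are attached to the model as a whole, so their loc is an empty tuple rather than a field name. API error formatters need to handle that case.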

The mode='before' variant on model_validator runs before field parsing and receives raw input:

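# Belongs inside a model class; requires: from typing import Any
@model_validator(mode='before')
@classmethod
def normalize_input(cls, data: Any) -> Any:
    # Raw input is not guaranteed to be a dict, so pass anything else through
    if not isinstance(data, dict):
        return data
    # Strip whitespace from top-level string values before validation
    return {
        k: v.strip() if isinstance(v, str) else v
        for k, v in data.items()
    }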
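A before-validator is also a convenient place to reshape input, for example accepting an old key name. The sketch below is one possible shape for that, using a hypothetical ContactForm model and a hypothetical legacy key e_mail:

```python
from typing import Any

from pydantic import BaseModel, model_validator

class ContactForm(BaseModel):  # hypothetical model for illustration
    name: str
    email: str

    @model_validator(mode="before")
    @classmethod
    def accept_legacy_keys(cls, data: Any) -> Any:
        # Only touch dict input; other input types pass through unchanged
        if isinstance(data, dict) and "e_mail" in data and "email" not in data:
            data = {**data}  # copy so the caller's dict is not mutated
            data["email"] = data.pop("e_mail")
        return data

legacy_input = {"name": "Jane", "e_mail": "jane@example.com"}
form = ContactForm.model_validate(legacy_input)
```

Copying the dict before changing it leaves the caller's payload intact, which matters if the same object is logged or retried later.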

Nested Models and Configuration

Pydantic handles nested validation automatically:

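from pydantic import BaseModel, ConfigDict
# list[...] is a built-in generic (Python 3.9+), so no typing import is needed.
# OrderItem is the model defined in the field validation section above.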

class Address(BaseModel):
    street: str
    city: str
    country: str = "IN"
    postal_code: str

class Order(BaseModel):
    model_config = ConfigDict(
        str_strip_whitespace=True,  # Strip whitespace from all strings
        str_to_lower=False,
        validate_assignment=True,   # Re-validate when attributes are set
        frozen=False,               # Allow mutation
    )

    order_id: str
    items: list[OrderItem]
    shipping_address: Address
    billing_address: Address | None = None

    def effective_billing_address(self) -> Address:
        return self.billing_address or self.shipping_address

validate_assignment=True is useful for models that get updated after creation — it means order.order_id = "new-id" will re-run the order_id validator instead of setting the value directly.
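A minimal sketch of that behavior, using a hypothetical Cart model rather than the Order model above:

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class Cart(BaseModel):  # hypothetical model for illustration
    model_config = ConfigDict(validate_assignment=True)

    quantity: int = Field(gt=0)

cart = Cart(quantity=2)

# Assignment goes through validation, so the string is coerced to an int
cart.quantity = "5"

try:
    cart.quantity = 0  # violates gt=0
except ValidationError as exc:
    assignment_error = exc.errors()[0]["type"]
```

The rejected assignment leaves the previous value in place, so the model never holds an invalid state.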

Discriminated Unions for Event Schemas

One of the most useful v2 features for API design is discriminated unions. When you have multiple event types and need to deserialize them correctly based on a type field:

from pydantic import BaseModel, Field
from typing import Annotated, Literal, Union

class OrderCreatedEvent(BaseModel):
    event_type: Literal["order.created"]
    order_id: str
    customer_id: str
    total_amount: float

class OrderShippedEvent(BaseModel):
    event_type: Literal["order.shipped"]
    order_id: str
    tracking_number: str
    carrier: str

class OrderCancelledEvent(BaseModel):
    event_type: Literal["order.cancelled"]
    order_id: str
    reason: str
    refund_amount: float | None = None

WebhookEvent = Annotated[
    Union[OrderCreatedEvent, OrderShippedEvent, OrderCancelledEvent],
    Field(discriminator='event_type')
]

class WebhookPayload(BaseModel):
    event: WebhookEvent
    timestamp: str
    signature: str

# Pydantic picks the right model based on event_type
raw = {
    "event": {
        "event_type": "order.shipped",
        "order_id": "ORD-123",
        "tracking_number": "1Z999AA10123456784",
        "carrier": "UPS"
    },
    "timestamp": "2026-06-13T10:00:00Z",
    "signature": "sha256=abc..."
}

payload = WebhookPayload.model_validate(raw)
print(type(payload.event))  # <class '__main__.OrderShippedEvent'> when run as a script
print(payload.event.carrier)  # UPS

Without discriminated unions, you’d write a parser that tries each type in order, or a factory function with isinstance checks. Pydantic handles it cleanly and reports validation errors with the correct field context.
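The error context is easiest to see with a failing payload. This sketch uses two hypothetical event models and a TypeAdapter so the union can be validated without a wrapper model:

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError

class Ping(BaseModel):  # hypothetical event for illustration
    kind: Literal["ping"]

class Message(BaseModel):  # hypothetical event for illustration
    kind: Literal["message"]
    text: str

Event = Annotated[Union[Ping, Message], Field(discriminator="kind")]
event_adapter = TypeAdapter(Event)

parsed = event_adapter.validate_python({"kind": "message", "text": "hi"})

try:
    event_adapter.validate_python({"kind": "unknown"})
except ValidationError as exc:
    unknown_tag = exc.errors()[0]["type"]

try:
    # Known tag, but a required field of that variant is missing
    event_adapter.validate_python({"kind": "message"})
except ValidationError as exc:
    missing_field = exc.errors()[0]
```

With a known tag, only the matching variant is validated, so the error points at that variant's field instead of listing failures for every member of the union.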

Serialization Control

v2 gives you fine-grained control over what gets serialized and how:

from pydantic import BaseModel, Field, field_serializer
from datetime import datetime

class UserResponse(BaseModel):
    id: int
    email: str
    created_at: datetime
    # exclude=True keeps the field out of both model_dump() and model_dump_json(),
    # which an override of model_dump() alone would not cover
    password_hash: str = Field(exclude=True)
    internal_notes: str  # Serialize only in some contexts

    @field_serializer('created_at')
    def serialize_created_at(self, dt: datetime) -> str:
        return dt.isoformat()

user = UserResponse(
    id=1,
    email="user@example.com",
    created_at=datetime.now(),
    password_hash="$2b$12$...",
    internal_notes="VIP customer"
)

# Safe for API responses — password_hash excluded
print(user.model_dump())

For API responses where different callers get different fields, use model_dump(include=...) or model_dump(exclude=...) at the call site rather than on the model. Keeping serialization logic in the model gets messy when different consumers need different views.
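A short sketch of call-site views, using a hypothetical Profile model so it stands alone:

```python
from pydantic import BaseModel

class Profile(BaseModel):  # hypothetical model for illustration
    id: int
    email: str
    internal_notes: str

profile = Profile(id=1, email="user@example.com", internal_notes="VIP customer")

# Each caller picks its own view; the model stays free of per-consumer logic
public_view = profile.model_dump(exclude={"internal_notes"})
admin_view = profile.model_dump()
summary_view = profile.model_dump(include={"id"})
```

The same include and exclude arguments are accepted by model_dump_json(), so the views stay consistent across dict and JSON output.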

Performance Profiling

v2’s performance improvements are real, but the gains are most visible when you’re deserializing large lists:

import json
import time

from pydantic import BaseModel, TypeAdapter

class Item(BaseModel):
    id: int
    name: str
    value: float

# Generate test data
items_json = json.dumps([{"id": i, "name": f"item-{i}", "value": i * 1.5} for i in range(10000)])
items_data = json.loads(items_json)

# TypeAdapter validates the whole list in one call
# (faster than [Item.model_validate(x) for x in data])
item_list_adapter = TypeAdapter(list[Item])

start = time.perf_counter()
items = item_list_adapter.validate_python(items_data)
elapsed = time.perf_counter() - start

print(f"Validated {len(items)} items in {elapsed*1000:.1f}ms")

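TypeAdapter is the right tool for validating lists and other generic types without creating a wrapper model. Because the whole list is validated in a single call into the Rust core, it is typically faster than calling Item.model_validate on each element in a Python loop. Measure on your own payloads before relying on a specific speedup.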
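When the input arrives as JSON text, a TypeAdapter can also parse and validate it in one step with validate_json, skipping the separate json.loads call. A minimal sketch with a hypothetical Reading model:

```python
import json

from pydantic import BaseModel, TypeAdapter

class Reading(BaseModel):  # hypothetical model for illustration
    id: int
    value: float

readings_adapter = TypeAdapter(list[Reading])

raw_json = json.dumps([{"id": i, "value": i * 1.5} for i in range(3)])

# Parse and validate in a single call
readings = readings_adapter.validate_json(raw_json)

# The same adapter serializes back to JSON (returned as bytes)
round_trip = readings_adapter.dump_json(readings)
```

Creating the adapter once at module level and reusing it avoids rebuilding the validator on every request.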

What v2 Breaks From v1

If you’re maintaining legacy code, three common v1 patterns need migrating:

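@validator becomes @field_validator with @classmethod: Rename the decorator and stack @classmethod beneath it. The Pydantic docs recommend the explicit @classmethod mainly so type checkers see the method correctly. The old @validator still works in v2 but emits a deprecation warning.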

orm_mode = True becomes from_attributes = True: Config class becomes model_config = ConfigDict(...).

.dict() and .json() become .model_dump() and .model_dump_json(): The old methods still work in v2 but emit deprecation warnings.

The migration is mostly mechanical for simple models. Complex models with custom validators and Config classes take more attention, but the Pydantic migration docs are thorough.
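Put together, the v2 side of those three changes looks roughly like this. UserRow is a hypothetical stand-in for an ORM row, and the v1 equivalents are noted in comments:

```python
import json
from dataclasses import dataclass

from pydantic import BaseModel, ConfigDict, field_validator

@dataclass
class UserRow:  # hypothetical stand-in for an ORM object
    id: int
    email: str

class UserOut(BaseModel):
    # v1: class Config: orm_mode = True
    model_config = ConfigDict(from_attributes=True)

    id: int
    email: str

    # v1: @validator("email")
    @field_validator("email")
    @classmethod
    def lowercase_email(cls, v: str) -> str:
        return v.lower()

# v1: UserOut.from_orm(row)
user_out = UserOut.model_validate(UserRow(id=1, email="Jane@Example.com"))

as_dict = user_out.model_dump()       # v1: .dict()
as_json = user_out.model_dump_json()  # v1: .json()
```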

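For new code in 2026, none of this matters: you start with v2 patterns. For existing code, run your CI test suite with deprecation warnings promoted to errors, for example with pytest -W error::DeprecationWarning. Pydantic's deprecation warnings subclass DeprecationWarning, so this surfaces every deprecated usage before it becomes a problem in a future major version.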
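One way to see a deprecated call fail loudly is Python's standard warnings filter. The sketch below assumes a Pydantic 2.x release in which the v1-style .dict() method still exists and warns; Legacy is a hypothetical model:

```python
import warnings

from pydantic import BaseModel

class Legacy(BaseModel):  # hypothetical model for illustration
    id: int

legacy = Legacy(id=1)

with warnings.catch_warnings():
    # Promote deprecation warnings to exceptions inside this block only
    warnings.simplefilter("error", DeprecationWarning)
    try:
        legacy.dict()  # v1-style call, deprecated in v2
        deprecated_call_caught = False
    except DeprecationWarning:
        deprecated_call_caught = True

# The v2 replacement does not warn
current = legacy.model_dump()
```

In a test suite the same effect usually comes from the test runner's warning configuration rather than from code like this.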

The most useful thing Pydantic does isn’t the validation itself — it’s the automatic JSON Schema generation. Every model you define can produce its own schema, which you can use to generate OpenAPI docs, validate against in tests, or send to a frontend for form generation. That’s a lot of derived value from keeping your data models in one place.
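As a small illustration of that schema output, here is a hypothetical Ticket model and the schema Pydantic derives from its constraints:

```python
from pydantic import BaseModel, Field

class Ticket(BaseModel):  # hypothetical model for illustration
    title: str = Field(min_length=1, max_length=80)
    priority: int = Field(default=3, ge=1, le=5)

schema = Ticket.model_json_schema()

title_schema = schema["properties"]["title"]
priority_schema = schema["properties"]["priority"]
print(schema["required"])
print(title_schema)
print(priority_schema)
```

Field constraints map onto standard JSON Schema keywords such as minLength and maximum, and only fields without a default appear under required.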
