AI-Assisted Technical Documentation: Keeping Docs Accurate When Code Changes Fast
Bad documentation is usually not a writing problem — it's a maintenance problem. AI tools are changing the equation by making initial doc generation cheap and doc refresh practical at scale.
Anurag Verma
Documentation rot is one of the most expensive slow problems in software. A doc written six months ago described the system as it was then. If it’s still in the codebase but describing something that no longer exists, it’s worse than no documentation — it actively misleads the person reading it. Junior engineers burn hours debugging behavior that was changed in a pull request that updated the code but not the comment. Onboarding takes twice as long because the README describes infrastructure that was deprecated in Q3.
The core problem isn’t that engineers are lazy about writing docs. It’s that docs require a separate update path from code, and most teams don’t have a system that enforces that path. Linters catch missing types. Tests catch broken behavior. Nothing catches a docstring that still references config.js when the file was renamed to settings.ts in a refactor six months ago.
AI tools are starting to change this in specific, practical ways — not through magic, but through making the generation and refresh cycle cheap enough to actually happen.
Where AI Helps With Documentation
Initial Docstring and Comment Generation
The clearest ROI is on new code. Writing a function and then writing a good docstring for it is context-switching overhead. AI coding tools (Cursor, Copilot, Claude Code) can generate a draft docstring from the function body that’s usually 80% of the way to accurate.
The output isn’t always perfect. AI-generated docstrings tend to describe the mechanics (what the function does step by step) more than the contract (what the caller should know: preconditions, side effects, error cases). You still need a human to add the nuance, but starting from a draft is faster than starting from a blank line.
For Python, asking the tool to document a function like the one below produces a useful starting point:
def calculate_tiered_discount(base_price: float, quantity: int, customer_tier: str) -> float:
    """
    [AI draft]: Calculates a tiered discount on the base price based on quantity
    and customer tier. Returns the final price after discount.

    Args:
        base_price: The original item price in USD
        quantity: Number of items being purchased
        customer_tier: Customer segment ('standard', 'preferred', 'enterprise')

    Returns:
        Final price per unit after discount applied

    Raises:
        ValueError: If customer_tier is not a recognized tier
    """
    TIER_MULTIPLIERS = {'standard': 1.0, 'preferred': 0.9, 'enterprise': 0.8}
    if customer_tier not in TIER_MULTIPLIERS:
        raise ValueError(f"Unknown tier: {customer_tier}")
    quantity_discount = min(0.15, (quantity - 1) * 0.01)
    return base_price * TIER_MULTIPLIERS[customer_tier] * (1 - quantity_discount)
The human review step: does the docstring describe the actual behavior? Does it mention that the quantity discount caps at 15%? Does it note that the enterprise discount stacks with the quantity discount? The AI got the structure right; the details need verification.
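One way the reviewed version might look is sketched below. The rewritten docstring wording and the `MAX_QUANTITY_DISCOUNT` constant are illustrative choices, not part of the original function; the pricing logic is unchanged. Each claim the reviewer added is backed by an assertion, so the docstring and the code are checked against each other.

```python
import math

TIER_MULTIPLIERS = {"standard": 1.0, "preferred": 0.9, "enterprise": 0.8}
MAX_QUANTITY_DISCOUNT = 0.15  # same cap as the original example, given a name


def calculate_tiered_discount(base_price: float, quantity: int, customer_tier: str) -> float:
    """Return the discounted unit price for a customer tier and order quantity.

    Two discounts are multiplied together, so they stack:
      * tier discount: standard 0%, preferred 10%, enterprise 20%
      * quantity discount: 1% per unit beyond the first, capped at 15%

    The result is a per-unit price, not an order total.
    Assumes quantity >= 1; this is not validated here.

    Raises:
        ValueError: If customer_tier is not a recognized tier.
    """
    if customer_tier not in TIER_MULTIPLIERS:
        raise ValueError(f"Unknown tier: {customer_tier}")
    quantity_discount = min(MAX_QUANTITY_DISCOUNT, (quantity - 1) * 0.01)
    return base_price * TIER_MULTIPLIERS[customer_tier] * (1 - quantity_discount)


# Check each claim in the reviewed docstring against the actual behavior.
assert math.isclose(calculate_tiered_discount(100.0, 1, "standard"), 100.0)     # no discount
assert math.isclose(calculate_tiered_discount(100.0, 16, "standard"), 85.0)     # cap reached
assert math.isclose(calculate_tiered_discount(100.0, 500, "standard"), 85.0)    # cap holds
assert math.isclose(calculate_tiered_discount(100.0, 500, "enterprise"), 68.0)  # discounts stack
```

The comparisons use `math.isclose` because the prices are floats; exact equality would be brittle here.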
API Reference Documentation
For teams building APIs, keeping reference documentation in sync with actual endpoint behavior is a perennial problem. The approach that works: generate from code, not from prose.
If you use FastAPI or Django Ninja, you get OpenAPI spec generation automatically from your route definitions. The AI role here shifts to enriching that spec with human-readable descriptions:
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()


class OrderRequest(BaseModel):
    product_id: str = Field(description="UUID of the product being ordered")
    quantity: int = Field(ge=1, le=1000, description="Units to order, 1-1000")
    shipping_address_id: str = Field(description="ID from user's saved addresses")


class OrderResponse(BaseModel):
    # Illustrative fields; adjust to the real response shape.
    order_id: str = Field(description="UUID of the created order")
    status: str = Field(description="Initial order status")


@app.post(
    "/orders",
    summary="Create a new order",
    description="""
    Creates an order for the authenticated user. The order is placed immediately
    and triggers warehouse allocation. Payment is captured within 24 hours.
    If the product is out of stock, returns 409 Conflict with the restock date
    in the response body. Orders cannot be cancelled after 30 minutes.
    """,
    response_description="The created order, including its UUID and initial status",
)
async def create_order(request: OrderRequest) -> OrderResponse:
    ...
The description field is where AI helps: you can paste the function body and ask for a plain-language explanation of the behavior, then edit it into the description field. The FastAPI spec then stays in sync with the code automatically because it’s derived from it — you’re only maintaining the prose, not the parameter list.
For teams on Express or Hono without schema-first routing, the pattern shifts: write the OpenAPI spec first, or generate it from annotated route definitions (for example with swagger-jsdoc on Express or @hono/zod-openapi on Hono), then validate in CI that your implementation matches the spec.
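The CI validation step can be sketched independently of the web framework. The example below compares the operations declared in an OpenAPI document with the routes the application actually registers. How you obtain the implemented route list depends on your framework and is assumed here; `spec_operations`, `find_spec_drift`, and the sample data are hypothetical names for illustration.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def spec_operations(spec: dict) -> set[tuple[str, str]]:
    """Collect (METHOD, path) pairs declared under the spec's `paths` object."""
    operations = set()
    for path, path_item in spec.get("paths", {}).items():
        for key in path_item:
            # Path items can also hold non-operation keys such as `parameters`.
            if key.lower() in HTTP_METHODS:
                operations.add((key.upper(), path))
    return operations


def find_spec_drift(
    spec: dict, implemented: set[tuple[str, str]]
) -> dict[str, list[tuple[str, str]]]:
    """Report routes missing from the spec and spec entries with no route."""
    declared = spec_operations(spec)
    return {
        "undocumented": sorted(implemented - declared),
        "unimplemented": sorted(declared - implemented),
    }


# Sample data: a parsed OpenAPI document and the routes the app registers.
spec = {
    "paths": {
        "/orders": {"post": {"summary": "Create a new order"}},
        "/orders/{id}": {"get": {"summary": "Fetch one order"}},
    }
}
implemented = {("POST", "/orders"), ("GET", "/orders/{id}"), ("DELETE", "/orders/{id}")}

drift = find_spec_drift(spec, implemented)
print(drift)
```

A CI job would fail when either list is non-empty. Express and Hono write path parameters as `/orders/:id` while OpenAPI uses `/orders/{id}`, so a real check needs to normalize one form into the other before comparing.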
README and Architecture Documentation
READMEs decay because they’re written once during initial setup and then only updated when someone notices they’re wrong — which is the worst possible maintenance trigger.
A more durable pattern: treat the README like infrastructure documentation and review it on a schedule tied to major changes, not ad-hoc. AI tools help here by flagging stale references.
Practically, this means running a script in CI that extracts claims from the README and checks them against the codebase:
# A simple check: does the README mention files that no longer exist?
# Backticks are matched literally; inside single quotes they need no escaping.
grep -oE '`[^`]+\.(ts|js|py|json|yaml|yml)`' README.md \
  | tr -d '`' \
  | while IFS= read -r file; do
      if [ ! -f "$file" ]; then
        echo "README references missing file: $file"
      fi
    done
This doesn’t check every claim, but it catches the most common type of rot: references to files that were renamed or moved. Extend it to check command examples, configuration key names, and environment variable names.
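As a sketch of that extension, the Python version below checks file references and environment variable names in one pass. It assumes the known variable names live in a `.env.example` file at the repository root, which is a common convention but not something every project has; the function names are hypothetical.

```python
import re
from pathlib import Path

# Backticked tokens without spaces that end in a known extension, e.g. `settings.ts`.
FILE_REF = re.compile(r"`([^`\s]+\.(?:ts|js|py|json|yaml|yml))`")
# Backticked UPPER_SNAKE_CASE names with at least one underscore, e.g. `DATABASE_URL`.
ENV_REF = re.compile(r"`([A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+)`")


def missing_files(readme_text: str, repo_root: Path) -> list[str]:
    """Return referenced file paths that do not exist under repo_root."""
    refs = sorted(set(FILE_REF.findall(readme_text)))
    return [ref for ref in refs if not (repo_root / ref).is_file()]


def load_env_names(env_example: Path) -> set[str]:
    """Parse KEY=VALUE lines, ignoring blanks and comments."""
    names = set()
    for line in env_example.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            names.add(line.removeprefix("export ").split("=", 1)[0].strip())
    return names


def unknown_env_vars(readme_text: str, known: set[str]) -> list[str]:
    """Return env var names the README mentions that are not in the known set."""
    refs = sorted(set(ENV_REF.findall(readme_text)))
    return [ref for ref in refs if ref not in known]


def check_readme(repo_root: Path) -> list[str]:
    """Collect human-readable problems; an empty list means the check passed."""
    text = (repo_root / "README.md").read_text(encoding="utf-8")
    problems = [f"missing file: {name}" for name in missing_files(text, repo_root)]
    env_example = repo_root / ".env.example"  # assumed location of known names
    if env_example.is_file():
        known = load_env_names(env_example)
        problems += [f"unknown env var: {name}" for name in unknown_env_vars(text, known)]
    return problems
```

In CI, call `check_readme(Path("."))`, print the problems, and exit non-zero if the list is not empty so the build fails instead of just logging.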
For architecture documentation (diagrams, decision records), the AI-assisted workflow is: generate a first draft from the codebase, review and correct it, then export as code (Mermaid, PlantUML, Structurizr) rather than as an image file. Diagram-as-code in version control at least makes changes auditable.
graph TD
    Client["Web Client"] -->|HTTPS| Gateway["API Gateway\n(Hono on CF Workers)"]
    Gateway -->|JWT validation| Auth["Auth Service\n(Supabase Auth)"]
    Gateway -->|Authorized requests| API["Business Logic\n(Django + FastAPI)"]
    API -->|Read/Write| DB["PostgreSQL\n(Supabase)"]
    API -->|Cache| Cache["Redis\n(Upstash)"]
    API -->|Async tasks| Queue["Task Queue\n(Django Temporal)"]
A Mermaid diagram committed as code can be diffed when the architecture changes. A PNG cannot.
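If the diagram is generated rather than hand-written, stable output matters: the same architecture should always produce the same text, or the diff fills up with noise. A minimal sketch, assuming the services and connections are available as plain data (the `render_mermaid` helper and the shortened labels are illustrative):

```python
def render_mermaid(edges: list[tuple[str, str, str]], labels: dict[str, str]) -> str:
    """Render (source, target, link_text) edges as a Mermaid flowchart.

    Edges are sorted so the output does not depend on input order.
    """
    lines = ["graph TD"]
    declared: set[str] = set()

    def node(node_id: str) -> str:
        # Attach the display label the first time a node appears, bare ID afterwards.
        if node_id in declared or node_id not in labels:
            return node_id
        declared.add(node_id)
        return f'{node_id}["{labels[node_id]}"]'

    for source, target, link in sorted(edges):
        lines.append(f"    {node(source)} -->|{link}| {node(target)}")
    return "\n".join(lines) + "\n"


labels = {"Client": "Web Client", "Gateway": "API Gateway", "Auth": "Auth Service"}
edges = [
    ("Gateway", "Auth", "JWT validation"),
    ("Client", "Gateway", "HTTPS"),
]
print(render_mermaid(edges, labels))
```

Because the edges are sorted before rendering, adding one connection changes one line in the committed file, which is what makes review of architecture changes practical.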
Changelog and Migration Guide Generation
Release notes and migration guides are documentation that’s genuinely painful to write well. AI tools handle this task better than most others because the inputs are structured: a diff of code changes, a list of breaking changes identified during review, and a template for the output format.
The workflow: provide the git diff between versions, specify which changes are breaking (the model will guess, but you should confirm), and ask for a changelog entry and a migration section. The output needs editing but saves the “starting from nothing” friction that makes changelogs deprioritized.
# Collect the API diff between two releases, truncated to 500 lines to keep the prompt small
git diff v1.4.0...v1.5.0 -- src/api/ | head -500
Then feed that diff to your model of choice with the prompt: “Summarize the API changes between these versions as a changelog entry. Identify breaking changes (removed parameters, changed return types, renamed endpoints) and document the migration steps.”
The result won’t be publication-ready, but it’s a much better starting point than asking a developer to reconstruct changes from memory a week after the code was written.
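The same workflow can be scripted so the prompt is built the same way for every release. The sketch below wraps the `git diff` command and the prompt wording from above; the function names are hypothetical, and sending the prompt to a model is left out because it depends on the provider you use.

```python
import subprocess

PROMPT_TEMPLATE = (
    "Summarize the API changes between {old} and {new} as a changelog entry. "
    "Identify breaking changes (removed parameters, changed return types, "
    "renamed endpoints) and document the migration steps.\n\n"
    "{truncation_note}Diff:\n{diff}"
)


def read_api_diff(old: str, new: str, path: str = "src/api/") -> str:
    """Return the diff between two refs, limited to one directory."""
    result = subprocess.run(
        ["git", "diff", f"{old}...{new}", "--", path],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. on an unknown tag
    )
    return result.stdout


def build_changelog_prompt(diff_text: str, old: str, new: str, max_lines: int = 500) -> str:
    """Build the model prompt, stating explicitly when the diff was cut short."""
    lines = diff_text.splitlines()
    note = ""
    if len(lines) > max_lines:
        note = f"Note: the diff was truncated to the first {max_lines} of {len(lines)} lines.\n\n"
        lines = lines[:max_lines]
    return PROMPT_TEMPLATE.format(
        old=old, new=new, truncation_note=note, diff="\n".join(lines)
    )
```

A typical call would be `build_changelog_prompt(read_api_diff("v1.4.0", "v1.5.0"), "v1.4.0", "v1.5.0")`. Telling the model about truncation matters: a silently cut diff invites a changelog that confidently omits the changes it never saw.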
Where AI Documentation Help Falls Flat
Domain-specific context. A function that calculates SaaS subscription proration has business logic that a language model will approximate incorrectly unless it’s been given the business context. Documentation for this kind of code still requires the human who understands the domain to write the meaningful parts — the AI can only describe the mechanics.
Implicit constraints. Code often enforces constraints that aren’t visible in the function signature. “This function must only be called after initializeSession()” is the kind of precondition that’s obvious to the person who wrote the code and invisible to a model generating documentation from the function body alone.
Historical reasoning. Why does this code exist? Why was it implemented this way rather than the obvious alternative? These “why” questions are what decision records and inline comments should answer, and they’re exactly what AI tools can’t generate — because the reason is context that predates the code.
Building Documentation Discipline Without Heroics
The most useful shift is treating documentation as a type of test: something that runs against the code and fails if it’s wrong, rather than something maintained by individual goodwill.
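Python's standard `doctest` module is one concrete form of this idea: examples embedded in a docstring are executed and compared with the real output. The sketch below uses two hypothetical functions, one with an accurate docstring and one whose docstring was left behind when the cap changed, and counts failing examples for each.

```python
import doctest


def quantity_discount(quantity: int) -> float:
    """Return the quantity discount rate, capped at 15%.

    >>> quantity_discount(1)
    0.0
    >>> quantity_discount(100)
    0.15
    """
    return min(0.15, (quantity - 1) * 0.01)


def stale_quantity_discount(quantity: int) -> float:
    """Return the quantity discount rate, capped at 20%.

    >>> stale_quantity_discount(100)
    0.2
    """
    return min(0.15, (quantity - 1) * 0.01)  # the cap changed; the docstring did not


def count_doc_failures(func) -> int:
    """Run the examples in func's docstring and return how many fail."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, "<docstring>", 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test, out=lambda _text: None)  # silence the failure report
    return results.failed


print(count_doc_failures(quantity_discount))
print(count_doc_failures(stale_quantity_discount))
```

The accurate docstring reports no failures and the stale one reports one. Note the limit: only the executable examples are verified. The prose sentence "capped at 20%" is caught here only because an example happens to encode the same claim.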
Three lightweight practices that are actually maintained in real teams:
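Docstring linting. Tools like pydocstyle for Python (or the equivalent pydocstyle rules in Ruff, now that pydocstyle itself is deprecated) and eslint-plugin-jsdoc for JavaScript flag missing or malformed docstrings in CI. ESLint’s built-in valid-jsdoc rule, once the usual choice, has been deprecated and removed from current ESLint releases. Linting doesn’t ensure the docs are accurate, but it ensures they exist. Start with public API functions only.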
README staleness checks. The script above, which checks for references to nonexistent files, runs in under a second. Add it to CI. It catches a sizable share of README rot with minimal effort.
Docs-required-for-merge. For APIs and public interfaces, require a documentation update as part of the PR checklist. This doesn’t work for all code, but for the parts that external teams depend on, the short-term overhead is worth the long-term payoff. GitHub PR templates make this frictionless to enforce.
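A PR checklist relies on someone ticking a box honestly. The docs-required-for-merge rule can also be approximated mechanically, as in the sketch below: if a change touches API code but no documentation file, flag it. The `src/api/` prefix comes from the diff example earlier; the `docs/` directory and the file names in `DOC_FILES` are assumptions about repository layout.

```python
API_PREFIXES = ("src/api/",)               # code that external teams depend on
DOC_PREFIXES = ("docs/",)                  # assumed documentation directory
DOC_FILES = {"README.md", "CHANGELOG.md"}  # assumed top-level doc files


def needs_docs_update(changed_paths: list[str]) -> bool:
    """True when API code changed but no documentation file did."""
    touches_api = any(path.startswith(API_PREFIXES) for path in changed_paths)
    touches_docs = any(
        path.startswith(DOC_PREFIXES) or path in DOC_FILES for path in changed_paths
    )
    return touches_api and not touches_docs


# Example: the changed-file list of a pull request.
changed = ["src/api/orders.py", "tests/test_orders.py"]
if needs_docs_update(changed):
    print("API changed without a documentation update")
```

The changed-file list can come from `git diff --name-only` against the base branch. This only proves that some documentation file was touched, not that the update is correct, so it works best as a prompt for the reviewer rather than as a replacement for review.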
The goal isn’t perfect documentation. Perfect documentation is a vanity project that documentation authors love and no one else reads. The goal is documentation accurate enough that a competent engineer can use your system without asking you questions — which is a substantially lower bar and a substantially more achievable one.
AI tools make the generation step cheaper. The discipline to keep it accurate is still a process design problem, not a model capability problem.