The Vercel AI SDK in 2026: Streaming, Tool Calls, and Multi-Step Agents
The Vercel AI SDK has become the default for building AI features in JavaScript apps. Here is what it actually does, how its core primitives work, and where the sharp edges still live.
Anurag Verma
When Vercel shipped the AI SDK in 2023, it was a thin wrapper around a few LLM APIs with streaming support for Next.js. In 2026, it’s something substantially more useful: a provider-agnostic abstraction layer with primitives for streaming, tool use, structured outputs, and multi-step agent flows, usable in any JavaScript runtime, not just Next.js.
The library is popular enough that its patterns have become a de facto standard. If you’re building an AI feature in a JavaScript app in 2026 and not using it, you’re probably either on a specialized stack or reinventing things the SDK already does well.
This post covers the core primitives, how they work in practice, and where you’ll still run into friction.
Provider Abstraction
The central design choice is that the SDK abstracts over model providers. You import a provider, construct a model reference, and the SDK handles the request format, reports token usage in a common shape, and wraps provider errors in shared error types.
// OpenAI
import { openai } from '@ai-sdk/openai';
const openaiModel = openai('gpt-4o');
// Anthropic
import { anthropic } from '@ai-sdk/anthropic';
const anthropicModel = anthropic('claude-opus-4-6');
// Google
import { google } from '@ai-sdk/google';
const googleModel = google('gemini-2.0-flash');
// Mistral, Groq, Cohere, and others have official or community providers
Switching providers is a one-line change at the model construction point. The rest of the code stays the same. This isn’t just convenience — it’s how teams do cost/quality A/B testing without rewriting their prompting logic.
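One way to set up such an A/B test is to assign each user to a model variant deterministically, then pass the chosen ID to the provider function at the construction point. The sketch below is only an illustration under stated assumptions: `ModelVariant`, `hashString`, and `pickVariant` are hypothetical helpers that are not part of the SDK, and the 50/50 weights are arbitrary.

```typescript
// Hypothetical helper types; none of this is part of the AI SDK.
type ModelVariant = {
  provider: 'openai' | 'anthropic';
  modelId: string;
  weight: number; // relative share of traffic
};

const variants: ModelVariant[] = [
  { provider: 'openai', modelId: 'gpt-4o', weight: 50 },
  { provider: 'anthropic', modelId: 'claude-opus-4-6', weight: 50 },
];

// Deterministic 32-bit FNV-1a hash, so a given user always lands in the same bucket.
function hashString(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Walk the weighted list until the user's bucket falls inside a variant's share.
function pickVariant(userId: string, options: ModelVariant[]): ModelVariant {
  if (options.length === 0) throw new Error('No variants configured');
  const total = options.reduce((sum, v) => sum + v.weight, 0);
  let bucket = hashString(userId) % total;
  for (const variant of options) {
    if (bucket < variant.weight) return variant;
    bucket -= variant.weight;
  }
  return options[options.length - 1]; // only reached if weights are not positive
}
```

At the call site the variant maps back onto the provider functions shown above, for example `variant.provider === 'openai' ? openai(variant.modelId) : anthropic(variant.modelId)`. Logging the variant with each response gives you the data to compare cost and quality.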
generateText: The Simplest Case
For non-streaming, single-turn text generation:
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
const { text, usage } = await generateText({
  model: openai('gpt-4o'),
  system: 'You are a concise technical writer.',
  prompt: 'Explain database indexing in two sentences.',
});
console.log(text);
// "A database index is a data structure that stores a subset of a table's
// columns in a format optimized for quick lookups. Without an index,
// the database scans every row; with one, it jumps directly to matching rows."
console.log(usage);
// { promptTokens: 28, completionTokens: 47, totalTokens: 75 }
The usage return is consistent across providers, which makes token budget tracking feasible without provider-specific instrumentation.
streamText: The Pattern You’ll Use Most
For user-facing AI features, streaming is almost always the right choice. Users see text appearing in real time rather than waiting for the full response, which dramatically improves perceived responsiveness.
import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
// In 4.x, streamText returns its result object directly, so no await is needed here
const result = streamText({
  model: anthropic('claude-opus-4-6'),
  prompt: 'Explain how HTTPS certificates work.',
});
// result.textStream is an async iterable of text chunks
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
// The full text and usage resolve once streaming has finished
const finalText = await result.text;
const usage = await result.usage;
In a Next.js route handler (or any Node.js server), the SDK provides helpers to turn the stream into an HTTP response:
// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
export async function POST(req: Request) {
  const { messages } = await req.json();
  const result = streamText({
    model: openai('gpt-4o'),
    messages, // Array of { role: 'user' | 'assistant', content: string }
    system: 'You are a helpful assistant.',
  });
  return result.toDataStreamResponse();
}
On the client, the useChat hook consumes the stream:
'use client';
import { useChat } from 'ai/react';
export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });
  return (
    <div>
      <div>
        {messages.map((m) => (
          <div key={m.id} className={m.role === 'user' ? 'user' : 'assistant'}>
            {m.content}
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}
The hook handles the streaming connection, message history, and loading state. You write the UI.
Tool Calls: Giving the Model Actions
Tool calls (also called function calling) let the model invoke functions you define. The model decides when to call a tool based on context, and the SDK handles the back-and-forth protocol.
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
// `db` below is a placeholder for your own data-access layer
const result = await generateText({
  model: openai('gpt-4o'),
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({
        city: z.string().describe('The city name'),
        unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
      }),
      execute: async ({ city, unit }) => {
        // Call your actual weather API here
        return { temperature: 22, condition: 'partly cloudy', unit };
      },
    }),
    searchProducts: tool({
      description: 'Search the product catalog',
      parameters: z.object({
        query: z.string(),
        maxResults: z.number().int().min(1).max(20).default(5),
      }),
      execute: async ({ query, maxResults }) => {
        const results = await db.products.search(query, { limit: maxResults });
        return results.map((p) => ({ id: p.id, name: p.name, price: p.price }));
      },
    }),
  },
  toolChoice: 'auto', // model decides whether to use tools
  maxSteps: 5, // allow multi-step (model calls tool, gets result, continues)
  prompt: 'What is the weather in Berlin, and find me 3 waterproof jackets under €100?',
});
console.log(result.text);
// "The weather in Berlin is currently 22°C and partly cloudy.
// Here are 3 waterproof jackets under €100: ..."
The maxSteps parameter is important. It defaults to 1, so without it the SDK stops after the first round of tool calls: you get the tool results back, but not the model’s final summary. Setting maxSteps to 3-5 lets the model call tools, incorporate the results, and produce a final response, which is usually what you want.
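To make the step budget concrete, here is a toy model of a tool loop. It is a conceptual sketch, not the SDK's implementation: `ModelTurn`, `runSteps`, and the scripted `fakeModel` are hypothetical stand-ins, and the real SDK is asynchronous and tracks far more state.

```typescript
// Toy model of a multi-step tool loop. All names here are hypothetical.
type ModelTurn =
  | { kind: 'tool-call'; toolName: string; args: Record<string, unknown> }
  | { kind: 'text'; text: string };

type Tools = Record<string, (args: Record<string, unknown>) => unknown>;

type StepResult = { text: string; toolResults: unknown[]; stepsUsed: number };

function runSteps(
  model: (toolResultsSoFar: unknown[]) => ModelTurn,
  tools: Tools,
  maxSteps: number,
): StepResult {
  const toolResults: unknown[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = model(toolResults);
    if (turn.kind === 'text') {
      // The model produced its final answer; stop early.
      return { text: turn.text, toolResults, stepsUsed: step };
    }
    const tool = tools[turn.toolName];
    if (!tool) throw new Error(`Unknown tool: ${turn.toolName}`);
    toolResults.push(tool(turn.args));
  }
  // Step budget exhausted before the model wrote a final answer.
  return { text: '', toolResults, stepsUsed: maxSteps };
}

// Scripted stand-in for a model: call the weather tool once, then answer.
const fakeModel = (results: unknown[]): ModelTurn =>
  results.length === 0
    ? { kind: 'tool-call', toolName: 'getWeather', args: { city: 'Berlin' } }
    : { kind: 'text', text: `Weather: ${JSON.stringify(results[0])}` };

const toyTools: Tools = {
  getWeather: () => ({ temperature: 22, condition: 'partly cloudy' }),
};

const oneStep = runSteps(fakeModel, toyTools, 1); // tool result only, empty text
const fiveSteps = runSteps(fakeModel, toyTools, 5); // tool result plus final text
```

With a budget of one step the loop ends right after the tool runs, which mirrors the behaviour described above. With a budget of five, the scripted model needs only two steps and the loop exits early.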
Structured Output: Reliable JSON
For programmatic use cases where you need structured data out, generateObject validates the model’s output against your schema and throws an error rather than returning a non-conforming object:
import { generateObject } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';
const { object } = await generateObject({
  model: anthropic('claude-opus-4-6'),
  schema: z.object({
    summary: z.string(),
    sentiment: z.enum(['positive', 'negative', 'neutral']),
    topics: z.array(z.string()).max(5),
    confidence: z.number().min(0).max(1),
  }),
  prompt: 'Analyze this customer review: "Delivery was fast but the packaging was damaged and one item was missing."',
});
console.log(object);
// {
//   summary: "Fast delivery but packaging damage and missing item",
//   sentiment: "negative",
//   topics: ["delivery", "packaging", "missing items"],
//   confidence: 0.91
// }
The SDK uses the provider’s structured output mode where available (OpenAI’s JSON mode, Anthropic’s tool-use for structured output) and falls back to prompt-based JSON extraction with validation and retries. You get a typed object, not a string to parse.
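The prompt-based fallback described above boils down to a loop of parse, validate, and retry. The sketch below is a hand-rolled illustration of that idea and not the SDK's code: `extractJson`, `isReview`, and `parseWithRetries` are hypothetical names, the validator is written by hand instead of using Zod, and `generate` stands in for a model call.

```typescript
// Hand-rolled illustration of parse, validate, retry. Not the SDK's implementation.
type Review = { summary: string; sentiment: 'positive' | 'negative' | 'neutral' };

// Take the span from the first "{" to the last "}" in free-form model text and parse it.
function extractJson(text: string): unknown {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) throw new Error('No JSON object found');
  return JSON.parse(text.slice(start, end + 1));
}

// Type guard standing in for schema validation.
function isReview(value: unknown): value is Review {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.summary === 'string' &&
    (v.sentiment === 'positive' || v.sentiment === 'negative' || v.sentiment === 'neutral')
  );
}

// `generate` is a placeholder for a model call; it receives the attempt number.
function parseWithRetries(generate: (attempt: number) => string, maxAttempts: number): Review {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const candidate = extractJson(generate(attempt));
      if (isReview(candidate)) return candidate;
    } catch {
      // Malformed or missing JSON: fall through and try again.
    }
  }
  throw new Error(`No valid object after ${maxAttempts} attempts`);
}
```

The point of the sketch is the contract: the caller either receives a value that passed validation or gets an error, and never has to handle a half-parsed string.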
The useObject Hook for Streaming Structured Data
To stream structured data to the client as it’s generated (useful for long-form structured content like form population or report generation), pair streamObject on the server with the useObject hook on the client:
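// Server route, e.g. app/api/generate-report/route.ts
import { streamObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
export async function POST(req: Request) {
  // The value passed to submit() on the client arrives as the JSON request body
  const { prompt } = await req.json();
  const result = streamObject({
    model: openai('gpt-4o'),
    schema: z.object({
      title: z.string(),
      sections: z.array(z.object({
        heading: z.string(),
        content: z.string(),
      })),
    }),
    prompt,
  });
  return result.toTextStreamResponse();
}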
// Client
'use client';
import { experimental_useObject as useObject } from 'ai/react';
// Hypothetical shared module exporting the same Zod schema the server route uses
import { reportSchema } from './report-schema';
function ReportGenerator() {
  const { object, submit, isLoading } = useObject({
    api: '/api/generate-report',
    schema: reportSchema,
  });
  return (
    <div>
      <button onClick={() => submit({ prompt: 'Generate Q1 summary' })}>
        Generate
      </button>
      {object?.title && <h1>{object.title}</h1>}
      {object?.sections?.map((section, i) => (
        <section key={i}>
          <h2>{section?.heading}</h2>
          <p>{section?.content}</p>
        </section>
      ))}
    </div>
  );
}
The object appears incrementally as the model generates it. Sections fill in as they’re streamed. This is a better UX than waiting for the full JSON before rendering anything.
Where the Sharp Edges Are
Error handling across providers is inconsistent. The SDK normalizes some errors into APICallError, but rate limit handling, quota errors, and model-specific error codes vary. Build explicit retry logic with exponential backoff rather than relying on the SDK to handle it for you.
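A minimal sketch of such a retry wrapper follows. Everything in it is an assumption for illustration: `RetryOptions`, `backoffDelay`, `sleep`, and `withRetry` are hypothetical names, and deciding which errors count as retryable is left to a caller-supplied `isRetryable` function because that is where the provider-specific knowledge lives.

```typescript
// Hypothetical retry helper; names and defaults are illustrative only.
type RetryOptions = {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (error: unknown) => boolean;
};

// Ceiling for the wait after failed attempt `attempt` (1-based): base * 2^(attempt - 1), capped.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt - 1));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const outOfAttempts = attempt === options.maxAttempts;
      if (outOfAttempts || !options.isRetryable(error)) break;
      // Full jitter: wait a random time up to the exponential ceiling.
      const ceiling = backoffDelay(attempt, options.baseDelayMs, options.maxDelayMs);
      await sleep(Math.random() * ceiling);
    }
  }
  throw lastError;
}
```

A call would look like `withRetry(() => generateText({ ... }), { maxAttempts: 4, baseDelayMs: 500, maxDelayMs: 8000, isRetryable })`. The SDK calls also accept a `maxRetries` setting, so it is worth checking how the two retry layers interact before stacking them.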
Token limits and context windows require manual management for long conversations. useChat keeps the full conversation history in state by default. At 50+ exchanges, you’ll hit token limits for some models. The SDK doesn’t truncate automatically. Implement a window function yourself — keep the last N messages or summarize older ones before sending.
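The simplest version of that window function keeps the most recent messages and drops the rest. The sketch below assumes the plain `{ role, content }` message shape used in the route handler earlier. `ChatMessage` and `windowMessages` are hypothetical names, and counting messages is only a rough proxy for counting tokens.

```typescript
// Assumed minimal message shape, matching the { role, content } objects used earlier.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep every system message plus the most recent `maxRecent` non-system messages.
function windowMessages(messages: ChatMessage[], maxRecent: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  // slice(-0) would return the whole array, so handle non-positive limits explicitly.
  let recent: ChatMessage[] = maxRecent > 0 ? rest.slice(-maxRecent) : [];
  // Avoid starting the window on an assistant reply whose question was cut off.
  while (recent.length > 0 && recent[0].role === 'assistant') {
    recent = recent.slice(1);
  }
  return [...system, ...recent];
}
```

On the server this would sit just before the model call, for example `messages: windowMessages(messages, 20)`. Summarizing older turns instead of dropping them follows the same shape, with the summary inserted as one extra message ahead of the window.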
Streaming + tool calls = tricky state. When a stream involves multiple steps (maxSteps > 1), the useChat hook’s messages state passes through intermediate states that need careful handling in the UI. The model’s intermediate tool calls show up on assistant messages as tool invocation data rather than text, which you may or may not want to display.
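One way to keep the UI predictable is to derive what you render from the raw message list instead of rendering it directly. The sketch below works on an assumed, simplified message shape: `UiMessage` and its `toolInvocations` field are a local stand-in for illustration, so check the hook's actual message type in the SDK version you use before relying on these field names.

```typescript
// Assumed, simplified message shape; the real hook's message type carries more fields.
type UiMessage = {
  id: string;
  role: 'user' | 'assistant';
  content: string;
  toolInvocations?: { toolName: string; state: 'call' | 'result' }[];
};

// Hide assistant messages that so far contain only tool activity and no text.
function visibleMessages(messages: UiMessage[]): UiMessage[] {
  return messages.filter((m) => m.role === 'user' || m.content.trim().length > 0);
}

// Collect tools that have been called but have not returned, e.g. for a status line.
function pendingToolNames(messages: UiMessage[]): string[] {
  const names: string[] = [];
  for (const m of messages) {
    for (const t of m.toolInvocations ?? []) {
      if (t.state === 'call') names.push(t.toolName);
    }
  }
  return names;
}
```

Rendering `visibleMessages(messages)` and showing `pendingToolNames(messages)` as a small "working" indicator keeps intermediate tool traffic out of the transcript without hiding that something is happening.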
Cost tracking requires external instrumentation. usage is returned per call, but there’s no built-in aggregation or budget enforcement. Integrate with a tool like LangSmith, Braintrust, or your own telemetry pipeline if you need per-user or per-feature cost tracking.
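If you roll your own aggregation, the core is small. The sketch below sums the per-call `usage` objects by an arbitrary key such as a user ID or feature name and checks the total against a token budget. `UsageLedger` is a hypothetical in-memory class for illustration; a real deployment would persist the totals and likely convert tokens to cost per model.

```typescript
// Usage shape as returned per call in the examples earlier in this post.
type Usage = { promptTokens: number; completionTokens: number; totalTokens: number };

// Hypothetical in-memory ledger keyed by user ID, feature name, or similar.
class UsageLedger {
  private readonly totals = new Map<string, Usage>();
  private readonly budgetTokens: number;

  constructor(budgetTokens: number) {
    this.budgetTokens = budgetTokens;
  }

  // Add one call's usage to the running total for `key` and return the new total.
  record(key: string, usage: Usage): Usage {
    const prev = this.totals.get(key) ?? { promptTokens: 0, completionTokens: 0, totalTokens: 0 };
    const next: Usage = {
      promptTokens: prev.promptTokens + usage.promptTokens,
      completionTokens: prev.completionTokens + usage.completionTokens,
      totalTokens: prev.totalTokens + usage.totalTokens,
    };
    this.totals.set(key, next);
    return next;
  }

  // Tokens left before `key` reaches the budget, never negative.
  remaining(key: string): number {
    const used = this.totals.get(key)?.totalTokens ?? 0;
    return Math.max(0, this.budgetTokens - used);
  }

  isExhausted(key: string): boolean {
    return this.remaining(key) === 0;
  }
}
```

A route handler would call `ledger.isExhausted(userId)` before invoking the model and `ledger.record(userId, usage)` afterwards. For streaming calls the usage is only known once the stream has finished, so recording has to happen at that point.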
When to Use It
The SDK is well-suited for: any AI feature in a JavaScript/TypeScript app, multi-provider deployments where you want flexibility, streaming chat interfaces, structured extraction pipelines, and simple agent flows with defined tools.
It’s less suited for: Python backends (use the native provider SDKs or LangChain), complex agent orchestration requiring fine-grained control over execution flow (LangGraph or custom state machines give you more control), and applications where per-request observability and replay are critical (the SDK’s telemetry hooks are limited).
The API surface has stabilized enough in version 4.x that it’s reasonable to build on. The major version bumps of 2023-2024 that broke interfaces are behind it. For most teams shipping AI features in JavaScript in 2026, it’s the right starting point.