The Vercel AI SDK in 2026: Streaming, Tool Calls, and Multi-Step Agents

When Vercel shipped the AI SDK in 2023, it was a thin wrapper around a few LLM APIs with streaming support for Next.js. In 2026, it’s something substantially more useful: a provider-agnostic abstraction layer with primitives for streaming, tool use, structured outputs, and multi-step agent flows, usable in any JavaScript runtime, not just Next.js.

The library is popular enough that its patterns have become a de facto standard. If you’re building an AI feature in a JavaScript app in 2026 and not using it, you’re probably either on a specialized stack or reinventing things the SDK already does well.

This post covers the core primitives, how they work in practice, and where you’ll still run into friction.

Provider Abstraction

The central design choice is that the SDK abstracts over model providers. You import a provider, construct a model reference, and the SDK handles the API call format, token usage reporting, and error normalization.

// OpenAI
import { openai } from '@ai-sdk/openai';
const model = openai('gpt-4o');

// Anthropic
import { anthropic } from '@ai-sdk/anthropic';
const model = anthropic('claude-opus-4-6');

// Google
import { google } from '@ai-sdk/google';
const model = google('gemini-2.0-flash');

// Mistral, Groq, Cohere, and others have official or community providers

Switching providers is a one-line change at the model construction point. The rest of the code stays the same. This isn’t just convenience — it’s how teams do cost/quality A/B testing without rewriting their prompting logic.
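As a rough illustration of the A/B testing idea, the sketch below assigns each user to a model variant deterministically, so the same user always sees the same model. The Variant type, the variant names, and the pickVariant and hashString helpers are hypothetical and not part of the SDK; the model IDs are the ones used in this post.

```typescript
// Hypothetical helper types and names, for illustration only.
type Variant = {
  name: string;
  provider: 'openai' | 'anthropic' | 'google';
  modelId: string;
};

const variants: Variant[] = [
  { name: 'control', provider: 'openai', modelId: 'gpt-4o' },
  { name: 'candidate', provider: 'anthropic', modelId: 'claude-opus-4-6' },
];

// 32-bit FNV-1a string hash: stable across runs and processes.
function hashString(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a user ID onto one of the configured variants.
function pickVariant(userId: string, options: Variant[]): Variant {
  if (options.length === 0) {
    throw new Error('at least one variant is required');
  }
  return options[hashString(userId) % options.length];
}
```

The selected variant's modelId would then be passed to the matching provider function at the single point where the model is constructed, and the variant name logged next to whatever quality or cost metric you are comparing.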

generateText: The Simplest Case

For non-streaming, single-turn text generation:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { text, usage } = await generateText({
  model: openai('gpt-4o'),
  system: 'You are a concise technical writer.',
  prompt: 'Explain database indexing in two sentences.',
});

console.log(text);
// "A database index is a data structure that stores a subset of a table's
//  columns in a format optimized for quick lookups. Without an index,
//  the database scans every row; with one, it jumps directly to matching rows."

console.log(usage);
// { promptTokens: 28, completionTokens: 47, totalTokens: 75 }

The usage return is consistent across providers, which makes token budget tracking feasible without provider-specific instrumentation.

streamText: The Pattern You’ll Use Most

For user-facing AI features, streaming is almost always the right choice. Users see text appearing in real time rather than waiting for the full response, which dramatically improves perceived responsiveness.

import { streamText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const result = streamText({
  model: anthropic('claude-opus-4-6'),
  prompt: 'Explain how HTTPS certificates work.',
});

// result.textStream is an async iterable of text chunks
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}

// Or consume the full text after streaming
const finalText = await result.text;
const usage = await result.usage;

In a Next.js route handler (or any server runtime that works with standard Response objects), the SDK provides a helper that turns the stream into an HTTP response:

// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages, // Array of { role: 'user' | 'assistant', content: string }
    system: 'You are a helpful assistant.',
  });

  return result.toDataStreamResponse();
}

On the client, the useChat hook consumes the stream:

'use client';
import { useChat } from 'ai/react';

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      <div>
        {messages.map((m) => (
          <div key={m.id} className={m.role === 'user' ? 'user' : 'assistant'}>
            {m.content}
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} disabled={isLoading} />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

The hook handles the streaming connection, message history, and loading state. You write the UI.

Tool Calls: Giving the Model Actions

Tool calls (also called function calling) let the model invoke functions you define. The model decides when to call a tool based on context, and the SDK handles the back-and-forth protocol.

import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const result = await generateText({
  model: openai('gpt-4o'),
  tools: {
    getWeather: tool({
      description: 'Get the current weather for a city',
      parameters: z.object({
        city: z.string().describe('The city name'),
        unit: z.enum(['celsius', 'fahrenheit']).default('celsius'),
      }),
      execute: async ({ city, unit }) => {
        // Call your actual weather API here
        return { temperature: 22, condition: 'partly cloudy', unit };
      },
    }),
    searchProducts: tool({
      description: 'Search the product catalog',
      parameters: z.object({
        query: z.string(),
        maxResults: z.number().int().min(1).max(20).default(5),
      }),
      execute: async ({ query, maxResults }) => {
        // `db` stands in for your own data layer; it is not part of the SDK
        const results = await db.products.search(query, { limit: maxResults });
        return results.map((p) => ({ id: p.id, name: p.name, price: p.price }));
      },
    }),
  },
  toolChoice: 'auto', // model decides whether to use tools
  maxSteps: 5, // allow multi-step (model calls tool, gets result, continues)
  prompt: 'What is the weather in Berlin, and find me 3 waterproof jackets under €100?',
});

console.log(result.text);
// "The weather in Berlin is currently 22°C and partly cloudy.
//  Here are 3 waterproof jackets under €100: ..."

The maxSteps parameter is important. Without it, the SDK stops after the first tool call and returns the tool result without the model’s final summary. Setting maxSteps to 3-5 allows the model to call tools, incorporate results, and produce a final response — which is usually what you want.
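To make the role of maxSteps concrete, here is a conceptual sketch of a bounded tool loop. It is not the SDK's implementation: runSteps, ModelFn, ToolFn, and the message shapes are hypothetical names invented for this illustration, and the model is passed in as a plain function so the loop can be followed without a real provider.

```typescript
// Conceptual sketch only. All names and shapes here are assumptions.
type ToolCall = { toolName: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;
type ModelFn = (history: Message[]) => Promise<ModelTurn>;

async function runSteps(
  callModel: ModelFn,
  tools: Record<string, ToolFn>,
  prompt: string,
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  const history: Message[] = [{ role: 'user', content: prompt }];
  let text = '';
  let steps = 0;

  while (steps < maxSteps) {
    steps++;
    const turn = await callModel(history);
    text = turn.text;
    history.push({ role: 'assistant', content: turn.text });

    // No tool calls means the model has produced its final answer.
    if (turn.toolCalls.length === 0) break;

    // Run each requested tool and feed the result back for the next step.
    for (const call of turn.toolCalls) {
      const toolFn = tools[call.toolName];
      if (!toolFn) throw new Error(`unknown tool: ${call.toolName}`);
      const output = await toolFn(call.args);
      history.push({ role: 'tool', content: JSON.stringify(output) });
    }
  }

  return { text, steps };
}
```

With a step limit of 1, the loop ends right after the tools run, so the returned text is whatever the model said alongside its tool call (often nothing). With a higher limit, the model gets another turn to read the tool output and write the summary.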

Structured Output: Reliable JSON

For programmatic use cases where you need structured data out, generateObject returns an object that has been validated against your schema, and throws if the model's output cannot be made to conform:

import { generateObject } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { z } from 'zod';

const { object } = await generateObject({
  model: anthropic('claude-opus-4-6'),
  schema: z.object({
    summary: z.string(),
    sentiment: z.enum(['positive', 'negative', 'neutral']),
    topics: z.array(z.string()).max(5),
    confidence: z.number().min(0).max(1),
  }),
  prompt: 'Analyze this customer review: "Delivery was fast but the packaging was damaged and one item was missing."',
});

console.log(object);
// {
//   summary: "Fast delivery but packaging damage and missing item",
//   sentiment: "negative",
//   topics: ["delivery", "packaging", "missing items"],
//   confidence: 0.91
// }

The SDK uses the provider’s structured output mode where available (OpenAI’s JSON mode, Anthropic’s tool-use for structured output) and falls back to prompt-based JSON extraction with validation and retries. You get a typed object, not a string to parse.
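The validate-and-retry idea can be sketched in a few lines. This is an illustration of the general pattern under simplifying assumptions, not the SDK's internals: parseWithRetries, the produce callback, and the hand-written isReview guard (standing in for a Zod schema) are all hypothetical.

```typescript
type Review = {
  summary: string;
  sentiment: 'positive' | 'negative' | 'neutral';
};

// Hand-written type guard standing in for a schema validator.
function isReview(value: unknown): value is Review {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.summary === 'string' &&
    (v.sentiment === 'positive' || v.sentiment === 'negative' || v.sentiment === 'neutral')
  );
}

// Ask `produce` for raw model output until it parses and validates,
// or until the attempt budget is used up.
async function parseWithRetries<T>(
  produce: (attempt: number) => Promise<string>,
  isValid: (value: unknown) => value is T,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown = new Error('no attempts made');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const parsed: unknown = JSON.parse(await produce(attempt));
      if (isValid(parsed)) return parsed;
      lastError = new Error(`attempt ${attempt}: output did not match the schema`);
    } catch (err) {
      lastError = err; // JSON.parse failed or the call itself threw
    }
  }
  throw lastError;
}
```

The point of the pattern is that callers only ever see a validated, typed value or an error, never a half-parsed string.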

The useObject Hook for Streaming Structured Data

When you want to stream structured data to the client as it’s generated (useful for long-form structured content like form population or report generation):

// Server route
import { streamObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

export async function POST(req: Request) {
  // useObject sends the value passed to submit() as the JSON request body
  const { prompt } = await req.json();

  const result = streamObject({
    model: openai('gpt-4o'),
    schema: z.object({
      title: z.string(),
      sections: z.array(z.object({
        heading: z.string(),
        content: z.string(),
      })),
    }),
    prompt,
  });

  return result.toTextStreamResponse();
}

// Client
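import { experimental_useObject as useObject } from 'ai/react';
// reportSchema is the same Zod schema the server route passes to streamObject;
// keep it in a shared module so both sides agree on the shape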

function ReportGenerator() {
  const { object, submit, isLoading } = useObject({
    api: '/api/generate-report',
    schema: reportSchema,
  });

  return (
    <div>
      <button onClick={() => submit({ prompt: 'Generate Q1 summary' })}>
        Generate
      </button>
      {object?.title && <h1>{object.title}</h1>}
      {object?.sections?.map((section, i) => (
        <section key={i}>
          <h2>{section?.heading}</h2>
          <p>{section?.content}</p>
        </section>
      ))}
    </div>
  );
}

The object appears incrementally as the model generates it. Sections fill in as they’re streamed. This is a better UX than waiting for the full JSON before rendering anything.

Where the Sharp Edges Are

Error handling across providers is inconsistent. The SDK normalizes some errors into APICallError, but rate limit handling, quota errors, and model-specific error codes vary. Build explicit retry logic with exponential backoff rather than relying on the SDK to handle it for you.
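A minimal sketch of such a retry wrapper, assuming nothing about the SDK beyond the fact that its calls return promises. withRetry, backoffDelay, and the option names are hypothetical, and the default numbers are placeholders to tune for your provider's rate limits.

```typescript
type RetryOptions = {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  isRetryable?: (error: unknown) => boolean;
  sleep?: (ms: number) => Promise<void>; // injectable so tests can skip real waits
};

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Ceiling for the wait before retry number `retry` (1-based):
// base * 2^(retry - 1), capped at maxDelayMs.
function backoffDelay(retry: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** (retry - 1));
}

async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const {
    maxAttempts = 4,
    baseDelayMs = 500,
    maxDelayMs = 8000,
    isRetryable = () => true,
    sleep = defaultSleep,
  } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up when attempts are exhausted or the error is not worth retrying.
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      // Full jitter: wait a random time up to the exponential ceiling.
      await sleep(Math.random() * backoffDelay(attempt, baseDelayMs, maxDelayMs));
    }
  }
}
```

In practice you would wrap the SDK call, for example withRetry(() => generateText({ ... })), and supply an isRetryable predicate that lets rate-limit and transient server errors through while failing fast on bad requests.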

Token limits and context windows require manual management for long conversations. useChat keeps the full conversation history in state by default. At 50+ exchanges, you’ll hit token limits for some models. The SDK doesn’t truncate automatically. Implement a window function yourself — keep the last N messages or summarize older ones before sending.
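The "keep the last N messages" option might look like the sketch below. windowMessages and the ChatMessage type are hypothetical; the message shape mirrors the role/content objects used earlier in this post, and counting messages rather than tokens is a deliberate simplification.

```typescript
type ChatMessage = {
  role: 'system' | 'user' | 'assistant';
  content: string;
};

// Keep every system message plus the most recent `maxMessages` other messages.
function windowMessages(messages: ChatMessage[], maxMessages: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');

  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = maxMessages > 0 ? rest.slice(-maxMessages) : [];

  // Start the window on a user message so the model never sees a reply
  // whose question was cut off. If the window contains no user message,
  // only the system messages are returned.
  const firstUser = recent.findIndex((m) => m.role === 'user');
  return [...system, ...(firstUser === -1 ? [] : recent.slice(firstUser))];
}
```

On the server, this would run on the incoming messages array before it is handed to streamText. A token-based window is more precise but needs a tokenizer for the specific model.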

Streaming + tool calls = tricky state. When a stream involves multiple tool calls (maxSteps > 1), the useChat hook’s messages state can get into intermediate states that require careful handling in the UI. The model’s “thinking aloud” tool calls are visible as assistant messages with tool_call content, which you may or may not want to display.

Cost tracking requires external instrumentation. usage is returned per call, but there’s no built-in aggregation or budget enforcement. Integrate with a tool like LangSmith, Braintrust, or your own telemetry pipeline if you need per-user or per-feature cost tracking.
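If you go the "own telemetry" route, the core of it is a small ledger keyed by user or feature. The sketch below is one way to do that in memory; createCostLedger, estimateCostUsd, and the Price shape are hypothetical, and any prices you pass in are your own figures, not values the SDK provides. The Usage fields match the usage object shown earlier.

```typescript
type Usage = { promptTokens: number; completionTokens: number };

// Prices in USD per million tokens. Supply your provider's current rates.
type Price = { inputPerMTok: number; outputPerMTok: number };

function estimateCostUsd(usage: Usage, price: Price): number {
  return (
    (usage.promptTokens * price.inputPerMTok +
      usage.completionTokens * price.outputPerMTok) / 1e6
  );
}

// In-memory ledger keyed by user ID, feature name, or any other string.
function createCostLedger(budgetUsd: number) {
  const totals = new Map<string, number>();

  return {
    // Add one call's cost to the key's running total and return the new total.
    record(key: string, usage: Usage, price: Price): number {
      const total = (totals.get(key) ?? 0) + estimateCostUsd(usage, price);
      totals.set(key, total);
      return total;
    },
    // True once the key's accumulated cost exceeds the budget.
    isOverBudget(key: string): boolean {
      return (totals.get(key) ?? 0) > budgetUsd;
    },
  };
}
```

You would call record with the usage from each generateText or streamText result and check isOverBudget before starting the next request. A production version would persist totals somewhere durable rather than in process memory.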

When to Use It

The SDK is well-suited for: any AI feature in a JavaScript/TypeScript app, multi-provider deployments where you want flexibility, streaming chat interfaces, structured extraction pipelines, and simple agent flows with defined tools.

It’s less suited for: Python backends (use the native provider SDKs or LangChain), complex agent orchestration requiring fine-grained control over execution flow (LangGraph or custom state machines give you more control), and applications where per-request observability and replay are critical (the SDK’s telemetry hooks are limited).

The API surface has stabilized enough in version 4.x that it’s reasonable to build on. The major version bumps of 2023-2024 that broke interfaces are behind it. For most teams shipping AI features in JavaScript in 2026, it’s the right starting point.
