pgvector in Practice: Semantic Search in Postgres Without a Separate Vector DB

Adding AI-powered search to an existing app usually means one of two things: either you stand up a separate vector database (Pinecone, Qdrant, Weaviate), or you figure out that your Postgres installation already supports vectors.

The second path is significantly less complicated and handles more production use cases than most developers expect. pgvector is a Postgres extension that adds a vector column type, similarity operators, and indexes optimized for nearest-neighbor search. If you’re already on Postgres, you’re probably closer to semantic search than you think.

What pgvector Actually Does

Semantic search works by converting text into embeddings, which are high-dimensional vectors where similar meanings end up geometrically close to each other. “Fast car” and “speedy automobile” have different words but similar embeddings. pgvector stores those vectors in Postgres and lets you query for the N nearest neighbors efficiently.

The extension adds:

A vector(n) column type for n-dimensional float vectors
Three distance operators: <-> (L2/Euclidean), <#> (negative inner product), <=> (cosine distance)
Two index types: IVFFlat and HNSW

Nothing about this requires a separate database. Your embeddings live in the same rows as the content they describe, with all the relational structure you already have.

Setup

On a fresh Postgres 15+ installation, the extension is usually available through your package manager:

# Ubuntu/Debian
sudo apt install postgresql-16-pgvector

# macOS with Homebrew
brew install pgvector

# Docker
docker run -e POSTGRES_PASSWORD=password ankane/pgvector

Managed cloud Postgres services have broad pgvector support now. AWS RDS Postgres supports it as of Postgres 15.2, Supabase includes it by default, and Neon and PlanetScale have both added it.

Enable it in your database:

CREATE EXTENSION IF NOT EXISTS vector;

Schema Design

For a content search use case, like a knowledge base with articles, the schema looks like:

CREATE TABLE articles (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    content     TEXT NOT NULL,
    category    TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    embedding   vector(1536)  -- dimensions match your embedding model
);

The dimension count must match what your embedding model produces. OpenAI’s text-embedding-3-small produces 1536-dimensional vectors. Cohere’s English model produces 1024. Google’s Gecko produces 768. Pick one model and stick to it; you can’t mix dimensions.

Generating and Storing Embeddings

Using the OpenAI SDK as an example, the pattern for generating and storing embeddings:

import openai
import psycopg2
import json

client = openai.OpenAI()

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

def store_article(conn, title: str, content: str, category: str):
    # Combine title and content for richer embeddings
    text_to_embed = f"{title}\n\n{content}"
    embedding = embed(text_to_embed)
    
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO articles (title, content, category, embedding)
            VALUES (%s, %s, %s, %s::vector)
            """,
            (title, content, category, str(embedding))
        )
    conn.commit()

For bulk ingestion, batching the embedding API calls matters:

def store_articles_bulk(conn, articles: list[dict]):
    texts = [f"{a['title']}\n\n{a['content']}" for a in articles]
    
    # Batch API call: much cheaper than individual calls
    response = client.embeddings.create(
        input=texts,
        model="text-embedding-3-small"
    )
    embeddings = [item.embedding for item in response.data]
    
    with conn.cursor() as cur:
        cur.executemany(
            """
            INSERT INTO articles (title, content, category, embedding)
            VALUES (%s, %s, %s, %s::vector)
            """,
            [
                (a["title"], a["content"], a["category"], str(e))
                for a, e in zip(articles, embeddings)
            ]
        )
    conn.commit()

Querying

The basic similarity query uses the cosine distance operator <=>:

-- Find the 10 most similar articles to a query embedding
SELECT
    id,
    title,
    category,
    1 - (embedding <=> '[0.1, 0.2, ...]'::vector) AS similarity
FROM articles
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

Cosine distance is typically the right choice for text embeddings because it’s magnitude-invariant, meaning short and long documents can still match well if the content is similar. Use L2 (<->) for embeddings from models specifically trained with L2 distance (some image embedding models), and inner product (<#>) when your embeddings are normalized and you want maximum query performance.

In application code:

def search(conn, query: str, limit: int = 10, category: str = None) -> list[dict]:
    query_embedding = embed(query)
    
    sql = """
        SELECT id, title, category,
               1 - (embedding <=> %s::vector) AS similarity
        FROM articles
        WHERE (%s IS NULL OR category = %s)
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """
    
    with conn.cursor() as cur:
        cur.execute(sql, (
            str(query_embedding),
            category, category,
            str(query_embedding),
            limit
        ))
        rows = cur.fetchall()
    
    return [
        {"id": r[0], "title": r[1], "category": r[2], "similarity": r[3]}
        for r in rows
    ]

Indexing: IVFFlat vs HNSW

Without an index, pgvector does an exact nearest-neighbor scan, checking every row. This is fine up to maybe 50,000 rows. Beyond that, you need an index.

IVFFlat (Inverted File Flat) clusters vectors into lists and searches only the nearest clusters. It’s fast to build but does approximate search:

-- Build after loading data (not before: the clustering depends on the distribution)
CREATE INDEX ON articles USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);  -- sqrt(row_count) is a reasonable starting point

Set ivfflat.probes to control the tradeoff between accuracy and speed at query time:

SET ivfflat.probes = 10;  -- search 10 of the 100 lists

HNSW (Hierarchical Navigable Small World) builds a graph structure that provides better recall (accuracy) than IVFFlat, especially at high dimensions. It’s slower to build and uses more memory, but query performance is excellent:

CREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

The m parameter controls graph connectivity (higher = better recall, more memory), and ef_construction controls build quality (higher = better index, slower build). The defaults work well for most cases.

For most production use cases with under a few million rows, HNSW is worth the build time. Set hnsw.ef_search at query time to tune recall vs. speed:

SET hnsw.ef_search = 100;  -- higher = better recall, slower queries

Combining Vector Search with Regular Filters

The real advantage of pgvector over a standalone vector database shows up when you combine semantic search with relational filtering. A dedicated vector DB makes this awkward; in Postgres it’s just SQL:

-- Find similar articles, but only from the last 6 months and in one category
SELECT id, title, category, created_at,
       1 - (embedding <=> %s::vector) AS similarity
FROM articles
WHERE category = 'engineering'
  AND created_at > NOW() - INTERVAL '6 months'
ORDER BY embedding <=> %s::vector
LIMIT 10;

This query uses a regular B-tree index on category and created_at to filter rows before the vector search, which is much faster than searching all vectors and filtering after.

When to Move to a Dedicated Vector Database

pgvector handles most use cases well, but there are situations where a dedicated vector store makes more sense:

Volume above ~10 million vectors. pgvector’s HNSW index works well at this scale, but memory requirements become significant. A dedicated system like Qdrant or Weaviate handles this more efficiently.

Multi-modal or multi-index search. If you need to search across text embeddings, image embeddings, and structured filters with complex hybrid ranking, systems designed for this are more capable.

Your data team is already running a vector DB for other use cases. Don’t introduce a second system if one already exists.

High-throughput concurrent writes with real-time search. pgvector’s HNSW index takes write locks during updates, which can be a bottleneck in write-heavy workloads.

For everything else (a knowledge base, product catalog search, document similarity, customer support ticket routing), pgvector on your existing Postgres is almost certainly sufficient.

A Practical Starting Point

The minimum viable setup for a new project:

# requirements.txt
openai>=1.0.0
psycopg2-binary>=2.9
pgvector>=0.3.0  # Python client that handles vector serialization

from pgvector.psycopg2 import register_vector
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/mydb")
register_vector(conn)  # handles vector type serialization automatically

# Now you can insert and query vectors directly as Python lists
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO articles (title, embedding) VALUES (%s, %s)",
        ("My Article", [0.1, 0.2, 0.3, ...])
    )

Semantic search doesn’t require a new database. If your product runs on Postgres, you’re one extension away from it.

pgvector in Practice: Semantic Search in Postgres Without a Separate Vector DB

What pgvector Actually Does

Setup

Schema Design

Generating and Storing Embeddings

Querying

Indexing: IVFFlat vs HNSW

Combining Vector Search with Regular Filters

When to Move to a Dedicated Vector Database

A Practical Starting Point

Database Connection Pooling in 2026: PgBouncer, Supabase, and Prisma Accelerate

Python 3.14: What's New and What Actually Matters

More from AI Integration

AI-Assisted Code Review: What the Tools Catch and Where Humans Still Win

LLM Observability in 2026: What to Track and Which Tools to Use

LLM Evals in Practice: Testing AI Features Before They Go Wrong

Working notes from
the studio.

Join the conversation.

What pgvector Actually Does

Setup

Schema Design

Generating and Storing Embeddings

Querying

Indexing: IVFFlat vs HNSW

Combining Vector Search with Regular Filters

When to Move to a Dedicated Vector Database

A Practical Starting Point

Database Connection Pooling in 2026: PgBouncer, Supabase, and Prisma Accelerate

Python 3.14: What's New and What Actually Matters

More from AI Integration

AI-Assisted Code Review: What the Tools Catch and Where Humans Still Win

LLM Observability in 2026: What to Track and Which Tools to Use

LLM Evals in Practice: Testing AI Features Before They Go Wrong

Working notes fromthe studio.

Join the conversation.

Working notes from
the studio.