Every organization has a data bottleneck, and it is almost never the database itself. It is the gap between the people who have the questions and the people who know SQL. A marketing manager wants to know which campaign drove the most signups last quarter. A finance lead needs a breakdown of vendor payments by category. A product manager wants to see user retention by cohort. Each of these questions takes a SQL-literate engineer 2 minutes to answer, but the request queue, context switching, and back-and-forth clarifications turn that 2-minute query into a 2-day turnaround.
That bottleneck is exactly what QueryLytic was built to eliminate.
A B2B SaaS company approached CODERCOPS with a challenge: their 120-person organization had 8 data analysts serving 80+ business stakeholders. The analysts were drowning in ad-hoc query requests, spending 60% of their time writing one-off SQL queries instead of doing actual analysis. The company wanted a tool that would let anyone in the organization query their databases using plain English.
At CODERCOPS, we built QueryLytic in 10 weeks. This is the full case study.
Turning natural language into accurate SQL queries requires more than just calling an LLM
The Problem Space: Text-to-SQL
Text-to-SQL is one of the most active research areas in applied NLP. The idea is straightforward: take a natural language question, understand its intent, map it to the relevant database schema, generate a syntactically correct SQL query, execute it, and present the results in a human-readable format.
The reality is significantly harder. Consider this seemingly simple question:
"Show me our top 10 customers by revenue last year"
To generate the correct SQL, the system needs to:
- Understand that "customers" maps to a `customers` table (or maybe `accounts`, or `clients`)
- Know that "revenue" means `SUM(order_total)` from an `orders` table, joined on `customer_id`
- Interpret "last year" as a date range filter: `WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'`
- Apply `GROUP BY`, `ORDER BY ... DESC`, and `LIMIT 10`
- Handle edge cases: should cancelled orders be excluded? Is revenue gross or net?
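To make the steps above concrete, here is the kind of SQL the system would need to produce, run against a tiny in-memory SQLite database as a stand-in for the client's production databases. The schema names (`customers`, `orders`, `order_total`) are the hypothetical ones from the bullet list, and one edge case (excluding cancelled orders) is made explicit:

```python
import sqlite3

# Hypothetical minimal schema matching the bullet list above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        order_total REAL,
        order_date TEXT,
        status TEXT
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES
        (1, 1, 500.0, '2025-03-01', 'completed'),
        (2, 2, 900.0, '2025-06-15', 'completed'),
        (3, 1, 250.0, '2025-07-20', 'cancelled');
""")

# "Top 10 customers by revenue last year" -- with cancelled orders
# excluded, one of the ambiguities the system must resolve.
sql = """
    SELECT c.name, SUM(o.order_total) AS revenue
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.order_date >= '2025-01-01'
      AND o.order_date < '2026-01-01'
      AND o.status != 'cancelled'
    GROUP BY c.name
    ORDER BY revenue DESC
    LIMIT 10
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('Globex', 900.0), ('Acme', 500.0)]
```

Every clause here corresponds to one inference step the system has to get right; a single wrong mapping (say, `clients` instead of `customers`) produces a query that fails or, worse, silently answers a different question.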
Academic benchmarks like Spider and WikiSQL show that even state-of-the-art models struggle with complex joins, subqueries, and ambiguous column references. Our job was to build a production system that handles these edge cases reliably for real business users.
Architecture Overview
+---------------------------------------------------------------+
| FRONTEND (React + TypeScript) |
| |
| +------------------+ +------------------+ +---------------+ |
| | Query Input | | Result Viewer | | Schema | |
| | (Natural Language | | (Table + Chart) | | Explorer | |
| | + Suggestions) | | | | | |
| +--------+---------+ +--------+---------+ +-------+-------+ |
| | ^ | |
+-----------+---------------------+--------------------+-----------+
| | |
| REST API | Results | Schema API
| | |
+-----------v---------------------+--------------------v-----------+
| BACKEND (Python + FastAPI) |
| |
| +------------------+ +------------------+ +-----------------+ |
| | Query Engine | | Schema Manager | | Auth + RBAC | |
| | (NLP Pipeline) | | (Auto-Discovery) | | (JWT + Roles) | |
| +--------+---------+ +--------+---------+ +-----------------+ |
| | | |
| +--------v---------+ +-------v----------+ |
| | OpenAI GPT-4 | | Schema Cache | |
| | (SQL Generation) | | (Redis) | |
| +--------+---------+ +------------------+ |
| | |
| +--------v---------+ |
| | Query Validator | |
| | + Sanitizer | |
| +--------+---------+ |
| | |
+-----------+-------------------------------------------------------+
|
| Validated SQL
|
+-----------v-------------------------------------------------------+
| DATABASE CONNECTORS |
| |
| +----------------+ +--------------+ +------------------------+ |
| | PostgreSQL | | MySQL | | MongoDB | |
| | Connector | | Connector | | Connector (Aggregation | |
| | (psycopg2) | | (PyMySQL) | | Pipeline Translation) | |
| +----------------+ +--------------+ +------------------------+ |
| |
+---------------------------------------------------------------------+

The system has three major components: a React frontend for input and visualization, a Python/FastAPI backend that orchestrates the NLP pipeline, and a set of database connectors that handle multi-database support.
Frontend: React + TypeScript
The frontend was built with React 18 and TypeScript, bundled with Vite. We chose React over Next.js because QueryLytic is a single-page application with no SEO requirements. Server-side rendering would have added complexity without benefit.
The interface has three main panels:
Query Input Panel
The input is a single text field with autocomplete suggestions. As the user types, the system offers context-aware suggestions based on the connected database schema. If the user types "show me all orders from," the autocomplete suggests table names and column values that could complete the sentence.
import { useState, useEffect } from 'react';
import debounce from 'lodash/debounce';

interface QuerySuggestion {
  text: string;
  type: 'table' | 'column' | 'value' | 'function';
  confidence: number;
  source: string; // which table/column this came from
}

const useQuerySuggestions = (
  input: string,
  schema: DatabaseSchema
): QuerySuggestion[] => {
  const [suggestions, setSuggestions] = useState<QuerySuggestion[]>([]);

  useEffect(() => {
    const debounced = debounce(async () => {
      if (input.length < 3) return;
      const tokens = tokenize(input);
      const lastToken = tokens[tokens.length - 1];

      // Match against schema elements
      const matches = schema.tables.flatMap(table => {
        const tableMatch = fuzzyMatch(lastToken, table.name);
        const columnMatches = table.columns.map(col => ({
          text: col.name,
          type: 'column' as const,
          confidence: fuzzyMatch(lastToken, col.name),
          source: table.name
        }));
        return [
          { text: table.name, type: 'table' as const,
            confidence: tableMatch, source: table.name },
          ...columnMatches
        ];
      });

      setSuggestions(
        matches
          .filter(m => m.confidence > 0.3)
          .sort((a, b) => b.confidence - a.confidence)
          .slice(0, 8)
      );
    }, 200);

    debounced();
    return () => debounced.cancel();
  }, [input, schema]);

  return suggestions;
};

Result Viewer
Query results are displayed in a tabular format by default, with one-click switching to chart visualizations. We integrated Recharts for bar charts, line charts, pie charts, and scatter plots. The system auto-detects the most appropriate chart type based on the result shape:
| Result Shape | Auto-Selected Chart |
|---|---|
| 1 text column + 1 numeric column | Bar chart |
| 1 date column + 1 numeric column | Line chart |
| 1 text column + 1 numeric column (< 8 rows) | Pie chart |
| 2 numeric columns | Scatter plot |
| 3+ columns or no clear pattern | Table only |
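The rules in the table above are simple enough to express directly. A minimal sketch (the function name and the `'text'`/`'numeric'`/`'date'` column-kind labels are hypothetical simplifications of QueryLytic's actual column-type detection):

```python
def select_chart_type(columns, row_count):
    """Rules-based chart selection mirroring the table above.

    `columns` is a list of (name, kind) pairs, where kind is one of
    'text', 'numeric', or 'date'.
    """
    kinds = [kind for _, kind in columns]
    if len(columns) == 2:
        if kinds == ['text', 'numeric']:
            # Small categorical results read better as a pie chart.
            return 'pie' if row_count < 8 else 'bar'
        if kinds == ['date', 'numeric']:
            return 'line'
        if kinds == ['numeric', 'numeric']:
            return 'scatter'
    # 3+ columns, or no clear pattern: fall back to the table view.
    return 'table'
```

A rules-based selector was a deliberate choice over asking the LLM: it is instant, free, and predictable, and users can always override it.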
Schema Explorer
A sidebar panel shows the connected database schema in a tree view: databases, tables, columns with types, and relationships. Users can click on any table or column to insert it as context into their query. This helps non-technical users understand what data is available without needing to read documentation.
The Query Translation Engine
The core of QueryLytic is the NLP pipeline that converts natural language to SQL. This is where the real engineering challenge lives.
Step 1: Schema Context Preparation
Before any query is sent to the LLM, we prepare a compressed schema representation. Sending the entire database schema (which can be hundreds of tables with thousands of columns) in every API call would be expensive and slow. Instead, we use a two-stage retrieval approach:
class SchemaContextBuilder:
    def __init__(self, schema: DatabaseSchema, embeddings_cache: dict):
        self.schema = schema
        self.embeddings = embeddings_cache

    def build_context(self, user_query: str, max_tables: int = 10) -> str:
        # Stage 1: Semantic search for relevant tables
        query_embedding = get_embedding(user_query)
        table_scores = []
        for table in self.schema.tables:
            # Embed table name + column names
            table_text = (
                f"{table.name}: "
                f"{', '.join(c.name for c in table.columns)}"
            )
            table_emb = self.embeddings.get(
                table.name, get_embedding(table_text)
            )
            similarity = cosine_similarity(query_embedding, table_emb)
            table_scores.append((table, similarity))

        # Stage 2: Include related tables (foreign keys)
        relevant_tables = sorted(
            table_scores, key=lambda x: x[1], reverse=True
        )
        selected = set()
        for table, score in relevant_tables[:max_tables]:
            selected.add(table.name)
            for fk in table.foreign_keys:
                selected.add(fk.referenced_table)

        # Build compressed schema string
        context_lines = []
        for table in self.schema.tables:
            if table.name in selected:
                cols = ", ".join(
                    f"{c.name} ({c.type}"
                    f"{'*' if c.is_primary_key else ''})"
                    for c in table.columns
                )
                context_lines.append(f"TABLE {table.name}: {cols}")
                for fk in table.foreign_keys:
                    context_lines.append(
                        f"  FK: {fk.column} -> "
                        f"{fk.referenced_table}.{fk.referenced_column}"
                    )
        return "\n".join(context_lines)

This typically reduces the schema context from thousands of tokens to a few hundred, making the API call faster and cheaper while keeping the relevant information intact.
Step 2: SQL Generation via LLM
The compressed schema context and user query are sent to OpenAI's GPT-4 with a carefully engineered system prompt:
SQL_GENERATION_PROMPT = """
You are a SQL query generator. Given a database schema and a natural
language question, generate the correct SQL query.
RULES:
1. Generate ONLY the SQL query. No explanations, no markdown.
2. Use ONLY tables and columns that exist in the provided schema.
3. Always use explicit JOIN syntax (never implicit joins in WHERE).
4. Always alias tables in multi-table queries.
5. Use parameterized placeholders ($1, $2) for user-supplied values.
6. Default to PostgreSQL syntax unless told otherwise.
7. For ambiguous terms, prefer the most common business interpretation:
- "revenue" = SUM of order/transaction amounts
- "last month" = previous calendar month
- "active users" = users with activity in last 30 days
- "top N" = ORDER BY DESC LIMIT N
8. If the query cannot be answered from the schema, respond with:
CANNOT_ANSWER: [reason]
9. Never generate DROP, DELETE, UPDATE, INSERT, ALTER, or TRUNCATE.
10. Always include a LIMIT clause (default 1000) to prevent
runaway queries.
DATABASE SCHEMA:
{schema_context}
USER QUESTION:
{user_query}
SQL:
"""Rule 9 is critical. QueryLytic is a read-only tool. We enforce this at multiple layers (prompt, validator, database permissions), but defense in depth is essential when AI is generating SQL that will be executed against production databases.
Step 3: Query Validation and Sanitization
The generated SQL goes through a validation pipeline before execution:
import sqlparse

class QueryValidator:
    FORBIDDEN_KEYWORDS = {
        'DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER',
        'TRUNCATE', 'CREATE', 'GRANT', 'REVOKE', 'EXEC',
        'EXECUTE', 'XP_', 'SP_', 'INTO OUTFILE',
        'INTO DUMPFILE', 'LOAD_FILE'
    }

    def validate(
        self, sql: str, schema: DatabaseSchema
    ) -> ValidationResult:
        errors = []
        warnings = []

        # 1. Forbidden keyword check (keywords are stored uppercase
        #    so they match against the uppercased SQL)
        sql_upper = sql.upper()
        for keyword in self.FORBIDDEN_KEYWORDS:
            if keyword in sql_upper:
                errors.append(
                    f"Forbidden keyword detected: {keyword}"
                )

        # 2. Parse and validate table references
        parsed = sqlparse.parse(sql)[0]
        referenced_tables = extract_table_names(parsed)
        valid_tables = {t.name for t in schema.tables}
        for table in referenced_tables:
            if table not in valid_tables:
                errors.append(f"Unknown table: {table}")

        # 3. Check for LIMIT clause
        if 'LIMIT' not in sql_upper:
            warnings.append(
                "No LIMIT clause. Adding LIMIT 1000."
            )
            sql = sql.rstrip(';') + ' LIMIT 1000;'

        # 4. Estimate query cost (EXPLAIN)
        cost_estimate = self.estimate_cost(sql)
        if cost_estimate > 10000:
            warnings.append(
                f"High estimated cost ({cost_estimate}). "
                "Query may be slow."
            )

        # 5. Check for Cartesian products
        if self.has_cartesian_product(parsed):
            errors.append(
                "Potential Cartesian product detected."
            )

        return ValidationResult(
            is_valid=len(errors) == 0,
            errors=errors,
            warnings=warnings,
            sanitized_sql=sql
        )

If validation fails, the error is sent back to the LLM with a correction prompt, and the system retries up to 2 times. In practice, the first-attempt success rate is 87%, and after one retry it rises to 96%.
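The generate-validate-retry loop can be sketched as follows. The function and parameter names are hypothetical; `generate_sql` and `validate` stand in for the LLM call and the `QueryValidator` above:

```python
MAX_RETRIES = 2

def generate_with_retry(user_query, schema_context, generate_sql, validate):
    """Sketch of the generate -> validate -> retry loop.

    `generate_sql(prompt)` returns a SQL string; `validate(sql)`
    returns an object with .is_valid, .errors, and .sanitized_sql.
    """
    prompt = f"{schema_context}\n\nQUESTION: {user_query}"
    errors = []
    for _ in range(1 + MAX_RETRIES):
        sql = generate_sql(prompt)
        result = validate(sql)
        if result.is_valid:
            return result.sanitized_sql
        errors = result.errors
        # Retry with the specific validation errors included; vague
        # "try again" prompts correct far less reliably.
        prompt = (
            f"{schema_context}\n\nQUESTION: {user_query}\n\n"
            f"Your previous query failed validation: "
            f"{'; '.join(errors)}. Generate a corrected query."
        )
    raise RuntimeError(f"No valid SQL after retries: {errors}")
```

Feeding the exact errors back into the retry prompt is what makes the second attempt so much more likely to succeed, a point revisited in the prompt-engineering lessons later in this post.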
Step 4: Execution and Result Formatting
Validated queries are executed against the target database using a read-only database user with SELECT-only permissions. Results are returned as JSON and rendered in the frontend's result viewer.
Multi-Database Support
One of QueryLytic's key features is support for multiple database engines. The client's organization used PostgreSQL for their primary application database, MySQL for a legacy billing system, and MongoDB for event tracking data.
PostgreSQL and MySQL
SQL generation for PostgreSQL and MySQL is handled by the same LLM pipeline, with dialect-specific adjustments in the system prompt:
| Feature | PostgreSQL | MySQL |
|---|---|---|
| String concatenation | `\|\|` operator | `CONCAT()` function |
| Date functions | `DATE_TRUNC()` | `DATE_FORMAT()` |
| Boolean type | Native `BOOLEAN` | `TINYINT(1)` |
| LIMIT offset | `LIMIT n OFFSET m` | `LIMIT m, n` |
| JSON access | `->`, `->>` operators | `JSON_EXTRACT()` |
The system detects the database type from the connection configuration and adjusts the prompt accordingly.
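A minimal sketch of that dialect switch (the note strings here are illustrative condensations of the dialect table above, not QueryLytic's actual prompt text):

```python
# Hypothetical dialect guidance appended to the system prompt.
DIALECT_NOTES = {
    "postgresql": (
        "Use PostgreSQL syntax: || for string concatenation, "
        "DATE_TRUNC() for dates, LIMIT n OFFSET m, ->/->> for JSON."
    ),
    "mysql": (
        "Use MySQL syntax: CONCAT() for string concatenation, "
        "DATE_FORMAT() for dates, LIMIT m, n, JSON_EXTRACT() for JSON."
    ),
}

def dialect_prompt(base_prompt: str, db_type: str) -> str:
    """Append dialect-specific guidance based on the connection's
    configured database type."""
    note = DIALECT_NOTES.get(db_type.lower())
    if note is None:
        raise ValueError(f"Unsupported database type: {db_type}")
    return f"{base_prompt}\n\nDIALECT:\n{note}"
```

Keeping the dialect rules out of the base prompt means one generation pipeline serves both engines, with only the final instruction block varying per connection.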
MongoDB: The Harder Problem
MongoDB does not use SQL. Queries are expressed as aggregation pipelines, a fundamentally different paradigm. We built a separate translation path for MongoDB:
Natural Language -> LLM -> MongoDB Aggregation Pipeline (JSON)

The MongoDB prompt generates an aggregation pipeline as a JSON array instead of a SQL string:
MONGO_GENERATION_PROMPT = """
You are a MongoDB aggregation pipeline generator. Given a
collection schema and a natural language question, generate the
correct aggregation pipeline.
Output ONLY a valid JSON array of pipeline stages.
Example:
Question: "How many orders per customer last month?"
Output:
[
{
"$match": {
"created_at": {
"$gte": "2026-01-01",
"$lt": "2026-02-01"
}
}
},
{"$group": {"_id": "$customer_id", "count": {"$sum": 1}}},
{"$sort": {"count": -1}},
{"$limit": 100}
]
COLLECTION SCHEMA:
{schema_context}
QUESTION:
{user_query}
PIPELINE:
"""MongoDB queries go through a similar validation step (checking for forbidden operations like $out, $merge, $delete) before execution.
Schema Auto-Learning
When a new database is connected, QueryLytic automatically discovers and indexes the schema:
Schema Discovery Flow:
1. Connect to database
|
v
2. Query information_schema (SQL) or
listCollections + sample (MongoDB)
|
v
3. Extract: tables, columns, types,
primary keys, foreign keys, indexes
|
v
4. Sample 100 rows per table to detect:
- Enum-like columns (< 50 distinct values)
- Date formats
- Null percentages
- Value distributions
|
v
5. Generate embeddings for each table
(table name + column names + sample values)
|
v
6. Cache schema + embeddings in Redis
(TTL: 1 hour, manual refresh available)
|
v
7. Build business glossary suggestions
("revenue" likely maps to orders.total_amount)The business glossary is a key usability feature. Administrators can define aliases that map business terms to specific columns or expressions:
{
    "revenue": "SUM(orders.total_amount)",
    "active users": "users WHERE last_login > NOW() - INTERVAL '30 days'",
    "churn rate": "(churned_users / total_users) * 100",
    "MRR": "SUM(subscriptions.monthly_amount) WHERE status = 'active'",
    "CAC": "SUM(marketing_spend) / COUNT(new_customers)"
}

These aliases are injected into the LLM prompt context, dramatically improving accuracy for domain-specific queries.
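The injection itself is a simple prompt-assembly step. A sketch (the section header text and function name are hypothetical; QueryLytic's exact prompt layout may differ):

```python
def inject_glossary(schema_context: str, glossary: dict) -> str:
    """Append business-glossary aliases to the schema context that
    is sent to the LLM."""
    if not glossary:
        return schema_context
    lines = [f'  "{term}" means: {definition}'
             for term, definition in glossary.items()]
    return schema_context + "\n\nBUSINESS GLOSSARY:\n" + "\n".join(lines)
```

With the glossary in context, a question like "what was MRR last month" no longer forces the model to guess which table and column represent recurring revenue.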
Security and Access Control
QueryLytic handles database credentials, which makes security non-negotiable. We implemented multiple layers:
Role-Based Access Control (RBAC)
| Role | Permissions |
|---|---|
| Viewer | Run queries against pre-approved tables only |
| Analyst | Run queries against any table, save queries, create dashboards |
| Admin | All analyst permissions + manage database connections + manage users |
| Super Admin | All admin permissions + view audit logs + configure security policies |
Row-Level Security
For sensitive tables (e.g., employee salaries, customer PII), administrators can define row-level filters that are automatically appended to every query:
# Example: Sales reps can only see their own customer data
row_filters = {
    "customers": {
        "role:sales_rep": (
            "WHERE assigned_rep_id = {current_user_id}"
        )
    },
    "employee_salaries": {
        "role:hr_manager": "",  # No filter for HR
        "role:*": "WHERE 1=0"   # Block everyone else
    }
}

Audit Logging
Every query executed through QueryLytic is logged with:
- Timestamp
- User ID and role
- Natural language input
- Generated SQL
- Execution time
- Row count returned
- IP address
Audit logs are stored in a separate database and retained for 12 months. This was a compliance requirement from the client's security team.
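The log record shape follows directly from that field list. A sketch of one entry (the class and field names are illustrative, not QueryLytic's actual schema):

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    """One audit-log entry per executed query, matching the fields
    listed above."""
    timestamp: str
    user_id: str
    role: str
    natural_language_input: str
    generated_sql: str
    execution_time_ms: float
    row_count: int
    ip_address: str

def make_audit_record(user_id, role, nl_input, sql,
                      execution_time_ms, row_count, ip_address):
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id, role=role,
        natural_language_input=nl_input, generated_sql=sql,
        execution_time_ms=execution_time_ms,
        row_count=row_count, ip_address=ip_address,
    )

# Records serialize to JSON for storage in the audit database.
record = make_audit_record("u_17", "analyst", "top customers",
                           "SELECT ... LIMIT 10", 412.5, 10, "10.0.0.4")
print(json.dumps(asdict(record), indent=2))
```

Logging both the natural-language input and the generated SQL is the important design choice here: it lets security reviewers see not just what ran, but what the user actually asked for.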
Credential Management
Database credentials are encrypted at rest using AES-256 and stored in Azure Key Vault. The application retrieves credentials at runtime and never logs or exposes them. Connection strings are never sent to the LLM -- only the schema structure.
Results Visualization
QueryLytic does not just return raw data. It generates context-aware visualizations:
Result Analysis Flow:
Raw Query Results
|
v
Column Type Detection
(numeric, categorical, temporal, text)
|
v
Shape Analysis
(rows x columns, cardinality, distributions)
|
v
Chart Type Selection
(rules-based, see table above)
|
v
Render with Recharts
(interactive, exportable as PNG/CSV)

Users can override the auto-selected chart type, customize colors and labels, and export results as CSV, Excel, or PNG. Saved queries can be pinned to personal dashboards for repeated access.
Prompt Engineering: What We Learned
Building a production text-to-SQL system taught us hard lessons about prompt engineering:
Lesson 1: Few-shot examples are essential
The base prompt with rules alone achieved 72% accuracy on our test suite. Adding 5 diverse few-shot examples (covering joins, aggregations, date filtering, subqueries, and NULL handling) jumped accuracy to 89%. Adding 10 examples pushed it to 93%.
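Assembling the few-shot block is mechanical once the examples are chosen. A sketch with two illustrative question/SQL pairs (these are not the production examples, which covered joins, aggregations, date filtering, subqueries, and NULL handling):

```python
# Illustrative few-shot pairs; the production suite used 10.
FEW_SHOT_EXAMPLES = [
    ("How many users signed up in January 2026?",
     "SELECT COUNT(*) FROM users "
     "WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01' "
     "LIMIT 1000;"),
    ("Which products have never been ordered?",
     "SELECT p.name FROM products AS p "
     "LEFT JOIN order_items AS oi ON oi.product_id = p.id "
     "WHERE oi.id IS NULL LIMIT 1000;"),
]

def build_few_shot_block(examples):
    """Format (question, SQL) pairs for inclusion in the prompt."""
    parts = [f"Question: {question}\nSQL: {sql}"
             for question, sql in examples]
    return "EXAMPLES:\n\n" + "\n\n".join(parts)
```

Diversity mattered more than quantity: two examples demonstrating the same pattern helped less than one example each of two different patterns.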
Lesson 2: Schema compression matters more than you think
Sending the full schema (200+ tables) caused the model to hallucinate non-existent columns. Our semantic search-based schema compression reduced hallucinations by 78%.
Lesson 3: Error recovery prompts need specificity
When a generated query fails validation, the retry prompt must include the specific error:
Bad: "The previous query was incorrect. Try again."
Good: "The previous query referenced table 'customer' which
does not exist. Available tables containing customer
data: 'customers', 'customer_orders',
'customer_profiles'. Generate a corrected query."

The specific prompt achieved a 91% correction rate on retry. The vague prompt achieved only 34%.
Lesson 4: Ambiguity handling requires explicit defaults
Business language is inherently ambiguous. "Last quarter" could mean Q4 2025 or Q1 2026 depending on context and the user's fiscal year. We established explicit defaults documented in the business glossary, and the system shows users what assumptions were made:
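Resolving the phrase to concrete dates is the part the system can do deterministically; only the choice of default needs documenting. A minimal sketch of the default calendar-quarter interpretation (fiscal-year offsets would come from the business glossary; the function name is hypothetical):

```python
from datetime import date

def last_quarter(today: date):
    """Resolve "last quarter" to an explicit calendar-quarter date
    range, returned as a half-open [start, end) pair."""
    q = (today.month - 1) // 3  # current quarter index, 0-3
    year, prev_q = (today.year, q - 1) if q > 0 else (today.year - 1, 3)
    start = date(year, prev_q * 3 + 1, 1)
    end = date(year + 1, 1, 1) if prev_q == 3 else date(year, prev_q * 3 + 4, 1)
    return start, end
```

For a query made in February 2026, this resolves "last quarter" to October 1 through December 31, 2025, which is exactly the assumption shown to the user: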
Your query: "Show me revenue by region last quarter"
Interpreted as: Q4 2025 (Oct 1 - Dec 31, 2025)
Change interpretation? [Q3 2025] [Q1 2026] [Custom range]

Performance Metrics
After 4 months in production, here are the numbers:
| Metric | Result |
|---|---|
| Total queries processed | 34,000+ |
| First-attempt accuracy | 87% |
| Accuracy after retry | 96% |
| Average query generation time | 2.1 seconds |
| Average end-to-end time (generation + execution) | 3.4 seconds |
| Ad-hoc SQL requests to data team | Reduced 73% |
| Average time-to-insight (before) | 4-48 hours |
| Average time-to-insight (after) | Under 30 seconds |
| Monthly active users | 94 of 120 employees (78%) |
| Cost per query (OpenAI API) | ~$0.03 |
| Monthly API cost | ~$850 |
The 73% reduction in ad-hoc SQL requests freed the data team to focus on complex analyses, dashboards, and data infrastructure improvements -- the work they were actually hired to do.
Cost Analysis
| Cost Component | Monthly |
|---|---|
| OpenAI API (GPT-4) | $850 |
| Azure App Service (Backend) | $120 |
| Azure Cache for Redis | $55 |
| Azure Key Vault | $15 |
| Monitoring (Datadog) | $45 |
| Total | $1,085 |
For context, the 8 data analysts each spend approximately 60% of their time on ad-hoc requests. At an average fully loaded cost of Rs.1,50,000/month per analyst, that is Rs.7,20,000/month (roughly $8,600) in analyst time spent on queries that QueryLytic can now handle. The ROI is clear.
Limitations and Honest Assessment
We believe in transparent case studies. QueryLytic is not perfect:
- Complex multi-step analytical queries (e.g., "What percentage of customers who bought Product A in Q1 also bought Product B in Q2?") sometimes require 2-3 attempts or manual SQL refinement.
- MongoDB support lags behind SQL databases in accuracy (81% first-attempt vs. 89% for PostgreSQL).
- Very large result sets (100K+ rows) cause frontend rendering lag. We addressed this with virtualized scrolling, but chart rendering with 100K data points remains slow.
- The system cannot write data. This is by design, but some users have requested INSERT/UPDATE capabilities for simple data corrections. We have resisted this for security reasons.
Conclusion
QueryLytic demonstrates that the text-to-SQL problem, while not fully solved in the academic sense, is production-ready for real business environments when you combine good prompt engineering, robust validation, and thoughtful UX that handles uncertainty gracefully.
The key insight from this project is that the AI does not need to be perfect. It needs to be fast, mostly right, and transparent about its assumptions. A 96% accuracy rate with clear assumption display and easy retry is dramatically more useful than waiting 2 days for a 100% accurate response from a busy data analyst.
At CODERCOPS, we have seen firsthand how AI-powered data access tools transform the relationship between business teams and their data. If your organization is sitting on valuable data that is locked behind SQL expertise, talk to us about building your own QueryLytic.