Every organization has a data bottleneck, and it is almost never the database itself. It is the gap between the people who have the questions and the people who know SQL. A marketing manager wants to know which campaign drove the most signups last quarter. A finance lead needs a breakdown of vendor payments by category. A product manager wants to see user retention by cohort. Each of these questions takes a SQL-literate engineer 2 minutes to answer, but the request queue, context switching, and back-and-forth clarifications turn that 2-minute query into a 2-day turnaround.
That bottleneck is exactly what QueryLytic was built to eliminate.
A B2B SaaS company approached CODERCOPS with a challenge: their 120-person organization had 8 data analysts serving 80+ business stakeholders. The analysts were drowning in ad-hoc query requests, spending 60% of their time writing one-off SQL queries instead of doing actual analysis. The company wanted a tool that would let anyone in the organization query their databases using plain English.
At CODERCOPS, we built QueryLytic in 10 weeks. This is the full case study.
Turning natural language into accurate SQL queries requires more than just calling an LLM
The Problem Space: Text-to-SQL
Text-to-SQL is one of the most active research areas in applied NLP. The idea is straightforward: take a natural language question, understand its intent, map it to the relevant database schema, generate a syntactically correct SQL query, execute it, and present the results in a human-readable format.
The reality is significantly harder. Consider this seemingly simple question:
"Show me our top 10 customers by revenue last year"
To generate the correct SQL, the system needs to:
- Understand that "customers" maps to a `customers` table (or maybe `accounts`, or `clients`)
- Know that "revenue" means `SUM(order_total)` from an `orders` table, joined on `customer_id`
- Interpret "last year" as a date range filter: `WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'`
- Apply `GROUP BY`, `ORDER BY ... DESC`, and `LIMIT 10`
- Handle edge cases: should cancelled orders be excluded? Is revenue gross or net?
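To make the steps above concrete, here is the kind of SQL the system would need to produce, run against a tiny in-memory SQLite database as a stand-in for the client's production databases. The schema names (`customers`, `orders`, `order_total`) are the hypothetical ones from the bullet list, and one edge case (excluding cancelled orders) is made explicit:

```python
import sqlite3

# Hypothetical minimal schema matching the bullet list above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        order_total REAL,
        order_date TEXT,
        status TEXT
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES
        (1, 1, 500.0, '2025-03-01', 'completed'),
        (2, 2, 900.0, '2025-06-15', 'completed'),
        (3, 1, 250.0, '2025-07-20', 'cancelled');
""")

# "Top 10 customers by revenue last year" -- with cancelled orders
# excluded, one of the ambiguities the system must resolve.
sql = """
    SELECT c.name, SUM(o.order_total) AS revenue
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.order_date >= '2025-01-01'
      AND o.order_date < '2026-01-01'
      AND o.status != 'cancelled'
    GROUP BY c.name
    ORDER BY revenue DESC
    LIMIT 10
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('Globex', 900.0), ('Acme', 500.0)]
```

Every clause here corresponds to one inference step the system has to get right; a single wrong mapping (say, `clients` instead of `customers`) produces a query that fails or, worse, silently answers a different question.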
Academic benchmarks like Spider and WikiSQL show that even state-of-the-art models struggle with complex joins, subqueries, and ambiguous column references. Our job was to build a production system that handles these edge cases reliably for real business users.
Architecture Overview
+---------------------------------------------------------------+
| FRONTEND (React + TypeScript) |
| |
| +------------------+ +------------------+ +---------------+ |
| | Query Input | | Result Viewer | | Schema | |
| | (Natural Language | | (Table + Chart) | | Explorer | |
| | + Suggestions) | | | | | |
| +--------+---------+ +--------+---------+ +-------+-------+ |
| | ^ | |
+-----------+---------------------+--------------------+-----------+
| | |
| REST API | Results | Schema API
| | |
+-----------v---------------------+--------------------v-----------+
| BACKEND (Python + FastAPI) |
| |
| +------------------+ +------------------+ +-----------------+ |
| | Query Engine | | Schema Manager | | Auth + RBAC | |
| | (NLP Pipeline) | | (Auto-Discovery) | | (JWT + Roles) | |
| +--------+---------+ +--------+---------+ +-----------------+ |
| | | |
| +--------v---------+ +-------v----------+ |
| | OpenAI GPT-4 | | Schema Cache | |
| | (SQL Generation) | | (Redis) | |
| +--------+---------+ +------------------+ |
| | |
| +--------v---------+ |
| | Query Validator | |
| | + Sanitizer | |
| +--------+---------+ |
| | |
+-----------+-------------------------------------------------------+
|
| Validated SQL
|
+-----------v-------------------------------------------------------+
| DATABASE CONNECTORS |
| |
| +----------------+ +--------------+ +------------------------+ |
| | PostgreSQL | | MySQL | | MongoDB | |
| | Connector | | Connector | | Connector (Aggregation | |
| | (psycopg2) | | (PyMySQL) | | Pipeline Translation) | |
| +----------------+ +--------------+ +------------------------+ |
| |
+---------------------------------------------------------------------+

The system has three major components: a React frontend for input and visualization, a Python/FastAPI backend that orchestrates the NLP pipeline, and a set of database connectors that handle multi-database support.
Frontend: React + TypeScript
The frontend was built with React 18 and TypeScript, bundled with Vite. We chose React over Next.js because QueryLytic is a single-page application with no SEO requirements. Server-side rendering would have added complexity without benefit.
The interface has three main panels:
Query Input Panel
The input is a single text field with autocomplete suggestions. As the user types, the system offers context-aware suggestions based on the connected database schema. If the user types "show me all orders from," the autocomplete suggests table names and column values that could complete the sentence.
import { useState, useEffect } from 'react';
import debounce from 'lodash/debounce';

interface QuerySuggestion {
  text: string;
  type: 'table' | 'column' | 'value' | 'function';
  confidence: number;
  source: string; // which table/column this came from
}

const useQuerySuggestions = (
  input: string,
  schema: DatabaseSchema
): QuerySuggestion[] => {
  const [suggestions, setSuggestions] = useState<QuerySuggestion[]>([]);

  useEffect(() => {
    const debounced = debounce(async () => {
      if (input.length < 3) return;
      const tokens = tokenize(input);
      const lastToken = tokens[tokens.length - 1];

      // Match against schema elements
      const matches = schema.tables.flatMap(table => {
        const tableMatch = fuzzyMatch(lastToken, table.name);
        const columnMatches = table.columns.map(col => ({
          text: col.name,
          type: 'column' as const,
          confidence: fuzzyMatch(lastToken, col.name),
          source: table.name
        }));
        return [
          { text: table.name, type: 'table' as const,
            confidence: tableMatch, source: table.name },
          ...columnMatches
        ];
      });

      setSuggestions(
        matches
          .filter(m => m.confidence > 0.3)
          .sort((a, b) => b.confidence - a.confidence)
          .slice(0, 8)
      );
    }, 200);

    debounced();
    return () => debounced.cancel();
  }, [input, schema]);

  return suggestions;
};

Result Viewer
Query results are displayed in a tabular format by default, with one-click switching to chart visualizations. We integrated Recharts for bar charts, line charts, pie charts, and scatter plots. The system auto-detects the most appropriate chart type based on the result shape:
| Result Shape | Auto-Selected Chart |
|---|---|
| 1 text column + 1 numeric column | Bar chart |
| 1 date column + 1 numeric column | Line chart |
| 1 text column + 1 numeric column (< 8 rows) | Pie chart |
| 2 numeric columns | Scatter plot |
| 3+ columns or no clear pattern | Table only |
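The rules in the table above are simple enough to express directly. A minimal sketch (the function name and the `'text'`/`'numeric'`/`'date'` column-kind labels are hypothetical simplifications of QueryLytic's actual column-type detection):

```python
def select_chart_type(columns, row_count):
    """Rules-based chart selection mirroring the table above.

    `columns` is a list of (name, kind) pairs, where kind is one of
    'text', 'numeric', or 'date'.
    """
    kinds = [kind for _, kind in columns]
    if len(columns) == 2:
        if kinds == ['text', 'numeric']:
            # Small categorical results read better as a pie chart.
            return 'pie' if row_count < 8 else 'bar'
        if kinds == ['date', 'numeric']:
            return 'line'
        if kinds == ['numeric', 'numeric']:
            return 'scatter'
    # 3+ columns, or no clear pattern: fall back to the table view.
    return 'table'
```

A rules-based selector was a deliberate choice over asking the LLM: it is instant, free, and predictable, and users can always override it.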
Schema Explorer
A sidebar panel shows the connected database schema in a tree view: databases, tables, columns with types, and relationships. Users can click on any table or column to insert it as context into their query. This helps non-technical users understand what data is available without needing to read documentation.
The Query Translation Engine
The core of QueryLytic is the NLP pipeline that converts natural language to SQL. This is where the real engineering challenge lives.
Step 1: Schema Context Preparation
Before any query is sent to the LLM, we prepare a compressed schema representation. Sending the entire database schema (which can be hundreds of tables with thousands of columns) in every API call would be expensive and slow. Instead, we use a two-stage retrieval approach:
class SchemaContextBuilder:
    def __init__(self, schema: DatabaseSchema, embeddings_cache: dict):
        self.schema = schema
        self.embeddings = embeddings_cache

    def build_context(self, user_query: str, max_tables: int = 10) -> str:
        # Stage 1: Semantic search for relevant tables
        query_embedding = get_embedding(user_query)
        table_scores = []
        for table in self.schema.tables:
            # Embed table name + column names
            table_text = (
                f"{table.name}: "
                f"{', '.join(c.name for c in table.columns)}"
            )
            table_emb = self.embeddings.get(
                table.name, get_embedding(table_text)
            )
            similarity = cosine_similarity(query_embedding, table_emb)
            table_scores.append((table, similarity))

        # Stage 2: Include related tables (foreign keys)
        relevant_tables = sorted(
            table_scores, key=lambda x: x[1], reverse=True
        )
        selected = set()
        for table, score in relevant_tables[:max_tables]:
            selected.add(table.name)
            for fk in table.foreign_keys:
                selected.add(fk.referenced_table)

        # Build compressed schema string
        context_lines = []
        for table in self.schema.tables:
            if table.name in selected:
                cols = ", ".join(
                    f"{c.name} ({c.type}"
                    f"{'*' if c.is_primary_key else ''})"
                    for c in table.columns
                )
                context_lines.append(f"TABLE {table.name}: {cols}")
                for fk in table.foreign_keys:
                    context_lines.append(
                        f"  FK: {fk.column} -> "
                        f"{fk.referenced_table}.{fk.referenced_column}"
                    )
        return "\n".join(context_lines)

This typically reduces the schema context from thousands of tokens to a few hundred, making the API call faster and cheaper while keeping the relevant information intact.
Step 2: SQL Generation via LLM
The compressed schema context and user query are sent to OpenAI's GPT-4 with a carefully engineered system prompt:
SQL_GENERATION_PROMPT = """
You are a SQL query generator. Given a database schema and a natural
language question, generate the correct SQL query.
RULES:
1. Generate ONLY the SQL query. No explanations, no markdown.
2. Use ONLY tables and columns that exist in the provided schema.
3. Always use explicit JOIN syntax (never implicit joins in WHERE).
4. Always alias tables in multi-table queries.
5. Use parameterized placeholders ($1, $2) for user-supplied values.
6. Default to PostgreSQL syntax unless told otherwise.
7. For ambiguous terms, prefer the most common business interpretation:
- "revenue" = SUM of order/transaction amounts
- "last month" = previous calendar month
- "active users" = users with activity in last 30 days
- "top N" = ORDER BY DESC LIMIT N
8. If the query cannot be answered from the schema, respond with:
CANNOT_ANSWER: [reason]
9. Never generate DROP, DELETE, UPDATE, INSERT, ALTER, or TRUNCATE.
10. Always include a LIMIT clause (default 1000) to prevent
runaway queries.
DATABASE SCHEMA:
{schema_context}
USER QUESTION:
{user_query}
SQL:
"""Rule 9 is critical. QueryLytic is a read-only tool. We enforce this at multiple layers (prompt, validator, database permissions), but defense in depth is essential when AI is generating SQL that will be executed against production databases.
Step 3: Query Validation and Sanitization
The generated SQL goes through a validation pipeline before execution:
import sqlparse

class QueryValidator:
    FORBIDDEN_KEYWORDS = {
        'DROP', 'DELETE', 'UPDATE', 'INSERT', 'ALTER',
        'TRUNCATE', 'CREATE', 'GRANT', 'REVOKE', 'EXEC',
        'EXECUTE', 'XP_', 'SP_', 'INTO OUTFILE',
        'INTO DUMPFILE', 'LOAD_FILE'
    }

    def validate(
        self, sql: str, schema: DatabaseSchema
    ) -> ValidationResult:
        errors = []
        warnings = []

        # 1. Forbidden keyword check (keywords are stored uppercase
        #    so they match against the uppercased SQL)
        sql_upper = sql.upper()
        for keyword in self.FORBIDDEN_KEYWORDS:
            if keyword in sql_upper:
                errors.append(
                    f"Forbidden keyword detected: {keyword}"
                )

        # 2. Parse and validate table references
        parsed = sqlparse.parse(sql)[0]
        referenced_tables = extract_table_names(parsed)
        valid_tables = {t.name for t in schema.tables}
        for table in referenced_tables:
            if table not in valid_tables:
                errors.append(f"Unknown table: {table}")

        # 3. Check for LIMIT clause
        if 'LIMIT' not in sql_upper:
            warnings.append(
                "No LIMIT clause. Adding LIMIT 1000."
            )
            sql = sql.rstrip(';') + ' LIMIT 1000;'

        # 4. Estimate query cost (EXPLAIN)
        cost_estimate = self.estimate_cost(sql)
        if cost_estimate > 10000:
            warnings.append(
                f"High estimated cost ({cost_estimate}). "
                "Query may be slow."
            )

        # 5. Check for Cartesian products
        if self.has_cartesian_product(parsed):
            errors.append(
                "Potential Cartesian product detected."
            )

        return ValidationResult(
            is_valid=len(errors) == 0,
            errors=errors,
            warnings=warnings,
            sanitized_sql=sql
        )

If validation fails, the error is sent back to the LLM with a correction prompt, and the system retries up to 2 times. In practice, the first-attempt success rate is 87%, and after one retry it rises to 96%.
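The generate-validate-retry loop can be sketched as follows. The function and parameter names are hypothetical; `generate_sql` and `validate` stand in for the LLM call and the `QueryValidator` above:

```python
MAX_RETRIES = 2

def generate_with_retry(user_query, schema_context, generate_sql, validate):
    """Sketch of the generate -> validate -> retry loop.

    `generate_sql(prompt)` returns a SQL string; `validate(sql)`
    returns an object with .is_valid, .errors, and .sanitized_sql.
    """
    prompt = f"{schema_context}\n\nQUESTION: {user_query}"
    errors = []
    for _ in range(1 + MAX_RETRIES):
        sql = generate_sql(prompt)
        result = validate(sql)
        if result.is_valid:
            return result.sanitized_sql
        errors = result.errors
        # Retry with the specific validation errors included; vague
        # "try again" prompts correct far less reliably.
        prompt = (
            f"{schema_context}\n\nQUESTION: {user_query}\n\n"
            f"Your previous query failed validation: "
            f"{'; '.join(errors)}. Generate a corrected query."
        )
    raise RuntimeError(f"No valid SQL after retries: {errors}")
```

Feeding the exact errors back into the retry prompt is what makes the second attempt so much more likely to succeed, a point revisited in the prompt-engineering lessons later in this post.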
Step 4: Execution and Result Formatting
Validated queries are executed against the target database using a read-only database user with SELECT-only permissions. Results are returned as JSON and rendered in the frontend's result viewer.
Multi-Database Support
One of QueryLytic's key features is support for multiple database engines. The client's organization used PostgreSQL for their primary application database, MySQL for a legacy billing system, and MongoDB for event tracking data.
PostgreSQL and MySQL
SQL generation for PostgreSQL and MySQL is handled by the same LLM pipeline, with dialect-specific adjustments in the system prompt:
| Feature | PostgreSQL | MySQL |
|---|---|---|
| String concatenation | `\|\|` operator | `CONCAT()` function |
| Date functions | `DATE_TRUNC()` | `DATE_FORMAT()` |
| Boolean type | Native `BOOLEAN` | `TINYINT(1)` |
| LIMIT offset | `LIMIT n OFFSET m` | `LIMIT m, n` |
| JSON access | `->`, `->>` operators | `JSON_EXTRACT()` |
The system detects the database type from the connection configuration and adjusts the prompt accordingly.
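A minimal sketch of that dialect switch (the note strings here are illustrative condensations of the dialect table above, not QueryLytic's actual prompt text):

```python
# Hypothetical dialect guidance appended to the system prompt.
DIALECT_NOTES = {
    "postgresql": (
        "Use PostgreSQL syntax: || for string concatenation, "
        "DATE_TRUNC() for dates, LIMIT n OFFSET m, ->/->> for JSON."
    ),
    "mysql": (
        "Use MySQL syntax: CONCAT() for string concatenation, "
        "DATE_FORMAT() for dates, LIMIT m, n, JSON_EXTRACT() for JSON."
    ),
}

def dialect_prompt(base_prompt: str, db_type: str) -> str:
    """Append dialect-specific guidance based on the connection's
    configured database type."""
    note = DIALECT_NOTES.get(db_type.lower())
    if note is None:
        raise ValueError(f"Unsupported database type: {db_type}")
    return f"{base_prompt}\n\nDIALECT:\n{note}"
```

Keeping the dialect rules out of the base prompt means one generation pipeline serves both engines, with only the final instruction block varying per connection.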
MongoDB: The Harder Problem
MongoDB does not use SQL. Queries are expressed as aggregation pipelines, a fundamentally different paradigm. We built a separate translation path for MongoDB:
Natural Language -> LLM -> MongoDB Aggregation Pipeline (JSON)

The MongoDB prompt generates an aggregation pipeline as a JSON array instead of a SQL string:
MONGO_GENERATION_PROMPT = """
You are a MongoDB aggregation pipeline generator. Given a
collection schema and a natural language question, generate the
correct aggregation pipeline.
Output ONLY a valid JSON array of pipeline stages.
Example:
Question: "How many orders per customer last month?"
Output:
[
{
"$match": {
"created_at": {
"$gte": "2026-01-01",
"$lt": "2026-02-01"
}
}
},
{"$group": {"_id": "$customer_id", "count": {"$sum": 1}}},
{"$sort": {"count": -1}},
{"$limit": 100}
]
COLLECTION SCHEMA:
{schema_context}
QUESTION:
{user_query}
PIPELINE:
"""MongoDB queries go through a similar validation step (checking for forbidden operations like $out, $merge, $delete) before execution.
Schema Auto-Learning
When a new database is connected, QueryLytic automatically discovers and indexes the schema:
Schema Discovery Flow:
1. Connect to database
|
v
2. Query information_schema (SQL) or
listCollections + sample (MongoDB)
|
v
3. Extract: tables, columns, types,
primary keys, foreign keys, indexes
|
v
4. Sample 100 rows per table to detect:
- Enum-like columns (< 50 distinct values)
- Date formats
- Null percentages
- Value distributions
|
v
5. Generate embeddings for each table
(table name + column names + sample values)
|
v
6. Cache schema + embeddings in Redis
(TTL: 1 hour, manual refresh available)
|
v
7. Build business glossary suggestions
("revenue" likely maps to orders.total_amount)The business glossary is a key usability feature. Administrators can define aliases that map business terms to specific columns or expressions:
{
    "revenue": "SUM(orders.total_amount)",
    "active users": "users WHERE last_login > NOW() - INTERVAL '30 days'",
    "churn rate": "(churned_users / total_users) * 100",
    "MRR": "SUM(subscriptions.monthly_amount) WHERE status = 'active'",
    "CAC": "SUM(marketing_spend) / COUNT(new_customers)"
}

These aliases are injected into the LLM prompt context, dramatically improving accuracy for domain-specific queries.
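The injection itself is a simple prompt-assembly step. A sketch (the section header text and function name are hypothetical; QueryLytic's exact prompt layout may differ):

```python
def inject_glossary(schema_context: str, glossary: dict) -> str:
    """Append business-glossary aliases to the schema context that
    is sent to the LLM."""
    if not glossary:
        return schema_context
    lines = [f'  "{term}" means: {definition}'
             for term, definition in glossary.items()]
    return schema_context + "\n\nBUSINESS GLOSSARY:\n" + "\n".join(lines)
```

With the glossary in context, a question like "what was MRR last month" no longer forces the model to guess which table and column represent recurring revenue.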
Security and Access Control
QueryLytic handles database credentials, which makes security non-negotiable. We implemented multiple layers:
Role-Based Access Control (RBAC)
| Role | Permissions |
|---|---|
| Viewer | Run queries against pre-approved tables only |
| Analyst | Run queries against any table, save queries, create dashboards |
| Admin | All analyst permissions + manage database connections + manage users |
| Super Admin | All admin permissions + view audit logs + configure security policies |
Row-Level Security
For sensitive tables (e.g., employee salaries, customer PII), administrators can define row-level filters that are automatically appended to every query:
# Example: Sales reps can only see their own customer data
row_filters = {
    "customers": {
        "role:sales_rep": (
            "WHERE assigned_rep_id = {current_user_id}"
        )
    },
    "employee_salaries": {
        "role:hr_manager": "",  # No filter for HR
        "role:*": "WHERE 1=0"   # Block everyone else
    }
}

Audit Logging
Every query executed through QueryLytic is logged with:
- Timestamp
- User ID and role
- Natural language input
- Generated SQL
- Execution time
- Row count returned
- IP address
Audit logs are stored in a separate database and retained for 12 months. This was a compliance requirement from the client's security team.
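The log record shape follows directly from that field list. A sketch of one entry (the class and field names are illustrative, not QueryLytic's actual schema):

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    """One audit-log entry per executed query, matching the fields
    listed above."""
    timestamp: str
    user_id: str
    role: str
    natural_language_input: str
    generated_sql: str
    execution_time_ms: float
    row_count: int
    ip_address: str

def make_audit_record(user_id, role, nl_input, sql,
                      execution_time_ms, row_count, ip_address):
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id, role=role,
        natural_language_input=nl_input, generated_sql=sql,
        execution_time_ms=execution_time_ms,
        row_count=row_count, ip_address=ip_address,
    )

# Records serialize to JSON for storage in the audit database.
record = make_audit_record("u_17", "analyst", "top customers",
                           "SELECT ... LIMIT 10", 412.5, 10, "10.0.0.4")
print(json.dumps(asdict(record), indent=2))
```

Logging both the natural-language input and the generated SQL is the important design choice here: it lets security reviewers see not just what ran, but what the user actually asked for.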
Credential Management
Database credentials are encrypted at rest using AES-256 and stored in Azure Key Vault. The application retrieves credentials at runtime and never logs or exposes them. Connection strings are never sent to the LLM -- only the schema structure.
Results Visualization
QueryLytic does not just return raw data. It generates context-aware visualizations:
Result Analysis Flow:
Raw Query Results
|
v
Column Type Detection
(numeric, categorical, temporal, text)
|
v
Shape Analysis
(rows x columns, cardinality, distributions)
|
v
Chart Type Selection
(rules-based, see table above)
|
v
Render with Recharts
(interactive, exportable as PNG/CSV)

Users can override the auto-selected chart type, customize colors and labels, and export results as CSV, Excel, or PNG. Saved queries can be pinned to personal dashboards for repeated access.
Prompt Engineering: What We Learned
Building a production text-to-SQL system taught us hard lessons about prompt engineering:
Lesson 1: Few-shot examples are essential
The base prompt with rules alone achieved 72% accuracy on our test suite. Adding 5 diverse few-shot examples (covering joins, aggregations, date filtering, subqueries, and NULL handling) jumped accuracy to 89%. Adding 10 examples pushed it to 93%.
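Assembling the few-shot block is mechanical once the examples are chosen. A sketch with two illustrative question/SQL pairs (these are not the production examples, which covered joins, aggregations, date filtering, subqueries, and NULL handling):

```python
# Illustrative few-shot pairs; the production suite used 10.
FEW_SHOT_EXAMPLES = [
    ("How many users signed up in January 2026?",
     "SELECT COUNT(*) FROM users "
     "WHERE created_at >= '2026-01-01' AND created_at < '2026-02-01' "
     "LIMIT 1000;"),
    ("Which products have never been ordered?",
     "SELECT p.name FROM products AS p "
     "LEFT JOIN order_items AS oi ON oi.product_id = p.id "
     "WHERE oi.id IS NULL LIMIT 1000;"),
]

def build_few_shot_block(examples):
    """Format (question, SQL) pairs for inclusion in the prompt."""
    parts = [f"Question: {question}\nSQL: {sql}"
             for question, sql in examples]
    return "EXAMPLES:\n\n" + "\n\n".join(parts)
```

Diversity mattered more than quantity: two examples demonstrating the same pattern helped less than one example each of two different patterns.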
Lesson 2: Schema compression matters more than you think
Sending the full schema (200+ tables) caused the model to hallucinate non-existent columns. Our semantic search-based schema compression reduced hallucinations by 78%.
Lesson 3: Error recovery prompts need specificity
When a generated query fails validation, the retry prompt must include the specific error:
Bad: "The previous query was incorrect. Try again."
Good: "The previous query referenced table 'customer' which
does not exist. Available tables containing customer
data: 'customers', 'customer_orders',
'customer_profiles'. Generate a corrected query."

The specific prompt achieved a 91% correction rate on retry. The vague prompt achieved only 34%.
Lesson 4: Ambiguity handling requires explicit defaults
Business language is inherently ambiguous. "Last quarter" could mean Q4 2025 or Q1 2026 depending on context and the user's fiscal year. We established explicit defaults documented in the business glossary, and the system shows users what assumptions were made:
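Resolving the phrase to concrete dates is the part the system can do deterministically; only the choice of default needs documenting. A minimal sketch of the default calendar-quarter interpretation (fiscal-year offsets would come from the business glossary; the function name is hypothetical):

```python
from datetime import date

def last_quarter(today: date):
    """Resolve "last quarter" to an explicit calendar-quarter date
    range, returned as a half-open [start, end) pair."""
    q = (today.month - 1) // 3  # current quarter index, 0-3
    year, prev_q = (today.year, q - 1) if q > 0 else (today.year - 1, 3)
    start = date(year, prev_q * 3 + 1, 1)
    end = date(year + 1, 1, 1) if prev_q == 3 else date(year, prev_q * 3 + 4, 1)
    return start, end
```

For a query made in February 2026, this resolves "last quarter" to October 1 through December 31, 2025, which is exactly the assumption shown to the user: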
Your query: "Show me revenue by region last quarter"
Interpreted as: Q4 2025 (Oct 1 - Dec 31, 2025)
Change interpretation? [Q3 2025] [Q1 2026] [Custom range]

Performance Metrics
After 4 months in production, here are the numbers:
| Metric | Result |
|---|---|
| Total queries processed | 34,000+ |
| First-attempt accuracy | 87% |
| Accuracy after retry | 96% |
| Average query generation time | 2.1 seconds |
| Average end-to-end time (generation + execution) | 3.4 seconds |
| Ad-hoc SQL requests to data team | Reduced 73% |
| Average time-to-insight (before) | 4-48 hours |
| Average time-to-insight (after) | Under 30 seconds |
| Monthly active users | 94 of 120 employees (78%) |
| Cost per query (OpenAI API) | ~$0.03 |
| Monthly API cost | ~$850 |
The 73% reduction in ad-hoc SQL requests freed the data team to focus on complex analyses, dashboards, and data infrastructure improvements -- the work they were actually hired to do.
Cost Analysis
| Cost Component | Monthly |
|---|---|
| OpenAI API (GPT-4) | $850 |
| Azure App Service (Backend) | $120 |
| Azure Cache for Redis | $55 |
| Azure Key Vault | $15 |
| Monitoring (Datadog) | $45 |
| Total | $1,085 |
For context, the 8 data analysts each spend approximately 60% of their time on ad-hoc requests. At an average fully loaded cost of Rs.1,50,000/month per analyst, that is Rs.7,20,000/month (roughly $8,600) in analyst time spent on queries that QueryLytic can now handle. The ROI is clear.
Limitations and Honest Assessment
We believe in transparent case studies. QueryLytic is not perfect:
- Complex multi-step analytical queries (e.g., "What percentage of customers who bought Product A in Q1 also bought Product B in Q2?") sometimes require 2-3 attempts or manual SQL refinement.
- MongoDB support lags behind SQL databases in accuracy (81% first-attempt vs. 89% for PostgreSQL).
- Very large result sets (100K+ rows) cause frontend rendering lag. We addressed this with virtualized scrolling, but chart rendering with 100K data points remains slow.
- The system cannot write data. This is by design, but some users have requested INSERT/UPDATE capabilities for simple data corrections. We have resisted this for security reasons.
Conclusion
QueryLytic demonstrates that the text-to-SQL problem, while not fully solved in the academic sense, is production-ready for real business environments when you combine good prompt engineering, robust validation, and thoughtful UX that handles uncertainty gracefully.
The key insight from this project is that the AI does not need to be perfect. It needs to be fast, mostly right, and transparent about its assumptions. A 96% accuracy rate with clear assumption display and easy retry is dramatically more useful than waiting 2 days for a 100% accurate response from a busy data analyst.
At CODERCOPS, we have seen firsthand how AI-powered data access tools transform the relationship between business teams and their data. If your organization is sitting on valuable data that is locked behind SQL expertise, talk to us about building your own QueryLytic.