Project Case Study

TwinQuery

An agentic assistant for querying building-stock digital twin datasets in plain English, without SQL expertise.

Building-stock datasets for energy retrofit planning are rich but inaccessible. The data lives in PostgreSQL with PostGIS geometry — useful for engineers and data scientists, opaque for everyone else. Non-technical users such as retrofit program staff, municipal planners, and community organizations have legitimate questions: which buildings in a neighbourhood are the worst-performing? How many pre-1980 homes lack adequate insulation? What does the retrofit guidance say about that building type?

TwinQuery closes that gap. It accepts a plain-English question, routes it through a safe Text-to-SQL pipeline backed by deterministic guardrails, executes PostGIS queries over Ottawa building-footprint geometry, and synthesizes the answer with retrieved retrofit guidance from a local RAG system. The result is a response grounded in both database rows and document sources, with the SQL shown so users can verify what was actually run.

  • Python
  • LangGraph
  • FastAPI
  • Streamlit
  • PostgreSQL/PostGIS
  • Ollama
  • RAG
  • Docker
[Screenshot: TwinQuery interface showing map query mode with 3D building extrusion over Ottawa and generated SQL]
  • Local: fully offline inference via Ollama — no API keys, no data sent to third parties
  • PostGIS: geospatial queries over Ottawa building-footprint polygons with 2D/3D PyDeck visualization
  • Hybrid: answers grounded in both database rows and retrieved retrofit guidance documents
  • Transparent: generated SQL shown to users alongside every answer for auditability

Problem

Retrofit planning tools are built for specialists. A planner who wants to know which census tracts have the highest concentration of pre-1980 residential buildings cannot easily get that answer without writing a PostGIS spatial join. A community advocate asking about insulation upgrade pathways for a specific building archetype needs to search retrofit documents manually. Neither person should have to learn SQL or dig through PDFs.

The technical challenge is not just generating SQL from text. It is generating SQL that is safe to run, correct against a specific schema, and paired with document context that makes the answer actually useful. LLMs hallucinate table names, invent columns, and produce queries that fail silently or return misleading results. Any system intended for non-technical users has to treat SQL generation as an untrusted output that needs validation.

Accessibility gap

Building-stock datasets are in relational databases with spatial geometry. Non-technical users cannot query them without specialized tools or SQL knowledge.

Hallucination risk

LLMs generating SQL against unfamiliar schemas produce plausible-looking queries that reference wrong columns, wrong tables, or apply incorrect filters.

Context gap

Database rows answer what. Retrofit guidance documents answer why and how. A useful response needs both, synthesized coherently.

Privacy and cost

Building-stock data can contain sensitive property information. Local inference removes the need to send queries or data to external APIs.

Agent Architecture

TwinQuery uses LangGraph to orchestrate a multi-step agent pipeline. Rather than a single prompt-to-answer flow, the pipeline separates query interpretation, SQL generation, validation, execution, RAG retrieval, and synthesis into discrete nodes. This makes each step inspectable and replaceable independently.

  • Intent routing: the pipeline first classifies whether the question needs structured data, document retrieval, or both. Questions about building counts or geometry go through the SQL path; questions about retrofit strategies or policy go through RAG; most questions use both.
  • Schema injection: the SQL generation node receives the full schema definition as context, including column names, types, and PostGIS geometry fields. This grounds generation against the actual database rather than a hallucinated schema.
  • Validation gate: generated SQL is validated before execution. The guardrail checks for prohibited operations, verifies referenced tables and columns exist, and rejects queries that do not conform to the allowed pattern.
  • Parallel retrieval: while the SQL path executes, the RAG path retrieves relevant retrofit guidance chunks. Both results are passed to the synthesis node together.
  • Grounded synthesis: the final response cites which database rows and document passages informed the answer, rather than producing a fluent summary with no traceable source.
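As a sketch, the graph wiring might look like the following. For brevity this version always runs both branches in parallel and stubs every node; the node names, state fields, and stub bodies are illustrative, not the project's actual implementation.

```python
# Minimal sketch of the pipeline wiring with LangGraph (illustrative only).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict, total=False):
    question: str
    route: str       # "sql", "rag", or "both"
    sql: str
    rows: list
    chunks: list
    answer: str

# Stub nodes: each real node would call the LLM, the validator, or the DB.
def route_intent(state):  return {"route": "both"}
def generate_sql(state):  return {"sql": "SELECT ..."}
def validate_sql(state):  return {}            # rejects bad SQL in practice
def execute_sql(state):   return {"rows": []}
def retrieve_docs(state): return {"chunks": []}
def synthesize(state):    return {"answer": "..."}

graph = StateGraph(PipelineState)
for name, fn in [("route_intent", route_intent), ("generate_sql", generate_sql),
                 ("validate_sql", validate_sql), ("execute_sql", execute_sql),
                 ("retrieve_docs", retrieve_docs), ("synthesize", synthesize)]:
    graph.add_node(name, fn)

graph.set_entry_point("route_intent")
# Fan out to both branches; the real pipeline routes conditionally on intent.
graph.add_edge("route_intent", "generate_sql")
graph.add_edge("route_intent", "retrieve_docs")
graph.add_edge("generate_sql", "validate_sql")
graph.add_edge("validate_sql", "execute_sql")
# Join: synthesis waits for both the SQL result and the retrieved chunks.
graph.add_edge(["execute_sql", "retrieve_docs"], "synthesize")
graph.add_edge("synthesize", END)
pipeline = graph.compile()
```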

Text-to-SQL Pipeline

SQL generation for non-technical users requires more than a prompt. The pipeline implements several layers of defence against the common failure modes of LLM-generated SQL.

Deterministic guardrails

Before any query reaches the database, a rule-based validator checks that it is a SELECT statement, references only schema-known tables and columns, and does not contain destructive clauses. Queries failing validation are rejected with an explanation rather than silently returning empty results.
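A minimal sketch of such a validator, assuming sqlglot for parsing (the case study does not name a parser, and the allow-list below is illustrative):

```python
import sqlglot
from sqlglot import exp
from sqlglot.errors import ParseError

# Illustrative schema allow-list: table -> known columns.
ALLOWED_TABLES = {"buildings": {"id", "year_built", "building_type", "geom"}}

def validate(sql: str) -> tuple[bool, str]:
    try:
        tree = sqlglot.parse_one(sql, read="postgres")
    except ParseError as e:
        return False, f"unparseable SQL: {e}"
    # Rule 1: only plain SELECT statements reach the database.
    if not isinstance(tree, exp.Select):
        return False, "only SELECT statements are allowed"
    # Rule 2: no data-modifying nodes anywhere in the tree
    # (also catches data-modifying CTEs).
    for banned in (exp.Insert, exp.Update, exp.Delete, exp.Drop):
        if tree.find(banned):
            return False, f"prohibited operation: {banned.__name__}"
    # Rule 3: every referenced table and column must exist in the schema.
    for table in tree.find_all(exp.Table):
        if table.name not in ALLOWED_TABLES:
            return False, f"unknown table: {table.name}"
    known_cols = set().union(*ALLOWED_TABLES.values())
    for col in tree.find_all(exp.Column):
        if col.name not in known_cols:
            return False, f"unknown column: {col.name}"
    return True, "ok"
```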

Fallback templates

For common question patterns — nearest buildings, buildings by type within an area, retrofit priority ranking — the pipeline matches against a library of pre-validated query templates with slot-filled parameters. Template hits bypass LLM generation entirely.
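A sketch of what the template matching could look like; the pattern, table names, and template SQL below are illustrative, not the project's actual library:

```python
import re

# Pre-validated templates with slot-filled parameters (illustrative).
TEMPLATES = [
    (re.compile(r"buildings within (\d+)\s*m(?:etres|eters)? of (.+)", re.I),
     """SELECT b.id, b.building_type
        FROM buildings b, landmarks l
        WHERE l.name ILIKE %(place)s
          AND ST_DWithin(b.geom::geography, l.geom::geography, %(radius)s)"""),
]

def match_template(question: str):
    """Return (sql, params) for a pre-validated template,
    or None to fall back to LLM generation."""
    for pattern, sql in TEMPLATES:
        m = pattern.search(question)
        if m:
            return sql, {"radius": int(m.group(1)), "place": m.group(2)}
    return None
```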

PostGIS execution

Validated queries run against a PostgreSQL database with the PostGIS extension over Ottawa building-footprint geometry. Spatial queries use ST_Within, ST_DWithin, and ST_Intersects to answer location-relative questions.
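For example, a validated area query might execute like this with psycopg2 parameter binding (the table, column names, and polygon are assumptions):

```python
import psycopg2

# Illustrative neighbourhood boundary as GeoJSON (WGS 84).
neighbourhood_geojson = (
    '{"type":"Polygon","coordinates":[[[-75.70,45.41],[-75.68,45.41],'
    '[-75.68,45.43],[-75.70,45.43],[-75.70,45.41]]]}'
)

query = """
    SELECT id, building_type, year_built
    FROM buildings
    WHERE year_built < 1980
      AND ST_Within(geom, ST_SetSRID(ST_GeomFromGeoJSON(%(area)s), 4326))
"""

conn = psycopg2.connect("dbname=twinquery user=twinquery")
with conn, conn.cursor() as cur:
    cur.execute(query, {"area": neighbourhood_geojson})  # bound, not interpolated
    rows = cur.fetchall()
```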

SQL transparency

Every answer surfaces the generated SQL in an expandable panel. Non-technical users can ignore it; technical users can verify it. This makes the system auditable without forcing users to engage with the query.
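In Streamlit, the expandable panel takes only a few lines; a sketch with placeholder values:

```python
import streamlit as st

answer = "27 buildings in this area were built before 1980 ..."  # placeholder
generated_sql = "SELECT id FROM buildings WHERE year_built < 1980"

st.write(answer)                           # always visible
with st.expander("Show generated SQL"):    # opt-in for technical users
    st.code(generated_sql, language="sql")
```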

Local RAG Over Retrofit Guidance

The RAG component retrieves relevant passages from retrofit guidance documents and building-code references. Everything runs locally through Ollama, with no external embedding or inference API calls.

  • Document ingestion: retrofit guidance PDFs and policy materials are chunked, embedded with a local embedding model via Ollama, and stored in a vector index for retrieval.
  • Hybrid retrieval: the RAG node combines vector similarity search with keyword fallback to handle both semantic queries and specific terminology lookups that embeddings sometimes miss.
  • Source grounding: retrieved chunks are passed to the synthesis node with their document and page provenance. The final response attributes claims to specific source passages rather than synthesizing anonymously.
  • Local inference: the full pipeline — embedding, retrieval, SQL generation, and synthesis — runs through Ollama with locally hosted models. No user query or building data leaves the deployment environment.
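A minimal sketch of the ingestion and retrieval steps using the Ollama Python client and cosine similarity; the embedding model name and chunk structure are assumptions:

```python
import numpy as np
import ollama

def embed(text: str) -> np.ndarray:
    # Local embedding via Ollama; model name is an assumption.
    resp = ollama.embeddings(model="nomic-embed-text", prompt=text)
    return np.array(resp["embedding"])

# Ingestion: embed each guidance chunk once, keep provenance alongside.
chunks = [
    {"text": "Exterior wall insulation upgrades ...", "doc": "guide.pdf", "page": 12},
]
index = np.stack([embed(c["text"]) for c in chunks])

def retrieve(question: str, k: int = 3):
    # Cosine similarity between the question and every stored chunk.
    q = embed(question)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```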
[Screenshot: TwinQuery hybrid mode showing synthesized answer with building data and recommended HVAC retrofits from local guidance]

Interface and Geospatial Visualization

The frontend is a Streamlit application backed by a FastAPI service layer. The architecture keeps the agent pipeline independent of the interface so the API can be consumed by other frontends or integrated into broader platforms.

FastAPI service layer

The agent pipeline is exposed through a FastAPI backend with typed request and response schemas. Endpoint contracts are tested independently of the Streamlit interface so the API remains a stable integration surface.
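A sketch of that typed surface; the endpoint path and schema fields are illustrative:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="TwinQuery API")

class QueryRequest(BaseModel):
    question: str

class QueryResponse(BaseModel):
    answer: str
    sql: str | None = None    # generated SQL, when the SQL path ran
    sources: list[str] = []   # document/page provenance from RAG

def run_pipeline(question: str) -> dict:
    # Placeholder for invoking the compiled LangGraph pipeline.
    return {"answer": "...", "sql": None, "sources": []}

@app.post("/query", response_model=QueryResponse)
def query(req: QueryRequest) -> QueryResponse:
    return QueryResponse(**run_pipeline(req.question))
```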

2D/3D PyDeck maps

Building query results are rendered on a PyDeck map with togglable 2D polygon and 3D extruded views. Buildings matching the query are highlighted so users can immediately see the spatial distribution of results.
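A sketch of the 2D/3D toggle and layer setup in Streamlit; the result-row column names and sample footprint are assumptions:

```python
import pydeck as pdk
import streamlit as st

# One matching building: footprint ring coordinates plus a height column
# that drives 3D extrusion (column names are illustrative).
results = [{
    "footprint": [[-75.6972, 45.4215], [-75.6965, 45.4215],
                  [-75.6965, 45.4220], [-75.6972, 45.4220]],
    "height_m": 24,
}]

extruded = st.toggle("3D view", value=True)
layer = pdk.Layer(
    "PolygonLayer",
    data=results,
    get_polygon="footprint",
    extruded=extruded,
    get_elevation="height_m",
    get_fill_color=[200, 30, 30, 160],  # highlight query matches
    pickable=True,
)
view = pdk.ViewState(latitude=45.4215, longitude=-75.6972,
                     zoom=13, pitch=45 if extruded else 0)
st.pydeck_chart(pdk.Deck(layers=[layer], initial_view_state=view))
```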

Result transparency

The interface surfaces the generated SQL, retrieved document sources, and the synthesized answer as separate panels. Users control how much of the pipeline they want to inspect.

Docker deployment

The full stack — PostgreSQL/PostGIS, Ollama, FastAPI, and Streamlit — is containerized with Docker Compose. A single command starts the complete environment with no external dependencies.

Engineering Takeaways

The core design tension in TwinQuery is between LLM flexibility and deterministic safety. The resolution is not to trust the LLM less, but to constrain its authority to the parts of the pipeline where flexibility is actually needed.

  • Treat SQL generation as untrusted output: a validator that checks schema conformance and blocks destructive operations is not optional when the query target is a real database with real data.
  • Template matching beats generation for common patterns: pre-validated query templates are faster, safer, and more predictable than LLM-generated SQL for the subset of questions they cover.
  • Grounding requires explicit source tracking: hybrid synthesis only works if the synthesis step knows which claims came from database rows and which came from documents. Mixing sources anonymously produces confident-sounding answers that are hard to audit.
  • Local inference is a design choice, not a constraint: running Ollama locally makes the system deployable in environments where data sovereignty or cost rules out cloud APIs. The architecture is the same either way.
  • API-first enables integration: building the agent as a FastAPI service before writing the Streamlit frontend meant the pipeline could be tested independently and is ready for integration into a larger digital twin platform.

Limitations

TwinQuery is a research prototype aimed at demonstrating agentic Text-to-SQL and RAG for building-stock data. Several limitations would need to be addressed before it could run in production on a live municipal platform.

  • SQL guardrails are rule-based and may not catch all adversarial or malformed LLM-generated queries. A production deployment would benefit from a formal query allow-list approach.
  • Local embedding quality depends on the chosen Ollama model. Smaller models may miss semantic matches that a larger hosted embedding model would catch.
  • The RAG system does not yet implement re-ranking. Retrieved chunks are returned by similarity score, which may surface less relevant passages for complex queries.
  • Evaluation of answer quality is manual. A systematic RAG evaluation framework such as RAGAS or DeepEval would provide more rigorous accuracy and faithfulness measurement.
  • The Ottawa building-footprint dataset is the primary test case. Generalization to other municipal datasets would require schema adaptation and template updates.