Query builder using LangChain and LangSmith with the gpt-4o-mini LLM
OpenAI's gpt-4o-mini is one of my favourite models because of its low cost and how quickly it gets the job done.
The code below analyzes a question/query and transforms it into a meaningful SQL statement through LangSmith and LangChain. I'm not using Pydantic directly here; I let the LangChain team's structured-output machinery handle the query-builder part.
One simple question - how many employees do we have?
Let's see how the code below behaves and what SQL it is able to generate:
import os

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from typing_extensions import Annotated, TypedDict
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langgraph.graph import START, StateGraph

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "Your key"
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = "Your key"

# LLM model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
# Class declaration for the pipeline state
class State(TypedDict):
    question: str
    query: str
    result: str
    answer: str
system_message = """
Given an input question, create a syntactically correct {dialect} query to
run to help find the answer. Pay attention to limit your query to
at most {top_k} results.
Never query for all the columns from a specific table, only ask for the
few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema
description.
Only use the following tables:
{table_info}
"""
# Part 1 - query builder
user_prompt = "Question: {input}"
query_prompt_template = ChatPromptTemplate(
    [("system", system_message), ("user", user_prompt)]
)

class QueryOutput(TypedDict):
    """Generated SQL query."""

    query: Annotated[str, ..., "Syntactically valid SQL query."]
def write_query(state: State):
    """Generate SQL query to fetch information."""
    prompt = query_prompt_template.invoke(
        {
            # Note: {dialect} expects a dialect name such as "mssql", not a
            # connection string. The ODBC string belongs to the DB connection:
            # "DRIVER={ODBC Driver 17 for SQL Server};SERVER=TUKAI\\MSSQLSERVER01;DATABASE=EasyDate;Trusted_Connection=yes;"
            "dialect": "mssql",
            "top_k": 10,
            "table_info": "dbo.Users",
            "input": state["question"],
        }
    )
    structured_llm = llm.with_structured_output(QueryOutput)
    result = structured_llm.invoke(prompt)
    return {"query": result["query"]}

response_query = write_query({"question": "How many Employees are there?"})
print(response_query)
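It helps to look at what the model actually receives. Since {table_info} is filled with just the table name and no column list, the model has to guess column names. Here's a plain str.format sketch (no LangChain needed) of the rendered system message, using the same values as above:

```python
# Sketch: render the system message by hand to see what the LLM is given.
# The values mirror the ones passed to query_prompt_template.invoke above.
system_message = """
Given an input question, create a syntactically correct {dialect} query to
run to help find the answer. Pay attention to limit your query to
at most {top_k} results.
Never query for all the columns from a specific table, only ask for the
few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema
description.
Only use the following tables:
{table_info}
"""

rendered = system_message.format(dialect="mssql", top_k=10, table_info="dbo.Users")
print(rendered)
```

Notice that "Only use the following tables: dbo.Users" gives the model a table name but zero schema description to obey, which explains the behaviour below.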
To me, this looks like an overconfident call :) . First of all, my Employee table does not have any column named Role. I understand the intelligence applied here, but unfortunately it didn't click.
Okay, let me change my query now and see how it works. This time, I'd like to list all employee information.
response_query = write_query({"question": "Display all employee details"})
print(response_query)
This time the result comes up with the most common column names in the SELECT query; however, those columns don't exist either. So we need to fine-tune the setup and let the model know the right way to select the right columns. We'll go through this in an upcoming post :)
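One direction worth previewing: instead of hard-coding "dbo.Users" as {table_info}, derive the schema text from the live database so the model sees real column names. In LangChain that's what SQLDatabase.from_uri(...) plus db.get_table_info() are for; the sketch below fakes the same idea with an in-memory sqlite3 database (hypothetical column names, chosen just for the demo) so it runs anywhere:

```python
import sqlite3

# Stand-in database with made-up columns for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, HireDate TEXT)"
)

def get_table_info(conn: sqlite3.Connection, table: str) -> str:
    """Return the CREATE TABLE statement for `table` - the kind of schema
    text you'd pass as {table_info} instead of a bare table name."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    return row[0]

table_info = get_table_info(conn, "Users")
print(table_info)
```

With column names in the prompt, the "use only the column names that you can see in the schema description" instruction finally has something to bite on.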