Query builder using LangChain and LangSmith with the gpt-4o-mini LLM
OpenAI's gpt-4o-mini is one of my favourite models because of its low cost and how quickly it gets the job done.
The code below analyzes a question/query and transforms it into a meaningful SQL statement through LangSmith and LangChain. I'm not using Pydantic directly here; I let the LangChain team's structured-output machinery handle the query-builder part.
One simple question - how many employees do we have?
Let's see how the code below behaves and what SQL it is able to generate:
import os

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from typing_extensions import Annotated, TypedDict
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDatabaseTool
from langgraph.graph import START, StateGraph

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "Your key"
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = "Your key"

# LLM model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
# Class declaration for the pipeline state
class State(TypedDict):
    question: str
    query: str
    result: str
    answer: str
system_message = """
Given an input question, create a syntactically correct {dialect} query to
run to help find the answer. Pay attention to limit your query to
at most {top_k} results.
Never query for all the columns from a specific table, only ask for the
few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema
description.
Only use the following tables:
{table_info}
"""
# Part 1 - query builder
user_prompt = "Question: {input}"
query_prompt_template = ChatPromptTemplate(
    [("system", system_message), ("user", user_prompt)]
)

class QueryOutput(TypedDict):
    """Generated SQL query."""

    query: Annotated[str, ..., "Syntactically valid SQL query."]
def write_query(state: State):
    """Generate SQL query to fetch information."""
    prompt = query_prompt_template.invoke(
        {
            # Note: {dialect} expects a dialect name such as "mssql", not a
            # connection string. The ODBC string belongs to the DB connection:
            # "DRIVER={ODBC Driver 17 for SQL Server};SERVER=TUKAI\\MSSQLSERVER01;DATABASE=EasyDate;Trusted_Connection=yes;"
            "dialect": "mssql",
            "top_k": 10,
            "table_info": "dbo.Users",
            "input": state["question"],
        }
    )
    structured_llm = llm.with_structured_output(QueryOutput)
    result = structured_llm.invoke(prompt)
    return {"query": result["query"]}

response_query = write_query({"question": "How many Employees are there?"})
print(response_query)
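It helps to look at what the model actually receives. Since {table_info} is filled with just the table name and no column list, the model has to guess column names. Here's a plain str.format sketch (no LangChain needed) of the rendered system message, using the same values as above:

```python
# Sketch: render the system message by hand to see what the LLM is given.
# The values mirror the ones passed to query_prompt_template.invoke above.
system_message = """
Given an input question, create a syntactically correct {dialect} query to
run to help find the answer. Pay attention to limit your query to
at most {top_k} results.
Never query for all the columns from a specific table, only ask for the
few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema
description.
Only use the following tables:
{table_info}
"""

rendered = system_message.format(dialect="mssql", top_k=10, table_info="dbo.Users")
print(rendered)
```

Notice that "Only use the following tables: dbo.Users" gives the model a table name but zero schema description to obey, which explains the behaviour below.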
To me, this looks like an overconfident call :) . First of all, my Employee table does not have any column named Role. I understand the intelligence applied here, but unfortunately it didn't click.
Okay, let me change my query now and see how it works. This time, I'd like to list all employee information.
response_query = write_query({"question": "Display all employee details"})
print(response_query)
This time the result comes up with the most common column names in the SELECT query; however, those columns don't exist either. So we need to fine-tune the setup and let the model know the right way to select the right columns. We'll go through this in an upcoming post :)
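One direction worth previewing: instead of hard-coding "dbo.Users" as {table_info}, derive the schema text from the live database so the model sees real column names. In LangChain that's what SQLDatabase.from_uri(...) plus db.get_table_info() are for; the sketch below fakes the same idea with an in-memory sqlite3 database (hypothetical column names, chosen just for the demo) so it runs anywhere:

```python
import sqlite3

# Stand-in database with made-up columns for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, HireDate TEXT)"
)

def get_table_info(conn: sqlite3.Connection, table: str) -> str:
    """Return the CREATE TABLE statement for `table` - the kind of schema
    text you'd pass as {table_info} instead of a bare table name."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    return row[0]

table_info = get_table_info(conn, "Users")
print(table_info)
```

With column names in the prompt, the "use only the column names that you can see in the schema description" instruction finally has something to bite on.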