Build hybrid data analytics agents that combine SQL + Meilisearch with LLM function calling. Handle structured queries, semantic search, and visualization in one intelligent workflow.
Let's face it: those traditional BI tools and static dashboards just aren't cutting it anymore. They're like a rigid rulebook when you really need a conversation. You can't ask a nuanced question, and they get totally lost when you try to mix and match different kinds of data.
This is where we're going to build something smarter.
We're creating an intelligent agent that uses LLM function calling to understand what you're asking, just like a person would. But the real power comes from its hybrid approach. Our agent doesn't just rely on one tool; it intelligently orchestrates a combination of two powerful engines (SQL + Meilisearch) to explore complex e-commerce data:
SQL: This is the engine for all your detailed, analytical queries. When you ask a complex question like "What was the average order value for our top-selling products in the last quarter?", the agent knows to fire up a robust SQL query to get a precise, structured answer from your database.
Meilisearch: This is where the magic happens for unstructured and semantic searches. If you're looking for customer feedback and ask, "Show me reviews mentioning a 'durable and stylish backpack'," the agent will use Meilisearch to quickly scan thousands of reviews and pull the most relevant results.
By combining these two powerful tools, our agent overcomes the limitations of older systems. It can seamlessly handle everything from complex analytical queries to unstructured, natural language searches. It’s a flexible and powerful way to explore your data, turning your questions into actionable insights without the headache.
The demand for more intuitive and powerful data analysis tools is growing. Imagine asking your BI tool, "How many new users bought a Samsung phone in the last quarter?" A traditional dashboard might not have a pre-configured widget for this. Answering it typically requires custom SQL, a new dashboard panel, or a ticket to the data team, and every new question restarts that cycle.
This is where analytics agents shine. They provide a conversational interface to your data: you ask in plain language, and the agent decides whether to search, query, aggregate, or visualize on your behalf.
Our agent's architecture is a multi-layered system designed for robustness and flexibility.
A well-defined schema is the foundation of our analytical capabilities. We will use a relational database for core transactional data and a search engine for fast, flexible lookups. You can see how the synthetic dataset is generated below:
import random

def generate_users(num_users=200):
    # Synthetic users; the exact field values here are illustrative assumptions.
    users = [{"user_id": f"U{i}", "name": f"User {i}", "email": f"user{i}@example.com", "location": random.choice(["Bangalore", "Mumbai", "Delhi"]), "registration_date": "2024-01-01"} for i in range(num_users)]
    return users

def generate_products(num_products=100):
    products = [{"product_id": f"P{i}", "name": f"Product {i}", "category": random.choice(["Electronics", "Apparel"]), "brand": random.choice(["Apple", "Samsung", "Nike"]), "price": round(random.uniform(10, 2000), 2)} for i in range(num_products)]
    return products

def generate_transactions(users, products, num_transactions=1000):
    transactions = [{"order_id": f"O{i}", "user_id": random.choice(users)["user_id"], "product_id": random.choice(products)["product_id"], "amount": round(random.uniform(5, 2500), 2), "timestamp": f"2024-{random.randint(1, 12):02d}-01T00:00:00", "status": "completed"} for i in range(num_transactions)]
    return transactions
PostgreSQL will store our primary, structured e-commerce data. This schema supports complex joins and aggregations.
from sqlalchemy import Column, Date, DateTime, ForeignKey, Numeric, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
__tablename__ = 'users'
user_id = Column(String, primary_key=True)
name = Column(String)
email = Column(String, unique=True)
location = Column(String)
registration_date = Column(Date)
class Product(Base):
__tablename__ = 'products'
product_id = Column(String, primary_key=True)
name = Column(String)
category = Column(String)
brand = Column(String)
price = Column(Numeric)
class Transaction(Base):
__tablename__ = 'transactions'
order_id = Column(String, primary_key=True)
user_id = Column(String, ForeignKey('users.user_id'))
product_id = Column(String, ForeignKey('products.product_id'))
amount = Column(Numeric)
timestamp = Column(DateTime)
status = Column(String)
Meilisearch will be used for fast, typo-tolerant searching of product names and user details. We will sync our products and users tables to Meilisearch.
Indexing strategies: each table gets its own index ('products' and 'users'), and every attribute the agent filters on must be declared filterable: category, brand, and price for products; location, registration_date, and email for users. Product and user names remain searchable for free-text queries.
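Here's a minimal sketch of that setup with the official Meilisearch Python SDK; the URL and key are assumptions for a local instance, and the documents come from the generators above:

import meilisearch

# Assumes a local Meilisearch instance; adjust the URL and key for your deployment.
meili_client = meilisearch.Client("http://localhost:7700", "MEILI_MASTER_KEY")

# One index per table; declare the attributes the agent is allowed to filter on.
products_index = meili_client.index("products")
products_index.update_filterable_attributes(["category", "brand", "price"])
products_index.add_documents(generate_products(), primary_key="product_id")

users_index = meili_client.index("users")
users_index.update_filterable_attributes(["location", "registration_date", "email"])
users_index.add_documents(generate_users(), primary_key="user_id")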
The LLM is the core of our system, and we must define its capabilities using a Function Calling API. We will create three core tools: meilisearch_query, execute_sql_query, and generate_chart.
Each tool is defined with a schema that the LLM can understand, allowing it to generate the correct function calls with appropriate parameters. This example uses a JSON schema format compatible with services like OpenAI's API.
tools = [
{
"type": "function",
"function": {
"name": "meilisearch_query",
"description": "Searches for products or users in Meilisearch. Use this for free-text search, fuzzy matching, or combined with filters on indexed attributes like category, brand, price for products, or location, registration_date, email for users. Index names are 'products' and 'users'.",
"parameters": {
"type": "object",
"properties": {
"index_name": {
"type": "string",
"description": "The name of the Meilisearch index to query. Must be 'products' or 'users'.",
"enum": ["products", "users"]
},
"query": {
"type": "string",
"description": "The free-text search query string. Optional.",
"default": ""
},
"filters": {
"type": "string",
"description": "A Meilisearch filter string for structured filtering (e.g., 'category = \"Electronics\" AND price < 500'). Attributes: products (category, brand, price), users (location, registration_date, email). Use `CONTAINS` or `STARTS WITH` for partial string matches (e.g., 'email CONTAINS \".com\"').",
"default": ""
}
},
"required": ["index_name"]
}
}
},
{
"type": "function",
"function": {
"name": "execute_sql_query",
"description": "Executes a SQL query against the PostgreSQL database. Use this for analytical queries, aggregations, joins, or when precise numerical or date-based filtering/grouping is needed across multiple tables (products, users, transactions).",
"parameters": {
"type": "object",
"properties": {
"sql_query": {
"type": "string",
"description": "The full SQL query to execute, including SELECT, FROM, WHERE, GROUP BY, ORDER BY, etc."
}
},
"required": ["sql_query"]
}
}
},
{
"type": "function",
"function": {
"name": "generate_chart",
"description": "Generates a visual chart (e.g., bar chart, line chart) from provided tabular data. Use this when the user explicitly asks for a chart, graph, or visualization. Requires data, chart type, and columns for X and Y axes.",
"parameters": {
"type": "object",
"properties": {
"data": {
"type": "array",
"items": {"type": "object"},
"description": "The tabular data as a list of dictionaries (e.g., the 'data' field from an execute_sql_query output). Each dictionary is a row."
},
"chart_type": {
"type": "string",
"enum": ["bar", "line"],
"description": "The type of chart to generate ('bar' for categorical comparisons, 'line' for trends over time)."
},
"x_column": {
"type": "string",
"description": "The name of the column from the 'data' to use for the X-axis (e.g., 'month', 'category')."
},
"y_column": {
"type": "string",
"description": "The name of the column from the 'data' to use for the Y-axis (e.g., 'total_sales_amount', 'average_price')."
},
"title": {
"type": "string",
"description": "The title of the chart."
},
"x_label": {
"type": "string",
"description": "Optional label for the X-axis."
},
"y_label": {
"type": "string",
"description": "Optional label for the Y-axis."
},
"filename": {
"type": "string",
"description": "Optional filename for the saved chart image (e.g., 'sales_by_month.png'). If not provided, a unique name will be generated."
}
},
"required": ["data", "chart_type", "x_column", "y_column", "title"]
}
}
}
]
We'll use SQLAlchemy, a robust SQL toolkit for Python, to handle database connections; a validation layer on top restricts the tool to read-only SELECT queries.
import logging
import os

from sqlalchemy import create_engine, text

logger = logging.getLogger(__name__)

def execute_sql_query(sql_query: str):
"""
Executes a given SQL query against the PostgreSQL database.
Includes validation to ensure only safe SELECT queries are executed.
Returns the results as a dictionary or an error message.
"""
    # --- SQL Query Validation ---
normalized_query = sql_query.strip().upper() # Normalize for case-insensitive checks
# 1. Check for allowed operations (must start with SELECT)
if not normalized_query.startswith("SELECT"):
error_msg = f"SQL Validation Error: Only SELECT queries are allowed. Detected: '{sql_query[:50]}...'"
logger.warning(error_msg)
return {"success": False, "message": error_msg, "data": None}
# 2. Prevent destructive DDL/DML operations
forbidden_keywords = [
"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "TRUNCATE",
"GRANT", "REVOKE", "RENAME", "ATTACH", "DETACH", "PRAGMA", "VACUUM",
";--", "--", "/*", "*/", # Basic comment/injection prevention
"UNION ALL SELECT", "UNION SELECT", # Common in SQL injection, though not destructive on their own
"OR 1=1", "OR '1'='1'" # Common injection patterns
]
for keyword in forbidden_keywords:
if keyword in normalized_query:
error_msg = f"SQL Validation Error: Forbidden keyword '{keyword}' detected in query. Only analytical SELECT queries are permitted."
logger.warning(error_msg)
return {"success": False, "message": error_msg, "data": None}
# --- End SQL Query Validation ---
db_url = os.getenv("DATABASE_URL")
try:
engine = create_engine(db_url)
logger.info(f"Executing SQL: {sql_query}")
with engine.connect() as connection:
result = connection.execute(text(sql_query))
connection.commit()
if result.returns_rows:
rows = []
for row in result:
row_dict = {}
for key, value in row._mapping.items():
row_dict[key] = value
rows.append(row_dict)
return {"success": True, "message": "SQL query executed successfully.", "data": rows}
else:
return {"success": True, "message": f"SQL command executed. Rows affected: {result.rowcount}", "data": None}
except Exception as e:
logger.error(f"Error executing SQL query: {e}")
return {"success": False, "message": f"Error executing SQL query: {e}", "data": None}
For real-time data, our Meilisearch instance needs to be kept in sync with our PostgreSQL tables. This can be achieved with an ETL job that runs periodically or, for more dynamic updates, using database triggers that push changes to Meilisearch.
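Here's a sketch of the periodic ETL option; the serialization step matters because Date and Numeric columns don't convert to JSON directly:

import os

import meilisearch
from sqlalchemy import create_engine, text

def sync_table_to_meilisearch(table: str, primary_key: str):
    # Pull all rows from Postgres and upsert them into the matching Meilisearch index.
    engine = create_engine(os.getenv("DATABASE_URL"))
    client = meilisearch.Client(os.getenv("MEILI_URL", "http://localhost:7700"),
                                os.getenv("MEILI_MASTER_KEY"))
    with engine.connect() as conn:
        rows = [dict(r._mapping) for r in conn.execute(text(f"SELECT * FROM {table}"))]
    # Dates and Decimals are not JSON-serializable; stringify anything non-primitive.
    for row in rows:
        for key, value in row.items():
            if not isinstance(value, (str, int, float, bool, type(None))):
                row[key] = str(value)
    client.index(table).add_documents(rows, primary_key=primary_key)

# Run periodically, e.g. from cron or a scheduler:
sync_table_to_meilisearch("products", "product_id")
sync_table_to_meilisearch("users", "user_id")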
import json

import meilisearch

# Uses the meili_client initialized in the indexing setup above.
def meilisearch_query(index_name: str, query: str = None, filters: str = None, limit: int = 10, offset: int = 0) -> str:
"""
Performs a search query against a specified Meilisearch index using the Meilisearch Python SDK.
Args:
index_name (str): The UID of the Meilisearch index to search (e.g., "products" or "users").
query (str, optional): The search query string.
filters (str, optional): A Meilisearch filter string.
limit (int, optional): The maximum number of results to return.
offset (int, optional): The number of results to skip.
Returns:
str: JSON string representing the search results or error message.
"""
if index_name not in ["products", "users"]:
error_msg = f"Invalid index_name. Must be 'products' or 'users'. Got: {index_name}"
logger.warning(error_msg)
return json.dumps({"error": error_msg})
try:
index = meili_client.index(index_name)
# All search options (except the main query string) go into this dictionary
search_options = {
"limit": limit,
"offset": offset
}
if filters:
search_options["filter"] = filters
logger.info(f"Performing Meilisearch query on index '{index_name}' with query='{query}', options={search_options}")
        # Pass the query string as the first positional argument,
        # and search_options as the second argument (a plain dict of option params).
results = index.search(query, search_options)
return json.dumps({
"hits": results.get("hits", []),
"estimatedTotalHits": results.get("estimatedTotalHits", 0)
}, indent=2)
except meilisearch.errors.MeilisearchApiError as e:
logger.error(f"Meilisearch API error during query: {e}")
return json.dumps({
"error": "Meilisearch API error",
"code": e.code,
"message": e.message,
"type": e.type,
"link": e.link
})
except Exception as e:
logger.error(f"Unexpected error during Meilisearch query: {e}", exc_info=True)
return json.dumps({
"error": "Unexpected error",
"message": str(e),
"type": type(e).__name__,
"details": f"Occurred while searching index '{index_name}'"
})
The LLM's system prompt is the key to this routing decision. We instruct it to: use meilisearch_query for free-text or fuzzy lookups and to resolve names to IDs; use execute_sql_query for aggregations, joins, and precise numeric or date filtering; chain the two when a question needs both; and call generate_chart only when the user explicitly asks for a visualization.
A common and powerful pattern is to use Meilisearch to get an ID and then pass that ID to a SQL query.
Example Query: "What's the total sales amount for all Apple products?"
Agent's Execution: First, it calls meilisearch_query on the products index with the filter brand = "Apple" to collect the matching product IDs. Then it calls execute_sql_query to sum transactions.amount for those product IDs.
Final Response: The SQL tool returns the total sales amount, which the LLM then formats into a final answer.
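Concretely, the two chained calls boil down to something like this (a sketch; interpolating IDs into the SQL string is illustrative and leans on the read-only validation above):

import json

# Step 1: resolve "Apple" to concrete product IDs via Meilisearch.
hits = json.loads(meilisearch_query("products", filters='brand = "Apple"', limit=100))["hits"]
product_ids = [hit["product_id"] for hit in hits]

# Step 2: precise aggregation in SQL over those IDs.
id_list = ", ".join(f"'{pid}'" for pid in product_ids)
result = execute_sql_query(
    f"SELECT SUM(amount) AS total_sales FROM transactions WHERE product_id IN ({id_list})"
)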
The execute_sql_query tool is the backbone of our analytics capabilities. It enables the agent to handle complex data questions.
The LLM can translate a request like "Show me sales trends for MacBooks over the last 12 months" into a complex SQL query that groups by month.
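For instance, the tool call might carry SQL like the following (a sketch; matching MacBooks via ILIKE on the product name is an assumption, and the LLM's actual SQL will vary):

execute_sql_query("""
    SELECT DATE_TRUNC('month', t.timestamp) AS month,
           SUM(t.amount) AS total_sales_amount
    FROM transactions t
    JOIN products p ON t.product_id = p.product_id
    WHERE p.name ILIKE '%MacBook%'
      AND t.timestamp >= NOW() - INTERVAL '12 months'
    GROUP BY month
    ORDER BY month;
""")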
The LLM can likewise generate SQL for grouping and comparing segments. For example, "Compare average order value between Bangalore and Mumbai" translates to a join between transactions and users, grouped by location.
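A query of roughly this shape (again a sketch; the generated SQL may differ):

execute_sql_query("""
    SELECT u.location, AVG(t.amount) AS average_order_value
    FROM transactions t
    JOIN users u ON t.user_id = u.user_id
    WHERE u.location IN ('Bangalore', 'Mumbai')
    GROUP BY u.location;
""")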
For simple results, the LLM can provide a natural-language summary. For larger result sets, the agent uses the third tool, generate_chart, to produce a visualization from the tabular SQL data. We'll use matplotlib to render the charts.
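Here's a minimal generate_chart implementation consistent with the tool schema above; treat the details (Agg backend, PNG output, return shape) as a sketch rather than the definitive version:

import uuid

import matplotlib
matplotlib.use("Agg")  # render headlessly; no display required
import matplotlib.pyplot as plt

def generate_chart(data, chart_type, x_column, y_column, title,
                   x_label=None, y_label=None, filename=None):
    # Turn tabular rows (a list of dicts) into a bar or line chart saved as a PNG.
    x = [row[x_column] for row in data]
    y = [row[y_column] for row in data]
    fig, ax = plt.subplots(figsize=(8, 5))
    if chart_type == "bar":
        ax.bar(x, y)
    else:
        ax.plot(x, y, marker="o")
    ax.set_title(title)
    ax.set_xlabel(x_label or x_column)
    ax.set_ylabel(y_label or y_column)
    fig.autofmt_xdate(rotation=45)  # tilt long X labels so they stay readable
    path = filename or f"chart_{uuid.uuid4().hex[:8]}.png"
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return {"success": True, "message": f"Chart saved to {path}", "file": path}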
Before executing any SQL, the agent performs basic validation, as shown in execute_sql_query above. The LLM is also instructed never to use DROP, DELETE, or UPDATE in its queries; together, these checks make the tool effectively read-only.
The system prompt is key to the agent's performance. It should be meticulously engineered with examples and clear instructions on when to use each tool, what to expect as output, and how to chain them together.
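A compact, illustrative system prompt (the exact wording here is an assumption; tune it against your own failure cases):

SYSTEM_PROMPT = """You are a data analytics agent for an e-commerce store.
You have three tools:
- meilisearch_query: free-text/fuzzy search over the 'products' and 'users' indexes.
- execute_sql_query: read-only SELECT queries over products, users, and transactions.
- generate_chart: bar or line charts built from tabular SQL output.
Rules:
1. Resolve fuzzy names (brands, products, people) to IDs with meilisearch_query first.
2. Use execute_sql_query for aggregations, joins, and date math. SELECT only.
3. Chain tools when needed: search for IDs, then feed them into SQL.
4. Call generate_chart only when the user explicitly asks for a chart or graph.
"""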
To ensure safety and reliability, implement a layered guardrail system:
LLM-level: Provide clear, strict instructions in the system prompt to prevent dangerous queries.
Tool-level: Validate every query before execution, as execute_sql_query does above: allow only SELECT statements and reject destructive keywords.
Database-level: Connect with a read-only database user, so a query that slips past the other layers still cannot modify data.
With everything wired together, the agent processes a user query as a loop: the LLM either answers directly or requests tool calls, the tool results are appended to the conversation, and the cycle repeats until a final answer emerges.
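A sketch of that loop using the OpenAI Python SDK (the model name is an assumption; any function-calling LLM API follows the same pattern):

import json

from openai import OpenAI

llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

AVAILABLE_FUNCTIONS = {
    "meilisearch_query": meilisearch_query,
    "execute_sql_query": execute_sql_query,
    "generate_chart": generate_chart,
}

def run_agent(user_query: str, model: str = "gpt-4o"):
    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_query}]
    while True:
        response = llm.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no more tool calls: this is the final answer
        messages.append(msg)
        for call in msg.tool_calls:
            # Dispatch each requested tool call and feed the result back to the LLM.
            fn = AVAILABLE_FUNCTIONS[call.function.name]
            result = fn(**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result if isinstance(result, str) else json.dumps(result, default=str)})

# Example:
# print(run_agent("What's the total sales amount for all Apple products?"))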
A common setup for this architecture would be PostgreSQL and Meilisearch running as Docker containers or managed services, with the agent itself deployed as a lightweight Python service that calls the LLM provider's API.
To see the full code, click here.
Building a hybrid analytics agent is a powerful step towards creating more intuitive and intelligent data tools. The combination of an LLM's reasoning with specialized backends provides a robust and scalable solution for modern data challenges.
To learn more, speak to us.