This blog walks through using Reasoning AI with DuckDB and DeepSeek to build explainable loan approval systems, showing step-by-step logic, JSON outputs, and audit-ready decisions.
In finance, a decision isn’t useful unless you can show why it was made. “Reasoning” AI models help you do exactly that. Instead of giving a one-line yes/no, they walk through the factors—income, requested loan amount, credit history, number of dependents—and explain how each one affected the result.
In this guide, we set up a simple decision policy (think: a rulebook) inside the model’s system prompt. The model then follows those rules and thresholds every time, so you get a clear outcome plus a short audit trail you can share with risk, compliance, or regulators.
DuckDB is a small, fast analytics database that runs inside your app—no separate server to manage. Think of it as “SQLite for analytics.” It’s great for slicing and dicing data, running big joins, and computing metrics for dashboards or models.
Bottom line: DuckDB gives your team warehouse-style analytics speed without warehouse-level complexity or cost—perfect for prototypes, notebooks, and embedded analytics.
A reasoning model is an AI that “shows its work.” It lays out the steps it took to reach a conclusion, so you can trust the result and review it later. That makes it well-suited to finance, where decisions must be explainable and consistent.
For this project we use DeepSeek-R1-Distill-LLaMA-70B:
You need Python 3.10+ and two libraries:
Install (in a notebook use the !, in a terminal omit it):
! pip install duckdb groq
We’re using a public loan prediction dataset (~614 rows). Columns include applicant income, co-applicant income, loan amount/term, credit history, demographics, and the final label (approved or not). It’s a simple, realistic sandbox to practice explainable lending decisions.
Why DuckDB here? It lets you query a CSV as if it were a table—fast and zero setup. The snippet below previews five rows so you can eyeball types and missing values.
import duckdb
duckdb.read_csv("train.csv")
con = duckdb.connect()
df = con.sql("SELECT * FROM 'train.csv' LIMIT 5").df()
print(df)
This prompt is the policy the model must follow. It spells out the factors and thresholds, so the model “shows its work” instead of giving a black-box yes/no.
Key ideas
Credit history carries the most weight (poor credit → usually decline).
Always return a structured JSON with verdict, score, reasons, and recommendations.
SYSTEM = """You are a senior loan eligibility officer at Dream Housing Finance.
Your task: decide if a customer is eligible for a home loan based ONLY on the provided case JSON.
Consider the following factors logically:
- Applicant's income (higher income → higher eligibility)
- Co-applicant income (combined income matters)
- Loan amount (must be reasonable compared to income)
- Loan term (longer terms lower monthly burden)
- Credit history (1 = good, 0 = poor; poor credit usually leads to decline)
- Education, marital status, and dependents can influence stability, but DO NOT use them unfairly--only as supportive signals.
Policy guidelines:
- If credit history = 0, high risk → usually DECLINE unless strong income/low loan.
- If loan amount / total income ratio is very high (> 5), mark DECLINE.
- If ratio is moderate (2-5), mark REVIEW.
- Otherwise, APPROVE.
- Return JSON with: verdict (APPROVE/REVIEW/DECLINE), risk_score_0_100, reasons, and recommendations.
Be clear and logical in your reasoning, showing step-by-step analysis of the applicant's profile.
"""
Reality check:
The demo uses a simple loan-amount ÷ income proxy for affordability. In common loan datasets, ApplicantIncome is monthly and LoanAmount is often in thousands, so raw division mismatches units and can understate the ratio. For realistic outcomes, either use your operational EMI/Income (FOIR/DTI) policy band (e.g., 30–45%), or normalize both values to the same time unit and scale before applying thresholds.
query = "SELECT * FROM 'train.csv' LIMIT 8"
sample = con.execute(query).df().to_dict(orient="records")
from groq import Groq
client = Groq(api_key="YOUR_GROQ_API_KEY")
completion = client.chat.completions.create(
model="deepseek-r1-distill-llama-70b",
messages=[
{
"role": "system",
"content": SYSTEM
},
{
"role": "user",
"content": f"Here is some loan application data:\n{sample}\n\nWho is most at risk of default and why?"
}
],
temperature=0.6,
max_completion_tokens=4096,
top_p=0.95,
stream=True
)
for chunk in completion:
print(chunk.choices[0].delta.content or "", end="")
Tip: or executive demos, show only the final structured output (verdict, score, reasons, recommendations) rather than intermediate thought streams.
Present an audit-friendly result—either a concise narrative summary, a compact JSON, or both. Avoid publishing internal chain-of-thought.
The model flags LP001014 as the most at-risk applicant. While their combined income (5540) and loan amount (158) produce a low loan-to-income proxy (~0.0285), the credit history of 0 is the decisive factor. In lending, a poor credit track record is a strong predictor of default and outweighs positive signals like a manageable loan size or adequate income. Supportive context—such as 3+ dependents, which can add financial strain—reinforces the risk. Together, these lead to a verdict of DECLINE with a high risk score of 90/100. This step-by-step explanation shows how the model weighs multiple inputs but ultimately prioritizes credit history, providing a transparent, audit-friendly trail of the decision.
<think>
Okay, so I need to figure out who is most at risk of default based on the provided loan applications. Let's start by understanding the key factors that determine loan eligibility and risk. The main factors mentioned are income (both applicant and co-applicant), loan amount, loan term, credit history, education, marital status, and dependents.
First, I'll go through each application one by one and analyze these factors.
Starting with LP001002:
- Loan_ID: LP001002
- Gender: Male
- Married: False
- Dependents: 0
- Education: Graduate
- Self_Employed: False
- ApplicantIncome: 5849
- CoapplicantIncome: 0.0
- LoanAmount: None
- Loan_Amount_Term: 360
- Credit_History: 1
- Property_Area: Urban
- Loan_Status: Y
Since LoanAmount is None, it's unclear, but the income is 5849, which is decent. Credit history is good (1). No co-applicant income. Since loan amount isn't provided, it's hard to assess the ratio, but with good income and credit, risk is lower.
Next, LP001003:
- Loan_ID: LP001003
- Gender: Male
- Married: True
- Dependents: 1
- Education: Graduate
- Self_Employed: False
- ApplicantIncome: 4583
- CoapplicantIncome: 1508.0
- LoanAmount: 128
- Loan_Amount_Term: 360
- Credit_History: 1
- Property_Area: Rural
- Loan_Status: N
Combined income is 4583 + 1508 = 6091. Loan amount is 128. The ratio is 128 / 6091 ≈ 0.021, which is very low. Credit history is good. So, low risk. But loan status is 'N', so maybe other factors, but based on given data, seems okay.
LP001005:
- Loan_ID: LP001005
- Gender: Male
- Married: True
- Dependents: 0
- Education: Graduate
- Self_Employed: True
- ApplicantIncome: 3000
- CoapplicantIncome: 0.0
- LoanAmount: 66
- Loan_Amount_Term: 360
- Credit_History: 1
- Property_Area: Urban
- Loan_Status: Y
Self-employed, which might be a risk, but income is 3000, loan amount 66. Ratio is 66/3000 ≈ 0.022. Good credit. So, low risk.
LP001006:
- Loan_ID: LP001006
- Gender: Male
- Married: True
- Dependents: 0
- Education: Not Graduate
- Self_Employed: False
- ApplicantIncome: 2583
- CoapplicantIncome: 2358.0
- LoanAmount: 120
- Loan_Amount_Term: 360
- Credit_History: 1
- Property_Area: Urban
- Loan_Status: Y
Combined income: 2583 + 2358 = 4941. Loan amount 120. Ratio ≈ 0.024. Good credit. Low risk.
LP001008:
- Loan_ID: LP001008
- Gender: Male
- Married: False
- Dependents: 0
- Education: Graduate
- Self_Employed: False
- ApplicantIncome: 6000
- CoapplicantIncome: 0.0
- LoanAmount: 141
- Loan_Amount_Term: 360
- Credit_History: 1
- Property_Area: Urban
- Loan_Status: Y
Income is 6000, loan 141. Ratio ≈ 0.0235. Good credit. Low risk.
LP001011:
- Loan_ID: LP001011
- Gender: Male
- Married: True
- Dependents: 2
- Education: Graduate
- Self_Employed: True
- ApplicantIncome: 5417
- CoapplicantIncome: 4196.0
- LoanAmount: 267
- Loan_Amount_Term: 360
- Credit_History: 1
- Property_Area: Urban
- Loan_Status: Y
Combined income: 5417 + 4196 = 9613. Loan amount 267. Ratio ≈ 0.0278. Good credit. Self-employed, but ratio is okay. Moderate risk.
LP001013:
- Loan_ID: LP001013
- Gender: Male
- Married: True
- Dependents: 0
- Education: Not Graduate
- Self_Employed: False
- ApplicantIncome: 2333
- CoapplicantIncome: 1516.0
- LoanAmount: 95
- Loan_Amount_Term: 360
- Credit_History: 1
- Property_Area: Urban
- Loan_Status: Y
Combined income: 2333 + 1516 = 3849. Loan 95. Ratio ≈ 0.0246. Good credit. Low risk.
LP001014:
- Loan_ID: LP001014
- Gender: Male
- Married: True
- Dependents: 3+
- Education: Graduate
- Self_Employed: False
- ApplicantIncome: 3036
- CoapplicantIncome: 2504.0
- LoanAmount: 158
- Loan_Amount_Term: 360
- Credit_History: 0
- Property_Area: Semiurban
- Loan_Status: N
Combined income: 3036 + 2504 = 5540. Loan amount 158. Ratio ≈ 0.0284. But credit history is 0, which is poor. This is a red flag. Even though the ratio is low, poor credit history usually leads to decline unless strong income and low loan, which is the case here, but credit history is a major factor.
So, LP001014 has the highest risk because of poor credit history despite a manageable loan amount. The credit history being 0 indicates a high risk of default. Even though the income is sufficient and the loan amount isn't too high, the bad credit history is a significant negative factor. Therefore, this applicant is the most at risk of defaulting on the loan.
</think>
The applicant most at risk of default is LP001014. Here's the breakdown:
**Loan_ID:** LP001014
**Reasons for High Risk:**
1. **Poor Credit History:** The credit history score is 0, indicating a high risk of default.
2. **Combined Income and Loan Amount:** Although the combined income (5540) and loan amount (158) ratio is manageable (≈0.0284), the poor credit history outweighs these positive factors.
3. **Dependents:** Having 3+ dependents may add financial strain, potentially affecting loan repayment capabilities.
**Conclusion:** Despite a manageable loan-to-income ratio, the poor credit history significantly increases the risk of default, making LP001014 the highest risk applicant.
**Risk Assessment:**
- **Verdict:** DECLINE
- **Risk Score:** 90/100 (High Risk)
- **Reasons:** Poor credit history (0), despite manageable loan amount and sufficient income.
- **Recommendations:** Improve credit score, consider a co-applicant with a better credit history, or apply for a smaller loan amount.
The complete source code for this project is available on GitHub.
Reasoning models replace black-box lending with explainable decisions. Instead of a simple approve/decline, the model evaluates income, loan size, term, credit history, and supportive signals against clear rules, then returns a verdict with reasons and recommendations. The outcome: decisions that are consistent and auditable for risk and compliance, easier for customers to understand, and fairer across the portfolio.
Want to explore this for your portfolio? Let’s talk.