Analyst Agent#
Tech Stack Used#
| Tech | Purpose |
|---|---|
| LangChain | LLM calls in `llm_inspector.py` |
| Ollama / OpenAI | LLM provider via `_call_llm()` |
| Apollo API | Company enrichment (employee count) + contact finding |
| Hunter API | Contact/email finding for decision-makers |
| SQLAlchemy ORM | All DB reads/writes |
| JSON seed file (`industry_benchmarks.json`) | Benchmark data for spend estimation — loaded once, cached in memory |
| Python | HTTP calls to Apollo + Hunter APIs |
File-by-File Breakdown#
1. agents/analyst/analyst_agent.py — Coordinator#
Entry point: run(company_ids, db_session, run_id, on_progress) at line 75
Loops over each company ID, calls process_one_company(), tracks progress via on_progress callback (used by UI for live updates), writes to agent_run_logs after each company.
Full pipeline per company — process_one_company() at line 176:
1. gather_company_data() → enrichment loop (crawl → Apollo → LLM → re-enrich)
2. spend_calculator → utility + telecom spend estimates
3. savings_calculator → low/mid/high savings range
4. score_engine.compute_score() → 0–100 composite score
5. score_engine.assign_tier() → high / medium / low
6. llm_inspector.generate_score_narrative() → 1-sentence human explanation
7. save_features() → writes CompanyFeature row
8. save_score() → writes LeadScore row
9. company.status = "scored" → updates Company row
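The coordinator loop can be sketched as follows. `run` and `process_one_company` follow the names used in this doc, but the bodies are minimal stand-ins, not the real pipeline (the DB session, run ID, and log writes are omitted):

```python
# Minimal sketch of the Analyst coordinator loop: process companies one by
# one and report progress after each, as the UI's live updates rely on.
from typing import Callable, Optional

def process_one_company(company_id: int) -> dict:
    # Stand-in for the full pipeline: enrich -> spend -> savings -> score.
    return {"company_id": company_id, "status": "scored"}

def run(company_ids: list,
        on_progress: Optional[Callable[[int, int, dict], None]] = None) -> list:
    """Process each company in turn, invoking on_progress after every one."""
    results = []
    for done, company_id in enumerate(company_ids, start=1):
        result = process_one_company(company_id)
        results.append(result)
        if on_progress:  # used by the UI for live updates
            on_progress(done, len(company_ids), result)
    return results
```

Usage: `run([1, 2, 3], on_progress=lambda done, total, r: print(f"{done}/{total}"))`.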
2. agents/analyst/analyst_agent.py — gather_company_data() at line 279#
Agentic concept: Adaptive Re-enrichment Loop
This is the intelligent data-gathering phase. It doesn’t just crawl once — it inspects what’s missing and decides whether to try again:
Step 1: website_crawler.crawl_company_site() ← only if site_count or employee_count missing
Step 2: enrichment_client.enrich_company_data() ← Apollo API fallback if employee_count still 0
Step 3: llm_inspector.inspect_company() ← LLM decides: "score_now" OR "enrich_before_scoring"
Step 4: Re-enrichment loop (max 2 attempts) ← only if LLM says action="enrich_before_scoring"
LLM is skipped entirely if industry is known AND employee_count > 0 AND site_count > 0 — no tokens wasted when data is already complete.
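A minimal sketch of that decision flow, with `crawl_site`, `apollo_enrich`, and `llm_inspect` injected as stand-ins for the real `website_crawler`, `enrichment_client`, and `llm_inspector` calls:

```python
# Hedged sketch of the adaptive re-enrichment loop: crawl only when data is
# missing, fall back to Apollo, skip the LLM when data is complete, and
# retry enrichment (max 2x) only when the LLM asks for it.
MAX_REENRICH_ATTEMPTS = 2

def gather_company_data(company, crawl_site, apollo_enrich, llm_inspect):
    # Step 1: crawl only if site_count or employee_count is missing
    if not company.get("site_count") or not company.get("employee_count"):
        company.update(crawl_site(company))
    # Step 2: Apollo fallback if employee_count is still 0
    if not company.get("employee_count"):
        company.update(apollo_enrich(company))
    # Skip the LLM entirely when data is already complete -- no tokens wasted
    if (company.get("industry") not in (None, "unknown")
            and company.get("employee_count", 0) > 0
            and company.get("site_count", 0) > 0):
        return company
    # Steps 3-4: let the LLM judge the data, re-enriching while it asks to
    for _ in range(MAX_REENRICH_ATTEMPTS):
        verdict = llm_inspect(company)
        if verdict.get("action") != "enrich_before_scoring":
            break
        company.update(apollo_enrich(company))
    return company
```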
3. agents/analyst/llm_inspector.py — Two LLM Jobs#
Agentic concept: LLM as Data Quality Judge + Narrative Generator
Job 1 — inspect_company() at line 81:
Sends company name, website, industry, employee count, site count, and a crawled-text excerpt (600 chars) to the LLM via a LangChain `HumanMessage`. The LLM returns structured JSON:
```json
{
  "inferred_industry": "healthcare",
  "data_gaps": ["employee_count"],
  "action": "enrich_before_scoring",
  "confidence": "high"
}
```
- If `inferred_industry` is returned and the DB value was `"unknown"`, it overwrites it
- `action = "enrich_before_scoring"` triggers the re-enrichment loop in `gather_company_data()`
- Falls back to `{"action": "score_now", "confidence": "low"}` on any LLM failure
Job 2 — generate_score_narrative() at line 183:
Sends score, tier, savings estimate, industry, employee count, sites, state, deregulated flag to LLM
LLM writes a single sentence (max 25 words) explaining why this company scored the way it did
Example output: “5-site healthcare group in NY’s deregulated market with $420k in recoverable utility savings.”
Falls back to `_fallback_narrative()` (a rule-based template) if the LLM fails
Both use LangChain HumanMessage → Ollama or OpenAI via _call_llm() at line 38.
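The contract of Job 1 can be sketched like this. `call_llm` stands in for `_call_llm()` (LangChain to Ollama/OpenAI), and the prompt wording is illustrative, not the real template:

```python
# Sketch of the inspect_company() contract: prompt the LLM, parse structured
# JSON, overwrite an unknown industry, and fall back safely on any failure.
import json

FALLBACK = {"action": "score_now", "confidence": "low"}

def inspect_company(company: dict, call_llm) -> dict:
    prompt = (
        f"Company: {company.get('name')}\n"
        f"Industry: {company.get('industry')}\n"
        f"Employees: {company.get('employee_count')}\n"
        f"Sites: {company.get('site_count')}\n"
        f"Excerpt: {company.get('crawled_text', '')[:600]}\n"
        "Reply with JSON keys: inferred_industry, data_gaps, action, confidence."
    )
    try:
        verdict = json.loads(call_llm(prompt))
        # LLM-inferred industry overwrites an "unknown" DB value
        if verdict.get("inferred_industry") and company.get("industry") == "unknown":
            company["industry"] = verdict["inferred_industry"]
        return verdict
    except Exception:
        # Never let an LLM failure stall the pipeline
        return dict(FALLBACK)
```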
4. agents/analyst/enrichment_client.py — Apollo + Hunter#
Two jobs:
enrich_company_data(domain) at line 64 — Apollo organization enrichment:
- `POST https://api.apollo.io/api/v1/organizations/enrich` with `{"domain": "example.com"}`
- Returns `employee_count`, `city`, `state`
- Silently returns `{}` if `APOLLO_API_KEY` is missing or the domain is unknown
find_contacts(company_name, domain, db) — Hunter + Apollo people search:
Targets decision-maker titles: CFO, VP Finance, Director of Facilities, Energy Manager, etc.
Title priority ranking: CFO=1 → VP Finance=2 → Director of Facilities=3 → VP Operations=4
Module-level flags `_hunter_blocked` / `_apollo_blocked` skip providers for the rest of the run if rate-limited
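The provider-blocking pattern might look roughly like this. `http_post` is an injected stand-in for the real HTTP call, and the `X-Api-Key` header is an assumption about Apollo's auth scheme:

```python
# Sketch of the Apollo enrichment call with a module-level kill switch:
# one rate-limited response blocks the provider for the rest of the run.
import os

_apollo_blocked = False

def enrich_company_data(domain: str, http_post) -> dict:
    global _apollo_blocked
    api_key = os.environ.get("APOLLO_API_KEY")
    if _apollo_blocked or not api_key:
        return {}  # silently skip: no key, or provider rate-limited earlier
    status, body = http_post(
        "https://api.apollo.io/api/v1/organizations/enrich",
        json={"domain": domain},
        headers={"X-Api-Key": api_key},  # assumed header name
    )
    if status == 429:  # rate limited: block Apollo for the rest of the run
        _apollo_blocked = True
        return {}
    org = body.get("organization", {}) if status == 200 else {}
    return {k: org[k] for k in ("employee_count", "city", "state") if k in org}
```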
5. agents/analyst/spend_calculator.py — Spend Estimation#
No LLM. Pure math from benchmark data.
```
utility_spend = site_count × avg_sqft_per_site × kwh_per_sqft_per_year × electricity_rate
telecom_spend = employee_count × telecom_per_employee
total_spend   = utility_spend + telecom_spend
```
Key functions:
| Function | Line | Purpose |
|---|---|---|
| `calculate_utility_spend()` | 13 | Site count × sqft × kWh × rate |
| `calculate_telecom_spend()` | 25 | Employee count × telecom benchmark |
| | 32 | Sums both |
All benchmark values come from benchmarks_loader.get_benchmark(industry, state).
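The formulas above as runnable code; the benchmark numbers here are illustrative placeholders, not real entries from `industry_benchmarks.json`:

```python
# The two spend formulas, pure math over a benchmark record.

def calculate_utility_spend(site_count: int, benchmark: dict) -> float:
    return (site_count
            * benchmark["avg_sqft_per_site"]
            * benchmark["kwh_per_sqft_per_year"]
            * benchmark["electricity_rate"])

def calculate_telecom_spend(employee_count: int, benchmark: dict) -> float:
    return employee_count * benchmark["telecom_per_employee"]

benchmark = {  # illustrative numbers only
    "avg_sqft_per_site": 20_000,
    "kwh_per_sqft_per_year": 15,
    "electricity_rate": 0.12,
    "telecom_per_employee": 600,
}
utility = calculate_utility_spend(5, benchmark)    # 5 × 20,000 × 15 × 0.12
telecom = calculate_telecom_spend(200, benchmark)  # 200 × 600
total = utility + telecom
```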
6. agents/analyst/benchmarks_loader.py — Benchmark Data#
- Loads `database/seed_data/industry_benchmarks.json` once at startup, caches in `_BENCHMARKS_CACHE`
- `get_benchmark(industry, state)` at line 34 — returns `avg_sqft_per_site`, `kwh_per_sqft_per_year`, `telecom_per_employee`, `electricity_rate`
- `get_electricity_rate(state)` at line 65 — state-level $/kWh rates; defaults to `0.12` if state unknown
- `refresh_benchmarks()` at line 74 — clears cache to force reload (used in tests)
7. agents/analyst/savings_calculator.py — Savings Range#
Three-bracket estimate from total spend:
| Bracket | Rate | Formula |
|---|---|---|
| Low | 10% | `total_spend × 0.10` |
| Mid | 13.5% | `total_spend × 0.135` |
| High | 17% | `total_spend × 0.17` |
calculate_tb_revenue(savings_mid) — multiplies savings_mid × TB_CONTINGENCY_FEE (default 24%) to get Troy & Banks expected revenue.
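The three brackets and the contingency-fee step as a runnable sketch, assuming the rates stated above and the 24% default fee:

```python
# Savings brackets (10% / 13.5% / 17% of total spend) plus the
# Troy & Banks revenue step at the default 24% contingency fee.
TB_CONTINGENCY_FEE = 0.24

def calculate_all_savings(total_spend: float) -> dict:
    return {
        "savings_low": total_spend * 0.10,
        "savings_mid": total_spend * 0.135,
        "savings_high": total_spend * 0.17,
    }

def calculate_tb_revenue(savings_mid: float) -> float:
    return savings_mid * TB_CONTINGENCY_FEE
```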
8. agents/analyst/score_engine.py — Composite Scoring#
compute_score() at line 38 — weighted 0–100 formula:
| Component | Max Points | Driver |
|---|---|---|
| Recovery (savings_mid) | 40 pts | ≥$2M=100 pts, ≥$1M=85, ≥$500k=70, ≥$250k=55, below=40 |
| Industry fit | 25 pts | healthcare/hospitality/manufacturing/retail=90, public_sector/office=70, unknown=45 |
| Multi-site | 20 pts | ≥20 sites=20, ≥10=17, ≥5=13, ≥2=8, 1 site=3 |
| Data quality | 15 pts | 0–10 score → mapped to 1/4/8/12/15 pts |
Weights are configurable via settings.SCORE_WEIGHT_RECOVERY, SCORE_WEIGHT_INDUSTRY, etc.
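One plausible reading of the table as code: each component yields a 0–100 sub-score that is then scaled by the 40/25/20/15 weights. The multi-site rescaling and the raw 0–100 data-quality input are assumptions; the real mapping of the 0–10 quality signal to points is not shown here:

```python
# Hedged sketch of the weighted composite score (0-100 overall).
WEIGHTS = {"recovery": 0.40, "industry": 0.25, "multi_site": 0.20, "data_quality": 0.15}

def recovery_score(savings_mid: float) -> int:
    for floor, pts in ((2_000_000, 100), (1_000_000, 85), (500_000, 70), (250_000, 55)):
        if savings_mid >= floor:
            return pts
    return 40

def industry_score(industry: str) -> int:
    if industry in ("healthcare", "hospitality", "manufacturing", "retail"):
        return 90
    if industry in ("public_sector", "office"):
        return 70
    return 45

def multi_site_score(site_count: int) -> int:
    # The table gives this tier directly in points (max 20); rescale to 0-100
    for floor, pts in ((20, 20), (10, 17), (5, 13), (2, 8)):
        if site_count >= floor:
            return pts * 5
    return 3 * 5

def compute_score(savings_mid, industry, site_count, data_quality_0_100) -> int:
    return round(
        WEIGHTS["recovery"] * recovery_score(savings_mid)
        + WEIGHTS["industry"] * industry_score(industry)
        + WEIGHTS["multi_site"] * multi_site_score(site_count)
        + WEIGHTS["data_quality"] * data_quality_0_100
    )
```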
assign_tier() at line 61:
| Score | Tier |
|---|---|
| ≥ | high |
| ≥ | medium |
| Below | low |
assess_data_quality() at line 106 — 0–10 quality signal:
| Signal | Points |
|---|---|
| Has website | +2 |
| Has locations page | +2 |
| site_count > 0 | +2 |
| employee_count > 0 | +2 |
| Contact found in DB | +2 |
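The table maps directly to code: five binary signals worth +2 each. The field names on the `company` dict are assumptions:

```python
# 0-10 data-quality signal: +2 per satisfied signal.

def assess_data_quality(company: dict, has_contact: bool) -> int:
    score = 0
    if company.get("website"):
        score += 2
    if company.get("has_locations_page"):
        score += 2
    if company.get("site_count", 0) > 0:
        score += 2
    if company.get("employee_count", 0) > 0:
        score += 2
    if has_contact:  # contact found in DB
        score += 2
    return score
```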
Key Agentic Concepts Used#
| Concept | Tool / Tech | Where |
|---|---|---|
| Adaptive Re-enrichment Loop | crawler + Apollo + LLM inspector | `gather_company_data()` |
| LLM as Data Quality Judge | LangChain + Ollama/OpenAI | `inspect_company()` |
| LLM Narrative Generation | LangChain + Ollama/OpenAI | `generate_score_narrative()` |
| Benchmark-driven Spend Estimation | JSON seed file + `benchmarks_loader` | `spend_calculator.py` |
| Contact Targeting by Title Priority | Apollo + Hunter APIs | `find_contacts()` |
| Progress Streaming | `on_progress` callback | `run()` |
What Gets Written to DB#
| Table | Written by | Contents |
|---|---|---|
| `CompanyFeature` | `save_features()` | site_count, utility_spend, telecom_spend, savings low/mid/high, industry_fit_score, deregulated_state, data_quality_score |
| `LeadScore` | `save_score()` | score (0–100), tier, score_reason (LLM narrative) |
| `Company` | `process_one_company()` | status → `"scored"` |
| `Contact` | `find_contacts()` | decision-maker emails from Hunter/Apollo |
| `agent_run_logs` | `run()` | per-company action log for UI |
Full Data Flow#
```
run(company_ids)
 └─ for each company_id:
      process_one_company()
        │
        ├─ gather_company_data()
        │    ├─ website_crawler.crawl_company_site()    ← reuses Scout's crawler
        │    ├─ enrichment_client.enrich_company_data() ← Apollo API (employee_count)
        │    ├─ llm_inspector.inspect_company()         ← LangChain → Ollama/OpenAI
        │    └─ re-enrichment loop (max 2x) if LLM says "enrich_before_scoring"
        │
        ├─ spend_calculator.calculate_utility_spend()   ← benchmark JSON × site_count
        ├─ spend_calculator.calculate_telecom_spend()   ← benchmark JSON × employee_count
        ├─ savings_calculator.calculate_all_savings()   ← 10% / 13.5% / 17% of total_spend
        │
        ├─ score_engine.compute_score()                 ← weighted 4-component formula
        ├─ score_engine.assign_tier()                   ← high / medium / low
        │
        ├─ llm_inspector.generate_score_narrative()     ← LangChain → 1-sentence explanation
        │
        ├─ save_features()          → CompanyFeature DB row
        ├─ save_score()             → LeadScore DB row
        └─ company.status = "scored"
```