Agentic Design#
How and why this platform uses LLM-based reasoning instead of fixed automation rules.
What “Agentic” Means Here#
A traditional automation system executes fixed steps in a fixed order:
Automation: User → fixed code → fixed query → fixed formula → result
An agentic system uses an LLM as a reasoning and decision layer on top of deterministic tools (APIs, DB queries, math):
Agentic: User → LLM reasons about intent
→ decides which tools to call and in what order
→ executes tools (APIs, DB, math — deterministic)
→ evaluates result quality
→ loops if result is not good enough
→ returns result
Key principle: LLM never does math. LLM never calls APIs directly. LLM classifies, infers, decides, evaluates, and generates text. Everything else is deterministic code.
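A minimal sketch of this split, under stated assumptions: `fake_llm_decide` is a hypothetical stand-in for the real LLM call, and `estimate_savings` is an illustrative tool. Neither is taken from the codebase. The LLM only returns a structured decision; plain Python does the arithmetic.

```python
from typing import Callable

def estimate_savings(annual_spend: float, savings_rate: float) -> float:
    """Deterministic math: this is never delegated to the LLM."""
    return round(annual_spend * savings_rate, 2)

# Registry of deterministic tools the LLM is allowed to choose from.
TOOLS: dict[str, Callable[..., float]] = {"estimate_savings": estimate_savings}

def fake_llm_decide(message: str) -> dict:
    """Hypothetical stand-in for an LLM call that returns a tool choice."""
    return {
        "tool": "estimate_savings",
        "args": {"annual_spend": 1_200_000.0, "savings_rate": 0.15},
    }

def run(message: str) -> float:
    decision = fake_llm_decide(message)   # LLM decides which tool and arguments
    tool = TOOLS[decision["tool"]]        # code looks the tool up
    return tool(**decision["args"])       # code executes it
```

Because the tool registry is a plain dictionary, an unknown tool name raises `KeyError` instead of letting the LLM run arbitrary code.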
Agentic Concepts Implemented#
| Concept | Plain English | Where Used |
|---|---|---|
| Tool Use | LLM picks and calls tools based on intent | Chat Agent |
| ReAct Loop | Reason → Act → Observe → repeat until done | Chat Agent |
| Confidence-Gated Routing | LLM rates its own certainty; low confidence triggers clarification | Chat Agent |
| Context Carry-Forward | Multi-turn memory across chat messages | Chat Agent |
| Dynamic Query Planning | LLM generates search variants, not one fixed string | Scout |
| Waterfall with Graceful Degradation | Try provider A → B → C → fallback, never crash | Enrichment |
| Context-Aware Generation | LLM reads company signals, reasons before writing | Writer |
| Self-Critique / Reflection | Agent evaluates its own output on a rubric | Writer + Critic |
| Iterative Refinement | Rewrite loop: max 2 attempts with targeted feedback | Writer |
| Uncertainty Flagging | Agent signals when it can't reach confidence threshold | Writer |
| Human-in-the-Loop (HITL) | Human approval gates before irreversible actions | Leads + Emails |
| Observable Execution | Live counters, run tracking, completion notifications | All agents |
| Learning from Feedback | Historical reply rates bias future angle selection | Writer + Tracker |
Full Pipeline — Where Agentic Behaviors Fire#
Scout
│ [Tool Use] — Chat LLM picks search_companies tool from user intent
│ [Dynamic Query] — LLM builds 3–5 query variants + quality retry loop
▼
companies table
Analyst
│ [LLM Scoring] — LLM narrates score_reason from company signals
│ [Data Inference] — Apollo fallback when employee_count missing
▼
lead_scores + company_features
┌─── [HITL Gate 1] ─────────────────────────────────────────┐
│ Human reviews leads on /leads page │
│ Approve → company.status = "approved" │
│ Reject → company excluded from Writer │
└───────────────────────────────────────────────────────────┘
Contact Enrichment
│ [Waterfall] — Hunter → Apollo → Scraper → Serper →
│ Snov → Prospeo → ZeroBounce → Permutation
│ [Skip Flags] — _hunter_blocked / _apollo_blocked skip after 429/403
│ [Quality Gates] — placeholder filter, domain integrity check
▼
contacts table
Writer
│ [Learning] — get_best_angle() reads email_win_rate
│ [Context-Aware Generation] — LLM reasons about angle before writing
│ [Self-Critique] — Critic LLM scores draft 0–10
│ [Iterative Refinement] — rewrite loop (max 2)
│ [Uncertainty Flagging] — low_confidence=True if never reaches 7/10
│ [Observable Execution] — AgentRun.drafts_created counter
▼
email_drafts table
┌─── [HITL Gate 2] ─────────────────────────────────────────┐
│ Human reviews drafts on /emails page │
│ Approve → email sent immediately via SendGrid │
│ Reject → draft deleted, company reset to "approved" │
│ Edit+Approve → human edits, then sends │
│ Regenerate → new Writer+Critic cycle │
└───────────────────────────────────────────────────────────┘
Outreach
│ Send approved first emails + 3-touch follow-up sequence (Day 3/7/14)
│ [Guardrails] — unsubscribe block, daily send cap, no double-send
▼
outreach_events table
Tracker
│ [Reply Classification] — LLM intent + rule-based fallback
│ [Alert] — hot-lead notification to sales team on positive reply
│ [Learning Write] — win rate updated on reply → feeds back to Writer
▼
email_win_rate (feeds back to Writer on next run)
Concept Deep Dives#
Tool Use + ReAct Loop (Chat Agent)#
Concept: LangChain ReAct (Reason + Act) — the LLM reasons about user intent, picks a tool, observes the result, and decides whether to call another tool or respond.
Tech: LangChain create_agent, @tool decorator, SystemMessage / HumanMessage
File: agents/chat_agent.py
User: "find 10 healthcare companies in Buffalo"
↓
LLM reasons: "user wants to discover companies → call search_companies"
↓
Calls: search_companies(industry="healthcare", location="Buffalo NY", count=10)
↓
Observes: 10 companies saved to DB
↓
LLM responds: "Found 10 healthcare companies in Buffalo"
Three-tier routing decides how much LLM power to use:
Tier 1 — Conversational: “thanks” → direct reply, no tools
Tier 2 — Intent parser: simple data query → Python extracts filters, calls tool directly
Tier 3 — Agent loop: complex/multi-step → full LangChain ReAct
Why: Tier 1 and 2 avoid unnecessary LLM calls — faster, cheaper, more reliable.
Confidence-Gated Routing: _extract_intent() makes one LLM call per message and returns a confidence value (0.0–1.0). If confidence < 0.65, the agent asks the user to clarify instead of guessing. This prevents wrong tool calls from ambiguous messages.
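The routing above can be sketched as a single function. This is an illustration, not the code in agents/chat_agent.py: the `Intent` dataclass, the `kind` labels, and the returned route names are assumptions, and so is the choice to apply the confidence gate before the tier check. Only the 0.65 threshold comes from the text.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.65  # below this, ask the user to clarify

@dataclass
class Intent:
    kind: str          # assumed labels: "conversational", "simple_query", "complex"
    confidence: float  # 0.0 to 1.0, as returned by the intent-extraction call

def route(intent: Intent) -> str:
    """Pick the cheapest handler that can serve the message."""
    if intent.confidence < CONFIDENCE_THRESHOLD:
        return "clarify"              # ambiguous message: do not guess a tool
    if intent.kind == "conversational":
        return "tier1_direct_reply"   # no tools needed
    if intent.kind == "simple_query":
        return "tier2_intent_parser"  # Python extracts filters, calls tool directly
    return "tier3_agent_loop"         # full ReAct loop for multi-step requests
```

Keeping the gate as a plain comparison makes the threshold easy to tune and to test without calling an LLM.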
Waterfall with Graceful Degradation (Contact Enrichment)#
Concept: A sequence of providers where each failure silently falls through to the next. A crash in any single provider does not stop the pipeline.
Tech: Python try/except per step, module-level skip flags, ordered fallback chain
File: agents/analyst/enrichment_client.py
contacts = []
try:
    contacts = find_via_hunter(domain)
except Exception:
    pass  # fall through to Apollo
if not contacts:  # only try Apollo when Hunter found nothing
    try:
        contacts = find_via_apollo(domain)
    except Exception:
        pass  # fall through to scraper
# ... 8 steps total
Skip flags (_hunter_blocked, _apollo_blocked): once a provider returns 429/403, all remaining companies in the run skip that provider — no wasted API calls.
The 8-step waterfall:
| Step | Source | Notes |
|---|---|---|
| 1 | Hunter | Domain email search, 50 searches/month |
| 2 | Apollo | People search by company domain |
| 3 | Website scraper | |
| 4 | Serper | |
| 5 | Snov | Company domain bulk search |
| 6 | Prospeo | LinkedIn-backed two-step lookup |
| 7 | ZeroBounce | Email verification + contact |
| 8 | Permutation | |
Why: Hunter 429 and Apollo 403 used to crash the entire enrichment run. Now they fail silently — every company gets a best-effort result from whatever provider is available.
Context-Aware Generation (Writer)#
Concept: Instead of filling template placeholders, the LLM reads company context and decides the angle, tone, and content before writing a single word.
File: agents/writer/writer_agent.py — _WRITER_PROMPT, _write_draft()
The prompt contains:
Company name, industry, city, state, site count
Estimated annual savings (low / mid / high from company_features)
Whether the state is deregulated
score_reason: the Analyst's narrative (e.g. "3-site healthcare org in deregulated NY, ~$180k annual savings potential")
Contact first name and title
Win rate hint: which angle has the highest reply rate in this industry
The LLM outputs four labelled fields in order, REASONING: → ANGLE: → SUBJECT: → BODY:, for example:
REASONING: Healthcare company with 3 sites in deregulated NY — electricity cost
is the primary lever. VP of Operations will care about budget
predictability more than raw savings. Lead with audit offer.
ANGLE: audit_offer
SUBJECT: Free energy audit for Regional General Hospital — 3 sites, $180k opportunity
BODY: Hi Sarah, ...
Why: Template-filling produces the same email for every company. Context-aware generation produces emails that read like someone actually researched the company.
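Labelled output like the REASONING / ANGLE / SUBJECT / BODY example above has to be split back into fields before it can be stored. The parser below is a sketch under assumptions: `parse_writer_output` is a hypothetical helper, not a function from writer_agent.py, and it assumes each label starts a line.

```python
import re

_FIELDS = ("REASONING", "ANGLE", "SUBJECT", "BODY")
# Match a field label at the start of a line, followed by a colon.
_LABEL = re.compile(r"^(REASONING|ANGLE|SUBJECT|BODY):\s*", re.MULTILINE)

def parse_writer_output(text: str) -> dict[str, str]:
    """Split labelled LLM output into a dict keyed by lowercase field name."""
    # re.split with one capturing group yields:
    # [preamble, label1, content1, label2, content2, ...]
    parts = _LABEL.split(text)
    fields: dict[str, str] = {}
    for label, content in zip(parts[1::2], parts[2::2]):
        fields.setdefault(label.lower(), content.strip())
    missing = [f for f in _FIELDS if f.lower() not in fields]
    if missing:
        raise ValueError(f"LLM output is missing fields: {missing}")
    return fields

sample = """REASONING: Healthcare company with 3 sites in deregulated NY.
Lead with audit offer.
ANGLE: audit_offer
SUBJECT: Free energy audit for Regional General Hospital
BODY: Hi Sarah, ..."""

parsed = parse_writer_output(sample)
```

Raising on a missing field lets the caller retry the LLM call rather than save a partial draft.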
Self-Critique + Iterative Refinement (Writer + Critic)#
Concept: A second LLM call evaluates the first LLM’s output on a rubric. If quality is below threshold, the writer rewrites with the specific feedback. Max 2 loops.
Files: agents/writer/critic_agent.py, agents/writer/writer_agent.py
Write draft (LLM call 1)
↓
Critic evaluates (LLM call 2):
score=6, feedback="No savings figure. Subject too generic."
↓
Score < 7 → rewrite with feedback (LLM call 3)
↓
Critic re-evaluates (LLM call 4):
score=8, passed=True → save draft
Critic rubric (5 criteria × 0–2 points = 10 max):
| Criterion | Pass condition |
|---|---|
| Personalised | Mentions company name or a specific detail |
| Specific number | Contains a dollar figure, not just "significant savings" |
| Clear CTA | One specific ask: call, audit, or reply |
| Sounds human | Reads like a person wrote it |
| Subject line | Specific to this company, not generic |
Threshold: ≥7 to pass. <7 → rewrite. Still <7 after 2 rewrites → low_confidence=True.
Why: Without evaluation, whatever the LLM produces on the first try gets sent. The Critic catches generic or vague drafts before a human wastes time reviewing them. In practice, ~60–70% of drafts pass on first try.
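The write, critique, rewrite cycle above can be sketched as a small loop. This is an illustration under assumptions, not the code in critic_agent.py or writer_agent.py: `write_with_critic`, `rubric_score`, and the `write` / `critique` callables are hypothetical. The threshold of 7 and the limit of 2 rewrites come from the text.

```python
from typing import Callable, Optional

PASS_THRESHOLD = 7  # score needed to pass
MAX_REWRITES = 2    # rewrite attempts allowed after the first draft

def rubric_score(points: dict[str, int]) -> int:
    """Sum five criteria, each scored 0-2, into a 0-10 total."""
    if len(points) != 5 or any(p not in (0, 1, 2) for p in points.values()):
        raise ValueError("expected 5 criteria, each scored 0, 1, or 2")
    return sum(points.values())

def write_with_critic(
    write: Callable[[Optional[str]], str],
    critique: Callable[[str], dict],
) -> dict:
    """Draft once, then rewrite with critic feedback until pass or limit."""
    draft = write(None)        # first draft, no feedback yet
    result = critique(draft)   # expected keys: "score" and "feedback"
    rewrites = 0
    while result["score"] < PASS_THRESHOLD and rewrites < MAX_REWRITES:
        draft = write(result["feedback"])  # targeted rewrite
        result = critique(draft)
        rewrites += 1
    passed = result["score"] >= PASS_THRESHOLD
    return {
        "draft": draft,
        "score": result["score"],
        "rewrites": rewrites,
        "low_confidence": not passed,  # flag drafts that never passed
    }
```

In the worst case this makes three writer calls and three critic calls, which bounds the cost per draft.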
Uncertainty Flagging (Writer)#
Concept: When the agent cannot reach its own quality threshold after all retries, it explicitly signals uncertainty rather than silently passing a low-quality result.
Tech: low_confidence boolean on EmailDraft, warning banner in UI
File: agents/writer/writer_agent.py
low_confidence = not critic_result["passed"] # True if never passed after 2 rewrites
The Email Review page shows a red warning banner on low_confidence=True drafts so the human knows to review carefully.
Why: Silently saving a 5/10 draft misleads the reviewer. Flagging it says: “the AI struggled here — pay extra attention or regenerate.”
Human-in-the-Loop (HITL) Gates#
Concept: The pipeline pauses at two checkpoints for human review before irreversible actions.
Tech: approved_human field on LeadScore and EmailDraft, HumanApprovalRequest table, SendGrid notification emails
Gate 1 — Lead Approval (after Analyst scores):
Human reviews leads on /leads page: score bar, tier badge, savings estimate
Approve → company.status='approved', lead_scores.approved_human=True
Reject → company.status='archived'
Notification email sent to ALERT_EMAIL with scored lead list + link
Gate 2 — Draft Approval (after Writer generates):
Human reviews drafts on /emails page: Approve & Send | Edit + Approve | Reject | Regenerate
Approve triggers immediate send via SendGrid
Reject → draft deleted, company resets to approved (re-enters Writer queue)
Notification email sent to ALERT_EMAIL with draft list + Critic scores + link
Why: LLM output is probabilistic — it can produce confident but wrong results. A misaddressed email or incorrect facts could damage a sales relationship. Human gates ensure no external action happens without sign-off.
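One way to enforce such a gate in code is to make the send path refuse anything a human has not approved. The sketch below is hypothetical: this `EmailDraft` dataclass is a simplified stand-in for the real model, and `ApprovalRequired` and `send_if_approved` are illustrative names. Only the `approved_human` field comes from the text.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EmailDraft:
    to_address: str
    body: str
    approved_human: bool = False  # set only by a human reviewer

class ApprovalRequired(Exception):
    """Raised when code tries to send a draft no human has approved."""

def send_if_approved(draft: EmailDraft, send: Callable[[EmailDraft], None]) -> None:
    """Call the sender only for human-approved drafts."""
    if not draft.approved_human:
        raise ApprovalRequired(f"draft to {draft.to_address} is not approved")
    send(draft)  # the irreversible action happens only past the gate
```

Putting the check next to the send call means no agent can bypass the gate by skipping an earlier pipeline stage.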
Learning from Feedback (Writer + Tracker)#
Concept: The system tracks which email angles generate replies and biases future generation toward what has worked.
Tech: email_win_rate table, get_best_angle() in Writer, Tracker updates on reply events
Files: agents/writer/writer_agent.py — get_best_angle(), agents/tracker/tracker_agent.py — process_event()
Writer picks angle "audit_offer" for Healthcare company
↓
email_drafts.template_used = "audit_offer"
↓
Prospect replies positively
↓
Tracker: email_win_rate (industry="Healthcare", template_id="audit_offer")
→ replies_received += 1
→ reply_rate recalculated
↓
Next Writer run for Healthcare:
get_best_angle("Healthcare") returns "audit_offer"
→ win rate hint injected into prompt
→ LLM prefers this angle unless company signals suggest otherwise
Cold start protection: At least 5 emails must have been sent before win rate data is trusted (_WIN_RATE_MIN_SENT = 5). Below this, the hint is omitted and the LLM picks freely.
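A sketch of how angle selection with cold start protection might look. The real `get_best_angle()` reads the email_win_rate table; here the rows are passed in as plain dictionaries. The `emails_sent` column name, the per-row application of the minimum, and the `angle_b` angle are assumptions made for illustration.

```python
from typing import Optional

_WIN_RATE_MIN_SENT = 5  # minimum sends before a win rate is trusted

def get_best_angle(rows: list[dict], industry: str) -> Optional[str]:
    """Return the angle with the highest reply rate, or None on cold start."""
    eligible = [
        r for r in rows
        if r["industry"] == industry and r["emails_sent"] >= _WIN_RATE_MIN_SENT
    ]
    if not eligible:
        return None  # not enough data: omit the hint, let the LLM pick freely
    best = max(eligible, key=lambda r: r["replies_received"] / r["emails_sent"])
    return best["template_id"]

# Illustrative rows; "angle_b" is a made-up angle name.
rows = [
    {"industry": "Healthcare", "template_id": "audit_offer", "emails_sent": 10, "replies_received": 3},
    {"industry": "Healthcare", "template_id": "angle_b", "emails_sent": 4, "replies_received": 4},
    {"industry": "Retail", "template_id": "angle_b", "emails_sent": 2, "replies_received": 1},
]
```

Here `angle_b` has a perfect reply rate in Healthcare but only 4 sends, so it is ignored until it crosses the minimum.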
The 5 trackable angles:
| Angle | Lead with |
|---|---|
| | Dollar savings estimate up front |
| audit_offer | Free no-commitment energy audit |
| | Utility cost volatility / budget risk |
| | Multi-location efficiency opportunity |
| | Open energy market / supplier switch |
Observable Execution (All Agents)#
Concept: Every agent run is tracked in the database with live counters so humans can see progress and diagnose failures.
Tech: agent_runs + agent_run_logs tables, /pipeline/run/{run_id} endpoint
What gets tracked per run:
status: writer_running → writer_awaiting_approval (or failed)
current_stage: updated at each step
companies_found, companies_scored, drafts_created: live counters
error_message: populated on failure
agent_run_logs: one row per tool call with duration_ms and output_summary
Why: Without run tracking, the frontend can only say “running…” or “done”. With agent_runs, the Triggers page shows “3/8 drafts created” in real time and failed runs can be diagnosed.
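Run tracking of this kind can be sketched in memory. The field names below mirror those listed above, but this `AgentRun` dataclass, `tracked_call`, and `progress_label` are hypothetical stand-ins for the SQLAlchemy-backed tables, and the `logs` list plays the role of agent_run_logs.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class AgentRun:
    status: str = "writer_running"
    current_stage: str = ""
    drafts_created: int = 0
    error_message: Optional[str] = None
    logs: list[dict] = field(default_factory=list)  # stand-in for agent_run_logs

def tracked_call(run: AgentRun, stage: str, fn: Callable[[], Any]) -> Any:
    """Run one step, recording its stage, duration, and a short output summary."""
    run.current_stage = stage
    start = time.perf_counter()
    try:
        output = fn()
    except Exception as exc:
        run.status = "failed"         # surface the failure on the run record
        run.error_message = str(exc)
        raise
    duration_ms = int((time.perf_counter() - start) * 1000)
    run.logs.append(
        {"stage": stage, "duration_ms": duration_ms, "output_summary": str(output)[:80]}
    )
    return output

def progress_label(run: AgentRun, total: int) -> str:
    """Format the live counter shown to the user."""
    return f"{run.drafts_created}/{total} drafts created"
```

A frontend polling the run record could then render `progress_label(run, 8)` while the Writer is still working.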
What Is NOT Agentic (Intentionally)#
Some parts deliberately stay rule-based and deterministic:
| Component | Why it stays rule-based |
|---|---|
| Score formula | |
| DB queries | SQL is deterministic and auditable; LLM-generated SQL is unpredictable |
| Email sending | No reasoning needed: if approved, send |
| Score thresholds | Business rules (≥70 = high), not a judgment call |
| Phone lookup | Waterfall of structured API responses; no reasoning needed |
Principle: Use LLM where reasoning, inference, or language generation adds value. Use deterministic code where correctness and auditability matter more.
Technology Stack for Agentic Features#
| Feature | Library / Tool |
|---|---|
| Chat ReAct loop | LangChain |
| Writer + Critic LLM calls | |
| LLM (default) | Ollama + |
| LLM (optional) | OpenAI |
| Run tracking | SQLAlchemy + |
| HITL notifications | SendGrid ( |
| Learning memory | PostgreSQL |
| Waterfall state | Module-level flags ( |
Agentic Maturity by Agent#
| Agent | Agentic Capabilities |
|---|---|
| Chat | Full ReAct loop, tool use, confidence-gated routing, multi-turn context carry-forward |
| Writer | Context-aware generation, Critic self-critique loop, win-rate learning, uncertainty flagging |
| Analyst | LLM industry inference, data gap detection, re-enrichment loop (max 2), score narration |
| Scout | LLM query planning (3–5 variants), multi-source search, LLM deduplication, source quality learning |
| Tracker | LLM + rule-based reply classification, sales alerts, daily stuck-lead health checks |
| Outreach | 3-touch follow-up sequence, daily cap guardrail, unsubscribe guard, LLM follow-up polish |
| Orchestrator | Pipeline sequencing, task dispatch + retry, HITL gate enforcement, health monitoring |