Agentic Design

How and why this platform uses LLM-based reasoning instead of fixed automation rules.


What “Agentic” Means Here

A traditional automation system executes fixed steps in a fixed order:

Automation:   User → fixed code → fixed query → fixed formula → result

An agentic system uses an LLM as a reasoning and decision layer on top of deterministic tools (APIs, DB queries, math):

Agentic:   User → LLM reasons about intent
                → decides which tools to call and in what order
                → executes tools (APIs, DB, math — deterministic)
                → evaluates result quality
                → loops if result is not good enough
                → returns result

Key principle: the LLM never does math and never calls APIs directly. It classifies, infers, decides, evaluates, and generates text. When it selects a tool, deterministic code performs the call. Everything else is deterministic code.
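The split between the decision layer and the deterministic tools can be sketched as follows. This is a minimal illustration, not the platform's code: `decide_next_step` is a hard-coded stand-in for the LLM call, and the tool names, the `TOOLS` registry, and the sample numbers are all hypothetical.

```python
from typing import Callable

# Deterministic tools: plain Python, no LLM involved (hypothetical examples).
def estimate_savings(annual_spend: float, rate: float) -> float:
    return round(annual_spend * rate, 2)

def count_sites(sites: list[str]) -> int:
    return len(sites)

TOOLS: dict[str, Callable] = {
    "estimate_savings": estimate_savings,
    "count_sites": count_sites,
}

def decide_next_step(goal: str, observations: list) -> dict:
    """Stand-in for the LLM: names the next tool to call, or a final answer.

    A real system would prompt a model here; this stub is hard-coded.
    """
    if not observations:
        return {"tool": "estimate_savings",
                "args": {"annual_spend": 1_200_000, "rate": 0.15}}
    return {"final": f"Estimated savings: ${observations[-1]:,.0f}"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list = []
    for _ in range(max_steps):
        step = decide_next_step(goal, observations)
        if "final" in step:
            return step["final"]
        # The decision layer only names the tool; ordinary code executes it.
        result = TOOLS[step["tool"]](**step["args"])
        observations.append(result)
    return "Stopped: step limit reached"
```

The arithmetic happens inside `estimate_savings`, never in the decision function, which mirrors the principle stated above. The `max_steps` cap is an assumed safeguard against an endless loop.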


Agentic Concepts Implemented

| Concept | Plain English | Where Used |
| --- | --- | --- |
| Tool Use | LLM picks and calls tools based on intent | Chat Agent |
| ReAct Loop | Reason → Act → Observe → repeat until done | Chat Agent |
| Confidence-Gated Routing | LLM rates its own certainty; low confidence triggers clarification | Chat Agent |
| Context Carry-Forward | Multi-turn memory across chat messages | Chat Agent |
| Dynamic Query Planning | LLM generates search variants, not one fixed string | Scout |
| Waterfall with Graceful Degradation | Try provider A → B → C → fallback, never crash | Enrichment |
| Context-Aware Generation | LLM reads company signals, reasons before writing | Writer |
| Self-Critique / Reflection | Agent evaluates its own output on a rubric | Writer + Critic |
| Iterative Refinement | Rewrite loop: max 2 attempts with targeted feedback | Writer |
| Uncertainty Flagging | Agent signals when it can’t reach confidence threshold | Writer |
| Human-in-the-Loop (HITL) | Human approval gates before irreversible actions | Leads + Emails |
| Observable Execution | Live counters, run tracking, completion notifications | All agents |
| Learning from Feedback | Historical reply rates bias future angle selection | Writer + Tracker |


Full Pipeline: Where Agentic Behaviors Fire

Scout
  │  [Tool Use] — Chat LLM picks search_companies tool from user intent
  │  [Dynamic Query] — LLM builds 3–5 query variants + quality retry loop
  ▼
companies table

Analyst
  │  [LLM Scoring] — LLM narrates score_reason from company signals
  │  [Data Inference] — Apollo fallback when employee_count missing
  ▼
lead_scores  +  company_features

    ┌─── [HITL Gate 1] ─────────────────────────────────────────┐
    │  Human reviews leads on /leads page                       │
    │  Approve → company.status = "approved"                    │
    │  Reject → company excluded from Writer                    │
    └───────────────────────────────────────────────────────────┘

Contact Enrichment
  │  [Waterfall] — Hunter → Apollo → Scraper → Serper →
  │               Snov → Prospeo → ZeroBounce → Permutation
  │  [Skip Flags] — _hunter_blocked / _apollo_blocked skip after 429/403
  │  [Quality Gates] — placeholder filter, domain integrity check
  ▼
contacts table

Writer
  │  [Learning] — get_best_angle() reads email_win_rate
  │  [Context-Aware Generation] — LLM reasons about angle before writing
  │  [Self-Critique] — Critic LLM scores draft 0–10
  │  [Iterative Refinement] — rewrite loop (max 2)
  │  [Uncertainty Flagging] — low_confidence=True if never reaches 7/10
  │  [Observable Execution] — AgentRun.drafts_created counter
  ▼
email_drafts table

    ┌─── [HITL Gate 2] ─────────────────────────────────────────┐
    │  Human reviews drafts on /emails page                     │
    │  Approve → email sent immediately via SendGrid            │
    │  Reject → draft deleted, company reset to "approved"      │
    │  Edit+Approve → human edits, then sends                   │
    │  Regenerate → new Writer+Critic cycle                     │
    └───────────────────────────────────────────────────────────┘

Outreach
  │  Send approved first emails + 3-touch follow-up sequence (Day 3/7/14)
  │  [Guardrails] — unsubscribe block, daily send cap, no double-send
  ▼
outreach_events table

Tracker
  │  [Reply Classification] — LLM intent + rule-based fallback
  │  [Alert] — hot-lead notification to sales team on positive reply
  │  [Learning Write] — win rate updated on reply → feeds back to Writer
  ▼
email_win_rate  (feeds back to Writer on next run)

Concept Deep Dives

Tool Use + ReAct Loop (Chat Agent)

Concept: LangChain ReAct (Reason + Act) — the LLM reasons about user intent, picks a tool, observes the result, and decides whether to call another tool or respond.

Tech: LangChain create_agent, @tool decorator, SystemMessage / HumanMessage

File: agents/chat_agent.py

User: "find 10 healthcare companies in Buffalo"
  ↓
LLM reasons: "user wants to discover companies → call search_companies"
  ↓
Calls: search_companies(industry="healthcare", location="Buffalo NY", count=10)
  ↓
Observes: 10 companies saved to DB
  ↓
LLM responds: "Found 10 healthcare companies in Buffalo"

Three-tier routing decides how much LLM power to use:

  • Tier 1 — Conversational: “thanks” → direct reply, no tools

  • Tier 2 — Intent parser: simple data query → Python extracts filters, calls tool directly

  • Tier 3 — Agent loop: complex/multi-step → full LangChain ReAct

Why: Tier 1 and 2 avoid unnecessary LLM calls — faster, cheaper, more reliable.

Confidence-Gated Routing: _extract_intent() makes one LLM call per message and returns a confidence value (0.0–1.0). If confidence < 0.65, the agent asks the user to clarify instead of guessing. This prevents wrong tool calls from ambiguous messages.
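The routing described above can be sketched roughly as below. The 0.65 threshold comes from the text; everything else is an assumption for illustration: the `Intent` dataclass, the `kind` labels, the returned route names, and the choice to check confidence before the tier. The real `_extract_intent()` may be structured differently.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.65  # value stated in the text above

@dataclass
class Intent:
    kind: str          # hypothetical labels: "conversational", "simple_query", "complex"
    confidence: float  # 0.0 to 1.0, as returned by the intent extractor
    filters: dict = field(default_factory=dict)

def route(intent: Intent) -> str:
    """Pick the cheapest path that can handle the message."""
    if intent.confidence < CONFIDENCE_THRESHOLD:
        return "clarify"            # ask the user instead of guessing
    if intent.kind == "conversational":
        return "direct_reply"       # Tier 1: no tools
    if intent.kind == "simple_query":
        return "direct_tool_call"   # Tier 2: Python calls the tool directly
    return "agent_loop"             # Tier 3: full ReAct loop
```

For example, `route(Intent("complex", 0.4))` returns `"clarify"`, while the same intent at confidence 0.9 goes to the agent loop.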


Waterfall with Graceful Degradation (Contact Enrichment)

Concept: A sequence of providers where each failure silently falls through to the next. A crash in any single provider does not stop the pipeline.

Tech: Python try/except per step, module-level skip flags, ordered fallback chain

File: agents/analyst/enrichment_client.py

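contacts = []  # stays empty if every provider fails

try:
    contacts = find_via_hunter(domain)
except Exception:
    pass  # fall through to Apollo

if not contacts:  # only try Apollo when Hunter found nothing
    try:
        contacts = find_via_apollo(domain)
    except Exception:
        pass  # fall through to scraper
# ... 8 steps total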

Skip flags (_hunter_blocked, _apollo_blocked): once a provider returns 429/403, all remaining companies in the run skip that provider — no wasted API calls.

The 8-step waterfall:

| Step | Source | Notes |
| --- | --- | --- |
| 1 | Hunter.io | Domain email search (50 searches/month) |
| 2 | Apollo.io | People search by company domain |
| 3 | Website scraper | mailto: + regex on /contact, /about |
| 4 | Serper | "CFO site:{domain}" Google result |
| 5 | Snov.io | Company domain bulk search |
| 6 | Prospeo | LinkedIn-backed two-step lookup |
| 7 | ZeroBounce | Email verification + contact |
| 8 | Permutation | firstname.lastname@domain patterns |

Why: Hunter 429 and Apollo 403 used to crash the entire enrichment run. Now they fail silently — every company gets a best-effort result from whatever provider is available.
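The final Permutation step can be pictured as a small pattern generator. Only the firstname.lastname pattern is named in the table above; the other patterns, the function name, and its signature are assumptions added for illustration.

```python
def email_permutations(first: str, last: str, domain: str) -> list[str]:
    """Return candidate addresses for a contact, most likely pattern first."""
    first_n = first.strip().lower()
    last_n = last.strip().lower()
    dom = domain.strip().lower()
    if not (first_n and last_n and dom):
        return []  # not enough information to guess anything
    local_parts = [
        f"{first_n}.{last_n}",     # firstname.lastname (named in the table)
        f"{first_n}{last_n}",      # assumed extra pattern
        f"{first_n[0]}{last_n}",   # assumed extra pattern
        f"{first_n[0]}.{last_n}",  # assumed extra pattern
        first_n,                   # assumed extra pattern
    ]
    # dict.fromkeys removes duplicates while keeping the order.
    return [f"{lp}@{dom}" for lp in dict.fromkeys(local_parts)]
```

Guessed addresses are unverified, so a step like this belongs last in the chain, after every provider that returns real data.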


Context-Aware Generation (Writer)

Concept: Instead of filling template placeholders, the LLM reads company context and decides the angle, tone, and content before writing a single word.

File: agents/writer/writer_agent.py (_WRITER_PROMPT, _write_draft())

The prompt contains:

  • Company name, industry, city, state, site count

  • Estimated annual savings (low / mid / high from company_features)

  • Whether the state is deregulated

  • score_reason — the Analyst’s narrative (e.g. “3-site healthcare org in deregulated NY, ~$180k annual savings potential”)

  • Contact first name and title

  • Win rate hint: which angle has the highest reply rate in this industry

The LLM outputs four labelled fields, in this order: REASONING, ANGLE, SUBJECT, BODY. For example:

REASONING: Healthcare company with 3 sites in deregulated NY — electricity cost
           is the primary lever. VP of Operations will care about budget
           predictability more than raw savings. Lead with audit offer.
ANGLE: audit_offer
SUBJECT: Free energy audit for Regional General Hospital — 3 sites, $180k opportunity
BODY: Hi Sarah, ...
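Output in this labelled format has to be split back into fields before it can be stored. The parser below is a minimal sketch of one way to do that; `parse_writer_output` is a hypothetical name, and the actual parsing in writer_agent.py is not shown in this document.

```python
import re

FIELDS = ("REASONING", "ANGLE", "SUBJECT", "BODY")
_LABEL = re.compile(r"^(REASONING|ANGLE|SUBJECT|BODY):[ \t]*", re.MULTILINE)

def parse_writer_output(text: str) -> dict[str, str]:
    """Split labelled LLM output into a dict keyed by lower-case field name."""
    matches = list(_LABEL.finditer(text))
    parsed: dict[str, str] = {}
    for i, match in enumerate(matches):
        key = match.group(1).lower()
        if key == "body":
            # Everything after BODY: belongs to the email, even if a later
            # line happens to start with one of the labels.
            parsed[key] = text[match.end():].strip()
            break
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Short fields may wrap across lines; collapse them to one line.
        parsed[key] = " ".join(text[match.end():end].split())
    missing = [name for name in FIELDS if name.lower() not in parsed]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return parsed
```

Raising on a missing field lets the caller retry the LLM call rather than save a half-parsed draft, which is one reasonable policy among several.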

Why: Template-filling produces the same email for every company. Context-aware generation produces emails that read like someone actually researched the company.


Self-Critique + Iterative Refinement (Writer + Critic)

Concept: A second LLM call evaluates the first LLM’s output against a rubric. If the score is below the threshold, the Writer rewrites the draft using the Critic’s specific feedback, for at most 2 rewrites.

Files: agents/writer/critic_agent.py, agents/writer/writer_agent.py

Write draft (LLM call 1)
  ↓
Critic evaluates (LLM call 2):
  score=6, feedback="No savings figure. Subject too generic."
  ↓
Score < 7 → rewrite with feedback (LLM call 3)
  ↓
Critic re-evaluates (LLM call 4):
  score=8, passed=True → save draft

Critic rubric (5 criteria × 0–2 points = 10 max):

| Criterion | Pass condition |
| --- | --- |
| Personalised | Mentions company name or a specific detail |
| Specific number | Contains a dollar figure, not just “significant savings” |
| Clear CTA | One specific ask: call, audit, or reply |
| Sounds human | Reads like a person wrote it |
| Subject line | Specific to this company, not generic |

Threshold: ≥7 to pass. <7 → rewrite. Still <7 after 2 rewrites → low_confidence=True.
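The threshold and rewrite limit above translate into a short control loop. In this sketch the `write` and `critique` callables stand in for the two LLM calls; their signatures, the function name, and the shape of the returned dict are assumptions, while the threshold of 7 and the limit of 2 rewrites come from the text.

```python
PASS_THRESHOLD = 7   # score needed to pass, out of 10
MAX_REWRITES = 2     # rewrites allowed after the first draft

def write_with_critic(write, critique, context):
    """Draft, score, and rewrite until the draft passes or rewrites run out.

    write(context, feedback) -> str
    critique(draft) -> {"score": int, "feedback": str}
    """
    draft = write(context, None)          # first attempt, no feedback yet
    result = critique(draft)
    rewrites = 0
    while result["score"] < PASS_THRESHOLD and rewrites < MAX_REWRITES:
        draft = write(context, result["feedback"])  # targeted rewrite
        result = critique(draft)
        rewrites += 1
    passed = result["score"] >= PASS_THRESHOLD
    return {
        "draft": draft,
        "score": result["score"],
        "rewrites": rewrites,
        "low_confidence": not passed,     # flag instead of silently saving
    }
```

With scores of 6 then 8, the loop stops after one rewrite and the draft passes. With scores of 5, 5, 6 it stops after two rewrites and returns `low_confidence=True`. In the worst case this costs six LLM calls per draft (three writes and three critiques).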

Why: Without evaluation, whatever the LLM produces on the first try gets sent. The Critic catches generic or vague drafts before a human wastes time reviewing them. In practice, ~60–70% of drafts pass on first try.


Uncertainty Flagging (Writer)

Concept: When the agent cannot reach its own quality threshold after all retries, it explicitly signals uncertainty rather than silently passing a low-quality result.

Tech: low_confidence boolean on EmailDraft, warning banner in UI

File: agents/writer/writer_agent.py

low_confidence = not critic_result["passed"]  # True if never passed after 2 rewrites

The Email Review page shows a red warning banner on low_confidence=True drafts so the human knows to review carefully.

Why: Silently saving a 5/10 draft misleads the reviewer. Flagging it says: “the AI struggled here — pay extra attention or regenerate.”


Human-in-the-Loop (HITL) Gates

Concept: The pipeline pauses at two checkpoints for human review before irreversible actions.

Tech: approved_human field on LeadScore and EmailDraft, HumanApprovalRequest table, SendGrid notification emails

Gate 1 — Lead Approval (after Analyst scores):

  • Human reviews leads on /leads page — score bar, tier badge, savings estimate

  • Approve → company.status='approved', lead_scores.approved_human=True

  • Reject → company.status='archived'

  • Notification email sent to ALERT_EMAIL with scored lead list + link

Gate 2 — Draft Approval (after Writer generates):

  • Human reviews drafts on /emails page — Approve & Send | Edit + Approve | Reject | Regenerate

  • Approve triggers immediate send via SendGrid

  • Reject → draft deleted, company resets to approved (re-enters Writer queue)

  • Notification email sent to ALERT_EMAIL with draft list + Critic scores + link

Why: LLM output is probabilistic — it can produce confident but wrong results. A misaddressed email or incorrect facts could damage a sales relationship. Human gates ensure no external action happens without sign-off.
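The rule that nothing leaves the system without sign-off can be enforced at the send boundary. The sketch below assumes a simplified in-memory `Draft` and a `send` callable in place of the SendGrid client; only the `approved_human` field name is taken from the text.

```python
from dataclasses import dataclass

@dataclass
class Draft:
    body: str
    approved_human: bool = False  # set only by a human action in the UI
    sent: bool = False

def send_if_approved(draft: Draft, send) -> None:
    """Refuse to send unless a human has approved this draft."""
    if not draft.approved_human:
        raise PermissionError("draft has not been approved by a human")
    send(draft.body)
    draft.sent = True
```

Putting the check inside the send function, rather than only in the UI, means a bug elsewhere in the pipeline cannot bypass the gate.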


Learning from Feedback (Writer + Tracker)

Concept: The system tracks which email angles generate replies and biases future generation toward what has worked.

Tech: email_win_rate table, get_best_angle() in Writer, Tracker updates on reply events

Files: agents/writer/writer_agent.py (get_best_angle()), agents/tracker/tracker_agent.py (process_event())

Writer picks angle "audit_offer" for Healthcare company
  ↓
email_drafts.template_used = "audit_offer"
  ↓
Prospect replies positively
  ↓
Tracker: email_win_rate (industry="Healthcare", template_id="audit_offer")
  → replies_received += 1
  → reply_rate recalculated
  ↓
Next Writer run for Healthcare:
  get_best_angle("Healthcare") returns "audit_offer"
  → win rate hint injected into prompt
  → LLM prefers this angle unless company signals suggest otherwise

Cold start protection: Minimum 5 emails sent before win rate data is trusted (_WIN_RATE_MIN_SENT = 5). Below this, the hint is omitted and LLM picks freely.
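The selection rule with cold start protection might look like the following. This is a sketch, not the real `get_best_angle()`: it takes rows as plain dicts instead of querying the `email_win_rate` table, the `emails_sent` column name is an assumption, and it applies the minimum of 5 per (industry, angle) row, which is one possible reading of the rule.

```python
_WIN_RATE_MIN_SENT = 5  # value stated in the text above

def pick_best_angle(rows, industry):
    """Return the angle with the highest reply rate, or None on cold start.

    rows: iterable of dicts with keys industry, template_id,
          emails_sent (assumed name), replies_received.
    """
    eligible = [
        row for row in rows
        if row["industry"] == industry
        and row["emails_sent"] >= _WIN_RATE_MIN_SENT
    ]
    if not eligible:
        return None  # too little data: omit the hint, let the LLM pick freely
    best = max(eligible, key=lambda r: r["replies_received"] / r["emails_sent"])
    return best["template_id"]
```

An angle with 4 replies out of 4 sends is ignored here even though its raw rate is 100%, because four sends are below the trust threshold.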

The 5 trackable angles:

| Angle | Lead with |
| --- | --- |
| cost_savings | Dollar savings estimate up front |
| audit_offer | Free no-commitment energy audit |
| risk_reduction | Utility cost volatility / budget risk |
| multi_site_savings | Multi-location efficiency opportunity |
| deregulation_opportunity | Open energy market / supplier switch |


Observable Execution (All Agents)

Concept: Every agent run is tracked in the database with live counters so humans can see progress and diagnose failures.

Tech: agent_runs + agent_run_logs tables, /pipeline/run/{run_id} endpoint

What gets tracked per run:

  • status: writer_running → writer_awaiting_approval (or failed)

  • current_stage: updated at each step

  • companies_found, companies_scored, drafts_created: live counters

  • error_message: populated on failure

  • agent_run_logs: one row per tool call with duration_ms and output_summary

Why: Without run tracking, the frontend can only say “running…” or “done”. With agent_runs, the Triggers page shows “3/8 drafts created” in real time and failed runs can be diagnosed.
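Per-tool-call logging of the kind listed above can be wrapped in a context manager. This is an illustrative sketch that appends dicts to an in-memory list; the real system writes rows to `agent_run_logs`, and the helper name and the `tool` key are assumptions.

```python
import time
from contextlib import contextmanager

@contextmanager
def logged_tool_call(run_logs: list, tool_name: str):
    """Record one log entry per tool call, including failures."""
    entry = {"tool": tool_name, "output_summary": None, "error_message": None}
    start = time.perf_counter()
    try:
        yield entry                      # caller fills in output_summary
    except Exception as exc:
        entry["error_message"] = str(exc)
        raise                            # logging must not hide the failure
    finally:
        entry["duration_ms"] = int((time.perf_counter() - start) * 1000)
        run_logs.append(entry)
```

A caller would write `with logged_tool_call(logs, "search_companies") as entry:` and set `entry["output_summary"]` inside the block. The entry is appended whether the call succeeds or raises.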


What Is NOT Agentic (Intentionally)

Some parts deliberately stay rule-based and deterministic:

| Component | Why it stays rule-based |
| --- | --- |
| Score formula | Score = (Recovery × 0.40) + ...; an LLM would hallucinate numbers |
| DB queries | SQL is deterministic and auditable; LLM-generated SQL is unpredictable |
| Email sending | No reasoning needed: if approved, send |
| Score thresholds | Business rules (≥70 = high), not a judgment call |
| Phone lookup | Waterfall of structured API responses; no reasoning needed |

Principle: Use LLM where reasoning, inference, or language generation adds value. Use deterministic code where correctness and auditability matter more.


Technology Stack for Agentic Features

| Feature | Library / Tool |
| --- | --- |
| Chat ReAct loop | LangChain create_agent |
| Writer + Critic LLM calls | llm_connector.py wrapping Ollama / OpenAI |
| LLM (default) | Ollama + llama3.2 (local: zero cost, data stays on-machine) |
| LLM (optional) | OpenAI gpt-4o-mini (faster, ~$0.0015/email) |
| Run tracking | SQLAlchemy + agent_runs table |
| HITL notifications | SendGrid (email_notifier.py) |
| Learning memory | PostgreSQL email_win_rate table |
| Waterfall state | Module-level flags (_hunter_blocked): process-scoped, zero overhead |


Agentic Maturity by Agent

| Agent | Agentic Capabilities |
| --- | --- |
| Chat | Full ReAct loop, tool use, confidence-gated routing, multi-turn context carry-forward |
| Writer | Context-aware generation, Critic self-critique loop, win-rate learning, uncertainty flagging |
| Analyst | LLM industry inference, data gap detection, re-enrichment loop (max 2), score narration |
| Scout | LLM query planning (3–5 variants), multi-source search, LLM deduplication, source quality learning |
| Tracker | LLM + rule-based reply classification, sales alerts, daily stuck-lead health checks |
| Outreach | 3-touch follow-up sequence, daily cap guardrail, unsubscribe guard, LLM follow-up polish |
| Orchestrator | Pipeline sequencing, task dispatch + retry, HITL gate enforcement, health monitoring |