Delectable AI: LLM Dependency Levers

Not Everything Needs an LLM Call.

Our architecture is intentionally designed with levers to dial LLM dependency up or down. Wherever an LLM adds no value, deterministic algorithms take over, which reduces latency, cuts cost, and improves reliability.

The LLM is reserved for what it does best: natural language understanding, creative meal planning, and conversational flow. Everything else — ranking, caching, unit conversion, package optimization — is handled by purpose-built, auditable, fast code.

The LLM Dependency Spectrum

Every operation falls somewhere on a spectrum from "fully deterministic" on the left to "pure LLM reasoning" on the right. We've deliberately shifted most operations left, toward speed and reliability.

Deterministic / Fast

LLM-Dependent / Flexible

⚡

Caching

Eliminate redundant LLM and API calls. Stale-while-revalidate serves cached data immediately, so users wait only on a cold or fully expired entry.

<1ms cache hit vs 800ms BQ query

🔧

Deterministic Algorithms

Replace LLM reasoning with auditable, optimal solutions. ILP solver, unit tables, ranking stages.

Optimal (guaranteed) vs ~80% LLM accuracy

🚀

Pre-computation

Pre-compute profiles, playbooks, semantic bridges. Context is ready before the LLM sees the query.

5ms pre-built context vs 1300ms real-time fetch

💡

Why This Matters

Every LLM token costs money. Every LLM call adds latency. By routing deterministic operations away from the LLM, we achieve:

60-70% of operations avoid the LLM entirely

260x faster ranking via Redis cache

$0 LLM cost per cached response

100% auditable deterministic decisions

4-Layer Caching Architecture

Every layer catches requests before they hit expensive downstream services. The LLM only runs when no cached answer exists.

Stale-While-Revalidate Pattern

The ProfileCache avoids blocking the user whenever a cached entry is within its hard TTL: it serves slightly stale data instantly while refreshing in the background.

// ProfileCache: stale-while-revalidate
SOFT_TTL = 1 hour   // After this: serve stale, refresh in background
HARD_TTL = 4 hours  // After this: block and refetch from BQ
Timeline for Shopper #4821:
  T+0:    First visit   → BQ query (~800ms) → cache profile
  T+30m:  Return visit  → cache hit (<1ms) → fresh data
  T+90m:  Return visit  → cache hit (<1ms) → stale data, background refresh triggered
  T+91m:  Next request  → cache hit (<1ms) → fresh data (background refresh completed)
  T+6h:   Return visit  → cache expired (>4h since the T+90m refresh) → BQ query (~800ms) → re-cache
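The soft/hard TTL behavior can be sketched in a few lines of Python. This is an illustrative sketch, not the production ProfileCache: the class shape, the `fetch` loader, and the injectable `clock` are assumptions made so the timeline can be simulated.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple

SOFT_TTL = 60 * 60        # seconds: past this, serve stale and refresh in background
HARD_TTL = 4 * 60 * 60    # seconds: past this, block and refetch


class ProfileCache:
    """Illustrative stale-while-revalidate cache (assumed shape, not the real class)."""

    def __init__(self, fetch: Callable[[str], Any],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch    # hypothetical loader, e.g. a wrapper around the BQ query
        self._clock = clock    # injectable so tests can simulate elapsed time
        self._store: Dict[str, Tuple[Any, float]] = {}
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def _refresh(self, key: str) -> Any:
        value = self._fetch(key)
        with self._lock:
            self._store[key] = (value, self._clock())
            self._refreshing.discard(key)
        return value

    def get(self, key: str) -> Tuple[Any, str]:
        """Return (value, status); status is 'fresh', 'stale', or 'miss'."""
        with self._lock:
            entry = self._store.get(key)
            if entry is not None:
                value, stored_at = entry
                age = self._clock() - stored_at
                if age < SOFT_TTL:
                    return value, "fresh"
                if age < HARD_TTL:
                    # Serve the stale value now; start at most one background refresh.
                    if key not in self._refreshing:
                        self._refreshing.add(key)
                        threading.Thread(target=self._refresh, args=(key,),
                                         daemon=True).start()
                    return value, "stale"
        # No entry, or older than HARD_TTL: block on the loader.
        return self._refresh(key), "miss"
```

Only the "miss" path makes the caller wait on the loader; the "stale" path returns immediately and lets the refresh land for the next request.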

Deterministic Algorithms Replace LLM Reasoning

Where the answer is computable, we compute it. The LLM is reserved for genuinely creative or ambiguous tasks.

Integer Linear Programming (ILP) Solver

OPTIMAL

Replaces LLM-based ingredient consolidation, unit conversion, and package selection with a mathematically optimal solution.

Without ILP (LLM Does Everything)

User: "Add chicken stir fry and pad thai"
LLM: "You need chicken for both. I think
about 2 pounds should cover it... maybe
get the 1.5lb pack? Actually, 2 packs
of 1lb should work."
// Guesswork. Not optimal. May over/under buy.
// No brand preference. No price optimization.

With ILP Solver (Deterministic)

// Aggregate: chicken_breast = 680g total
solver = SCIP()
minimize: total_cost
subject_to: total_qty >= 680g
weights:
  -10% if previously_purchased
  -15% if preferred_brand
  +5% national_brand_penalty
Result: GE Organic 1.5lb ($9.49) = optimal
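To make the optimization concrete, here is a small Python sketch of the same problem: cover the required grams at the lowest preference-adjusted cost. It swaps in an exhaustive search over integer pack counts instead of calling SCIP, which is only practical for a handful of candidate packages. The `Package` fields, the example catalog, and the choice to combine the weights additively are assumptions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Package:
    sku: str
    grams: int
    price: float
    previously_purchased: bool = False
    preferred_brand: bool = False
    national_brand: bool = False

    def adjusted_cost(self) -> float:
        """Price scaled by the preference weights (assumed to combine additively)."""
        factor = 1.0
        if self.previously_purchased:
            factor -= 0.10
        if self.preferred_brand:
            factor -= 0.15
        if self.national_brand:
            factor += 0.05
        return self.price * factor


def cheapest_cover(packages: List[Package], required_grams: int,
                   max_each: int = 3) -> Tuple[Dict[str, int], float]:
    """Try every integer pack count up to max_each; return the cheapest valid cover."""
    best_counts, best_cost = None, float("inf")
    for counts in product(range(max_each + 1), repeat=len(packages)):
        grams = sum(n * p.grams for n, p in zip(counts, packages))
        if grams < required_grams:
            continue  # violates total_qty >= required
        cost = sum(n * p.adjusted_cost() for n, p in zip(counts, packages))
        if cost < best_cost:
            best_counts, best_cost = counts, cost
    if best_counts is None:
        raise ValueError("no combination covers the required quantity")
    return {p.sku: n for p, n in zip(packages, best_counts) if n}, best_cost


# Hypothetical catalog for chicken breast (680 g needed).
catalog = [
    Package("organic-1.5lb", grams=680, price=9.49, preferred_brand=True),
    Package("value-1lb", grams=454, price=6.99),
    Package("national-2lb", grams=907, price=11.99, national_brand=True),
]
choice, cost = cheapest_cover(catalog, required_grams=680)
```

With this made-up catalog the single 1.5 lb pack wins, because two 1 lb packs or one 2 lb pack both cost more after weighting. A real ILP solver reaches the same kind of answer without enumerating every combination.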

Deterministic Unit Conversion

200+ RULES

LLMs get unit math wrong ~20% of the time. Our lookup table is always correct.

3 tbsp + 1/4 cup
↓ 0.4375 cups

12 oz + 0.5 lb
↓ 567g (metric)

1/2 dozen + 3 eggs
↓ 9 count
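A lookup-table converter of this kind can be sketched as below. The table holds only a few units for illustration (the real rule set is described as 200+ rules), and the names `TO_BASE` and `add_quantities` are hypothetical. Exact fractions avoid floating-point drift.

```python
from fractions import Fraction
from typing import Iterable, Tuple, Union

Number = Union[int, str, Fraction]

# unit -> (dimension, factor to that dimension's base unit)
TO_BASE = {
    "tsp":   ("volume", Fraction(1, 48)),           # base unit: US cup
    "tbsp":  ("volume", Fraction(1, 16)),
    "cup":   ("volume", Fraction(1)),
    "g":     ("mass", Fraction(1)),                 # base unit: gram
    "oz":    ("mass", Fraction("28.349523125")),
    "lb":    ("mass", Fraction("453.59237")),
    "each":  ("count", Fraction(1)),                # base unit: single item
    "dozen": ("count", Fraction(12)),
}


def add_quantities(items: Iterable[Tuple[Number, str]], target_unit: str) -> Fraction:
    """Sum (quantity, unit) pairs and express the total in target_unit."""
    target_dim, target_factor = TO_BASE[target_unit]
    total = Fraction(0)
    for qty, unit in items:
        dim, factor = TO_BASE[unit]
        if dim != target_dim:
            raise ValueError(f"cannot convert {unit} to {target_unit}")
        total += Fraction(qty) * factor
    return total / target_factor


cups = add_quantities([(3, "tbsp"), ("1/4", "cup")], "cup")      # Fraction(7, 16) = 0.4375
eggs = add_quantities([("1/2", "dozen"), (3, "each")], "each")   # Fraction(9, 1)
```

Mixing dimensions (for example cups and grams) raises an error here instead of guessing a density, which keeps the table auditable.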

11-Stage Ranking Pipeline

DETERMINISTIC

Instead of asking the LLM to "pick the best products," we run a configurable pipeline of deterministic stages. Each stage has clear, auditable logic.

LLM-Only Approach

"Here are 50 products. Please rerank
them considering the user's purchase
history, dietary needs, freshness
preference, and health goals."
// 2000+ tokens in, 2000+ tokens out
// ~3-5 seconds latency
// Non-deterministic ordering
// No audit trail

Pipeline Approach

relevance_filter    → drop 8 false friends
dietary_hard_filter  → exclude 3 (allergy)
freshness_rerank     → boost fresh +200pts
history_boost        → past buys +300pts
health_propensity    → sodium -20%, protein +15%
// 0 LLM tokens. ~5ms total.
// Fully deterministic & auditable.
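The pipeline idea can be sketched as a list of plain functions applied in order. The data classes, the three stages shown, and the trace format are assumptions for illustration; only the stage names and point values come from the example above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Candidate:
    sku: str
    score: float
    allergens: Set[str] = field(default_factory=set)
    is_fresh: bool = False
    trace: List[str] = field(default_factory=list)  # audit trail of stage decisions


@dataclass
class Shopper:
    allergies: Set[str]
    purchased_skus: Set[str]


Stage = Callable[[List[Candidate], Shopper], List[Candidate]]


def dietary_hard_filter(cands: List[Candidate], shopper: Shopper) -> List[Candidate]:
    # Hard filter: drop anything containing one of the shopper's allergens.
    return [c for c in cands if not (c.allergens & shopper.allergies)]


def freshness_rerank(cands: List[Candidate], shopper: Shopper) -> List[Candidate]:
    for c in cands:
        if c.is_fresh:
            c.score += 200
            c.trace.append("freshness_rerank:+200")
    return cands


def history_boost(cands: List[Candidate], shopper: Shopper) -> List[Candidate]:
    for c in cands:
        if c.sku in shopper.purchased_skus:
            c.score += 300
            c.trace.append("history_boost:+300")
    return cands


def run_pipeline(cands: List[Candidate], shopper: Shopper,
                 stages: List[Stage]) -> List[Candidate]:
    for stage in stages:
        cands = stage(cands, shopper)
    # SKU tie-break keeps the final order identical across runs.
    return sorted(cands, key=lambda c: (-c.score, c.sku))
```

Because each stage is an ordinary function, the same input always yields the same ranking, and the `trace` list records why each product moved.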

Deterministic Shopping List Merge

INSTANT

When new ingredients need to merge with an existing cart, we use canonical name matching and unit normalization — no LLM needed.

// Existing cart: [{name: "Chicken Breast", qty: 1, unit: "lb"}]
// New recipe needs: [{name: "chicken breast", qty: 680, unit: "g"}]
1. Normalize: "chicken breast" == "Chicken Breast" ✓ match
2. Convert: 1 lb = 454g, 454g + 680g = 1134g
3. Output: {action: "update", merged_qty: 1134, unit: "g"}
// Structured action, not prose. UX layer handles display.
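The three steps above can be sketched as follows. The helper names, the small unit table (rounded to 454 g per lb as in the example), and the extra `name` field in the output are assumptions.

```python
from typing import Dict, List

# Grams per unit, rounded as in the worked example (1 lb = 454 g).
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "lb": 454.0}


def canonical(name: str) -> str:
    """Case- and whitespace-insensitive key for matching ingredient names."""
    return " ".join(name.lower().split())


def merge_into_cart(cart: List[Dict], needed: List[Dict]) -> List[Dict]:
    """Return one structured action per needed ingredient (quantities in grams)."""
    by_name = {canonical(item["name"]): item for item in cart}
    actions = []
    for need in needed:
        key = canonical(need["name"])
        need_g = need["qty"] * GRAMS_PER_UNIT[need["unit"]]
        existing = by_name.get(key)
        if existing is None:
            actions.append({"action": "add", "name": key,
                            "merged_qty": round(need_g), "unit": "g"})
        else:
            have_g = existing["qty"] * GRAMS_PER_UNIT[existing["unit"]]
            actions.append({"action": "update", "name": key,
                            "merged_qty": round(have_g + need_g), "unit": "g"})
    return actions


cart = [{"name": "Chicken Breast", "qty": 1, "unit": "lb"}]
recipe = [{"name": "chicken breast", "qty": 680, "unit": "g"}]
actions = merge_into_cart(cart, recipe)
# [{'action': 'update', 'name': 'chicken breast', 'merged_qty': 1134, 'unit': 'g'}]
```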

Virtual Pantry Deduction

FORMULA

Instead of the LLM guessing "you probably have salt at home," we compute estimated stock using an exponential decay model.

estimated_stock = quantity × e^(-days_since_purchase / shelf_life)
Example: Olive Oil purchased 14 days ago, shelf_life=180 days
  stock = 1.0 × e^(-14/180) = 0.925 = 92.5% remaining → SKIP
Example: Milk purchased 6 days ago, shelf_life=10 days
  stock = 1.0 × e^(-6/10) = 0.549 = 54.9% remaining → "May still have"
Example: Bananas purchased 5 days ago, shelf_life=7 days
  stock = 1.0 × e^(-5/7) = 0.490 = 49.0% remaining → ADD TO LIST
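The decay model is a one-liner in code. In the sketch below the two decision thresholds (0.75 and 0.50) are assumptions chosen only to be consistent with the three examples. The source does not state the actual cut-offs.

```python
import math

SKIP_AT = 0.75     # assumed: at or above this fraction, treat the item as in stock
ADD_BELOW = 0.50   # assumed: below this fraction, add the item to the list


def estimated_stock(quantity: float, days_since_purchase: float,
                    shelf_life_days: float) -> float:
    """Exponential decay of the purchased quantity over time."""
    return quantity * math.exp(-days_since_purchase / shelf_life_days)


def pantry_action(remaining_fraction: float) -> str:
    """Map the remaining fraction (stock / purchased quantity) to an action."""
    if remaining_fraction >= SKIP_AT:
        return "skip"
    if remaining_fraction >= ADD_BELOW:
        return "may_still_have"
    return "add_to_list"


olive_oil = pantry_action(estimated_stock(1.0, 14, 180))  # "skip"
milk = pantry_action(estimated_stock(1.0, 6, 10))         # "may_still_have"
bananas = pantry_action(estimated_stock(1.0, 5, 7))       # "add_to_list"
```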

Pre-computed Context & Structured Templates

Instead of the LLM figuring things out at runtime, we pre-build structured context that's ready before the conversation starts.

Structured Profile Templates

PRE-BUILT

Instead of giving the LLM raw propensity scores and expecting it to interpret them, we pre-classify into structured rules with clear categories and thresholds.

Raw Profile (LLM Must Interpret)

propensity_vegan: 0.95
propensity_low_sodium: 0.78
propensity_organic: 0.62
propensity_gluten_free: 0.12
// LLM must decide: Is 0.78 "important"?
// Is 0.12 a constraint or noise?
// Should vegan be a HARD filter?

Structured Rules (Pre-Classified)

RestrictionRule(
  category="strict_dietary",
  constraint="exclude",
  items=["dairy","eggs","meat","honey"],
  propensity=0.95, strength="HARD"
)
PropensitySignal(
  name="low_sodium",
  score=0.78, strength="STRONG"
)
// Clear. Unambiguous. Pre-decided.
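The pre-classification step can be sketched as a threshold table applied once, ahead of the conversation. The thresholds, the "MODERATE" tier, the noise floor, and the exclusion map below are all assumptions. Only the two output shapes come from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

HARD_THRESHOLD = 0.90     # assumed cut-off for a hard dietary restriction
STRONG_THRESHOLD = 0.70   # assumed cut-off for a strong signal
NOISE_FLOOR = 0.30        # assumed: scores below this are dropped as noise

# Assumed mapping from a diet to the items it excludes.
EXCLUSIONS = {"vegan": ("dairy", "eggs", "meat", "honey")}


@dataclass(frozen=True)
class RestrictionRule:
    category: str
    constraint: str
    items: Tuple[str, ...]
    propensity: float
    strength: str


@dataclass(frozen=True)
class PropensitySignal:
    name: str
    score: float
    strength: str


def classify(profile: Dict[str, float]) -> List[Union[RestrictionRule, PropensitySignal]]:
    """Turn raw propensity scores into pre-decided rules and signals."""
    prefix = "propensity_"
    out: List[Union[RestrictionRule, PropensitySignal]] = []
    for key, score in profile.items():
        name = key[len(prefix):] if key.startswith(prefix) else key
        if score >= HARD_THRESHOLD and name in EXCLUSIONS:
            out.append(RestrictionRule("strict_dietary", "exclude",
                                       EXCLUSIONS[name], score, "HARD"))
        elif score >= STRONG_THRESHOLD:
            out.append(PropensitySignal(name, score, "STRONG"))
        elif score >= NOISE_FLOOR:
            out.append(PropensitySignal(name, score, "MODERATE"))
        # Anything below the noise floor is intentionally left out of the context.
    return out
```

Under these assumed thresholds, the raw profile above yields a HARD vegan restriction, a STRONG low_sodium signal, a MODERATE organic signal, and nothing for gluten_free.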

12 Structured Playbook Templates

JINJA2

Instead of the LLM learning behavior from examples (few-shot), we inject structured playbooks that tell it exactly what to do — reducing token count and improving consistency.

Pre-computed Semantic Bridge

ML.DISTANCE

Instead of real-time semantic matching (expensive embedding calls), we pre-compute high-confidence ingredient-to-product mappings.

Real-Time (Expensive)

For each ingredient in recipe:
  embed(ingredient)  // 50ms
  vector_search(catalog)  // 200ms
  rank_matches()  // 100ms
// 8 ingredients × 350ms = 2.8s
// Plus "banana pepper vs banana" errors

Pre-computed (Fast Lookup)

SELECT sku, product_name, distance
FROM recipe_product_bridge
WHERE ingredient = 'bananas'
AND distance < 0.20
ORDER BY rank LIMIT 3
// 8 ingredients: single BQ query ~50ms
// Pre-filtered: no "banana pepper" errors
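To show how all of a recipe's ingredients could be resolved in one query, here is a sketch that uses an in-memory SQLite table as a stand-in for the BigQuery bridge table. The column types, the assumption that `rank` is a 1-based position per ingredient, and the `lookup_bridge` helper are illustrative only.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Match = Tuple[str, str, float]  # (sku, product_name, distance)


def lookup_bridge(conn: sqlite3.Connection, ingredients: Sequence[str],
                  max_distance: float = 0.20, top_k: int = 3) -> Dict[str, List[Match]]:
    """Fetch the top pre-computed matches for every ingredient in a single query."""
    if not ingredients:
        return {}
    placeholders = ", ".join("?" for _ in ingredients)
    sql = (
        "SELECT ingredient, sku, product_name, distance "
        "FROM recipe_product_bridge "
        f"WHERE ingredient IN ({placeholders}) AND distance < ? AND rank <= ? "
        "ORDER BY ingredient, rank"
    )
    matches: Dict[str, List[Match]] = defaultdict(list)
    params = (*ingredients, max_distance, top_k)
    for ingredient, sku, name, distance in conn.execute(sql, params):
        matches[ingredient].append((sku, name, distance))
    return dict(matches)
```

Filtering on `rank <= top_k` instead of a global `LIMIT` keeps up to three matches per ingredient when several ingredients share one query.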

Ranking Context Pre-Hydration

REDIS PIPELINE

The full ranking context (profile + purchase history + category affinity) is pre-built in Redis. A single pipeline call hydrates everything in ~5ms.

# RankingCache: 3 data structures per shopper
# (`redis` is an already-connected client instance)
pipe = redis.pipeline(transaction=False)
pipe.get(f"eagle:rank:profile:{mpid}")   # propensities, health scores
pipe.get(f"eagle:rank:history:{mpid}")   # by_sku purchase dict
pipe.get(f"eagle:rank:affinity:{mpid}")  # dept + brand affinity
results = pipe.execute()  # ONE Redis round-trip
Before: 3 BQ queries = ~1300ms total
After:  1 Redis pipeline = ~5ms total
Speedup: 260x faster

Config-Driven LLM Dial

Every aspect of the system — which tools the LLM can call, which ranking stages run, which pipeline preset to use — is configurable via environment variables. No code changes needed to tune the LLM dependency dial.

Tool Enable/Disable

Control which tools the LLM can invoke. Fewer tools = fewer unnecessary API calls = lower cost.

Core Tools (Always On)

Optional Tools (Toggleable)

# Disable expensive tools to reduce cost:
EAGLE_AI_DISABLED_TOOLS=get_promos_and_offers,get_frequently_bought_together
# Or enable selectively per request:
context.enabled_tools = ["search_products", "get_user_profile"]
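One way the two controls could combine is sketched below: the environment variable removes optional tools globally, and a per-request list narrows the result further. Which tools count as core is an assumption here (the two names are borrowed from the per-request example), as is the `resolve_enabled_tools` helper.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumed split between always-on and toggleable tools.
CORE_TOOLS = ("search_products", "get_user_profile")
OPTIONAL_TOOLS = ("get_promos_and_offers", "get_frequently_bought_together")


def resolve_enabled_tools(env: Optional[Mapping[str, str]] = None,
                          requested: Optional[Sequence[str]] = None) -> List[str]:
    """Apply the EAGLE_AI_DISABLED_TOOLS env var, then any per-request allow-list."""
    env = os.environ if env is None else env
    raw = env.get("EAGLE_AI_DISABLED_TOOLS", "")
    disabled = {name.strip() for name in raw.split(",") if name.strip()}
    # Core tools ignore the disable list; optional tools honor it.
    enabled = [t for t in CORE_TOOLS + OPTIONAL_TOOLS
               if t in CORE_TOOLS or t not in disabled]
    if requested is not None:
        wanted = set(requested)
        enabled = [t for t in enabled if t in wanted]
    return enabled
```

Passing `env` explicitly keeps the function easy to test. In normal use it falls back to the process environment.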

Pipeline Preset Selector

Different contexts need different amounts of processing. Cart-building doesn't need 11 stages.

Per-Stage Environment Toggles

Each ranking stage can be independently enabled or disabled via environment variables. This gives operations teams fine-grained control without code deployments.
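A sketch of how a preset and per-stage toggles might be resolved together follows. The preset names, the stage list, and the `RANK_STAGE_<NAME>_ENABLED` variable naming are hypothetical. The source does not give the actual variable names.

```python
import os
from typing import List, Mapping, Optional

ALL_STAGES = (
    "relevance_filter",
    "dietary_hard_filter",
    "freshness_rerank",
    "history_boost",
    "health_propensity",
)

# Hypothetical presets: cart-building runs a shorter pipeline.
PRESETS = {
    "full": ALL_STAGES,
    "cart_building": ("relevance_filter", "dietary_hard_filter"),
}

TRUTHY = {"1", "true", "yes", "on"}


def active_stages(preset: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Start from the preset's stages, then drop any stage toggled off via env."""
    env = os.environ if env is None else env
    stages = []
    for name in PRESETS[preset]:
        flag = env.get(f"RANK_STAGE_{name.upper()}_ENABLED", "true")  # assumed naming
        if flag.strip().lower() in TRUTHY:
            stages.append(name)
    return stages
```

Stages default to enabled, so an operator only sets a variable for the stage they want to switch off.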
