Delectable AI: Addressing Your Grocer's Core Questions

Five Questions.
Evidence-Based Answers.

Your Grocer has raised five important questions about the Delectable AI platform. This document provides concrete, code-referenced answers backed by the production system architecture.

27+ ML Models

11 Ranking Stages

8 Food Databases

70K+ Enriched SKUs

14× Cost Advantage

1

"This looks hardcoded — it's just search."

The concern is that Delectable AI simply wraps keyword search in a chatbot UI and relies on manually curated rules rather than genuine intelligence.

Delectable AI is an 11-Stage Adaptive Ranking Pipeline

Every product result passes through a composable, configurable pipeline where each stage applies a different model or algorithm. The pipeline is not static — it adapts per-user, per-session, and per-query.

Production Code: agents/grocery/ranking/pipeline.py
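To make "composable" concrete, here is a minimal sketch of how stages could be chained as plain functions. It is an illustration only: `Product`, `Context`, `relevance_filter`, `history_boost`, and `run_pipeline` are hypothetical names and are not taken from pipeline.py, and the scoring is deliberately a toy.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Product:
    sku: str
    title: str
    score: float = 0.0


@dataclass(frozen=True)
class Context:
    query: str
    purchased_skus: FrozenSet[str] = frozenset()


# A stage takes the current candidates plus request context and returns candidates.
Stage = Callable[[List[Product], Context], List[Product]]


def relevance_filter(products: List[Product], ctx: Context) -> List[Product]:
    # Toy relevance check: keep products whose title contains any query word.
    words = ctx.query.lower().split()
    return [p for p in products if any(w in p.title.lower() for w in words)]


def history_boost(boost: float) -> Stage:
    # Factory, so the boost weight stays configurable per preset (hypothetical).
    def stage(products: List[Product], ctx: Context) -> List[Product]:
        return [
            replace(p, score=p.score + boost) if p.sku in ctx.purchased_skus else p
            for p in products
        ]
    return stage


def run_pipeline(stages: List[Stage], products: List[Product], ctx: Context) -> List[Product]:
    # Apply each stage in order, then rank by the final score.
    for stage in stages:
        products = stage(products, ctx)
    return sorted(products, key=lambda p: p.score, reverse=True)
```

Because every stage shares one signature, a preset in this sketch is just a different list of stages, which is one plausible way to get per-user or per-query variation without changing stage code.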

Stage 1

Relevance Filter

Semantic matching + stemming

Stage 2

Dietary Hard Filter

Allergen safety enforcement

Stage 3

Dietary Enrichment

BQ attribute hydration

Stage 4

Freshness Rerank

Inventory age scoring

Stage 5

History Boost

Propensity-weighted repurchase

Stage 6

Session Diversity

Anti-monotony enforcement

Stage 7

Purchase Injection

Frequent-buy injection

Stage 8

Purchase Matching

SKU + brand affinity

Stage 9

Dietary Annotation

Badge + warning labels

Stage 10

Health Propensity

Nutrient-weighted scoring

Stage 11

✓ What's Dynamic (Learned)

▸

Propensity scoring — per-user weights derived from 12 months of purchase history. Score formula: S = S_base + λ_sodium·N + λ_sugar·N − λ_protein·N

▸

K-Means mission clustering — 6 shopping archetypes discovered from 202M events across 27.9K users (BQML)

▸

Bayesian pantry decay — per-household consumption rates updated with category priors: posterior = (n·household_rate + k·category_rate) / (n+k)

▸

768-dim embeddings — Vertex AI text-embedding-005 for semantic product/ingredient matching

▸

PMI flavor affinity — ingredient co-occurrence scores from 2.1M recipe corpus

▸

Household personas — K=8 COSINE-distance clustering on 9 behavioral features with confidence scoring
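The pantry decay formula above is a standard shrinkage estimate, and a few lines of code show how it behaves. This sketch assumes `n` is the number of household observations and `k` is the strength of the category prior; the source does not define either symbol, and the function name is hypothetical.

```python
def posterior_rate(household_rate: float, n: float,
                   category_rate: float, k: float) -> float:
    """Blend a household's observed consumption rate with a category prior.

    posterior = (n * household_rate + k * category_rate) / (n + k)

    Assumption: n counts household observations and k is how many
    observations the category prior is "worth".
    """
    if n < 0 or k < 0 or n + k == 0:
        raise ValueError("n and k must be non-negative and not both zero")
    return (n * household_rate + k * category_rate) / (n + k)
```

With no household data (n = 0) the estimate equals the category rate, and as n grows it converges on the household's own rate, which is the behavior a cold-start-safe model needs.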

⚡ What's Deterministic (By Design)

▸

Unit conversions — 60+ exact conversion factors (tsp→tbsp, oz→cups). Rules are correct, not approximate — ML would introduce errors here.

▸

Allergen hard-filtering — safety-critical. A probabilistic approach is unacceptable for nut allergy enforcement.

▸

ILP cost optimization — OR-Tools SCIP solver finds provably optimal package combinations. An LLM would guess.

▸

Ingredient stemming — "roma tomato" → "tomato" with 300+ aliases. False-friend detection prevents "coconut milk" from matching "milk".

Deterministic ≠ hardcoded. These are engineered algorithms chosen because they're more reliable than ML for these specific tasks. The system deliberately uses the right tool for each job.
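As an illustration of the stemming rule above, the sketch below resolves aliases first and blocks false friends before falling back to the head noun. The tables are tiny hypothetical samples, not the production alias list, and `canonical_ingredient` is an assumed name.

```python
from typing import Optional

# Hypothetical sample tables; the production system is described as having 300+ aliases.
ALIASES = {"roma tomato": "tomato", "cherry tomato": "tomato", "scallion": "green onion"}
FALSE_FRIENDS = {"coconut milk", "almond milk", "peanut butter"}
CANONICAL = {"tomato", "milk", "butter", "green onion"}


def canonical_ingredient(name: str) -> Optional[str]:
    """Map a free-text ingredient to a canonical name, or None if unknown."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    if key in FALSE_FRIENDS:
        return key  # keep the phrase whole: "coconut milk" must not become "milk"
    if key in ALIASES:
        return ALIASES[key]
    if key in CANONICAL:
        return key
    head = key.split()[-1] if key else ""  # fall back to the last word
    return head if head in CANONICAL else None
```

The false-friend check runs before the head-noun fallback, so the order of the checks is what prevents "coconut milk" from being matched as dairy.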

2

"Where's the real AI / ML?"

The concern is that the system is marketing-level AI — using the term without substantive machine learning models in production.

27+ Production ML Models Across 6 Categories

Every model listed below exists in the codebase with training SQL, inference code, and integration into the ranking pipeline.

Embeddings: Neural vector representations

Multi-Aspect Product Embeddings

768-dim Vertex AI text-embedding-005 · Title + brand + category + attributes

70K+ products

pim/enrichments/embeddings.py

Recipe Corpus Embeddings

2.1M recipes from corbt/all-recipes indexed for semantic search

2.1M recipes

ml/domains/grocery/semantic_bridge/

Ingredient Embeddings

223K unique ingredients with 768-dim vectors for bridge matching

223K ingredients

ml/domains/grocery/semantic_bridge/ddl/

Purchase Pattern Embeddings

Shopper behavior vectors for personalization clustering

27.9K users

ml/domains/grocery/personalization/ddl/

Clustering: Unsupervised pattern discovery

Household Persona Clustering (K=8)

BQML K-Means · COSINE distance · 9 behavioral features · Standardized

27.9K households

ml/domains/grocery/householding/ddl/

Shopping Mission Clustering (K=6)

6 archetypes: Candy Run, Dairy Prep, Produce Run, Weekly Stock-Up, Essentials, Quick Meal

202M events

infrastructure/bigquery/giant_eagle_propensity_models.sql

Statistical: Bayesian inference & co-occurrence

Bayesian Pantry Decay Model

posterior = (n·household_rate + k·category_rate) / (n+k) · Department-specific shelf life priors

9,366 profiled users

ml/domains/grocery/pantry_intelligence/ddl/

PMI Flavor DNA / Co-occurrence

Normalized Pointwise Mutual Information [-1, 1] across 2.1M recipe corpus

2.1M recipes

ml/domains/grocery/food_science/ddl/

Propensity Scoring (11 Dimensions)

Per-user dietary/health scores from purchase frequency analysis with health consciousness amplifier

Per-shopper

agents/grocery/ranking/stages/health.py
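For readers unfamiliar with the normalized PMI used in the Flavor DNA entry above, here is a small self-contained sketch that computes it from recipe co-occurrence counts. The function name and the in-memory input format are assumptions; the production version is described as running over a 2.1M recipe corpus in BigQuery.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def npmi_scores(recipes: Iterable[Set[str]]) -> Dict[FrozenSet[str], float]:
    """Normalized PMI in [-1, 1] for every ingredient pair that co-occurs."""
    recipe_list = [set(r) for r in recipes]
    total = len(recipe_list)
    single: Counter = Counter()
    pair: Counter = Counter()
    for ingredients in recipe_list:
        single.update(ingredients)
        pair.update(frozenset(p) for p in combinations(sorted(ingredients), 2))

    scores: Dict[FrozenSet[str], float] = {}
    for key, n_xy in pair.items():
        x, y = sorted(key)
        p_xy = n_xy / total
        if p_xy == 1.0:
            continue  # NPMI is 0/0 when a pair appears in every recipe
        # PMI = log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        pmi = math.log(n_xy * total / (single[x] * single[y]))
        scores[key] = pmi / -math.log(p_xy)
    return scores
```

A score near 1 means two ingredients almost always appear together, 0 means they co-occur about as often as chance predicts, and pairs that never co-occur are simply absent from the result in this sketch.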

Optimization: Mathematical solvers

ILP Package Optimizer (OR-Tools SCIP)

Mixed-integer linear programming for optimal package selection with personalization factors

Per-request

agents/grocery/tools/shopping_optimizer.py
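To show what "optimal package selection" means in practice, the sketch below solves a single-ingredient version of the problem exactly. Note the swap: the production optimizer is described as a mixed-integer program on OR-Tools SCIP, while this example uses a small dynamic program, which is also exact for this simplified case. Package data, integer units, prices in cents, and the function name are all hypothetical, and personalization factors are omitted.

```python
from typing import Dict, List, Tuple

Package = Tuple[str, int, int]  # (name, size in units, price in cents)


def cheapest_packages(needed: int, packages: List[Package]) -> Tuple[int, Dict[str, int]]:
    """Return (total_cents, {name: count}) covering at least `needed` units."""
    if needed <= 0:
        return 0, {}
    if not packages or any(size <= 0 for _, size, _ in packages):
        raise ValueError("need at least one package, all with positive size")

    inf = float("inf")
    best = [0] + [inf] * needed   # best[q] = min cost to cover at least q units
    choice = [-1] * (needed + 1)  # index of the last package used to reach best[q]
    for q in range(1, needed + 1):
        for i, (_, size, price) in enumerate(packages):
            cost = best[max(0, q - size)] + price
            if cost < best[q]:
                best[q], choice[q] = cost, i

    # Walk the choices backwards to recover the package counts.
    counts: Dict[str, int] = {}
    q = needed
    while q > 0:
        name, size, _ = packages[choice[q]]
        counts[name] = counts.get(name, 0) + 1
        q = max(0, q - size)
    return int(best[needed]), counts
```

For example, covering 20 oz from 8 oz, 16 oz, and 32 oz packages priced at 250, 400, and 700 cents picks one 16 oz plus one 8 oz for 650 cents rather than the single 32 oz package. A solver-based formulation becomes worthwhile once several ingredients, substitutions, and preferences are optimized together.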

Search & Retrieval: Vector indices & commerce search

Vertex AI Commerce Search

ML-ranked product retrieval with LLM-driven boost specs (category, health tier, NOVA)

70K+ SKUs

search/engines/vertex_adapter.py

Semantic Bridge (Bi-Encoder + Cross-Encoder)

Ingredient→product matching with Gemini Flash verification in the uncertain zone (0.12–0.35)

500K+ pairs

ml/domains/grocery/semantic_bridge/
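The "uncertain zone" in the Semantic Bridge entry above amounts to three-way routing on a match score. The sketch below is one hedged reading of it. The source does not say whether the score is a similarity or a distance, so the outer bands are named neutrally rather than "accept" or "reject", and treating both bounds as inclusive is an assumption.

```python
from enum import Enum


class Route(Enum):
    BELOW_ZONE = "below_zone"  # decided without an LLM call
    VERIFY = "verify"          # sent to the LLM verifier
    ABOVE_ZONE = "above_zone"  # decided without an LLM call


def route_match(score: float, low: float = 0.12, high: float = 0.35) -> Route:
    """Send only scores inside [low, high] to the (more expensive) verifier."""
    if low > high:
        raise ValueError("low must not exceed high")
    if score < low:
        return Route.BELOW_ZONE
    if score <= high:
        return Route.VERIFY
    return Route.ABOVE_ZONE
```

The design point is cost: only pairs the encoders cannot settle on their own pay for a verification call.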

Generative AI: Foundation models & fine-tuning

Gemini 2.0 Flash (Primary Brain)

Conversational agent, tool orchestration, food intelligence gap-filling

All sessions

agents/grocery/agent.py

Multi-Source Food Intelligence

3-tier enrichment: USDA/OFF/FooDB (BQ) → Ontology → Gemini gap-filling · 40+ derived fields

70K+ products

pim/enrichments/unified_food_enrichment.py

DPO Fine-Tuning Pipeline

Direct Preference Optimization training data generation for model alignment

Continuous

ml/llm/models/dpo-training-and-validation-generator.py

Knowledge Graph (BigQuery GRAPH_TABLE)

Multi-hop reasoning: User → Household → Purchases → Categories → Personas

27.9K users

agents/grocery/data/knowledge_graph_tools.py

Food Intelligence: 8 Scientific Databases

Product enrichment pulls from 8 independent food science databases, merged into 40+ derived attributes per SKU. This is not prompt engineering — it's a data pipeline.

🇺🇸

USDA FoodData

Branded + Foundation + SR Legacy

🌍

Open Food Facts

Global crowdsourced · 3M+ products

🧬

FooDB

Chemical/molecular composition

🌱

Agribalyse

Environmental impact / eco-scores

⚗️

FOSCOLLAB/JECFA

Food additive safety data

🌿

ePlantLIBRA

Phytochemical bioactives

🏛️

FDA/EFSA

Health claims evidence

🍳

Recipe Corpus

2.1M recipes · PMI analysis

3

"A customer could just use ChatGPT for this."

The concern is that a general-purpose LLM like ChatGPT or Gemini could replace Delectable AI's grocery assistant functionality.

Six Unbridgeable Gaps

A generic LLM has no access to Your Grocer's data ecosystem. These aren't future features — they're production capabilities today.

🏪 Real-Time Catalog

70K+ SKUs with live pricing, inventory, and promotions. ChatGPT can say "buy salmon" — Delectable AI shows you the exact salmon at $12.49 in aisle 7 with a $2 Rewards coupon.

ChatGPT's training data is months stale. Prices, stock, and promos change daily.

🧬 Loyalty Intelligence

12 months of purchase history, 11 propensity dimensions, household de-averaging, mission clustering. ChatGPT starts from zero every conversation.

Delectable AI knows Sarah prefers organic (0.82), has a child with a nut allergy, and shops for quick meals.

🧊 Virtual Pantry

Bayesian decay modeling predicts what's still in the kitchen. Delectable AI won't suggest chicken if you bought it 2 days ago (stock: 0.7). ChatGPT has zero visibility into your home.

Reduces food waste, prevents unnecessary purchases, and enables "cook with what you have" scenarios.

⚠️ Dietary Safety

Hard-filters products with allergens using verified ingredient data. ChatGPT suggested "peanut butter rice cakes" for a child with a nut allergy in our testing. That's dangerous.

DietaryHardFilter is stage 2 in the pipeline — it runs before any product reaches the user.
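A hard filter of this kind can be sketched in a few lines. The example below is not the DietaryHardFilter implementation; `Item` and `hard_filter` are hypothetical names, and the fail-closed treatment of unverified ingredient data is an assumption about how a safety-critical filter would reasonably behave.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Item:
    sku: str
    name: str
    # None means the ingredient data has not been verified (assumed convention).
    allergens: Optional[FrozenSet[str]]


def hard_filter(items: Iterable[Item], avoid: FrozenSet[str]) -> List[Item]:
    """Keep only items verified to contain none of the allergens in `avoid`."""
    if not avoid:
        return list(items)
    safe: List[Item] = []
    for item in items:
        if item.allergens is None:
            continue  # fail closed: unknown is treated as unsafe
        if item.allergens & avoid:
            continue  # any overlap removes the item outright, with no scoring
        safe.append(item)
    return safe
```

Unlike a ranking penalty, removal here is binary: a flagged product cannot resurface because a later stage boosted its score.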

🛒 Commerce Integration

Products go directly into the Your Grocer cart with one tap. Full pricing, estimated totals, and Rewards savings calculated in real-time. ChatGPT outputs text you have to manually re-enter.

Every interaction is a conversion opportunity. ChatGPT sends users to... Amazon? Instacart?

💰 Revenue Generation

Retail Media Network integration with closed-loop attribution. Sponsored placements are injected at configurable positions by the ranking pipeline, with interaction→transaction tracking.

ChatGPT generates zero revenue, zero attribution, zero measurable ROAS.

⚡💰 The Performance and Economics Gap

~4 sec

LLM-only latency

(5 sequential calls)

~490 ms

Delectable AI latency

(1 LLM call + cache)

$0.033

LLM-only per request

(all tokens billed)

$0.0024

Delectable AI effective cost

(60% cache hit rate)

💡

The fundamental difference: ChatGPT is a brain without a body. It can think about food but can't see Your Grocer's actual products, can't know the shopper's history, can't enforce allergen safety, and can't put anything in a cart. Delectable AI is brain + body + memory — a purpose-built system that thinks, knows, and acts within Your Grocer's ecosystem.

4

"What's the cost-margin impact at 1.25 million shoppers?"

The concern is whether the AI platform's operating costs will erode margins when deployed across Your Grocer's full loyalty base.

The Economics of Hybrid AI vs. LLM-Only

Delectable AI's architecture is specifically designed to minimize LLM token consumption. Most intelligence is pre-computed, cached, or deterministic.

Cost Model: 1.25M Loyalty Members

Active Monthly Users

~30% of 1.25M loyalty base

375,000

Avg. Interactions / User / Month

Based on grocery shopping frequency

12

Monthly Interactions

375K × 12

4.5M

LLM-Only Approach

Cost per interaction: $0.033

Monthly (4.5M × $0.033): $148,500

Annual: $1,782,000

Scales linearly. Every additional user = full LLM cost. No efficiency gains at scale.

Delectable AI Hybrid Architecture

Raw cost per interaction: $0.006

Cache hit rate (measured): ~60%

Effective cost per interaction: $0.0024

Monthly (4.5M × $0.0024): $10,800

Annual: $129,600

Cache efficiency increases with scale. Popular queries converge. Effective cost decreases as user count grows.

Annual savings vs. LLM-only at 1.25M loyalty members

$1.65M / year

$1,782,000 (LLM-only) − $129,600 (Delectable AI) = $1,652,400, a 93% cost reduction
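The cost model above can be reproduced with a short script, which also makes it easy to test other adoption or cache-hit assumptions. All default values are the figures stated in this section. `CostModel` and `summarize` are hypothetical names, and treating every cache hit as costing $0 follows the section's own framing.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CostModel:
    loyalty_members: int = 1_250_000
    active_share: float = 0.30
    interactions_per_user: int = 12
    llm_only_cost: float = 0.033    # dollars per interaction
    hybrid_raw_cost: float = 0.006  # dollars per interaction before caching
    cache_hit_rate: float = 0.60    # cache hits are assumed to cost $0

    @property
    def monthly_interactions(self) -> int:
        active_users = round(self.loyalty_members * self.active_share)
        return active_users * self.interactions_per_user

    @property
    def hybrid_effective_cost(self) -> float:
        return self.hybrid_raw_cost * (1 - self.cache_hit_rate)

    def annual_cost(self, cost_per_interaction: float) -> float:
        return self.monthly_interactions * cost_per_interaction * 12


def summarize(model: CostModel) -> Dict[str, float]:
    llm = model.annual_cost(model.llm_only_cost)
    hybrid = model.annual_cost(model.hybrid_effective_cost)
    return {
        "annual_llm_only": llm,
        "annual_hybrid": hybrid,
        "annual_savings": llm - hybrid,
        "reduction": 1 - hybrid / llm,
    }
```

Calling `summarize(CostModel())` reproduces the figures above; changing `cache_hit_rate` or `active_share` shows how sensitive the savings are to those two inputs.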

Why 60% of Requests Cost $0

Delectable AI's 4-layer caching architecture means most interactions never reach the LLM.

Layer 1

<1ms

Response Cache — full response hit (35%)

Layer 2

<1ms

Tool Result Cache (15%)

Layer 3

5ms

Profile SWR (10%)

Layer 4

~490ms

LLM call required — only for novel, complex queries (40%)
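Layer 3 above refers to "Profile SWR". Reading SWR as stale-while-revalidate (an assumption, since the source does not expand the acronym), the pattern can be sketched as follows. `SWRCache` is a hypothetical class; a production system would run the refresh in the background rather than through an explicit `revalidate()` call.

```python
import time
from typing import Any, Callable, Dict, Set, Tuple


class SWRCache:
    """Stale-while-revalidate: serve a stale value now, refresh it afterwards."""

    def __init__(self, loader: Callable[[str], Any], fresh_for: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._fresh_for = fresh_for
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}
        self.pending_refresh: Set[str] = set()

    def get(self, key: str) -> Any:
        now = self._clock()
        if key not in self._store:
            value = self._loader(key)  # cold miss: load synchronously
            self._store[key] = (value, now)
            return value
        value, loaded_at = self._store[key]
        if now - loaded_at > self._fresh_for:
            self.pending_refresh.add(key)  # stale: return it anyway, refresh later
        return value

    def revalidate(self) -> None:
        # Reload every key that was served stale since the last refresh.
        for key in list(self.pending_refresh):
            self._store[key] = (self._loader(key), self._clock())
            self.pending_refresh.discard(key)
```

The trade-off is that a shopper may briefly see a slightly out-of-date profile in exchange for never waiting on a profile reload during a request.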

📈 Revenue Offset: RMN Monetization

The platform cost isn't just an expense — it's an investment with measurable revenue generation through the Retail Media Network.

$0.15–$0.50

Estimated CPM for sponsored placements

Closed-Loop

Impression → interaction → cart → purchase

Measurable

ROAS tracking per brand partner

At 4.5M monthly interactions, sponsored placements can offset platform operating costs and could make Delectable AI a profit center rather than a cost center. How much of the cost is offset depends on the achieved CPM, the fill rate, and the number of sponsored placements per interaction.
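The offset can be checked with simple CPM arithmetic. In the sketch below, `placements_per_interaction` and `fill_rate` are hypothetical inputs, since this document gives no values for them; the CPM range, interaction volume, and $10,800 monthly platform cost are the figures stated above.

```python
def monthly_rmn_revenue(interactions: int, placements_per_interaction: float,
                        fill_rate: float, cpm: float) -> float:
    """Revenue in dollars: CPM is the price per 1,000 sponsored impressions."""
    impressions = interactions * placements_per_interaction * fill_rate
    return impressions / 1000 * cpm


def breakeven_impressions(monthly_cost: float, cpm: float) -> float:
    """Sponsored impressions per month needed to cover `monthly_cost`."""
    return monthly_cost / cpm * 1000
```

For example, one sponsored placement per interaction at full fill and a $0.50 CPM yields $2,250 per month on 4.5M interactions. Covering the full $10,800 monthly cost takes about 21.6M impressions per month at $0.50, or about 72M at $0.15, so the break-even point is driven mainly by placements per interaction and the achieved CPM.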

5

"Why can't our team build this themselves?"

The concern is whether Delectable AI's contribution is replaceable by Your Grocer's internal engineering and data science teams.

What Delectable AI Built — And Why It Took Domain Expertise

This isn't a chatbot wrapper. It's an end-to-end food intelligence platform that required deep expertise across grocery retail, food science, ML engineering, and commerce search — simultaneously.

🧪 Food Science Engineering

We built a food intelligence layer that doesn't exist anywhere else — not in any vendor product, not in any open-source library.

▸

8 food databases unified into a single enrichment pipeline (USDA, Open Food Facts, FooDB, Agribalyse, FOSCOLLAB, ePlantLIBRA, FDA/EFSA, 2.1M recipe corpus)

▸

40+ derived attributes per product — health tier, sugar/sodium tiers, Nutri-Score, NOVA group, Food Compass, dietary claims, scientific profile, product form

▸

GTIN-14 matching — 75% of Your Grocer's 70K SKUs matched to scientific databases via standardized product identifiers

▸

PMI flavor affinity — ingredient co-occurrence scoring across 2.1M recipes enabling "what goes well with X" intelligence

This required food science domain knowledge to select the right databases, build the matching pipeline, and validate enrichment accuracy. A general engineering team would need to discover, evaluate, and integrate these sources from scratch.

🤖 ML Pipeline Architecture

The 11-stage ranking pipeline, embedding infrastructure, and personalization models required ML engineering expertise that goes far beyond "plug in an LLM."

▸

11-stage composable ranking pipeline with per-stage observability, A/B testing hooks, and 5 configurable presets

▸

Bayesian pantry decay model — not available in any commercial product; required statistical modeling of household consumption patterns

▸

Bi-encoder + cross-encoder semantic bridge — ingredient→product matching at scale with confidence-zone verification

▸

ILP solver integration — OR-Tools SCIP for provably optimal package selection with personalization factors

Each of these components took weeks of design, implementation, and tuning. The integration between them — where one stage feeds the next — is the hardest part to replicate.

🏪 Grocery Retail Domain

Grocery is uniquely complex. Generic AI platforms don't understand perishability, allergen safety, household dynamics, or shopping missions.

▸

Household de-averaging — separating individual preferences within a shared loyalty card (parent vs. child, athlete vs. cook)

▸

Shopping mission classification — recognizing the difference between a "weekly stock-up" and a "quick meal run" from trip signals

▸

Allergen safety as a first-class constraint — not a suggestion, but a hard filter enforced at the pipeline level with 26+ known allergens

▸

Unit consolidation with retail package mapping — "2.5 cups of flour" → "1× Gold Medal All-Purpose Flour 5lb" with pantry deduction

These problems are specific to grocery. A general-purpose AI team would need to discover them through trial and error. Delectable has already navigated them.

⚡ Cost-Optimized Architecture

The naive approach (send everything to an LLM) costs 14× more. Delectable AI's architecture was specifically designed to minimize token consumption while maximizing intelligence.

▸

4-layer caching architecture — response cache, tool result cache, profile SWR, Redis pipeline. 60% of requests never touch the LLM.

▸

Deterministic where appropriate — unit conversion, allergen filtering, and ranking use algorithms that are faster and more reliable than LLM inference

▸

Pre-computation pipeline — profile hydration, semantic bridge, embedding indices are computed offline so real-time queries are fast

▸

Jinja2 playbook templates — 12 structured prompt templates reduce token waste vs. free-form prompting

This architecture took months to design and tune. An internal team starting from scratch would first build the naive (expensive) version, then spend months optimizing it — re-discovering the same caching and pre-computation patterns.

Build vs. Partner: The Timeline Gap

Assuming Your Grocer assembles a dedicated team of ML engineers, data scientists, and food domain experts:

Food Intelligence

6–9 months (DB eval, GTIN matching, enrichment pipeline)

ML Pipeline

8–12 months (ranking, embeddings, clustering, pantry model)

LLM Integration

4–6 months (prompting, caching, cost optimization)

Commerce Integration

5–8 months (cart, loyalty, Vertex, RMN)

Iteration & Tuning

3–6 months (A/B testing, pipeline tuning, DPO)

Total Estimate

18–24+ months to reach current Delectable AI capability, with the workstreams above running partly in parallel

Delectable AI's Core Value Proposition

Speed to Market

The platform is built, tested, and deployed. Your Grocer gets 18–24 months of engineering condensed into an immediate integration — with continuous improvement from a team that has already navigated the hard problems.

Domain Specialization

Food science, grocery retail, and AI/ML expertise combined in a single team. This cross-domain knowledge is what produced the food intelligence pipeline, pantry modeling, and allergen safety system that no generic AI vendor offers.

Continuous Innovation

DPO fine-tuning, session evaluation, household GNN modeling, and RMN attribution are all active development areas. Delectable is investing in the roadmap — Your Grocer benefits from every advancement across the platform.
