Specialization
AI PM - where product thinking meets the model
I build LLM products the way a PM ships any other product: with a crisp problem, an eval rubric, a cost envelope, and a way to roll back. This page collects the playbooks, artefacts, and shipped work behind that stance - most of it learned building Aarchid with Dilpreet Grover.
How I work on AI products
Scoping an LLM feature
How to write a PRD when the model is the product. Success criteria, eval harness, guardrails, and cost envelope - before a single prompt is written.
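A lightweight way to hold that line is to keep the four sections as structured data and check them before prompt work starts. This is a minimal sketch, not a template from the playbook: the `FeatureSpec` shape, its field names, the `missingSections` helper, and the 0.9 accuracy target are all hypothetical.

```typescript
// Hypothetical shape for an LLM feature spec; field names are illustrative only.
interface FeatureSpec {
  problem: string;
  successCriteria: { metric: string; target: number }[];
  evalHarness: { goldenSetSize: number; owner: string } | null;
  guardrails: string[]; // e.g. "abstain when confidence is below the gate"
  costEnvelopeUsdPerUserMonth: number | null;
}

// Returns the names of spec sections that are still empty.
// An empty result means the spec is complete enough to start prompt work.
function missingSections(spec: FeatureSpec): string[] {
  const missing: string[] = [];
  if (spec.problem.trim() === "") missing.push("problem");
  if (spec.successCriteria.length === 0) missing.push("successCriteria");
  if (spec.evalHarness === null || spec.evalHarness.goldenSetSize === 0) missing.push("evalHarness");
  if (spec.guardrails.length === 0) missing.push("guardrails");
  if (spec.costEnvelopeUsdPerUserMonth === null) missing.push("costEnvelope");
  return missing;
}

// Illustrative draft: problem, criteria, and envelope filled in; harness and guardrails not yet.
const draft: FeatureSpec = {
  problem: "Diagnose houseplant issues from a photo and a short description",
  successCriteria: [{ metric: "diagnosisAccuracy", target: 0.9 }], // hypothetical target
  evalHarness: null,
  guardrails: [],
  costEnvelopeUsdPerUserMonth: 0.25,
};

console.log(missingSections(draft)); // → ["evalHarness", "guardrails"]
```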
Eval-driven development
Treat your golden set like a test suite. Offline evals → shadow traffic → A/B. How we validated 92% diagnosis accuracy on Aarchid.
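The offline → shadow → A/B sequence can be read as a promotion gate: a feature advances one stage only when its results at the current stage clear the bar, and otherwise stays where it is. A minimal sketch, assuming a single accuracy metric per stage; the `promote` function, `MIN_ACCURACY`, and `MIN_SAMPLES` are hypothetical names and thresholds, not values from Aarchid.

```typescript
// Rollout stages in the order described: offline evals, shadow traffic, A/B, then launch.
type Stage = "offline" | "shadow" | "ab" | "launched";

const NEXT: Record<Stage, Stage> = {
  offline: "shadow",
  shadow: "ab",
  ab: "launched",
  launched: "launched",
};

interface StageResult {
  accuracy: number;   // fraction of cases judged correct at this stage
  sampleSize: number; // number of cases or requests the accuracy is based on
}

// Illustrative thresholds; real values would come from the feature's success criteria.
const MIN_ACCURACY = 0.9;
const MIN_SAMPLES = 50;

// Advance one stage only when the current stage clears both bars; otherwise hold.
function promote(stage: Stage, result: StageResult): Stage {
  const passes = result.accuracy >= MIN_ACCURACY && result.sampleSize >= MIN_SAMPLES;
  return passes ? NEXT[stage] : stage;
}

console.log(promote("offline", { accuracy: 0.92, sampleSize: 200 })); // shadow
console.log(promote("shadow", { accuracy: 0.85, sampleSize: 500 }));  // shadow (held back)
```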
Cost modelling at the edge
Per-request math for multi-model pipelines (vision + retrieval + research). Caching, batching, and the $0.25/user/mo envelope.
Citations or it didn't happen
Why user trust collapses without grounded sources, and the architectural pattern for research-augmented LLM responses.
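One common shape for research-augmented responses is to retrieve sources first, pass them to the model as a numbered list, ask for inline [n] citations, and reject any answer whose citations are missing or point at sources that were never retrieved. A minimal sketch of that last guard, assuming the [n] citation convention; `Source`, `checkCitations`, and the example URL are hypothetical. It only checks that citations exist and resolve, not that the cited source actually supports the claim.

```typescript
// Hypothetical source record produced by a retrieval step.
interface Source {
  url: string;
  title: string;
}

interface CitationCheck {
  ok: boolean;
  cited: number[]; // distinct 1-based source indices referenced in the answer
  reason?: string;
}

// Assumes the model was asked to cite inline as [1], [2], ... matching the order
// of the `sources` array that was placed in the prompt.
function checkCitations(answer: string, sources: Source[]): CitationCheck {
  const found: number[] = [];
  const marker = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(answer)) !== null) {
    found.push(Number(match[1]));
  }
  const cited = found.filter((n, i) => found.indexOf(n) === i); // de-duplicate
  if (cited.length === 0) {
    return { ok: false, cited, reason: "no citations" };
  }
  const unknown = cited.filter((n) => n < 1 || n > sources.length);
  if (unknown.length > 0) {
    return { ok: false, cited, reason: `unknown source(s): ${unknown.join(", ")}` };
  }
  return { ok: true, cited };
}

// Placeholder source list for illustration.
const sources: Source[] = [{ url: "https://example.org/root-rot", title: "Root rot in houseplants" }];
console.log(checkCitations("Likely overwatering with root rot risk [1].", sources).ok); // true
console.log(checkCitations("Likely overwatering.", sources).reason);                      // no citations
```

When the check fails, the response can be regenerated or shown as "not enough evidence" instead of surfacing an ungrounded answer.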
An eval harness, in your browser
Six plant-diagnosis cases. Two model versions. One confidence gate. Toggle the controls and watch the same golden set re-score in real time - this is how I validate an LLM feature before it ships.
- PASS | Monstera deliciosa, yellowing lower leaves, soil wet 4 days post-water | Expected: Overwatering / root rot risk | Predicted: Overwatering / root rot risk | Cited | Confidence: 91% | Latency: 2.94s
- PASS | Fiddle-leaf fig, brown spots with yellow halo, recent move near AC vent | Expected: Cold draft + fungal stress | Predicted: Cold draft + fungal stress | Cited | Confidence: 86% | Latency: 2.71s
- PASS | Snake plant, mushy base, leaves falling at touch | Expected: Advanced root rot | Predicted: Advanced root rot | Cited | Confidence: 96% | Latency: 2.53s
- PASS | Pothos, pale variegated leaves, low-light corner for 6 weeks | Expected: Insufficient light | Predicted: Insufficient light | Cited | Confidence: 92% | Latency: 2.64s
- PASS | Calathea orbifolia, crispy edges, indoor humidity 28% | Expected: Low humidity stress | Predicted: Low humidity stress | Cited | Confidence: 89% | Latency: 2.82s
- PASS | ZZ plant, drooping stems, watered weekly for the past month | Expected: Overwatering | Predicted: Overwatering | Cited | Confidence: 88% | Latency: 2.89s
Switching from the v1 baseline to the grounded v2 stack, or raising the confidence gate, re-scores the same golden set. We used a harness of the same shape on Aarchid to validate the 92% diagnosis accuracy claim before any user saw the model in production.
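Under the hood, a harness like this reduces to a small scoring function. The sketch below uses the six cases listed above, with their displayed confidences written as fractions, and treats any prediction below the gate as an abstention rather than an answer. It is a sketch under assumptions: `EvalCase`, `scoreGoldenSet`, and the exact-string match between expected and predicted labels are illustrative simplifications, and v1 results are left out because they are not shown here.

```typescript
interface EvalCase {
  id: string;
  expected: string;
  predicted: string;
  confidence: number; // model-reported confidence in [0, 1]
}

interface EvalReport {
  total: number;
  answered: number;           // cases at or above the confidence gate
  correct: number;            // answered cases whose prediction matches the label
  accuracyOnAnswered: number; // correct / answered
  coverage: number;           // answered / total
}

// Scores a golden set; predictions below the gate count as abstentions.
function scoreGoldenSet(cases: EvalCase[], gate: number): EvalReport {
  const answered = cases.filter((c) => c.confidence >= gate);
  const correct = answered.filter((c) => c.predicted === c.expected).length;
  return {
    total: cases.length,
    answered: answered.length,
    correct,
    accuracyOnAnswered: answered.length === 0 ? 0 : correct / answered.length,
    coverage: cases.length === 0 ? 0 : answered.length / cases.length,
  };
}

const golden: EvalCase[] = [
  { id: "monstera", expected: "Overwatering / root rot risk", predicted: "Overwatering / root rot risk", confidence: 0.91 },
  { id: "fiddle-leaf fig", expected: "Cold draft + fungal stress", predicted: "Cold draft + fungal stress", confidence: 0.86 },
  { id: "snake plant", expected: "Advanced root rot", predicted: "Advanced root rot", confidence: 0.96 },
  { id: "pothos", expected: "Insufficient light", predicted: "Insufficient light", confidence: 0.92 },
  { id: "calathea", expected: "Low humidity stress", predicted: "Low humidity stress", confidence: 0.89 },
  { id: "zz plant", expected: "Overwatering", predicted: "Overwatering", confidence: 0.88 },
];

console.log(scoreGoldenSet(golden, 0.8).coverage); // 1
console.log(scoreGoldenSet(golden, 0.9).coverage); // 0.5
```

Raising the gate from 0.8 to 0.9 keeps every answered case correct but answers only 3 of the 6 cases, which is the accuracy-versus-coverage trade-off the gate exposes.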
Cost modelling, in real time
Same harness mindset, applied to economics. Move the sliders to see how batch size, cache hit rate, and request volume reshape the per-user-per-month bill - and whether you stay inside the $0.25 envelope.
Per-request breakdown
- Vision (Gemini 1.5 Pro): $0.00350
- Retrieval (Exa AI API): $0.00500
- Embed (cache lookup, on hit): $0.00010
- Edge worker: $0.0000005
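The line items above can be rolled up into a per-user-per-month figure. A minimal sketch, assuming every request pays for the edge worker and the cache lookup while only cache misses go on to call vision and retrieval (the more conservative reading of the "on hit" label); the 50 requests per active user per month is an illustrative input, not an Aarchid figure, and batching is not modelled.

```typescript
// Per-request unit costs in USD, taken from the breakdown.
const COST = {
  vision: 0.0035,        // Gemini 1.5 Pro call
  retrieval: 0.005,      // Exa AI API call
  cacheLookup: 0.0001,   // embedding-based cache lookup
  edgeWorker: 0.0000005, // edge worker invocation
};

const ENVELOPE_USD_PER_USER_MONTH = 0.25;

// Assumption: lookup + worker on every request; vision + retrieval only on a cache miss.
function costPerRequest(cacheHitRate: number): number {
  const always = COST.edgeWorker + COST.cacheLookup;
  const onMiss = COST.vision + COST.retrieval;
  return always + (1 - cacheHitRate) * onMiss;
}

function costPerUserMonth(requestsPerUserMonth: number, cacheHitRate: number): number {
  return requestsPerUserMonth * costPerRequest(cacheHitRate);
}

// Largest monthly request count per user that still fits inside the envelope.
function maxRequestsInEnvelope(cacheHitRate: number): number {
  return Math.floor(ENVELOPE_USD_PER_USER_MONTH / costPerRequest(cacheHitRate));
}

// Illustrative input: 50 requests per active user per month at three cache hit rates.
for (const hit of [0, 0.5, 0.8]) {
  const monthly = costPerUserMonth(50, hit);
  console.log(`hit=${hit}: $${monthly.toFixed(4)}/user/mo, within envelope: ${monthly <= ENVELOPE_USD_PER_USER_MONTH}`);
}
```

With these unit costs, the cache hit rate decides whether 50 requests a month fits: at a 0% hit rate the bill is about $0.43 per user, at 50% about $0.22.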
The Aarchid envelope is $0.25 / active user / month. Vision is the dominant cost - batching it across multiple images (gallery upload, time-lapse) and caching repeat diagnoses by perceptual hash are the two levers that keep us under budget at scale.
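Caching by perceptual hash means visually near-identical photos map to nearby keys, so a re-upload can skip the vision call. A minimal sketch using average hash (aHash), one simple perceptual hash; which variant Aarchid uses is not stated here. It assumes the image is already decoded, converted to grayscale, and resized to 8x8, and `DiagnosisCache` with its 5-bit distance threshold is hypothetical.

```typescript
// Average hash (aHash): one bit per pixel, set when the pixel is brighter than the mean.
// Assumes decoding, grayscale conversion, and resizing to 8x8 happened upstream.
function averageHash(gray8x8: number[]): string {
  if (gray8x8.length !== 64) throw new Error("expected 64 grayscale values (8x8)");
  const mean = gray8x8.reduce((sum, v) => sum + v, 0) / gray8x8.length;
  return gray8x8.map((v) => (v > mean ? "1" : "0")).join("");
}

// Number of bit positions at which two equal-length hashes differ.
function hammingDistance(a: string, b: string): number {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Hypothetical in-memory cache: a lookup hits if any stored hash is within maxDistance bits.
class DiagnosisCache {
  private entries: { hash: string; diagnosis: string }[] = [];
  private readonly maxDistance: number;

  constructor(maxDistance = 5) { // illustrative near-duplicate threshold
    this.maxDistance = maxDistance;
  }

  get(hash: string): string | undefined {
    const hit = this.entries.find((e) => hammingDistance(e.hash, hash) <= this.maxDistance);
    return hit ? hit.diagnosis : undefined;
  }

  set(hash: string, diagnosis: string): void {
    this.entries.push({ hash, diagnosis });
  }
}

// Usage: a slightly brighter re-upload of the same photo hashes identically and hits the cache.
const photo = Array.from({ length: 64 }, (_, i) => (i % 2 === 0 ? 200 : 50));
const brighterReupload = photo.map((v) => Math.min(255, v + 10));
const cache = new DiagnosisCache();
cache.set(averageHash(photo), "Overwatering / root rot risk");
console.log(cache.get(averageHash(brighterReupload))); // Overwatering / root rot risk
```

A linear scan is fine for a sketch; at scale the lookup would need an index suited to Hamming-distance search.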
Aarchid - shipped proof
AI Botanical Intelligence · 92% diagnosis accuracy
Co-created with Dilpreet Grover. Multimodal vision (Gemini 1.5 Pro) grounded by research-augmented reasoning (Exa AI API), running on Cloudflare Workers. Sub-10s P95, $0.25 per active user per month at scale.
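A P95 target is checked against the latency distribution, not the average. A minimal sketch using the nearest-rank percentile method over the six harness latencies listed above; six samples are far too few for a meaningful P95 and serve only to show the calculation, and `percentile` is a hypothetical helper.

```typescript
// Nearest-rank percentile: sort ascending, take the value at rank ceil(p/100 * n).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Latencies (seconds) from the six golden-set cases.
const latenciesSec = [2.94, 2.71, 2.53, 2.64, 2.82, 2.89];
const p95 = percentile(latenciesSec, 95);
console.log(p95, p95 < 10); // 2.94 true
```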
On the bench
- AI PM interview prep kit - deconstructed case questions, eval-harness design, and model economics cheatsheets.
- Second Aarchid-scale build - applying the same Edge Stack pattern to a different problem domain.
- Essay series: “The PRD is dead, long live the eval set” - in progress.
Looking for an AI PM who can spec, eval, and ship? Get in touch.