2 comments

  • anatolecallies 3 hours ago

    We run an LLM system that reads messy medical records and determines clinical trial eligibility.

    We tried eval platforms, LLM-as-judge, and automated prompt optimizers. None helped with what actually mattered: hidden domain policies that weren’t explicitly written anywhere.

    We ended up building our own annotation UI, prompt integration workflow (via Claude Code SDK), and HTML diff-based experiment reports.
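
    For illustration, here's a minimal sketch of what an HTML diff-based experiment report could look like. This uses Python's stdlib `difflib.HtmlDiff`; the commenter doesn't say what they actually built, and the prompt names and outputs below are purely hypothetical.

    ```python
    import difflib

    # Hypothetical outputs for the same patient record under two prompt
    # versions (illustrative strings, not from the original comment).
    baseline = ["Eligible: no", "Reason: age criterion unclear"]
    candidate = ["Eligible: yes", "Reason: age 54 meets 18-75 criterion"]

    # HtmlDiff renders a side-by-side HTML table with changes highlighted,
    # which makes line-level regressions between prompt versions easy to scan.
    html = difflib.HtmlDiff().make_file(
        baseline, candidate, fromdesc="baseline prompt", todesc="revised prompt"
    )

    with open("experiment_report.html", "w") as f:
        f.write(html)
    ```

    Generating one such file per experiment run gives a browsable report without any external tooling.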

    The biggest lesson: off-the-shelf eval/annotation/prompt-optimization tools are subpar because they can only be generic.

    Curious whether others building AI products have reached the same conclusion.

    • consumer451 3 hours ago

      I will be dealing with something along these lines next month. Thanks for sharing. I already had Opus build me an OTLP-compatible tracing system, with annotations and experiments, which took all of 4 hours.