Hey HN,

I built AgentReady — a compression API that sits between your code and your LLM. It deterministically strips filler words, redundant connectors, duplicate lines, and boilerplate from prompts before you send them. Same meaning, fewer tokens.
How it works (two-step pattern):
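A minimal sketch of the two-step flow: compress first, then call the provider yourself. The `/v1/compress` endpoint path, request fields, and response shape below are my assumptions for illustration, not the documented API.

```python
import json
import os
import urllib.request

AGENTREADY_URL = "https://api.agentready.cloud/v1/compress"  # assumed endpoint path

def build_compress_request(text, level="standard"):
    """Pure payload builder; field names are guesses, not the documented API."""
    return {"text": text, "level": level}

def _post_json(url, payload, headers=None):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", **(headers or {})},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def compress(text, level="standard"):
    # Step 1: AgentReady sees only the text to compress -- no credentials.
    return _post_json(AGENTREADY_URL, build_compress_request(text, level))["compressed"]

def ask(prompt):
    # Step 2: call OpenAI directly; your API key never touches AgentReady.
    body = {"model": "gpt-4o",
            "messages": [{"role": "user", "content": compress(prompt)}]}
    return _post_json(
        "https://api.openai.com/v1/chat/completions",
        body,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    )["choices"][0]["message"]["content"]
```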
Key design decisions:
Your LLM key never leaves your machine. AgentReady only sees the text to compress. You call OpenAI/Anthropic/etc. directly.
Not a summarizer. It removes linguistic noise (filler, verbose phrasing, whitespace) while preserving all semantic content, code blocks, URLs, and numbers.
~5ms overhead. Deterministic text transforms, no ML inference in the compression path.
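To show what "deterministic, no ML" means in practice, here is a toy transform in that spirit; the filler list and the fence-preserving rule are mine, not AgentReady's actual rules.

```python
import re

FILLERS = {"basically", "actually", "just", "really", "very"}  # illustrative list

def strip_filler(text: str) -> str:
    # Deterministic, rule-based pass: drop filler words and collapse whitespace,
    # but keep code fences verbatim so snippets survive untouched.
    parts = re.split(r"(```.*?```)", text, flags=re.S)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:  # odd indices are the captured code fences
            out.append(part)
        else:
            words = [w for w in part.split() if w.lower() not in FILLERS]
            out.append(" ".join(words))
    return " ".join(p for p in out if p)
```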
~0.4% average accuracy delta in tests across GPT-4, Claude, and Gemini (BLEU/ROUGE deltas under 2%).
Three compression levels:
Level      | Savings | What it does
light      | 20-30%  | Whitespace + boilerplate cleanup
standard   | 40-50%  | + filler removal, dedup, connectors
aggressive | 50-60%  | + stop words, short-line pruning
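Back-of-envelope effect of those levels on a 10,000-token prompt, using the midpoint of each savings range (illustrative arithmetic, not a benchmark):

```python
def tokens_after(tokens: int, savings: float) -> int:
    # Tokens remaining after compression at a given savings ratio.
    return round(tokens * (1 - savings))

prompt_tokens = 10_000
for level, savings in [("light", 0.25), ("standard", 0.45), ("aggressive", 0.55)]:
    print(f"{level}: {tokens_after(prompt_tokens, savings)} tokens")
# standard roughly halves the input-token bill on a prompt this size.
```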
What else is included:
SDKs for Python (pip install agentready-sdk) and Node.js (npm install agentready-sdk)
MCP server for Claude Desktop / Cursor
Monkey-patch mode: agentready.patch_openai() — zero code changes to existing apps
Chrome Extension to convert any webpage to clean Markdown (strips 90%+ of HTML noise)
Works with any LLM provider and frameworks (LangChain, LlamaIndex, CrewAI, Vercel AI SDK)
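For the monkey-patch mode, a rough guess at what an `agentready.patch_openai()`-style hook could do under the hood: wrap the client's create call so every outgoing message is compressed first. The wrapper below is a hypothetical sketch, not the SDK's actual implementation, with a whitespace-collapsing stand-in for the real compression call.

```python
def compress(text: str) -> str:
    # Toy stand-in: collapse whitespace (the real SDK would call the AgentReady API).
    return " ".join(text.split())

def patch_openai_sketch(client) -> None:
    # Hypothetical internals of a patch_openai()-style hook: replace
    # chat.completions.create with a wrapper that compresses message content,
    # so existing call sites need zero code changes.
    original = client.chat.completions.create

    def patched(*args, **kwargs):
        for msg in kwargs.get("messages", []):
            if isinstance(msg.get("content"), str):
                msg["content"] = compress(msg["content"])
        return original(*args, **kwargs)

    client.chat.completions.create = patched
```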
Free during open beta — no limits, no credit card. I want real-world feedback before setting pricing.
Live demo + interactive playground: https://agentready.cloud/hn
Happy to answer questions about the compression approach, quality benchmarks, or architecture.