Calling an LLM on every scrape felt wasteful, so I made it run once, cache the CSS selectors, and only re-invoke when a selector stops matching.
The core flow: the LLM runs in an agentic tool loop where it inspects the HTML, generates selectors, and tests them before committing. Once validated, the selectors are stored under the cache key you provide. Every subsequent extraction is pure CSS with zero LLM calls.
Self-healing kicks in when a selector returns null or its output fails schema validation. The broken selector is passed back to the LLM with context so it can repair it rather than start from scratch.
Works with any Vercel AI SDK model and any HTML source.
Happy to answer questions.