Show HN: Semantic Splitting with WordLlama

(github.com)

3 points | by deepsquirrelnet 9 months ago ago

4 comments

deepsquirrelnet 9 months ago ago
Over the last few weeks, I've been working hard at adding semantic splitting/chunking into WordLlama, and I'm excited to share what I came up with. Because of the fast and lightweight nature of WordLlama, I felt it was a great application for the platform, and aligns with our goal of creating a useful utility for LLM-related interfacing tasks.
In this blog post, I demonstrate the methodology that I arrived at. At the end, I show semantic splitting on a 1 million character text of "The Lord of the Rings". The new (python) method `wl.split(text)` executes on a single cpu core of my T480 Thinkpad in 700ms.
You might use a feature like this when chunking for building knowledge bases (RAG) -- or extracting and filtering text to send to an LLM (combining wl.spit(...) and wl.filter(...)) in online applications.
I hope you enjoy the technical deep dive, and find this feature useful.
[-]