Chunkr – Vision model based PDF chunking

(github.com)

43 points | by redbell 15 hours ago

4 comments

  • AlbertoGP 9 hours ago

    > [...] self-hostable solution that leverages state-of-the-art (SOTA) vision models for segment extraction and OCR, unifying the output through a Rust Actix server. This setup allows you to process PDFs and extract segments at an impressive speed of approximately 5 pages per second on a single NVIDIA L4 instance, offering a cost-effective and scalable solution for high-accuracy bounding box segment extraction and OCR. This solution has models that accommodate for both GPU and CPU environments.

  • kybernetikos 6 hours ago

    It'd be great to see some examples on the web site.

  • saaaaaam 5 hours ago

    Although the docs say “get started by creating an account on chunkr.ai” there doesn’t seem to be any way to create an account.