What's the performance like compared to tesseract? I don't see tesseract mentioned anywhere in the readme, which is surprising considering that's the number one tool most go to for Image > text OCR.
No rigorous eval, and I love Tesseract. Here's the example that motivated me to build textsnap (it's in the GitHub README), parsed with Tesseract:
https://imgur.com/a/i2eQra8
Very noticeable difference and the exact issue I run into repeatedly with Tesseract! Definitely going to try dropping textsnap into my scripts now. Thanks!!
Curious how it does on multi-page scanned PDFs vs. single screenshots? The ORT vision/decoder split is the part that usually makes or breaks CPU VLM OCR...
I had to extract the page images from the PDF for it to work, then run it on each extracted page image.
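For anyone who wants to script that two-step workaround, here is a rough sketch. It assumes Poppler's `pdftoppm` is installed for rasterizing the pages. `ocr_page` is a hypothetical callable standing in for however you invoke textsnap on a single image; it is not textsnap's actual API, and the helper names are made up for illustration.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable, List


def rasterize_command(pdf: Path, out_prefix: Path, dpi: int = 300) -> List[str]:
    """Build the pdftoppm call that writes one PNG per page as <out_prefix>-<n>.png."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(out_prefix)]


def page_number(image: Path) -> int:
    """Read the trailing page number that pdftoppm appends to each file name."""
    match = re.search(r"-(\d+)$", image.stem)
    if match is None:
        raise ValueError(f"no page number in {image.name}")
    return int(match.group(1))


def ocr_pdf(pdf: Path, workdir: Path, ocr_page: Callable[[Path], str]) -> List[str]:
    """Rasterize every page, then run ocr_page on each image in page order."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf, workdir / "page"), check=True)
    # Sort numerically so page 10 never lands before page 2.
    images = sorted(workdir.glob("page-*.png"), key=page_number)
    # ocr_page is an assumption: plug in whatever runs the OCR on one image.
    return [ocr_page(image) for image in images]
```

Calling `ocr_pdf(Path("scan.pdf"), Path("pages"), my_ocr)` would return one string per page, which you can join or write out however you like. The OCR step is passed in as a function so the same loop works with any tool.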
How well do you think this will work with code? I mean taking code screenshots and converting them into actual code for VS Code.
Just ran
and got this
What was the reason for adopting PaddleOCR? Can other OCR models be used as well?
No reason other than their Q4 model working reasonably well and fast on my CPU laptop. It should work with any ONNX VLM model.
Roman alphabet only or does this work with other alphabets?
109 languages, including other alphabets.
Very cool, I'm building my own local-first product as well
thank you! what is it about?
Now this is legit cool, keep up the great work.
thank you!