SolidStart - Hacker News

KetoManx64 6 hours ago

What's the performance like compared to Tesseract? I don't see Tesseract mentioned anywhere in the README, which is surprising considering it's the tool most people reach for first for image-to-text OCR.

mrkn1 6 hours ago

No rigorous eval, and I love Tesseract. Here's the example that motivated me to build textsnap (the same one shown in the GitHub README), parsed with Tesseract:

https://imgur.com/a/i2eQra8

KetoManx64 2 hours ago

Very noticeable difference, and the exact issue I run into repeatedly with Tesseract! Definitely going to try dropping textsnap into my scripts now. Thanks!!

abstract257 14 hours ago

Curious how it does on multi-page scanned PDFs vs. single screenshots? The ORT vision/decoder split is the part that usually makes or breaks CPU VLM OCR...

krunck 14 hours ago

It didn't work on the PDF directly. I had to extract the pages from the PDF as images first, then run it on each extracted page image.
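
For anyone hitting the same thing, here is a rough Python sketch of that workaround: rasterize each page with poppler's pdftoppm, then run the OCR command on every page image. Two assumptions that are not confirmed anywhere in this thread: that textsnap accepts a local image path (only a URL invocation is shown here) and that it prints the recognized text to stdout. The helper names and file names are made up for illustration.

```python
import subprocess
from pathlib import Path


def page_number(path: Path) -> int:
    """Extract the page number from a pdftoppm output name such as page-07.png."""
    return int(path.stem.rsplit("-", 1)[1])


def pdf_to_page_images(pdf_path: str, out_dir: str, dpi: int = 200) -> list[Path]:
    """Rasterize every page of a PDF to PNG with poppler's pdftoppm."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # pdftoppm writes <prefix>-<page number>.png, one file per page.
    subprocess.run(
        ["pdftoppm", "-png", "-r", str(dpi), pdf_path, str(out / "page")],
        check=True,
    )
    # Sort numerically so page-10 comes after page-2 even without zero padding.
    return sorted(out.glob("page-*.png"), key=page_number)


def ocr_pages(images, ocr_cmd=("textsnap",)) -> list[str]:
    """Run the OCR command once per image and collect its stdout.

    ASSUMPTION: the command takes a local image path as its last argument
    and prints the recognized text to stdout.
    """
    texts = []
    for image in images:
        result = subprocess.run(
            [*ocr_cmd, str(image)],
            check=True,
            capture_output=True,
            text=True,
        )
        texts.append(result.stdout)
    return texts


# Example usage (hypothetical file names):
#   pages = pdf_to_page_images("scan.pdf", "scan_pages")
#   print("\n\n".join(ocr_pages(pages)))
```

Rendering at a higher DPI usually helps OCR on small print, at the cost of larger images and slower runs, so the 200 default here is only a starting point.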

vivzkestrel 12 hours ago

How well do you think this'll work with code? I mean taking code screenshots and converting them into actual code for VS Code.

mrkn1 12 hours ago

Just ran

  textsnap "https://i.ytimg.com/vi/LBNDfxjEYlA/maxresdefault.jpg"

and got this

  $('.count').each(function () {
  $('this').prop('Counter', 0).animate({
    Counter: $('this').text()
  }, {
      duration: 4000,
      easing: 'swing',
      step: 'function (now) {
          $('this").text(Math.ceil(now));
      }
    }); 
  });

monosma 12 hours ago

What was the reason for adopting PaddleOCR? Can other OCR models be used as well?

mrkn1 12 hours ago

No reason other than their Q4 model working reasonably well and fast on my CPU-only laptop. It should work with any ONNX VLM model.

kouru225 12 hours ago

Roman alphabet only, or does this work with other alphabets?

mrkn1 12 hours ago

109 languages, including other alphabets.

garrett2558 16 hours ago

Very cool, I'm building my own local-first product as well