This is nice!
How do you extract the data from PDFs or images? How do you reduce inaccuracies in this process?
It's a combination of an LLM and some pre- and post-processing. Data extraction itself has been fairly accurate in my experience. The bigger challenge has been biomarker name normalization, because different labs often name the same biomarker quite differently.
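To make the normalization problem concrete, here is a rough sketch of what a post-processing step like this might look like. It is not the actual implementation: the alias table, the canonical names, the `normalize_biomarker` function, and the 0.85 cutoff are all hypothetical, chosen only to illustrate an exact alias lookup with a fuzzy fallback.

```python
import difflib
import re

# Hypothetical alias table: canonical name -> spellings seen on lab reports.
# A real table would be much larger and curated by hand.
ALIASES = {
    "hemoglobin_a1c": ["HbA1c", "Hemoglobin A1c", "Glycated Hemoglobin", "A1C"],
    "ldl_cholesterol": ["LDL", "LDL-C", "LDL Cholesterol", "Low Density Lipoprotein"],
    "vitamin_d_25_oh": ["Vitamin D, 25-Hydroxy", "25-OH Vitamin D", "25(OH)D"],
}


def _key(name):
    """Lowercase and collapse punctuation/whitespace so spellings compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


# Reverse index: cleaned alias -> canonical name.
LOOKUP = {
    _key(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def normalize_biomarker(raw, cutoff=0.85):
    """Return the canonical name for a raw label, or None if nothing matches."""
    key = _key(raw)
    if key in LOOKUP:
        return LOOKUP[key]
    # Fuzzy fallback for typos and OCR noise; the cutoff is an assumed value.
    close = difflib.get_close_matches(key, list(LOOKUP), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None


print(normalize_biomarker("LDL-C"))
print(normalize_biomarker("Hemoglobn A1c"))
```

Returning `None` for unknown labels instead of guessing lets unmatched names be queued for manual review. Fuzzy matching is risky for short abbreviations (HDL vs. LDL differ by one letter), so a fairly strict cutoff is probably the safer default.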
Thanks, sounds interesting!