Improving Accessibility Using Vision Models

(myswamp.substack.com)

41 points | by bearjaws 13 hours ago

11 comments

  • jmull 9 hours ago

    I don't understand "The Results" graph.

    The x-axis has integers, 0, 1, 2, 3, 4, 5, 6, but the text talks about models struggling at the 30 character mark? On the graph they all start getting bad around 3, depending on what you mean by bad. Is the x-axis tens of characters??

    Anyway...

    > anything longer than 20 characters would tend to have more issues, we flagged those for manual review.

    Even though the failure rate was smaller, is it OK if several of the shorter equations are wrong? Maybe they should have manually reviewed all of them.

    Edit: Now I see someone else brought up the x-axis issue. There's a response that seems to say the x-axis is buckets of 10 characters. I guess the update hasn't gone through yet.

  • gostsamo 12 hours ago

    Funnily enough, the images in the article do not have actually useful alt text and, like every image on Substack I've encountered so far, have no useful captions either.

    • bearjaws 12 hours ago

      How is the alt-text not useful? I even went through the effort of putting the data in the alt text for the bar chart. I tend to think of alt text as providing the same context as the image. For example, the line chart is meant to convey how 1.5-flash outperforms 4o, but I am not going to embed each discrete data point in the alt text.

      • SalmonSnarker 11 hours ago

        3 out of 5 images on the post have empty alt text (alt=""). most substacks are pretty careless about alt text and so previous poster is just noting that your accessibility post follows this trend. (It's worth noting the post you made previous to this has 0 out of 4 images with alt text.)
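A count like "3 out of 5 images have empty alt text" can be reproduced with a small audit script. The following is a minimal sketch using only Python's standard library; the `AltTextAudit` class, the `audit_alt_text` helper, and the sample markup are all hypothetical and not taken from the article. It separates a missing `alt` attribute from an empty one (`alt=""`), since an empty value is the conventional way to mark an image as decorative.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Tally <img> tags by the state of their alt attribute."""

    def __init__(self):
        super().__init__()
        self.counts = {"missing": 0, "empty": 0, "present": 0}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        if "alt" not in attr_map:
            # No alt attribute at all.
            self.counts["missing"] += 1
        elif not (attr_map["alt"] or "").strip():
            # alt="" or a bare/whitespace-only alt attribute.
            self.counts["empty"] += 1
        else:
            self.counts["present"] += 1


def audit_alt_text(html):
    """Return counts of missing, empty, and present alt attributes."""
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical markup for illustration only.
sample = """
<img src="chart.png" alt="Bar chart of error rate per model">
<img src="spacer.png" alt="">
<img src="equation.png">
"""
print(audit_alt_text(sample))
```

The script only looks at static markup, so it cannot tell whether an image is hidden (for example with display:none) or whether an empty alt is actually appropriate in context; those cases still need a human look.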

        • bryanrasmussen 21 minutes ago

          Looking through it, the images that are definitely content controlled by the user have alt text, that is to say the graphs. The first alt="" is inside a bit of content that is display:none and thus not available to a screen reader (I suppose the others are too), so it is not knowable whether that alt text will be filled when the area is rendered (probably not). I didn't look for the other ones, but I expect it is the same situation, because all the images I encountered that were in the writer's control had alt text.

          About the empty ones I have not investigated but there are numerous situations in which an empty alt text makes perfect sense and is a better accessibility solution for most users of screen readers than otherwise. For example if they are inside something clickable that has an aria label on it telling you how to use that part of the dom, the alt text on a child image just makes things overly verbose and annoying in most circumstances.

          I have an article in the works that touches on these issues with proposed solutions but unfortunately it would be too big to talk all about here.

          on edit: of course it is possible that, being alerted to the fact, the writer has added the alt text in.

      • gostsamo 12 hours ago

        Checking the later pictures that you talk about, the alt text is indeed there. My recommendation, though, would be to give a summary of the data and not the conclusion, e.g. Gemini Flash has an error rate of x% while the others are at y% and z%.
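One way to follow that recommendation is to generate the alt text from the chart's underlying data. This is a minimal sketch under stated assumptions: `summarize_error_rates` is a hypothetical helper, and the model names and percentages are placeholders, not the article's results.

```python
def summarize_error_rates(rates):
    """Build alt text that states each model's error rate, lowest first."""
    if not rates:
        raise ValueError("no data to summarize")
    # Sort by error rate so the best-performing model is read out first.
    ordered = sorted(rates.items(), key=lambda item: item[1])
    parts = [f"{name} {rate:.1f}%" for name, rate in ordered]
    return "Error rate by model: " + ", ".join(parts) + "."


# Placeholder names and numbers, not real measurements.
placeholder_rates = {"model-a": 12.0, "model-b": 4.5, "model-c": 20.0}
print(summarize_error_rates(placeholder_rates))
```

The resulting string gives a screen reader user the numbers themselves and leaves the conclusion to them.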

      • gostsamo 12 hours ago

        Maybe something is lost in translation, but here is what my screen reader makes of the article:

        Along the way we realized some of our math courses had not been updated in quite some time, and some schools were still leveraging these courses to teach. Images for equations are bad m’kay

        It was immediately apparent was the use of images to represent equations like this: https%3A%2F%2Fsubstack-post-me… https%3A%2F%2Fsubstack-post-me… This is not great… the font is a bit on the smaller side and the font itself is not very legible, in my non-font expert opinion. Making matters worse, there is no alt-text provided that can explain the equation.

  • pumanoir 11 hours ago

    I've had great success converting math pics to LaTeX using qwen2-vl.

  • bearjaws 12 hours ago

    Funny, Google just released gemini-1.5-flash-8b moments ago, which scores slightly lower on vision. For clarity, the article is on the "older" gemini-1.5-flash.

    https://developers.googleblog.com/en/gemini-15-flash-8b-is-n...

  • armoredkitten 12 hours ago

    What is the measurement on the x-axis in the graph?? The text is talking about equations of 20 or 30 characters, but the graph goes up to...6. Six what?? Characters? Terms? If it's characters, why do we only get to see the performance from 1-6, when apparently 7% of equations had more than 20?

    • bearjaws 12 hours ago

      That's a fair point, I bucketed them into lengths of 1-10, 11-20, 21-30. I'll do a quick update.
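The bucketing described here (1-10, 11-20, 21-30) can be sketched as follows. This is an illustration, not the author's actual code: `length_bucket` and `bucket_counts` are hypothetical names, and it assumes "length" means the character count of the equation string.

```python
from collections import Counter


def length_bucket(equation, width=10):
    """Return a label such as '1-10' or '11-20' for the equation's length."""
    n = len(equation)
    if n == 0:
        raise ValueError("empty equation")
    # Lengths 1..width map to index 0, width+1..2*width to index 1, and so on.
    index = (n - 1) // width
    low = index * width + 1
    return f"{low}-{low + width - 1}"


def bucket_counts(equations, width=10):
    """Count how many equations fall into each length bucket."""
    return Counter(length_bucket(eq, width) for eq in equations)


# Hypothetical equations for illustration only.
examples = ["x+1=2", "a^2+b^2=c^2", "y=mx+b", "\\frac{a}{b}+\\frac{c}{d}=1"]
print(bucket_counts(examples))
```

Labelling the axis with these bucket ranges, rather than bucket indices, would avoid the confusion about what the integers on the graph mean.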