Hacker News

embedding-shape 21 hours ago

Video understanding is kind of new, especially when done well, and if it works well with UI and UX, that'd be great. Current agents already struggle a bit with 2D space in normal screenshots of unconventional UIs. I wonder if this model would do better with actual recordings of navigating and using applications; it feels like it could help a bunch with understanding UX at least. Will be fun to play around with :)

wxw 18 hours ago

What’s SOTA for video understanding? AFAIK most video search is powered by transcription and not the actual video. This seems impressive.

bguberfain a day ago

Any plans to port to sglang or vLLM?

cleardusk 2 hours ago

vllm-omni support is on the way : )

nkvdev a day ago

Great quality, forked and going to try

Tsarp a day ago

Nice work. Wish they had picked another name given how popular lance/lancedb is.

popalchemist a day ago

Seems like the video output is crippled. Resolution is low (720p or so), as is the frame rate. The samples are shown upscaled and frame-interpolated.

Why do that? Seems strange to be building sub-HD resolution video models in 2026.

jadbox a day ago

Sure, but again, it's a micro 3B model. Perhaps it can't be used for general video work, but it might be able to do basic edits like remove an object from a table in a shot.

MattRix a day ago

It’s not a micro model at all; it requires 40 GB of VRAM. The 3B is just the active parameters.

asadm a day ago

last dance for lance vance!

cleardusk a day ago

:D

Show HN: Lance – image/video generation and understanding in one model