Ask HN: Best Embedding Models?

14 points | by devstein 12 hours ago

10 comments

  • emschwartz 3 hours ago

    I’ve been using MixedBread, which is a pretty old model at this point. Recently, I tried comparing it to some newer models and was disappointed that the results weren’t dramatically and uniformly better.

    You probably can’t go wrong if you pick a recent one that scores decently well on benchmarks and is at the right price point (or memory requirement) for whatever you’re trying to do.
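
To run a comparison like the one above on your own data, one common check is recall@k. Embed a small set of queries and documents with each candidate model, then measure how often the known-relevant document lands in the top k by cosine similarity. Below is a minimal stdlib-only sketch. It assumes you already have the vectors from whichever model you are testing. `cosine` and `recall_at_k` are hypothetical helpers, not any library's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(
    query_vecs: list[list[float]],
    doc_vecs: list[list[float]],
    relevant: list[int],
    k: int = 5,
) -> float:
    """Fraction of queries whose known-relevant document ranks in the top k.

    relevant[i] is the index (into doc_vecs) of the correct document for query i.
    Hypothetical helper for illustration only.
    """
    hits = 0
    for q, target in zip(query_vecs, relevant):
        # Rank every document by similarity to this query, best first.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda di: cosine(q, doc_vecs[di]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)
```

Run the same query/document set through each model and compare the numbers. That shows whether a newer model actually helps on your data.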

  • rapatel0 11 hours ago

    I've liked Qwen and EmbeddingGemma for local search: Qwen because its 32K context window is enough to fit basically a whole page, and EmbeddingGemma because it's crazy efficient.
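
Whatever model produces the vectors, basic local search usually works the same way. You embed each page (or chunk) once, embed the query, and rank pages by cosine similarity. Below is a brute-force sketch. It assumes the vectors come from whichever model you run locally. `normalize` and `top_k` are hypothetical helpers for illustration.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def top_k(
    query: list[float], corpus: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Brute-force search: return the k (doc_id, cosine score) pairs closest to query."""
    q = normalize(query)
    scored = [
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in corpus.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In practice you would normalize the corpus vectors once at indexing time rather than on every query. For large corpora, an approximate nearest-neighbor index usually replaces this linear scan.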

  • LogicCraft678 4 hours ago

    Feels like embeddings are underrated compared to the LLM hype, but they're doing great.

    • Alifatisk 19 minutes ago

      Why do you feel like embeddings are underrated? What is it with embeddings that deserves more attention?

  • PhilippGille 9 hours ago

    Benchmarks only paint part of the picture, but it's still a decent place to start looking into recent models:

    https://huggingface.co/spaces/mteb/leaderboard

  • didgeoridoo 5 hours ago

    I’m partial to jina.ai — they have open models for code and prose, all easily runnable locally.

  • Yogeshshirsath 2 hours ago

    E5 (Microsoft)

  • jayshah5696 10 hours ago

    Embeddings are easy to fine-tune. Try ModernBERT.
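
The comment doesn't name a recipe. One common objective for fine-tuning embedding models on (query, relevant passage) pairs is an in-batch-negatives contrastive loss (InfoNCE). For each anchor, its own positive is the target and every other positive in the batch serves as a negative. The sketch below only computes the loss value in plain Python to show the idea. Real fine-tuning would compute it inside an autodiff framework to update the model. The function name and the `scale` default of 20 are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def in_batch_contrastive_loss(
    anchors: list[list[float]], positives: list[list[float]], scale: float = 20.0
) -> float:
    """InfoNCE-style loss (hypothetical helper): anchor i should match positive i,
    and all other positives in the batch act as negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        # Scaled cosine scores against every positive in the batch.
        logits = [scale * cosine(a, p) for p in positives]
        # Cross-entropy for the correct pair, using a stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)
```

Matching pairs drive the loss toward zero, and mismatched pairs make it large. That gap is the signal fine-tuning uses to pull relevant pairs together.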

  • frederickabrah 4 hours ago

    Does anyone know a tool for rug checks in crypto?

  • halvorbuilds 4 hours ago

    gemma4