1 comment

  • badmonster 13 hours ago

    A subtle but powerful insight: large multimodal models like CLIP don’t just learn individual concepts. They also depend heavily on how often those concepts appear together in the training data.
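
To make "how often concepts appear together" concrete, here is a minimal sketch of counting concept co-occurrence in image captions. The captions, the concept list, and the helper name `cooccurrence_counts` are all made up for illustration; this is not how CLIP's training data was actually analyzed, just one simple way to look at the statistic being discussed.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(captions, concepts):
    """Count how often each concept, and each pair of concepts, shows up in captions."""
    single = Counter()
    pairs = Counter()
    for caption in captions:
        words = set(caption.lower().split())
        # Sort so each pair is always stored in the same (alphabetical) order.
        present = sorted(c for c in concepts if c in words)
        single.update(present)
        pairs.update(combinations(present, 2))
    return single, pairs


# Toy captions (hypothetical data, not from any real dataset).
captions = [
    "a dog on a beach",
    "a dog chasing a ball on a beach",
    "a cat on a sofa",
    "a dog on a sofa",
]
concepts = {"dog", "cat", "beach", "sofa"}

single, pairs = cooccurrence_counts(captions, concepts)
print(single)
print(pairs)
```

In this toy set, "cat" and "beach" each appear on their own but never in the same caption, which is the kind of gap the comment is pointing at: a model can have seen both concepts and still have seen the combination rarely or not at all.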