The key idea is to bring text that describes an image close to the image itself in the embedding space (often referred to as the latent space). We take the text embedding and the corresponding image embedding and make sure they lie very close together in this space, which means they represent a similar concept.
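To make "close in the embedding space" concrete, here is a minimal sketch that measures closeness with cosine similarity. The embedding vectors, the captions, the file names, and the `best_match` helper are all hypothetical and chosen by hand for illustration; in practice the vectors would come from trained text and image encoders, and cosine similarity is only one common choice of closeness measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, made up for this example.
# Real embeddings typically have hundreds of dimensions.
text_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a car": [0.0, 0.2, 0.9],
}
image_embeddings = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.1, 0.95],
}

def best_match(text):
    """Return the image whose embedding is closest to the text embedding."""
    t = text_embeddings[text]
    return max(
        image_embeddings,
        key=lambda name: cosine_similarity(t, image_embeddings[name]),
    )

for caption in text_embeddings:
    print(caption, "->", best_match(caption))
```

If the text and image embeddings really are aligned, each caption scores highest against its own image, so a nearest-neighbor lookup like `best_match` recovers the right pairing.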