3 comments

  • mergisi 2 days ago

    The "zero-config" angle is strong here — the biggest friction with pgvector is the manual pipeline: choose an embedding model, write the chunking logic, create the index, tune the distance metric. Abstracting that into a single CLI command removes the part that stops most teams from even starting.
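    For anyone who hasn't done this by hand, a minimal sketch of the manual pgvector pipeline described above (table name, column names, and the 1536 dimension are illustrative, matching a common embedding model size):

    ```sql
    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE docs (
      id        bigserial PRIMARY KEY,
      body      text,            -- chunked text goes here
      embedding vector(1536)     -- filled in by your own embedding code
    );

    -- You also pick the distance metric up front: cosine here,
    -- via the vector_cosine_ops operator class on an HNSW index.
    CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
    ```

    And that's before any of the chunking or embedding-generation code, which lives outside the database entirely.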

    Question on the embedding side: are you generating embeddings at INSERT time via triggers, or is there a batch sync step? The trigger approach gives you real-time search but adds write latency. Batch sync is friendlier for high-throughput tables but means search results can lag behind.
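    A third option splits the difference: a trigger that only *enqueues* work, with an external worker draining the queue in batches. Pure SQL can't call the embedding model anyway, so the trigger just records which rows are stale. A sketch, assuming a hypothetical `docs(id, body, embedding)` table:

    ```sql
    -- Queue of rows needing (re-)embedding; PK collapses duplicates.
    CREATE TABLE embedding_queue (doc_id bigint PRIMARY KEY);

    CREATE OR REPLACE FUNCTION enqueue_for_embedding() RETURNS trigger AS $$
    BEGIN
      INSERT INTO embedding_queue (doc_id) VALUES (NEW.id)
      ON CONFLICT (doc_id) DO NOTHING;  -- repeated updates enqueue once
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER docs_enqueue_embedding
      AFTER INSERT OR UPDATE OF body ON docs
      FOR EACH ROW EXECUTE FUNCTION enqueue_for_embedding();
    ```

    Write latency stays low (one small insert inside the transaction) and the worker controls batch size, at the cost of the same staleness window as batch sync.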

    Also curious how you handle schema evolution — if someone adds a new text column they want searchable, does pgsemantic pick that up automatically or require a re-config?

    One pattern I've seen work well alongside semantic search: pairing it with natural language → SQL translation (tools like ai2sql.io do this) so users can combine structured filters with vector similarity in a single query. Something like "find invoices from Q1 similar to 'billing dispute'" where the date filter is SQL and the similarity part is pgvector. That hybrid query pattern is where most real-world use cases end up.
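    The hybrid query itself is straightforward in pgvector terms. A sketch of what the translated invoice example might look like (schema is hypothetical; the query embedding for 'billing dispute' is computed client-side and passed in as a parameter):

    ```sql
    -- Hypothetical schema: invoices(id, invoice_date date, notes text,
    --                               embedding vector(1536))
    SELECT id, invoice_date, notes
    FROM invoices
    WHERE invoice_date >= DATE '2025-01-01'   -- structured SQL filter: Q1
      AND invoice_date <  DATE '2025-04-01'
    ORDER BY embedding <=> $1                 -- <=> is pgvector cosine distance
    LIMIT 10;
    ```

    One caveat worth knowing: with an approximate index like HNSW, the WHERE clause is applied after the nearest-neighbor scan, so highly selective filters can starve the result set unless you raise the scan's candidate count.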
