Show HN: Velvet – Store OpenAI requests in your own DB

(usevelvet.com)

109 points | by elawler24 5 days ago ago

54 comments

  • DeveloperErrata 5 days ago ago

    Seems neat - I'm not sure if you do anything like this, but one thing that would be useful with RAG apps (esp. at big scales) is vector-based search over cache contents. What I mean is that users can phrase the same question (which has the same answer) in tons of different ways. If I could pass a raw user query into your cache and get back the end result for a previously computed query (even if the current phrasing is a bit different from the cached one), then not only would I avoid having to submit a new OpenAI call, but I could also avoid having to run my entire RAG pipeline. So kind of like a "meta-RAG" system that avoids having to run the actual RAG system for queries that are sufficiently similar to a cached query, or an "approximate" cache.
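
    Rough sketch of what I'm imagining (hypothetical helper names; OpenAI embeddings plus a cosine-similarity threshold over cached queries, with an arbitrary cutoff):

      import numpy as np
      from openai import OpenAI

      client = OpenAI()

      # cache of (query_embedding, final_answer) pairs, however you persist them
      cache: list[tuple[np.ndarray, str]] = []

      def embed(text: str) -> np.ndarray:
          resp = client.embeddings.create(model="text-embedding-3-small", input=text)
          return np.array(resp.data[0].embedding)

      def lookup(query: str, threshold: float = 0.9) -> str | None:
          """Return a cached answer if some previous query is close enough, else None."""
          q = embed(query)
          for cached_q, answer in cache:
              sim = float(np.dot(q, cached_q) / (np.linalg.norm(q) * np.linalg.norm(cached_q)))
              if sim >= threshold:
                  return answer  # cache hit: skip the OpenAI call and the whole RAG pipeline
          return None

      def store(query: str, answer: str) -> None:
          cache.append((embed(query), answer))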

    • davidbarker 5 days ago ago

      I was impressed by Upstash's approach to something similar with their "Semantic Cache".

      https://github.com/upstash/semantic-cache

        "Semantic Cache is a tool for caching natural text based on semantic similarity. It's ideal for any task that involves querying or retrieving information based on meaning, such as natural language classification or caching AI responses. Two pieces of text can be similar but not identical (e.g., "great places to check out in Spain" vs. "best places to visit in Spain"). Traditional caching doesn't recognize this semantic similarity and misses opportunities for reuse."
      • OutOfHere 5 days ago ago

        I strongly advise not relying on embedding distance alone for it because it'll match these two:

        1. great places to check out in Spain

        2. great places to check out in northern Spain

        Logically the two are not the same, and they could in fact be very different despite their semantic similarity. Your users will be frustrated and will hate you for it. If an LLM validates the two as being the same, then it's fine, but not otherwise.
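
        i.e. something like this as a final gate before serving a cached answer (sketch only; the prompt and model choice are placeholders):

          from openai import OpenAI

          client = OpenAI()

          def same_question(a: str, b: str) -> bool:
              """Ask an LLM whether two queries ask for exactly the same thing."""
              resp = client.chat.completions.create(
                  model="gpt-4o-mini",
                  messages=[{
                      "role": "user",
                      "content": "Do these two queries ask for exactly the same information? "
                                 f"Answer YES or NO.\n1. {a}\n2. {b}",
                  }],
              )
              return resp.choices[0].message.content.strip().upper().startswith("YES")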

        • DeveloperErrata 5 days ago ago

          I agree, a naive approach to approximate caching would probably not work for most use cases.

          I'm speculating here, but I wonder if you could use a two-stage pipeline for cache retrieval (kinda like the distance search + reranker model technique used by lots of RAG pipelines). Maybe it would be possible to fine-tune a custom reranker model to only output True if two queries are semantically equivalent rather than just similar. So the hypothetical model would output True for "how to change the oil" vs. "how to replace the oil" but would output False in your Spain example. In this case you'd do distance-based retrieval first using the normal vector DB techniques, and then use your custom reranker to validate that the potential cache hits are actual hits.
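
          Roughly like this (sketch using sentence-transformers; the checkpoint is an off-the-shelf stand-in for the hypothetical fine-tuned equivalence model, and the threshold is arbitrary):

            from sentence_transformers import CrossEncoder

            # Stage 1 is the usual vector-DB lookup that returns candidate cached queries.
            # Stage 2: a cross-encoder scores whether each candidate is equivalent to the new query.
            reranker = CrossEncoder("cross-encoder/stsb-roberta-base")  # stand-in checkpoint

            def validate_hits(user_query: str, candidates: list[str],
                              threshold: float = 0.9) -> list[str]:
                pairs = [(user_query, c) for c in candidates]
                scores = reranker.predict(pairs)
                return [c for c, s in zip(candidates, scores) if s >= threshold]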

          • OutOfHere 5 days ago ago

            Any LLM can output it, but yes, a tuned LLM can benefit from a shorter prompt.

        • jankovicsandras 4 days ago ago

          A hybrid search approach might help, like combining vector similarity scores with e.g. BM25 scores.

          Shameless plug (FOSS): https://github.com/jankovicsandras/plpgsql_bm25 - Okapi BM25 search implemented in PL/pgSQL for Postgres.
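
          For example, a simple score fusion over the two result sets (sketch; the weight is arbitrary, and the score dicts would come from your vector search and BM25 queries):

            def hybrid_scores(vector_scores: dict[str, float],
                              bm25_scores: dict[str, float],
                              alpha: float = 0.5) -> dict[str, float]:
                """Blend min-max-normalized vector and BM25 scores per document ID."""
                def normalize(scores: dict[str, float]) -> dict[str, float]:
                    if not scores:
                        return {}
                    lo, hi = min(scores.values()), max(scores.values())
                    return {k: (v - lo) / (hi - lo) if hi > lo else 1.0 for k, v in scores.items()}

                v, b = normalize(vector_scores), normalize(bm25_scores)
                return {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
                        for doc in set(v) | set(b)}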

    • OutOfHere 5 days ago ago

      That would totally destroy the user experience. Users change their query so they can get a refined result, not so they get the same tired result.

      • pedrosorio 5 days ago ago

        Even across users it’s a terrible idea.

        Even in the simplest of applications where all you’re doing is passing “last user query” + “retrieved articles” into OpenAI (and nothing else that is different between users, like previous queries or user data that may be necessary to answer), this will be a bad experience in many cases.

        Queries A and B may have similar embeddings (similar topic) and it may be correct to retrieve the same articles for context (which you could cache), but they can still be different questions with different correct answers.

      • elawler24 5 days ago ago

        Depends on the scenario. In a threaded query, or multiple queries from the same user - you’d want different outputs. If 20 different users are looking for the same result - a cache would return the right answer immediately for no marginal cost.

        • OutOfHere 5 days ago ago

          That's not the use case of the parent comment:

          > for queries that are sufficiently similar

    • elawler24 5 days ago ago

      Thanks for the detail! This is a use case we plan to support, and it will be configurable (for when you don’t want it). Some of our customers run into this when different users ask a similar query - “NY-based consumer founders” vs “consumer founders in NY”.

  • OutOfHere 5 days ago ago

    A cache is better when it's local rather than on the web. And I certainly don't need to pay anyone to cache local request responses.

    • knowaveragejoe 5 days ago ago

      How would one achieve something similarly locally, short of just running a proxy and stuffing the request/response pairs into a DB? I'm sure it wouldn't be too terribly hard to write something, but I figure something open source already exists for OpenAI-compatible APIs.
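
      Something like this is what I had in mind - an untested sketch of a pass-through proxy (FastAPI + httpx) that stuffs request/response pairs into SQLite; it ignores streaming and error handling:

        import sqlite3

        import httpx
        from fastapi import FastAPI, Request
        from fastapi.responses import Response

        app = FastAPI()
        db = sqlite3.connect("llm_logs.db", check_same_thread=False)
        db.execute("CREATE TABLE IF NOT EXISTS logs (path TEXT, request TEXT, response TEXT)")

        @app.post("/v1/{path:path}")
        async def proxy(path: str, request: Request):
            body = await request.body()
            # forward the request unchanged to the upstream OpenAI-compatible API
            async with httpx.AsyncClient(timeout=120) as client:
                upstream = await client.post(
                    f"https://api.openai.com/v1/{path}",
                    content=body,
                    headers={"Authorization": request.headers.get("authorization", ""),
                             "Content-Type": "application/json"},
                )
            # stuff the request/response pair into the DB
            db.execute("INSERT INTO logs VALUES (?, ?, ?)", (path, body.decode(), upstream.text))
            db.commit()
            return Response(content=upstream.content, status_code=upstream.status_code,
                            media_type=upstream.headers.get("content-type"))

      Then you'd point the OpenAI SDK's base_url at http://localhost:8000/v1 (uvicorn's default port) and leave the rest of the app unchanged.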

      • w-ll 5 days ago ago

        Recently did this workflow.

        Started with an nginx proxy with rules to cache based on URL/params. Wanted more control over it and explored Lua/Redis APIs, then opted to build an app to be a little smarter about what I wanted. The extra EC2 cost is negligible compared to the cache savings.

        • doubleorseven 5 days ago ago

          Yes! It's amazing how many things you can do with Lua in nginx. I had a server that served static websites where the files and the certificates for each website were stored in a bucket. Over 20k websites, with 220ms of overhead if the certificate wasn't cached.

      • OutOfHere 5 days ago ago

        There are any number of databases and language-specific caching libraries. A custom solution or the use of a proxy isn't necessary.
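
        For example, in Python a few lines with diskcache and a hash of the request arguments will do it (sketch):

          import hashlib
          import json

          from diskcache import Cache
          from openai import OpenAI

          client = OpenAI()
          cache = Cache("./llm_cache")  # persistent local cache directory

          def cached_completion(**kwargs) -> str:
              # key the cache on the full request payload
              key = hashlib.sha256(json.dumps(kwargs, sort_keys=True).encode()).hexdigest()
              if key in cache:
                  return cache[key]
              text = client.chat.completions.create(**kwargs).choices[0].message.content
              cache[key] = text
              return text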

    • nemothekid 5 days ago ago

      As I understand it, your data remains local, as it leverages your own database.

      • manojlds 5 days ago ago

        Why do I even have to use this SaaS? This should be an open source lib, or just a practice that I implement myself.

        • dsmurrell 5 days ago ago

          Implement it yourself then and save your $$ at the expense of your time.

          • torlok 4 days ago ago

            If you factor in dealing with somebody's black box code 6 months into a project, you'll realise you're saving both money and time.

            • OutOfHere 4 days ago ago

              It's not as complicated as you make it. There are numerous caching libraries, and databases have been a thing for decades.

          • manojlds 4 days ago ago

            Like, this is not a big thing to implement - that's my point. There are already libraries like OpenLLMetry that sink to a DB. We are doing something like this already.

            • nemothekid 4 days ago ago

              Yes, the ol' Dropbox "you can already build such a system yourself quite trivially by getting an FTP account" comment. Even after 17 years, people still feel the need to make this point.

        • heavensteeth 5 days ago ago

          So they can charge you for it.

  • phillipcarter 5 days ago ago

    Congrats on the launch! I love the devex here and the things you're focusing on.

    Have you had thoughts on how you might integrate data from an upstream RAG pipeline, say as a part of a distributed trace, to aid in debugging the core "am I talking to the LLM the right way" use case?

    • elawler24 5 days ago ago

      Thanks! You can layer on as much detail as you need by including meta tags in the header, which is useful for tracing RAG and agent pipelines. But would love to understand your particular RAG setup and whether that gives you enough granularity. Feel free to email me too - emma@usevelvet.com

  • angoragoats 4 days ago ago

    I don't understand the problem that's being solved here. At the scale you're talking about (e.g. millions of requests per day with FindAI), why would I want to house immutable log data inside a relational database, presumably alongside actual relational data that's critical to my app? It's only going to bog down the app for my users.

    There are plenty of other solutions (examples include Presto, Athena, Redshift, or straight up jq over raw log files on disk) which are better suited for this use case. Storing log data in a relational DB is pretty much always an anti-pattern, in my experience.

    • philip1209 4 days ago ago

      Philip here from Find AI. We store our Velvet logs in a dedicated DB. It's Postgres now, but we will probably move it to ClickHouse at some point. Our main app DB is in Postgres, so everybody just knows how it works and all of our existing BI tools support it.

      Here's a video about what we do with the data: https://www.youtube.com/watch?v=KaFkRi5ESi8

    • elawler24 4 days ago ago

      It's a standalone DB, just for LLM logging. Since it's your DB, you can configure data retention and migrate data to an analytics DB / warehouse if cost or latency becomes a concern. And we're happy to support whatever DB you require (ClickHouse, BigQuery, Snowflake, etc.) in a managed deployment.

      • angoragoats 4 days ago ago

        I guess I should have elaborated to say that even if you're spinning up a new database expressly for this purpose (which I didn't see specifically called out in your docs anywhere as a best practice), you're starting off on the wrong foot. Maybe I'm old-school, but relational databases should be for relational data. This data isn't relational, it's write-once log data, and it belongs in files on disk, or in purpose-built analytics tools, if it gets too large to manage.

        • elawler24 4 days ago ago

          Got it. We can store logs to your purpose-built analytics DB of choice.

          PostgreSQL (Neon) is our free self-serve offering because it’s easy to spin up quickly.

  • simple10 5 days ago ago

    Looks cool. Just out of curiosity, how does this compare to other OpenLLMetry-type observation tools like Arize, Traceloop, LangSmith, LlamaTrace, etc.?

    From personal experience, they're all pretty simple to install and use. Then mileage varies in analyzing and taking action on the logs. Does Velvet offer something the others do not?

    For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

    RAG support would be great to add to Velvet. Specifically pgvector and pinecone traces. But maybe Velvet already supports it and I missed it in the quick read of the docs.

    • elawler24 5 days ago ago

      Velvet takes <5 mins to get set up in any language, which is why we started as a proxy. We offer managed / custom deployments for enterprise customers, so we can support your client requirements.

      We warehouse logs directly to your DB, so you can do whatever you want with the data. Build company ops on top of the DB, run your own evals, join with other tables, hash data, etc.

      We’re focusing on backend eng workflows so it’s simple to run continuous monitoring, evals, and fine-tuning with any model. Our interface will focus on surfacing data and analytics to PMs and researchers.

      For pgvector/pinecone RAG traces - you can start by including meta tags in the header. Those values will be queryable in the JSON object.
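
      With the OpenAI SDK that's just extra headers on the request, roughly like this (the base URL and header key here are illustrative):

        import json

        from openai import OpenAI

        # point the SDK at the proxy endpoint (URL illustrative)
        client = OpenAI(base_url="https://<your-velvet-endpoint>/v1")

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "NY-based consumer founders"}],
            extra_headers={
                # illustrative metadata header; these values end up queryable in the logged JSON
                "x-metadata": json.dumps({"pipeline": "rag", "vector_store": "pgvector", "trace_id": "abc123"}),
            },
        )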

      Curious to learn more though - feel free to email me at emma@usevelvet.com.

    • marcklingen 5 days ago ago

      disclosure: founder/maintainer of Langfuse (OSS LLM application observability)

      I believe proxy-based implementations like Velvet are excellent for getting started and solve for the immediate debugging use case; simply changing the base path of the OpenAI SDK makes things really simple (the other solutions mentioned typically require a few more minutes to set up).
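
      i.e., for a proxy-based tool the integration is essentially one line (URL illustrative):

        from openai import OpenAI

        # swap the SDK's base URL for the logging proxy; everything else stays the same
        client = OpenAI(base_url="https://your-proxy.example.com/v1")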

      At Langfuse (similarly to the other solutions mentioned above), we prioritize asynchronous and batched logging, which is often preferred for its scalability and zero impact on uptime and latency. We have developed numerous integrations (for OpenAI specifically, an SDK wrapper), and you can also use our SDKs and decorators to integrate with any LLM.

      > For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

      I can echo this. We observe many self-hosted deployments in larger enterprises and HIPAA-related companies, so we made it very simple to self-host Langfuse. Especially when PII is involved, self-hosting makes adopting an LLM observability tool much easier in larger teams.

  • ramon156 5 days ago ago

    > we were frustrated by the lack of LLM infrastructure

    May I ask what specifically you were frustrated about? It seems like there are more than enough solutions.

    • elawler24 5 days ago ago

      There were plenty of UI-based low-code platforms. But they required that we adopt new abstractions, use their UI, and log into 5 different tools (logging, observability, analytics, evals, fine-tuning) just to run basic software infra. We didn’t feel these would be long-term solutions, and just wanted the data in our own DB.

  • reichertjalex 4 days ago ago

    Very nice! I really like the design of the whole product, very clean and simple. Out of curiosity, do you have a designer, or did you take inspiration from any other products (for the landing page, dashboard, etc) when you were building this? I'm always curious how founders approach design these days.

    • elawler24 4 days ago ago

      I’m a product designer, so we tend to approach everything from first principles. Our aim is to keep as much complexity in code as possible, and only surface UI when it solves a problem for our users. We like using tools like Vercel and Supabase - so a lot of UI inspiration comes from the way they surface data views. The AI phase of the internet will likely be less UI focused, which allows for more integrated and simple design systems.

  • TripleChecker 4 days ago ago

    Does it support MySQL for queries/storage - or only PostgreSQL?

    Also, caught a few typos on the site: https://triplechecker.com/s/o2d2iR/usevelvet.com?v=qv9Qk

    • elawler24 4 days ago ago

      We can support any database you need; PostgreSQL is just the easiest way to get started.

  • turnsout 5 days ago ago

    Nice! Sort of like LangSmith without the LangChain, which will be an attractive value proposition to many developers.

    • efriis 5 days ago ago

      Howdy Erick from LangChain here! Just a quick clarification that LangSmith is designed to work great for folks not using LangChain as well :)

      Check out our quickstart for an example of what that looks like! https://docs.smith.langchain.com/

      • turnsout 5 days ago ago

        TIL! LangSmith is great.

  • ji_zai 5 days ago ago

    Neat! I'd love to play with this, but the site doesn't open (403: Forbidden).

    • elawler24 5 days ago ago

      Might be a Cloudflare flag. Can you email me your IP address and we'll look into it? emma@usevelvet.com.

  • codegladiator 5 days ago ago

    Error: Forbidden

    403: Forbidden ID: bom1::k5dng-1727242244208-0aa02a53f334

  • hiatus 5 days ago ago

    This seems to require sharing our data we provide to OpenAI with yet another party. I don't see any zero-retention offering.

    • elawler24 5 days ago ago

      The self-serve version is hosted (it’s easy to try locally), but we offer managed deployments where you bring your own DB. In this case your data is 100% yours, in your PostgreSQL. That’s how Find AI uses Velvet.

      • knowaveragejoe 5 days ago ago

        Where is this mentioned? Is there a GitHub repo (etc.) somewhere so that someone can use this without the hosted version?

        • elawler24 5 days ago ago

          Right now, it’s a managed service that we set up for you (we’re still a small team). Email me if you’re interested and I can share details - emma@usevelvet.com.

  • bachback 5 days ago ago

    Interesting - seems more of an enterprise offering. Is it OpenAI-only for now, and you plan to expand to other vendors? Anything open source?

    • elawler24 5 days ago ago

      We already support OpenAI and Anthropic endpoints, and can add models/endpoints quickly based on your requirements. We plan to expand to Llama and other self-hosted models soon. Do you have a specific model you want supported?

    • beepbooptheory 5 days ago ago

      I guess I don't understand what this is now. If its just proxying requests and storing in db, can't it be literally any API?

      • elawler24 5 days ago ago

        We could support any API. We’re focused on building data pipelines and tooling for LLM use cases.