Am I able to upload a book and have it respond truthfully to the book in a way that's superior to NotebookLM or similar? Generally most long context solutions are very poor. Or does the data have to be in a specific format?
To get the outcome you want, RAG (retrieval augmented generation) would be the way to go, not fine-tuning. Fine-tuning doesn't make the model memorize specific content like a book. It teaches new behaviors or styles. RAG allows the model to access and reference the book during inference. Our platform focuses on fine-tuning with structured datasets, so data needs to be in a specific format.
This is a very common topic, so I wrote a blog post that explains the difference between fine-tuning and RAG if you're interested: https://finetunedb.com/blog/fine-tuning-vs-rag
These days, I'd say the easiest and most effective approach is to put the whole book in the context of one of the longer context models.
Agreed, for this use case probably the easiest way to go.
(and most expensive)
Agreed too
Not really; for something like Gemini, the accuracy and performance are very poor.
The magic behind NotebookLM can't be replicated with fine-tuning alone. It's all about the workflow, from the chunking strategy to retrieval, etc.
For a defined specific use-case it's certainly possible to beat their performance, but things get harder when you try to create a general solution.
To answer your question, the format of the data depends entirely on the use-case and how many examples you have. The more examples you have, the more flexible you can be.
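To make the "chunking plus retrieval" workflow concrete, here's a toy sketch. It uses plain keyword overlap in place of embedding search, so it's a simplification of what a real RAG pipeline (or NotebookLM) does, but the shape is the same: split the book, retrieve the relevant chunks, build the prompt.

```python
# Toy RAG pipeline: chunk a document, retrieve the most relevant
# chunk(s) by keyword overlap, and assemble a grounded prompt.
# Real systems use embeddings and a vector store for retrieval.

def chunk_text(text, chunk_size=50):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def retrieve(chunks, query, top_k=1):
    """Rank chunks by how many query words they share."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(context_chunks, query):
    """Assemble a prompt that grounds the model in retrieved text."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

book = "The whale is first sighted off the coast in chapter three. " * 30
query = "When is the whale first sighted?"
prompt = build_prompt(retrieve(chunk_text(book), query), query)
```

Swapping the overlap scoring for embedding similarity (and adding re-ranking, citations, etc.) is where the real engineering work lives.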
Was looking for a solution like this for a few weeks, and started coding my own yesterday. Thank you for launching! Excited to give it a shot.
Question: when do you expect to release your Python SDK?
There hasn't been a significant demand for the Python SDK yet, so for now we suggest interacting with the API directly.
With that being said, feel free to email us with your use-case, I could build the SDK within a few days!
If you currently have an SDK in any of the five major languages, or if your API is well documented in a structured way, it should be very easy to write an SDK in Python, Go, or anything else LLMs know well.
Main requirement is to programmatically send my chat logs. Not a big deal though, thanks!
Ah I see, got it. For now the API should work fine for that!
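For anyone else wanting to send chat logs programmatically without an SDK, something like the following works with the standard library alone. The endpoint path and payload shape below are assumptions for illustration; check the actual API docs for the real schema before using it.

```python
import json
import urllib.request

# Hypothetical endpoint -- check the FinetuneDB API docs for the real
# path and payload schema; this is an illustrative assumption.
API_URL = "https://app.finetunedb.com/api/v1/logs"

def build_log_payload(messages):
    """Serialize a list of chat messages into a JSON request body.
    The {"messages": [...]} shape is assumed, not documented here."""
    return json.dumps({"messages": messages})

def send_chat_log(api_key, messages):
    """POST chat messages to the (assumed) logs endpoint."""
    req = urllib.request.Request(
        API_URL,
        data=build_log_payload(messages).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```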
Very happy to hear, please do reach out to us with any feedback or questions via founders@finetunedb.com
Looks pretty cool, congrats so far! Do you allow downloading the fine tuned model for local inference?
Thank you, and yes that is possible. Which model are you looking to fine-tune?
If that's the case then I'll try the platform out :) I want to fine-tune Codestral or Qwen2.5-coder on a custom codebase. Thank you for the response! Are there any docs or info on the compatibility of the downloaded models, i.e. will they work right away with llama.cpp?
We don't support Codestral or Qwen2.5-coder right out of the box for now, but depending on your use-case we certainly could add it.
We use LoRA for smaller models and QLoRA (quantized LoRA) for 70B+ models to improve training speed, so when you download model weights, what you get is the adapter weights plus adapter_config.json. They should work with llama.cpp!
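As a quick sanity check on a downloaded adapter, you can inspect adapter_config.json before pointing llama.cpp at it. The field names below (`r`, `lora_alpha`) are the standard PEFT/LoRA ones; I'm assuming the platform emits that format, so verify against your actual file.

```python
import json

def check_adapter_config(config_text):
    """Parse adapter_config.json and extract the LoRA rank and the
    effective scale (alpha / rank). Field names follow the standard
    PEFT format; a given platform's output may differ."""
    cfg = json.loads(config_text)
    rank = cfg["r"]            # LoRA rank
    alpha = cfg["lora_alpha"]  # scaling numerator
    return {"rank": rank, "alpha": alpha, "scale": alpha / rank}

# Example config text (illustrative values):
example = '{"r": 16, "lora_alpha": 32, "base_model_name_or_path": "meta-llama/Llama-3.1-8B"}'
# check_adapter_config(example) -> {"rank": 16, "alpha": 32, "scale": 2.0}
```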
Looks nice. What is the price and what does it depend on?
Thanks! We have a free tier with limited features. Our pro plan starts at €50 per seat per month and includes all features. Teams often collaborate with domain experts to create datasets. And for custom integrations, we offer custom plans on request.
More details here: https://docs.finetunedb.com/getting-started/pricing
Any specific features or use cases you're interested in?
Hey all, co-founder here happy to answer any questions!
Is it possible to fine-tune language models using plain text completions, or is it necessary to use datasets consisting of structured conversations?
Yes, you can fine-tune using plain text completions. You don't need structured conversations unless you want conversational abilities. Plain text works great if you want the model to generate text in a specific style or domain. It all depends on what you're trying to achieve.
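To make that concrete: a completion-style dataset is usually just prompt/completion pairs, one JSON object per line (JSONL). The field names vary by platform, so treat these as illustrative rather than the platform's exact schema.

```python
import json

# Completion-style training examples: no conversation structure,
# just text in, text out. Field names are illustrative; check the
# platform docs for the exact schema.
examples = [
    {"prompt": "Once upon a time", "completion": " there lived a dragon."},
    {"prompt": "The quarterly report shows", "completion": " revenue grew 12%."},
]

# One JSON object per line is the usual JSONL layout.
jsonl = "\n".join(json.dumps(e) for e in examples)
```

A conversational dataset would instead nest a list of role/content messages per example, which is only needed if you want chat behavior.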
What's the cost of fine tuning and then serving a model, say Llama 3 8B or 70B? I couldn't find anything on the website...
Hi, current pricing for Llama 3.1 8B, for example, is: training tokens $2 / 1M; input and output tokens $0.30 / 1M. We'll update the pricing on the website shortly to reflect this.
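At those rates the cost estimate is simple arithmetic. A sketch, with the rates hard-coded from the Llama 3.1 8B figures above (token volumes in millions):

```python
TRAIN_PER_M = 2.00   # $ per 1M training tokens (Llama 3.1 8B)
INFER_PER_M = 0.30   # $ per 1M input or output tokens

def estimate_cost(train_tokens_m, input_tokens_m, output_tokens_m):
    """Total dollars for a training run plus inference volume,
    all quantities given in millions of tokens."""
    return (train_tokens_m * TRAIN_PER_M
            + (input_tokens_m + output_tokens_m) * INFER_PER_M)

# e.g. fine-tune on 10M tokens, then serve 5M input + 5M output tokens:
estimate_cost(10, 5, 5)  # -> 23.0  ($20 training + $3 inference)
```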
What benefits does this bring me vs just using OpenAI's official tools?
Other co-founder here. We offer more specific features for iterating on your datasets, and we include domain experts in that workflow. I'd also argue that you don't necessarily want your datasets living with your foundation model provider like OpenAI, so that you keep the option to test with, and potentially switch to, open-source models.
This looks awesome!
Thanks!