Design and implementation of DuckDB internals

(duckdb.org)

197 points | by mpweiher 5 days ago

16 comments

  • mrtimo 2 days ago

    If you are a data scientist or do anything with data, DuckDB is like a Swiss Army knife: there are many ways it can help your workflow. The original video from CMU in 2020 [1] is a classic. Minutes 3-8 present a good argument for adding DuckDB to your data cleaning/processing workflow.

    And if you want to add a semantic layer on top of data, Malloy [2] is my favorite so far (it has DuckDB built in).

    [1]: https://www.youtube.com/watch?v=PFUZlNQIndo
    [2]: https://docs.malloydata.dev/documentation/

    • anitil 21 hours ago

      Thank you for the recommendation on that video! I've already adopted DuckDB for my ad-hoc analytics work, but I didn't know the background.

  • owlstuffing 2 days ago

    Analytics with type-safe raw SQL (including DuckDB’s awesome extensions) is pure gold:

    https://github.com/manifold-systems/manifold/blob/master/doc...

  • password4321 2 days ago

    Over the years I've seen anecdotes here on HN from several people saying that DuckDB crashes often. Is this still an issue for anyone?

    • wenc a day ago

      I use DuckDB daily.

      In short: it doesn’t crash often at all.

      What you may be remembering were reports of exceptional cases where it didn’t handle out-of-memory errors well. I was one of the people affected. I was running complex analytic queries on 400 GB of Parquet files with only 128 GB of memory. It used jemalloc, which didn’t degrade gracefully. They fixed a lot of the OOM issues, so it’s more robust now. I haven’t had a crash for a long time.

      On normal-sized datasets it never crashes.

    • xtracto a day ago

      We use it heavily at my workplace. It doesn't crash at all if you use it for OLAP workloads. But if you use it incorrectly, it will crash.

      It's pretty solid.

    • jazzpush2 a day ago

      I've never seen this, and I have several products that use it...

  • mpweiher 2 days ago

    The actual slides are linked from the intro text:

    https://github.com/DBatUTuebingen/DiDi

  • chkrishnatej a day ago

    That was a good start for understanding DuckDB internals!!!

  • fg137 2 days ago

    Unfortunately, there don't seem to be any lecture videos.

  • buryat 2 days ago

    Thank you! I learned why DuckDB is named the way it is.

  • viccis 2 days ago

    Am I missing something or is the content empty?