Artifacts: Versioned storage that speaks Git

(blog.cloudflare.com)

107 points | by jgrahamc 8 hours ago

7 comments

  • podviaznikov 7 minutes ago

    this looks very cool!

    i'm thinking: how come we have tools like Dropbox that are optimized for non-technical people, GitHub for developers, and now Artifacts for agents?

    they all do the same thing: version files.

    why can't we have one system that all types of users can use?

    trying to make something myself here https://sublimated.com

    and it works for non-technical people, devs, and agents alike via the Git protocol.

  • a_t48 an hour ago

    The ArtifactFS thing looks neat. I love this example: https://github.com/cloudflare/artifact-fs/blob/main/examples... - it's kind of a generic image that turns into "another image" based on an argument you pass in. I recently built something similar for my own project, except for Dockerfiles (I needed it because cloud machines don't understand my registry!).

    I wonder if I can fork/extend ArtifactFS for other types of content addressed storage. My registry is very git-like in some ways - it has an index, files are content addressed, etc.
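    To make "content addressed" concrete: in Git, an object's ID is derived purely from its bytes, so identical content always gets the same name. A minimal Python sketch of how Git names a blob in a default SHA-1 repository (SHA-256 repositories use a different hash); the helper names are illustrative and not taken from ArtifactFS or any registry.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path where Git stores the object as a loose (unpacked) file."""
    # The first two hex characters pick a directory; the remaining 38 name the file.
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = git_blob_id(b"test content\n")
    print(oid)
    print(loose_object_path(oid))
```

    A git-like registry with its own index and content-addressed files would presumably need an equivalent mapping from bytes to IDs before a filesystem layer could be adapted to it.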

  • xeubie 4 hours ago

    I really like it. API-first git repos without the limitations of Git services like GitHub, which are built primarily for humans. Looks like a competitor to code.storage by pierre.

    Zig is a great choice. I've spent the last three years working on my own git implementation in Zig (see my profile), and it's really the perfect language for this. It gives precise low-level control and heavily emphasizes eliminating dependencies (like libc), which makes it perfect for WebAssembly.

  • tln 2 hours ago

    Ooh, this looks great!

    The usage costs are rather high compared to S3: PUT/POST requests cost 30x more. It looks like batching operations is going to be vital.
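    Why batching matters, as a rough sketch: assuming each billed PUT/POST can carry several file changes (for example, by grouping them into one commit and push), the number of billed writes drops by roughly the batch size. The helpers `batched` and `request_count` are hypothetical illustrations, not part of any Artifacts API, and no actual prices are assumed.

```python
import math
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, preserving input order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def request_count(num_files: int, batch_size: int) -> int:
    """Billed writes if each batch costs one request instead of one per file."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(num_files / batch_size)


if __name__ == "__main__":
    print(request_count(1000, 1))    # one write per file
    print(request_count(1000, 100))  # grouped into batches of 100
```

    Since the cost scales with requests rather than bytes in this model, larger batches pay off most for agents that write many small files.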

    • tln 2 hours ago

      Hmm, I'd expect to be able to actually access the contents of the git repo...

      Documented features: clone repos, init new repos, import repos.

      Missing features: list branches and tags, list objects, list commit history, create new commits, read raw git objects, merge branches or repos, read git object by path.

  • crabbone an hour ago

    > Agents have changed how we think about source control, file systems, and persisting state. Developers and agents are generating more code than ever — more code will be written over the next 5 years than in all of programming history — and it’s driven an order-of-magnitude change in the scale of the systems needed to meet this demand. Source control platforms are especially struggling here: they were built to meet the needs of humans, not a 10x change in volume driven by agents who never sleep, can work on several issues at once, and never tire.

    I keep hearing this argument, but there's not even an attempt at explanation for why this should be true.

    The amount of code written is predicated on the number of features planned, which is in turn predicated on the customer's needs and willingness to pay. The amount of code a programmer can produce per day is not (and hasn't been for a while, if ever) the bottleneck in the speed of product development.

    Having watched some projects go from the very start to maintenance mode, I can attest that the amount of code the same programmers produce differs dramatically between maturity stages: at the very beginning, a single programmer might make hundreds of commits a day, each with hundreds of changes. But once the project is mostly fleshed out, commits come maybe once a day, or even less often. The initial stage doesn't last very long either; it's typically measured in months.

    So, sorry... I don't think agents have changed any of "source control, file systems, and persisting state". There's no reason and no evidence to believe that they have.

    * * *

    > Further, Git’s data model is not only good for source control, but for anything where you need to track state, time travel, and persist large amounts of small data.

    Persist large amounts of small data? Have these people never seen a relational database? Git doesn't hold a candle to a proper database when it comes to storing large amounts of small data. Its database model and implementation are extremely naive... which is OK for a program that isn't trying to be a general-purpose database to store large amounts of small data. Git is not the problem. The authors of this article are.

    * * *

    > Artifacts’ Git API might make you think it’s just for source control, but it turns out that the Git API and data model is a powerful way to persist state in a way that allows you to fork, time-travel and diff state for any data.

    Seriously? And yet Git struggles with anything that isn't a flat text file where the line is the basic unit of information... Any departure from line-oriented diffs produces really poor results in Git, and either requires specialized tooling on top of Git or is so bad that even specialized tooling can't deal with it.
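    The line-orientation point is easy to reproduce with Python's standard `difflib`, used here as a stand-in (Git has its own diff implementation, but it is likewise line-based by default). Changing one field of a JSON record stored on a single line makes the whole record appear replaced, while pretty-printing the same record confines the change to one line. The record and its fields are made up for illustration.

```python
import difflib
import json


def changed_lines(old: str, new: str) -> list[str]:
    """Return only the removed (-) and added (+) lines of a unified diff."""
    diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
    body = diff[2:]  # skip the '---' / '+++' file header lines
    # Keep changed lines; drop '@@' hunk headers and ' ' context lines.
    return [line for line in body if line.startswith(("+", "-"))]


record = {"id": 7, "name": "widget", "tags": ["a", "b"], "price": 10}
changed = dict(record, price=11)  # same record, one field edited

if __name__ == "__main__":
    # Compact JSON: the entire record is one line, so the whole line churns.
    for line in changed_lines(json.dumps(record), json.dumps(changed)):
        print(line)
    # Pretty-printed JSON: only the "price" line shows up as changed.
    for line in changed_lines(json.dumps(record, indent=2), json.dumps(changed, indent=2)):
        print(line)
```

    Formats with no meaningful line structure at all (binary blobs, minified files) fall into the first case for every edit.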

    * * *

    > But what about a multi-GB repository and/or repos with millions of objects? How can we clone that repo quickly, without blocking the agent’s ability to get to work for minutes and consuming compute?

    This misunderstands the problem: nobody needs repos with millions of objects. What Git needs is... better modules (not necessarily Git submodules, just the modularity part). So that, for example, it's possible to check out only the relevant sub-tree from the remote, or to commit changes only to that sub-tree without affecting, and therefore contending over, the history of unrelated parts of the repo.

  • jauntywundrkind 4 hours ago

    As someone who has spent probably a percent or more of my life working on or thinking about state, and how it could or should just be decomposed into 9p files when possible, about externalizing state & opening up new frontiers of scripting, and how git can tie that together and let us build new distributed systems, I am cheering. Cheering wildly.

    The Zig Wasm work sounds so, so good. I've enjoyed git in Rust via gitoxide ( https://github.com/gitoxidelabs/gitoxide ) but haven't tried it on Wasm yet. I'd expect a gitoxide/Rust build to be bigger. The ability to really control memory, as they describe here, seems like it could be a huge advantage for Wasm interop across a SharedArrayBuffer (or similar) holding the code too. Rust seems unlikely to be able to offer that.

    The ArtifactFS FUSE driver sounds wonderful. My LLM session to build a CSI storage driver has already begun!

    On another note, this gives me all kinds of feels:

    > Inside Cloudflare, we’re using Artifacts for our internal agents: automatically persisting the current state of the filesystem and the session history in a per-session Artifacts repo.

    On a personal level I find this amazing & incredible & I love it.

    But conversely, this feels like an incredibly difficult social change: to collect all the work, to collectivize the thought processes / thought making.

    I am so enamoured with LLM programming. And I have so wanted engineering to better be able to externalize the tale of what happened, what did we do. But this also feels like there is no privacy, that this raw data is deeply deeply deeply personal.

    I feel so so so good about this & so scared too. I want very much to work more in public, but I also want some refuge, some space of my own. We lost offices for cubicles, and now we lose the sanctity of our own screens too? I want to share, so much, to have shared means of thinking, but via more consensual, deliberate means, please.