The Garbage Collection Handbook

(gchandbook.org)

278 points | by andsoitis 2 days ago

55 comments

  • OptionOfT 2 days ago

    My favorite story about garbage collection: https://devblogs.microsoft.com/oldnewthing/20180228-00/?p=98...

    • whartung 2 days ago

      They do that in other places.

      As I heard the tale, on the Standard Missile, they don't recirculate the hydraulic fluid, they just spit it out as the missile flies. It's a wonderful engineering solution.

      • pfdietz 2 days ago

        And on the Falcon 9, the hydrocarbon fuel is used as hydraulic fluid, then just dumped back into the fuel tank.

        • 01HNNWZ0MV43FF 2 days ago

          And the SR-71 uses its fuel as coolant.

          "There was a lot we couldn't do, but we were the fastest kids on the block..."

    • Findecanor 2 days ago

      I would call that a region-based memory allocator... Only that it has a single region, ever.

      • amelius 2 days ago

        Yeah, if you have, for example, an HTTP request, you can just collect the garbage you create during that request in a single region, then throw it away when the request has been handled. This is quite standard.
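
        A minimal sketch of that per-request region idea, assuming a fixed-size buffer and a bump pointer. `Region` and `handle_request` are hypothetical names for illustration, not any framework's API:

```python
class Region:
    """Bump allocator over one fixed buffer; there is no per-object free."""

    def __init__(self, capacity: int) -> None:
        self._buffer = bytearray(capacity)
        self._top = 0  # offset of the next free byte

    def alloc(self, size: int) -> memoryview:
        if self._top + size > len(self._buffer):
            raise MemoryError("region exhausted")
        start = self._top
        self._top += size
        return memoryview(self._buffer)[start:start + size]

    def used(self) -> int:
        return self._top

    def reset(self) -> None:
        # Release everything allocated so far in one O(1) step.
        self._top = 0


def handle_request(region: Region, payload: bytes) -> bytes:
    # Hypothetical handler: all scratch memory comes from the region.
    scratch = region.alloc(len(payload))
    scratch[:] = payload
    return bytes(scratch).upper()


region = Region(capacity=1024)
for body in (b"hello", b"world"):
    response = handle_request(region, body)
    peak = region.used()
    region.reset()  # the whole request's garbage goes away at once
```

        The trade-off is that nothing allocated from the region may outlive the request, since `reset` makes every earlier allocation reusable.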

      • antonvs 2 days ago

        Or it's a generational garbage collector with the generation management and collection functionality omitted.

      • eru 2 days ago

        Well, the garbage is collected when the missile hits the target region.

        • amelius 2 days ago

          The garbage is spread out over the target region.

    • Agingcoder 2 days ago

      It’s pretty standard in many places I think - the point here is not the null gc but rather exact memory requirements being proved statically.

    • zipy124 2 days ago

      This is one of my favourite anecdotes to tell peers and colleagues because it's important for understanding how the business case and needs weigh against programming ideals. We all want to make perfect software, but it isn't always necessary.

    • dana321 2 days ago

      now that is what i call the ultimate in garbage collection technology

      • naasking 2 days ago

        I think the missile impact creates a lot more garbage spread over a wider area.

  • charcircuit 2 days ago

    I wish the author section provided what production garbage collectors the authors worked on. There's plenty of nonintuitive things you can learn in the real world, so a book including those would be both interesting and useful.

  • nhatcher 2 days ago

    Great book. Previous discussion: https://news.ycombinator.com/item?id=35492307

    (387 points, 166 comments)

  • iainctduncan 2 days ago

    I have this, it is very well written and thorough. Highly recommend!

  • throwaway17_17 2 days ago

    I see that there is a (relatively short) section on real-time GC. But for anyone who has read the Handbook: how much emphasis is placed on GC in constrained environments? I have fought the urge to implement a modern 3D AA game with GC just to prove it is viable everywhere except the most resource-poor platforms or the most AAAAA, cutting-edge, every-cycle-counted, hyper-optimized games. But I am transitioning to a slightly less focused area of responsibility at work and may have some free time to prototype, so this may be how I spend my winter and spring free time.

    • indigo945 2 days ago

      I think you would be hard-pressed to find a modern AA game that does not already use a GC. The major game engines Unreal and Unity are garbage collected - although they use manual memory management for some of their internals, the exposed API surface (including the C++ API) is designed with garbage collection in mind.

      Notably, the popular-with-hobbyists Godot Engine does not use a garbage collector. It uses reference counting with some objects, but does not provide cycle detection, thus requires all objects to be laid out in a tree structure (which the engine is built around).

      • pjmlp a day ago

        Reference counting is chapter 5 in the linked book.

        • indigo945 a day ago

          I said "a GC", that is, "a garbage collector". Even if you consider reference counting as technically being garbage collection, purely reference counted systems do not have a distinct entity that can be identified as "a" garbage collector. So I'm technically correct here even in face of this pedantry.

          Not that I think it's a reasonable approach to language to be pedantic on this. RC being GC is, of course, true from an analytic approach to language: a garbage collection system is defined as a system that collects and frees objects that are unreachable and thus dead; a reference counting pointer collects and frees objects that are unreachable and thus dead; therefore, reference counting is garbage collection.

          One problem with this is the vagueness: now, the use of a call stack is garbage collection; after all, returning from a function collects and frees the objects in the stack frame. Leaking memory all over the place and expecting the operating system to clean up when you call `exit()` likewise is "garbage collection".

          But more importantly, it's just not how anyone understands the word. You understood perfectly well what I meant when I said "you would be hard-pressed to find a modern AA game that does not already use a GC"; in other words, you yourself don't even understand the word differently. You merely feel an ethical imperative to understand the word differently, and when you failed to do so, used my comment as a stand-in to work through the emotions caused by your own inability to live up to this unfulfilled ethic.

          • pjmlp 17 hours ago

            Except they do, when one bothers to read computer science reference literature, instead of blog posts from folks that learned programming on their own way.

            Being pedantic is a required mechanism to fix urban myths; otherwise we end up with he-says, she-says, adulterated knowledge.

            All those "garbage collection" variations are exactly the proof of what happens when people on the street discuss matters without having a clue about what they are talking about. It is like practicing medicine with village recipes: "I hear XYZ cures ABC".

            It is not vague, IEEE and ACM have plenty of literature on the matter.

        • pizlonator a day ago

          Reference counting isn’t garbage collection.

          • hashmash a day ago

            Just say "tracing garbage collection" to avoid the usual reference counting arguments.

          • gf000 a day ago

            It absolutely is (and as per another thread under another post, both pjmlp and me are notorious for correcting people on this specific point)

            • pizlonator a day ago

              They are not interchangeable. The semantics are observably different. Therefore, RC is not GC.

              Reference counting gives you eager destruction. GC cannot.

              GC lets you have garbage cycles. RC does not.

              I think a part of the GC crew reclassified RC as GC to try to gain relevance with industry types during a time when GC was not used in serious software but RC was.

              But this is brain damage. You can’t take a RC C++ codebase and replace the RC with GC and expect stuff to work. You can’t take a GC’d language impl and replace the GC with RC and expect it to work. Best you could do is use RC in addition to GC so you still keep the GC semantics.

              • cryptonector 11 hours ago

                > GC gives lets you have garbage cycles. RC does not.

                This is the biggest difference, but if you disallow cycles then they come close. For example, the jq programming language disallows cycles, therefore you could implement it with RC or GC and there would be no observable difference except "eager destruction", but since you could schedule destruction to avoid long pauses when destroying large object piles, even that need not be a difference. But of course this is a trick: disallowing cycles is not a generic solution.
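
                The cycle difference can be observed directly. A small sketch, assuming CPython, which pairs reference counting with a separate tracing cycle collector; `Node` is a hypothetical class and the behaviour shown is specific to that implementation:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


gc.disable()  # leave only reference counting active
a, b = Node(), Node()
a.other, b.other = b, a          # build a two-object cycle
probe = weakref.ref(a)           # lets us observe a without keeping it alive
del a, b
alive_under_rc_only = probe() is not None  # counts never reach zero

gc.collect()                     # run the tracing cycle collector once
alive_after_trace = probe() is not None
gc.enable()

acyclic = Node()
probe2 = weakref.ref(acyclic)
del acyclic                      # count hits zero, freed right away
acyclic_freed_immediately = probe2() is None
```

                With the cycle collector off, the cyclic pair stays alive after the last external reference is dropped, while the acyclic object is reclaimed as soon as its count reaches zero.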

              • gf000 a day ago

                > Reference counting gives you eager destruction. GC cannot.

                Tracing GC can't. Reference counting, which is by definition a GC, can. It's like insects vs bugs.

                And destructors are a specific language feature. No one says that they are a must have and if you don't have them then you can replace an RC with a tracing GC. Not that it matters, a ladybug is not the same as an ant, but they are both insects.

                • pizlonator a day ago

                  The best part of these conversations is that if I say “garbage collection”, you have zero doubt that I am in fact referring to what you call “tracing garbage collection”.

                  You are defining reference counting as being a kind of garbage collection, but you can’t point to why you are doing it.

                  I can point to why that definition is misleading.

                  Reference counting as most of the industry understands it is based on destructors. The semantics are:

                  - References hold a +1 on the object they point to.

                  - Objects that reach 0 are destructed.

                  - Destruction deletes the references, which then causes them to deref the pointed at object.

                  This is a deterministic semantics and folks who use RC rely on it.

                  This is nothing like garbage collection, which just gives you an allocation function and promises you that you don’t have to worry about freeing.
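
                  The three rules above can be sketched as a hand-rolled counter. This is an illustrative model with hypothetical names (`RcObject`, `retain`, `release`, `destroy`), not how any particular runtime implements RC:

```python
destroyed = []  # records destruction order for illustration


class RcObject:
    def __init__(self, name, children=()):
        self.name = name
        self.count = 0
        self.children = list(children)
        for child in self.children:
            retain(child)  # each reference holds a +1 on its target


def retain(obj):
    obj.count += 1
    return obj


def release(obj):
    obj.count -= 1
    if obj.count == 0:
        destroy(obj)  # objects that reach 0 are destructed immediately


def destroy(obj):
    destroyed.append(obj.name)  # the "destructor" body runs here
    children, obj.children = obj.children, []
    for child in children:
        release(child)  # deleting the references may cascade


leaf = retain(RcObject("leaf"))          # local handle: leaf.count == 1
root = retain(RcObject("root", [leaf]))  # root's reference: leaf.count == 2
release(leaf)   # leaf.count == 1, nothing destructed yet
release(root)   # root reaches 0, then its reference to leaf is dropped
```

                  The destruction order is fully determined by the order of `release` calls, which is the property RC users tend to rely on.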

                  • gf000 a day ago

                    > https://web.eecs.umich.edu/~weimerw/2008-415/reading/bacon-g...

                    They are different approaches to the same thing: automatic memory management (which is itself not a trivial concept to define).

                    One tracks liveness, while the other tracks "deadness", but as you can surely imagine on a graph of black and white nodes, collecting the whites and removing all the others vs one by one removing the black ones are quite similar approaches, aren't they?
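
                    For the liveness-tracking side, a minimal mark-and-sweep sketch over a toy heap, assuming the heap is a dict from object name to the names it references. `mark` and `sweep` are illustrative helpers, not a production collector:

```python
def mark(heap, roots):
    """Return the set of objects reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])  # follow outgoing references
    return marked


def sweep(heap, marked):
    """Remove every unmarked object and return the names removed."""
    dead = sorted(obj for obj in heap if obj not in marked)
    for obj in dead:
        del heap[obj]
    return dead


# "c" and "d" form an unreachable cycle; "e" is unreachable on its own.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": []}
roots = ["a"]
collected = sweep(heap, mark(heap, roots))
```

                    Because the trace starts from the roots, the unreachable cycle is reclaimed along with the isolated object, which a pure counting scheme would not do.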

                    • pizlonator a day ago

                      You’re not going to convince me by citing that paper, as it’s controversial in GC circles. It’s more of a spicy opinion piece than a true story.

                      I agree that RC and GC are both kinds of automatic memory management.

                      RC’s semantics aren’t about tracking deadness. That’s the disconnect. In practice, when someone says, “I’m using RC”, they mean that they have destructors invoked on count reaching zero, which then may or may not cause other counts to reach zero. If you squint, this does look like a trace - but by that logic everyone writing recursive traversals of data structures is writing a garbage collector.

                      • pjmlp 21 hours ago

                        An RC algorithm implementation using a cycle collector, or deferred deletion on a background thread to reduce the impact of stop-the-world cascade deletion, is....
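
                        A single-threaded sketch of the deferred-deletion variant, assuming zero-count objects are queued and freed under a per-step budget rather than on a background thread. `Obj`, `pending`, `release` and `drain` are hypothetical names:

```python
from collections import deque


class Obj:
    def __init__(self, name, children=()):
        self.name = name
        self.count = 0
        self.children = list(children)
        for child in self.children:
            child.count += 1  # each reference holds a +1


pending = deque()  # objects whose count reached zero, not yet freed
freed = []


def release(obj):
    obj.count -= 1
    if obj.count == 0:
        pending.append(obj)  # defer instead of destroying right away


def drain(budget):
    """Free at most `budget` objects, then return how many were freed."""
    done = 0
    while pending and done < budget:
        obj = pending.popleft()
        freed.append(obj.name)
        children, obj.children = obj.children, []
        for child in children:
            release(child)  # may enqueue more work for a later step
        done += 1
    return done


# Build a chain n0 -> n1 -> n2 -> n3 and hold one external handle on n0.
chain = None
for i in reversed(range(4)):
    chain = Obj(f"n{i}", [chain] if chain else [])
chain.count += 1
release(chain)    # nothing is freed yet, n0 is only queued
drain(budget=2)   # frees n0 and n1, leaves n2 queued
```

                        Bounding the work per step caps the pause from a cascading delete, at the cost of giving up the eager, deterministic destruction discussed upthread.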

                  • pjmlp a day ago

                    Regarding being "deterministic",

                    CppCon 2016: Herb Sutter “Leak-Freedom in C++... By Default.”

                    https://www.youtube.com/watch?v=JfmTagWcqoE

          • pjmlp a day ago

            It surely is from a computer science point of view; that many prefer street knowledge is another matter.

            • pizlonator a day ago

              Someone saying that they are the same isn’t “science”.

              What is science is programming language semantics, and by that science, RC and GC are different.

              • pjmlp 17 hours ago

                RC is a GC algorithm, naturally there are different implementations available.

                Semantics are well defined in research literature.

          • BoingBoomTschak a day ago

            Does it not collect garbage?

    • pjmlp 2 days ago

      The US Navy has weapons targeting systems on some battleships implemented in Java with realtime GC; France likewise has missile tracking systems, also implemented in Java with realtime GC, courtesy of PTC and Aonix.

      https://www.militaryaerospace.com/defense-executive/article/...

      https://www.lockheedmartin.com/en-us/products/aegis-combat-s...

      https://vita.militaryembedded.com/1670-aonix-uss-bunker-hill...

      Not all GC are born alike, and in real life there isn't "insert credit to continue".

    • charcircuit 2 days ago

      Minecraft is the best selling game of all time, uses GC, and is an indie game.

      • delusional 2 days ago

        There's a bunch of caveats to that story. At one point (in one patch I recall) they got tired of passing around 3 floats separately for x, y, and z all the time, so they did what any reasonable programmer would do and created a "coordinate" structure.

        This created one of the worst performing patches of the game ever, and they had to back all the way out. They ended up just passing the separate floats around again.

        My takeaway is that GC doesn't have to be slow, it just imposes a bunch of new constraints on what can be fast.

        • indigo945 2 days ago

          The problem there is probably that Java cannot pass objects by value [1]. That incurs an additional layer of indirection when accessing the individual members of the struct, tanking performance.

          That's not a necessity, though - you can use a GC in languages that allow you to control whether structs get allocated on the heap or on the stack, and then you don't have this issue. For example, in Go, structs can be allocated on the stack and passed by value, or they can be allocated on the heap and passed by reference, and this is under the control of the application programmer [2].

          [1]: Actually, according to the Java spec, Java does not have pass-by-reference, and objects are always passed by value. However, that's just strange nomenclature - in Java parlance, "object" names the reference, not the actual range of memory on the heap.

          [2]: The language spec does not guarantee this, so this is technically implementation-defined behavior. But then, there's really only one implementation of the Go compiler and runtime.

        • gf000 2 days ago

          Value types would solve that issue flawlessly.

        • pizlonator a day ago

          That’s not the GC’s fault. If you wrote C++ code that malloced a vector object every time you wanted to create a new vector, it would be even worse.

          That’s Java’s fault for not having value types (though that’s changing soon maybe).

      • znpy 2 days ago

        Yeah, but it’s a game category where that’s viable.

        • charcircuit a day ago

          What do you mean? A 3D game with a dynamic environment doesn't sound like the best category for GC. Or do you just mean that it was a game that didn't need extreme performance optimizations?

    • dafelst 2 days ago

      Unreal Engine has a GC for its internal object graph, so GC is already in use in a ton of games.

    • bjourne 2 days ago

      Not much. The book mostly covers theory and not platform-specific details. The explanations on various real-time gc algorithms are very thorough though.

    • pizlonator a day ago

      Unreal has an incremental GC

    • 01HNNWZ0MV43FF 2 days ago

      Wouldn't all the popular games based on Unity and written in C# count?

  • Agingcoder 2 days ago

    This is a truly remarkable book, and a must-read for any engineer who depends on a GC. And if you don’t need a GC, the book starts by talking about allocators, which are actually very important too!

  • Verdex 2 days ago

    I had Hosking as a professor. Iirc, it was an okay experience. Compilers course I believe.

    When the handbook came out, I bought it because "hey, I know that guy". Ultimately, I don't think it's necessary, but having a more in depth knowledge of garbage collection and the problems in the space occasionally comes in handy.

    For example, what implication do finalizers have on garbage collection design? Reading about that was kind of an eye opener.

  • xenophonf a day ago

    I wish there was a big, friendly "buy now" link that would get me the print book and EPUB file. The site promotes the book. I'm not sure why they don't make buying it stupid simple.