Finding and fixing Ghostty's largest memory leak

(mitchellh.com)

620 points | by thorel 2 days ago

148 comments

  • quantummagic 2 days ago

    This is great news! Well done to everyone who helped sort it out. It was a problem noted by users in a thread here just last week, https://news.ycombinator.com/item?id=46460319

    While Claude Code might have been the reason this bug was triggered by more people, some of us were hitting it without ever having used Claude Code at all. Maybe the assumption about what makes a page non-standard isn't as black-and-white as presumed. And I wonder if the leak would have been triggered more often for people who use scrollback-limit = 0, or something very small.

    Probably not a huge deal, but it does seem the fix will needlessly delete and recreate non-standard pages in the case where the new page needs to be non-standard, and the oldest one (that needs to be pruned) already is non-standard and could be reused.

    • mitchellh 2 days ago

      > Probably not a huge deal, but it does seem the fix will needlessly delete and recreate non-standard pages in the case where the new page needs to be non-standard, and the oldest one (that needs to be pruned) already is non-standard and could be reused.

      This is addressed in the blog post.

      It is how the PageList has always worked, and also how it worked before with the bug, because during capacity adjustment we would see the wrong size. This shouldn't change any perceived performance.

      And as I note in the blog post, there are alternative approaches such as the one you suggested, but we don't have enough empirical data to support changing our viewpoint on that whereas our current viewpoint (standard sizes are common) is well supported by known benchmarks. I'm open to changing my mind here, but I didn't want to change worldviews AND fix the leak in the same go.

      • fartfeatures 2 days ago

        How come this isn't released as a hotfix / out of band patch but will follow the standard release cycle in March?

      • fragmede 2 days ago

        Of all the things to be impressed by you about, your patience is commendable. I'd be losing my shit if someone couldn't be bothered to read what I wrote and just spout off about something I'd addressed in my writing, but I suppose that's why your bank account has two commas and a bunch more. Thank you for everything. Can we go flying sometime?

        • Aurornis a day ago

          > I'd be losing my shit if someone couldn't be bothered to read what I wrote and just spout off about something I'd addressed in my writing

          In my experience that’s a universal feature of comment sections everywhere, and HN is not an exception. This is very common in HN comments which is why it’s important to always read the article, not just the comments.

        • ATMLOTTOBEER a day ago

          [flagged]

          • fragmede a day ago

            Hell yeah! You don't got any heroes? Nobody you look up to or respect? Not even a little bit?

    • commandersaki 2 days ago

      > Well done to everyone who helped sort it out. It was a problem noted by users in a thread here just last week

      I'm feeling a bit lucky I was able to sneak in an issue during the beta phase, but it was a real reproducible one that led to a segfault.

    • macote 2 days ago

      The thread about memory leak is here: https://news.ycombinator.com/item?id=46461061

      • Maxious 2 days ago

        And the same diagnosis in the blog post was reported by a user in discussions a month ago but ignored https://github.com/ghostty-org/ghostty/discussions/9786#disc...

        • julien_p 2 days ago

          That doesn't sound like the actual issue, or am I not understanding it correctly?

          • dkdcio 2 days ago

            I think you’re correct. the reproduction isn’t very precise and the solution doesn’t seem right (I’m not seeing anything about the non-standard pages not being freed). I’d guess this was ignored because it was wrong…

    • larodi 2 days ago

      As a side note - Claude Code is making the CLI attractive again, more than anything else has in the last 20 years.

  • jrpelkonen 2 days ago

    Great write-up. And, thanks mitchellh for Ghostty, I switched to it last year, and have not regretted it.

    However, I am somewhat surprised that the fix is reserved for a feature release in a couple of months. I would have expected this to be included in a bug fix release.

    • msephton 2 days ago

      It's already released in the latest nightly build.

      • DrammBA 2 days ago

        Are the nightly releases the expected way to get timely bugfixes?

        • amazingman 2 days ago

          That is how software releases generally work. AFAICT this is not a bug with broad impact or security implications.

          • fartfeatures a day ago

            I guess that's arguable; a memory leak can make a system unpleasant to use, although I accept it can be solved by repeatedly restarting the offending app.

  • reactordev 2 days ago

    The moment you started talking about pages, I was like: “Ok, obviously memory pooled” and yup, it is. Then I said “obviously ring buffered” and yeah, essentially your scroll back reuse. Then I knew exactly where the bug was before getting to that part, not freeing the pages memory properly and sure enough - bingo! With some great looking diagrams of memory space alignment.

    Kudos, that was a good read. Just remember that every time you do something novel, there’s potential for leaks :D

  • neobrain 2 days ago

    Funny timing, I moved to Ghostty this week and just today I ran into OOM crashes in Ghostty while developing a terminal UI app. Coincidentally this TUI has a tab bar that looks like this, where UTF8 icons are used for recognizability and activity indicators (using © and € as placeholders here):

        1|Flakes ©    2|Installed ©    3|Store © €    4|Security © €
       ──────────────────────────────────────────────────────────────
    This works fine normally, but resizing the terminal would quickly trigger the crash - easy to avoid but still annoying!

    I was already preparing myself to file a bug report with the easy repro, but this sounds suspiciously close to what the blog post is describing. Fingers crossed :)

    (EDIT: HN filters unicode, booo :( )

    • smoyer 2 days ago

      Why would I move to GhosTTY versus the terminal emulator that comes with my OS as it's not clear to me from the documentation?

      • neobrain 2 days ago

        I don't think I can do a better overview than https://ghostty.org/docs/about . It's not world-changing but simply a very polished, well-executed terminal.

        GPU rendering virtually eliminates typing latency. Most terminals that have it don't support native content like tabs, but Ghostty gets minimal latency without having to compromise on essentials since it uses native toolkits under the hood.

        The modern TTY has lots of protocol extensions that allow your CLI tools to do things like display high-resolution images. There's tons of good-quality color themes out-of-the-box (with a built-in browser for preview).

        Configuration is highly customizable but the defaults are good enough that you barely need it.

        • smoyer 2 hours ago

          I wish a couple of those paragraphs were on the home page!

  • jhhh 2 days ago

    This feels like a case of guessing at something you could know. There are two types of allocations, each with a size and a free method. The free method is polymorphic over the allocation's type. Instead of using a tag to know absolutely which type an object is, you guess based on some other factor, in this case a size invariant which was violated. It also doesn't seem like this invariant was ever codified; otherwise the first time a large alloc was modified to a standard size it would've blown up. It's worth asking yourself if your distinguishing factor is the best you can use or if perhaps there is a better test. Maybe in this case a tag would've been too expensive.
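
    A minimal Rust sketch of the tag-based alternative, for illustration only. The names `PageBacking`, `Released`, `release`, and `STD_PAGE_BYTES` are hypothetical, and `Vec<u8>` stands in for the pool and mmap allocations; none of this is Ghostty's actual (Zig) code.

    ```rust
    /// Hypothetical standard page size; the real value may differ.
    const STD_PAGE_BYTES: usize = 4096;

    /// The tag records where a page came from, so freeing never guesses from size.
    enum PageBacking {
        Pooled(Vec<u8>),
        Mapped(Vec<u8>),
    }

    #[derive(Debug, PartialEq)]
    enum Released {
        ReturnedToPool,
        Unmapped,
    }

    /// Dispatch on the tag alone; a page's recorded size plays no part.
    fn release(page: PageBacking) -> Released {
        match page {
            PageBacking::Pooled(_buf) => Released::ReturnedToPool,
            PageBacking::Mapped(_buf) => Released::Unmapped,
        }
    }

    fn main() {
        let small = PageBacking::Pooled(vec![0; STD_PAGE_BYTES]);
        // A mapped page that happens to be standard-sized still takes the
        // unmap path, because the tag (not the size) decides.
        let shrunk = PageBacking::Mapped(vec![0; STD_PAGE_BYTES]);
        println!("{:?} {:?}", release(small), release(shrunk));
    }
    ```

    The cost is one discriminant per page, which is the "maybe too expensive" trade-off mentioned above.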

  • hotpotat 2 days ago

    @mitchellh what did you use for the memory visualizations? Looks nice, and the website plays well with mobile. What's the stack?

    • mitchellh 2 days ago

      Static HTML/CSS generated by Opus 4.5.

      I like using AI for visualizations because it is one-time use throwaway code, so the quality doesn't matter at all (above not being TOTALLY stupid), it doesn't need to be maintained. I review the end result carefully for correctness because it's on a topic I'm an expert of.

      I produce non-reusable diagrams namespaced by blog post (so they're never used by any other post). I just sanity check that the implementation isn't like... mining bitcoin or leaking secrets (my personal site has no secrets to build) or something. After that, I don't care at all about that quality.

      The information it conveys is the critical part, and diagrams like this make it so much more consumable for people.

      • 63 2 days ago

        That's really cool. I was looking at them and thinking "I could probably make these with vanilla html/css but it'd be pretty tedious." Perfect use case for AI. I need to work on developing a reflex for it.

        • mjn 2 days ago

          I've also started doing this, and it's surprisingly enjoyable to both do and even to read. The end result is often more readable to me than using a 3rd-party JS visualization library, because I only need to know standard HTML/CSS concepts to understand what's going on. And a side benefit is smaller pages with less bitrot due to being able to skip the dependencies.

      • hotpotat 2 days ago

        That’s reasonable, thanks!

  • stephc_int13 2 days ago

    I've been following the development of Ghostty for a while and while I have the feeling that there is a bit of over-engineering in this project, I find this kind of bug post mortem to be extremely valuable for anyone in love with the craft.

    • trevorhinesley 2 days ago

      Over-engineered in what way?

      • nesarkvechnep 2 days ago

        It’s just a feeling, man.

      • cbmuser 2 days ago

        Having to introduce a new language stack to distributions just to be able to build a terminal emulator is what I would consider over-engineering.

        • surajrmal 2 days ago

          So anything that uses a less popular language is considered over engineering? Distros support lots of different languages already and there are likely other packages built with zig already.

        • weebull 2 days ago

          A 50-ish MB build time dependency that doesn't need any special privileges or installation to run? That's over engineering? A binary release of just CMake is bigger than all of Zig.

  • bryancoxwell 2 days ago

    Super accessible write up as someone unfamiliar with Ghostty and terminal emulators in general. Thanks!

  • andrewaylett a day ago

    Let me see if I can understand this properly:

    There's a linear buffer of pages, most of which come from the pool. It's not clear to me under what conditions these are returned to the pool? Is it when the specific session terminates?

    When a non-standard page reaches the point of being recycled, it'll instead be re-added to the list but with a standard size. That effectively leaks the extra space above the standard size. But when the buffer is released (because the session ends?) the pool is also released, which releases all the standard sized pages but leaks the custom-sized ones?

    Which suggests that the issue may be even rarer than it initially looked to me: I tend to open a small number of sessions and then use them continuously, rather than starting new sessions during the lifetime of the process. If I never terminated a session, I would never fully leak the memory?

  • kepano 2 days ago

    Reliable reproductions are so valuable.

  • drob518 2 days ago

    Why not just use a circular buffer for the scroll back? Why use blocks at all if you’re just going to recycle them anyway? That said, great write-up.

  • a day ago
    [deleted]
  • dangoodmanUT 2 days ago

    waiting for someone to say "this wouldn't have happened if you chose rust"

    • woodruffw 2 days ago

      You’ll probably be waiting a long time, since Rust very explicitly doesn’t have “leak safety” as a constructive property. Safe Rust programs are allowed to leak memory, because memory leaks themselves don’t cause safety issues.

      There’s even a standard, non-unsafe API for leaking memory[1].

      (What Rust does do is make it harder to construct programs that leak memory unintentionally. It’s possible but not guaranteed that a similar leak would be difficult to express idiomatically in Rust.)

      [1]: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.l...
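
      For illustration, a minimal sketch of that safe API in use. `leak_greeting` is a hypothetical helper; `Box::leak` is the standard-library function referenced in [1].

      ```rust
      /// Leak a heap allocation on purpose, using only safe code.
      fn leak_greeting() -> &'static mut String {
          // Box::leak consumes the box and hands back a reference valid for the
          // rest of the program; the allocation is never freed.
          Box::leak(Box::new(String::from("hello")))
      }

      fn main() {
          let s = leak_greeting();
          s.push_str(", world");
          println!("{}", s);
      }
      ```

      No `unsafe` block appears anywhere, which is the point: leaking is considered memory-safe.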

      • tialaramex 2 days ago

        The specific language feature you want if you insist that you don't want this kind of leak is Linear Types.

        Rust has Affine Types. This means Rust cares that for any value V of type T, Rust can see that we did not destroy V twice (or more often).

        With Linear Types the compiler checks that you destroyed V exactly once, not less and not more.

        However, one reason I don't end up caring about Leak Safety of this sort is that in fact users do not care that you didn't "leak" data in this nerd sense. In this nerd sense what matters is only leaks where we lost all reference to the heap data. But from a user's perspective it's just as bad if we did have the reference but we forgot - or even decided explicitly not - to throw it away and get back the RAM.

        The obvious way to make this mistake "by accident" in Rust is to have two things which keep each other alive via reference counting and yet have been disconnected and forgotten by the rest of the system. A typical garbage collected language would notice that these are garbage and destroy them both, but Rust isn't a GC language of course. Calling Box::leak isn't likely to happen by accident (though you might mistakenly believe you will call it only once but actually use it much more often)
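
        A small sketch of that reference-counting cycle, assuming a hypothetical `Node` type and `make_cycle` helper. A `Weak` handle is returned only so the leak can be observed from outside.

        ```rust
        use std::cell::RefCell;
        use std::rc::{Rc, Weak};

        struct Node {
            other: RefCell<Option<Rc<Node>>>,
        }

        /// Build two nodes that own each other, drop both local handles, and
        /// return a weak observer to the first node.
        fn make_cycle() -> Weak<Node> {
            let a = Rc::new(Node { other: RefCell::new(None) });
            let b = Rc::new(Node { other: RefCell::new(Some(Rc::clone(&a))) });
            *a.other.borrow_mut() = Some(Rc::clone(&b));
            // When `a` and `b` go out of scope, each node still holds a strong
            // reference to the other, so neither count reaches zero.
            Rc::downgrade(&a)
        }

        fn main() {
            let observer = make_cycle();
            // The nodes are unreachable through strong references held by the
            // program, yet they are still allocated.
            println!("strong count after drop: {}", observer.strong_count());
        }
        ```

        Replacing one of the two `Rc` links with a `Weak` is the usual way to break such a cycle.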

        I think the main part of Ghostty's design mentioned here that - as a Rust programmer - I think is probably a mistake is the choice to use a linked list. To me this looks exactly like it needs VecDeque, a circular buffer backed by a growable array type. Their "clever" typical case where you emit more text and so your oldest page is scrapped and re-used to form your newest page, works very nicely in VecDeque, and it seems like they never want the esoteric fast things a linked list can do, nor do they need multi-writer concurrency like the guts of an OS kernel, they want O(1) pop & push from opposite ends. Zig's Deque is probably that same thing but in Zig.
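
        A rough sketch of the VecDeque idea, under the assumption that a page can be modelled as a plain byte buffer. `Scrollback`, `max_pages`, and `push_page` are hypothetical names, not Ghostty's design.

        ```rust
        use std::collections::VecDeque;

        /// Hypothetical fixed-capacity scrollback; each "page" is a byte buffer.
        struct Scrollback {
            pages: VecDeque<Vec<u8>>,
            max_pages: usize,
        }

        impl Scrollback {
            fn new(max_pages: usize) -> Scrollback {
                assert!(max_pages > 0, "need room for at least one page");
                Scrollback { pages: VecDeque::with_capacity(max_pages), max_pages }
            }

            /// Append a page. Once full, the oldest page is popped from the front
            /// and its allocation is reused for the newest page at the back.
            fn push_page(&mut self, contents: &[u8]) {
                let mut page = if self.pages.len() == self.max_pages {
                    self.pages.pop_front().expect("non-empty when full")
                } else {
                    Vec::new()
                };
                page.clear();
                page.extend_from_slice(contents);
                self.pages.push_back(page);
            }
        }

        fn main() {
            let mut sb = Scrollback::new(2);
            for line in ["one", "two", "three"] {
                sb.push_page(line.as_bytes());
            }
            // "one" was evicted; its buffer now holds "three".
            let kept: Vec<&str> = sb
                .pages
                .iter()
                .map(|p| std::str::from_utf8(p).unwrap())
                .collect();
            println!("{:?}", kept);
        }
        ```

        Both `pop_front` and `push_back` are O(1) (amortized for the push), which matches the access pattern described above.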

        • vlovich123 2 days ago

          The issue isn’t linked list vs deque but type confusion about what was in the container. They didn’t forget to drop it - they got confused about which type was in the list when popping and returned it to the pool instead of munmap.

          The way to solve this in Rust would be to put this logic in the drop and hide each page type in an enum. That way you can’t ever confuse the types or what happens when you drop.
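
          A hedged sketch of what that could look like. `Backing`, `Page`, `Stats`, and `release_counts` are hypothetical, and the counters merely stand in for a real pool return and a real munmap call.

          ```rust
          use std::cell::Cell;

          /// Counts how pages were released; a stand-in for the pool and munmap.
          #[derive(Default)]
          struct Stats {
              returned_to_pool: Cell<usize>,
              unmapped: Cell<usize>,
          }

          enum Backing {
              Pooled,
              Mapped,
          }

          struct Page<'a> {
              backing: Backing,
              stats: &'a Stats,
          }

          impl<'a> Drop for Page<'a> {
              /// The only place a page is released; callers cannot pick the wrong path.
              fn drop(&mut self) {
                  let counter = match self.backing {
                      Backing::Pooled => &self.stats.returned_to_pool,
                      Backing::Mapped => &self.stats.unmapped,
                  };
                  counter.set(counter.get() + 1);
              }
          }

          /// Create one page of each kind, let both drop, and report the counts.
          fn release_counts() -> (usize, usize) {
              let stats = Stats::default();
              {
                  let _small = Page { backing: Backing::Pooled, stats: &stats };
                  let _large = Page { backing: Backing::Mapped, stats: &stats };
              } // both pages are dropped here
              (stats.returned_to_pool.get(), stats.unmapped.get())
          }

          fn main() {
              println!("{:?}", release_counts());
          }
          ```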

          • hardwaresofton 2 days ago

            Was going to say this, but I don't think anyone actually wants to hear that Rust actually would have helped here.

            As you're saying, the bug was the equivalent of an incorrectly written Drop implementation.

            Nothing against Zig, and people not using Rust is just fine, but this is what happens when you want C-like feel for your language. You miss out on useful abstractions along with the superfluous ones.

            "We don't need destructors, defer/errdefer is enough" is Zig's stance, and it was mostly OK.

            Impossible to predict this kind of issue when choosing a project language (and it's already been discussed why Zig was chosen over Rust for Ghostty, which is fine!), so it's not a reason to always choose Rust over Zig, but sometimes that slightly annoying ceremony is useful!

            Maybe some day I'll be smart enough to write Zig as a default over Rust, but until that day I'm going to pay the complexity price to get more safety and keep more safety mechanisms on the shotgun aimed at my foot. I've got plenty of other bugs I can spend time writing.

            Another good example is the type vs type alias vs wrapper type debate. It's probably not reasonable to use a wrapper type every single time (e.g. num_seconds can probably be a u32 and not a Seconds type), but it's really a Rorschach test because some people lean towards one end versus the other for whatever reason, and the plusses/minuses are different depending on where you land on the spectrum.

            [EDIT] also some good discussion here

            https://ziggit.dev/t/zig-what-i-think-after-months-of-using-...

            • weebull 2 days ago

              > "We don't need destructors, defer/errdefer is enough" is Zig's stance, and it was mostly OK.

              There's more than that. Zig has leak detecting memory allocators as well, but they only detect the leak if it happens. Nobody had a reliable reproduction method until recently.

            • AndyKelley 2 days ago

              If you wanted to match Ghostty's performance in Rust, you'd need to use unsafe in order to use these memory mapping APIs, then you'd be in the exact same boat. Actually you'd be in a worse boat because Zig is safer than unsafe Rust.

              • hardwaresofton 2 days ago

                > If you wanted to match Ghostty's performance in Rust, you'd need to use unsafe in order to use these memory mapping APIs, then you'd be in the exact same boat.

                Yea, but not for all the parts — being able to isolate the unsafe and build abstractions that ensure certain usage parts of the unsafe stuff is a key part of high quality rust code that uses unsafe.

                In this case though I think the emphasis is on the fact that there is a place where that code should have been in Rust land, and writing that function would have made it clear and likely avoided the confusion.

                Less about unsafe and more about the resulting structure of code.

                > Actually you'd be in a worse boat because Zig is safer than unsafe Rust

                Other people have mentioned it but I disagree with this assertion.

                It’s a bit simplistic but I view it this way: every line of C/Zig is unsafe (lots of quibbling to do about what “unsafe” means of course) while some lines of rust are unsafe. Really hard for that assertion to make sense under that world view.

                That said, I’m not gonna miss this chance to thank you and the Zig foundation and ecosystem for creating and continuously improving Zig! Thanks for all the hard work and thoughtful API design that has sparked conversation and progress.

                • AndyKelley a day ago

                  Thank you for the kind words.

                  > every line of C/Zig is unsafe

                  This is trivially false... for instance here's a line:

                      const pi = 3.14;
                  
                  It's actually a pretty small subset of the language that can cause unchecked illegal behavior.

                  Also IMO the word "safety" should include integer overflow. I don't agree that those kind of bugs are so unimportant as to not be checked in safe builds.

                  • hardwaresofton a day ago

                    > Thank you for the kind words.

                    Absolutely, I meant them.

                    > This is trivially false... for instance here's a line:

                    Yep, that was really wrongly stated on my part -- what I meant is that the kind of protections that "safe" Rust provides are not available anywhere in average lines of Zig code (though they can be detected with tooling, etc).

                    What I should have written is that I could easily write unsafe code anywhere in Zig (as in C). In practice of course most people don't because they're not trying to destroy their own computers, and most code is benign. Rust will at least save me from myself some of the time.

                    > Also IMO the word "safety" should include integer overflow. I don't agree that those kind of bugs are so unimportant as to not be checked in safe builds.

                    Rust does do some work to catch trivial overflows, but you're right that it does not catch any slightly more complex overflows, and that is certainly unsafe in a sense. I don't think any reasonable person would disagree with that.

                    Rust's answer to this of course is checked_{op}/wrapping_{op}/etc options, and that's what I often see in high quality codebases where it matters. Of course, this is a footgun that could have had a safety applied and it's too late now (AFAIK) to change the default to be always wrapping or something (also, I think people may oppose always checked for perf reasons).
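
                    As a quick illustration of those explicit variants on a `u8` (the `bump` helper is hypothetical; `checked_add`, `wrapping_add`, and `saturating_add` are standard-library methods):

                    ```rust
                    /// Add one to a u8 under each explicit overflow policy:
                    /// (checked, wrapping, saturating).
                    fn bump(x: u8) -> (Option<u8>, u8, u8) {
                        (x.checked_add(1), x.wrapping_add(1), x.saturating_add(1))
                    }

                    fn main() {
                        println!("{:?}", bump(254)); // no overflow yet
                        println!("{:?}", bump(255)); // each policy now differs
                    }
                    ```

                    At 255 the checked form returns `None`, the wrapping form returns 0, and the saturating form stays at 255.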

                    [EDIT] Just to compare/make this more concrete, playgrounds:

                    https://zig.fly.dev/p/LGnrBGXPlVJ

                    https://play.rust-lang.org/?version=stable&mode=release&edit...

                    Rust in this case of doing something obviously wrong is at least a little more helpful -- the obvious overflow does not compile.

                    And of course you can get rust to do it like it allows (and what would be present in any codebase with real complexity):

                    https://play.rust-lang.org/?version=stable&mode=release&edit...

                    It's just that little bit of safety that makes it easy for me (personally) to default to Rust. Very possible that someday that won't be true.

                    [EDIT2] Also, somewhat under-discussed, but if Zig supported a bolt-on "safety check compile mode" that ran with some stricter (maybe not quite borrow checking level) semantics, that would be pretty dope. Of course not something anyone should devote any real time to for a long time (or ever?) BUT it would trivialize a lot of these discussions maybe.

                    But in the meantime people just using what they're comfortable with/the feel they want is obviously fine.

                    • tialaramex 10 hours ago

                      If you want "overflow-checks" in release builds for the primitive integer types you can tell Cargo that you want this, some people do so. https://doc.rust-lang.org/cargo/reference/profiles.html

                      Although Rust provides Wrapping<i32> if you want that, in practice you don't want that, wrapping unsigned integers are occasionally useful and I've written code with Wrapping<u8> and Wrapping<u32> types, but wrapping signed integers basically never come up. However it is significantly faster and it remains well defined so that's why it was chosen for release builds.
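
                      A tiny sketch of the `Wrapping<u8>` case, assuming a toy byte checksum as the use; `checksum` is a hypothetical function, `std::num::Wrapping` is the real type.

                      ```rust
                      use std::num::Wrapping;

                      /// Toy checksum: sum of bytes modulo 256, with the wraparound
                      /// stated in the type rather than at each addition.
                      fn checksum(data: &[u8]) -> u8 {
                          let mut sum = Wrapping(0u8);
                          for &b in data {
                              sum += Wrapping(b);
                          }
                          sum.0
                      }

                      fn main() {
                          // 200 + 100 = 300, which wraps to 44 in a u8.
                          println!("{}", checksum(&[200, 100]));
                      }
                      ```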

                      • hardwaresofton 9 hours ago

                        Those are great points, thanks for mentioning this, re-enabling overflow checks for release builds would indeed make the code safer with only a config change.

                        It's great that there are lots of options other than wrapping as well, checked, saturating, etc -- that at the cost of a little inefficiency make code that is robust to such failures really obvious.

              • tialaramex 2 days ago

                I don't buy your theory that a Rust terminal would need to directly use mmap to deliver matching performance. In fact I doubt Ghostty's author would endorse this claim either, they've never tried any alternatives, they tried this and it works for their purpose which is a long way from other ways wouldn't work or would all be slower or whatever.

              • vlovich123 2 days ago

                Please don’t get defensive and spread silly FUD. You can be proud of what you’ve accomplished without feeling sad that a different language has strengths that yours doesn’t.

                Calling unsafe mmap APIs not only is unlikely to run into the corner cases where unsafe Rust is tricky to get right, there’s “millions” of crates that offer safe APIs to do so and it’s fundamentally not hard to write it safely (it would be very hard to write it to have any issues).

                And fundamentally I think Rust is much more likely to be easier to get high performance because the vast majority of safe code you write is amenable to the compiler performing safe optimizations that Zig just can’t do regarding pointer aliasing (or if it does brings all the risks of unsafe Rust when the user annotates something incorrectly).

                • uecker 2 days ago

                  I don't think this is silly FUD. The article describes a scenario where the low-level abstraction itself was buggy in a subtle way, so the comparison to "unsafe" Rust seems entirely fair to me. (edited for typos)

                  • tialaramex 2 days ago

                    With Rust you always could unsafely do whatever went wrong in somebody's C or Zig or whatever, but the question is whether you would. Rust's technical design reinforces a culture where the answer is usually "No".

                    I don't find the claim that weird low level mmap tricks here are perf critical at all persuasive. The page recycling makes sense - I can see why that's helping performance, but the bare metal mmap calls smell to me like somebody wanted to learn about mmap and this was their excuse. Which is fine - I need to be clear about that - but it's not actually crucial to end users being happy with this software.

                    • vlovich123 a day ago

                      I think we can agree that Mitchell knows what he’s doing and isn’t playing around with mmap just because. It’s probably quite important to ensure a low memory footprint. But mmap in rust is not extra risky in some weird mystical way. It’s just a normal FFI function to get a pointer back and you can trivially build safe abstractions around it to ensure the lifetime of a slice doesn’t exceed the lifetime of the underlying map. It’s rust 101 and there’s nothing weird here that can cause the unsafe bits here to be extra dangerous (in general unsafe rust can be difficult to get right with certain constructs, but it doesn’t apply here).
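
                      A sketch of such a safe wrapper, with one loud assumption: `std::alloc` stands in for mmap/munmap so the example needs no external crate. `Region` and its methods are hypothetical names.

                      ```rust
                      use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
                      use std::ptr::NonNull;

                      /// Owns a raw zeroed allocation (a stand-in for a mapped region).
                      struct Region {
                          ptr: NonNull<u8>,
                          layout: Layout,
                      }

                      impl Region {
                          fn new(len: usize) -> Region {
                              assert!(len > 0, "zero-sized regions are not supported here");
                              let layout = Layout::array::<u8>(len).expect("length too large");
                              // SAFETY: the layout has non-zero size, checked above.
                              let raw = unsafe { alloc_zeroed(layout) };
                              let ptr = NonNull::new(raw)
                                  .unwrap_or_else(|| handle_alloc_error(layout));
                              Region { ptr, layout }
                          }

                          /// The returned slice borrows `self`, so the borrow checker
                          /// rejects any use of it after the region is dropped.
                          fn as_mut_slice(&mut self) -> &mut [u8] {
                              // SAFETY: `ptr` is valid for `layout.size()` zero-initialized
                              // bytes, and `&mut self` guarantees exclusive access.
                              unsafe {
                                  std::slice::from_raw_parts_mut(
                                      self.ptr.as_ptr(),
                                      self.layout.size(),
                                  )
                              }
                          }
                      }

                      impl Drop for Region {
                          fn drop(&mut self) {
                              // SAFETY: `ptr` came from `alloc_zeroed` with this layout.
                              unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
                          }
                      }

                      fn main() {
                          let mut region = Region::new(16);
                          region.as_mut_slice()[0] = 7;
                          println!("{:?}", &region.as_mut_slice()[..4]);
                      }
                      ```

                      All three `unsafe` blocks sit behind a safe interface, so callers never touch the raw pointer.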

                      • tialaramex a day ago

                        I actually don't think I agree about mmap. Reading around it seems as though Mitchell had clever ideas for abusing mmap and those didn't work out. Now he's got mmap for the pages and it works so why replace it, but that does not mean you need mmap to deliver this performance and in fact I'd be extremely surprised if that were true as somebody who spent about a decade of his life mostly writing close-to-metal database code in C using mmap...

                        If you want a whole lot of bytes and you ask your allocator, do you know what almost any popular general purpose allocator will do on a vaguely decent modern Unix? Call mmap to get them for you. So at most you're cutting out a few CPU instructions worth of middle man.

                    • uecker 2 days ago

                      Also in C or Zig you do not need to create your own memory management using mmap. Whether this is necessary in this case or not is a different question.

                      In the end, if the Rust advantage is that "Rust's technical design reinforces a culture" where one tries to avoid this, then this is a rather weak argument. We will see how this turns out in the long run though.

                      • vlovich123 a day ago

                        The long run has already spoken. Go look at the reports out of Microsoft and Android. It’s screamingly clear that the philosophy of Rust that most code can be written in safe with small bits in unsafe is inherently safer. The defect rate plummets by one or two orders of magnitude if I recall correctly. C is an absolute failure (since it’s the baseline) and Zig has no similar adoption studies. You could argue it will be similar if you always compile ReleaseSafe, but then performance will be worse than C or Rust due to all the checks and it’s unclear how big a hole the places that aren’t dynamically checked are.

                        Oh and of course rust is inherently slightly faster because no reference aliasing is allowed and automatically annotated everywhere which allows for significant aggressive compiler optimizations that neither C nor Zig can do automatically and is risky to do by hand.

                        • uecker a day ago

                          I don't put too much weight on the self-reporting by Microsoft or Google. I agree though that the strategy to write safe bits and abstractions is good. What I know not to be true is the idea that similar strategies would not work also in C.

                          • vlovich123 a day ago

                            > What I know not to be true is the idea that similar strategies would not work also in C.

                            Is your argument that developers at MS and Google haven’t been trying to employ these strategies for existing C codebases? It’s a bold position to take and one I’d say devoid of evidence; all the evidence suggests it’s really hard to reason about ownership in complex systems and abstractions only help you do so error free up to a very limited point.

                            • uecker 9 hours ago

                              I know for sure that Microsoft does not, because they are not interested in C (and their compiler does not even fully support recent standards) and I assume the same thing about Google. In general, I do not think they write much C in the first place. I also think their use cases and priorities are different from others.

                        • AndyKelley a day ago

                          Microsoft and Google are on the Rust Foundation board:

                          https://rustfoundation.org/about/

                          They benefit by having more of the industry using technologies they control.

                          Studies from independent third parties would be less biased.

                          • vlovich123 21 hours ago

                            More FUD and guilt by association. Microsoft and Google are also major contributors to the C and C++ standards bodies. Microsoft also has C# and Google has Kotlin. I think claiming they control Rust is weak given the community organization structure within the project and claiming the studies are inherently biased because they provide some funding is exceedingly weak.

                            IMHO the onus is on you to present any contrary studies showing Rust's safety profile isn't as good as the studies indicate when compared with C++ or to demonstrate where Zig's safety profile in real world complex environments stacks up.

                            We can disagree on opinions, but you can't discard all experimental evidence in favor of no evidence, especially when the safety profile of Rust is backed by solid theoretical models as to why it would be safer.

                            To that point, AWS and Cloudflare have also adopted the Rust language for all new projects. I think that says something about the recognition that it really is much harder to write trivial memory vulnerabilities.

                      • tialaramex 2 days ago

                        > We will see how this turns out in the long run though.

                        Rust 1.0 was in 2015. This is the long run. And I disagree that safety culture is a "weak argument". It's foundational, this is where you must start, adding it afterwards is a Herculean task, so no surprise that people aren't really trying.

                        • uecker a day ago ago

                          I am not saying that safety culture is irrelevant, not at all. I am saying that if the advantage of Rust is the culture that emphasizes safety (or rather memory safety, if the Rust community cared about safety in general cargo would not exist in this form) then that is a weak argument.

                          I don't think 10 years ago there was a lot of Rust used, so I am not sure how relevant it is that 1.0 was released at this time.

                          • vlovich123 a day ago ago

                            The culture of Rust is pretty uniform both in terms of convention (lots of good examples to learn from) and automated tooling (eg cargo clippy can fix many constructs into cleaner versions).

                            But sure, ultimately any code you see is limited by the talent of the author. However the safety of that code is not - it’s limited by how many unsafe blocks they wrote which you can actually grep for.

                            • uecker 9 hours ago ago

                              This is a naive and dangerous view of "unsafe". The safety of surrounding code depends on the unsafe blocks not violating invariants of safe Rust, and the safety of "unsafe" blocks may rely on assumptions about the safe part. Also it relates only to memory safety, so if your code review is to grep for "unsafe" blocks you are doing it wrong anyway.
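
The point about invariants can be illustrated with a minimal sketch. The `Buf` type and its methods are hypothetical and come from no real codebase; they only show an `unsafe` block whose soundness depends on a line of safe code elsewhere in the same module.

```rust
/// Hypothetical buffer with a logical length.
/// Invariant relied on by `last`: `len <= data.len()`.
pub struct Buf {
    data: Vec<u8>,
    len: usize,
}

impl Buf {
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        Buf { data, len }
    }

    /// Entirely safe code. The clamp is what keeps `last` sound;
    /// removing `.min(...)` would introduce the bug here, not in `last`.
    pub fn set_len(&mut self, new_len: usize) {
        self.len = new_len.min(self.data.len());
    }

    /// Returns the last byte within the logical length, if any.
    pub fn last(&self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        // SAFETY: sound only because every safe method keeps
        // `len <= data.len()`, so `len - 1` is in bounds.
        Some(unsafe { *self.data.get_unchecked(self.len - 1) })
    }
}
```

A grep for `unsafe` would find only `last`, while the line a reviewer most needs to check is the clamp in `set_len`.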

                      • vacuity a day ago ago

                        Of course, culture and technical design are both important for any language, but be specific. Despite the prevalence of tools that improve C's safety, writing C safely generally requires a culture of using those tools and other techniques. For better or worse, Rust's borrow checker is a clear demonstration of where Rust lies on the safety-freedom spectrum.

                  • vlovich123 a day ago ago

                    The low level abstraction was buggy because they forgot to free memory because they confused types, not because of mmap.

                    That's completely orthogonal to the question and less likely in Rust because you would generally use an enum with Drop implemented for the interior of the variants to guarantee correct release.

                    And mmap is no more difficult to call in Rust nor more magically unsafe - that’s the FUD. The vast majority of Ghostty wouldn’t even need unsafe meaning the vast majority of code gets optimized more due to no aliasing being automatic everywhere and why the argument that “zig is safer than unsafe rust” is disingenuous about performance or safety of the overall program.
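
A minimal sketch of the "enum with Drop on the interior" idea, under stated assumptions: `Pool`, `PooledBuf`, and `Backing` are hypothetical names, and the pool is a plain free list of boxed buffers rather than anything mmap-based. It is not how Ghostty is written. It only shows the release path being chosen by the variant instead of by a size field.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical pool of fixed-size buffers (a free list, for illustration only).
pub struct Pool {
    free: RefCell<Vec<Box<[u8]>>>,
    page_size: usize,
}

impl Pool {
    pub fn new(page_size: usize) -> Rc<Pool> {
        Rc::new(Pool { free: RefCell::new(Vec::new()), page_size })
    }

    /// Number of buffers currently waiting for reuse.
    pub fn free_count(&self) -> usize {
        self.free.borrow().len()
    }
}

/// Interior of the pooled variant: returns its buffer to the pool on drop.
pub struct PooledBuf {
    buf: Option<Box<[u8]>>,
    pool: Rc<Pool>,
}

impl Drop for PooledBuf {
    fn drop(&mut self) {
        if let Some(buf) = self.buf.take() {
            self.pool.free.borrow_mut().push(buf);
        }
    }
}

/// The variant records where the memory came from, so the release
/// path never has to be inferred from a length.
pub enum Backing {
    Pooled(PooledBuf),
    Heap(Vec<u8>), // non-standard size; freed by Vec's own Drop
}

impl Backing {
    pub fn alloc(pool: &Rc<Pool>, size: usize) -> Backing {
        if size <= pool.page_size {
            let buf = pool
                .free
                .borrow_mut()
                .pop()
                .unwrap_or_else(|| vec![0u8; pool.page_size].into_boxed_slice());
            Backing::Pooled(PooledBuf { buf: Some(buf), pool: Rc::clone(pool) })
        } else {
            Backing::Heap(vec![0u8; size])
        }
    }

    pub fn is_pooled(&self) -> bool {
        matches!(self, Backing::Pooled(_))
    }

    /// Size of the underlying allocation in bytes.
    pub fn len(&self) -> usize {
        match self {
            Backing::Pooled(p) => p.buf.as_ref().map_or(0, |b| b.len()),
            Backing::Heap(v) => v.len(),
        }
    }
}
```

Dropping a `Backing::Pooled` value returns its buffer to the pool and dropping a `Backing::Heap` value frees it, with no size comparison involved in either case.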

              • Zakis1 2 days ago ago

                [dead]

            • dnautics 2 days ago ago

              I don't know if this particular error would have been findable with zig-clr, but you don't need RAII. Errdefer/defer is enough, if you have an algorithm checking your work.

              • hardwaresofton 2 days ago ago

                It’s not that you NEED RAII (or any other language abstraction), it’s that this case would have been avoided with that usage.

                Clearly, the current state of things was not enough.

        • aw1621107 2 days ago ago

          > I think the main part of Ghostty's design mentioned here that - as a Rust programmer - I think is probably a mistake is the choice to use a linked list. To me this looks exactly like it needs VecDeque, a circular buffer backed by a growable array type.

          This comment [0] by mitchellh on the corresponding lobste.rs submission discusses the choice of data structure a bit more:

          > Circular buffer is a pretty standard approach to this problem. I think it's what most terminal emulators do.

          > The reason I went with this doubly linked list approach with Ghostty is because architecturally it makes it easier for us to support some other features that either exist or are planned.

          > As an example of planned, one of the most upvoted feature requests is the ability for Ghostty to persist scroll back across relaunch (macOS built-in terminal does this and maybe iTerm2). By using a paged linked list architecture, we can take pages that no longer contain the active area (and therefore are read-only) and archive them off the IO thread during destroy when we need to prune scroll back. We don't need to ever worry that the IO thread might circle around and produce a read/write data race.

          > Or another example that we don't do yet, we can convert the format of scroll back history into a much more compressed form (maybe literally compressed memory using something like zstd) so we can trade off memory for cpu if users are willing to pay a [small, probably imperceptible] CPU time cost when you scroll up.

          [0]: https://lobste.rs/s/vlzg2m/finding_fixing_ghostty_s_largest_...
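
For comparison, the circular-buffer approach mentioned above can be sketched with the standard library's `VecDeque`. The `Scrollback` type is hypothetical and stores lines as plain `String`s, which is far simpler than a real terminal's page structure. It only shows the prune-oldest behavior.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded scrollback backed by a ring buffer.
pub struct Scrollback {
    lines: VecDeque<String>,
    limit: usize,
}

impl Scrollback {
    pub fn new(limit: usize) -> Self {
        Scrollback { lines: VecDeque::with_capacity(limit), limit }
    }

    /// Appends a line. If the limit is exceeded, the oldest line is
    /// pruned and handed back to the caller (for example, to archive it).
    pub fn push(&mut self, line: String) -> Option<String> {
        self.lines.push_back(line);
        if self.lines.len() > self.limit {
            self.lines.pop_front()
        } else {
            None
        }
    }

    pub fn len(&self) -> usize {
        self.lines.len()
    }

    /// The oldest line still retained, if any.
    pub fn oldest(&self) -> Option<&str> {
        self.lines.front().map(String::as_str)
    }
}
```

In this sketch the pruned line is returned to the caller rather than overwritten in place, which is one way a ring buffer could still hand old history off for archiving. Whether that fits the threading model described in the quoted comment is a separate question.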

  • tk90 a day ago ago

    A couple weeks ago my Ghostty session crashed and found that it was using 40GB of RAM(!) - glad this was resolved!

  • Neywiny 2 days ago ago

    Edit: I'm getting a lot of down votes for this but nobody is saying why I'm wrong. If you think I'm wrong enough to down vote, please reply why.

    I don't understand why that is the preferred fix. I would have solved it other ways:

    1. When resizing the page, leave some flag of how it was allocated. This tagging is commonly done as the always 0 bits in size or address fields to save space.

    2. Since the pool is a known size of contiguous memory, check if the memory to be freed is within that range

    3. Make the size immutable. If you want to realloc, go for it, and have the memory manager handle that boundary for you.

    All of those not only maintain functionality which seems to have been lost with the feature reduction but also are more future proof to any other changes in size.

    • bastawhiz 2 days ago ago

      I didn't downvote, but I suspect it's an easy answer: the fix was like four lines.

      At the end of the day, #1 and #3 both probably add a fairly significant amount of code and complexity that it's not clear to me adds robustness or clarity. From the fix:

      ```
      // If our first node has non-standard memory size, we can't reuse
      // it. This is because our initBuf below would change the underlying
      // memory length which would break our memory free outside the pool.
      // It is easiest in this case to prune the node.
      ```

      https://github.com/ghostty-org/ghostty/commit/17da13840dc71b...

      #3, it seems, would require making a broader change. The size effectively is immutable now (assuming I'm understanding your comment correctly): non-standard pages never change size, they get discarded without trying to change their size.

      #2 is interesting, but I think it won't work because the implementation of MemoryPool doesn't seem like it would make it easy to test ownership:

      https://github.com/ghostty-org/ghostty/blob/17da13840dc71ba3...

      You'd have to make some changes to be able to check the arena buffers, and that check would be far slower than the simple comparison.

      • Neywiny 2 days ago ago

        Thank you. I think each of my options is pretty trivial in C. I guess what I'm not understanding for #3 is if size is immutable, how the size changed which caused the issue? The post said they changed the size of the page without changing the underlying size of the allocated memory. To me this is the big issue. There was a desync in information where the underlying assumption is that size tells you where the data came from and that the size of the metadata and the size of the allocation move in tandem across that boundary.

        #1 and #2 are fixes for breaking that implicit trust. #1 still trusts the metadata. #2 is what I'd consider the most robust solution, in that not only is it ideally trivial (just compare if a pointer is within a range, assuming zig can do that) but it also doesn't rely on metadata being correct. #3 prevents the desync.

        I really don't understand the code base enough to say definitively that my ways work, which is I guess what I'm really looking for feedback on. Looking at the memorypool, I think you're right that my assumption of it being a simple contiguous array was incorrect.

        ETA: I think I'm actually very wrong for #2. Color me surprised that the zig memory pool allocated each item separately instead of as one big block. Feels like a waste, but I'm sure they have their reasons. That's addCapacity in memory_pool.zig

        • bastawhiz 2 days ago ago

          I'm not 100%, but my understanding was that the non standard pages are always larger than the standard pages. If you need more than a standard page, you always get a freshly allocated non standard page. But when one was released, it was being treated as though it was standard sized. The pool would then reuse that memory, but only at a standard size. So every released non standard page leaked the difference between what was allocated and what was standard.

          Which is to say, I don't think it was actually being resized. I think it was the metadata for the page saying it had the (incorrect) standard size (and the incorrect handling after the metadata was changed).
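
The accounting described above can be modeled with a toy sketch. Everything here is an assumption for illustration: `STD_SIZE`, `Page`, and `Tracker` are hypothetical, the tracker only counts bytes rather than managing real memory, and the "fixed" path follows the commit comment quoted earlier (prune a non-standard page instead of reusing it).

```rust
/// Hypothetical standard page size in bytes (not Ghostty's real value).
pub const STD_SIZE: usize = 4096;

pub struct Page {
    alloc_len: usize, // bytes actually allocated
    meta_len: usize,  // bytes the metadata claims
}

impl Page {
    pub fn is_standard(&self) -> bool {
        self.alloc_len == STD_SIZE
    }
}

/// Counts bytes that have been allocated but not yet released.
#[derive(Default)]
pub struct Tracker {
    pub outstanding: usize,
}

impl Tracker {
    pub fn alloc(&mut self, want: usize) -> Page {
        let len = want.max(STD_SIZE);
        self.outstanding += len;
        Page { alloc_len: len, meta_len: len }
    }

    /// Releases only what the metadata claims, as in the buggy free path.
    pub fn free(&mut self, page: Page) {
        self.outstanding -= page.meta_len;
    }

    /// Buggy reuse: metadata is reset to the standard size while the
    /// underlying allocation stays large.
    pub fn reuse_buggy(&mut self, mut page: Page) -> Page {
        page.meta_len = STD_SIZE;
        page
    }

    /// Fixed reuse: a non-standard page is pruned and replaced instead.
    pub fn reuse_fixed(&mut self, page: Page) -> Page {
        if page.is_standard() {
            page
        } else {
            self.free(page);
            self.alloc(STD_SIZE)
        }
    }
}
```

In this model, a page allocated at three times the standard size, reused through `reuse_buggy` and then freed, leaves two standard sizes outstanding. The same sequence through `reuse_fixed` leaves zero.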

          • Neywiny 2 days ago ago

            Yes that last point was what I meant. I see no reason that the metadata's size field should get updated without some realloc of the memory it points to. I think I'll need to look into the actual code to see what's going on there, though, because we may both be misunderstanding. It just seems very error prone to categorize how you free memory based on a field that by the time you get to `free` has no guaranteed relationship with where the memory came from. I think that should be fixed. What was done in the blog is more of a band-aid imo.

    • hotpotat 2 days ago ago

      I upvoted you because I would like to know the response to these approaches

      • Neywiny 2 days ago ago

        Thank you. Sometimes I get to like -4 or even -7 before it starts going up. It might be nice to graph it at some point to see my most varied comments. I'm at -2 right now

        23 minutes later I'm at +2

        6 minutes after, +5 +4min now +6, another 20 minutes +8. I think I'm in the clear

        • yakaccount4 2 days ago ago

          I just stopped caring about votes. It's often driven by inertia, and it can't differentiate a vote from someone who doesn't know anything vs a domain expert. Life is better once you stop caring about karma points.

          • Neywiny 2 days ago ago

            While very true and sound advice, fake internet points make dopamine go brrrrr

  • bschwarz a day ago ago

    How little guidance can you give Claude Code to a) find and b) fix this memory leak? Summoning @simonw

  • hotpotat 2 days ago ago

    speaking of claude code in Ghostty, I’ve noticed I can’t drag and drop images into the prompt when the session is within a tmux pane. I miss that, coming from the mac terminal app, which allowed me to do so. I’d be willing to look into this myself, but mention it in case someone already knows where to start looking.

  • liveoneggs 2 days ago ago

    claude code also has a weird thing in ghostty where it breaks copy-paste after exiting. `reset` fixes it but it's annoying

  • cyh555 2 days ago ago

    I wonder how a Rust-based terminal implements this without sacrificing performance.

    • 0xbrayo 2 days ago ago

      Someone check out alacritty source code and answer it for us

      • rrgok 2 days ago ago

        Ask Claude Code about it using Ghostty without this fix ;)

  • sean_pedersen 2 days ago ago

    Would this kind of bug have been caught by the Rust compiler?

    • autarch 2 days ago ago

      I was wondering about this myself. My guess is no, since AFAIK the only way to do this sort of manual memory management is to use unsafe code. But there are also things like the bumpalo crate (https://docs.rs/bumpalo/latest/bumpalo) in Rust, so maybe you wouldn't need to do this sort of thing by hand, in which case you're as leak-free as the bumpalo crate.

  • gfyhthgyrfg 2 days ago ago

    [dead]

  • LgWoodenBadger 2 days ago ago

    The contrast between the attitude here https://news.ycombinator.com/item?id=46461860 and in this story is a bit wacky to me.

    • mitchellh 2 days ago ago

      What contrast? I stand by what I said there. I just re-read every point and I would say the same thing today and I don't think my blog post contradicts any of that?

      A user came along and provided a reliable reproduction for me (last night) that allowed me to find and fix the issue. Simultaneously they found the same thing and produced a similar fix, which also helped validate both our approaches. So, we were able to move forward. I said in the linked comment that I believed the leak existed, just couldn't find it.

      It also was fairly limited in impact. As far as Ghostty bugs go, the number of upvotes the bug report had (9) is very small. The "largest" in the title is with regards to the size of the leak in bytes, not the size of the leak in terms of reach.

      As extra data to support this, this bug has existed for at least 3 years (since the introduction of this data structure in Ghostty during the private beta). The first time I even heard about it in a way where I can confidently say it was this was maybe 3 or 4 months ago. It was extremely rare. I think the recent rise in popularity of Claude Code in particular was bringing this to the surface more often, but never to the point it rose to a massively reported issue.

      • 1a527dd5 2 days ago ago

        [flagged]

        • mitchellh 2 days ago ago

          Discussion upvotes, discussion activity, and Discord reports. I read every discussion and have been doing this project specifically for a few years now. There is a stark difference between a widespread and common bug and something like this.

          Like I said, this bug has existed for 3 years at this point and Ghostty is likely used by hundreds of thousands if not a million+ people daily (we don't have any analytics at all but have some side signals based on terminal reports from 3rd party CLIs). Trust me when I say that when there is a widespread issue, we hear it MUCH more loudly. :)

        • dang 2 days ago ago

          Could you please follow the HN guidelines when posting here? They include "assume good faith." and "don't cross-examine".

          https://news.ycombinator.com/newsguidelines.html

          • 1a527dd5 2 days ago ago

            Will do my best :)

            • dang 2 days ago ago

              Appreciated!

        • masklinn 2 days ago ago

          Dupes are not deleted, you can just search for them and see that there are not that many of those, and that's with this not being the only unsolved memory leak (https://github.com/ghostty-org/ghostty/discussions/9314 is a different one).

    • masklinn 2 days ago ago

      Not really? In your link TFAA was saying they were convinced an issue existed but the number of impacted users was limited, no maintainer experienced the issue, and they had no reproducer. As of yesterday TFAA still had no working reproducer: https://github.com/ghostty-org/ghostty/discussions/9962#disc...

      In the meantime they apparently got one (edit: per their sibling comment they got it yesterday evening) and were finally able to figure out the issue.

      edit: https://github.com/ghostty-org/ghostty/discussions/10244 is where it was cracked.

    • resonious 2 days ago ago

      I think there's only a perceptible "attitude" difference if you are fired up by the fact that they are conservative about using the "issues" tab.

    • lateral_cloud 2 days ago ago

      There are some really strange people on HN.

      • esseph 2 days ago ago

        I sure hope so!

    • darkteflon 2 days ago ago

      Super weird take. Why treat the guy as if he’s a bad actor? All of the public evidence shows good faith on this issue and on the project in general. We’ve also had a clear explanation of why discussion precedes issue creation.

    • dang 2 days ago ago

      "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

      https://news.ycombinator.com/newsguidelines.html

    • txdv 2 days ago ago

      Only contrast I see is that he thought it was much more of a corner case which turned out to be not that true anymore since everyone started using claude code.

    • IshKebab 2 days ago ago

      Presumably that discussion is the reason this was fixed. Very bizarre bug tracking policy IMO.

  • vegabook 2 days ago ago

    [flagged]

  • rvz 2 days ago ago

    [flagged]

  • cyberax 2 days ago ago

    Ugh. Is it just me, or is anyone else feeling a tad uncomfortable that their terminal app needs a custom memory allocator that mucks with low-level page tags?

    • flumpcakes 2 days ago ago

      I am not sure what your comment is based on, but in short: No? High performance software needs to deal with memory, and optimisations often will need some kind of direct control - as in this example where re-using memory is more performant than constantly churning with mmap.

      • sequin 2 days ago ago

        I honestly don't understand why a terminal emulator needs to be performant. Seems like peak bikeshedding to me.

        • yoyohello13 2 days ago ago

          A lot of developers use the terminal as their primary interaction with the computer. Nvim, tmux, etc. Having it be fast is an extreme quality of life improvement. For devs who only ever use the terminal integrated into their ide then it’s probably less important.

          • cbmuser 2 days ago ago

            Can you elaborate on that a bit, please?

            I have never found myself in the situation where my terminal emulator would be too slow and I'm using it for the majority of my day-to-day work.

            I honestly never ran into a situation where I would have blamed the terminal emulator for being too slow.

            • dkdcio a day ago ago

              not the same person but in the flow of doing things those little pauses (tens of milliseconds) do matter. I open/close nvim (and less-so tmux) a ton, and run lots of commands per day. I don’t want to wait

              and once you get used to things being that fast, it’s hard to go back (analogous to what people say about high-refresh screens/monitors)

              all that said the speed of the default mac terminal (and other emulators I tried) was always fine for me, performance was not why I switched to Ghostty

            • barnabee a day ago ago

              I think this kind of thing just bothers some people and not others.

              I first started to understand and notice update rates and responsiveness as a gamer playing 1st person shooters.

              I hate (ok, I find it a bit jarring) the jerky scrolling of a phone in battery save mode limited to 60(?) FPS. It’s so obviously not connected to your touch anymore.

              In terminals it’s things like the responsiveness of fuzzy finders and scrolling that I really notice.

              I turn off animations everywhere I can.

              It’s not impossible to use something slower, but when everything feels instant it’s just much more pleasant, smoother, and feels more productive as a result of the computer working at whatever speed my brain does.

        • drob518 2 days ago ago

          Well, perhaps “performant” isn’t the word you should be using. All code should be performant, where performant is defined as performing at an acceptable level. You might be tempted to then ask if it needs to be ultra high performance? That’s a better question but still off the mark. The correct question is whether YOU need an ultra high performance terminal emulator? If you don’t, you’re free to not use it. I haven’t found a need for it myself, for instance, and I still use the vanilla MacOS term. But that doesn’t mean someone else hasn’t wanted a faster term than the MacOS term and I wouldn’t throw shade on them for scratching that itch, even if I don’t share it.

        • RickHull 2 days ago ago

          https://ghostty.org/docs/about

          > Ghostty is a terminal emulator that differentiates itself by being fast, feature-rich, and native. While there are many excellent terminal emulators available, they all force you to choose between speed, features, or native UIs. Ghostty provides all three.

          > In all categories, I am not trying to claim that Ghostty is the best (i.e. the fastest, most feature-rich, or most native). But when I set out to create Ghostty, I felt all terminals made you choose at most two of these categories. I wanted to create a terminal that was competitive in all three categories and I believe Ghostty achieves that goal.

          > Before diving into the details, I also want to note that Ghostty is a passion project started by Mitchell Hashimoto (that's me!). It's something I work on in my free time and is a labor of love. Please don't forget this when interacting with the project. I'm doing my best to make something great along with the lovely contributors, but it's not a full-time job for any of us.

        • homebrewer 2 days ago ago

          Scrolling and searching through megabytes of output is often useful. Sometimes you don't expect it and can't prepare for it in advance.

        • usertty 2 days ago ago

          I also didn't get it until I tried ghostty and saw the results of the command appear before even taking my finger off the enter key

        • syntheticnature 2 days ago ago

          You've missed all the posts where people complain about a terminal emulator taking 1ms longer to respond to a keystroke than their preferred one, haven't you?

    • yoyohello13 2 days ago ago

      Frankly, I wish more software prioritized performance this much.

    • speed_spread a day ago ago

      You're not alone. Correctness first. Such complicated schemes should be backed with repeatable benchmarks so that their purported gains can be challenged later by simpler techniques. Too often clever optimizations with marginal gains make it to production and become maintenance liabilities.

  • llmslave3 2 days ago ago

    I hate to say it, but this probably would not have happened in a garbage collected language.

    GC languages are fast these days. If you don't want a runtime like C# (which has excellent performance) a language like Go would have worked just fine here, compiling to a small native binary but with a GC.

    I don't really understand the aversion to GC's. In memory constrained scenarios or where performance is an absolute top priority, I understand wanting manual control. But that seems like a very rare scenario in user space.

    • surajrmal 2 days ago ago

      Why do you think tripling the memory usage of a program is an acceptable tradeoff? It's not just GC pauses that are problematic with gc languages. Some software wants to run on systems with less than 4GiB of RAM.

    • p-e-w 2 days ago ago

      I agree that garbage collection is fine and Go indeed has an amazing garbage collector. Unfortunately, it also has the worst type system of all mainstream languages created in the 21st century, so the benefits are rarely worth the drawbacks.

      • llmslave3 2 days ago ago

        Go's type system is fine. This kind of comment is just pointless and goes against HN rules.

  • ComputerGuru 2 days ago ago

    The number of people here on HN gaslighting those that said they ran into this bug and challenging them to prove it was real..

    • mariusor 2 days ago ago

      As you could see from TFA, getting a reliable reproduction case was the tricky part of fixing this bug, so "asking to prove it's real" is just a mean way of saying asking for reproduction steps, not gaslighting.

  • gethly 2 days ago ago

    Should have used Odin instead of Zig.

    • tialaramex 2 days ago ago

      How exactly would using a different unfinished programming language have helped?

  • KaoruAoiShiho 2 days ago ago

    What's the best claude code terminal? I'm not sure if ghostty is it, which one can sync to iphone / android tablet for remote use of the same session?

    • surajrmal a day ago ago

      Sharing a session is independent of the terminal emulator itself. Use tmux for that. There are a handful of good terminal emulators. WezTerm, Alacritty, and kitty are popular. I use a tiling window manager, so I prefer to avoid tabs and use Alacritty for that reason.