Shrinking While Linking

(tweag.io)

42 points | by ingve 6 days ago

25 comments

  • lionkor a day ago

    > Back in the day, your compiler or assembler would turn each source file into an “object” file containing the compiled code.

    Lots and lots of code is still C and C++. That's not really "back in the day".

    • dcminter a day ago

      I'd be very surprised if the C/C++ code being written today was comparable in volume to the collective pile of JavaScript, Java, C#, Go, Python, and PHP.

      Contrast that with the heyday of C and C++, when not much else got a look-in (Pascal perhaps? Perl probably...)

      I think it's fair.

      • kragen a day ago

        Yes, surely much more code is being written in other languages than in C, but probably more C is being written now than ever before, too.

        • dcminter a day ago

          Sure, but I think that "back in the day your compiler would ..." is a reasonable characterisation.

          Back in the day, yes, your compiler almost certainly would do that. Now, your compiler might possibly do that. For most working stiffs it won't though.

          Ok, my side of the shed's painted now :D

          • kragen 19 hours ago

            I agree.

      • lionkor 20 hours ago

        C++ is still wildly popular and will remain so for the foreseeable future. Your JavaScript interpreter/JIT compiler is written in it, as is your JVM, and I don't need to mention CPython in more detail.

        Those all get compiled into object files and then linked.

        • dcminter 14 hours ago

          But not by most devs. Back in the day it was true for most devs.

      • happyweasel a day ago

        Hmm, how can I reuse this useful Go library in Python? Oh, I can't. Hmm, and how can I reuse this useful Java library in PHP? Oh, I can't. And which of the programming languages you mentioned can and do use C libraries? All of them.

        Reminds me of that coworker who thought that OpenCV was basically written in Python.
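
        That last point is easy to demonstrate. A minimal Rust sketch (illustrative only; every language in this thread has some equivalent FFI mechanism) that calls strlen straight out of libc:

            use std::ffi::CString;
            use std::os::raw::c_char;

            // Declared, not defined: the linker resolves this symbol against
            // libc, exactly as it would for any C object file.
            extern "C" {
                fn strlen(s: *const c_char) -> usize;
            }

            fn main() {
                let s = CString::new("hello").unwrap();
                // Safety: `s` is a valid NUL-terminated string for this call.
                let n = unsafe { strlen(s.as_ptr()) };
                assert_eq!(n, 5);
            }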

        • dcminter a day ago

          I'm not claiming that there is no C or C++ out there. But it's such a nitpick when, for most developers, no, their day-to-day work absolutely does not involve the creation of object files.

          Sure, akshuwally, there are still C and C++ devs out there. Meanwhile a friend has just embarked upon a career as a pro COBOL developer. What of it?

          Edit: Also, in the spirit of akshewally, I have just googled up this monster! My word, PHP and Java AND XML... it's like the unholy trinity of HackerNewsbane... https://php-java-bridge.sourceforge.net/pjb/

        • zigzag312 13 hours ago

          A NativeAOT-compiled C# library can expose C-compatible exports, which can be used from any language that supports C libraries.
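
          For comparison, the same pattern in Rust; a minimal sketch (the function is hypothetical, and the crate is assumed to be built with crate-type "cdylib" or "staticlib"):

              // #[no_mangle] keeps the symbol name `add` unmangled, and
              // extern "C" pins the calling convention, so C, Python (ctypes),
              // C#, and friends can all resolve and call it.
              #[no_mangle]
              pub extern "C" fn add(a: i32, b: i32) -> i32 {
                  a + b
              }
              // Corresponding C declaration: int32_t add(int32_t a, int32_t b);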

  • nextaccountic 21 hours ago

    The real problem is generics. They cause a blowup in size because they are instantiated for every combination of types they are actually called with. This can sometimes be a performance boost (typically not because it avoids an indirect call, but because it enables optimizations such as inlining). But it can also make code slower (not by much), or have little effect on performance.

    Rust lets you choose between generics and trait objects, but switching is a viral change that sometimes means large sections of code must be rewritten. There is also an optimization that turns generics into virtual dispatch when deemed beneficial, but I'm not sure how well it works.
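
    A minimal sketch of that trade-off (function names are illustrative, not from the article):

        use std::fmt::Display;

        // Generic: monomorphized. The compiler emits a separate copy of this
        // body for every concrete T it is called with; good for inlining,
        // but each instantiation adds code size.
        fn show_generic<T: Display>(x: T) {
            println!("{x}");
        }

        // Trait object: one shared body. The call to Display::fmt goes
        // through a vtable at runtime instead of being specialized.
        fn show_dyn(x: &dyn Display) {
            println!("{x}");
        }

        fn main() {
            show_generic(1u32);   // instantiation #1
            show_generic("hi");   // instantiation #2: another copy of the body
            show_dyn(&1u32);      // both calls share one piece of machine code
            show_dyn(&"hi");
        }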

  • kragen a day ago

    These are extremely practical tips on using binutils to shrink your libraries. It might also be worthwhile to compile the library without per-function sections in the first place, which hopefully can be done without patching rustc. With GCC, that produces significantly smaller code on many platforms.

    • jneem 20 hours ago

      I had a quick look and couldn't find a way to turn off per-function sections in rustc. But I think it's a pretty good default for rust, because the unit of compilation is pretty large and so it's common to have a lot of unused functions. It's really only a problem for distributing static libraries, since binaries and shared libraries already lose their per-function sections.

      • kragen 19 hours ago

        I would rather say it's really only potentially beneficial for distributing static libraries, since binaries and shared libraries already lose their per-function sections. So they pay the cost of the fluffier code without getting much benefit, if any.
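
        For concreteness, a hypothetical two-function crate showing what is at stake:

            // With per-function sections (rustc's default), `used` and
            // `unused` each land in their own .text.<mangled-name> section,
            // so a final link with --gc-sections can discard `unused`.
            // Without them, both share a single .text section that survives
            // or dies as a unit: denser code with less padding, but no
            // per-function garbage collection for consumers of the .a file.
            pub fn used() -> u32 {
                42
            }

            pub fn unused() -> u32 {
                7
            }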

  • Joker_vD a day ago

    > A static library is nothing but a bundle of object files, wrapped in an ancient and never-quite-standardized archive format.

    To this day I'm astonished that it's not just tar. Or pax. Or even cpio! Or literally any file format that has any other use.
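
    For the curious, here is roughly all there is to it: a hedged sketch of a reader for the common System V flavour (error handling and the long-name table omitted):

        use std::fs::File;
        use std::io::{Read, Seek, SeekFrom};

        // Layout: an 8-byte magic "!<arch>\n", then per member a 60-byte
        // ASCII header (name[16] mtime[12] uid[6] gid[6] mode[8] size[10],
        // terminated by "`\n"), followed by the data, padded to an even
        // offset. The "never quite standardized" part: long names and the
        // symbol index are encoded differently in the BSD and System V
        // variants.
        fn list_members(path: &str) -> std::io::Result<()> {
            let mut f = File::open(path)?;
            let mut magic = [0u8; 8];
            f.read_exact(&mut magic)?;
            assert_eq!(&magic, b"!<arch>\n");
            let mut h = [0u8; 60];
            while f.read_exact(&mut h).is_ok() {
                let name = String::from_utf8_lossy(&h[0..16]);
                let size: u64 = String::from_utf8_lossy(&h[48..58]).trim().parse().unwrap();
                println!("{}\t{} bytes", name.trim_end(), size);
                // Member data is padded to an even boundary.
                f.seek(SeekFrom::Current((size + size % 2) as i64))?;
            }
            Ok(())
        }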

    • sureglymop a day ago

      Wasn't 'ar' there first, as a general-purpose archiver, with 'tar', the "tape archiver", coming later?

      Why do you find it surprising that the archive format from that time was used to archive a bunch of files?

      I wasn't alive then, but I'm pretty sure ar wasn't used only for this purpose in Unix.

      • electroly 20 hours ago

        > Why do you find it surprising that the archive format from that time was used to archive a bunch of files?

        It's surprising because we still use it today, not because it was used at the time.

      • Joker_vD a day ago

        Because e.g. a.out didn't survive and was replaced? Several times, even?

        • sureglymop 15 hours ago

          That is a good point. Although I can only guess, it intuitively makes sense to me that the executable format would evolve more quickly than an archive format that still "just works".

    • xyzzy_plugh a day ago

      tar/pax are kind of terrible formats. They are hard to implement correctly. I'm glad they are not used more often.

      cpio is pretty reasonable though.

      zip is actually pretty great and I've been growing increasingly fond of it over the years.

      • Joker_vD a day ago

        The thing is, there is always tar(1), even in the most basic of distributions. And everyone uses .tar.gz's or .bz2's or whatever for distributing all kinds of things, so tar is pretty ubiquitous. But the moment you want to do some C development, or anything binutils-related, nope: install and use ar(1), which is used for literally one single purpose and nothing else. Because reasons.

        • hyperman1 16 hours ago

          I'm not sure how ar does it, but tar has no centralised directory. The only way to get file 100 is to walk through the 99 files before it. This kills random access speed.
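
          (For what it's worth, ar shares that sequential layout; its symbol index is just a conventional first member.) A sketch of the tar walk, assuming plain ustar headers and ignoring GNU extensions:

              use std::fs::File;
              use std::io::{Read, Seek, SeekFrom};

              // Walk a tar archive entry by entry. There is no index: reaching
              // the 100th file means reading the 99 headers before it and
              // seeking past each one's data, in 512-byte blocks.
              fn walk(path: &str) -> std::io::Result<()> {
                  let mut f = File::open(path)?;
                  let mut h = [0u8; 512];
                  loop {
                      f.read_exact(&mut h)?;
                      if h.iter().all(|&b| b == 0) {
                          break; // an all-zero block marks end-of-archive
                      }
                      let name = String::from_utf8_lossy(&h[0..100]);
                      // The size field is a 12-byte octal string at offset 124.
                      let oct = String::from_utf8_lossy(&h[124..136]);
                      let size = u64::from_str_radix(
                          oct.trim_matches(|c: char| c == '\0' || c == ' '),
                          8,
                      )
                      .unwrap_or(0);
                      println!("{}\t{} bytes", name.trim_end_matches('\0'), size);
                      // Data is padded to a whole number of 512-byte blocks.
                      f.seek(SeekFrom::Current(((size + 511) / 512 * 512) as i64))?;
                  }
                  Ok(())
              }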

      • yjftsjthsd-h 20 hours ago

        > tar/pax are kind of terrible formats. They are hard to implement correctly. I'm glad they are not used more often.

        I'll grant you "kind of terrible", but what's hard to correctly implement about tar? It's just a bunch of files concatenated together with a tiny chunk of metadata stuck on the front of each.

        • electroly 19 hours ago

          Having never done it myself, I don't know, but I do know that the "microtar" library I picked up off GitHub is buggy when expanding GNU tar archives but perfect when expanding its own. Correctly creating one valid archive is a lot easier than reliably extracting all valid archives. The code appeared competent; I assume tar just has a bunch of historical baggage that you can get wrong or fail to implement.