Make zip files smaller with zip shrinker

(evanhahn.com)

61 points | by zdw 3 days ago

43 comments

  • lifthrasiir 19 hours ago

    While not very popular, ECT [1] is (still?) the best solution in this space and has been my go-to tool for this purpose.

    [1] https://github.com/fhanau/Efficient-Compression-Tool

    • idoubtit 13 hours ago

      I had not heard of ECT, but I'm not impressed. I've just benchmarked it against two other PNG optimizers, and here are the file sizes for default and max levels:

          1985457 oxipng-o6.png
          2030036 oxipng-o2.png
          2125459 ect-o9.png
          2144598 ect-o3.png
          2169351 optipng-o7.png
          2215086 optipng-o2.png
          2218326 original.png
      
          oxipng 9.1.5
          OptiPNG version 7.9.1
          Efficient Compression Tool Version 0.9.5
      
      BTW, I could not compile ECT on my Linux system, because its CMake config was too old. I used the Windows release through Wine, but it shouldn't change the results above.

      I tried to apply ECT to a few .gz files, but it complained it was not compatible, and I did not dig further.

      [edited for a typo s/I/it/]

    • futune 19 hours ago

      I use ect on a monthly basis, at least. Especially for png files. It's pretty great!

      • zamadatix 15 hours ago

        Yeah, for how well it does with PNGs, it really doesn't get nearly as much attention as the other tools for the same job do.

    • useyourloaf 15 hours ago

      Thank you for the pointer!

  • Wowfunhappy 15 hours ago

    Obviously, the purpose of this tool isn't to preserve 100% compatibility. Things like removing empty directories make that clear.

    But, why would you remove comments? Presumably, if those are there, they were added for a specific reason. And the author acknowledges the space savings are minimal.

    • gwbas1c 14 hours ago

      > Things like removing empty directories make that clear.

      I hope that's disabled by default. Something like: "turning this option on may reduce file size by a small percent, but could impact compatibility."

      I suspect the option will be much more useful with file formats that are zip under the hood, where it's easier to test the small subset of applications that read those files and/or update the file specification.

  • MrDOS 13 hours ago

    Ken Silverman (of Build Engine fame) has written a few deflate-centric compression utilities[0]. The PNGOUT recompressor is the most famous of these (and is very good – it practically always beats OptiPNG), but the suite also includes a .zip archive recompressor called KZIP. I'd be curious to see how ZIP Shrinker compares to this tool.

    [0]: https://advsys.net/ken/utils.htm

  • akx 21 hours ago

    > Typically, other archives like .tar.bz2 can be smaller. But those aren’t backwards-compatible!

    Is there any point for (new) .bz2 archives in the era of Zstd?

    • j16sdiz 20 hours ago

      Tooling?

      It took years for bzip2 to be in every Linux distro, and we're _still_ doing gzip.

      LZMA / xz tools are starting to get more support, but they are nowhere near universal.

      No idea how long zstd will need.

      • strenholme 18 hours ago

        xz is pretty universal across POSIX and clones though. It comes with any modern Linux distro, Busybox even has an .xz decompressor, so `tar xvJf file.tar.xz` does the right thing in *NIX land, which I presume includes macOS with Brew.

        For Windows systems, 7-zip (.7z, similar compression to .xz) is a free download for Windows 10, and Windows 11 can open up a .7z file with a simple double click.

        .zip and .gz no longer need to be used here in 2026.

        • Dwedit 14 hours ago

          GitHub won't let you upload a 7z file as an attachment for the issue tracker. Thus forcing me to use an inferior and obsolete compression format.

        • lstodd 17 hours ago

          .zip is used as a seekable container with some compression. There is no replacement comparable in simplicity. 7z is overcomplicated, compressed tar is not seekable.

          .gz/deflate is used when something very cheap and very fast is needed. xz/lzma is quite often too slow or requires too much memory even on decompression.

          so no, .zip and .gz are very much needed in 2026.
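The "seekable container" point can be illustrated with a minimal sketch using Python's standard `zipfile` module. The member names and contents are made up for illustration; the idea is only that the central directory records where each member starts, so a reader can fetch one member without decompressing the others.

```python
import io
import zipfile

# Build a small in-memory archive with several members (hypothetical names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(5):
        zf.writestr(f"data/part{i}.txt", f"contents of part {i}\n" * 100)


def read_member(archive: io.BytesIO, name: str) -> tuple[int, bytes]:
    """Return (offset of the member's local header, decompressed bytes)."""
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo(name)  # looked up from the central directory
        with zf.open(info) as member:  # seeks straight to this member
            return info.header_offset, member.read()


offset, part3 = read_member(buf, "data/part3.txt")
print(f"part3 starts at byte {offset}, {len(part3)} bytes uncompressed")
```

A compressed tar stream has no such index, which is why reaching a file in the middle generally means decompressing everything before it.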

          • adapiz 16 hours ago

            Compared to xz and even parallel xz, gzip and parallel gzip are just better if speed is more important. The compression is not superior, but it is already good compared with the uncompressed data. For long-term storage, it makes sense to invest the extra time for better compression, but if it's about transfer time, you might end up with an overall longer processing time instead of just a longer transfer time because of a worse compression ratio. It's like with image formats: pick the right one for your use case.

            • MrDrMcCoy 10 hours ago

              If you add zstd to the comparison matrix, it wins on both speed and compression ratio. Its adoption is quickly catching up to xz as a result, and I expect it to approach gzip in availability in a few years.
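The speed versus ratio trade-off discussed in this subthread is easy to measure for yourself. The sketch below compares gzip, bzip2 and xz using only Python's standard library on synthetic sample data, which is an assumption and will not match real workloads. zstd is left out because the standard library only gained it in Python 3.14. No particular ranking is claimed; run it on your own data.

```python
import bz2
import gzip
import lzma
import time

# Synthetic, fairly repetitive sample data (an assumption; real data differs).
sample = b"".join(
    b"record %d: status=ok latency=%dms\n" % (i, i % 97) for i in range(50000)
)

codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    start = time.perf_counter()
    packed = compress(sample)
    elapsed = time.perf_counter() - start
    if decompress(packed) != sample:  # round-trip sanity check
        raise ValueError(f"{name} round trip failed")
    results[name] = (len(packed), elapsed)

print(f"input  {len(sample):8d} bytes")
for name, (size, elapsed) in results.items():
    print(f"{name:6s} {size:8d} bytes  {elapsed * 1000:7.1f} ms")
```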

        • jgalt212 15 hours ago

          gzip is very fast, universally supported, and good enough. It will be around forever.

          You need Python 3.14 for zstd in the standard library.

    • Am4TIfIsER0ppos 19 hours ago

      Debian? Did they discover it yet?

      • sigio 17 hours ago

        I think it's been in since debian 11... at least 12, it's been in my default ansible playbooks for a while.

  • luzifer42 6 hours ago

    I once made a Maven plugin which reprocesses jar files. It can remove extra content such as comments and directories. In addition, it handles nested zip files to increase their compressibility. All of the features can be toggled individually.

    https://luccappellaro.github.io/2015/03/01/ZopfliMaven.html

  • jurgenkesker 19 hours ago

    APKs need to be zipaligned, I don't see that mentioned.

  • KerrickStaley 14 hours ago

    You can also make ZIP files smaller by switching the compression from Deflate to Zstandard. In the one case where I tried this, it resulted in a 60% file size decrease [1]. Unfortunately, Info-ZIP, which provides the unzip command, hasn't had a release in 18 years, so it doesn't support this newer compression/decompression method. You have to use 7-Zip instead.

    [1] https://github.com/UKGovernmentBEIS/inspect_ai/pull/3145

    • idoubtit 13 hours ago

      What is the open standard?

      As far as I know, the ISO standard for zip only specifies two compression methods: "store" (no compression) and "deflate". If I follow that, when I create a zip file, I know it's not performant, but at least it's almost universal (except for file ownership, permissions, character encoding and anything modern).

      The corporate PKWARE has added other compressions to their original zip software, but those are not in the standard. They will not work for an EPUB, a LibreOffice file, etc. If I want a good compression, I reach for zstd (often through `tar`) or 7z if I want more portability.

    • Dwedit 14 hours ago

      Then it's not a zip file anymore.

      Just like if you modified PNG files to use zstandard instead of deflate, but otherwise be identical, it's still not a PNG file anymore.

      • tiagod 14 hours ago

        That's not true. Zip files have supported other compression algorithms since the late 90s.

      • giancarlostoro 14 hours ago

        I guess it's PNG v2 then? ;)

  • billpg 16 hours ago

    Do any formats using ZIP as the underlying format use ZIP comments for metadata? Unless there's a lot of compressors leaving "Zip file generated by MySuperZipper™" then I imagine any comments left were probably done for a good reason.

    • ebolyen 15 hours ago

      I'm not aware of any, but it wouldn't be insane to build a seekable deflate implementation by defining offsets in a zip comment. This would leave the zip file backwards compatible to usual decompression while allowing internal seeking within an individual file if the decompressor was aware of this index.
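As a rough illustration of the idea of carrying an index in the archive comment, here is a minimal sketch with Python's `zipfile`. The JSON layout and the offsets are hypothetical, and this is not a working seekable-deflate reader (that would also need the compressor to emit independently decodable blocks at those offsets). It only shows that metadata in the comment field round-trips while the archive stays readable by an ordinary ZIP reader, and that a tool which strips comments would silently discard it.

```python
import io
import json
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("big.bin", b"\x00" * 100_000)
    # Hypothetical index: uncompressed offsets a custom reader might seek to.
    index = {"big.bin": [0, 32768, 65536]}
    # The archive comment is limited to 65535 bytes by the format.
    zf.comment = json.dumps(index).encode("utf-8")

# An index-aware reader parses the comment; any other reader just ignores it.
with zipfile.ZipFile(buf) as zf:
    recovered = json.loads(zf.comment.decode("utf-8"))
    payload = zf.read("big.bin")

print(recovered)
```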

      • mxmlnkn 14 hours ago

        For seekable gzip indexes in zip, there is SOZip: https://github.com/sozip/sozip-spec . However, it stores the indexes as files succeeding the actual file entry. To hide these index files and avoid extraction, they are not listed in the central directory, but a linear scan of the local headers, which some wrongly-behaved ZIP tools do, or which might be necessary for recovering broken ZIP files, would find those hidden indexes.

  • seritools 17 hours ago

    > This has the side effect of removing empty directories

    yeah, this will inevitably break things. excluding those from the directory stripping shouldn't be too hard (TM)

  • Sweepi 13 hours ago

    so this tool:
    - strips away comments, metadata and directories(!!)
    - re-compresses the data with deflate (on a presumably higher setting)

    It makes me feel uneasy that something which does lossy compression (metadata is lost) is called "ZIP Shrinker". Hope nobody gets surprised by this.

    The real solution is to use lzma(2).
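For contrast, a lossless variant of the same idea can be sketched with Python's `zipfile`: re-compress each entry with Deflate at a higher level while keeping the archive comment, per-entry metadata and directory entries. This is not how ZIP Shrinker is implemented; the `recompress` helper and the sample archive are assumptions for illustration, and zlib level 9 will not match what Zopfli-style recompressors achieve.

```python
import io
import zipfile


def recompress(src: io.BytesIO, level: int = 9) -> io.BytesIO:
    """Rewrite every entry with Deflate at `level`, keeping comments and directories."""
    out = io.BytesIO()
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(out, "w") as zout:
        zout.comment = zin.comment  # archive-level comment
        for info in zin.infolist():
            data = zin.read(info)
            if info.is_dir():
                # Directory entries carry no data; keep them, stored as-is.
                zout.writestr(info, b"", compress_type=zipfile.ZIP_STORED)
            else:
                # Reusing `info` keeps the name, timestamp and entry comment.
                zout.writestr(
                    info,
                    data,
                    compress_type=zipfile.ZIP_DEFLATED,
                    compresslevel=level,
                )
    return out


# Sample input: fast compression, an archive comment and an empty directory.
src = io.BytesIO()
with zipfile.ZipFile(src, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=1) as zf:
    zf.comment = b"built by example"
    zf.writestr("docs/readme.txt", "hello world\n" * 1000)
    zf.writestr("empty/", b"")

dst = recompress(src)
with zipfile.ZipFile(dst) as zf:
    kept_names = zf.namelist()
    kept_comment = zf.comment
    kept_data = zf.read("docs/readme.txt")

print(len(src.getvalue()), "->", len(dst.getvalue()), "bytes")
```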

  • etrez 11 hours ago

    Cool project! Now, Zip-Ada's ReZip does much better, even if you stick with the Deflate compression scheme. For Zip archives, you have more compression schemes available (BZip2, LZMA, ...) and even much better results.
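To get a feel for the other compression schemes a Zip archive can carry, here is a small sketch using the methods Python's `zipfile` exposes (stored, Deflate, BZip2, LZMA). It says nothing about how ReZip or ZIP Shrinker perform; the sample data is synthetic and the `zipped_size` helper is made up for this example. Keep in mind the compatibility point raised elsewhere in the thread: readers that only implement Deflate cannot open the BZip2 or LZMA variants.

```python
import io
import zipfile


def zipped_size(data: bytes, method: int) -> int:
    """Size in bytes of a one-member archive written with the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("sample.txt", data)
    return len(buf.getvalue())


# Synthetic text-like input (an assumption; results depend heavily on the data).
sample = b"".join(b"line %d: the quick brown fox\n" % i for i in range(20000))

methods = {
    "stored": zipfile.ZIP_STORED,
    "deflate": zipfile.ZIP_DEFLATED,
    "bzip2": zipfile.ZIP_BZIP2,
    "lzma": zipfile.ZIP_LZMA,
}
sizes = {name: zipped_size(sample, method) for name, method in methods.items()}

for name, size in sizes.items():
    print(f"{name:8s} {size:8d} bytes")
```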

  • ChrisNorstrom 17 hours ago

    I know you meant well but...

    "It deletes empty folders" and "Let me know if this is a problem for you"

    NEVER DO THAT. I know you meant well, but the first rule of any program is to NEVER automatically delete something without informing the user. NEVER. Users keep empty folders for structure, reminders, or placeholders because software will dump files into it later when it's run. If it was there when they zipped it up, it should be there when they unzip it. Otherwise they'll check the before and after and it will show some folders missing, create confusion, and the user will run off trying to find out if anything else is missing.

    Example: A user zips up a program. Some programs are coded to look for a folder and dump files into it, if the folder is missing the program will fail. I've had that occasionally over the years. Not all programs will recreate a missing folder.
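Before running any tool that drops empty folders, you can check whether an archive has any. A minimal sketch with Python's `zipfile`, assuming directories are stored as explicit entries whose names end in `/`; the `empty_directories` helper and the sample names are made up for this example.

```python
import io
import zipfile


def empty_directories(zf: zipfile.ZipFile) -> list[str]:
    """Directory entries that have no files or subdirectories beneath them."""
    names = zf.namelist()
    dirs = [n for n in names if n.endswith("/")]
    return [d for d in dirs if not any(n != d and n.startswith(d) for n in names)]


# Sample archive: a program that expects an (initially empty) logs folder.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("app/", b"")
    zf.writestr("app/main.py", "print('hi')\n")
    zf.writestr("app/logs/", b"")

with zipfile.ZipFile(buf) as zf:
    at_risk = empty_directories(zf)

print(at_risk)  # ['app/logs/']
```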

    • Ekaros 17 hours ago

      One thing I dislike about git is that it really does not support empty folders well, even though they often make sense, either now or for the future. There are decent reasons to have empty folders.

      • svth 16 hours ago

        I just work around it with a .gitkeep file.

        • ebolyen 15 hours ago

          Seems we need a .zipkeep file then.

          Just kidding, I don't see how the overhead of the directory entry is even remotely enough to warrant removal. Most of the magic can be left to efficient DEFLATE compatible blocks and removing entries not in the central directory in the first place (ZIP files can support concatenation of new data so long as you re-write the central directory at the end of the file).

    • sumtechguy 14 hours ago

      Yeah, that probably should just be an option. Basically, the default should mangle the zip file the least, and the most extreme changes get turned on by flags. One of those could be 'remove empty folders'.

  • stuaxo 19 hours ago

    Nice, interesting to see if it helps docx much.