Pre-commit hooks are broken

(jyn.dev)

79 points | by todsacerdoti 14 hours ago ago

72 comments

  • rurban 7 minutes ago ago

    Not fundamentally broken, just broken on certain use cases where'd I have to do something like

      prek uninstall; g rbc; prek install
    
    eg. (using the typical aliases)
  • nrclark 10 hours ago ago

    This was a really interesting read. I'd highly recommend it for anybody who's setting up (or currently maintains) a pre-commit workflow for their developers.

    I want to add one other note: in any large organization, some developers will use tools in ways nobody can predict. This includes Git. Don't try to force any particular workflow, including mandatory or automatically-enabled hooks.

    Instead, put what you want in an optional pre-push hook and also put it into an early CI/CD step for your pull request checker. You'll get the same end result but your fussiest developers will be happier.

    • Mic92 10 hours ago ago

      I can second that. If there are multiple commits: https://github.com/tummychow/git-absorb is handy to add formatting changes into the right commit after commits already happened.

      • oxryly1 an hour ago ago

        It looks like git absorb rewrites history. Doesn’t that break your previously pushed branch?

    • eru 9 hours ago ago

      > This includes Git. Don't try to force any particular workflow, including mandatory or automatically-enabled hooks.

      And with git, you can even make anything that happens on the dev machines mandatory.

      Anything you want to be mandatory needs to go into your CI. Pre-commit and pre-push hooks are just there to lower CI churn, not to guarantee anything.

      (With the exception of people accidentally pushing secrets. The CI is too late for that, and a pre-push hook is a good idea.)

      • darkwater 7 hours ago ago

        A good analogy is: git hooks are client-side validation; CI is server-side validation, aka the only validation you can trust.

      • normie3000 9 hours ago ago

        > with git, you can even make anything that happens on the dev machines mandatory

        s/can/can't?

    • PunchyHamster 7 hours ago ago

      > I want to add one other note: in any large organization, some developers will use tools in ways nobody can predict. This includes Git. Don't try to force any particular workflow, including mandatory or automatically-enabled hooks.

      you will save your org a lot of pain if you do force it, same as when you do force a formatting style rather than letting anyone do what they please.

      You can discuss to change it if some parts don't work but consistency lowers the failures, every time.

      • dxdm 7 hours ago ago

        Enforcement should live in CI. Into people's dev environments, you put opt-in "enablement" that makes work easier in most cases, and gets out of the way otherwise.

        • tyleo 6 hours ago ago

          Agreed, my company has some helper hooks they want folks to use which break certain workflows.

          We’re a game studio with less technical staff using git (art and design) so we use hooks to break some commands that folks usually mess up.

          Surprisingly most developers don’t know git well either and this saves them some pain too.

          The few power users who know what they’re doing just disable these hooks.

  • Simplita 10 hours ago ago

    I’ve seen similar issues once hooks start doing more than fast checks. The moment they become stateful or depend on external context, they stop being guardrails and start being a source of friction. In practice, keeping them boring and deterministic seems to matter more than catching everything early.

  • lemonlime227 9 hours ago ago

    To bring up jujutsu, `jj fix` (https://docs.jj-vcs.dev/latest/cli-reference/#jj-fix) is a more refined way of ensuring formatting in commits. It runs a formatting command with the diff in stdin and uses the results printed to stdout. It can simplify merges and rebases history to ensure all your commits remain formatted (so if you enable a new formatting option, it can remove the need for a special format/style fix commit in your mutable set). Hard to go back to pre-commit hooks after using jj fix (also hard to use git after using jj ;) ).

    • conradludgate 9 hours ago ago

      The downside currently (although I've been assured this will be fixed one day) is that it doesn't support running static analysis over each commit you want to fix.

      My git rebase workflow often involves running `git rebase -x "cargo clippy -- --deny=warnings"`. This needs a full checkout to work and not just a single file input

      • lemonlime227 5 hours ago ago

        Yeah, to add some context for people reading this, jj fix works best for edits local to the diff, and it’s meant for edits mostly. With some trickery you could run some analysis, but it’s not what jj fix is meant for right now.

        The intended future solution is `jj run` (https://docs.jj-vcs.dev/latest/design/run/), which applies similar ideas to more general commands.

    • dbt00 9 hours ago ago

      Came here to mention jj fix. It is a fundamentally more elegant way of doing things.

  • conradludgate 8 hours ago ago

    I've worked in several projects where running the tests locally automatically install pre-commit hooks and I've wanted to commit warcrimes because of it.

    Don't do that, just dont.

  • thomashabets2 9 hours ago ago

    I feel like I found better git commands for this, that don't have these problems. It's not perfect, sure, but works for me.

    The pre commit script (https://github.com/ThomasHabets/rustradio/blob/main/extra/pr...) triggers my executor which sets up the pre commit environment like so: https://github.com/ThomasHabets/rustradio/blob/main/tickbox/...

    I run this on every commit. Sure, I have probably gone overboard, but it has prevented problems, and I may be too picky about not having a broken HEAD. But if you want to contribute, you don't have to run any pre commit. It'll run on every PR too.

    I don't send myself PRs, so this works for me.

    Of course I always welcome suggestions and critique on how to improve my workflow.

    And least nothing is stateful (well, it caches build artefacts), and aside from "cargo deny" no external deps.

    • 000ooo000 9 hours ago ago

      Only a minor suggestion: git worktrees is a semi-recent addition that may be nicer than your git archive setup

  • 000ooo000 9 hours ago ago

    Your hook can't observe a simple env var, if you are stepping off the happy path of your workflow? E.g. `GIT_HOOK_BYEBYE` = early return in hook script. Article seems a little dramatic. If you write a pre-commit hook that is a pain in your own arse, how does that make them fundamentally broken?

  • a_t48 8 hours ago ago

    Running on the working tree is mostly okay - just `exit 1` if changes were made and allow the user to stage+commit new changes. It isn't perfect but it doesn't require checking out a new tree.

    • jynelson 6 hours ago ago

      this completely breaks `git add -p`.

  • tkzed49 10 hours ago ago

    Thank you. I don't need to "fix" a commit before it ends up on a remote branch. Sometimes I expect a commit to pass checks and sometimes I don't. Frankly, don't even run pre-push hooks. Just run the checks in CI when I push. You'd better be doing that anyway before I'm allowed to push to a production branch, so stop breaking my git workflows and save me the time of running duplicate checks locally.

    Also, if most developers are using one editor, configure that editor to run format and auto-fix lint errors. That probably cleans up the majority of unexpected CI failures.

    • eru 9 hours ago ago

      Pre-commit and pre-push hooks are something developers can voluntarily add (or enable) to shorten the latency until they get feedback: instead of the CI rejecting their PR, they can (optionally!) get a local message about it.

      Otherwise, I agree, your project can not rely on any checks running on the dev machine with git.

      • tkzed49 an hour ago ago

        Appreciate the perspective. I've worked on projects where hooks are auto-configured, and pre-commit is just never something that's going to agree with me.

        I prefer to be able to push instantly and get feedback async, because by the time I've decided I'm done with a change, I've already run the tests for it. And like I said, my editor is applying formatting and lints, so those fail more rarely.

        But, if your pre-push checks are fast (rather than ~minutes), I can see the utility! It sucks to get an async failure for feedback that can be delivered quickly.

      • PunchyHamster 7 hours ago ago

        In our case same hook is re-ran on server side; the pre-commit hook is purely to increase velocity

        ... and cos most people using git will have to take a second if the hook returns to them "hey, your third commit is incorrect, you forgot ticket number"

    • hbogert 8 hours ago ago

      I don't want roundtrips to my CI which easily takes a minute and pushes me to look at yet another window. Pre-commit hooks save me so much time.

  • badgersnake 9 hours ago ago

    Yep, all that and they’re also annoying. Version control tools are not supposed to argue - do what you’re told. If I messed up, the branch build will tell me.

    • thomashabets2 9 hours ago ago

      Is that the difference between forced pre commits vs opt in? I don't want to commit something that doesn't build. If nothing else it makes future bisects annoying.

      But if I intend to squash and merge, then who cares about intermediate state.

      • normie3000 9 hours ago ago

        > I don't want to commit something that doesn't build.

        This is a really interesting perspective. Personally I commit code that will fail the build multiple times per day. I only care that something builds at the point it gets merged to master.

        • hbogert 8 hours ago ago

          so basically, not adhering to atomic commits. That's fine if it's a deliberate choice, but some people like me think commits should stand on their own.

          (i'm assuming your are not squashing when merging, else it's pretty much the same workflow)

          • normie3000 5 hours ago ago

            > i'm assuming your are not squashing when merging, else it's pretty much the same workflow

            I AM squashing before merging. Pre-commit hooks run on any commit on any branch, AFAIK. In any serious repo I'd never be committing to master directly.

          • bawolff 8 hours ago ago

            Honestly, i find that a really weird view. I use (Local) commits for work in progress. I feel like insisting on atomic commits in your local checkout defeats the entire purpose of using a tool like git.

            What do you do when you are working on something and are forced to switch to working on something else in the middle of it?

            • thomashabets2 7 hours ago ago

              I'm merely the grandparent commenter, not the one you replied to directly, but I can tell you what I do for checkpointing some exploratory work or "I'll continue this next week".

              I usually put it on a branch, even if this project otherwise does all its development on the main branch. And I commit it without running precommits, and with a commit message prefix "WIP: ". If it's on a branch you can even push it to not lose work if your local machine breaks/is stolen.

              When it's time to get it into the main branch I rebase to squash commits into working ones.

              Now, if my final commit history of say 3 commits all actually build at each commit? For personal projects, no. Diminishing returns. But in a collaborative environment: How fun will it be for future you, or your team mates, to run bisect if half the commits don't even build?

              I have this workflow because it's so easy to add a feature, breaking 3 tests, to be fixed later. And formatting is bad. And now I add another change, and I just keep digging and one can end up in a "oh no, how did I end up here?" state where different binaries in the tree need to be synced to different commits to even build.

              > I feel like insisting on atomic commits in your local checkout defeats the entire purpose of using a tool like git.

              WIP commits is hardly the only benefit of git or other DVCS over things like subversion.

            • NekkoDroid 7 hours ago ago

              > What do you do when you are working on something and are forced to switch to working on something else in the middle of it?

              `git stash` is always an option :) but even if you want to commit it, you can always undo (or `--amend`) the commit when you get back to working. I personally am also a big fan of `git rebase -i` and all the things it allows me to fix up in the history before merging (rebasing) in to the main branch.

              • bawolff 6 hours ago ago

                All of those are things i would refer to as making a commit :)

            • hbogert 7 hours ago ago

              I interpreted the parents post as: as long as my combination of commits results in something working before getting merged, it's fine.

              Local wip commits didn't come to mind at all

              • bawolff 6 hours ago ago

                Well we are in a discussion about pre-commit hooks. Pre-commit hooks run on local wip commits.

                • thomashabets2 4 hours ago ago

                  Well, unless you inhibit them with `-n`. Which I would for WIP commits.

    • OJFord 7 hours ago ago

      The first step of which I usually have as pre-commit run --all-files (using the third-party tool of the same name as git feature) - so running locally automatically on changed files just gives me an early warning. It can be nice to run unit tests locally too, btw.

  • wolfi1 8 hours ago ago

    why do people rebase so often? shouldn't it be excluded from the usual workflows as you are losing commit history as well?

    • geon 8 hours ago ago

      To get a commit history that makes sense. It’s not supposed to document in what order you did the work, but why and how a change was made. when I’m knee deep in some rewrite and realize I should have changed something else first, I can just go do that change, then come back and rebase.

      And in the feature branches/merge requests, I don’t merge, only rebase. Rebasing should be the default workflow. Merging adds so many problems for no good reason.

      There are use cases for merging, but not as the normal workflow.

      • cluckindan 7 hours ago ago

        That is just not true. Merging is so much less work and the branch history clearly indicates when merging has happened.

        With rebasing, there could be a million times the branch was rebased and you would have no idea when and where something got broken by hasty conflict resolution.

        When conflicts happen, rebasing is equivalent to merging, just at the commit level instead of at branch level, so in the worst case, developers are met with conflict after conflict, which ends up being a confusing mental burden on less experienced devs and certainly a ”trust the process” kind of workflow for experienced ones as well.

        • geon 3 hours ago ago

          The master branch never gets merged, so it is linear. Finding a bug is very simple with bisect. All commits are atomic, so the failing commit clearly shows the bug.

          If you want to keep track of what commits belongs to a certain pr, you can still have an empty merge commit at the end of the rebase. Gitlab will add that for you automatically.

          The ”hasty conflict resolution ” makes a broken merge waaaay harder to fix than a broken rebase.

          And rebasing makes you take care of each conflict one commit at a time, which makes it order by magnitudes easier to get them right, compared to trying to resolve them all in a single merge commit.

          • cluckindan 2 hours ago ago

            Linear history is nice, but it is lacking the conflict resolutions. They are never committed, and neither are the ”fix rebase” instances.

            Having a ”fix broken merge” commit makes it explicit that there was an issue that was fixed.

            Rebase sometimes seems like an attempt at saving face.

            • geon 19 minutes ago ago

              That’s the whole point. You do it properly, so there IS no conflict.

    • mcny 8 hours ago ago

      I write really poopy commit messages. Think "WIP" type nonsense. I branch off of the trunk, even my branch name is poopy like

      feature/{first initial} {last initial} DONOTMERGE {yyyy-MM-dd-hh-mm-ss}

      Yes, the branch name literally says do not merge.

      I commit anything and everything. Build fails? I still commit. If there is a stopping point and I feel like I might want to come back to this point, I commit.

      I am violently against any pre commit hook that runs on all branches. What I do on my machine on my personal branch is none of your business.

      I create new branches early and often. I take upstream changes as they land on the trunk.

      Anyway, this long winded tale was to explain why I rebase. My commits aren't worth anything more than stopping points.

      At the end, I create a nice branch name and there is usually only one commit before code review.

      • rkomorn 8 hours ago ago

        Isn't your tale more about squashing than rebasing?

        • OJFord 7 hours ago ago

          Any subsequent commits and the branch are inherently rebased on the squashed commit.

          Rebasing is kind of a short hand for cherry-picking, fixing up, rewording, squashing, dropping, etc. because these things don't make sense in isolation.

          • rkomorn 7 hours ago ago

            I guess my point is that I disagree that rebasing should be shorthand for all these things that aren't rebasing.

        • bawolff 8 hours ago ago

          Personally i squash using git rebase -i

    • loglog 2 hours ago ago

      I don't want to see any irrelevant history several years later, so I enforce linear history on the main branch in all projects that I work on. So far, nobody complained, and I've never seen a legitimate reason to deviate from this principle if you follow a trunk based release model.

    • xen0 5 hours ago ago

      Your real commit history is irrelevant. I don't care too much about how you came to a particular state.

      The overall project history though, the clarity of changes made, and that bisecting reliably works are important to me.

      Or another way; the important unit is whatever your unit of code review is. If you're not reviewing and checking individual commits, they're just noise in the history; the commit messages are not clear and I cannot reliably bisect on them (since nobody is checking that things build).

    • hbogert 8 hours ago ago

      why would you lose commit history? You are just picking up a set of commits and reapplying them. Of course you can use rebase for more things, but rebase does not equal losing commit history.

    • UqWBcuFx6NV4r 8 hours ago ago

      I think that only the most absolutely puritan git workflows wouldn’t allow a local rebase.

    • bawolff 8 hours ago ago

      Because gerrit.

      But even if i wasn't using gerrit, sometimes its the easiest way to fix branches that are broken or restructure your work in a more clear way

    • marginalia_nu 8 hours ago ago

      The sum of the re-written changes still amount to the same after a rebase. When would you need access to the pre-rebase history, and to what end?

      • seba_dos1 4 hours ago ago

        Well, sometimes you do if you made a mistake, but that's already handled by the reflog.

    • nacozarina 8 hours ago ago

      really; keep reading about all the problems ppl have “every time I rebase” and I wonder what tomfoolery they’re really up to

      • seba_dos1 4 hours ago ago

        Unlike some other common operations that can be easily cargo-culted, rebasing is somewhat hard to do correctly when you don't understand git, so people who don't understand git get antagonistic towards it.

    • PunchyHamster 7 hours ago ago

      If it is something like repo for configuration management I can understand that because its often a lot of very small changes and so every second commit would be a merge, and it's just easier to read that way.

      ... for code, honestly no idea

  • burnt-resistor 6 hours ago ago

    A workflow that works well is one that takes the better ideas from Meta's "hg"+"arcanist"+edenfs+"phabricator" diff and land strategy. Git, by itself, is too low-level for shared, mostly single-source-of-truth yet distributed dev.

    Make test cases all green locally before pushing, but not in a way that interferes with pushing code and they shouldn't be tied to a particular (D)VCS. Allow uploading all of the separate proposed PRs you want in a proposed "for review" state. After a PR is signed-off and sent for merging, it goes into a linearizing single source of truth backed by an automated testing/smoke testing process before they land "auto-fast-forwarded" in a mostly uncontrolled manner that doesn't allow editing the history directly. Standardization and simplicity are good, and so is requiring peer review of code before it's accepted for existing, production, big systems.

    Disallow editing trunk/main/master and whenever there's merge conflict between PRs, manual rebasing of one or the other is required. Not a huge deal.

    Also, have structured OWNERS files that include people and/or distribution list(s) of people who own/support stuff. Furthermore, have a USERS file that keeps lists of people who would be affected by restarting/interrupting/changing a particular codebase/service for notification purposes too. In general, monorepo and allowing submitting code for any area by anyone are roughly good approaches.

  • nine_k 9 hours ago ago

    A bit less enraged: pre-commit hooks should be pure functions. They must not mutate the files being committed. At best, they should generate a report. At worst, they could reject a commit (e.g. if it contains a private key file included by mistake).

    • normie3000 9 hours ago ago

      > e.g. if it contains a private key file included by mistake

      Thanks - this is the first example of a pre-commit hook that I can see value in.

      • seba_dos1 4 hours ago ago

        Remember that such key will be copied into the repository on `git add` already and will stay there until garbage collected.

    • Ferret7446 8 hours ago ago

      In my experience pre-commit hooks are most often used to generate a starting commit message.

      To put it more bluntly, pre-commit hooks are pre-commit hooks, exactly what it says on the tin. Not linting hooks or checking hooks or content filters. Depending on what exactly you want to do, they may or may not be the best tool for the job.

      To put it even more bluntly, if you are trying to enforce proper formatting, pre-commit hooks are absolutely the wrong tool for the job, as hooks are trivially bypassable, and not shared when cloning a repo, by design.

  • Dunedan 7 hours ago ago

    The pre-commit framework [1] abstracts all these issues away and offers a bunch of other advantages as well.

    [1]: https://pre-commit.com/

    • jynelson 6 hours ago ago

      the pre-commit framework does not abstract away “hooks shouldn’t be run during a rebase”, nor “hooks should be fast and reliable”, nor “hooks should never change the index”.

      • Dunedan 3 hours ago ago

        Not sure how you got to that conclusion, as the pre-commit framework does indeed abstract them away. Maybe you're confusing it with something else?

        > hooks shouldn’t be run during a rebase

        The pre-commit framework doesn't run hooks during a rebase.

        > hooks should be fast and reliable

        The pre-commit framework does its best to make hooks faster (by running them in parallel if possible) and more reliable (by allowing the hook author to define an independent environment the hook runs in), however it's of course still important that the hooks themselves are properly implemented. Ultimately that's something the hook author has to solve, not the framework which runs them.

        > hooks should never change the index

        As I read it the author says hooks shouldn't change the working tree, but the index insteead and that's what the pre-commit framework does if hooks modify files.

        Personally I prefer configuring hooks so they just print a diff of what they would've changed and abort the commit, instead of letting them modify files during a commit.

        • jynelson 2 hours ago ago

          > Ultimately that's something the hook author has to solve, not the framework which runs them.

          correct. i'm saying that hook authors almost never do this right, and i'd rather they didn't even try and moved their checks to a pre-push hook instead.

  • odie5533 7 hours ago ago

    They are annoying to setup and maintain and contain footguns. I will still use them with prek though because they save dev cycles back-and-forth with CI more than they hurt. I aim to have the hooks complete in under 1 second total. If it saves even a single CI cycle, I think that's a win time wise.

  • PunchyHamster 7 hours ago ago

    This article is very much "you're holding it wrong"

    > They tell me I need to have "proper formatting" and "use consistent style". How rude.

    > Maybe I can write a pre-commit hook that checks that for me?

    git filter is made for that. It works. There are still caveats (it will format whole file so you might end up commiting changes that are formatting fixed of not your own code).

    Pre-commit is not for formatting your code. It's for checking whether commit is correct. Checking whether content has ticket ID, or whether the files pass even basic syntax validation

    > Only add checks that are fast and reliable. Checks that touch the network should never go in a hook. Checks that are slow and require an update-to-date build cache should never go in a hook. Checks that require credentials or a running local service should never go in a hook.

    If you can do that, great! If you can't (say it's something like CI/CD repo with a bunch of different language involved and not every dev have setup for everything to be checked locally), having to override it to not run twice a year is still preferable over committing not working code. We run local checks for stuff that make sense (checking YAML correctness, or decoding encrypted YAMLs with user key so they also get checked), but the ones that don't go remote. It's faster. few ms RTT don't matter when you can leverage big server CPU to run the checks faster

    Bonus points, it makes the pain point - interactive rebases - faster, because you can cache the output for a given file hash globally so existing commits during rebase take miliseconds to check at most

    > Don't set the hook up automatically. Whatever tool you use that promises to make this reliable is wrong. There is not a way to do this reliably, and the number of times it's broken on me is more than I can count. Please just add docs for how to set it up manually, prominantly featured in your CONTRIBUTING docs. (You do have contributing docs, right?)

    DO set it up automatically (or as much as possible. We have script that adds the hooks and sets the repo defaults we use). You don't want new developer to have to spend half a day setting up some git nonsense only to get it wrong. And once you change it, just rerun it

    Pre-push might address some of the pain points but it doesn't address the biggest - it puts the developer in a "git hole" if they have something wrong in commit, because while pre-commit will just... cancel the commit till dev fixes it, with pre-push they now need to dig out knowledge on how to edit or undo existing commits

    • seba_dos1 4 hours ago ago

      > they now need to dig out knowledge on how to edit or undo existing commits

      This knowledge is a crucial part of effective use of git every day, so if some junior dev has to learn it quick it's doing them a favor.