Unrecognized Escape Sequence

An increasingly grumpy blog about software engineering

Improving your life with pre-commit

What is pre-commit and how can it help me?

If you’re not aware of pre-commit, it’s one of a variety of tools that you can use to run a series of tests and checks on your code at the point of committing to git. From the pre-commit site:

[pre-commit] is a multi-language package manager for pre-commit hooks. You specify a list of hooks you want and pre-commit manages the installation and execution of any hook written in any language before every commit.

To paraphrase, pre-commit offers you a way of cleaning and standardising commits to your codebase before they actually get into a repository. You can banish forever those commits with messages like “Fix formatting”, “Apply coding standard” or “Run linter”. Now these mistakes can be automatically prevented before the commit is made.

Not bad, right? What’s more, if you want to go to the next level, you can even define a coding style and have pre-commit enforce it. Of course, this is somewhat more of a minefield given the politics around coding styles, but used correctly it can really help improve the cleanliness and consistency of your codebase.

I’ve turned some notes I took when testing pre-commit into a short tutorial.

Getting started with pre-commit

Here’s a walkthrough of getting pre-commit up and running on a git repository. Note that you will need Python 3 and pip3 installed.

  1. Installing pre-commit locally

    As simple as:

    $ sudo pip3 install pre-commit
    
  2. Configuring pre-commit

    In each git repository, set up the pre-commit hook and create a sample configuration file:

    $ pre-commit install
    $ pre-commit sample-config > .pre-commit-config.yaml
    
  3. The test run

    Now we can run pre-commit and see what it thinks of the code in the repo:

    $ pre-commit run --all-files
    [INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
    [INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
    [INFO] Once installed this environment will be reused.
    [INFO] This may take a few minutes...
    Trim Trailing Whitespace.................................................Failed
    hookid: trailing-whitespace
    
    Files were modified by this hook. Additional output:
    
    Fixing <filename not shown>
    Fixing <filename not shown>
    Fixing <filename not shown>
    Fixing <filename not shown>
    
    Fix End of Files.........................................................Failed
    hookid: end-of-file-fixer
    
    Files were modified by this hook. Additional output:
    
    Fixing <filename not shown>
    Fixing <filename not shown>
    Fixing <filename not shown>
    Fixing <filename not shown>
    Fixing <filename not shown>
    
    Check Yaml...........................................(no files to check)Skipped
    Check for added large files..............................................Passed
    
    

    Yikes! So many issues. Thankfully they’re relatively small fry, but then all I’m running here is the default sample set of linters. As you can see, what’s being caught here is largely whitespace: I’ve left trailing whitespace on some lines, and some of my files don’t end with exactly one line break (POSIX defines a line as ending with a newline, if you weren’t aware).

    And, as a good linter should, it has fixed these simple, unambiguous things for me. I just need to re-stage the files and re-commit.

    Note that I ran this command manually for the sake of an example, but the real point of pre-commit is that you don’t actually need to run this command. As a git hook, it is run automatically whenever I commit to this repo. Before the actual commit happens. You know, “pre-commit”.
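
    To make the two whitespace fixes above concrete, here is a rough Python sketch of the same idea. It is not the code of the real trailing-whitespace or end-of-file-fixer hooks; the function names are my own, it only handles "\n" line endings, and turning a newline-only file into an empty one is an assumption of this sketch.

```python
def trim_trailing_whitespace(text: str) -> str:
    """Strip spaces and tabs from the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text: str) -> str:
    """Make a non-empty file end with exactly one newline."""
    stripped = text.rstrip("\n")
    if not stripped:
        return ""  # assumption: a file of only newlines becomes empty
    return stripped + "\n"


def tidy(text: str) -> str:
    """Apply both fixes, in the same order as the hooks in the sample config."""
    return fix_end_of_file(trim_trailing_whitespace(text))


if __name__ == "__main__":
    messy = "x = 1   \ny = 2\t\n\n\n"
    print(repr(tidy(messy)))  # 'x = 1\ny = 2\n'
```

    Both fixes are mechanical and have only one sensible outcome, which is why a hook can safely apply them without asking.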

  4. Adding some more useful tools

    Next let’s modify the .pre-commit-config.yaml for this repo, and add some more specific linters.

    The current, default config file looks like this:

    # See https://pre-commit.com for more information
    # See https://pre-commit.com/hooks.html for more hooks
    repos:
    -   repo: https://github.com/pre-commit/pre-commit-hooks
        rev: v2.2.1
        hooks:
        -   id: trailing-whitespace
        -   id: end-of-file-fixer
        -   id: check-yaml
        -   id: check-added-large-files
    

    You can see the whitespace-based linters that made the fixes to my code earlier.

    The repo I’m working with is mostly Python code, with a handful of utility shell scripts, so let’s add some stuff to lint those files:

    # See https://pre-commit.com for more information
    # See https://pre-commit.com/hooks.html for more hooks
    repos:
    -   repo: https://github.com/pre-commit/pre-commit-hooks
        rev: v2.2.1
        hooks:
        - id: trailing-whitespace
        - id: end-of-file-fixer
        - id: check-yaml
        - id: check-added-large-files
        - id: check-symlinks
    
    -   repo: https://github.com/jumanjihouse/pre-commit-hooks
        rev: 1.11.0
        hooks:
        - id: shellcheck
        - id: shfmt
    
    -   repo: https://github.com/detailyang/pre-commit-shell
        rev: 1.0.4
        hooks:
        - id: shell-lint
    
    -   repo: https://github.com/pre-commit/mirrors-autopep8
        rev: v1.4.4
        hooks:
        - id: autopep8
    

    I’ve added three shell script linters: shellcheck, shfmt and shell-lint. I’ve also added a symlink checker, as the repo may end up with a few of those knocking around, and god knows they can cause havoc when they go awry.

    More interesting is autopep8, which automatically reformats my Python code so that it conforms to the PEP8 standard.

    Now when I run pre-commit I get some more stuff going on:

    $ pre-commit run --all-files
    Trim Trailing Whitespace.................................................Passed
    Fix End of Files.........................................................Passed
    Check Yaml...............................................................Passed
    Check for added large files..............................................Passed
    Check for broken symlinks............................(no files to check)Skipped
    Test shell scripts with shellcheck.......................................Passed
    Check shell style with shfmt.............................................Failed
    hookid: shfmt
    
    [RUN] shfmt -l -i 2 -ci docs/regenerate-api-docs.sh
    [FAIL]
    
    docs/regenerate-api-docs.sh
    
    The above files have style errors.
    Use "shfmt -d" option to show diff.
    Use "shfmt -w" option to write (autocorrect).
    
    Shell Syntax Check.......................................................Passed
    autopep8.................................................................Failed
    hookid: autopep8
    
    Files were modified by this hook.
    

    At first I thought it might be overkill to add several shell script linters, but as you can see, shellcheck passed while shfmt failed. Unfortunately not all hooks work in the same way: this one only reports the problem, so I have to run the suggested shfmt command myself to find and fix it.

    Next up, we can see that autopep8 has run and made some automatic fixes to my Python code. Neat!

    There’s just one problem. PEP8 is a pretty good code standard to my mind; however, I’ve always found a couple of the rules a bit aggressive. I’m not keen on the rather severe line length limit, so I usually disable that, but the bigger problem here is that one of the changes autopep8 makes is breaking my unit tests.

    As a brief aside, in Python unit tests you sometimes want to mock a whole module before it gets imported, so the test file contains some code that runs before the import. This violates PEP8 rule E402 (module level import not at top of file).
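
    Here is a minimal sketch of the kind of test-file preamble I mean. The module name expensive_backend is made up for illustration; the point is only that the import has to come after the line that registers the mock, which is the ordering E402 complains about.

```python
import sys
from unittest import mock

# Register a stand-in *before* the import below runs, so the import
# statement finds the mock in sys.modules instead of loading real code.
# "expensive_backend" is a hypothetical module name for this sketch.
sys.modules["expensive_backend"] = mock.MagicMock(name="expensive_backend")

# E402: this module level import is deliberately not at the top of the file.
import expensive_backend

expensive_backend.connect.return_value = "fake connection"
print(expensive_backend.connect())  # fake connection
```

    If a formatter moves that import back to the top of the file, it runs before the mock is registered, and the test either fails to import or ends up talking to the real module.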

    This seems a bit of a nasty problem, as now I have no way to commit my code without these PEP8 changes breaking my unit tests.

    So here’s my workaround for this. I’ve reconfigured the PEP8 linter to give my unit tests a little more slack than the production code, by running autopep8 in two different modes. The default (strict) settings are used on my production code, and then slightly relaxed settings are used for the files containing unit tests.

    -   repo: https://github.com/pre-commit/mirrors-autopep8
        rev: v1.4.4
        hooks:
        - id: autopep8
          name: autopep8-default
          exclude: .*/tests/.*

    -   repo: https://github.com/pre-commit/mirrors-autopep8
        rev: v1.4.4
        hooks:
        - id: autopep8
          name: autopep8-unit-tests
          args: ["-i", "--ignore=E226,E24,W50,W690,E402"]
    

    Notice that the first entry excludes the tests directory, and the second entry repeats autopep8’s default ignore list (E226, E24, W50, W690) with E402 added to it.
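
    It’s worth checking which files that exclude pattern actually catches. The sketch below assumes pre-commit treats exclude as a Python regular expression and applies it with re.search to each repo-relative path; the paths are invented for the example.

```python
import re

# The exclude pattern from the autopep8-default entry.
EXCLUDE = re.compile(r".*/tests/.*")


def is_excluded(path: str) -> bool:
    # Assumption: pre-commit matches `exclude` with re.search against
    # the path relative to the repository root.
    return EXCLUDE.search(path) is not None


# Hypothetical paths, purely for illustration.
for path in ["src/app/main.py", "src/app/tests/test_main.py", "tests/test_cli.py"]:
    print(path, is_excluded(path))
# src/app/main.py False
# src/app/tests/test_main.py True
# tests/test_cli.py False
```

    Under that assumption, the pattern needs a slash before tests, so a tests directory at the top level of the repo would not be excluded. My tests live inside package directories, so this works for me, but a pattern such as (^|/)tests/ would cover both layouts.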

    Now when I run pre-commit, we’re looking good:

    $ pre-commit run --all-files
    Trim Trailing Whitespace.................................................Passed
    Fix End of Files.........................................................Passed
    Check Yaml...............................................................Passed
    Check for added large files..............................................Passed
    Check for broken symlinks............................(no files to check)Skipped
    Test shell scripts with shellcheck.......................................Passed
    Check shell style with shfmt.............................................Passed
    Shell Syntax Check.......................................................Passed
    autopep8-default.........................................................Passed
    autopep8-unit-tests......................................................Passed
    
  5. Getting others involved

    After committing all these changes to the repo, the next step is to get other developers using it, and then finally to run pre-commit as part of the continuous integration build (in my case, simply by adding a call to pre-commit to my Jenkins job).
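
    As a rough idea of what that CI call could look like, here is a small Python wrapper. This is not my actual Jenkins setup; the function name and the injectable runner are assumptions made for the sketch, and the only real command in it is the pre-commit run --all-files shown earlier.

```python
import subprocess
from typing import Callable, Sequence

# The same command used by hand earlier in this post.
PRE_COMMIT_CMD = ["pre-commit", "run", "--all-files"]


def run_pre_commit(runner: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Run every hook against every file and return the exit status.

    The runner is injectable so the wrapper can be exercised without
    pre-commit installed; by default it shells out via subprocess.call.
    """
    return runner(PRE_COMMIT_CMD)
```

    A build step would then finish with sys.exit(run_pre_commit()). pre-commit exits with a non-zero status when a hook fails or modifies files, so the build goes red whenever someone has committed without the hooks installed.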

Conclusion

Linters and other automatic means of checking your code should be an essential part of any software development process, but they are an easy thing to forget or to ignore. Using pre-commit is a simple and powerful way to improve the quality of your code. What’s not to like? The key is getting agreement across the team on which linters and rules to use, and evolving the configuration over time.

My next steps are to investigate what C++ hooks are available and then start running pre-commit on my C++ repos, and see what errors come up there. The real work begins…