Practices for Software Projects



What follows is a selection of practices which I find helpful when working with software projects, whether in public or in private, in solo or in collaborative work. Many of these practices are quite standard, but some are, at best, unevenly practiced by either commercial or academic software teams. Although they are especially important for collaborative work, I like to follow these practices on my own projects, both as a matter of professional habit, and because I find them beneficial. (This post is essentially an expanded version of a gist which I made a few months ago, and which I have been endeavoring to adhere to in my open source work ever since.)

Checklist

I’ll start with a checklist I use, then go into details for each item. For each repository or project I work on, I try to check that it has automated tests, style checks, automated builds, a README, versioned releases, and issue tracking.

Automated Tests

Let’s start with the most important item. Writing and continuously running automated tests is always my first line of defense against defects. To quote Michael Feathers,

Code without tests is bad code. It doesn’t matter how well written it is; it doesn’t matter how pretty or object-oriented or well-encapsulated it is. With tests, we can change the behavior of our code quickly and verifiably. Without them, we don’t really know if our code is getting better or worse.

In what follows, “production code” means the code that actually does what your program is trying to do.

There’s a lot to be said about how to write tests, about the difference between unit tests and integration tests, about test frameworks, mocks, fakes, stubs, and so on. These discussions shade into ones about architectural patterns, object orientation, organization of software into layers, etc. In my view, these are all secondary points. The main question is: is our software safe to change? The answer is determined by the presence or absence of tests, and by how well those tests cover our production code.

Test-Driven Development

When starting to write tests, a common question is, What should I test? This question is usually easiest to answer before you’ve written the code you want to test. In that case, the answer is:

Write the tests that force you to write code that implements the functionality you want.

In other words, don’t write production code that a failing test hasn’t forced you to write. This is the key to test-driven development (TDD).

Note that I wrote “a failing test,” above, rather than “failing tests”: the process is best done in tiny increments, as follows:

  1. Add a small increment of test code, which makes the tests fail (“red”).
  2. Add a small increment of production code, which makes the tests pass (“green”).
  3. Refactor as needed to keep the design of the code clean and simple (repeat step 3 in small increments, as needed, before going back to step 1).

Ideally, each step in the “red…green…refactor…” cycle is as small as possible. It should be the “red” step which actually pushes you towards the functionality you want – that failing test is a statement about where you want the production code to go.

If you find yourself writing multiple tests at once, or a more complex test than is strictly needed at the moment, then stop and try to break the problem into smaller steps.

If you only write production code in response to a failing unit test, then you know that code has both a reason to exist and test coverage. See Kent Beck’s Test-Driven Development By Example for a more detailed explanation of this process, or try it for yourself. I found that when I adopted this pattern, about 14 years ago, my productivity, the reliability of my software, and my enjoyment of writing it all increased noticeably.
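To make the rhythm concrete, here is a minimal sketch of one red/green cycle in Go; the package and the Add function are hypothetical, invented purely for illustration:

// add_test.go -- the "red" step: this test fails to compile (and then
// fails to pass) until Add exists and returns the right value.
package calc

import "testing"

func TestAddTwoNumbers(t *testing.T) {
	if got := Add(2, 3); got != 5 {
		t.Errorf("Add(2, 3) = %d; want 5", got)
	}
}

// add.go -- the "green" step: just enough production code to pass.
package calc

func Add(a, b int) int {
	return a + b
}

The next failing test (negative inputs, overflow behavior, whatever the requirements demand next) then drives the next small increment of production code.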

Probing With Tests

On the other hand, maybe you’ve inherited some code from somebody else. Or maybe you’ve written a bunch of untested code as a prototype, and you want to consider it production code. In these situations, I like to poke at the code in question using tests. These exploratory tests can be a way to build understanding about how the code works, and to document your findings. While reading the code under study, I may find an area that looks like a defect: in that case, I write tests to try to show the defect (or prove its absence). Some of these more speculative tests may not live on permanently in my test suite, but they usually do.

Importantly, whenever a defect is found, I always try to write a test which detects that defect before I fix it, to guard against the defect ever recurring.
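As a sketch of what such a guard test might look like, in Go, with a hypothetical ParsePort function that was found to accept out-of-range values:

package config

import "testing"

// Written to expose the defect first ("red"), then kept permanently
// so the bug cannot silently return after it is fixed.
func TestParsePortRejectsOutOfRangeValues(t *testing.T) {
	if _, err := ParsePort("70000"); err == nil {
		t.Error("expected an error for port 70000; valid ports are 0-65535")
	}
}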

Tradeoffs

TDD affects development speed – for very exploratory work, such as building prototypes, it can be slower, though I find it usually helps velocity as projects grow in complexity. It definitely, in my experience, helps add reliability.

I like the analogy of rock climbing. If I’m scrambling over a short wall, I don’t need a rope. Summiting a tall mountain face definitely calls for serious protection. Bouldering close to the ground, not so much, but be careful that your “bouldering” code doesn’t sneak into your “summiting” code.

There are times when strict TDD is less critical, and times when I find it especially important. Obviously, for exploratory coding on a prototype, TDD can be overkill. Lispers armed with REPLs tend to be especially comfortable with exploratory coding, but it is important to shore up our REPL-driven snippets with tests, lest we lose the ability to make changes safely. (A favorite technique for this situation is to comment out bits of code and see if your existing tests fail, then TDD your way back to bringing the code back in.)

One place where I find TDD especially helpful is when the programming is more difficult than usual. With TDD, I handle the simple cases first, and gradually add sophistication to my production code as I add tests. As Robert Martin puts it: As the tests get more specific, the production code becomes more general.

If your software is hard to test, step back and take a look at the design. A difficult testing story is often a symptom of an overly-coupled design. Picking apart the ball of yarn can be difficult, and demands experience, but there are established techniques for doing so. The book Working Effectively with Legacy Code, by Michael Feathers, treats this challenge in detail.

While orienting your code around testability can help somewhat with the design of that code (by encouraging less coupling, etc.), the practice of TDD by itself will not guarantee a good, understandable, maintainable design, as Rich Hickey explains in his talk Simple Made Easy (which I highly recommend). The wrong tests can actually slow your ability to make design changes – one needs to consider design throughout the process, and be willing to rework both the tests and the production code as needed if the design needs improving.

Speed

A key point about all these tests: they must be fast. Fast, as in running the entire test suite takes less than a couple seconds. When you start relying on tests as your safety net, you can only move as fast as that net lets you move. If the tests take 10 seconds, a minute, 10 minutes to run, progress will be slow. I often run my tests multiple times a minute, literally every time I save a file in the directory I’m working in. Or, if I am writing in a dialect of Lisp such as Clojure or Common Lisp, I run the entire test suite with a couple of keystrokes and expect instant visual feedback directly in my editor. The good news is that most of the tests you write can be very fast. Slow tests can be a design smell – or at least a sign that you’re testing at too high a level, through too many dependencies. It can take time to do the needed refactoring to make your tests fast, but it is worth the investment. Organizing your software in layers and using mocking, dependency injection, or polymorphic dispatch1 to implement simple test harnesses for individual layers can be helpful here.

Some tests are intrinsically slower. Examples include tests through a database, or generative / fuzzing tests that programmatically generate many examples and make assertions about program behavior while running against those examples. These tests can run separately, such as after every commit, or prior to a release. Invest in separating out slow tests when needed.
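In Go, for example, one common way to separate these out is the testing package’s -short flag; a sketch, assuming a hypothetical database-backed test:

package store

import "testing"

func TestOrdersRoundTripThroughDatabase(t *testing.T) {
	if testing.Short() {
		// "go test -short ./..." skips this slower, database-backed test;
		// CI or a pre-release build still runs the full suite.
		t.Skip("skipping database test in -short mode")
	}
	// ... exercise the real database here ...
}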

Clean Tests

All code in your repository has a maintenance cost, including tests. I try to keep test code as clean as production code. Because tests can be repetitive, often with subtle variation between test cases, it is important to use the abstractions available in your programming language to make the test as clean and as readable as possible. Squeezing out repetitive idioms is one area where high-level languages such as Python, Clojure, or Lisps have an advantage. But clean tests should be a goal regardless of language.

Tests by example, also known as table-driven tests, are an excellent way to get clean tests. While Clojure’s are macro (trivially portable to other Lisps) gives the cleanest implementation I know of, table tests are also common in Go and probably in several other languages as well. It is worth finding out whether your language offers this construct and, if not, perhaps building it yourself.
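A sketch of the Go flavor, reusing the hypothetical Add function from the earlier TDD example:

package calc

import "testing"

// Each row of the table is one example; the loop body is the only
// place the test logic itself is written.
func TestAdd(t *testing.T) {
	cases := []struct {
		name string
		a, b int
		want int
	}{
		{"zeros", 0, 0, 0},
		{"positive", 2, 3, 5},
		{"negative", -2, -3, -5},
		{"mixed", -2, 5, 3},
	}
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			if got := Add(c.a, c.b); got != c.want {
				t.Errorf("Add(%d, %d) = %d; want %d", c.a, c.b, got, c.want)
			}
		})
	}
}

Adding a new case is a one-line change, which keeps the cost of better coverage low.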

Test Frameworks

Though some communities (Java, Python) gravitate towards the xUnit style of test frameworks, I find them to be overkill for most small projects, and for some large ones. Test frameworks try to address common areas of repetition in tests, but the abstractions your project needs in order to write clean tests may or may not be a natural fit for an existing, heavyweight framework. The job of your test framework is to make sure that every assertion in your test suite is run, and that the build fails if any of the assertions fail. Good examples of lightweight test frameworks include clojure.test for Clojure and 1AM for Common Lisp (perhaps with the addition of my are macro). Go’s testing package is another.

In some cases, a few test functions, a trivial test runner, and judicious use of assert are all you need.

Test Coverage

Tools are available for most languages which will generate test coverage reports. I am not a fan of strictly defined coverage targets which fail test suites if coverage is below some fixed amount. Nevertheless, I think that inspecting code coverage from time to time can be helpful in identifying areas that need better tests.

Discuss with your team what fraction of coverage you want to shoot for in your project. Some projects made up of mostly pure functions can get to 100% easily, whereas others, perhaps heavy on error handling, UI code, or integration with other services which are hard to mock, settle at something less, perhaps 70%-80%.

Monitor your repositories for slippage in coverage, and use coverage reports to inform your work as you write code.
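As one concrete example, Go’s built-in tooling can generate a browsable, per-line report with two commands:

go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

Most other ecosystems have an equivalent (coverage.py for Python, cloverage for Clojure, and so on).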

Style Checks

It is my experience working with a variety of junior and senior engineers that standardizing coding styles leads to more readable code and higher quality overall. Therefore, I like to run “linters” at the same time I run my tests. Linters are tools which check code style according to agreed-upon guidelines and which raise an error when the standard isn’t met. Examples of linters are lint in C, pycodestyle (formerly pep8) in Python, and kibit and bikeshed in Clojure.

In addition to or in place of a linter, an automated code formatter, such as gofmt for the Go programming language, or clang-format for C, can reformat source code to adhere to community style rules. Since they fix style violations automatically, they have the benefit of not slowing down your workflow or miring developers in endless formatting discussions during code review, etc. The Go story is especially successful in this regard, with the result that formatting discussions are largely absent in the Go community.

Most code formatters and style checkers are configurable. These options should be stored in a configuration inside the repository. As the team’s preferences and needs evolve, the configuration options can be discussed and updated via pull requests or according to your team’s code review procedures.

Automated Builds

Once the testing / linting rhythm is well established, automated builds shore up the practice further. Every project should have a primary build which is triggered on a build machine, known commonly as a Continuous Integration (CI) server, whenever anybody pushes to the master branch. This build runs the automated tests for the project, to ensure that no broken code gets shipped, or, if it does, that it can be noticed and fixed as soon as possible.

Rarely is the code I’m writing destined to run on the machine I’m writing it on. Achieving fidelity between development and production environments is frequently a challenge, though it is easier for some languages than for others. Regardless of the language or target platform, I’ll try to achieve as much parity as possible, generally using Docker for builds. The Dockerfiles for these projects tend to be simple: a few lines to install any needed dependencies, a line to copy the files into the container, and a line to run the build tasks using make (more on that below).
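A sketch of the shape these Dockerfiles take (the base image and packages here are placeholders, not taken from any particular project):

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends make ca-certificates
WORKDIR /work
COPY . .
CMD ["make"]

The container’s default command simply runs the same make targets used locally (more on those below).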

Since I host my repositories on GitHub, I use GitHub Actions for my builds, generally running them within Docker except for highly standardized builds such as for Go programs.2
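A minimal workflow for this kind of setup might look roughly like the following (the file path, action version, and target name are illustrative, not prescriptive):

# .github/workflows/build.yml
name: build
on:
  push:
    branches: [master]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make docker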

The history of the build success or failure can be easily seen on the GitHub Actions page for that build. If the history is not full of mostly green builds, your developers are probably not running their tests locally often enough, or there may be a discrepancy between build and development environments that needs addressing.

I make sure that the README has a build badge which shows the most recent build status. These badges may seem frivolous, but I feel they do give some insight into how well a project is maintained. Any build failures also trigger an email to the committer, and optionally to any people “watching” the repository on GitHub.

Repeatability

Builds should be repeatable. That is to say, if a build fails for a particular commit, it should always fail, and if it succeeds for a particular commit, it should always succeed. Any other behavior makes collaboration and troubleshooting harder (“it worked when I tried it / on my machine”) and adds noise which slows the team down. Tests that fail only sometimes are usually a sign of race conditions, particularly insidious bugs which tend to be hard to identify and troubleshoot but which generally boil down to defects in the tests in question and/or in the production code.

To reiterate: unpredictable builds should be viewed as bugs, and should not linger unaddressed.

READMEs

I like every repository to have a README file, written in Markdown3. The file explains what the project is and does, how to install and test it, how to use it (ideally with a short example snippet), and the license terms it is offered under.

The example snippet shown in the README is especially important. It not only gives a brief snapshot of your software in action, but it also gives a feel for what it is like to use it and what it actually does. Examples can be found here and here.

Whether for open source or closed/private work, stating license terms clearly can be helpful. GitHub makes this especially easy now. For closed-source, restricted repos I add a simple reminder in place of a license: “© <year> MyOrganization. All rights reserved.” For open source projects, I generally choose a fairly permissive license, such as the one from MIT.

On a personal note, for my own projects, I try to give every repository its own unique artwork or photograph which appears at the top of the README, an image ideally at least tangentially related to the project (this can be a challenge for a software project). I find that it adds to my enjoyment while working on the software, and helps me keep track of which repo I’m looking at at any given time. It also ties my creative practices (painting, drawing, photography) to my work in software development, areas of my life which otherwise rarely overlap.

Versions and Releases

Your project should have versioned releases. The process of making releases should be entirely automated. Typically, I make a release by tagging the software in Git, and pushing the tags to GitHub. Depending on the language and target platform, this could involve other steps, like making a tarball, triggering a build on GitHub which makes the release files available for download, or starting a deployment process to a staging environment.

Your program, Web service, app, or library should know and be able to report what version it currently is.
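One common way to do this in Go (a sketch; the variable name and the build-time flag shown here are a convention, not taken from any particular project) is to stamp the version into the binary at build time:

package main

import "fmt"

// version is overwritten at build time, e.g.:
//   go build -ldflags "-X main.version=v0.0.100"
var version = "dev"

func main() {
	fmt.Println("myprogram", version)
}

The release automation can then pass the freshly created tag to the build, so the program reports exactly the version it was released as.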

The release process should be scripted and have the following steps:

  1. Make sure no code is uncommitted in the local directory
  2. Get the most recent version, e.g. v0.0.99
  3. Determine the new version, e.g. v0.0.100
  4. Update the local software so that it can report the new version when asked
  5. Commit the updated local software, e.g. git commit -am "Release v0.0.100"
  6. Tag the new software, e.g. git tag v0.0.100
  7. Push the new tag, e.g. git push --tags
  8. Increment the local software version to indicate its “tainted” (off-version) state, based on the new tagged version, e.g. v0.0.100-dirty.
  9. Commit the local software again, e.g. git commit -am "Taint post v0.0.100 release"

An example release script can be seen here.

Any released artifacts (tarballs, jar files, etc.) should reflect the release number, so if you’re generating release artifacts locally, it should be done after Step 4, and before Step 8, above. Otherwise, do it on your CI server based on the tag you push in Step 7.
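For illustration, here is a compressed sketch of such a script (not the one linked above); it assumes patch-only vX.Y.Z tags and a VERSION file read by the build, both of which are placeholders for whatever your project actually uses:

#!/bin/sh
set -e

# 1. Refuse to release with uncommitted changes.
test -z "$(git status --porcelain)" || { echo "working tree not clean"; exit 1; }

# 2-3. Find the most recent tag and bump the last component.
last=$(git describe --tags --abbrev=0)      # e.g. v0.0.99
new="${last%.*}.$(( ${last##*.} + 1 ))"     # e.g. v0.0.100

# 4-7. Record, commit, tag, and push the new version.
echo "$new" > VERSION
git commit -am "Release $new"
git tag "$new"
git push && git push --tags

# 8-9. Mark the working copy as off-version until the next release.
echo "$new-dirty" > VERSION
git commit -am "Taint post $new release"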

This automation can take some effort up front to set up. But it removes a major source of resistance to doing releases, and allows for tighter feedback loops. An important component of this practice is to keep the software releasable at all times – use feature flags or other mechanisms to “hide” new features under development from users, or otherwise indicate their preliminary nature. If the software is always releasable, and the automated tests are trustworthy, it will be safer to deploy bug fixes quickly when things go wrong, and easier to track down bugs in production if they ever occur.

If you can package your software using the target platform’s packaging system (Homebrew, apt, Clojars, PyPI, etc.), it goes a long way to making installation easier, particularly when onboarding new developers. This is typically easier for open source projects, but can be helpful for private repositories as well (an example is private Maven repositories for the Clojure and Java ecosystems).

A side note on semantic versioning: there are many passionate opinions about how to do semver correctly. If you follow my open source repositories, you’ll see mostly versions like “v0.0.123” rather than “v1.10.1”. I personally believe that semver is overrated – in particular, breaking changes (so-called MAJOR revisions) are both hostile to users and usually avoidable … and should therefore be extremely rare. This talk by Rich Hickey explains the viewpoint far better than I can.

My advice is to avoid breaking changes whenever possible. One strategy for this is to version your APIs, so that older and newer versions coexist. Another is to simply call an existing project done and release a completely new project with extensive changes, rather than breaking the old one for existing users.

Issue Tracking

Your organization probably already does issue tracking, either through JIRA or some similar system. If not, consider using GitHub Issues to track bugs and future work. I use these for my open source projects, both to encourage other people to file issues, and to give visibility into my to-do list. When I make a commit, I tag the issue number in the commit. For example,

git commit -am "Fix horrible race condition.  Fixes #66."

GitHub automatically cross-links these commits with the issues themselves, and in the case of the above example actually closes the ticket automagically. JIRA can be configured to do much the same. This kind of tracking can make it easier to understand what changes were made when, and for what reason. In some kinds of environments (e.g., highly regulated ones), this kind of change-tracking is a firm requirement.

Other Helpful Practices

Using a Common Build Tool

I frequently work in Python, C, Clojure, Go, and Common Lisp. Each of these languages has its own toolchain, which may be more or less standard for the language, but I find that using make to automate most common tasks makes it very easy for me to remember how to carry them out.

Here are some make targets I write for most projects:

  1. make test runs all the unit tests locally.
  2. make lint does any style checks.
  3. make on its own runs any tests and linting / style checking (this typically matches what runs on the build server after every push).
  4. make release does everything needed to cause a release to be created.
  5. make docker runs tests and linting steps inside a Docker container.

Though make is an old tool, with a strange, if powerful, syntax, it is very fast, and relatively simple to use for the above use cases. (I find many modern build tools to be slower and more awkward to use than Make for common operational tasks.) The practice of using Make in this way can be helpful in larger organizations, where several different languages may be in use, and where developers may switch from project to project with some regularity. Some of my repos which use this pattern are l1 (Go), smallscheme (Python), oatmeal (Clojure), cl-oju (Common Lisp), lexutil (Go) and hbook (Common Lisp).
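A sketch of the corresponding Makefile for a Go project (the target bodies, the myproject image name, and the release script path are placeholders for whatever your toolchain actually needs; recipe lines must be indented with tabs):

.PHONY: all test lint docker release

all: test lint

test:
	go test ./...

lint:
	test -z "$$(gofmt -l .)"
	go vet ./...

docker:
	docker build -t myproject . && docker run --rm myproject

release:
	./script/release.sh

Whatever the underlying language, make test and make lint then mean the same thing in every repository.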

Getting “Invisible Feedback”

This trick relates to the “fast feedback” requirement mentioned earlier and applies primarily to local development. If you can get your computer to speak out loud, you can run your builds in another terminal window and not even look at that window most of the time. For example, on my Mac, I have the following running much of the day:

conttest 'make && say ok || say fail'

Here conttest is a continuous testing tool I wrote, which runs the supplied argument, a shell command, every time the current directory is updated. When I save a file I’m editing, I instantly hear the result. If my tests are doing what I think they should be doing in the moment, I can simply hear the results as I go, without looking away from the code. Once again, this little genie is especially helpful when my tests are fast.

Auto-Generating Documentation

Look for opportunities to automate your documentation. Examples include API documentation generated from the source code, literate programs, and automating updates to READMEs based on program output (for example, see how the L1 README is updated).

Summary

Although these practices require some up-front investment, I find that they increase my efficiency overall, especially when I’m working with other collaborators. Good, clean operational practices like these can cut back on common sources of pain when working on long-running software projects.

Acknowledgments

My views have been influenced by discussions with former colleagues at OppFi and OpinionLab over the past eight years. Timothy Coleman wrote the original version of the bumpver release script linked above.


  1. Examples include using protocols in Clojure, generic functions in Common Lisp, and interfaces in Go. ↩︎

  2. I prefer to run even Clojure builds in Docker, since I’ve been bitten by subtle differences between my local and the target deployment environments. ↩︎

  3. I actually prefer Org Mode for my own writing, but GitHub’s Markdown support is better than its support for Org Mode. ↩︎
