UnSemVer!
For better or worse, most software ecosystems use Semantic Versioning (SemVer) to resolve dependencies. Semantic Versioning is a convention between library authors, library consumers, and build tools.
It can be tempting to believe that versioning is "solved", that, given its ubiquity, SemVer is the right way to version software. But I find it instructive to explore how the world might be different.
Recap of SemVer
Before we discuss any problems with SemVer, let's quickly cover what SemVer is.
The official description of SemVer can be found at semver.org. SemVer is a convention for naming software versions. Version names consist of three numeric components separated by periods. Each one of these components has a name and an associated situation when it is incremented.
For example, in the version:
1.3.5
- Major Version (1) - Changes when you make incompatible API changes
- Minor Version (3) - Changes when you add functionality in a backward compatible manner
- Patch Version (5) - Changes when you make backward compatible bug fixes
Semantic versioning also defines precedence between versions.
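As a rough illustration of precedence, here is a minimal Python sketch. It assumes plain MAJOR.MINOR.PATCH strings and ignores the pre-release and build metadata rules in the spec; the `parse` helper is a name made up for this example.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


# Components compare numerically, left to right, so 1.10.0 is newer
# than 1.4.0 even though "1.10.0" sorts first as a plain string.
versions = ["1.10.0", "1.3.5", "1.4.0", "0.9.12"]
ordered = sorted(versions, key=parse)
```

Comparing the parsed tuples rather than the raw strings is the point of the sketch: precedence is defined on the numeric components, not on the text.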
While not included in the spec, Semantic Versioning is accompanied by tooling which handles the work of translating the consumer's wishes into resolved packages. For example, the library consumer may write 1.*.* to indicate they want any release with major version 1, or >=2.0.1 to indicate they want that version or anything newer. The next time you build your software, this tooling will select an appropriate version to include.
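To make the resolution step concrete, here is a small Python sketch of how a tool might pick a version for the two constraint styles mentioned above. It is an assumption-laden toy, not the algorithm of any real package manager: `matches` and `resolve` are hypothetical names, and only `X.*.*` wildcards and `>=X.Y.Z` lower bounds are handled.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def matches(version: str, constraint: str) -> bool:
    """Check a version against a '1.*.*' wildcard or a '>=X.Y.Z' bound."""
    if constraint.startswith(">="):
        return parse(version) >= parse(constraint[2:])
    wanted = constraint.split(".")
    actual = version.split(".")
    return all(w == "*" or w == a for w, a in zip(wanted, actual))


def resolve(available: List[str], constraint: str) -> Optional[str]:
    """Return the newest available version that satisfies the constraint."""
    candidates = [v for v in available if matches(v, constraint)]
    return max(candidates, key=parse, default=None)


available = ["1.3.5", "1.4.0", "2.0.0", "2.0.1", "2.1.0"]
picked_wildcard = resolve(available, "1.*.*")
picked_bound = resolve(available, ">=2.0.1")
```

Note that the consumer never names 1.4.0 or 2.1.0 directly. The tool picks the newest match on their behalf, which is exactly why a mislabeled release spreads so quickly.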
The Problem
One of the assumptions in SemVer is that you can identify breaking changes in advance. This is impossible. Sure, you can detect some categories of breaking changes with static analysis. But the concept of a breaking change is context-dependent and often up to human discretion. And since humans are imperfect, we are bound to eventually mislabel a breaking release as non-breaking.
To illustrate the issue this causes let's look at an example:
Let's say we incorrectly release a breaking change as 1.4.0. As soon as this happens, build tools around the world download and use the latest (mislabeled) version. Maybe the new release breaks their builds, maybe it doesn't – it depends on the specifics. Now is the time to start panicking!
"Can we just delete 1.4.0 and try again?"
Not really. There are already consumers who depend on the new version. Some of them may even depend on the freshly broken APIs. We can't just take away the library!
"Can we leave it as is and publish a new 1.4.1 which 'reverts' to the same contents as 1.3.5?"
This is actually what semver.org proposes. But it's not perfect. Everyone upgrading from 1.4.0 → 1.4.1 will experience a breaking change. That is specifically contrary to the semantics of SemVer.
"So what's the path forward?"
If you're unsatisfied with the official solution of a "revert" version, then you'll need a new strategy. You need to introduce additional data for use during dependency resolution. We might want to mark 1.4.0 as deprecated so that build tools skip over that version. But if a user specifically asks for 1.4.0 they could still access it.
This is not a novel solution; it exists in many ecosystems already.
- In Python, PEP 592 introduced a "yanked" boolean on the release. (2019)
- In JavaScript, npm has the concept of "deprecations". This was perhaps introduced in response to the left-pad incident, but I haven't found discussion supporting this claim.
- In Rust's Crates.io you can similarly "yank" a release.
- This feature is missing in Java's Maven/Gradle. Library authors are expected to publish a new version with a patch bump and hope users do not depend on 1.4.0.
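The skip-unless-pinned behavior described above can be sketched in a few lines of Python. This is a simplified model and not the exact behavior of any tool in the list: the `releases` mapping, its boolean yanked flag, and the function names are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def latest_in_major(releases: Dict[str, bool], major: int) -> Optional[str]:
    """Pick the newest release in a major line, skipping yanked ones.

    `releases` maps a version string to a hypothetical yanked flag.
    """
    candidates = [
        version
        for version, yanked in releases.items()
        if parse(version)[0] == major and not yanked
    ]
    return max(candidates, key=parse, default=None)


def exact(releases: Dict[str, bool], version: str) -> Optional[str]:
    """An exact pin still resolves, even when the release is yanked."""
    return version if version in releases else None


# 1.4.0 is the mislabeled release, so it has been marked as yanked.
releases = {"1.3.4": False, "1.3.5": False, "1.4.0": True}
from_range = latest_in_major(releases, 1)
from_pin = exact(releases, "1.4.0")
```

Range-based consumers quietly fall back to 1.3.5, while anyone who already depends on 1.4.0 by name can still fetch it.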
A Solution
I'm a bit miffed that Semantic Versioning loses its semantics so easily. So long as breaking changes are easy to create, and fallible humans pick version numbers, we will have versions that violate their semantics.
Once we err, there is no elegant way out. A release named 1.3.5 offers no room to later change its relation to neighboring releases. This is the fundamental feature of SemVer: the name comes with resolution information baked in.
To solve this problem we have to decouple the address of the artifact from its resolution. We want the ability to release a version and then later change its relation to other versions. Imagine an alternative world:
In this world:
- Each release of a library can be referenced by a hash of the artifact.
- At the top level, a library includes a resolution.json file which contains mappings from semantic versions to hashes.
- The resolution.json file may also include tombstone entries, which tools may use to indicate the released version is no longer part of the release chain.

    {
      "1.0.0": "abc123",
      "1.3.5": "bcd234",
      "1.4.1": {
        "type": "tombstone",
        "msg": "This version was tombstoned. Please see https://example.com for more information",
        "hash": "cde345"
      }
    }

- Users may specify a release either by its hash or a semantic version string.
- When using Semantic Versioning, build tools consult resolution.json and download the artifact with the correct hash.
- Build tools skip over tombstone records unless that is the only resolvable version. Then an appropriate error is presented to the user.
This proposal is similar to the existing "yanking" behavior described above. The main difference is that we no longer refer to releases only by their semantic version; once they have been tombstoned/deprecated, we can instead refer to them by hash.
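Here is a minimal Python sketch of how a build tool might consult such a file. Everything in it is an assumption for illustration: the proposal does not define an API, so `resolve`, `is_tombstone`, and `TombstonedError` are invented names, and resolution is reduced to "newest release within a major version".

```python
import json
from typing import Any, Dict, Tuple

# Same shape as the resolution.json example in this post.
RESOLUTION_JSON = """
{
  "1.0.0": "abc123",
  "1.3.5": "bcd234",
  "1.4.1": {
    "type": "tombstone",
    "msg": "This version was tombstoned. Please see https://example.com for more information",
    "hash": "cde345"
  }
}
"""


class TombstonedError(Exception):
    """Raised when only tombstoned releases satisfy the request."""


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_tombstone(entry: Any) -> bool:
    """Live entries are plain hash strings; tombstones are objects."""
    return isinstance(entry, dict) and entry.get("type") == "tombstone"


def resolve(resolution: Dict[str, Any], major: int) -> str:
    """Return the artifact hash of the newest live release in a major line."""
    in_range = {v: e for v, e in resolution.items() if parse(v)[0] == major}
    live = {v: e for v, e in in_range.items() if not is_tombstone(e)}
    if live:
        return str(live[max(live, key=parse)])
    if in_range:
        # Only tombstones match: surface the author's message to the user.
        newest = max(in_range, key=parse)
        raise TombstonedError(in_range[newest]["msg"])
    raise LookupError(f"no release with major version {major}")


resolution = json.loads(RESOLUTION_JSON)
chosen = resolve(resolution, 1)
```

With the example file, a request for major version 1 skips the tombstoned 1.4.1 and lands on the hash for 1.3.5. A consumer who really wants the tombstoned artifact would bypass this lookup and ask for cde345 directly.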
In theory, this decoupling supports more complicated dependency resolution strategies. Instead of resolution.json we could have resolution.scm and execute a sandboxed script to translate a user-specified version into the actual artifact. While this is enticing, I haven't found a use case for this flexibility.