What’s in a (strong) name?

Package managers are brilliant. The obvious win is the convenience of versioning your dependencies without having to commit binaries to Git. Lots of people use package managers for third party dependencies, but I’m going to talk about using them for components within an organisation. This is where package managers fulfil their true potential – making you write better code.

For example, imagine the desktop app I’m writing uses a shared component, let’s call it Shared.Utils. There’s a bug in my app and I suspect that it’s caused by Shared.Utils.

Compiling from source:
I tweak the loop counter to start from 0 instead of 1, then run it in the UI and see if the problem still happens. That doesn’t work, so I make another random change and check the UI again.

Consuming a package:
Creating a NuGet package and updating my desktop app to it every time I tweak something isn’t as easy. I could write a script to make it more convenient, but that’s effort, and feels a bit wrong. So instead I figure out what values are passed to Shared.Utils and write a test in Shared.Utils for the behavior I’m expecting. If it passes, then I’ve just improved the test coverage of a feature I depend on, and increased my understanding of it. If it fails I can fix the issue with rapid feedback, and I have a test to make sure the bug isn’t reintroduced.
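To make that concrete, here’s the sort of test I mean. The PriceFormatter type, the values and the choice of NUnit are all made up; the point is that the test pins down the exact behavior my app relies on.

```csharp
using NUnit.Framework;
using Shared.Utils; // the shared component under suspicion (hypothetical type below)

[TestFixture]
public class PriceFormatterTests
{
    [Test]
    public void Format_RoundsToTwoDecimalPlaces()
    {
        // These are the values my desktop app was actually passing in
        // when the suspicious behavior appeared.
        var result = PriceFormatter.Format(12.345m);

        Assert.That(result, Is.EqualTo("12.35"));
    }
}
```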

Hopefully the end result of the second option sounds better. Over time Shared.Utils will become better tested. Not only that, but the behaviors that get tested are precisely the ones that products depend on. So if someone makes a change in Shared.Utils and breaks my product, it’s at least partly my fault for not having added a test in there. All this means people can make changes to these shared libraries with less fear.

Gotcha. Obviously I’ve painted a rather rosy picture there, so here comes the downside. Now that the fear of change has gone, Shared.Utils keeps changing – and I obviously want the latest version, since it has all the latest bugfixes. What I didn’t mention is that my desktop application also depends on Shared.Controls and Shared.Xaml, both of which depend on Shared.Utils.

Versioning hell

Now I’m left with 3 standard options:

  • Use assembly binding redirects to pretend the later version of the assembly is actually the earlier version at runtime.
  • Always take the latest on everything.
  • Take a dependency on multiple versions of Shared.Utils – use the global assembly cache (GAC) to store them and aliases to ensure the correct one is used.

The first option feels like a hack. That’s because it is. The customer’s machine is a bit late to be applying hacks. The scheme is workable for some applications, but there’s the problem of MissingMethodExceptions when you use the hack to unify assemblies with different APIs. The second and third options require lots of commits and increase the sequential build time needed for a minor change or bugfix to get into your code base. The third option is also cumbersome because of the aliases (which generally baffle tools like ReSharper) and an installer that needs to deal with installing to the GAC. Any one of these schemes is usable; let’s see how semantic versioning can help alleviate some of the pain of each.

Semantic versioning – assembly version as an API version

In .NET, assemblies can be strongly named. A strong name requires a name, a version and a public key token. MSDN tells us that “Assemblies that have the same strong name should be identical”. However, Microsoft has broken this rule itself in the past, and in order to neatly solve the dependency problems described we’ll have to bend the rule to: “Assemblies that have the same strong name should have identical public APIs”.

Crucially, this means that even though we’ve compiled against a strongly named assembly, we could load a different one in its place – so long as its public members are the same. That makes sense, and it means the compiler still stops us from getting the likes of MissingMethodExceptions. So we could just make our assembly versions ApiMajor.ApiMinor.0.0. Whenever we detect a public API change we increment ApiMajor for a breaking change (removing/changing a member) and ApiMinor for a backwards compatible change (adding a member, rare edge cases aside). The caveat is that modifications to the assertions in any existing tests also count as a change, which must be judged on a case by case basis.

Of course the AssemblyFileVersion should still be unique so we can keep track of assemblies, for example ApiMajor.ApiMinor.Build.0, and if we version the containing package as ApiMajor.ApiMinor.Build then we’ve adhered to semver.org, so consumers know what they’re getting. This means that we don’t need to use the GAC, and we only need to do the tedious upgrade when the API changes. Our NuGet packages can always safely depend on [major.minor.build, major.minor+1.0).
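As a concrete example (the numbers are made up), the AssemblyInfo.cs for Shared.Utils under this scheme might look like this, with the package for this build versioned 4.2.17:

```csharp
using System.Reflection;

// API version only: bump the first number for a breaking change,
// the second for a backwards compatible addition.
[assembly: AssemblyVersion("4.2.0.0")]

// Unique per build, so we can always identify the exact binary.
[assembly: AssemblyFileVersion("4.2.17.0")]
```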

Assembly version as a lower bound on API version
Sometimes it turns out that we actually want to change our public API reasonably often, which means the disadvantages of the third point above are still a lingering problem. We can take this a step further and make the AssemblyVersion ApiMajor.0.0.0 – package dependencies would now be [major.minor.build, major+1.0.0). This allows many additions to the public API while keeping the assembly backwards compatible. All assemblies with a major version of 4 will contain at least the public members that were in 4.0.0.0. If we load an old version in the place of a newer one though, the compiler can’t save us from our idiocy, so we need a way to make sure that doesn’t happen.
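With the same made-up numbers, the relaxed scheme looks like this; note that adding a public member no longer changes the AssemblyVersion, only the file and package versions:

```csharp
using System.Reflection;

// Only the major API version is encoded, so every 4.x build binds
// against references compiled for any other 4.x build.
[assembly: AssemblyVersion("4.0.0.0")]

// Still unique per build; the package for this build is 4.2.17,
// and consumers would declare a dependency on [4.2.17, 5.0).
[assembly: AssemblyFileVersion("4.2.17.0")]
```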

Fortunately, we can rely on NuGet to only allow us to install compatible packages. That way, as long as we use update-package we should get a set of packages which work together. This means only removing a public API member requires the tedious bubbling through of dependency updates – because you genuinely need to check the removed member wasn’t used in those projects.

Everything is good, except that we’ve broken even our modified rule about AssemblyVersions, thus losing the compiler’s help. The rule is starting to get a bit complicated: “Of two assemblies that have the same strong name, the one with a later file version should have a superset of the other’s public API”. So we can only use this variant of the scheme if we have solid testing in place (for example verifying that NuGet dependency constraints are adhered to).

Versioning our versioning

We’ve now got two schemes which can work reasonably well for projects with different rates of change. The problem now is figuring out which version of versioning a dependency is using, in order to decide which constraints to add to our nuspec file.

Put it in the package.
We can put it in the tags or the description field of the nuspec. We could store a scheme name or version, but to be as general as possible we could instead store the lower bound on the next NuGet package version that will have:

  • A different assembly version
  • A minor API change
  • A major API change

Even if you don’t do semantic versioning, making this information available for your package is valuable; that way people can at least tell if you change your versioning scheme in future.

Obviously it’s only a matter of time until someone makes a mistake in the versioning, so at a minimum we need to be able to detect that, but why stop there?

Public API equality checker

Ideally we could extract the public API from source files and store it in a canonical format. This would allow us to fail fast if the manually set version is incorrect. Once we know in what way the public APIs are different to the last build we can actually calculate the next version and put it in the AssemblyInfo.cs file to remove the manual aspect.

Unfortunately I wasn’t aware of any project which would allow me to easily extract a public API from source files (in the future I may attempt this using Roslyn), so I opted for extracting the public API from a compiled assembly. This serves the purpose of checking the manually entered version, though obviously it doesn’t fail as fast.
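The core of such a checker looks roughly like the sketch below, using Mono.Cecil. It’s simplified (nested types, generics, events and so on need handling too), but it shows the idea: dump the public surface in a canonical form that can be diffed between builds.

```csharp
using System;
using System.Linq;
using Mono.Cecil;

static class PublicApiDumper
{
    // Returns the public surface of an assembly in a canonical, sorted form,
    // so the output for two builds can be diffed to classify the change.
    public static string[] Dump(string assemblyPath)
    {
        var assembly = AssemblyDefinition.ReadAssembly(assemblyPath);

        var members =
            from type in assembly.MainModule.Types
            where type.IsPublic
            from signature in
                type.Methods.Where(m => m.IsPublic).Select(m => m.FullName)
                    .Concat(type.Fields.Where(f => f.IsPublic).Select(f => f.FullName))
            select signature;

        return members.OrderBy(s => s, StringComparer.Ordinal).ToArray();
    }
}
```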

Since my checker already used Mono.Cecil, I also added the functionality to retroactively change an assembly’s version number, but this part feels a bit hacky to me, and I’m reluctant to deploy it to a real system.
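The version-changing part is roughly the following, and it’s part of why it feels hacky: rewriting a strongly named assembly invalidates its signature, so it would also need to be re-signed afterwards.

```csharp
using System;
using Mono.Cecil;

static class VersionRewriter
{
    public static void SetAssemblyVersion(string inputPath, string outputPath, Version version)
    {
        var assembly = AssemblyDefinition.ReadAssembly(inputPath);

        // This is the AssemblyVersion recorded in the assembly manifest.
        assembly.Name.Version = version;

        assembly.Write(outputPath);
    }
}
```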

I like @kevinfromireland’s idea of NuGet being able to check your scheme during upload. It could let you override your declared scheme to account for screwups, but mark the update with a warning icon.

Nuspec Updater

The scheme chosen relies on NuGet to select the right dependency versions, which means the nuspec needs to contain that information in the first place. Due to its neat handling of XML and the small size of the task, I wrote a PowerShell script to do this. It simply propagates version numbers from packages.config into the nuspec, something that I wish the NuGet spec command could do for me. It sets dependency version ranges based on the scheme described, if the version numbers have 3 digits.
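The actual script is PowerShell and I won’t reproduce it here, but the idea fits in a few lines of C# too. The sketch below assumes the usual packages.config and nuspec layouts and uses the lower-bound scheme from the previous section, so treat the details as illustrative.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class NuspecUpdater
{
    public static void PropagateDependencies(string packagesConfigPath, string nuspecPath)
    {
        // packages.config entries look like <package id="Shared.Utils" version="4.2.17" ... />
        var installed = XDocument.Load(packagesConfigPath)
            .Descendants("package")
            .ToDictionary(
                p => (string)p.Attribute("id"),
                p => Version.Parse((string)p.Attribute("version")));

        var nuspec = XDocument.Load(nuspecPath);
        var ns = nuspec.Root.GetDefaultNamespace();

        foreach (var dependency in nuspec.Descendants(ns + "dependency"))
        {
            var id = (string)dependency.Attribute("id");
            Version version;
            if (id == null || !installed.TryGetValue(id, out version))
                continue;

            // Lower bound: the exact version we built against.
            // Upper bound: the next major version, exclusive.
            dependency.SetAttributeValue(
                "version",
                string.Format("[{0}, {1}.0)", version, version.Major + 1));
        }

        nuspec.Save(nuspecPath);
    }
}
```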

Test code analyzer

There’s a subtle type of public API change I haven’t talked about: changing the meaning of a method/class/event/property without changing its signature. If there are projects dependent on untested behavior then it’s impossible to know the difference between a behavior change that fixes a bug and a behavior change that causes a bug. Naively assuming that required behavior is tested, we could look for changes in existing tests on lines with the word “assert”, and count that as an API breaking change. I haven’t attempted to implement any such analyzer, as it sounds pretty flaky. It seems preferable to add a manual override for the previously described mechanism and make it clear to people when to use it.
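Still, for what it’s worth, here is roughly the shape such an analyzer might take, and also why it sounds flaky: comparing “assert” lines between two revisions of a test file is about as crude as it sounds.

```csharp
using System;
using System.Linq;

static class TestChangeAnalyzer
{
    // Returns true if any line mentioning "assert" was added, removed or edited
    // between two revisions of a test file, treated as a potential behavior change.
    public static bool HasAssertionChanges(string[] oldLines, string[] newLines)
    {
        Func<string[], string[]> assertions = lines => lines
            .Select(line => line.Trim())
            .Where(line => line.IndexOf("assert", StringComparison.OrdinalIgnoreCase) >= 0)
            .ToArray();

        return !assertions(oldLines).SequenceEqual(assertions(newLines));
    }
}
```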

Summary

There are 4 steps to improving your versioning, each a useful step in its own right:

  • Declare your versioning scheme
  • Use semantic versions for your packages
  • Automate checking your scheme is enforced, using something like ApiChange
  • Use the same AssemblyVersion for semantically compatible packages