Oh what a tangled web, or, Maven dependency management

Written by Andrew on September 23rd, 2010

I was at a presentation last year by Arnaud Heritier, of the Maven core team, who advised the following best practices for dependency management (slide 23 in the presentation):

  • Define all dependencies you are using – and no more!
  • Cleanup your dependencies with mvn dependency:analyze

The first item is really two pieces of advice, both excellent.

  1. Your POM’s dependencies should list all of the artifacts that your code uses directly. This apparently obvious rule is very easy to violate unintentionally, and indeed unknowingly. Here’s an example: suppose you are writing a toString() method, and you ask your IDE’s autocomplete facility to look for ToStringBuilder. Your IDE looks on its classpath, which includes all your dependencies, both direct and transitive. You haven’t defined a dependency on commons-lang, but you have defined one on spring-context, which depends on commons-lang. So your IDE finds ToStringBuilder in the project classpath, and adds the import statement. It doesn’t warn you, because it doesn’t know the difference between a direct dependency and a transitive dependency. Bam! You’ve got a used but undeclared dependency in your project, a small but unpredictable landmine(*) waiting for the right trigger to set it off. It will fail your build the day that your dependency on spring-context disappears, or changes to a version that no longer pulls in commons-lang. (* In any project that’s been around a while, a cluster bomb would be a better analogy for what’s likely to be lurking in the dependencies section of its POM.)
  2. Your POM’s dependencies should not list any artifacts that your code does not need. This rule is also obvious and easily broken. It happens whenever you cease to use any classes from a library, but fail to remove the dependency from your POM. You don’t necessarily realise when you’ve removed the very last reference to a library from your project, and your IDE won’t tell you. If you break this rule, it won’t fail your build directly, but it will pollute your POM with unused artifacts. Since you have to define the version that you want of each of these unused artifacts, it also increases the chance of a version conflict. This happens when you declare dependencies on A, which you really do use, and on X, version v, which you don’t use, while A also declares a dependency on X, version v’. Maven will enforce version v of X, which may or may not be compatible with A.
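
To make the first rule concrete, here is roughly what the dependencies section ought to look like once the ToStringBuilder import has crept into the code. This is only a sketch: the version numbers are illustrative assumptions, not recommendations, and it takes as given (as in the example above) that spring-context pulls in commons-lang transitively.

```xml
<dependencies>
  <!-- Declared because the code uses Spring classes directly. -->
  <dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-context</artifactId>
    <version>3.0.4.RELEASE</version> <!-- illustrative version -->
  </dependency>
  <!-- Declared because the code imports ToStringBuilder directly,
       even though spring-context would bring commons-lang in anyway. -->
  <dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.5</version> <!-- illustrative version -->
  </dependency>
</dependencies>
```

With the explicit declaration in place, dropping or upgrading spring-context can no longer take ToStringBuilder away. Conversely, the day the last commons-lang import disappears from the code, the second block should disappear with it.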

So what about the advice to use dependency:analyze? This will compare the classes referenced from your Java source with the dependencies declared in your (effective) POM, and flag up any discrepancies between the two: that is, artifacts that you have declared in your POM but do not use, and artifacts that you use in your code but do not declare (and have got away without declaring because they are pulled in as transitive dependencies of something else).

The dependency:analyze goal can be a useful tool, but it gives inadequate protection against the problems mentioned above.

  • One aims to catch mistakes as early as possible. The solution needs to be sought, not in an analyser run post-hoc, but in IDE tooling. If the IDE were aware of the distinction between direct and transitive dependencies, it would know not to import a class without checking that its JAR was declared in the POM. And it could do that yellow wavy underlining thing when you imported something that was not directly listed in the POM. It could also, perhaps, warn you when you excised from your project’s source code the last reference to an artifact. This would be far superior to occasionally running, or more likely forgetting to run, a dependency analysis tool and hand-processing its output. (The dependency analysis tool has to be launched manually. There is no point running it automatically as part of the build, since no-one will read its output unless it fails the build upon warnings, and you cannot let it fail the build, since it generates spurious warnings — see below.)
  • On top of that, it simply doesn’t work very well.
    • It will tell you that you have declared a dependency but not used it, when in fact the class is referenced in a configuration file (e.g. Spring), and failing to declare the dependency would cause a runtime ClassNotFoundException (or is it NoClassDefFoundError? I forget. Anyway, you know the one I mean.)
    • Conversely, it will fail to tell you that you have used a dependency without declaring it, if the only reference to the class is in a configuration file.
    • It will tell you that you are using a dependency without having declared it, when the dependency is referenced only from generated code. As an example, if you use YFWSF (Your Favourite Web Services Framework) to call a webservice, you’ll probably use yfwsf-maven-plugin to generate the client-side source code from the WSDL during the generate-sources phase. This source code will reference classes from, say, jaxb-api. The dependency:analyze goal will therefore give a warning, unless you put a jaxb-api dependency in your POM. However, you should not put that dependency in your POM, since the generated source code is effectively an artefact of YFWSF and not of your project, and the transitive dependency on jaxb-api declared by YFWSF is sufficient.
    • Conversely, it won’t warn you if you’ve declared a dependency that is only referenced by generated code.

If the Maven meta-model allows it, fixing the problem with generated code would be relatively simple. It would be enough to add a flag to dependency:analyze to make it ignore either generated code, or code under /target (which should come to the same thing; some people generate source code under /src, but they deserve all that is coming to them).
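
Releases of maven-dependency-plugin newer than the one discussed here offer a partial workaround, as far as I know: not a flag for generated source roots, but per-artifact ignore lists on the analyze goals. The sketch below assumes a plugin version recent enough to support those parameters. The com.example coordinates are hypothetical and stand in for a library referenced only from a configuration file.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <!-- Version omitted in this sketch; pin one recent enough to support the ignore lists. -->
  <executions>
    <execution>
      <id>analyze-dependencies</id>
      <phase>verify</phase>
      <goals>
        <!-- analyze-only inspects what is already compiled, without forking a build -->
        <goal>analyze-only</goal>
      </goals>
      <configuration>
        <!-- Only sensible once the known false positives are listed below. -->
        <failOnWarning>true</failOnWarning>
        <!-- Used only by generated code: do not report it as "used but undeclared". -->
        <ignoredUsedUndeclaredDependencies>
          <ignoredUsedUndeclaredDependency>javax.xml.bind:jaxb-api</ignoredUsedUndeclaredDependency>
        </ignoredUsedUndeclaredDependencies>
        <!-- Referenced only from configuration files: do not report it as "declared but unused". -->
        <ignoredUnusedDeclaredDependencies>
          <ignoredUnusedDeclaredDependency>com.example:config-only-library</ignoredUnusedDeclaredDependency>
        </ignoredUnusedDeclaredDependencies>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The lists have to be maintained by hand, so this moves the problem rather than solving it: the analyser still cannot see references in configuration files or tell generated code from your own.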

Detecting non-Java references to classes is an entirely hairier proposition. It isn’t feasible to understand the configuration formats of every single tool capable of instantiating a class referenced in a non-compile-checked manner. It might be possible to run a simple plain-text search across certain text-based files (XML, properties), looking for the fully qualified names of any of the classes contained in direct dependencies (to rule out apparent “declared, not used” errors), or any of the classes contained in transitive dependencies but not in direct dependencies (to catch “not declared, but used” errors).

In all of that, I haven’t even talked about the <dependencyManagement> section. The above only applies to the <dependencies> section, which is where you define what your project really uses. What Maven calls dependencyManagement serves to define the versions that you want for artifacts. It’s a bit confusing: an artifact can be listed

  • in both dependencyManagement and dependencies: it’s a dependency of the project, and will be propagated transitively to projects that use this project; the version number must be given under dependencyManagement, but should not be given under dependencies; the version used will be the one specified under dependencyManagement.
  • in dependencies but not dependencyManagement: it’s a dependency of the project, and will be propagated transitively to projects that use this project; the version must be given. This is poor practice because it’s preferable to group all the version management together in dependencyManagement.
  • in dependencyManagement but not dependencies: it’s not a dependency of the project; if one of the project’s dependencies requests it, the version requested will be overridden by this one; if it’s not even a transitive dependency, there will still be neither an error nor a warning from dependency:analyze (or anything else).
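
The three cases can be seen side by side in a single POM. This is a minimal sketch: the com.example coordinates are hypothetical, and the library versions are illustrative assumptions chosen only to make the file complete.

```xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId> <!-- hypothetical project coordinates -->
  <artifactId>demo</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencyManagement>
    <dependencies>
      <!-- Case 1: managed here and declared below; this version wins. -->
      <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>2.5</version>
      </dependency>
      <!-- Case 3: managed only. Not a dependency of this project, but if
           something requests it transitively, this version overrides theirs. -->
      <dependency>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.1.1</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <!-- Case 1: no version here; it comes from dependencyManagement. -->
    <dependency>
      <groupId>commons-lang</groupId>
      <artifactId>commons-lang</artifactId>
    </dependency>
    <!-- Case 2: not managed above, so the version must be given inline. -->
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>1.4</version>
    </dependency>
  </dependencies>
</project>
```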

Our architect put me onto the practice of keeping the version numbers in the dependencyManagement section only, and keeping them out of dependencies. On a multi-module project (i.e. most projects), you have a single dependencyManagement in the parent project, which ensures that all modules use the same versions of their dependencies. The downside of this is that you have to keep skipping between child and parent POMs when you add or remove dependencies, and this further burdens the task of keeping track of which dependencies you are really using.
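
In a multi-module layout, a child POM following this practice might look like the sketch below. The parent and module coordinates are hypothetical, and the parent is assumed to manage commons-lang in its dependencyManagement section.

```xml
<project>
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.example</groupId> <!-- hypothetical parent coordinates -->
    <artifactId>demo-parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <artifactId>demo-module</artifactId>

  <dependencies>
    <dependency>
      <groupId>commons-lang</groupId>
      <artifactId>commons-lang</artifactId>
      <!-- No <version> element: it is supplied by the parent's dependencyManagement. -->
    </dependency>
  </dependencies>
</project>
```

The child says what it uses and the parent says which version. That split is exactly why adding a dependency means touching two files.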

It is even worse when child and parent are on a different release cycle: when you change a dependency, you have to (a) change the child’s parent to the latest snapshot of the parent, (b) add the artifact to the parent’s dependencyManagement, (c) build the parent, (d) commit the parent’s POM, (e) add the artifact to the child’s dependencies, (f) code what you needed, (g) integrate the changes to the child into the version-control trunk (i.e. make sure tests pass), (h) perform a release of the parent, (i) change the child’s parent to the newly released version, (j) commit the child’s POM again. You could leave the child inheriting from a snapshot version of the parent, but you won’t be able to release as long as that’s the case, and I’ve learnt that it’s a bad idea to put impediments in the way of a release.

It is worse still when the parent’s release cycle includes other sub-modules, which may have work in progress on them. If you do want to share a dependencyManagement section between projects that are related but on separate release cycles, I strongly recommend either having a grand-parent project that contains only the dependencyManagement but no modules, and exists on its own release cycle (so that you can modify and release it quickly when you need, without impacting sibling projects which can carry on inheriting from the previous version), or using the <type>pom</type> <scope>import</scope> technique.
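
For the import technique, the consuming project pulls another POM’s dependencyManagement into its own without inheriting from it. A minimal sketch follows. The com.example coordinates are hypothetical and stand for a small pom-packaged project that contains nothing but a dependencyManagement section. Import scope needs Maven 2.0.9 or later.

```xml
<dependencyManagement>
  <dependencies>
    <!-- Imports the managed versions from the shared POM.
         The import scope is only valid inside dependencyManagement, with type pom. -->
    <dependency>
      <groupId>com.example</groupId> <!-- hypothetical shared-versions project -->
      <artifactId>demo-dependency-versions</artifactId>
      <version>1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Because this is an ordinary versioned reference rather than a parent link, each project can move to a newer set of versions on its own schedule.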

 
