Border Crossings

Devoxx France roundup

Sun, 22 Apr 2012 00:00:00 +0000

The first French edition of Devoxx was, first of all, a resounding success for the organisers: sold out with 1200 participants, I think we can be confident that there will be a second edition. It was also a success for the participants, or at any rate for me, in that what I got out of going there was worth the entrance ticket (notwithstanding that I went on my own time and at my own expense). Hats off to the organising team for the incredible amount of work they must have put in (for free), for having the courage to make their dream come true, and also because the logistics were top notch. Which is no mean feat, when twelve hundred people have to be looked after.

What I got out of it

What I got out of the conference was (1) the content of the talks, and (2) feeling part of a community. Getting a huge shot of new shiny stuff in the company of like-minded people is a fantastic motivation boost. By the way, I can’t recommend the conference to non-French speakers: a quarter of talks were in English, but in practice you really needed to understand French to get most of the benefit.

There were technical talks where I got introductions to some tools and techniques that I don’t see in my daily work, or haven’t yet tried. But what I really valued were the talks - mostly keynotes - that purvey strategic insights. There’s so much going on in the software development world that it’s hard to pick out trends, especially when you’ve got your head down in everyday work and are only listening with half an ear.

To my mind, that’s the biggest benefit of a big independent conference: in two or three days, you can take the pulse of a whole community.

What I didn't get out of it

I had been expecting to get some networking benefits, even though I’m a hopeless networker. But the style of the conference doesn’t favour striking up conversations with complete strangers, unlike the Agile conferences I’ve been to. You do meet friends of friends that you didn’t already know; so I met a couple of people from Geneva that I hadn’t known, or put a name to, before. But I didn’t really need to go to Paris for that.

The high point

The high point was Neal Ford’s keynote “Abstraction distractions”. Hilarious and tremendously useful at the same time. I love a talk that makes you better at thinking.

Is it worth going next year?

If you didn’t go this year and are wondering whether to go next year, I’d say: go if you want a big conference, vendor-sponsored but independently organised, with lots to learn, mostly technical but also strategic, and you understand French. Go if you want to feel good about being a programmer. Go there to network, unless you’re hopeless at it (like me) and/or hardly know anyone else who’s going (surely impossible). Don’t go there if you don’t understand French, unless you’ve got an invitation; you’ll only get 25% of the benefit.

What should they keep next year? What should they change?

They should keep: almost everything, and in particular

the format: 2-3 days is exhausting enough, and longer than that is too hard to justify to employers and families
the rigorous selection of speakers
the invited keynote speeches from big names (more of those, if possible)
the very professional venue and catering and logistics (despite the expense)
the corporate sponsorship (which I dislike, but without which there wouldn't be the money to pay for professional facilities)

They should change: only minor details, like

the lobby which didn't have enough space for 1200 people to mingle and move between sessions all at once
the corporate keynotes sold to sponsors, even if that reduces sponsorship revenues. Sponsors should get regular slots in parallel with other sessions, so that we have the choice of going to listen or not. They will still get an audience, but only insofar as they have something to offer beyond a sales pitch.
drinking water availability - it was too hard to find any this year.

That’s the end of my round-up. In case they’re useful to you, I’m including below my more-or-less raw notes from several of the sessions I attended, namely

Fier d'être développeur - Pierre Piezzardi
Manipulation de bytecode pour les nuls
invokedynamic them all - Rémi Forax
Kanban pour les nuls
Behind the scenes of day-to-day development at Google - Petra Cross
Pour un développement durable
IBM talk on mobile apps
Portrait du développeur en "The Artist" - Patrick Chanezon
Abstraction distractions - Neal Ford
sizeOf in Java
TestNG parce que vos tests le valent bien
Overview of Guava
.NET for Java developers

Fier d'être développeur - Pierre Piezzardi

This was a call to arms for Agile principles of short positive feedback loops and focus on business value. (In some parts of the world it might have seemed superfluous, but not in France, alas.)

Manipulation de bytecode pour les nuls

See also a slide tutorial by Charles Nutter: Bytecode manipulation for dummies.

This talk was a series of tasters of 4 different techniques for manipulating bytecode: ASM, AspectJ, JByteMan and JooFlux.

By the way, the first thing to do if you’re going to look at bytecode is learn its notation for types.
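For a quick feel for that notation, the JDK itself can print method descriptors. This is only a small sketch, assuming Java 7 or later; the class name DescriptorDemo and the helper method are made up for illustration.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Returns the JVM descriptor for a method with the given return and parameter types.
    public static String descriptor(Class<?> returnType, Class<?>... paramTypes) {
        return MethodType.methodType(returnType, paramTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Primitives are single letters: I = int, Z = boolean, J = long, V = void.
        System.out.println(descriptor(void.class, int.class, boolean.class));
        // prints (IZ)V

        // Object types are written L<internal name>; and each array dimension adds a '['.
        System.out.println(descriptor(String.class, long[].class, Object.class));
        // prints ([JLjava/lang/Object;)Ljava/lang/String;
    }
}
```

Parameters go between the parentheses with no separators, and the return type follows the closing parenthesis.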

I’ll skip most of the ASM example, because ASM is very low level and it’s unlikely you’d need to use it directly in enterprise software. It’s worth noting that ASM provides a reverse engineering utility: given an existing compiled class, it will generate for you the Java code that would generate that bytecode using the ASM library.

I’ll also skip AspectJ because I’m not up to summarising aspects in one or two sentences, and it’s well-known and well-documented enough already.

JByteMan was the most interesting section of the talk for me. This is a tool that can perform manipulations rather like AspectJ, except that instead of doing them statically at compile or load time, it connects to a running JVM and injects them into the already-running code. It’s controlled by a simple DSL that allows you to define conditions upon which injection should happen, and an (extensible) library of injection actions.

JByteMan is particularly interesting for certain types of tests, because it can be used to trigger failures that are hard to simulate otherwise. For instance, you could simulate a disk full error by declaring that the Nth execution of a certain method should throw an IOException. The pattern for this kind of usage is: run a coverage check on your unit (and integration) test suites, identify the branches that are uncovered and are hard to get into with standard unit testing, and slap a JByteMan annotation on the test to set off the case you need.
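To make the "Nth call throws an IOException" idea concrete, here is a rough sketch. Note that this is not JByteMan and involves no bytecode manipulation: it uses a plain JDK dynamic proxy, which only works when the code under test goes through an interface. The Storage interface and all the names are hypothetical, and the lambdas assume Java 8 or later.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

public class FaultInjectionDemo {
    // Hypothetical interface standing in for the code that writes to disk.
    public interface Storage {
        void write(String data) throws IOException;
    }

    // Wraps a Storage so that the Nth call to write() throws an IOException.
    public static Storage failOnNthWrite(Storage real, int n) {
        AtomicInteger calls = new AtomicInteger();
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("write") && calls.incrementAndGet() == n) {
                throw new IOException("Simulated disk full on call " + n);
            }
            try {
                return method.invoke(real, args); // every other call goes to the real object
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow whatever the real object threw
            }
        };
        return (Storage) Proxy.newProxyInstance(
                Storage.class.getClassLoader(), new Class<?>[] {Storage.class}, handler);
    }

    public static void main(String[] args) throws IOException {
        Storage storage = failOnNthWrite(data -> { /* pretend to write */ }, 3);
        storage.write("a");
        storage.write("b");
        try {
            storage.write("c"); // third call: the injected failure
        } catch (IOException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

The appeal of a tool working at bytecode level is precisely that it doesn't need this kind of seam: it can inject the failure into code that was never designed for it.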

The final tool they showed, JooFlux, is under development. It replaces invokeXyz bytecodes by invokeDynamic and allows you to plug in a method of your choice as target of the invokeDynamic. This makes it more JIT-friendly than JByteMan since it does not modify method code, but redirects the method call chain. Thus it doesn’t invalidate inlinings compiled by the JIT compiler. Which is nice, but not particularly important in a test context, so I’m not sure what are the use cases for JooFlux. Since it works via invokeDynamic, it’s only usable from Java 7 onwards, but that doesn’t matter because frankly I can’t imagine that anyone who was forbidden from moving to Java 7 would be allowed to use something like this.

The downside of all these bytecode manipulation techniques - and we already see it with tools that use them, like Hibernate or Spring - is that you no longer have a direct relationship between your Java source and what really happens on execution. This makes debugging difficult or impossible, which means logs are your only hope when problems come up. And in my experience, no-one takes logging seriously until after the production bugs have already happened. So I asked a question about the difficulty of visualising the real execution path, and how to overcome it. The speakers agreed that it is difficult, and there isn’t a canned solution, though you could write your own java agent that could capture the bytecode actually executed. (Getting java agents running in production isn’t always on the cards, though.)

Invokedynamic them all - Rémi Forax

This was a very technical talk about JIT internals, and I humbly admit to having understood less than 10% of it.

The first takehome message from this talk is that I have absolutely no idea what is really going on when I run some Java code in a JVM. I thought I had some vague idea about how bytecode works, but when the JIT swings into action, the things that happen no longer have much to do with bytecode. Tying this in to Neal Ford’s advice to know one abstraction below the one you usually use reminds me that I need to learn more about JVM internals.

The second takehome message is that if I want to have the slightest hope of understanding what invokedynamic does, I need to learn how normal method invocations work at the bytecode level.

The third takehome message is that you can see the inlining and the generated code done by the JIT, if you really want to optimise performance.

A few specific points from the talk follow below. I may well have misunderstood any or all of them.

NB All this applies to JVM v7+ since that’s the version that introduced invokedynamic.
NB (2) JRuby is the 1st language to use invokedynamic and give feedback (on performance, notably) to the JVM developers. Groovy is close behind.
NB (3) You can’t write any Java code that will compile to an invokedynamic instruction.

Some useful JVM flags:

-XX:+PrintCompilation Prints out what the JIT is compiling …but you need to know how to read the output!

-XX:+PrintInlining Inlining of a method called via an interface: TypeProfile indicates that the virtual call has been inlined. inline(hot) indicates frequently called methods that have been inlined. The JIT collects profiling information to help it optimise. If it notices that a virtual call always goes in fact to the same concrete class, it will inline a direct call to the concrete implementation.

-XX:+PrintAssembly Not available with standard VM, the VM has to be compiled in fastdebug mode! And a few other tricks…

The output is erm… Well, it’s assembly language, with some comments added.

By the way, did you know that the JIT profiles the frequency of execution of alternative if() branches and reorders them so that the most frequently executed test comes first? I didn’t.

Hotspot makes C++ copies of all the Java objects it uses. When things get moved between heap generations upon GC, the JVM has to walk through all the objects and update addresses. When there’s a NPE in JIT-compiled code, the machine code actually explodes and the JVM is able to catch the problem, identify the point where the null pointer access occurred, and reconstitute the stacktrace for the NPE. Whew.

In the case mentioned above where the JIT inlines the fact that a virtual method is always called on the same concrete class, it similarly causes a fault in the machine code which the JVM catches and de-optimises the code (it returns to interpreted mode). If I understood it rightly.

A word more about how it works. The JIT is going to inline cases where there are multiple possibilities in the code, but at execution only one case comes up. It will write a check that ensures we are in that case, and will inline (JIT-compile) only the code for the case that’s really executed.
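Written out by hand in Java, the guarded inlining described above looks roughly like this. It’s only an illustration of the idea as I understood it: the real JIT emits machine code rather than Java, and the Shape, Square and Rect names are invented.

```java
public class GuardedInlineDemo {
    public interface Shape { int area(); }

    public static final class Square implements Shape {
        final int side;
        public Square(int side) { this.side = side; }
        @Override public int area() { return side * side; }
    }

    public static final class Rect implements Shape {
        final int w, h;
        public Rect(int w, int h) { this.w = w; this.h = h; }
        @Override public int area() { return w * h; }
    }

    // What the source code says: a virtual call through the interface.
    public static int asWritten(Shape s) {
        return s.area();
    }

    // Roughly what you get once profiling has only ever seen Square at this call.
    public static int asOptimised(Shape s) {
        if (s.getClass() == Square.class) {  // the guard: a cheap type check
            Square sq = (Square) s;
            return sq.side * sq.side;         // body of Square.area() inlined
        }
        // Stand-in for de-optimisation: the real JVM would throw the compiled
        // code away here and fall back to the interpreter.
        return s.area();
    }

    public static void main(String[] args) {
        System.out.println(asOptimised(new Square(3)));  // fast path
        System.out.println(asOptimised(new Rect(2, 5))); // guard fails, slow path
    }
}
```

Both versions return the same result for every Shape; the optimised one just bets that the guard almost always succeeds.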

Second part of talk: invokedynamic itself

See java.lang.invoke for the classes that deal with invokedynamic by reflection: CallSite, Lookup, MethodType. If I understood rightly, you can use this API to make invokedynamic calls from Java.
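Here is a minimal sketch of those three classes working together, assuming Java 7 or later. To be clear about what it does and doesn’t show: it does not emit an invokedynamic bytecode, it only exercises from plain Java the same kind of objects that an invokedynamic instruction gets linked to. The class and method names are my own.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class InvokeDemo {
    // Calls String.length() through a method handle wrapped in a call site.
    public static int lengthViaCallSite(String s) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        // The target's signature: no arguments, returns int.
        MethodType type = MethodType.methodType(int.class);
        // Look up String.length(); the receiver becomes the handle's first argument.
        MethodHandle length = lookup.findVirtual(String.class, "length", type);
        // A call site whose target never changes.
        CallSite site = new ConstantCallSite(length);
        // invokeExact needs the exact static types: (String) -> int.
        return (int) site.dynamicInvoker().invokeExact(s);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(lengthViaCallSite("Devoxx")); // prints 6
    }
}
```

A ConstantCallSite is the simplest case; the same package also has MutableCallSite, whose target can be swapped after linking.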

We got a runthrough of how to build invokedynamic with this API and a bit about how the JIT optimises the resulting bytecode. I understood almost none of it. You can tell that Rémi is a university professor rather than a teacher: instead of spoon-feeding nice digestible morsels of learning, he delivers great chunks of it and leaves his students to figure out how to digest them. This is certainly good training for the students, but it’s kind of sadistic at a conference.

Kanban pour les nuls

Just a few random notes for this one.

Kanban is:

kanban cards
a kanban system: a production process, a set of rules, a “flux tiré” of kanban cards
the Kanban method (with a capital K)

5 fundamental practices:

visualise (kanban board - visible without having to open JIRA, read a dashboard…)
limit Work In Progress
policy: explicit and visible rules, adopted (and owned?) by the team. E.g. Definition of done pinned up on the top of the kanban board.
measure
improvement. Getting stuck triggers dialogue. Collaborative. Using pragmatic and/or scientific methods.

Starting point is the existing process. You don’t start, as in Scrum, by erasing your existing process and installing a new one from scratch. Nonetheless, although the speakers didn’t really say it, Freddy Mallet told me afterwards that following Kanban does imply some radical organisational changes.

The kanban board: card colour used to manage granularity or categorisation. Divided into columns by step. Max no of items per column.

The limit on WIP per col means that you work in “flux tiré” (pulled flow?). If all the columns are full, when you move something to “done” it creates a free slot on the penultimate col. You can then move something from the preceding column to that one, and so on backwards through the process. Items are pulled through the stages by other items that pass into Done.

The focus is on finished work, which goes hand-in-hand with working in pull mode. Each step in the chain requests work from the preceding step.

A key issue is how long an item takes from when it is first requested to when it is terminated (delivered in production).

The objective is for the team to settle into a working rhythm. There’s a daily meeting to do a roundup of ongoing work. It’s the most important among a number of ceremonies whose purpose is to manage the work rhythm.

Measuring: in Kanban a number of things are measured - number of ongoing items, time taken for an item to be done, production rate. The measurements help flag up opportunities for improvement.

Unlike scrum, where everything fits into the imposed rhythm of the sprints, in Kanban different activities (e.g. delivery, retrospectives) can take place at different rhythms.

By doing stats on the data on items, you can predict how long a given item will take, e.g. 80% chance of delivering within 5 days, ~100% chance of delivering within 7 days.
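The kind of calculation involved can be sketched in a few lines. This is my own illustration, not something shown in the talk: it uses a simple nearest-rank percentile over past cycle times, and the history figures are invented so that they reproduce the example numbers above.

```java
import java.util.Arrays;

public class CycleTimeForecast {
    // Smallest number of days within which at least `percent` % of past items
    // were delivered (nearest-rank percentile).
    public static int daysFor(int percent, int[] cycleTimesInDays) {
        int[] sorted = cycleTimesInDays.clone();
        Arrays.sort(sorted);
        // Rank = ceil(percent * n / 100), computed in integer arithmetic.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Invented history: days from request to delivery for ten past items.
        int[] history = {2, 3, 3, 4, 4, 5, 5, 5, 6, 7};
        System.out.println("80% of items delivered within " + daysFor(80, history) + " days");   // 5
        System.out.println("100% of items delivered within " + daysFor(100, history) + " days"); // 7
    }
}
```

The forecast is only as good as the history: it assumes future items resemble past ones in size and in how the team works.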

Kanban is not so much a method of production, as a method of accompanying change (to the production process).

Behind the scenes of day-to-day development at Google (Petra Cross)

This was a run-through of Google’s development process, which is a hybrid Agile method although apparently not everyone at Google likes to use the A-word. I noted a few points that struck me.

50% of Google’s code base changes every month!

When a feature is developed, the set of code changes is always sent for code review by another Googler.
They eat their own dog food (the GMail team uses the pre-release GMail version for their corporate email).
New features are typically activated for a small fraction of users (canary) and results analysed before activating globally. Analysed means that performance figures are checked…
They start from user stories (with a BDD-like expression) but developers don’t work on user stories; the stories are broken down into tasks before development work starts.
For both tasks and user stories, an acceptance test is part of the definition, and it defines the complete scope of the work to be done - i.e. only work that’s needed to pass the acceptance test will be performed.

They have an “icebox” which looks a bit like a product backlog (many of the things on it will never get done) and a backlog (things that are planned for doing within the iteration). Backlog items are prioritised.

They don’t do standups.

Some teams at least use real whiteboards with real physical post-its that travel across the swim lanes. They own the post-its by sticking their photo on top.

They don’t add tasks to the ongoing iteration.

They do use planning poker! I.e. relative estimation (points), with velocity. However, to synchronise diverging estimates, they aren’t allowed to say anything involving a value judgement (like “it’s really difficult”), but only to detail the work that is involved in the task. If they don’t know enough to estimate, they play the question-mark card. Then the highest and lowest estimators list the work involved, and the questioner can also ask yes-no questions if that’s not enough. All of this removes any possibility of getting into open-ended discussion, and avoids emotionalising the debate. Consequently the backlog estimation sessions are rapid (20 mins weekly).

The objective of the estimation sessions is only secondarily to get visibility on how much will be done by the end of the iteration. Most of all, its purpose is to get agreement about what needs doing.

Developers aren’t allowed to cherry-pick the backlog: they have to take the highest priority remaining task, even if there’s another below that they could do more easily. The point of this is to encourage knowledge spread among the team by having everyone work on everything (avoids “knowledge silos”).

They have retrospectives monthly, with a simple 3-coloured post-it system: good - bad - suggestions for improvement. I got the impression that they don’t vary their retrospective format, which they probably should.

Pour un développement durable

A good talk with plenty of good advice in it, but nothing new for me.

Given that most of the cost of a project is in the maintenance phase, an investment in quality during the initial development phase can easily be recouped if it reduces subsequent maintenance costs.

The cost of quality assurance is visible, but the cost of low quality is rarely considered.

The speaker has been developing a model contract for Agile software services companies in France: http://contrat-agile.org/

IBM talk on mobile apps

Round-up of the different approaches for mobile dev: native, web, hybrid (I think this means running in the web browser but with components that give access to the native APIs), and cross-platform toolkits.

The talk’s very enterprise oriented of course. The challenges for businesses include the fact that mobile platforms are evolving faster than business development cycles. (Think of how often new versions of Android come out, and how often new Android devices are released.) The challenge for mobile development in general is platform fragmentation, which is why native development, though it gives the best result, is so expensive.

The second half of the talk was a plug for IBM’s mobile development suite, and my thoughts wandered. When I said that we shouldn’t be forced to sit through sponsors’ talks, this was the kind of thing I meant.

Portrait du développeur en "The Artist"

A funny, fast-moving and close to the bone talk about the woes of corporate software developers in France. With a ray of light at the end.

According to the speaker, the balance between client and server swings every fifteen years or so and we’re in one of those swings. After many years of everything happening on the server and the client (browser) being a relatively dumb renderer, we’re now going back to a client-server model where there are real components on the browser and the server is just providing the data.

The end of the talk transitioned rather neatly into a plug for cloud computing in general and Cloud Foundry in particular. The speaker is from Cloud Foundry “The Open Paas” - open source, Apache license. This means that you can install your own private cloud, and that 3rd parties have added support for languages as divergent as PHP, Erlang…

Abstraction distractions - Neal Ford

Easily the best talk of the conference.

Note: by abstractions, we mean “mental models”.

The talk is about the dangers of forgetting that the abstractions we work with are abstractions only and not the real thing.

How do you store something so that it’ll be readable in 100 years? What format do you use?

Some really powerful examples to illustrate the following lessons.

Lesson 1: Don’t mistake the abstraction for the real thing.
Lesson 2: Always know the abstraction 1 level below your usual level.
Lesson 3: Once internalised, abstractions are really hard to shake off.
Lesson 4: Abstractions are both walls and prisons.
Lesson 5: Don’t name things that expose underlying details. (E.g. the floppy disk icon on a “save” shortcut. E.g. the lpstr prefix on Windows API, meaning “long pointer to null terminated string”; when they moved to 32 bit the “long” part became obsolete.)
Lesson 6: Good APIs are not merely high-level or low-level, they’re both at the same time.
Lesson 7: Generalise the 80%, get the other 20% out of the way [check]

Re Lesson 6, a good abstraction should leak in well-defined ways -> onionskin abstractions. This means that when the abstraction fails you, it is easy and natural to go down to the level below and achieve what you need.

At this point he made a huge dig at Maven. He distinguished between composable tools vs contextual. I failed to pick up a clear definition of this dichotomy, and can’t find words to express it myself, though I sense approximately what it means. As examples, composable tools are bash, rake, gant, emacs, vi; contextual are powershell, ant, maven, eclipse or visual studio. Languages are composable, frameworks are contextual.

Contextual gives you behaviour out of the box, contextual intelligence, but less flexibility. Composable gives you implicit behaviour (not out of the box), but better building blocks, so more flexibility. I realise that the frameworks vs libraries debate is the same issue.

Another example of a contextual tool: Microsoft Access. In these tools, 80% of what the user needs is really easy. A further 10% is really hard because you have to armtwist the tool to get what you want. (I’m reminded of Hibernate here too.) The final 10% is just impossible because you can’t dig deep enough under the abstraction to get it.

When you’re using these tools, you’re reluctant to give them up when you have trouble, because you’ve invested in them. But every such tool has a tipping point, after which it is “never, ever wonderful again.” Keep an eye out for that tipping point, and leave the tool behind when it arrives.

Things that are composable and that are onionskin APIs: git, DSLs.

sizeOf in Java

This talk was given by one of the Terracotta staff working on ehcache, and explained the efforts they’ve made to measure size of Java objects as a prerequisite to dimensioning the cache by object size. I’m not 100% sure I got all the numbers below right.

An object on the heap, on top of its fields, has an object header of 2 words (8 bytes on a 32bit JVM).

However, its fields don’t consume exactly their own size: object size is aligned on word boundaries. So adding/removing a byte field (8 bits) might add/remove 32 bits to/from the object size - or have no effect at all.
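The arithmetic is easy to check. The sketch below takes the figures in these notes at face value (an 8-byte header, and sizes rounded up to a 4-byte word on a 32-bit JVM); those are assumptions from my notes rather than verified numbers, and real JVMs may align more coarsely, which is why the alignment is a parameter. The names are mine.

```java
public class ObjectSizeSketch {
    // Header plus fields, rounded up to the next multiple of `alignment` bytes.
    public static int alignedSize(int headerBytes, int fieldBytes, int alignment) {
        int raw = headerBytes + fieldBytes;
        return ((raw + alignment - 1) / alignment) * alignment;
    }

    public static void main(String[] args) {
        int header = 8; // 2 words on a 32-bit JVM, as noted above
        int word = 4;   // assumed alignment for this illustration
        System.out.println(alignedSize(header, 0, word)); // 8:  no fields
        System.out.println(alignedSize(header, 1, word)); // 12: one byte field costs a whole word
        System.out.println(alignedSize(header, 2, word)); // 12: a second byte field costs nothing
    }
}
```

So the first byte field grows the object by 32 bits, and the next three fit into the padding for free.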

The VM arranges object fields, first grouped by class in the hierarchy (parent first), then for each class starting with primitives in reverse order of size, and finishes with the OOPs (ordinary object pointers). However if the first field in a subclass is a long, it has to be aligned and this can leave a “hole” between that long and the end of the parent class’s fields. In this case the JVM will squeeze in any smaller fields that will fit.

You can find out programmatically how an object is laid out. There’s some code to do it in the ehcache code repository.

Because of compression, objects don’t take 2x more size in 64bit JVMs. Pointers are only 32bit if they can all be fit into 4GB address space.

Another approach for finding object size in memory: Sun’s Unsafe class (Oracle/Sun JVM only).

Third approach is JVMTI -> java.lang.instrument.Instrumentation.getObjectSize(Object). You need a Java agent to obtain an instance of the Instrumentation interface… which means launching the JVM with a -javaagent argument: not always easy to get in production.
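For reference, the agent side of this is tiny. The sketch below is a generic example of the mechanism, not Ehcache’s code, and the class name SizeOfAgent is made up. It assumes the class is packaged in a jar whose manifest has a Premain-Class: SizeOfAgent entry, and that the JVM is launched with -javaagent pointing at that jar.

```java
import java.lang.instrument.Instrumentation;

public class SizeOfAgent {
    private static volatile Instrumentation instrumentation;

    // Called by the JVM before main() when it is started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of one object, as estimated by the JVM (not the whole object graph).
    public static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("Agent not loaded: start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

Application code then calls SizeOfAgent.sizeOf(someObject). To size a cache entry you would still have to walk the object graph yourself and sum the results.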

Ehcache didn’t want to use the java agent approach, so they used AttachAPI which was introduced in 1.6. It allows you to attach to a running Java process - including the same one you’re already in - and add an agent to it. But it’s not on the classpath under Windows or Linux - although it is there in the JDK (JRE?) install!

There’s one gotcha in all the above, which is that the CMS GC algorithm also adds an overhead to objects, used to flag whether they’re available for collection. So you also need to count that.

New version of Ehcache allows you to specify cache sizes (and total size limit for all caches) in terms of bytes rather than number of elements, using the parameter maxBytesLocalHeap. It has one (big) drawback: if an object appears in several cached object graphs, its size gets counted in every one. There is an annotation that you can use to tell it about shared objects.

TestNG parce que vos tests le valent bien (quickie)

Things I’ve tried: dependencies between tests (briefly), parameterized tests.

Things I haven’t tried: listeners and factories. Factories allow you to generate TestNG tests on the fly.

The ability to introduce dependencies between tests allows you to do stuff like, have a test that deploys the built webapp to Tomcat and then make your Selenium tests depend upon it. If the deployment fails, you don’t have 101 tests failing (1 for the deployment and 100 for Selenium). Only the genuinely failing test is marked failed, the others are skipped.

Guava

Base classes

Preconditions: checkNotNull(), checkArgument()… Essentially assertions for calling at entry into methods, except they throw NPE or IllegalArgumentException or IllegalStateException, instead of AssertionError.

Objects.toStringHelper() very similar API to Apache ToStringBuilder.

Stopwatch gives elapsed times in nanoseconds (so you don’t have to subtract start from end).

An improved String.split(): Splitter with its own mini-fluent API for specifying detailed behaviour. Joiner concatenates Strings optionally skipping or defaulting nulls.

CharMatcher is a base class with a bunch of methods like removeFrom(), retainFrom()…

Optional<T> is (IIRC) an alternative to null with some API to replace the missing value by a default. Advantages/uses: makes it explicit that you’re dereferencing a potentially non-existing thing; allows you to differentiate “It’s really not there” from “I don’t know” cases; is a useful wrapper to put nullable values into non-null-accepting collections.

Function<F, T> is a one-way transformation from F to T. Predicate<F>… we know what that does, right? Common use: filtering collections.

Collection classes

FluentIterable with chainable methods: skip(), filter(), transform(), and querying methods: contains(), toImmutable() and extraction methods: first(), firstMatch()…

You can do FP with this, but it often ends up longer than imperative Java…

Some extra data structures:

Multiset<E> (= bag) can have multiple instances of the same element.
Multimap<K, V> behaves like a Map<K, Collection<V>>: the values are collections (if there is no value for the key, it returns an empty collection, never null).
BiMap<K1, K2> maps both ways; both keys and values are unique.
Table<R, C, V> is equivalent to Map<R, Map<C, V>>, with a choice of implementations (sparse or dense).

Immutable collections for all JDK and Guava collection types. (Unlike Collections.unmodifiableXyz(), it makes a totally immutable (shallow) copy rather than an unmodifiable view of a fundamentally modifiable collection.)

ComparisonChain allows chaining comparisons for Comparator, only executes the chain up to the first difference, and makes the comparison order explicit (you can see reverse comparisons easily).

Ordering is an alternative, more FP approach to comparing. You start with a basic Ordering and then use the fluent API to adjust it: reverse(), compound(Comparator), onResultOf(Function), nullsFirst() which sorts nulls at front, nullsLast()… At the end of the method chain a Comparator is returned. But you can also perform operations on Iterables, like asking if they are already ordered.

Hashing API

Object.hashCode() has some interesting limitations. When you compose hashcodes from multiple objects, the intermediate value is truncated to 32 bits at each stage. You really want to truncate just at the end. Also, it isn’t pluggable, i.e. doesn’t separate “what to hash” from “which hashing algorithm”. These limitations make it OK for in-memory hash maps but not other uses.

In Google’s hashing API you start by getting a hash algorithm, pushing the data into it and retrieving the hash at the end - the calling code looks very like ToStringHelper utilisation. The hashcode you get at the end is an object and you can retrieve int, long, etc from it.

There’s also a goodFastHash() which gives you the best current algorithm - values should not be persisted as they might be invalidated if the algorithm changes.

There’s a caching API.

.NET for Java developers

Not many people in the room. Looks like Java developers are as parochial as they’re made out to be.

.NET was polyglot right from the start (even if the start of .NET was a good many years after the start of the JVM…): now there’s C#, VB, F#.

C# has acquired a number of features that Java could reasonably be jealous of (as well as not having some things that Java does have). Lambdas, auto properties, anonymous objects… The yield keyword is rather nice. In an iterable, it lets you only construct the values when the iterator really visits them, rather than building a whole list in memory when you don’t know if you’re going to iterate through to the end.

var keyword is a syntactic sugar that activates type inference - it replaces the type declaration. E.g. “var thing = [expression that returns an int]” instead of “int thing = …”. (Like Scala only not so good, basically)

C# lambdas make collection manipulation far more succinct. They run through some examples of FP-style collection manipulation. Rather reminiscent of Scala…

Default methods on interfaces - hey, that sounds awfully like Scala traits, too.

Class properties with implicit getters and setters - why, that’s a bit like Scala or Groovy.

If the FP stuff using lambdas is too messy for you, there’s a query language LINQ that operates on collections. And the compiler knows what it means and static-type-checks it. Wow. Compare that to HQL. Erm, or JPA query language. Or JPA criterion API.

There are also dynamics, and I’m not quite sure what they are. But they appear to be useful in C#’s XML integration which looks really rather nice indeed.

They finished off by observing that, since syntax and VM details are a relatively small part of software development, .NET and Java development have more in common than they have differences, especially as process is concerned.

Living fossils, or, How do you preserve old development environments?

Thu, 21 Jul 2011 00:00:00 +0000

What’s the average lifespan of a corporate software application? I’m feeling too lazy to find a serious study, but a licked finger in the air says that a quarter or so live out a decade.

We all know that a decade is a long time in IT, but one tends to forget just how long, until one encounters a living fossil - an application that hasn’t evolved in the last ~~hundred million~~ ten years.

I recently came face to face with one of these, written back when Struts was a mere gleam in Craig McClanahan’s eye.

Now, there are some applications that survive by evolving, through regular maintenance releases or new functionality, for years on end. Like living organisms, these applications become a hodgepodge of recent innovations and ancient parts.

Those are not living fossils. A living fossil is the kind of application that has run untouched in production for years, until suddenly someone somewhere (not a programmer, obviously) decides a label needs changing, or that the number of thingummies that can be added to a doobrey shall no longer exceed fifteen. And then suddenly, BANG! this thing arrives on your plate, as a developer, and you have to figure out how to modify it.

(In passing, I’d like to mention how deeply I hate being asked to do maintenance on an application of which I understand neither the code nor the business logic.)

And it’s at that moment that you realise the last development was done three years ago, the person who did it has left, and there is absolutely no documentation concerning how to run the application in a test server on the developer’s machine (it’s a web application). You can find out what application server it’s running on in production, but oops - the license for the IDE plugin to run that particular server locally expired years ago. Maybe it ran under Tomcat too? Ah no - it doesn’t. Should you spend an unknown amount of time figuring out how to get it to? How long was this maintenance estimated to last? Two days? How long have you already spent getting a grasp on things? Three hours? Hmm…

Of course, you don’t absolutely have to run it on your own machine. You could just build the deployable artifact and get it deployed to the dev environment and test your changes there. (Let’s assume, for the sake of argument, that there’s a working build script in the project that doesn’t depend on some artifact on your local machine that no-one thought to check in.) It won’t exactly make for a lightning fast code-execute cycle, but you only have a very minor change to make. (What’s that you ask? No, no unit tests, and not testable code. Don’t be silly.)

Oh, and wait a minute. The application servers are in a DMZ so you won’t be able to connect a debugger, and you’ll have to email the systems team every time you want a copy of the logs. Better cross your fingers that you guess right first time, eh? (Did I mention that you didn’t know anything at all about the code or its business logic until about two hours ago?)

The nub of the problem

I’m not only writing all of this to let off steam. The dismal story above must happen every day in hundreds of organisations around the world. The fundamental problem is that what’s under source control isn’t enough to continue developing an application; it only allows you to build it. To develop, you need tools, and often specific versions of those tools. But software tools change, and developers upgrade their machines and install new versions or different tools. If the organisation doesn’t take conscious countermeasures, it will find itself unable to make trivial modifications to its older and less-frequently-maintained applications, since the effort required to reconstitute a development environment is out of all proportion to the value of the enhancement. Or, in the worst case, as I discovered, it can become simply impossible: your license can expire and the supplier no longer be in business.

What happens in your organisation? Does this sort of problem occur? Has the organisation looked for solutions? Does it have a strategy? And if you’ve found a magic bullet to solve the problem, do write and tell me.

Postscript

My own recent experience wasn’t quite as bad as I made it sound, since the previous developer was still available and helped me out. And it all ended happily: I did guess right, and my change worked first time. I did manage to – whoops! – cause a blocking bug in production, but that was fixed within the hour, not least because I still had a working development environment on my machine.

The Elements of Style

Thu, 16 Jun 2011 00:00:00 +0000

William Strunk Jr. would have written beautiful code.

The Elements of Style, by Strunk & White, is a style guide for English prose, widely used in writing classes in the US but less known elsewhere. I was introduced to it many years ago when I was attempting to become a biologist: my PhD supervisor, in despair at the inscrutability of typical scientific writing, would implore all his students to read it.

Reading it again recently, I was struck by how applicable it is to programming. Sure, writing clearly is a useful skill for programmers, since all of us find ourselves doing technical writing from time to time. But I mean that it was striking how much of what makes good prose also makes good code.

Caution

Notwithstanding the qualities of the book, a word of warning is needed. Some of the book’s prescriptions, especially the most specific, are idiosyncratic (or, depending on whom you believe, wrong). They should be read, not as laws of grammar, but as opinionated style advice. They do not dispense with the need to apply one’s own judgement.

Clarity and concision

Every guide to good style preaches clarity and concision. Ernest Gowers’ The Complete Plain Words springs to mind as an example in British English. But none puts its own advice into practice so completely as The Elements of Style.

Like The Pragmatic Programmer’s “Don’t repeat yourself”, the book’s maxim is just three words long: “Omit needless words.”

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that he avoid all detail and treat his subjects only in outline, but that every word tell.

One could add, “a software program no unnecessary lines of code.”

Further parallels

Several other, more specific, items of advice have obvious parallels in programming style:

Abbreviations and acronyms

Do not take shortcuts at the cost of clarity. ...Write things out. Not everyone knows that MADD means Mothers Against Drunk Driving, and even if everyone did, there are babies being born every minute who will someday encounter the name for the first time. They deserve to see the words, not simply the initials... Many shortcuts are self-defeating: they waste the reader's time instead of conserving it (p80, my emphasis).

Compare with Steve McConnell’s advice from Code Complete:

The most important consideration in naming a variable is that the name fully and accurately describe the entity the variable represents. An effective technique for coming up with a good name is to state in words what the variable represents. Often that statement itself is the best variable name. It's easy to read because it doesn't contain cryptic abbreviations, and it's unambiguous. (2nd edition, p260, my emphasis)

Negatives

Strunk & White:

Put statements in positive form. Avoid ... the weakness inherent in the word not... [T]he reader is dissatisfied with being told only what is not; the reader wishes to be told what is... [I]t is better to express even a negative in positive form. [For example:] “forgot” [is better than] “did not remember.” (p20-21)

Programmers soon learn the risks of cumulating negatives in boolean statements. Expressing negatives in positive form simplifies such situations, as recommended here:

Double negations (or worse) should be avoided. To help avoid double negations, boolean methods should be given positive names such as legalMove or gameOver, not negative ones such as illegalMove or gameNotOver.
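
A minimal Java sketch of the difference. The Game class and its movesLeft field are invented for illustration; they are not taken from the guideline quoted above.

```java
// Hypothetical example: the Game class is invented for illustration.
class Game {
    private final int movesLeft;

    Game(int movesLeft) {
        this.movesLeft = movesLeft;
    }

    // Negative name: callers end up writing !game.isGameNotOver(),
    // a double negation that takes a moment to untangle.
    boolean isGameNotOver() {
        return movesLeft > 0;
    }

    // Positive name: both the plain and the negated form read naturally.
    boolean isGameOver() {
        return movesLeft <= 0;
    }
}
```

With the positive name, a loop condition reads as `while (!game.isGameOver())`, which has one negation instead of two.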

Revise, rewrite, refactor

Two suggestions that call to mind refactoring:

Revise and rewrite. (p72)
Clarity, clarity, clarity. When you become hopelessly mired in a sentence, it is best to start fresh... Usually what is wrong is that the construction has become too involved at some point; the sentence needs to be broken apart and replaced by two or more shorter sentences. (p79)

Exactly the same could be said about writing method bodies. Cf Martin Fowler’s Refactoring, which is all about revising and rewriting code, why it should be done, and how to go about it.

Here are some more of Strunk & White’s recommendations that could equally well be taken as advice on composing and decomposing methods:

[R]emember that paragraphing calls for a good eye as well as a logical mind. Enormous blocks of print look formidable to readers, who are often reluctant to tackle them. Therefore, breaking long paragraphs in two, even if it is not necessary to do so for sense, meaning, or logical development, is often a visual help... Moderation and a sense of order should be the main considerations in paragraphing.

Keep related words together... The writer must... bring together the words and groups of words that are related in thought and keep apart those that are not so related. (p28)

Express coordinate ideas in similar form. This principle, that of parallel construction, requires that expressions similar in content and function be outwardly similar. The likeness of form enables the reader to recognize more readily the likeness of content and function. (p26)

Paragraph composition uses the same skills as breaking up long methods into understandable chunks (cf the Long Method smell in Refactoring, p76) - though methods benefit from a name, as though one were to place a subheading on every paragraph.

Cohesion

Keeping related words together (above) is closely related to the idea of cohesion in routines (Code Complete, p168), and to the “Single Level of Abstraction Principle” (SLAP). The first says that a routine should do one thing and one thing only. The second states that the lines of code in a method should all be expressed at the same level of abstraction: you shouldn’t mix, for example, lines expressing a business rule with lines having a purely technical significance like writing to a file. Rather you should refactor the lower-level code into its own method and give the method a name that is at the appropriate level of abstraction for the code from which you are calling it.
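
To make the Single Level of Abstraction idea concrete, here is a small Java sketch. Everything in it (OrderService, the 10% loyalty discount, the audit log) is invented for the example. The first method mixes a business rule with low-level string handling; the second delegates both to named helper methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: the class, the discount rule and the log are invented.
class OrderService {
    private final List<String> auditLog = new ArrayList<>();

    // Mixed levels: a business rule sits next to low-level string handling.
    double priceMixed(double amount, boolean loyalCustomer) {
        double price = loyalCustomer ? amount * 0.9 : amount;
        StringBuilder line = new StringBuilder();
        line.append("priced:").append(price);
        auditLog.add(line.toString());
        return price;
    }

    // Single level: every line of the method reads as a business-level step.
    double price(double amount, boolean loyalCustomer) {
        double price = applyLoyaltyDiscount(amount, loyalCustomer);
        recordPricing(price);
        return price;
    }

    private double applyLoyaltyDiscount(double amount, boolean loyalCustomer) {
        return loyalCustomer ? amount * 0.9 : amount;
    }

    private void recordPricing(double price) {
        auditLog.add("priced:" + price);
    }

    List<String> auditLog() {
        return auditLog;
    }
}
```

Both methods compute the same result; only the second can be read without switching between levels of detail.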

“Express coordinate ideas in similar form” crops up in Kent Beck’s Implementation Patterns (p15) as “symmetry”. (If you’ll excuse an off-topic hat-tip, it’s also reminiscent of Christopher Alexander’s principle of Alternating Repetition from The Nature of Order; Alexander’s work concerns architecture but has much influenced computer science.)

The underlying principle

Of far less obvious relevance to programming are the chapters dealing with such matters as the placement of commas, or the difference of meaning between “nauseous” and “nauseating.” And yet, and yet… What guides every single one of these recommendations – even the most whimsical – is an underlying principle: to bring the form of what you write as close as possible to its meaning. In programming, this principle is not merely relevant; it is critical. It is the foundation of comprehensibility. For this reason, I think that the most valuable aspect of the book is not its variously-reliable prescriptions on points of grammar, nor its excellent style advice, nor even the model of crisp writing that it provides. Its most valuable lesson is the adoption of clarity, precision and brevity as ideals to aim for. Infusing oneself with this spirit can do only good to one’s code, and it is for that reason that I commend the book to you, my fellow programmers.

Solve foreign-key problems in DBUnit test data

Tue, 15 Feb 2011 00:00:00 +0000

If you create small per-test datasets, as DBUnit advises, you’ll get intermittent build failures due to foreign-key violations.
This post explains (1) why this happens, (2) why small per-test datasets are still a good idea, and (3) one simple way to get around the problem.

NOTE A reader wrote that this solution no longer works as of DBUnit 2.5. I am leaving this post up, since it still gets hits, and may still be of some help finding a solution.

NB When I searched for solutions to this problem, I discovered that other kinds of foreign-key problem come up with DBUnit. Some people have circular dependencies in their relational database schemas, which stops DBUnit from loading the test data. If such is your case, I'm sorry to say that this post won't help you with it, and your best option is probably to just take yourself outside and shoot yourself now. (Although some people seem to have chosen instead to disable foreign key checking during test runs.)

What causes the foreign-key violations

The cause of the problem is simple, and illustrated by a trivial example. Suppose you have two entity classes, HitchHiker and SpaceShip. The HitchHiker table has a foreign key that references SpaceShip. The test data for HitchHikerDaoTest contains lines from both tables, whereas the test data for SpaceShipDaoTest contains only lines from SpaceShip.

DBUnit’s default setup operation, CLEAN_INSERT, wipes data from every table occurring in the test dataset and then inserts the lines listed in that dataset. When SpaceShipDaoTest runs, DBUnit will start by deleting everything in the SpaceShip table. If any HitchHikers are currently riding in the SpaceShips that are about to be deleted, the database will object to their untimely eviction (I’m not sure whether the error message will read like Vogon poetry, though).

If you start from an empty database, and execute SpaceShipDaoTest and then HitchHikerDaoTest, you’ll be fine; but if you do it in the other order, your build will fail. It’s that second-worst kind of bug, the unpredictable kind, since you don’t (usually) specify the order in which tests run. After all, they’re supposed to be independent! So you may well find that you have no problems for months on end, until one day you get an error running individual tests in a particular sequence, or Maven changes the order in which it runs your tests on the CI server, and BOOM!
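
The order dependence can be reproduced with a toy in-memory model. This is not DBUnit code: ToyDatabase and ToySetups are invented names, and the model only imitates "delete the tables named in the dataset, then insert its rows" for the HitchHiker and SpaceShip example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT DBUnit: two "tables" with one foreign key HitchHiker -> SpaceShip.
class ToyDatabase {
    private final List<String> spaceShips = new ArrayList<>();
    // hitch-hiker name -> name of the ship they ride in (the foreign key)
    private final Map<String, String> hitchHikers = new LinkedHashMap<>();

    // Imitates a clean insert; hikersOrNull is null when the dataset
    // does not mention the HitchHiker table at all.
    void cleanInsert(List<String> ships, Map<String, String> hikersOrNull) {
        if (hikersOrNull != null) {
            hitchHikers.clear();   // child table is in the dataset, so it is wiped
        }
        if (!hitchHikers.isEmpty()) {
            // Leftover hikers still reference the ships we are about to delete.
            throw new IllegalStateException("foreign-key violation on SpaceShip");
        }
        spaceShips.clear();
        spaceShips.addAll(ships);
        if (hikersOrNull != null) {
            hitchHikers.putAll(hikersOrNull);
        }
    }
}

// The setup phases of the two test classes, reduced to their datasets.
class ToySetups {
    static void spaceShipDaoTest(ToyDatabase db) {
        db.cleanInsert(Arrays.asList("Heart of Gold"), null);
    }

    static void hitchHikerDaoTest(ToyDatabase db) {
        db.cleanInsert(Arrays.asList("Heart of Gold"),
                Collections.singletonMap("Arthur", "Heart of Gold"));
    }
}
```

Calling spaceShipDaoTest and then hitchHikerDaoTest on a fresh ToyDatabase succeeds; calling them in the opposite order throws on the second call.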

Why you should still use small independent datasets

It’s tempting to circumvent the problem by using a single monolithic dataset for all your integration tests. I’ve tried this, and I advise against it. A big data file is hard to work with: you waste a lot of time scrolling around looking for the line you need, and it’s very hard to follow and understand foreign-key relations. Worse still: by modifying the data to make one test pass, you can easily accidentally break another one. The larger the dataset and the test suite become, the more fragile they get, and the more painstaking it becomes to modify them.

How to avoid the foreign-key problem with small independent datasets

One working but unsatisfactory solution would be to pad out every XML dataset with the list of all tables touched in the test suite. It’s unsatisfactory because the only way to add a table into a FlatXmlDataSet is to list a line of that table – a FlatXmlDataSet can’t contain empty tables – and there’s no justification for polluting the test data with lines from tables that are not part of the test.

The solution I found was to use a DTD to clean tables before tests. Every XML file has different contents, but they all reference a single DTD which lists all the tables involved in the test suite. The DTD is easy to generate from the database schema, and useful for auto-complete and catching typos in column names, so you should probably already be using one. The code to exploit its contents is very simple:

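// xmlFilename, dbUnitConnection and asFile() belong to the enclosing test class (not shown here).
private IDataSet loadTestDataWithDtdTableList(String dtdFilename)
        throws IOException, DataSetException, SQLException {

    // Empty tables: one for each table declared in the shared DTD.
    // The reader is deliberately left open here; see the note on closing it below.
    Reader dtdReader = new FileReader(new ClassPathResource(dtdFilename).getFile());
    IDataSet dtdDataset = new FlatDtdDataSet(dtdReader);

    // The test's own rows, with column metadata read from the database.
    FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder();
    builder.setMetaDataSet(new DatabaseDataSet(dbUnitConnection, false));
    IDataSet xmlDataset = builder.build(asFile(xmlFilename));

    // Every table from the DTD, plus the rows from the XML file.
    return new CompositeDataSet(dtdDataset, xmlDataset);
}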

How it works: DBUnit provides a facility to load a dataset from a DTD. This dataset contains all the tables listed in the DTD, but of course empty of data. The DTD dataset is then combined with a FlatXmlDataSet representing your test data. The graphic below illustrates the composite dataset that would be produced for the SpaceShip example.
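
The effect of the combination can also be pictured with a toy model in which a dataset is reduced to an ordered list of table names. This is not DBUnit code: ToyCompositeDataSet is an invented name, and the reverse-order deletion is my assumption about how the clean-up step walks the tables, given a DTD that lists parent tables before their children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model, NOT DBUnit: datasets are reduced to ordered lists of table names.
class ToyCompositeDataSet {

    // Union of the table names, DTD tables first, without duplicates.
    static List<String> combine(List<String> dtdTables, List<String> xmlTables) {
        Set<String> names = new LinkedHashSet<>(dtdTables);
        names.addAll(xmlTables);
        return new ArrayList<>(names);
    }

    // Assumption: tables are emptied in reverse order, so that children
    // are wiped before the parents they reference.
    static List<String> deletionOrder(List<String> tables) {
        List<String> reversed = new ArrayList<>(tables);
        Collections.reverse(reversed);
        return reversed;
    }
}
```

For SpaceShipDaoTest, whose XML file mentions only SpaceShip, the combined list still contains HitchHiker, so leftover hitch-hikers are removed before their ships.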

If you have dictionary tables whose contents never change, you can and should leave them out of the DTD as well as out of the XML datasets, to improve test performance a little.

One further detail: you should close the FileReader after test setup. I couldn’t find a hook into the end of the test setup operation (short of writing my own DatabaseOperation), so I saved the reference as a member variable and hooked the close() call into the tear-down phase of the test.

NB For a more complete code example, see this Gist snippet of a base class for TestNG+Spring+DBUnit tests that adds the above-described DBUnit setup operation to Spring’s TestNG helper class.

Happy database testing!

Unit testing fundamentals

Tue, 01 Feb 2011 00:00:00 +0000

This article describes the principles of unit testing, rather than the technical details. It’s aimed at beginners. Experienced unit-testers will get little from it, except perhaps the pleasure of pointing out mistakes.

What is a unit test?

A unit test is a test that tests a single unit of functionality, in isolation from all others, for a single set of conditions, and has a binary pass/fail result. For object-oriented languages, that means testing a single method on a single class with a single set of parameters. It is written by the developer, at approximately the same time as the code, for its purpose is to satisfy the developer that said code behaves as she believes it should. (I believe there are huge advantages to writing the unit test immediately before the code it tests, but that isn’t the subject of this article.)

In practice, it is impossible to test the code of a class in total isolation from all other classes. Even the simplest method uses classes from the Java standard library, not to mention classes from other libraries and classes from the application under development. With each of these dependencies, we have only two options. We can use it as is, or we can isolate the dependency by replacing it with a placeholder object, known as a stub, or mock, or fake, or double, or spy (the differences between these terms are subtle, and often ignored. I’ll elaborate shortly).

Placeholders

If a placeholder is used, it needs to be created for the purpose, and often told how to behave. This needs to be done within the test code, before calling the method under test. After calling the method under test, you may also need to interrogate your placeholder to find out whether it was called as you expected.

Is it a mock or a stub?

A short terminological digression: Whether your placeholder is strictly speaking a mock, stub, or spy depends on whether you tell the placeholder how to behave, or ask it how it was treated.

If you simply tell it how to behave in certain circumstances (e.g. "always return 3 from countFoos()"), it's a stub.
If you don't much mind how it behaves, but interrogate it afterwards to find out whether certain methods were called on it, and with what parameters, it's a spy.
If you do both, and especially if you construct the object such that it will explode if it is not called according to your expectations, then it's a mock.
"Double" is a catch-all term for placeholder objects.

(Here’s a brief article from Martin Fowler outlining the taxonomy, though I’m not convinced anybody really uses the terms “fake” and “dummy” with as much precision as those definitions provide.)
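
The three kinds of placeholder can be written by hand in a few lines of Java. FooRepository and its methods are invented for the example; in practice a generator would produce the equivalent objects for you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical dependency, invented for illustration.
interface FooRepository {
    int countFoos();
    void save(String foo);
}

// Stub: told how to behave, never interrogated.
class StubFooRepository implements FooRepository {
    public int countFoos() { return 3; }   // always returns 3 from countFoos()
    public void save(String foo) { }       // ignores the call
}

// Spy: its behaviour is unimportant, but it records how it was called.
class SpyFooRepository implements FooRepository {
    final List<String> saved = new ArrayList<>();
    public int countFoos() { return 0; }
    public void save(String foo) { saved.add(foo); }
}

// Mock: told how to behave AND what to expect; explodes on anything else.
class MockFooRepository implements FooRepository {
    private final String expectedFoo;
    private boolean saveCalled;

    MockFooRepository(String expectedFoo) { this.expectedFoo = expectedFoo; }

    public int countFoos() { return 3; }

    public void save(String foo) {
        if (!expectedFoo.equals(foo)) {
            throw new AssertionError("unexpected save(" + foo + ")");
        }
        saveCalled = true;
    }

    // Called at the end of the test to check that the expected call happened.
    void verify() {
        if (!saveCalled) {
            throw new AssertionError("save() was never called");
        }
    }
}
```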

Off-the-shelf packages

Occasionally you find off-the-shelf packages of mock objects or stubs provided to assist testing of commonly-used classes (such as network or UI packages). Be wary of them: such a static mock framework is rather a heavyweight dependency to add to your project, and nothing guarantees that it will be maintained as assiduously as the product it mocks. You might easily find yourself trapped in an obsolete version of the product, because the mock framework is no longer maintained and you can’t afford to rewrite all the test code that uses it. Furthermore, modern on-the-fly mock generators have reduced the amount of work that static mock frameworks can save you.

Mock object generators

I recommend, then, that you generate mocks using a mock-object generator, and for Java the generator I recommend is Mockito. That is the only recommendation for a specific technology you’ll find in this article, and I’m making it because (as described in this presentation) Mockito’s approach, where you specify only those behaviours you need and afterwards verify only those interactions you need, encourages less brittle and easier-to-read tests than an older mock generator like EasyMock, which requires you to specify all behaviours and expected interactions up-front.

Masochism

Of course, you can also write your placeholders entirely by hand. But the tedium of it will just drive you to write your own mock-object framework. Don’t do this (unless, that is, the framework you write is so good that you release it and it supplants all the existing ones).

Dependency isolation requires dependency injection

Since we want to keep test and production code separate, we can only replace a dependency by a mock or stub during testing if that dependency is injected into the object under test. We cannot replace those that the tested object instantiates or retrieves explicitly by itself.
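
A short Java sketch of the difference, using invented names (TimeSource, HardWiredGreeter, Greeter). The first class fetches its dependency itself, so a test cannot control the time of day; the second receives it through the constructor, so a test can pass in a stub.

```java
import java.util.Calendar;

// Hypothetical names throughout: the interface and both greeters are invented.
interface TimeSource {
    int currentHour();
}

// Hard to test: the dependency is obtained inside the class.
class HardWiredGreeter {
    String greet() {
        int hour = Calendar.getInstance().get(Calendar.HOUR_OF_DAY);
        return hour < 12 ? "Good morning" : "Good afternoon";
    }
}

// Testable: the dependency is injected through the constructor.
class Greeter {
    private final TimeSource timeSource;

    Greeter(TimeSource timeSource) {
        this.timeSource = timeSource;
    }

    String greet() {
        return timeSource.currentHour() < 12 ? "Good morning" : "Good afternoon";
    }
}
```

In a test, `new Greeter(() -> 9)` behaves the same whatever time the build runs, which the hard-wired version cannot offer.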

Thus, isolating dependencies involves extra effort: design effort in the tested class, to ensure that its dependencies are injectable, and coding effort in the test code, to prepare the mocked dependencies. When, then, should we do it, and when should we content ourselves with the real object? We can find the answer by looking at what we aim to achieve with a unit test.

The utility of unit tests

Each unit test tells us whether a particular method call with particular parameters behaved as expected, or not. One use of this is to detect bugs. But integration and acceptance tests will also detect bugs. The particular value of a unit test is that:

It helps us find the cause of a bug, since whenever a test fails, the incorrect code must necessarily be in the tested method - insofar as we have taken the trouble to isolate the tested method.
It finds bugs as soon as they are written, insofar as we launch the tests very frequently during coding (to encourage which, they need to be quick to execute).

Which dependencies to isolate?

To optimise our effort, we should therefore aim to eliminate those dependencies that might:

make the test fail when the tested class is correct (or make it pass when the tested class is faulty), thus reducing the value of the information that a test failure gives us:
  - Uncertain behaviour: classes that we don't trust to be bug-free (typically the ones we've written ourselves), or whose behaviour we don't fully understand (this sometimes happens when getting to grips with complex UI or persistence frameworks)
  - Indeterminate behaviour: most often, classes whose behaviour is time-sensitive, such that a particular behaviour cannot predictably be reproduced with the original dependency.
  - States that are difficult to trigger: for example, we might want to test how our class behaves when a write to the filesystem throws an IOException.
slow down the test, thus reducing the incentive to launch tests often:
  - Accesses to underlying resources: database, remote webservices, the filesystem...
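
As an illustration of the "states that are difficult to trigger" case, here is a Java sketch in which a stub reproduces an IOException on demand. ReportStore, ReportService and FailingReportStore are invented names, and returning false on failure is just one possible behaviour to test.

```java
import java.io.IOException;

// Hypothetical names: the interface and both classes are invented for illustration.
interface ReportStore {
    void write(String content) throws IOException;
}

class ReportService {
    private final ReportStore store;

    ReportService(ReportStore store) {
        this.store = store;
    }

    // Returns true if the report was saved, false if the store failed.
    boolean save(String content) {
        try {
            store.write(content);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}

// Stub that reproduces a failure which is awkward to provoke on a real disk.
class FailingReportStore implements ReportStore {
    public void write(String content) throws IOException {
        throw new IOException("simulated disk failure");
    }
}
```

The stub is also fast, since nothing touches the filesystem.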

Armed with these criteria, we can decide, for each dependency in a class, whether to mock. In practice, we don’t even need to think about it in many cases:

Should not be mocked: objects from most library classes, especially those in java.lang and java.util, apart from the exceptions below; any objects with no real behaviour, such as parameter objects, data transfer objects and other bean-type objects.
Should be mocked: persistence classes; objects that provide user input; objects backed by network resources; timers; anything not trusted to be totally bug free (i.e. outside of a well-used library class); anything whose behaviour may change, such as a service returning real-world data whose content varies from one day to another
Grey area: anything that is merely more difficult to obtain or set up than the equivalent mock/stub, but not less reliable or significantly slower to run. The maintenance burden of creating mocks for a test needs to be balanced against the difficulty of using the real object. Further, using the real object makes it more likely that the dependency will behave under test as it does in production.

Integration tests: when it's impossible to isolate dependencies

There is one further limitation to isolating dependencies. In some cases it is technically impossible either to dependency-inject or to mock:

- you can only use DI with an object whose instantiation you control. Any framework that creates objects itself, even if the class of the created object is one you've written, does not give you an opportunity to inject dependencies.
- you can only create mocks of non-final classes (which is another reason no-one mocks java.lang.String, aside from the pointlessness of such an act).

In these cases, your only option is to do an integration test. Integration tests, by my definition, are any tests written by developers, as unit tests are, that produce a pass/fail result, as unit tests do, but that cover more than one significant class in the system or its libraries. If you test a data access object with a real underlying database, that is an integration test, since you are simultaneously testing your DAO code (sometimes minimal or auto-generated), your ORM mapping and your database schema. And, not the least important, you're testing their coherence with respect to each other.

Good practice for integration tests is to have the minimal set necessary to make sure that different elements like ORM mapping and database schema are coherent. You need at least this much, because integration tests will catch things that unit tests never can: incompatibility between elements (a property added in a persistence entity bean and the ORM mapping, but missing from the database...) and configuration errors. You don't want any more than that, because integration tests tend to require more work to set up and maintain, and take more time to run.

Maven profile best practices

Wed, 05 Jan 2011 00:00:00 +0000

Maven profiles, like chainsaws, are a valuable tool, with whose power you can easily get carried away, wielding them upon problems to which they are unsuited. Whilst you’re unlikely to sever a leg misusing Maven profiles, you can cause yourself some unnecessary pain. These three best practices all sprang from first-hand, real-world suffering:

The build must pass when no profile has been activated
Never use <activeByDefault>
Use profiles to manage build-time variables, not run-time variables and not (with rare exceptions) alternative versions of your artifact

I’ll expand upon these recommendations in a moment. First, though, let’s have a brief round-up of what Maven profiles are and do.

Maven Profiles 101

A Maven profile is a subset of POM declarations that you can activate or deactivate according to some condition. When a profile is activated, its declarations override the definitions in the corresponding standard tags of the POM. One way to activate a profile is to simply launch Maven with a -P flag followed by the desired profile name(s), but profiles can also be activated automatically according to a range of contextual conditions: JDK version, OS name and version, presence or absence of a specific file or property. The standard example is when you want certain declarations to take effect automatically under Windows and others under Linux. Almost all the tags that can be placed directly in a POM can also be enclosed within a <profile> tag.

The easiest place to read up further about the basics is the Build Profiles chapter of Sonatype’s Maven book. It’s freely available, readable, and explains the motivation behind profiles: making the build portable across different environments.

The build must pass when no profile has been activated

(Thanks to Nicolas Frankel for this observation.)

Why?

Good practice is to minimise the effort required to make a successful build. This isn’t hard to achieve with Maven, and there’s no excuse for a simple mvn clean package not to work. A maintainer coming to the project will not immediately know that profile wibblewibble has to be activated for the build to succeed. Don’t make her waste time finding it out.

How to achieve it

It can be achieved simply by providing sensible defaults in the main POM sections, which will be overridden if a profile is activated.

Never use <activeByDefault>

Why not?

This flag activates the profile if no other profile is activated. Consequently, it will fail to activate the profile if any other profile is activated. This seems like a simple rule which would be hard to misunderstand, but in fact it’s surprisingly easy to be fooled by its behaviour. When you run a multimodule build, the activeByDefault flag will fail to operate when any profile is requested, even if the profile is not defined in the module where the activeByDefault flag occurs.

(So if you’ve got a default profile in your persistence module, and a skinny war profile in your web module… when you build the whole project, activating the skinny war profile because you don’t want JARs duplicated between WAR and EAR, you’ll find your persistence layer is missing something.)

activeByDefault automates profile activation, which is a good thing; activates implicitly, which is less good; and has unexpected behaviour, which is thoroughly bad. By all means activate your profiles automatically, but do it explicitly and automatically, with a clearly defined rule.

How to avoid it

There’s another, less documented way to achieve what <activeByDefault> aims to achieve. You can activate a profile in the absence of some property:

    <profile>
      <id>nofoobar</id>
      <activation>
        <property>
          <name>!foo.bar</name>
        </property>
      </activation>
    </profile>

This will activate the profile “nofoobar” whenever the property foo.bar is not defined.

Define that same property in some other profile: nofoobar will automatically become active whenever the other is not. This is admittedly more verbose than <activeByDefault>, but it’s more powerful and, most importantly, surprise-free.

Use profiles to adapt to build-time context, not run-time context, and not (with rare exceptions) to produce alternative versions of your artifact

Profiles, in a nutshell, allow you to have multiple builds with a single POM. You can use them in two ways:

To adjust how you build: that is, to adapt the build to variable circumstances (developer's machine or CI server; with or without integration tests) whilst still producing the same final artifact, or
To adjust what you build: that is, to produce variant artifacts.

We can further divide the second option into: structural variants, where the executable code in the variants is different, and variants which vary only in the value taken by some variable (such as a database connection parameter).

If you need to vary the value of some variable at run-time, profiles are typically not the best way to achieve this. Producing structural variants is a rarer requirement – it can happen if you need to target multiple platforms, such as JDK 1.4 and JDK 1.5 – but it, too, is not recommended by the Maven people, and profiles are not the best way of achieving it.

A common case where profiles seem like a good solution is when you need different database connection parameters for development, test and production environments. It is tempting to meet this requirement by combining profiles with Maven’s resource filtering capability to set variables in the deliverable artifact’s configuration files (e.g. Spring context). This is a bad idea.

Why?

It's indirect: the point at which a variable's value is determined is far upstream from the point at which it takes effect. It makes work for the software's maintainers, who will need to retrace the chain of events in reverse.

It's error prone: when there are multiple variants of the same artifact floating around, it's easy to generate or use the wrong one by accident.

You can only generate one of the variants per build, since the profiles are mutually exclusive. Therefore you will not be able to use the Maven release plugin if you need release versions of each variant (which you typically will).

It's against Maven convention, which is to produce a single artifact per project (plus secondary artifacts such as documentation).

It slows down feedback: changing the variable's value requires a rebuild. If you configured at run-time you would only need to restart the application (and perhaps not even that). One should always aim for rapid feedback.

Profiles are there to help you ensure your project will build in a variety of environments: a Windows developer’s machine and a CI server, for instance. They weren’t intended to help you build variant artifacts from the same project, nor to inject run-time configuration into your project.

How to achieve it

If you need to get variable runtime configuration into your project, there are alternatives:

Use JNDI for your database connections. Your project only contains the resource name of the datasource, which never changes. You configure the appropriate database parameters in the JNDI resource on the server.
Use system properties: Spring, for example, will pick these up when attempting to resolve variables in its configuration.
Define a standard mechanism for reading values from a configuration file that resides outside the project. For example, you could specify the path to a properties file in a system property.
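As a minimal sketch of the second and third alternatives, the class below reads values from a properties file whose path is given in a system property, and lets an individual system property override the file. The class name ExternalConfig and the property names app.config and db.url are assumptions made up for this illustration; they do not come from Maven or Spring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Sketch only: resolves run-time settings outside the build.
 * "app.config" and "db.url" are hypothetical names chosen for this example.
 */
final class ExternalConfig {

    private final Properties fileValues = new Properties();

    /** Loads the file named by -Dapp.config=/some/path, if that system property is set. */
    static ExternalConfig load() throws IOException {
        ExternalConfig config = new ExternalConfig();
        String location = System.getProperty("app.config");
        if (location != null) {
            try (InputStream in = Files.newInputStream(Paths.get(location))) {
                config.fileValues.load(in);
            }
        }
        return config;
    }

    /** A system property wins over the file; the fallback applies when neither defines the key. */
    String get(String key, String fallback) {
        String fromSystem = System.getProperty(key);
        if (fromSystem != null) {
            return fromSystem;
        }
        return fileValues.getProperty(key, fallback);
    }

    public static void main(String[] args) throws IOException {
        // Example launch: java -Dapp.config=/etc/myapp/test.properties ExternalConfig
        System.out.println(load().get("db.url", "(not configured)"));
    }
}
```

With something along these lines, the same artifact can be promoted unchanged from development to test to production: only the launch command or the external file differs, and changing a value needs a restart rather than a rebuild.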

Structural variants are harder to achieve, and I confess I have no first-hand experience with them. I recommend you read this explanation of how to do them and why they’re a bad idea, and if you still want to do them, take the option of multiple JAR plugin or assembly plugin executions, rather than profiles. At least that way, you’ll be able to use the release plugin to generate all your artifacts in one build, rather than only one of them.

Consider also Maven's per-user settings

Per-user settings are a bad idea in most cases, because the whole objective of the exercise is to have all artifacts under source control or in a Maven repository, such that the build can be replicated on any machine. However, when you want persistence tests to run in a different database schema for every developer, Maven’s per-user settings file (~/.m2/settings.xml) is a sensible alternative to profiles. In this case, you really do want the project to build differently depending on who runs the build. If you do this, make sure you still provide working default values in the POM itself (they will be overridden by user settings), such that builds will still work with an empty ~/.m2/settings.xml.

(Thanks to Eric Fitchett for this suggestion.)

Autogenerated comments rant

Thu, 21 Oct 2010 00:00:00 +0000

Let me clear up potential confusion right at the start. This rant is not auto-generated. It is entirely hand-crafted. Auto-generation of comments is its object. What I have to say about this abomination can be summed up in six words: why, why, why, why, and why? Oh, and a seventh: WHY?

I am talking about those handy little JavaDoc comments that many well-known IDEs thoughtfully generate for you along with JavaBean-style property accessors, new classes, and indeed anytime a “wizard” (wash my mouth out with soap and water) gets its hands on your project. Here’s a particularly heinous example I found at work lately. Pseudonyms have been used.

package bet3.gov.it.abc;

// import statements

/**
 * <p>
 * Coordonnée physique
 * </p>
 * <p>
 * <b>© Copyright 2010 B3IT - Betelgeuse 3 World Government.</b>
 * </p>
 * <p>
 * <b>Société</b> : B3IT - Betelgeuse 3 World Government {@link <a href="http://it.gov.bet3"> B3IT - Betelgeuse 3 World Government </a>}
 * </p>
 * <p>
 * <b>Projet</b> : abc-service
 * </p>
 * <p>
 * <b>Historique des modifications</b> : <br>
 * <br>
 * 4 janv. 2010 - création du fichier. <br>
 * <!-- date - {@link <a href="">lien vers JIRA</a>} --> <br>
 * </p>
 * 
 * @author bloggsj
 */
public class Coordonnee implements Serializable {

    /** La constante serialVersionUID. */
    private static final long serialVersionUID = -6370799192505622281L;

    /** Le/la email. */
    private String email;

    /** Le/la fax. */
    private String fax;

    /** Le/la id. */
    private long id;

    /**
     * Permet d'obtenir la valeur de "email".
     * 
     * @return la valeur de "email"
     */
    public String getEmail() {
        return email;
    }

    /**
     * Permet d'obtenir la valeur de "fax".
     * 
     * @return la valeur de "fax"
     */
    public String getFax() {
        return fax;
    }

    /**
     * Permet d'obtenir la valeur de "id".
     * 
     * @return la valeur de "id"
     */
    public long getId() {
        return id;
    }

    /**
     * Affecte à  l'objet la valeur "email".
     * 
     * @param email la nouvelle valeur de "email"
     */
    public void setEmail(String email) {
        this.email = email;
    }

    /**
     * Affecte à  l'objet la valeur "fax".
     * 
     * @param fax la nouvelle valeur de "fax"
     */
    public void setFax(String fax) {
        this.fax = fax;
    }

    /**
     * Affecte à  l'objet la valeur "id".
     * 
     * @param id la nouvelle valeur de "id"
     */
    public void setId(long id) {
        this.id = id;
    }

    // umpteen other properties left off for brevity - I think you got the picture anyway

}

The purpose of a comment is to help understand the code. It does so by providing information that is either missing or difficult to deduce from the code.

Let us not forget that a comment has a cost. It adds bulk to source code, hindering the reader. It adds extra text to modify in the event of maintenance (though refactoring tools may help). It must justify that cost by the value it adds.

A comment that simply repeats information in the method’s signature cannot possibly have any added value. Most IDEs already display such information, through syntax highlighting, autocomplete and tooltips, more prominently than they do the accompanying comment.
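For contrast, here is a sketch of what the same kind of bean might look like when comments are kept only where they say something the signature cannot. The class name ContactDetails and the two rules stated in the comments are hypothetical, invented purely to illustrate the point; nothing in the original code tells us what the real rules were.

```java
import java.io.Serializable;

final class ContactDetails implements Serializable {

    private static final long serialVersionUID = 1L;

    private long id;

    // Hypothetical rule: null means the contact has no known e-mail address.
    private String email;

    // Hypothetical rule: stored exactly as entered, with no normalisation.
    private String fax;

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }

    public String getFax() {
        return fax;
    }

    public void setFax(String fax) {
        this.fax = fax;
    }
}
```

The accessors carry no comments because their signatures already say everything there is to say; the two remaining comments record decisions that a reader could not deduce from the code.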

Now, I know that you can switch off comment auto-generation and get comment-free generation of accessor methods. But the auto-generation is switched on by default, and most developers don’t switch it off. This brings us to my opening question: why? Why do IDE makers think there is any point in auto-generating comments at all? Why on earth do they think it should be switched on by default? And why, why, why don’t most developers switch it off? What can they possibly have between their ears, to be capable of thinking that it’s a good idea to have a source file three-quarters made of completely useless noise? Java is verbose enough already. There’s no need to go adding yet more unnecessary guff.

Softshake roundup

Wed, 20 Oct 2010 00:00:00 +0000

On Monday I was at SoftShake, a conference that brings together different but fairly complementary strands of IT. Java, Agile, iPhone and Incubator presentations ran in parallel, the last category gathering everything that deserved to be presented but did not fall under any of the other tracks (or "sillons", which I suppose is how they say "track" at the Académie Française).

There were interesting topics in every track, but I mostly stayed in the Incubator room, with a short Java interlude. I skipped the Agile track, with regret, and the iPhone track, without regret. (I already have the basics of Agile, and I don’t work with iOS.)

Here is a short account of two of the presentations that made the biggest impression on me.

Dominic Williams: Le développement durable (sustainable development)

I had already seen and liked Dominic’s presentation on hedonistic development at XP Day Paris in 2009. So I had high hopes for this one, all the more so because sustainable development (in both senses of the pun) is a subject close to my heart. I was not disappointed: it was crystal clear, well fleshed out, thoughtful and thought-provoking.

Dominic began with an overview of the ecological and social record of our current economy and society: global warming, pollution, inequality between men and women, illnesses caused by work-related stress… He then moved on to the record of the IT industry: generation of toxic waste, energy consumption, and so on. The sector’s footprint, although not the dominant one, is not negligible either. He sketched the trends, which are towards saving energy (Green IT) and eliminating toxic substances (made mandatory in Europe by RoHS two years ago already).

Then, and this was the most interesting part, he gave some actions that we can take, individually and as an industry, to reduce our footprint and improve our social record. Individually, we can:

Keep and reuse old hardware (for example, put an old, low-performance monitor in a server room where it will rarely be used)
Configure sleep and standby modes properly
Choose hardware that is not power-hungry, that is, not oversized for the real need
Be choosy about datacentres, and demand reporting from your provider
Find out about the environmental "rating" of hardware manufacturers and service providers (Greenpeace publishes rankings)
Use technologies that need less computing power (for example, JEE requires more powerful hardware than PHP+MySQL... but that doesn't mean you'll catch me doing PHP). Cloud computing can help here insofar as it helps to pool infrastructure.

Looking more broadly at our work, Dominic was then able to put in a plug for Agile. If we can improve the team’s efficiency, we will generate business value faster (which is, by the way, one of the primary goals of Agile processes), and so our ecological footprint, for an equivalent result, will be smaller. To generate business value quickly, you need to develop iteratively and incrementally.

Another goal of Agile is to respect developers and other participants as human beings. Dominic talked about the Hawthorne effect, the observed phenomenon whereby taking an interest in workers’ working process increases their productivity. Agile involves developers more in their own process, but also seeks to listen more closely to the customer’s business.

Dominic then suggested that we should not lose the art of optimisation: watch out for bottlenecks; be aware of the performance of the algorithms we use (it is tempting to treat memory and CPU as unlimited); and more broadly, understand how memory, the computer and the network work, so as not to waste their resources.

You can draw up the ecological and social balance sheet of your product. For example, anti-spam software has an excellent ecological balance sheet if you take into account the resources it saves, which would otherwise be spent processing said spam. (More examples at SMART2020.)

On improving the social aspects of working life, there were several suggestions, of which I noted the idea of “managing your manager” and the avenue of the co-operative structure (in France, this is the SCOP status).

To end on a note of hope, Dominic pointed out that IT has the potential to help other sectors reduce their footprint, by helping them to automate, to economise…

Vaadin

Vaadin is a framework for building web applications by writing Java code, and only Java, with an API similar in style to Swing (in fact the API seems more readable than Swing’s). Joonas Lehtinen gave us a quick presentation of the project’s history and of how it works. It is a “thin client RIA”: most things happen on the server side, and on the client you have the presentation widgets, which have communication channels to the objects on the server. Each widget is represented by two classes: a class written in Java that is compiled to bytecode and deployed on the server, which is the one you code against, and a class deployed in the browser, also written in Java but translated to JavaScript by GWT. All of this is a black box to the developer, and you code (almost) as if you were writing an application that runs entirely on the server.

One consequence of this model is that you have access to the full set of JSE and JEE libraries, and not a small subset as is the case with GWT. Another consequence is that the load on the server is higher than for an application where much of the functionality runs as JavaScript in the browser. (Here is a presentation on the subject.)

A good half of the presentation was a live-coding demonstration of a small web application with the Vaadin library. It is always fascinating to watch someone else code live, especially when they are an expert, and I very much enjoyed this demo. What is more, it illustrates very concretely how clear the Vaadin API is.

So I am keen to try this framework, all the more so because the ecosystem seems quite rich. There is apparently a “demo” section on the site where you can play with mini-apps and study their source code. They have a theme creation wizard, like jQuery UI’s ThemeRoller, where you can customise one of the default themes and then download a definition of the customised theme. You can create your own widgets, and there is already a catalogue of third-party components, which are published in a Maven repository. There is a tool, Vaadin TestBench, derived from Selenium with added capabilities relevant to Vaadin… And apparently the framework can even be deployed on Google App Engine.

There you have it: an excellent conference, and I warmly thank the organisers and presenters for the enormous amount of work they put in.
