Any big system that forms the basis of a company's core business will, at some point, have had compromises made to it, and eventually those compromises come to a head.
These compromises usually manifest as divergence within a code base ("we'll branch this, just temporarily") or as duplication ("a quick copy and paste for now"). All of them are invariably made with good intentions and pretty much always to support the business – but for one reason or another they never get fixed.
Add layer upon layer of these compromises and one outcome is almost inevitable – there will be a big refactor or rewrite project.
Now that we’ve got to this point, and have some understanding of how we got here, we seem to have two options (three if we include carrying on as we are). So which one?
The Big Rewrite
This is the one that always seems so appealing. It gives us an opportunity to apply everything we have learnt from past mistakes: we understand the system much better now and can enable all the features everyone wants.
But the Big Rewrite approach tends to fail. As developers we underestimate the size of the system (even when we know it well) and overestimate our own abilities. Couple this with the business's need to keep selling and using the existing system and we end up in a race: racing to make the new system as feature complete as possible while still supporting, maintaining and extending the old one – often with dedicated teams on each system (the 'elite' team being the big rewrite team).
I’ve yet to see a big rewrite pay off. I have, however, seen a big rewrite cause many more problems than it set out to solve, often resulting in some customers on the new but less feature-complete system and other customers on the older but more feature-complete system. Sometimes the divergence goes even further (a branch of the not-yet-complete big rewrite system).
Refactor
As you can probably guess, this is the approach I favour. Rather than rewrite, refactor the system as you go. Even this approach can end up in a similar situation to the big rewrite, though, and one of the main reasons is that to refactor successfully we need a crucial enabler in place – the migration.
If we do refactor, or we do manage to pull off the big rewrite (and I’m sure it does happen), then what? We find we have all these customers on our old system and we want to migrate them to the new one. If we can’t do that then, after all, what’s the point?
But the migration itself is now seen as a big risk. We have to move customers' data across to new databases, invariably stopping customer access to the system while we do it, which means picking a convenient timeframe (weekends being the obvious choice) – and what do we do if it fails?
I’ve seen this big migration at the end cause many good intentions to go sour. What has happened in the past is that many customers never migrate across, for whatever reason, and the new system and the old system both carry on – diverging further and further as they go and leaving yet another code base to maintain.
Migrate Now
Which leads (eventually) to the point of this post. Before attempting either rewrite or refactor, see if there’s a way of actually migrating to the new system now. Yes, right now.
It seems implausible, given we haven’t even started on a new system yet, but it may be much simpler than you realise (and in many ways it is more of a mindset change than anything else).
In a web system I worked on recently, the simplest way to achieve this was to introduce a proxy (or what we were calling a ‘shim’) which would do nothing more than pass requests down to the underlying systems.
But actually, it did much more than this.
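As a rough sketch of the ‘nothing more than proxying’ version, here is what such a shim could look like. Python and Flask are used purely for illustration (the real shim could just as easily be a small handler or reverse proxy rule in whatever stack the existing system uses), and old-system.internal is a made-up address for the existing system:

```python
# A minimal sketch of the 'shim': a catch-all reverse proxy that forwards
# every request to the underlying (old) system and returns the response
# untouched. Flask and the requests library are used purely for illustration,
# and old-system.internal is a made-up address for the existing system.
from flask import Flask, Response, request
import requests

app = Flask(__name__)

OLD_SYSTEM = "http://old-system.internal"  # hypothetical address of the old system


@app.route("/", defaults={"path": ""}, methods=["GET", "POST", "PUT", "DELETE"])
@app.route("/<path:path>", methods=["GET", "POST", "PUT", "DELETE"])
def proxy(path):
    # Pass the request straight down to the old system...
    upstream = requests.request(
        method=request.method,
        url=f"{OLD_SYSTEM}/{path}",
        headers={key: value for key, value in request.headers if key.lower() != "host"},
        params=request.args,
        data=request.get_data(),
        allow_redirects=False,
    )
    # ...and hand its response back to the caller unchanged.
    skip = {"content-encoding", "content-length", "transfer-encoding", "connection"}
    headers = [(k, v) for k, v in upstream.headers.items() if k.lower() not in skip]
    return Response(upstream.content, upstream.status_code, headers)
```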
The mindset change
By migrating now, we are on the new system. We have migrated all our customers with no loss of service and it just works (assuming it does, that is) – all we are doing is proxying the requests through.
But we no longer have multiple teams, one on support and one on the new system; we just have the one new-system team, because we don’t have any customers on the old system.
We’ve also avoided the big migration at the end – with all its inherent risk and, more importantly, without leaving some customers on the old system, so no divergence. We have one code base going forward, with all our customers already using it.
What now?
Ok – so technically nothing has changed. We still have two or more systems underneath, even if we present one system from above. So what is the next step? One option is to start identifying static content that is shared between the systems. This could be just a help page, or a contact us page, or something of that nature. Whatever it is, we can now promote that duplicated static content up into the ‘shim’ itself.
This small change forces us to make some decisions. Where do we put this static content? How do we serve it up through the ‘shim’? Hopefully these will be fairly easy issues to solve. That’s the point though – think small. Small baby steps.
Once it’s promoted, we can switch the proxy over to the new shim-provided content. If there are any issues we can point the proxy back at the old systems, so we are only ever a small step away from rolling back to how it was before.
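Continuing the illustrative Flask shim from earlier, that switch might be nothing more than a flag and a list of promoted pages. The route, flag and page names here are hypothetical:

```python
# A sketch of the first promotion, building on the illustrative shim above.
# Pages that have been copied up into the shim are served directly from it;
# everything else (or anything we need to roll back) still proxies through
# to the old system.
from flask import send_from_directory

SERVE_FROM_SHIM = True        # flip to False to roll straight back to the old system
PROMOTED_PAGES = {"help"}     # hypothetical static pages promoted so far


@app.route("/pages/<page>")
def static_page(page):
    if SERVE_FROM_SHIM and page in PROMOTED_PAGES:
        # Serve the duplicated static content from the shim itself.
        return send_from_directory("static_pages", f"{page}.html")
    # Not promoted (or rolled back): proxy down to the old system as before.
    return proxy(f"pages/{page}")
```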
Once any kinks are ironed out and requests are flowing through the proxy to the shim-hosted content – delete the now redundant static content from the old systems. By deleting it we have, very slightly, reduced what we need to port across.
More baby steps
Next, look at ‘almost’ static content: content – again, something like help or contact us pages – that is nearly shared but not quite the same across systems. Look at how a basic templating engine (StringTemplate, Spark or NVelocity, for instance) could be used along with a basic model to hold the differences. Then deal with the same kind of small questions: where will the templating engine sit, how will the content be served up, where do we get the model data from (and what is the identifier for each of the ‘below the shim’ systems)? These should again be simple enough questions to answer. The point here is still very small changes.
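As a sketch of what that could look like – sticking with the illustrative Python shim, and using Jinja2 as a stand-in for StringTemplate, Spark or NVelocity – one shared template plus a small per-system model might cover the differences. The X-Tenant header and the model values are invented for illustration:

```python
# A sketch of the 'almost static' step: one shared template rendered with a
# small per-system model, using Jinja2 as a stand-in for StringTemplate,
# Spark or NVelocity. The X-Tenant header and the model values are invented
# for illustration.
from flask import request
from jinja2 import Environment, FileSystemLoader

templates = Environment(loader=FileSystemLoader("templates"))

# Hypothetical model data, keyed by whatever identifies each 'below the shim' system.
CONTACT_MODELS = {
    "system-a": {"support_email": "help@system-a.example", "phone": "0100 111 111"},
    "system-b": {"support_email": "help@system-b.example", "phone": "0100 222 222"},
}


@app.route("/pages/contact-us")
def contact_us():
    tenant = request.headers.get("X-Tenant", "system-a")  # however the shim identifies the system
    model = CONTACT_MODELS.get(tenant)
    if model is None:
        # Unknown system: fall back to proxying down to the old system.
        return proxy("pages/contact-us")
    # Render the shared contact-us template with this system's differences.
    return templates.get_template("contact_us.html").render(**model)
```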
Again, once this almost-static content is promoted to the shim and the redirect through the proxy to the shim-hosted content is working, delete it from the old systems.
At this point we’ve only managed to nibble at the edge of our problem, but by now the mindset change should have been reinforced. We are looking at how we can promote small parts of the existing systems into the shim itself and, once that’s complete, delete those small parts from the old systems. If we ever hit a problem, we should only be a small redirect away from using the old system again until we resolve the issue and can redirect back to the new parts of the system.
A bigger nibble
I was going to continue this on, but it’s already got way too wordy as it is (and it’s close to midnight), so I will continue these thoughts in another post. Given my current trend in posting this may be in a few months’ time, but bear with me 🙂