Thinking about microservices? Start with your database

Hang on, the database? Ahem, I think you’ll find that service contexts should be vertical, not horizontal, that is, based on function, not technology. So, nyah-nyah.

Let me explain.

This post isn’t for you if you happen to be Netflix. Nor is it for you if you’re writing a greenfield application with microservices in mind. No, this post is for the team maintaining a chunk of EnterpriseWare, something relatively dull yet useful and (presumably) profitable. This team has been toying with the idea of microservices, and may well be drawing up elaborate Visio diagrams to define the new architecture.

So, what does this have to do with databases?

A database is a service. In Windows land, it literally is a Windows service (sqlservr.exe, in point of fact), but in more general terms, it provides a service for managing an application’s data.

A database with a relatively simple schema, where the application layer owns most of the SQL and most operations are simple CRUD, is more like a RESTful API. A database with a more complicated schema (parent-child hierarchies, multiple tables involved in atomic transactions) that provides complex stored procedures for the application(s) to call is more like a traditional RPC service. Either way, these things are services with a well-known API that the application uses.

Behold, the monolith!

Your basic monolithic codebase will hopefully include the source code for both the database and the application.

If every release requires your DBA to manually run ad-hoc SQL scripts, then you have far bigger problems to address than microservices. Version your migrations and set up continuous deployment before you start looking to change architecture!

Typically, a new version of the product will involve both database changes and application changes: for example, adding a new field to a business entity may touch the table schema, stored procedure(s), the application tier and the UI layer. Without the schema changes, the app will not function. Without the application changes, the schema changes are at best useless and at worst broken.

Therefore, deployments involve taking the application down, migrating the database (usually after taking a backup), and then deploying the application code.

Non-breaking changes

Microservices imply an application that composes multiple, independent services into a cohesive whole, with the emphasis being on “independent”. For a team with no real experience of this, a useful place to start is the database. Not only will it teach you valuable lessons about managing multiple parts of your application separately, but, even if you don’t decide to go down the microservice rabbit-hole, you will still have gained value from the exercise.

So, what exactly are we talking about? In practical terms:

  • Create a new repository for your database source code.
  • Create a separate continuous integration/deployment pipeline for your database.
  • Deploy your database on its own schedule, independent of the application.
  • Ensure that your database is always backwards-compatible with all live versions of your application.

Now, this last part is the hardest to do, and there’s no silver bullet or magic technique that will do it for you, but it is crucial that you get it right.

Whenever you make a schema change, whether that be a change to a table, a stored procedure or whatever, that change must not break any existing code. This may mean (a small sketch follows the list):

  • Using default values for new columns
  • Using triggers or stored procedures to fake legacy columns that are still in use
  • Maintaining shadow, legacy copies of data in extreme circumstances
  • Creating _V2, _V3 etc. stored procedures where the parameters or behaviour changes
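To make the second technique concrete, here is a hedged sketch in T-SQL (using a computed column rather than a trigger, for brevity); the table and column names are hypothetical:

-- Assumption: dbo.Customers.CustomerName has been renamed to FullName, but
-- some live application versions still SELECT CustomerName.
EXEC sp_rename 'dbo.Customers.CustomerName', 'FullName', 'COLUMN';

-- Re-expose the old name as a read-only alias of the new column.
ALTER TABLE dbo.Customers
    ADD CustomerName AS (FullName);

Computed columns are read-only, so this covers legacy readers; legacy writers would need an updatable view or an INSTEAD OF trigger instead.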

The exact techniques used will depend on the change, and must be considered each time the schema changes. After a while, this becomes a habit, and ideally more and more business logic moves into the application layer (whilst storage and consistency logic remains the purview of the database).

Let’s take the example of adding a new column. In this new world, we simply add a database migration to add the new column, and either ensure that it’s nullable, or that it has a sensible default value. We then deploy this change. The application can then take advantage of the column.
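A minimal sketch of such a migration in T-SQL, against a hypothetical Employees table:

-- Old application versions simply ignore the new column, so this can be
-- deployed ahead of any application change.
ALTER TABLE dbo.Employees
    ADD PreferredName nvarchar(100) NULL;

-- If the column must be NOT NULL, give it a default so that existing INSERT
-- statements (which never mention it) keep working.
ALTER TABLE dbo.Employees
    ADD IsActive bit NOT NULL
        CONSTRAINT DF_Employees_IsActive DEFAULT (1);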

Let’s take stored procedures. If we’re adding new optional parameters, then we can just change the procedure, since this is a safe change. If, however, we are adding new required parameters or removing existing ones, we create MyProcedure_V2 and deploy it alongside the original.
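Something like the following, assuming a hypothetical GetHolidayRequests procedure that now needs a required @departmentId parameter:

-- The original GetHolidayRequests is left untouched for existing callers.
CREATE PROCEDURE dbo.GetHolidayRequests_V2
    @employeeId   int,
    @departmentId int   -- new, required parameter
AS
BEGIN
    SET NOCOUNT ON;

    SELECT HolidayId, EmployeeId, StartDate, EndDate, Status
    FROM   dbo.HolidayRequests
    WHERE  EmployeeId   = @employeeId
    AND    DepartmentId = @departmentId;
END

Old application versions keep calling GetHolidayRequests; new ones call the _V2 version, and the original can be dropped once nothing references it.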

Let’s say we want to remove a column – how do we do this? The simple answer is that we don’t, until we’re sure that no code is relying on it. We instead mark it as obsolete wherever we can, and gradually trim off all references until we can safely delete it.
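There’s no built-in [Obsolete] attribute for columns, but an extended property makes a serviceable marker that schema documentation and code reviews can pick up; the names below are illustrative:

-- Flag the column as deprecated so the intent is visible in the schema.
EXEC sp_addextendedproperty
    @name       = N'Deprecated',
    @value      = N'Use FullName instead; remove once all callers are migrated.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'Customers',
    @level2type = N'COLUMN', @level2name = N'CustomerName';

-- Much later, once no live application version references it:
-- ALTER TABLE dbo.Customers DROP COLUMN CustomerName;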

And this benefits us… how?

The biggest benefit to this approach, besides training for microservices, is that you should now be able to run multiple versions of your application concurrently, which in turn means you can start deploying more often to preview customers and get live feedback rather than waiting for a big-bang, make-or-break production deployment.

It also means that you’re doing less during your deployments, which in turn makes them safer. In any case, application-tier deployments tend to be easier, and are definitely easier to roll back.

By deploying database changes separately, the riskier side of change management is at least isolated, and, if all goes well, invisible to end users. By their very nature, safe, non-breaking changes are also usually faster to run, and may even require no downtime.

Apart from all that, though, your team are now performing the sort of change management that will be required for microservices, without breaking up the monolith, and without a huge amount of up-front investment. You’re learning how to manage versions, how to mark APIs as obsolete, how to keep services running to maintain uptime, and how to run a system that isn’t just a single “thing”.

Conclusion

If your team has up until now been writing and maintaining a monolith, you aren’t going to get to serverless/microservice/containerised nirvana overnight. It may not even be worth it.

Rather than investing in a large-scale new architecture, consider testing the water first. If your team can manage to split off your database layer from your main application, and manage its lifecycle according to a separate deployment strategy, handling all versioning and schema issues as they arise, then you may be in a good position to do that for more of your components.

On the other hand, maybe this proves too much of a challenge. Maybe you don’t have enough people to make it work, or your requirements and schema change too often (very, very common for internal line-of-business applications). There’s nothing wrong with that, and it doesn’t imply any sort of failure. It does mean that you will probably struggle to realise any benefit from services, micro or otherwise, and my advice would be to keep refactoring the monolith to be the best it can be.

The disaster of OOP, and what we can salvage

Object-Oriented Programming is a disaster, or at least that’s what Brian Will thinks, and I’m not about to disagree. I share his disillusionment with OOP, and I have always had a sneaking admiration for the PHP and JavaScript developers who just seem to get stuff done much faster than their statically typed, object-oriented brethren.

However, one paragraph did stick out:

In [the] original vision, an object-oriented program is composed of a graph of objects, each an island of state unto itself. Other objects do not read or write the state of other objects directly but instead send messages.

Just because OOP as we write it today hasn’t produced the benefits we sought, doesn’t mean that there is no value in the original concepts. The difference between now and then is one of scale: how many of you out there are actually writing single programs, where a system is composed of objects that all live in a single app domain and communicate via messages?

I don’t think I’ve ever written such an app (it’s much harder in the web world in any case, where the app simply has no state from one request to the next). Where I have been writing actual, run-in-memory-UI-and-everything applications, they’ve been small utility apps that I’ve thrown together using RAD, that simply wrapped functionality that already existed in databases or services.

That, right there, is the part where OOP hasn’t failed – it has just, like the Roman Empire, transformed into something else. Instead of objects, think services – think of how an enterprise system is composed of many disparate systems, services and databases, each of which is tied to the others by some form of message. In the worst case, they share state promiscuously and become an un-maintainable tangle, but in the best cases, we have services that sit serenely in the heavens and send messages back and forth to each other.

Or we could just hack it together with PHP and buy a bigger server.

Don’t outsource your business logic

Just another Monday morning, and Dwayne The Developer is ready to start cranking code. The business needs an object to make, track and approve holiday requests. Dwayne, being a conscientious sort, and having heard that Interfaces Are A Good Thing™, starts by writing:

interface IHolidays {
  // returns the holiday ID from the database
  int Request(Employee emp, DateTime start, DateTime finish);
  void Approve(Employee manager, int holidayId);
  void Reject(Employee manager, int holidayId, string reason);
  bool IsApproved(int holidayId);
}

Yay, Liskov and all that good stuff, right?

Wrong.

Sharon The Architect happens to be wandering by Dwayne’s desk, and sees what he’s writing. She asks why he’s made the business object an interface, and, not being satisfied with a few incoherent mumbles about SOLID, points out that Dwayne has just outsourced the business logic of this module, without gaining any of the benefits of dependency injection or inversion of control.

Interfaces are much like outsourcing: a software house might outsource non-core tasks such as designing the public website, internationalising the UI, designing icons or writing a custom SAP integration module. A sensible software house would seldom outsource its “secret sauce” – the core code that makes or breaks the app (I’ve seen some people do this, and it never ends well).

An interface essentially outsources a task: it says “here is a method that does X; I don’t care who does it or how, just do it and deliver me the result”.

The point is that there is no benefit in ever having a different implementation of IHolidays, because that object IS the application.

These things are good candidates for wrapping inside interfaces:

  • Database calls
  • SMTP and SMS APIs
  • Message queueing systems
  • File systems
  • Third-party APIs

Each of these is something with complicated internal logic that isn’t relevant to the business logic that your real objects are implementing – think of them as “stuff that does stuff and returns a result, where I don’t really know or care how that stuff happens”.

Because they aren’t part of the business logic, they’re excellent candidates for mocking during unit tests. We don’t need to test whether System.IO.File.Delete works; we just need to know what happens to our code when it does or doesn’t.
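As a minimal, hedged sketch of the idea (the names IFileStore and ReportPurger are made up for illustration), the fake below stands in for the real file system so a test can exercise the failure path without touching the disk:

using System;
using System.Collections.Generic;

// The infrastructure dependency sits behind an interface...
public interface IFileStore
{
    void Delete(string path);
}

// ...while the business object itself is a plain, concrete class.
public class ReportPurger
{
    private readonly IFileStore _files;
    public ReportPurger(IFileStore files) => _files = files;

    // Returns the paths that could not be deleted rather than throwing.
    public IReadOnlyList<string> Purge(IEnumerable<string> paths)
    {
        var failures = new List<string>();
        foreach (var path in paths)
        {
            try { _files.Delete(path); }
            catch (Exception) { failures.Add(path); }
        }
        return failures;
    }
}

// A hand-rolled fake: we don't care how deletion happens, only how our
// code behaves when it succeeds or fails.
public class FakeFileStore : IFileStore
{
    public void Delete(string path)
    {
        if (path.EndsWith(".locked")) throw new InvalidOperationException("locked");
    }
}

public static class Program
{
    public static void Main()
    {
        var purger = new ReportPurger(new FakeFileStore());
        var failures = purger.Purge(new[] { "report.txt", "archive.locked" });
        Console.WriteLine(failures.Count == 1 ? "test passed" : "test failed");
    }
}

In production, the same ReportPurger is constructed with an implementation that really does call System.IO.File.Delete.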

So what should our holiday object look like? Tune in next time to find out…

On the art of Simplicity

It must use a distributed SOA architecture!

Really? Are you sure that a simple database-models-website solution wouldn’t work just as well for your 100 users, not to mention improving developer productivity by allowing them to run/debug it in one go?

Pro tip: if you factor your application well, such that all large components (e.g. your file storage layer, messaging sub-system, order processor, customer credit checker, etc) are represented by interfaces and passed as parameters, then splitting a ‘monolithic’ application into SOA components becomes quite a simple exercise.
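A hedged sketch of what that factoring can look like, with illustrative names (ICreditChecker, OrderProcessor):

public interface ICreditChecker
{
    bool IsCreditworthy(int customerId, decimal orderTotal);
}

// Inside the monolith this is an ordinary in-process class...
public class InProcessCreditChecker : ICreditChecker
{
    public bool IsCreditworthy(int customerId, decimal orderTotal)
        => orderTotal < 10_000m; // stand-in for the real rules
}

// ...and consumers only ever see the interface.
public class OrderProcessor
{
    private readonly ICreditChecker _creditChecker;
    public OrderProcessor(ICreditChecker creditChecker) => _creditChecker = creditChecker;

    public bool Accept(int customerId, decimal orderTotal)
        => _creditChecker.IsCreditworthy(customerId, orderTotal);
}

If the credit checker later becomes its own service, only the implementation changes (say, an HttpCreditChecker that calls it over the network); OrderProcessor is none the wiser.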

We should think up-front about sharding the database!

Maybe. If the database is well-designed, sharding shouldn’t present a problem when it becomes necessary, but you might find that judicious use of de-normalisation, data warehousing and caching can take you a long way before you need to consider fragmenting the physical schema (millions of rows is not big data!).

We need to make the business rules configurable!

Maybe. See The Daily WTF for some thoughts on why configurability isn’t always a good idea.

I’ve seen so many instances where rules that are intrinsic to the system were made configurable – I once saw a file storage library where users could ‘configure’ file extensions to map to different DLL handlers for thumbnails. Of course, only developers could write and deploy these modules, and the configuration was hard to script, so it became a manual step that had to be performed after every deployment of a new thumbnail handler.

Hard-coding a price list is probably a bad idea. Soft-coding a set of legal rules, on the other hand, can lead you into a world of hurt if the shape of those rules changes in the future. Where possible, factor the rules into a re-usable library and, if you need to make it configurable, you now only have one object to modify.

What about an Enterprise Service Bus?

No. Just no.

(You might need message queues for background processing, and no harm done. You might even decide on a common queuing engine – be it a library, service or just a pattern – but as soon as you start referring to it as the Enterprise Service Bus, it takes on a gruesome life of its own and summons Cthulhu.)

So I was thinking that we should have a common repository pattern where we serialise business objects into DTOs—

What’s wrong with ADO.NET and SQL queries? Or you could get fancy and set up an Entity Framework data context, or the ORM of your choice. Unless you absolutely need some sort of API-based interface, why worry about serialisation at all? (And when you do need it, JSON.Net does a brilliant job.)

By all means have a pattern for DALs/repositories, but for smaller projects/simpler databases you might be able to just have a single DAL representing your database.
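For the avoidance of doubt, ‘plain ADO.NET’ really can be that small; a minimal sketch, with a hypothetical Customers table and connection string:

using System.Collections.Generic;
using System.Data.SqlClient; // or Microsoft.Data.SqlClient on newer stacks

public record Customer(int Id, string Name);

public class CustomerDal
{
    private readonly string _connectionString;
    public CustomerDal(string connectionString) => _connectionString = connectionString;

    public List<Customer> GetActiveCustomers()
    {
        var customers = new List<Customer>();
        using var conn = new SqlConnection(_connectionString);
        using var cmd = new SqlCommand(
            "SELECT Id, Name FROM dbo.Customers WHERE IsActive = 1", conn);

        conn.Open();
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            customers.Add(new Customer(reader.GetInt32(0), reader.GetString(1)));
        }
        return customers;
    }
}

No DTOs, no serialisation; if an API boundary genuinely appears later, JSON.Net can serialise the same objects directly.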

Let’s build a SPA UI framework!

Let’s not. If you need a SPA, use Angular, Ember, etc., but maybe you don’t need one yet – if your data set is small and your customer base isn’t massive, you can probably get away with vanilla ASP.NET MVC/Rails/Node.js/etc. Add Knockout.js or even just some simple jQuery/vanilla JavaScript on top for the bits of rich UI. If this starts looking like spaghetti, then it’s time to think about going the SPA route and putting a framework in place.

Hell, I had to re-write a simple data entry system (PHP, 10 users, about a dozen screens, internal only) a couple of years back – I wrote it in ASP.NET WebForms (boom!) in a weekend using GridViews, ObjectDataSources and Membership controls.

But changing code after the fact is hard and messy – this is how systems become Big Balls of Mud!

Ye-es, to an extent. Certainly this happens if you do a half-assed job of it and try to change things piecemeal.

If your system is well-factored with clear separations between layers and components, then it should be simple to upgrade/refactor/change components. If you’re going to move from ASP.NET MVC to Ember.js, then make sure you do it wholesale – change the entire front-end UI and don’t call ‘done’ until it’s all done. If you’re going to move that order processing object into a web service, move the whole thing and delete the old implementation lest others try to use it.

The problem is not change – the problem is half-a-job changes where people think they’re playing it safe by leaving the old stuff there ‘just in case’; in reality, it’s much safer to break the system now rather than leave time-bombs in for the future.