Manage Data More Effectively with Multiple Databases

Vitalii Symon, 22 October 2010

Managing data is a hard task if you have 2-3 Gigabytes of data and extreme requirements for quick use. For example – post database.

Currently, I am working on database which will allow you to pick addresses from a NZ-wide database for customers via a web API. This will increase accuracy of your data and will allow you to maintain correct addressing even if a post code or Zip code was updated. This database is updated every 3 months and is shipped on a few CD disks.  But, carrying all versions of addresses inside one database would slow down whole system.

So we have to create primary (root) database which will contain global values (like settings, list of customers) and several databases which will contain and share data between them.

In general scheme it looks like this:

Manage Data More Effectively with Multiple Databases 

Note that web service interacts with secondary databases (A, B and C) directly to reduce load onto the main database. To achieve this goal you have define some rules to select database. For example you can

1. select database by date or

2. select database by name of object you are working with. For example if you work with company related information you can store information for companies starting on ‘A’-‘K’ in database A in ‘L’ to ‘Z’ – in database B.

Usually the amount of common data isn’t big, so keeping it in the memory is possible, that avoids creating bottleneck when accessing the root database.

The next question is how to store connection strings. There are two primary options:

1. Configuration file. This looks like a good option but needs extra code for managing xml, reloading settings on the fly (if web.config is used), etc.

2. Root database. You can create just another table in the database and put connection strings there. That’s an easy and reliable solution which requires minimal efforts. Also it makes sense to cache connection strings to reduce response time for your service.

Having several databases is a good approach to make your service quicker and more reliable. In most cases it is easy split data, so if you have big solution - it makes sense to design it to support multiple databases from scratch rather than redesign after.