How to scale a Database ?

Your product just launched. Everything is working fine and fast and your database is running on the single server. Then, one day traffic doubles. Your website slows down, CPU hits 100%, your write start timing out and then suddenly your database becomes the bottleneck.

At some point, every growing system faces the same question: how do we scale the database?

Databases are stateful
They’re harder to scale than stateless application servers.
You can’t just “add more instances” casually.
Scaling introduces trade-offs, cost, complexity, consistency, operational overhead.

There are some fundamental approaches that you can implement to scale a database.

Vertical Scaling

In vertical scaling, you just increase the resources allocated to server or to your database. It's just making the box bigger. It works well early and it's operationally simple and if your product is in early stage this is the approach that you should go with.

Vertical Scaling also has it's cons:

Cost
As the hardware increases, the cost increases non-linearly with it.
Database Down Time and Availability
Vertical Scaling often requires restarting a database. This restarts often introduces downtime, which can impact the availability of the system.
Single Point of Failure
If your this database instances goes down then your whole application become unavailable.

So while vertical scaling is the simplest way to grow, it doesn’t solve availability or fault tolerance. To address those concerns, we need a different strategy — replication.

Replication

Database replication is the process of copying and maintaining the same data across multiple database servers.

In this we generally follow the Master/Slave architecture, where the master database generally supports only the write operations and a slave database who generally get the copies of data from the master database and performs all the read operations. In a system the numbers of slave are generally larger than number of master database.

Types of Replication

Synchronous Replication
In this changes made to master database are immediately replicated to all slave database before the transaction is considered completed. This ensures strong consistency but impact the performance.
Asynchronous Replication
In this changes made to master database are replicated to slave database with slight delay. It offers better performance but with data inconsistency between master and slave (known as replication lag).

Advantages of database replication:

Performance
In master/slave architecture, all the writes operation are performed by the master node and all the read operations are distributed among the slave nodes. This is increase performance as now many queries can be executed in the parallel.
High Availability
In this, if the slave node goes offline then all the read operations are redirected to the other healthy slave nodes and in case if there is only one slave node than all the read operations are temporarily redirected to the master node until a new slave node replace the old one.

If the master node goes offline, then any of the slave node would be promoted to master database and all the write operations would be performed on the new master node.

Cons of Database replication:

Cost
Data Inconsistency
Due to replication lag, the data on master and slave nodes might not be consistent which can lead to stale reads when a user try to access it.

Replication helps when reads are the bottleneck. But when write traffic starts overwhelming a single primary node, replication alone is no longer enough.

Sharding

Sharding is a technique for splitting the data across multiple database servers (also called shards).

Data in each shard is unique to it. No two shards hold the same copy of the primary data.

Sharding Key

A shard key is a field or combination of fields (e.g., user_id, timestamp) used in database sharding to determine how data is partitioned across multiple servers (shards).

Hash Function

In database sharding, a hash function is an algorithm that takes the value of the shard key as input and generates a fixed-length numerical value (a hash code or hash value). This hash value is then used to determine which physical server, or shard, the corresponding data record should be stored on.

shard_number = hash(user_id) % number_of_shards // This is a hash function

Now, we have allocated the user data onto the database server using the user IDs. Anytime you try access the data, a hash function is used to find the corresponding shard. In our example user_id % 4 is a hash function. If the result equals to 0, shard 0 is used to store and fetch data. If the result equals to 1, shard 1 is used. The same logic applies to other shards.

The user table in sharded databases.

The most important factor to consider when implementing a sharding strategy is the choice of the sharding key. Sharding key (known as a partition key) consists of one or more columns that determine how data is distributed. “user_id” is the sharding key. A sharding key allows you to retrieve and modify data efficiently by routing database queries to the correct database shard.

Sharding is a great technique to scale the database but it is far from a perfect solution. It introduces complexities and new challenges to the system:

Resharding Data
Our hash function is user_id % 4 so if in this setup we add a another database server than the hash function would become user_id % 5 due to which we have to rearrange our data.

As you can see that just for seven keys when we add a single database server to your setup we have to move 4 users data to a different database server.
Celebrity Problem
This is also known as the Hotspot Key Problem. Imagine data for Ronaldo, Virat Kohli and Justin Bieber all end up on the same shard. For social applications like Instagram, that database shard would be overwhelmed with the read operations. To solve this problem, we may need to allocate a shard for each celebrity. Each shard might even require further partition.
Cross Shard Queries
Suppose you want to access data that is present in multiple shards then you have to send query to each shard to collect the data which is a very expensive operation.

Sharding is powerful for scaling databases, but it demands careful planning to handle issues like resharding, hotspots, and cross-shard queries.

How to Scale a Database?

Vertical Scaling

Replication

Types of Replication

Sharding

References

Comments

System Design

Monolith and Microservices Architecture

More from this blog

Monolith and Microservices Architecture

Consistent Hashing

The Thundering Herd Problem

this Keyword in JavaScript

Command Palette

Vertical Scaling

Replication

Types of Replication

Sharding

References

Comments

System Design

Monolith and Microservices Architecture

More from this blog