Everyone wants a log in distributed systems !!

Can you guess what is behind every successful implementation of a distributed systems or relational databases like mysql ? Well it is a very small concept called as log. In relational database world it is sometimes called as transactional log, journal or write ahead log (there might be many more).

what is a log is it something what i write in programs like

log.info("hello world"); 

Well no these are human readable texts which are used by programmers to debug an issue/ bug. These can be called as application logs. The idea i want to cover in this post is something called as data log or journal which are built for programmatic access.

I’m not tech savvy can you example me logs in simple terms ?

Sure no problem. We all learnt basic math operations while we are in school like a+b, a*b, a/b, a-b etc. Let’s do a basic math exercise

  1. Can you integrate sin(x) + cos(x) from 0 to 1? (Just kidding .. you can start from step 2 :D )

What result did you get ? Well you must have got 48 ? Yay ! I know how you think.

How did we both reach to same result ?

It is because we both started with same initial state 0 and applied the math operations sequentially. If either of them was different then we couldn’t have reached to a different result


  • Let’s suppose i started with one and you started with zero then the final value in my case will be 49 but in your case it will be 48.

So what is a learning ?

Given an initial state and same sequence of logs records (operations) we would reach to same result no matter what is the speed at which we process the records.

Can you now give me the actual definition of log ?

A log is perhaps the simplest possible storage abstraction. It is an append-only, totally-ordered sequence of records ordered by time.

Records will only be append only i.e they will be added only to the end. Once appended they become immutable. Each log entry will be assigned a unique sequential number. One idea is to number the log records based on the time it got generated i.e t(1) < t(2) < t(3). Using time in distributed systems is not a very great idea (pretty big topic will cover in another post… It is very similar to real life the time on your watch might not match with time on my watch)

How traditional databases like mysql get constructed ?

In very overly simplified view, every table in mysql will starts empty, apply the logs which are generated by sql constructs and as a consequences rows get generated in the database.

Remember #(logs) >>>> #(records) in table. Consider this scenario where i keep updating the same records again and again then number of logs will be very high but rows might be just one.

How is it helpful in distributed systems ?

One of the core features on why we use distributed systems is that we want our data to be as durable as possible i.e even if one machine goes down because of some issues it shouldn’t lose my data. So one easy solution is to replicate the data across multiple machines.

So if we can successfully replicate the logs among machines , with both machines starting with empty state( assuming we creating from scratch) we can reach to same state after successful processing of the replicated logs.

This is very similar to the math exercise we both started with same state, we both applied operations sequentially and we reached to same output. So now even if i get hit by a bus you will have the result and be my backup. ( Stressing this point because this is the most critical concept you need to understand )

So how can we replicate logs between machines ?

The distributed systems literature commonly distinguishes two broad approaches to processing and replication. The “state machine model” usually refers to an active-active model where we keep a log of the incoming requests and each replica processes each request. A slight modification of this, called the “primary-backup model”, is to elect one replica as the leader and allow this leader to process requests in the order they arrive and log out the changes to its state from processing the requests. The other replicas apply in order the state changes the leader makes so that they will be in sync and ready to take over as leader should the leader fail.

What are different replication control strategies ?

In order to maintain mutually consistent data in all sites, replication control techniques need to be adopted. There are two approaches for replication control, namely −

  • Synchronous Replication Control

Synchronous Replication Control

We programmatically update both databases present in different machines synchronously( either parallely or serially)

{Update table in machine (1)Update table in machine (2)}

Some of the obvious disadvantages with this approach are that even if one machine is not able to reply then we will timeout the request and customers receives 500 which is not a very good customer experience. Some machines might be inconsistent state and needs to roll back properly.

Asynchronous Replication Control

In asynchronous replication approach, the replicas do not always maintain the same value. One or more replicas may store an outdated value, and a transaction can see the different values. The process of bringing all the replicas to the current value is called synchronization.

Some of the replication control algorithms are −

  • Master-slave replication control algorithm.

Master-Slave Replication Control Algorithm

There is one master site and ’N’ slave sites. A master algorithm runs at the master site to detect conflicts. A copy of slave algorithm runs at each slave site. The overall algorithm executes in the following two phases −

  • Transaction acceptance/rejection phase − When a transaction enters the transaction monitor of a slave site, the slave site sends a request to the master site. The master site checks for conflicts. If there aren’t any conflicts, the master sends an “ACK+” message to the slave site which then starts the transaction application phase. Otherwise, the master sends an “ACK-” message to the slave which then rejects the transaction.

Distributed voting algorithm

This comprises of ’N’ peer sites, all of whom must “OK” a transaction before it starts executing. Following are the two phases of this algorithm −

  • Distributed transaction acceptance phase − When a transaction enters the transaction manager of a site, it sends a transaction request to all other sites. On receiving a request, a peer site resolves conflicts using priority based voting rules. If all the peer sites are “OK” with the transaction, the requesting site starts application phase. If any of the peer sites does not “OK” a transaction, the requesting site rejects the transaction.

Majority Consensus Algorithm

This is a variation from the distributed voting algorithm, where a transaction is allowed to execute when a majority of the peers “OK” a transaction. This is divided into three phases −

  • Voting phase − When a transaction enters the transaction manager of a site, it sends a transaction request to all other sites. On receiving a request, a peer site tests for conflicts using voting rules and keeps the conflicting transactions, if any, in pending queue. Then, it sends either an “OK” or a “NOT OK” message.

Circulating Token Algorithm

In this approach the transactions in the system are serialized using a circulating token and executed accordingly against every replica of the database. Thus, all the transactions are accepted, i.e. none is rejected. This has two phases −

  • Transaction serialization phase − In this phase, all transactions are scheduled to run in a serialization order. Each transaction in each site is assigned a unique ticket from a sequential series, indicating the order of transaction. Once a transaction has been assigned a ticket, it is broadcasted to all the sites.

Other advantage of the logs

Point in time view :- I can get into a any point of view of my database using the logs. For example you can get bank balance on March 25, 2015 by computing all the withdrawals and deposits from account creation time to March 25, 2015 by which can be obtained from the statements.

Disaster recovery:- If node is suffering from intermittent power failures it can reconstruct the state by applying the logs.

Support ACID:- Provide ACID properties even if there was an disaster.

In summary the two problems a log solves — ordering changes and distributing data — are even more important in distributed data systems. Agreeing upon an ordering for updates (or agreeing to disagree and coping with the side-effects) are among the core design problems for these systems

We will discuss more about consensus in the next article ….




Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Bits And Bytes Of Avikodak

I spend most of the time on helping out zeros and ones to reach there destination in a orderly and reliable manner :)