In a recent post, I briefly touched upon mysql replication events. It is important to know the advantages and disadvantages of each type.
There are two types of replication event:
Statement based – these are the write operations. They written to the binary log as SQL statements.
Row based – these are the row changes and they are written to the binary log as complete copies of the rows.
It’s also possible to configure the server as MIXED which is a combination of both statement and row based replication. We’ll talk about that in a minute…
When setting up replication, in order to specify a particular replication event format, the binlog_format setting is configured in my.cnf (my.ini if using Windows)
Is there a best option ?
Let’s expand first on the differences between statement based and row based replication in MySQL.
With this format, every SQL statement that could perform a modify operation is logged to the binary log. Those same statements are then executed on the slaves by the SQL thread.
If the application makes five update operations to a table on the master, those same five statements are executed on the slave against the same dataset and in the same order.
These are complete copies of the rows that are written to the binary log and subsequently applied to the slaves.
If an application modified the same row five times, then five copies of that row would be written to the binary log and applied to the slaves. A modification might only be an update to a single field on the row but in row based replication, this operation is replicated as the entire row and not a single field update.
What are the advantages of statement based replication in MySQL?
- Binary logs are smaller so less disk space is needed for them
- Less data is transferred between the master and slaves
- Auditing changes made to a database could be done by looking in the binary logs using a tool like mysqlbinlog
What are the disadvantages of statement based replication in MySQL?
- Non-deterministic queries can produce results on the slave than are different to what were returned on the master
This points deserves padding out as will cause data consistency problems.
There are many non-deterministic operations which are considered as unsafe when used with replication.
An example of a non-deterministic operation would be where the function UUID() is used to a universal unique identifier. This will return a different result upon each execution. Having this function inside an insert or update statement for example would produce different results on both the master and the slave because the SQL statements generated on the master and executed on the slave. This behaviour applies to not only ad-hoc SQL statements but also to stored procedures and UDF’s.
- Greater numbers of row level locks may be necessary, for example if the operation was a INSERT…SELECT. This can introduce a performance penalty.
- There is a lot of extra context work needed to ensure that update operations are executed in the correct order
What are the advantages of row based replication in MySQL?
- The entire row is replicated. This avoids all of the issues with non-deterministic operations found in ad-hoc SQL, stored procedures and UDF’s
- No context information is needed to help guarantee operations are executed in the correct order
- Fewer row level locks are needed
What are the disadvantages of row based replication in MySQL?
- More data can be generated, increasing disk space requirements, network traffic and I/O
- The data is logged in binary format, making it harder to audit the changes to the database using the binary logs
Conclusion – statement based, row based replication or mixed replication
In my view it really depends 🙂 Data consistency should be the number one priority and this has to factor heavily in any decision.
Assess requirements based on the application. If you know for sure that the application is not using non-deterministic functions, then statement based replication would be sufficient but how can you guarantee this as a DBA?
As a DBA you might be responsible for introducing stored procedures and udf’s into production. This gives you a chance to review things before they are live and non-deterministic functions may raise a red flag. However ad-hoc SQL inside the application would be harder to review. In a statement based configuration, data consistency cannot be guaranteed between the master and the slave in this scenario.
So let’s go with row based – no wait, our application is write heavy, that’s going to cost more in hardware storage and potentially affect I/O performance!
Given the above, I would choose MIXED which switches between statement and row based replication depending on the operations performed. This helps keep binary logs in check whilst helping to guarantee data consistency between the master and the slave.
If I had to choose between statement and row based and storage, I/O etc was not an issue, I would stick to row based replication given the extra peace of mind it provides.