DBA Diaries

Thoughts and experiences of a DBA working with SQL Server and MySQL

MySQL Replication Events – Statement versus Row-Based Formats

Posted on May 8, 2016 Written by Andy Hayes Leave a Comment

In a recent post, I briefly touched upon mysql replication events. It is important to know the advantages and disadvantages of each type.

There are two types of replication event:

Statement based – these are the write operations. They written to the binary log as SQL statements.

Row based – these are the row changes and they are written to the binary log as complete copies of the rows.

It’s also possible to configure the server as MIXED which is a combination of both statement and row based replication. We’ll talk about that in a minute…

When setting up replication, in order to specify a particular replication event format, the binlog_format setting is configured in my.cnf (my.ini if using Windows)

Is there a best option ?

Let’s expand first on the differences between statement based and row based replication in MySQL.

Statement based

With this format, every SQL statement that could perform a modify operation is logged to the binary log. Those same statements are then executed on the slaves by the SQL thread.

If the application makes five update operations to a table on the master, those same five statements are executed on the slave against the same dataset and in the same order.

Row based

These are complete copies of the rows that are written to the binary log and subsequently applied to the slaves.

If an application modified the same row five times, then five copies of that row would be written to the binary log and applied to the slaves. A modification might only be an update to a single field on the row but in row based replication, this operation is replicated as the entire row and not a single field update.

What are the advantages of statement based replication in MySQL?

  • Binary logs are smaller so less disk space is needed for them
  • Less data is transferred between the master and slaves
  • Auditing changes made to a database could be done by looking in the binary logs using a tool like mysqlbinlog

What are the disadvantages of statement based replication in MySQL?

  • Non-deterministic queries can produce results on the slave than are different to what were returned on the master

This points deserves padding out as will cause data consistency problems.

There are many non-deterministic operations which are considered as unsafe when used with replication.

An example of a non-deterministic operation would be where the function UUID() is used to a universal unique identifier. This will return a different result upon each execution. Having this function inside an insert or update statement for example would produce different results on both the master and the slave because the SQL statements generated on the master and executed on the slave. This behaviour applies to not only ad-hoc SQL statements but also to stored procedures and UDF’s.

  • Greater numbers of row level locks may be necessary, for example if the operation was a INSERT…SELECT. This can introduce a performance penalty.
  • There is a lot of extra context work needed to ensure that update operations are executed in the correct order

What are the advantages of row based replication in MySQL?

  • The entire row is replicated. This avoids all of the issues with non-deterministic operations found in ad-hoc SQL, stored procedures and UDF’s
  • No context information is needed to help guarantee operations are executed in the correct order
  • Fewer row level locks are needed

What are the disadvantages of row based replication in MySQL?

  • More data can be generated, increasing disk space requirements, network traffic and I/O
  • The data is logged in binary format, making it harder to audit the changes to the database using the binary logs

Conclusion – statement based, row based replication or mixed replication

In my view it really depends 🙂 Data consistency should be the number one priority and this has to factor heavily in any decision.

Assess requirements based on the application. If you know for sure that the application is not using non-deterministic functions, then statement based replication would be sufficient but how can you guarantee this as a DBA?

As a DBA you might be responsible for introducing stored procedures and udf’s into production. This gives you a chance to review things before they are live and non-deterministic functions may raise a red flag. However ad-hoc SQL inside the application would be harder to review. In a statement based configuration, data consistency cannot be guaranteed between the master and the slave in this scenario.

So let’s go with row based – no wait, our application is write heavy, that’s going to cost more in hardware storage and potentially affect I/O performance!

Given the above, I would choose MIXED which switches between statement and row based replication depending on the operations performed. This helps keep binary logs in check whilst helping to guarantee data consistency between the master and the slave.

If I had to choose between statement and row based and storage, I/O etc was not an issue, I would stick to row based replication given the extra peace of mind it provides.

 

Filed Under: All Articles, MySQL Administration Tagged With: mysql, replication

MySQL Replication Use Cases

Posted on February 6, 2016 Written by Andy Hayes Leave a Comment

In my last post, I summarized when MySQL replication is typically implemented. I now want to expand on that and talk about MySQL replication use cases.

Scaling out using MySQL Replication

Let’s take a typical example, the application is read-heavy versus writes. The server is under heavy read load and you want to reduce that. You have optimized indexes etc all you can but the database is still struggling to keep up with the demand. To alleviate the load, more read requests per second need to be processed.

You now have to make a decision, do you scale vertically with a larger server and hope the problem gets fixed this way or do you scale horizontally and add extra servers?

Scaling vertically with a bigger server may address the problem. More CPU’s, faster disks etc could well remove the bottleneck, however this involves a migration exercise from the old to the new server which can be a lengthy operation.

If the application is supporting a business which is experiencing growth and therefore increases in load, repeating the migrations each time (assuming that the hardware to support it has been invented that is faster than what you already have) is not a viable long term solution.

By scaling horizontally and implementing MySQL Replication, you can add additional servers and load balance read requests across them. Want more read throughput? You can add more replica’s.

I’ve used a read-heavy application as an example here. If you wanted more write throughput, then MySQL replication is not a solution to this problem and some form of sharding/HA cluster solution would be used.

Reducing Impact of Backups Taken By Mysqldump

If you use the out of the box tool for taking backups of your database – mysqldump, you may have noticed that it can cause locking problems as it takes the backup.

One get get around this by adding –single-transaction to the command but if you have a replica of your databases available on another server you may choose to take backups of mysql databases from the slave to take some of the load away from the master server.

After all, in a mysql master slave setup, there could be multiple slaves but only one master. So backing up a slave, rather than the master makes logical sense.

Facilitating Reporting Using MySQL Replication

The business wants information from its data. The data is in the production databases. Running queries against databases for reporting can be heavy and affect performance to the main application due to the need to join many tables and aggregate lots of rows.

So you could schedule your reports to run off peak via some cron or windows task. But wait, the business wants these reports sooner, it doesn’t want to have to wait for this information for many hours. The data analysts need them today!

This is another great reason to setup MySQL replication, the replica can be used for heavy reporting tasks, whether these are reports within the application, data analysts or ETL jobs, the fact that the replica is not used by the production application means these tasks have no impact. Everyone is happy 🙂

Providing Failover/High Availability

It’s worth noting that MySQL replication is not a complete high availability solution due to the master/slave configuration. If the master goes down for example, there is a bit of work needed to promote an existing slave to become the master. For more information on a complete HA solution for MySQL, check out MySQL Cluster

In my mind, when I think high availability, database services are not interrupted at any level. Instead, we could say that MySQL replication provides increased availability or higher availability because of the replica’s that exist which can be used in the event of failover.

For businesses which can suffer some brief interruptions whilst the master is promoted,  MySQL replication increases database availability and fulfills a need. With more investment, availability increases. There is a link here which I am quoting from that summarizes this perfectly;

Each architecture used to achieve highly available database services is differentiated by the levels of uptime it offers. These architectures can be grouped into three main categories:

  • Data Replication.
  • Clustered & Virtualized Systems.
  • Shared-Nothing, Geographically-Replicated Clusters.

In my next post, I am going to look more closely at the replication event types.

Filed Under: All Articles, MySQL Administration Tagged With: backup and restore, mysql

What is MySQL Replication and How Does It Work?

Posted on January 31, 2016 Written by Andy Hayes 1 Comment

MySQL replication, a.k.a MySQL database replication provides the facility to make replicas of databases. The ability to make exact copies of databases and keep them in real-time sync as changes are made at the “master” provides a number of advantages.

In summary these are:

  • Scaling out a database application
  • Reducing database backup impact
  • Facilitate reporting without affecting production load
  • Failover/High Availability

I’m going to go into detail about these in my next post.

How does MySQL Replication Work?

Firstly we have to understand the architecture. At a minimum, in order to establish MySQL database replication, we have a master and a slave.

Typically, these would be two different servers.

The master is where all the changes happen. All database updates occur here, from adding, updating or deleting table records to creating functions, stored procedures or making table changes.

The slave server receives a copy of the changes applied at the master server. This all happens very quickly in order that the slave is always in sync with the master.

But how does this happen?

Replication events are written to a special log called the binary log on the master. The events are then read by the slave (a.k.a replica) and applied to the databases on slave server.

Replication Events

Two types:

Statement based – these are the write operations. They written to the binary log as SQL statements.

Row based – these are the row changes and they are written to the binary log as complete copies of the rows.

I will cover the differences of these and all of the advantages and disadvantages in a later post. When replication is set up, there is a choice between statement based, row based or mixed for the binlog-format variable to determine how the events are written to the binary log.

On the master

As updates occur, the events are written sequentially to the binary log for the slave to read later. Writes on the master are actually buffered to improve performance for when the slave is reading them as well as being written to the binary log.

The binlog dump thread – when the slave connects to the master, the master opens up a thread for the connection from the slave. This thread is just like any other thread however the difference will be that it will be typically opened using a dedicated user set up for the replication process and it just handles notifying the slave of changes and sending the binary log contents to it as it connects.

On the slave

When replication is started, two threads are created on the slave:

The IO thread – this thread connects to the master and reads the events as they come in. The replication events are then written to a local log file called the relay log which will be read by the second thread created on the slave.

The SQL thread – this thread reads the relay log (written by the IO thread) and applies the changes to the database as quickly as possible.

If you want to check the status of the IO thread, run “show slave status\G” on the slave:

  • Master_Log_File – the last binary log file copied from the master.
  • Read_Master_Log_Pos – the binary log from the master is copied over to the relay log on the slave up to the position listed.

If you want to see the status of the SQL thread on the slave, again using “show slave status\G” , check the following:

  • Relay_Master_Log_File – this is the relay log (binary log from the master) that SQL thread is currently processing.
  • Exec_Master_Log_Pos – the position in the binary log currently being executed.

I hope you found this post useful. Please share if you liked it. Thanks 🙂

Filed Under: All Articles, MySQL Administration Tagged With: mysql, replication

10 Database Performance Monitoring Tools You Can Get For Free

Posted on June 28, 2014 Written by Andy Hayes Leave a Comment

Database performance monitoring is something every DBA worth their salt should be doing on a regular basis.

It should be adopted as a proactive task to help identify issues early on before they become too serious and be part of a post code deployment monitoring process.

Bundled in with linux based operating systems are a heap of great tools that you can use as a DBA to help performance monitor your database server. If you are not happy with what you get “out of the box”, you can also find some great database monitoring tools online that are available to download for free.

For this post, I’m going to talk about both MySQL and Linux operating system performance monitoring tools. In many scenarios, you’ll need both types in order to get a complete understanding of where the delays are in your system.

MySQL Performance Monitoring Tools

1/ MySQL slow query log
The mysql slow query log is absolutely brilliant for capturing slow queries hitting your MySQL databases.

You can log queries whose durations match the number you specify in my.cnf. So you can analyze queries which take more than 3 seconds for example.

Activate in my.cnf with customizable settings for log location, long query time and whether to log queries that do not use any indexes.

#slow query logging
slow-query-log = 1
slow-query-log-file = /var/log/mysql/slow-log
long-query-time = 3
log-queries-not-using-indexes = 0

Once you have been logging for a while you can aggregate the results with the mysqldumpslow utility,  optimize them and then monitor for improvements! 🙂

2/ MySQL Performance Schema
Introduced in version 5.5, the performance_schema database provides a way of querying internal execution of the server at run-time.

To enable add “performance_schema” to my.cnf

There are many objects to query, too many to talk about in this post. Check out the documentation here.

3/ The MySQL process list

To get an idea of how many processes are connected to your MySQL instance, what they are running and for how long, you can run SHOW FULL PROCESSLIST or alternatively read from the information_schema.processlist table.

mysql> SELECT user, host, time, info FROM information_schema.processlist;
+-------------+------------+-------+-------------------------------------------------------------------+
| user        | host       | time  | info                                                              |
+-------------+------------+-------+-------------------------------------------------------------------+
| root        | localhost  |     0 | SELECT user, host, time, info FROM information_schema.processlist |
| replication | srv1:46892 | 11843 | NULL                                                              |
+-------------+------------+-------+-------------------------------------------------------------------+
2 rows in set (0.00 sec)

4/ mtop
I love this utility, it provides a real-time view of the MySQL process list and updates according to the number of seconds your specify when you run it.

What I really like about it is that you can have it running on one screen and as problems occur, the colours of the threads change colour with red indicating that something has been running for some time.

There is a great article here about how to install it on different flavours of Linux as well as some detail on how to run it.

5/ SHOW STATUS
Like other command line tools, such as SHOW PROCESSLIST, you run these to get moment in time reports on different variable status’s.

For example, if you want to get information about the query cache, you can run :

mysql> SHOW STATUS LIKE 'Qcache{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}';
+-------------------------+------------+
| Variable_name           | Value      |
+-------------------------+------------+
| Qcache_free_blocks      | 9353       |
| Qcache_free_memory      | 93069936   |
| Qcache_hits             | 9719103977 |
| Qcache_inserts          | 1451857238 |
| Qcache_lowmem_prunes    | 897050960  |
| Qcache_not_cached       | 222234089  |
| Qcache_queries_in_cache | 20856      |
| Qcache_total_blocks     | 52497      |
+-------------------------+------------+
8 rows in set (0.00 sec)

This type of reporting can help you monitor specific areas of your MySQL instance. For example, if you wanted to know the query cache hit rate, you could get the numbers from above and calculate based on this formula:

((Qcache_hits/(Qcache_hits+Qcache_inserts+Qcache_not_cached))*100)

For more information, see this link.

Operating System Performance Monitoring Tools

6/ TOP
This will list running processes and the resources that they are consuming. It updates real-time and you can quickly gage if there are processes which are consuming large areas of resource in CPU and memory at a very high level.

top - 17:33:48 up 7 min,  1 user,  load average: 0.03, 0.04, 0.04
Tasks:  64 total,   1 running,  63 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}us,  0.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}sy,  0.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}ni,100.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}id,  0.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}wa,  0.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}hi,  0.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}si,  0.0{3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}st
Mem:    604332k total,   379280k used,   225052k free,    11724k buffers
Swap:        0k total,        0k used,        0k free,   135064k cached

  PID USER      PR  NI  VIRT  RES  SHR S {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}CPU {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}MEM    TIME+  COMMAND
  809 tomcat7   20   0 1407m 149m  13m S  0.3 25.4   0:10.99 java
 1153 ubuntu    20   0 81960 1592  756 S  0.3  0.3   0:00.01 sshd
 1318 root      20   0 17320 1256  972 R  0.3  0.2   0:00.07 top
    1 root      20   0 24340 2284 1344 S  0.0  0.4   0:00.39 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0
    5 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kworker/u:0

7/ free
This utility helps to give you an idea whether you have a memory issue. Again this is another great tool for getting a high level view. I like to use “free -m” as it returns the numbers to me in megabytes instead of bytes. The information returned shows you in use, free and swap usage. It also shows what is in use by the kernel and buffers.

[email protected]:~# free -m
             total       used       free     shared    buffers     cached
Mem:           590        373        216          0         11        131
-/+ buffers/cache:        229        360
Swap:            0          0          0

8/ vmstat
This utility is very useful for monitoring many areas of the system, CPU, IO blocks and swap. I find it particularly good to monitor swap file usage.

Whilst “free” might tell you if there are any pages in the swap file, vmstat will tell you if your system is actively swapping.  Computers and servers do need to use their swap file but the less this happens, the better it is for your applications performance.

When you have a problem with swap, it is when it is being used constantly and can be a sign that you don’t have enough memory installed in your system.

By default, running vmstat will not give you a real time view of your system. So you need to add a figure to the command to give you a fresh read out in the number of seconds specified. In this example, I am specifying every 2 seconds.

[email protected]:~# vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 221324  12556 135252    0    0    93    19   40   75  1  0 98  0
 0  0      0 221324  12556 135276    0    0     0     0   34   65  0  0 100  0
 0  0      0 221324  12564 135280    0    0     0    24   38   64  0  0 100  0
 0  0      0 221324  12564 135280    0    0     0     0   32   56  0  0 100  0
 0  0      0 221324  12564 135280    0    0     0     0   33   56  0  0 100  0
 0  0      0 221324  12564 135280    0    0     0     0   30   55  0  1 100  0
 0  0      0 221324  12564 135280    0    0     0     0   35   59  0  0 100  0

The columns you are interested in are swap si and so. Which stands for “swap in” and “swap out”. These figures tell you what is being read in from disk swap file (si) and what is being swapped out to the swap file (so). Swapping is very slow I/O intensive process and you want to be doing some optimization somewhere or adding more memory if this is a problem.

Run “man vmstat” for a full list of features and documentation.

9/ sar
I love sar! It will capture you a whole bunch of metrics based on CPU time, CPU queues, RAM, IO and network activity. It will give you a point in time view of the resource usage in the form of a historical report.

The default time between report lines is 10 minutes but you can change that. It’s great for seeing whether you have any particularly heavy areas of resource pressure at any time in the day. You can also use it as a performance monitoring tool to measure the effects of optimizations to your system.

Some examples, run “man sar” for a full list of features and documentation on what each column header means.

sar -q (check CPU queue length)

11:20:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
11:30:01 AM         1       201      0.00      0.00      0.00
11:40:01 AM         1       200      0.00      0.00      0.00
11:50:01 AM         1       201      0.00      0.00      0.00
12:00:01 PM         2       201      0.00      0.00      0.00

sar -r (check RAM usage)

11:20:01 AM kbmemfree kbmemused  {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}memused kbbuffers  kbcached  kbcommit   {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}commit
11:30:01 AM    151308   3765480     96.14     91416   1054136   2961684     49.25
11:40:01 AM    151076   3765712     96.14     91664   1054136   2961012     49.24
11:50:01 AM    150680   3766108     96.15     91888   1054148   2961152     49.24
12:00:01 PM    150704   3766084     96.15     92104   1054152   2961340     49.24

10/ iostat
This tool will you give you statistics for CPU and I/O for devices, partitions and network file systems. Great for knowing where the busiest drives are for example.

[email protected] ~# iostat
Linux 2.6.32-431.11.2.el6.x86_64 (vm1)        06/27/2014      _x86_64_        (4 CPU)
avg-cpu:  {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}user   {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}nice {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}system {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}iowait  {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}steal   {3a76cbff1b5ce5ace3bd34bbdd68f71285d81fb9af60114c7a6db80d3a8688de}idle
           0.23    0.00    0.07    0.10    0.00   99.60Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              11.78       785.38       450.12 1437054564  823620760
dm-0              1.00         1.35         6.67    2472280   12211040
dm-1             64.52       783.30       441.42 1433252442  807699512
dm-2              0.00         0.00         0.02       7658      29336
dm-3              0.27         0.53         2.01     978626    3680440

Finally

So there you have it – 10 really useful tools which you can utilize in your database performance monitoring efforts. There are many more but I’ve run out of time now. 🙂

Filed Under: All Articles, MySQL Administration Tagged With: mysql, performance, ubuntu

  • « Previous Page
  • 1
  • 2
  • 3
  • 4
  • 5
  • Next Page »

Categories

  • All Articles (82)
  • Career Development (8)
  • MySQL Administration (18)
  • MySQL Performance (2)
  • SQL Server Administration (24)
  • SQL Server News (3)
  • SQL Server Performance (14)
  • SQL Server Security (3)
  • SQL Tips and Tricks (19)

Top 10 Popular Posts

  • Using sp_change_users_login to fix SQL Server orphaned users
  • How to shrink tempdb
  • How to Transfer Logins to Another SQL Server or Instance
  • How to Delete Millions of Rows using T-SQL with Reduced Impact
  • MySQL SHOW USERS? – How to List All MySQL Users and Privileges
  • New T-SQL features in SQL Server 2012 – OFFSET and FETCH
  • T-SQL – How to Select Top N Rows for Each Group Using ROW_NUMBER()
  • How to move tempdb
  • How to Kill All MySQL Processes For a Specific User
  • Using exec sp_who2 to help with SQL Server troubleshooting

Recent Posts

  • How to Setup MySQL Master Master Replication
  • How To Use SQL to Convert a STRING to an INT
  • How to set up MySQL Replication Tutorial
  • How to Use SQL CASE for Conditional Logic in Your SQL Queries
  • Using ISNULL in SQL Server to Replace NULL Values

Search

Connect

  • Twitter
  • Facebook
  • Google+
  • RSS

About

  • Cookie Policy
  • Disclaimer
  • About
Copyright © ‘2019’ DBA Diaries built on the Genesis Framework

This site uses cookies. We assume you are happy with cookies but click the link if you are not. Close