Saturday, December 2, 2017

Using strace for MySQL Troubleshooting

I'd say that strace utility is even more useful for MySQL DBAs than lsof. Basically, it is a useful general purpose diagnostic and debugging tool for tracing system calls some process makes and signals it receives. The name of each system call, its arguments and its return value are printed to stderr or to the file specified with the -o option.

In context of MySQL strace is usually used to find what files mysqld process accesses, and to check the details about any I/O errors. For example, if I'd really want to try to verify Bug #88479 - "Unable to start mysqld using a single config file (and avoiding reading defaults)" by Simon Mudd, I'd just run mysqld from some 5.7.x version as a command argument for strace. On my Ubuntu 14.04 netbook I have the following files:
openxs@ao756:~/dbs/5.7$ ls -l /usr/my.cnf
-rw-r--r-- 1 root root 943 лип 19  2013 /usr/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/my.cnf
-rw-r--r-- 1 root root 260 чер 24 20:40 /etc/my.cnf
openxs@ao756:~/dbs/5.7$ ls -l /etc/mysql/my.cnf
-rw-r--r-- 1 root root 116 лют 26  2016 /etc/mysql/my.cnf
openxs@ao756:~/dbs/5.7$ bin/mysqld --version
bin/mysqld  Ver 5.7.18 for Linux on x86_64 (MySQL Community Server (GPL))
So, what happens if I try to run mysqld --defaults-file=/etc/mysql/my.cnf, like this:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
openxs@ao756:~/dbs/5.7$
It seems we proved that only the file passed as --defaults-file is read (if it exists). By default other locations are also checked in a specific order (note that return codes are mapped to symbolic errors when possible):
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --print-defaults --print-defaults 2>&1 | grep 'my.cnf'
stat("/etc/my.cnf", {st_mode=S_IFREG|0644, st_size=260, ...}) = 0
open("/etc/my.cnf", O_RDONLY)           = 3
stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
stat("/home/openxs/dbs/5.7/etc/my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
stat("/home/openxs/.my.cnf", 0x7ffd68f0e020) = -1 ENOENT (No such file or directory)
openxs@ao756:~/dbs/5.7$
If we think that --print-defaults may matter, we can try without it:
openxs@ao756:~/dbs/5.7$ strace bin/mysqld --defaults-file=/etc/mysql/my.cnf 2>&1 | grep 'my.cnf'stat("/etc/mysql/my.cnf", {st_mode=S_IFREG|0644, st_size=116, ...}) = 0
open("/etc/mysql/my.cnf", O_RDONLY)     = 3
^C
openxs@ao756:~/dbs/5.7$
Last example also shows how one can terminate tracing with Ctrl-C.

Now, let me illustrate typical use cases with (surprise!) some public MySQL bug reports where strace was important to find or verify the bug:
  • Bug #20748 - "Configuration files should not be read more than once". In this old bug report Domas Mituzas proved the point by running mysqld as a command via strace. He filtered out lines related to opening my.cnf file with egrep and made it obvious that one file may be read more than once. The bug is closed long time ago, but it is not obvious that all possible cases are covered, based on further comments...
  • Bug #49336 - "mysqlbinlog does not accept input from stdin when stdin is a pipe". Here bug reporter shown how run mysqlbinlog as a command under strace and redirect output to the file with -o option.
  • Bug #62224 - "Setting open-files-limit above the hard limit won't be logged in errorlog". This minor bug in all versions before old 5.6.x (that still remains "Verified", probably forgotten) was reported by Daniël van Eeden. strace allowed him to show what limits are really set by setrlimit() calls. The fact that arguments of system calls are also shown matters sometimes.
  • Bug #62578 - "mysql client aborts connection on terminal resize". This bug report by Jervin Real from Percona is one of my all time favorites! I clearly remember how desperate customer tried to to dump data and load them on a remote server with a nice command line, and after waiting for many days (mysqldump was run for a data set of 5TB+, don't ask, not my idea) complained that they got "Lost connection to MySQL server during query" error message and loading failed. Surely, the command was run from the terminal window on his Mac and in the process of moving it here and there he just resized the window  by chance... You can see nice example of strace usage with MySQL Sandbox in the bug report, as well as some outputs for SIGWINCH signal. Note that the bug is NOT fixed by Oracle in MySQL 5.5.x (and never happened in 5.6+), while Percona fixed it since 5.5.31.
  • Bug #65956 - "client and libmysqlclient VIO drops connection if signal received during read()". In this yet another 5.5 only regression bug that was NOT fixed in 5.5.x by Oracle (until in frames of Bug #82019 - "Is client library supposed to retry EINTR indefinitely or not" the patch was contributed for a special case of it by Laurynas Biveinis from Percona, that in somewhat changed form allowed to, hopefully, get some fix in 5.5.52+), the problem was noted while trying to attach strace to running client program (with -p option).
  • Bug #72340 - "my_print_defaults fails to parse include directive if there is no new line". Here bug reporter used strace (with -f option to trace child/forked processes and -v option get verbose output) to show that file name is NOT properly recognized by my_print_defaults. The bug is still "Verified".
  • Bug #72701 - "mysqlbinlog uses localtime() to print events, causes kernel mutex contention". Yet another nice bug report by Domas Mituzas, who had used -e stat options of strace to trace only stat() calls and show how many times they are applied to /etc/localtime.The bug is fixed since 5.5.41, 5.6.22 and 5.7.6 removed related kernel mutex contention.
  • Bug #76627 - "MySQL/InnoDB mix buffered and direct IO". It was demonstrated with strace that InnoDB was opening each .ibd file twice (this is expected as explained by Marko Mäkelä) in different modes (and this was not any good). The bug is fixed in recent renough MySQL server versions.
  • Bug #80020 - "mysqlfrm doesn't work with 5.7". In this bug report Aleksandr Kuzminsky had to use strace to find out why mysqlfrm utility failed mostly silently even with --verbose option.
  • Bug #80319 - ""flush tables" semantics deviate from manual". Here Jörg Brühe proved with strace that close() system call is not used for individual InnoDB table t when FLUSH TABLES t is executed. manual had to be clarified.
  • Bug #80889 - "PURGE BINARY LOGS TO is reading the whole binlog file and causing MySql to Stall". By attaching strace to running mysqld process and running the command it was shown that when GTID_MODE=OFF the entire file we purge to was read. No reason to do this, really. Note how return value of open(), file descriptor, was further used to track reads from this specific file.
  • Bug #81443 - "mysqld --initialize-insecure silently fails with --user flag". As MySQL DBA, you should be ready to use strace when requested by support or developers. Here bug reporter was asked to use it, and this revealed a permission problem (that was a result of the bug). No need to guess what may go wrong with permissions - just use strace to find out!
  • Bug #84708 - "mysqld fills up error logs with [ERROR] Error in accept: Bad file descriptor". Here strace was used to find out that "When mysqld is secured with tcp_wrappers it will close a socket from an unauthorized ip address and then immediately start polling that socket"... The bug is fixed in MySQL 5.7.19+ and 8.0.2+, so take care!
  • Bug #86117 - "reconnect is very slow". I do not care about MySQL Shell, but in this bug report Daniël van Eeden used -r option of strace to print a relative timestamp for every system call, and this was it was clear how much time was spent during reconnect, and where it was spent. Very useful!
  • Bug #86462 - "mysql_ugprade: improve handling of upgrade errors". In this case strace allowed Simon Mudd to find out what exact SQL statement generated by mysql_upgrade failed.
The last but not the least, note that you may need root or sudo privileges to use strace. On my Ubuntu 14.04 this kind of messages may appear even if I am the user that owns mysqld process (same with gdb):
openxs@ao756:~/dbs/maria10.2$ strace -p 2083
strace: attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
To summarize, strace may help MySQL DBA to find out:

  • what files are accessed by the mysqld process or related utilities, and in what order
  • why some MySQL-related command (silently) fails or hangs
  • why some commands end up with permission denied or other errors
  • what signals MySQL server and tools get
  • what system calls could took a lot of time when something works slow
  • when files are opened and closed, and how much data are read from the files
  • where the error log and other logs are really located (we can look for system calls related to writing to stderr, for example)
  • how MySQL really works with files, ports and sockets
It also helps to find and verify MySQL bugs, and clarify missing details in MySQL manual.

There other similar tools for tracing system calls (maybe among other things) on Linux that I am going to review in this blog some day. Performance impact of running MySQL server under this kind of tracing is also a topic to study.

No comments:

Post a Comment