Oct
09
2015
--

When mysqld kills mysqld

Server ShutdownThe other day a colleague and friend of mine, Peter Boros, had a case where one of our clients had to track down the process shutting down MySQL. This blog is based on the discussion we had about that internally.

Our client wanted Peter to identify the culprit behind periodic shutdowns. This proved to be slightly more difficult than usual, for reasons that you might guess from the title of this blog.

Here is what Peter saw in the logs:

150928 15:15:33 [Note] /usr/sbin/mysqld: Normal shutdown
150928 15:15:36 [Note] Event Scheduler: Purging the queue. 0 events
150928 15:15:39 [Warning] /usr/sbin/mysqld: Forcing close of thread 411515  user: 'dashboard'
150928 15:15:40  InnoDB: Starting shutdown...
150928 15:15:59  InnoDB: Shutdown completed; log sequence number 4873840375
150928 15:16:00 [Note] /usr/sbin/mysqld: Shutdown complete

Some of you may recall that I wrote a blog post about tools that can help you identify other processes that send signals to mysqld. Peter chose SystemTap to track down the process. The script he used to trace it was from David Busby’s blog:

probe signal.send {
  if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
    printf("[%s] %s was sent to %s (pid:%d) by %s uid:%dn",
    	ctime(gettimeofday_s()),
    	sig_name,
    	pid_name,
    	sig_pid,
    	execname(),
    	uid()
    )
}

Using this SystemTap script Peter discovered that the “killer” was mysqld:

[Wed Sep 16 18:57:33 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Wed Sep 16 18:57:34 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Wed Sep 16 18:57:34 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Wed Sep 16 18:57:40 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Mon Sep 28 15:15:31 2015] SIGTERM was sent to mysqld (pid:12203) by mysqld uid:497
[Mon Sep 28 15:15:31 2015] SIGTERM was sent to mysqld (pid:12203) by mysqld uid:497
[Mon Sep 28 15:16:00 2015] SIGTERM was sent to mysqld (pid:12203) by mysqld uid:497

As you can see above, mysqld received a SIGTERM from mysqld. How is this possible? Let’s try to interpret what happened here!

According to the manual, server shutdown can be initiated in different ways. For instance:

  • SIGTERM is sent to mysqld by a UNIX user
  • server is shut down administratively via mysqladmin shutdown by a privileged mysql user

Let’s assume that we are talking about the first example, where a privileged process/script sends a SIGTERM to mysqld. If that was the case we would get:

[root@centos7 ~]# kill -15 `pidof -x mysqld`
[root@centos7 ~]#

[root@centos7 ~]# ./signals.stp
[Thu Oct  1 17:56:36 2015] SIGTERM was sent to mysqld (pid:2105) by bash uid:0
[Thu Oct  1 17:56:37 2015] SIGTERM was sent to mysqld (pid:2105) by mysqld uid:995

The first line in the above output shows the client process (bash) that issued the TERM signal to MySQL. In response, MySQL started a signal handler thread and processed shutdown (COM_SHUTDOWN) using that thread. In turn the referenced function kill_mysqld() may send a signal to current_pid on behalf of the initiating process.

As a side note, in the above output you don’t see anything related to threads. You could get even more detail about MySQL’s operation if you were to modify the tapscript to include tgkill system calls and display related thread IDs as well:

#!/usr/bin/env stap
probe signal.send, signal.systgkill {
  if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
    printf("[%s] %s was sent to %s (pid:%d) by %s (pid: %d, tid:%d) uid:%dn",
        ctime(gettimeofday_s()),
        sig_name,
        pid_name,
        sig_pid,
        execname(),
        pid(),
        tid(),
        uid()
    )
}

While this might be useful to better comprehend how mysqld behaves, it is irrelevant in our search for the culprit process, so I’m not going to include the output of that script here – that exercise will be left to you, dear reader!

Now what happens if a MySQL user with administrative privileges initiates the shutdown via the console instead? We get:

[root@centos7 ~]# mysqladmin shutdown
[root@centos7 ~]#

[root@centos7 ~]# ./signals.stp
[Thu Oct  1 17:59:08 2015] SIGTERM was sent to mysqld (pid:3906) by mysqld uid:995
[Thu Oct  1 17:59:10 2015] SIGTERM was sent to mysqld (pid:3906) by mysqld uid:995

You see that this time the sender was mysqld, which thoroughly resembles the original output that Peter had. Thus, we now know that what we were looking for was a program called mysqladmin shutdown!

Unfortunately, this means that the program may not be local and the client could connect from a different host. A local recursive grep may or may not solve our problem. However, if we enable general-log with log-warnings=2 it might yield something like:

[root@centos7 ~]# tail -F /var/log/mysqld_generic.log
151001 17:07:57	    5 Connect	robert@192.168.3.101 on
		    		5 Shutdown

Thus, we now know where to run our recursive grep for that rogue mysqladmin shutdown (or a similar, API-leveraging) process! In my case it was running on remote host 192.168.3.1 and connected as MySQL user ‘robert’.

Of course you could find alternative methods to track down MySQL connections but that’s beyond what I intended to include in this blog. Perhaps in the next?

The post When mysqld kills mysqld appeared first on MySQL Performance Blog.

Mar
06
2015
--

What stopped MySQL? Tracing back signals sent to MySQL

Have you ever had a case where you needed to find a process which sent a HUP/KILL/TERM or other signal to your database? Let me rephrase. Did you ever have to find which process messed up your night? ;) If so, you might want to read on. I’m going to tell you how you can find it.

Granted, on small and/or meticulously managed systems tracking down the culprit is probably not a big deal. You can likely identify your process simply by checking what processes have enough privileges to send mysqld a HUP/KILL/TERM signal. However, frequently we see cases where this may not work or the elimination process would be too tedious to execute.

We recently had a case where a process was frequently sending SIGHUPs to mysqld and the customer asked us to see if we could get rid of his annoyance. This blog is the direct result of a discussion I had with my colleague Francisco Bordenave, on options available to deal with his issue. I’m only going to cover a few of them in this blog but I imagine that most of you will be able to find one that will work for your case. Note that most tracing tools add some overhead to the system being investigated. The tools presented in the following are designed to be lightweight so the impact should be well within acceptable range for most environments.

DISCLAIMER: While writing this blog I discovered that David Busby has also discussed one of the tools that I’m going to cover in his article. For those who have read the article note that I’m going to cover other tools as well and I will also cover a few extra SystemTap details in this blog. For those who haven’t yet had chance to read David’s blog, you can read it here.

All right, let’s see what “low hanging tools” there are available to us to deal with our issue!

Linux

  • SystemTap: widely available on Linux but usually not enabled by default. You need to install debuginfo and devel kernel packages and systemtap itself. Similar to DTrace.
  • Perf: although not quite written for generic tracing, due to its ability to trace system calls we can use it to our advantage if we trace sys_enter_sigkill.
  • Audit: generic system auditing platform. Given its nature, we can use it to track down many things, including rogue processes sending HUP signals to our poor mysqld!
  • Code!: Given that MySQL is opensource, you could customize the signal handler to obtain extra information. See more in sigaction(2) and the SA_SIGINFO flag. I’m not sure if this should be listed as a more efficient solution but it’s an option nevertheless. I guess one could also preload/inject his own singal handler via an LD_PRELOAD trick and a custom library but that’s beyond the scope what I intend to cover. However, for certain signals (most notably, SIGSEGV) you may not need to write your own tools as the OS may already come with libs/tools that can assist you. See Ulrich Drepper’s catchsegv or /usr/lib64/libSegFault.so, for instance.
  • Debuggers: These may be efficient to use in some cases but I won’t cover them this time, either.

FreeBSD/Solaris

  • DTrace: a very decent, stable tracing platform. Included in most recent kernels by default for the mentioned platforms (FreeBSD 9.2+, FreeBSD 10+, Solaris 10+).

In this article I’m going to focus on Linux as that’s what people in the MySQL community seem to care about most nowadays. The tools that I will discuss will be SystemTap, Perf and Audit. If you feel that you would like to read about the rest, let me know and I will cover the rest of the options in a followup article.

SystemTap

I’m going to set up SystemTap on a recent, 64 bit CentOS 7 box. I will only cover basic install, you can find more about how to install SystemTap here.

The strength of SystemTap is definitely its flexibility, potentially the best tool for solving our problem on the Linux platform. It’s been around for some time and is generally regarded mature but I would recommend to test your “tapscripts” in dev/qa before you run them in production.

Installing SystemTap

Follow below steps to install SystemTap:

[root@centos7]~# sed -i 's/enabled=0/enabled=1/' /etc/yum.repos.d/CentOS-Debuginfo.repo
[root@centos7]~# yum repolist
...
base-debuginfo/x86_64                         CentOS-7 - Debuginfo                                          1,688
...

[root@centos7]~# yum install kernel-debuginfo kernel-debuginfo-common kernel-devel
[root@centos7]~# yum install systemtap systemtap-runtime

Tracing with SystemTap

Create a tapscript like the one below:

[root@centos7]~# cat find_sighupper.stp
#!/usr/bin/stap
# Prints information on process which sent HUP signal to mysqld
probe begin {
  printf("%-26s %-8s %-5s %-8s %-5sn", "TIME", "SOURCE", "SPID", "TARGET", "TPID");
}
probe nd_syscall.kill.return {
  sname = @entry(execname());
  spid = @entry(pid());
  sig = @entry(uint_arg(2));
  tpid = @entry(uint_arg(1));
  tname = pid2execname(tpid);
  time = ctime(gettimeofday_s());
  if (sig == 1 && tname == "mysqld")
    printf("%-26s %-8s %-5d %-8s %-5dn", time, sname, spid, tname, tpid);
}

Then run the tap script in a dedicated terminal:

[root@centos7]~# stap find_sighupper.stp
TIME                       SOURCE   SPID  TARGET   TPID

Send your HUP signal to mysqld from another terminal:

[root@centos7]~# kill -1 1984

The culprit should will show up on your first window like so:

[root@centos7]~# stap find_sighupper.stp
TIME                       SOURCE   SPID  TARGET   TPID
Thu Feb 26 21:20:44 2015   kill     6326  mysqld   1984
^C

Note that with this solution I was able to define fairly nice constraints relatively easily. With a single probe (well, quasi, as @entry refers back to the callee) I was able to get all this information and filter out HUP signals sent to mysqld. No other filtering is necessary!

Perf

Perf is another neat tool to have. As its name implies, it was originally developed for lightweight profiling, to use the performance counters subsystem in Linux. It became fairly popular and got extended many times over these past years. Since it happens to have probes we can leverage, we are going to use it!

Installing Perf

As you can see, installing Perf is relatively simple.

# yum install perf

Start perf in a separate terminal window. I’m only going to run it for a minute but I could run it in screen for a longer period of time.

[root@centos7 ~]# perf record -a -e syscalls:sys_enter_kill sleep 60

In a separate terminal window send your test and obtain the results via “perf script”:

[root@centos7 ~]# echo $$
11380
[root@centos7 ~]# pidof mysqld
1984
[root@centos7 ~]# kill -1 1984
[root@centos7 ~]# perf script
# ========
# captured on: Thu Feb 26 14:25:02 2015
# hostname : centos7.local
# os release : 3.10.0-123.20.1.el7.x86_64
# perf version : 3.10.0-123.20.1.el7.x86_64.debug
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
# cpuid : GenuineIntel,6,70,1
# total memory : 1885464 kB
# cmdline : /usr/bin/perf record -a -e syscalls:sys_enter_kill sleep 60
# event : name = syscalls:sys_enter_kill, type = 2, config = 0x9b, config1 = 0x0, config2 = 0x0, excl_usr = 0, exc
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: software = 1, tracepoint = 2, breakpoint = 5
# ========
#
            bash 11380 [000]  6689.348219: syscalls:sys_enter_kill: pid: 0x000007c0, sig: 0x00000001

As you can see in above output process “bash” with pid of 11380 signalled pid 0x07c0 (decimal: 1984) a HUP signal (0x01). Thus, we found our culprit with this method as well.

Audit

You can read more about Audit in the Red Hat Security Guide.

Installing Audit

Depending on your OS installation, it may be already installed.

If case it is not, you can install it as follows:

[root@centos7 ~]# yum install audit

When you are done installing, start your trace and track 64 bit kill system calls that send HUP signals with signal ID of 1:

[root@centos7]~# auditctl -l
No rules
[root@centos7]~# auditctl -a exit,always -F arch=b64 -S kill -F a1=1
[root@centos7]~# auditctl -l
LIST_RULES: exit,always arch=3221225534 (0xc000003e) a1=1 (0x1) syscall=kill
[root@centos7]~# auditctl -s
AUDIT_STATUS: enabled=1 flag=1 pid=7010 rate_limit=0 backlog_limit=320 lost=0 backlog=0
[root@centos7]~# pidof mysqld
1984
[root@centos7]~# kill -1 1984
[root@centos7]~# tail -2 /var/log/audit/audit.log
type=SYSCALL msg=audit(1425007202.384:682): arch=c000003e syscall=62 success=yes exit=0 a0=7c0 a1=1 a2=a a3=7c0 items=0 ppid=11380 pid=3319 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="zsh" exe="/usr/bin/zsh" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=OBJ_PID msg=audit(1425007202.384:682): opid=1984 oauid=-1 ouid=995 oses=-1 obj=system_u:system_r:mysqld_t:s0 ocomm="mysqld"

As you can see from above output, the results showed up nicely in the system audit.log. From the log it’s clear that I sent my SIGHUP to mysqld (pid 1984, “opid” field) from zsh (see the command name in the “comm” field) via the 64 bit kill syscall. Thus, mischief managed, once again!

Summary

In this blog I presented you three different tools to help you trace down sources of signals. The three tools each have their own strengths. SystemTap is abundant of features and really nicely scriptable. The additional features of auditd may make it appealing to deploy to your host. Perf is a great tool for CPU profiling and you might want to install it solely for that reason. On the other hand, your distribution might not have support compiled in its kernel or may make the setup harder for given tool. In my experience most modern distributions support the tools discussed here so the choice comes down to personal preference or convenience.

In case you were wondering, I often pick auditd because it is often already installed. SystemTap might be a bit more complicated to setup but I would likely invest some extra time into the setup if my case is more complex. I primary use perf for CPU tracing and tend to think of the other two tools before I think of perf for tracing signals.

Hope you enjoyed reading! Happy [h/t]racking!

The post What stopped MySQL? Tracing back signals sent to MySQL appeared first on MySQL Performance Blog.

Jul
18
2014
--

Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue

The Percona Managed Services team recently faced a somewhat peculiar client issue. We’d receive pages about their MySQL service being unreachable. However, studying the logs showed nothing out of the ordinary…. for the most part it appeared to be a normal shutdown and there was nothing in anyone’s command history nor a cron task to speak of that was suspicious.

This is one of those obscure and peculiar (read: unique) issues that triggered an old memory; I’d seen this behavior before and I had just the tool to catch the culprit in the act.

Systemtap made diagnostics of this issue possible and I can’t state enough how much of a powerful and often under-utilized tool set systemtap really is.

cat > signals.stp << EOF
probe signal.send {
if (sig_name == “SIGKILL” || sig_name == “SIGTERM”)
printf(“[%s] %s was sent to %s (pid:%d) by %s uid:%dn”,
ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), uid())
}
EOF

sudo stap ./signals.stp > signals.log 2>signals.err

grep mysqld signals.log
[Wed Jun 11 19:03:23 2014] SIGKILL was sent to mysqld (pid:8707) by cfagent uid:0
[Fri Jun 13 21:37:27 2014] SIGKILL was sent to mysqld (pid:6583) by cfagent uid:0
[Sun Jun 15 05:05:34 2014] SIGKILL was sent to mysqld (pid:19818) by cfagent uid:0
[Wed Jul 9 07:03:47 2014] SIGKILL was sent to mysqld (pid:4802) by cfagent uid:0

Addendum: It had been so long since I had used this tooling that I could not remember the original source from which I derived the module above; some cursory searching to rectify this issue for this blog post found this original source by Eugene Teo of Red Hat made available under GPLv2.

From this we were able to show that cfagent was killing the mysqld process presumably via a misconfigured job; this information was returned to the client and this has continued to be run in production for two months now at the client’s request with no issues to speak of.

This is by no means the limit to what systemtap can be used to achieve; you can hook into functions though whilst you may need to install the debug packages to find what functions are available run for example:

sudo stap -L 'process("/usr/sbin/mysqld").function("*")' > /tmp/mysql_stapfunc
...
head /tmp/mysql_stapfunc
process("/usr/sbin/mysqld").function("TIME_from_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_from_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_set_hhmmss")
process("/usr/sbin/mysqld").function("TIME_set_yymmdd")
process("/usr/sbin/mysqld").function("TIME_to_longlong_date_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_datetime_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_packed")
process("/usr/sbin/mysqld").function("TIME_to_longlong_time_packed")
process("/usr/sbin/mysqld").function("TIME_to_ulonglong")
...

This is also true of the kernel using sudo stap -L 'kernel.function("*")' > /tmp/kernel_stapfunc however you must be booted into a debug kernel for this to function.

Systemtap is more than a worthy tool to have at your disposal with plenty of examples available.

Finally I invite you to join me July 23 at 10 a.m. Pacific time for my webinar, “What Every DBA Needs to Know About MySQL Security.” This detailed technical webinar provides insight into best security practices for either setting up a new MySQL environment or upgrading the security of an existing one. I hope to see you there!

The post Systemtap solves phantom MySQLd SIGTERM / SIGKILL issue appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com