In this blog post, we’ll discuss the ramifications of the Galera warning “Failed to report last committed (Interrupted system call)”.
I have recently seen this error with Percona XtraDB Cluster (or Galera):
[Warning] WSREP: Failed to report last committed 549684236, -4 (Interrupted system call)
It was reported as a bug on Launchpad in 2013: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1434646
My colleague Przemek replied and explained it as follows:
Reporting the last committed transaction is just a part of the certification index purge process. In case it fails for some reason (it occasionally does), the cert index purge may be a little delayed. But it does not mean the transaction was not applied successfully. This is a warning after all.
If we look up this error in the source code, we see that Galera reuses the Linux system error codes and returns them as negative values. Here the -4 corresponds to:
#define EINTR 4 /* Interrupted system call */
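To make the mapping concrete, here is a minimal sketch of how a negative return code such as -4 becomes the text in the warning. The helper name describe_return_code is hypothetical and is not part of Galera; it only assumes the "negative errno" convention described above and uses the standard strerror call.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of Galera): format a negative errno-style
// return code the same way the warning does, e.g. "-4 (Interrupted system call)".
std::string describe_return_code(long ret)
{
    // The code is returned as -errno, so negate it before looking up the text.
    const int err = static_cast<int>(-ret);
    return std::to_string(ret) + " (" + std::strerror(err) + ")";
}
```

On Linux, calling describe_return_code(-EINTR) should yield the same "-4 (Interrupted system call)" suffix that appears in the log line above.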
As there isn’t much documentation regarding this error, and internet searches did not bring up useful information, my colleague David Bennett and I delved into the source code (as we do on occasion).
If we look at gcs_sm.hpp in the Galera source code, we see:
 * @retval -EINTR - was interrupted by another thread
We also see:
/* was interrupted, will be handled by someone else */
This means that the thread was interrupted, but the server will retry on another thread. Since it is only a warning, it isn’t much to worry about unless these messages begin to pile up (which could be a sign of concurrency issues).
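If you want to check whether the warnings are piling up, one option is simply to count them in the error log. The following is a rough sketch, not a Percona or Galera tool: the function name count_report_failures is made up for illustration, and it assumes the log text has already been read into a string.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: count the lines of an error log that contain the
// "Failed to report last committed" warning.
std::size_t count_report_failures(const std::string& log_text)
{
    static const std::string needle = "WSREP: Failed to report last committed";
    std::istringstream lines(log_text);
    std::string line;
    std::size_t count = 0;
    while (std::getline(lines, line))
    {
        // A substring match is enough here; the seqno and error code vary.
        if (line.find(needle) != std::string::npos)
        {
            ++count;
        }
    }
    return count;
}
```

A handful of matches over a long period is consistent with the occasional interruption described above; what counts as "too many" depends on your workload, so treat any threshold as your own judgment call.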
The specific warning is thrown from galera_service_thd.cpp here:
    if (gu_unlikely(ret < 0))
    {
        log_warn << "Failed to report last committed "
                 << data.last_committed_ << ", " << ret
                 << " (" << strerror (-ret) << ')';
        // @todo: figure out what to do in this case
    }
This warning could be handled better, so that it neither clutters the logs nor sounds cryptic enough to worry administrators.
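As an illustration of what “handled better” might look like, here is one possible sketch. It is not Galera’s code and not a proposed patch: the LogLevel enum, the LogEntry struct and the report_failure_entry function are all hypothetical. The idea is to treat -EINTR as an expected, low-priority event with a plain-language message, while keeping the original warning for any other error.

```cpp
#include <cerrno>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical types for this sketch; Galera has its own logging macros.
enum class LogLevel { Debug, Warning };

struct LogEntry
{
    LogLevel    level;
    std::string text;
};

// Hypothetical: decide how to log a failed "report last committed" call.
// ret is expected to be a negative errno-style code.
LogEntry report_failure_entry(long last_committed, long ret)
{
    std::ostringstream msg;
    if (ret == -EINTR)
    {
        // Interrupted by another thread: the report will be handled later,
        // so a debug-level note is enough.
        msg << "Reporting last committed " << last_committed
            << " was interrupted by another thread; it will be handled later";
        return {LogLevel::Debug, msg.str()};
    }
    // Any other error keeps the original warning text.
    msg << "Failed to report last committed " << last_committed << ", " << ret
        << " (" << std::strerror(static_cast<int>(-ret)) << ')';
    return {LogLevel::Warning, msg.str()};
}
```

Whether debug level is the right choice for the interrupted case is a design decision for the Galera developers; the sketch only shows that the two cases can be told apart at the point where the warning is emitted.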