Database administrators often need to identify inactive databases and users so that unused objects can be purged and resources reclaimed. This requires determining which databases and users are actively accessed. Doing so is part of regular maintenance and helps manage disk space and other resources effectively.
This post will explain easy ways to identify active databases and users in MySQL.
Identifying active databases in MySQL
There are several approaches we can use to identify active databases: checking the timestamps of data file updates, looking for entries in the slow log, scanning binary logs, and parsing the audit log or general log. We can exclude the general log from this investigation, as it consumes a lot of disk space, which is an overhead.
We have the below methods to identify active database objects:
- Review slow logs to identify database objects being queried.
- Find the updated timestamps on datadir files.
- Scan binary logs to identify tables being modified.
- Scan audit logs to identify database objects.
During our tests, we observed that the above methods helped us identify the active databases; however, we sometimes had to combine two or more of them to get the best result. Also, if some tables are read or written only monthly, the scope of the review increases, and you will have to retain the logs for a more extended period.
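For the datadir-timestamp method, a quick sketch might look like the following. The helper name is ours, not a standard tool, and the default datadir path is an assumption; GNU find is assumed as well.

```shell
# Hypothetical helper: list InnoDB data files last modified more than N days
# ago, i.e., candidates for inactive schemas. Run as a user that can read
# the datadir.
stale_files() {
  find "$1" -name '*.ibd' -mtime "+$2" 2>/dev/null | sort
}

# Example (datadir path is an assumption):
# stale_files /var/lib/mysql 30
```

Remember that a recent mtime proves writes, not reads, so this method is best combined with a log-based one.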
We also have two statistics features, available in Percona Server for MySQL and MySQL Community versions, as follows:
- User Statistics
- Sys-schema-table-statistics
User Statistics: Another way to identify active databases, users, and other information is available in Percona Server for MySQL (and MariaDB). Both provide userstat, a feature that is disabled by default and must be enabled by setting userstat to ON. This ensures that statistics collection doesn't cause any extra load on the server unless desired.
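As a rough sketch, enabling userstat and querying the statistics tables it populates could look like this (run on Percona Server for MySQL; table and column names per its userstat feature):

```sql
-- Enable statistics collection (disabled by default):
SET GLOBAL userstat = ON;

-- Most-read tables since collection started:
SELECT TABLE_SCHEMA, TABLE_NAME, ROWS_READ, ROWS_CHANGED
FROM INFORMATION_SCHEMA.TABLE_STATISTICS
ORDER BY ROWS_READ DESC LIMIT 10;

-- Per-user activity:
SELECT USER, TOTAL_CONNECTIONS, ROWS_FETCHED, ROWS_UPDATED
FROM INFORMATION_SCHEMA.USER_STATISTICS;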
Sys-schema-table-statistics: sys.schema_table_statistics summarizes table statistics; by default, rows are sorted by descending total wait time. These statistics only cover activity since the last MySQL service restart; as soon as MySQL restarts, the stats are lost, hence we will not cover it in this blog.
We have a great utility called pt-table-usage from Percona Toolkit, which can identify database objects from logs.
Identify database tables using pt-table-usage
The pt-table-usage tool reads queries from a log and analyzes how they use tables. This tool can generate a report based on the slow query log and show you what tables were used in queries. Here you can explore more options for pt-table-usage.
To identify the databases, you must depend on slow logs where we can find the read queries. Make sure to configure slow_query_log and long_query_time parameters as below:
SET GLOBAL slow_query_log=1;
SET GLOBAL long_query_time=0;
SET GLOBAL slow_query_log_use_global_control='long_query_time';
slow_query_log: MySQL’s slow query log feature enables you to log queries that exceed a predefined time limit. SET GLOBAL slow_query_log=1; enables the slow log; setting it to 0 disables it.
Long_query_time: When the slow_query_log is enabled, it logs any query that takes longer than long_query_time seconds to run. When configured with SET GLOBAL long_query_time=0;, all queries will be logged.
slow_query_log_use_global_control: Setting this variable to 'long_query_time' applies the global long_query_time setting to already-running sessions in Percona Server for MySQL; otherwise, existing connections keep their session value. See the slow_query_log_use_global_control documentation for the other options it accepts.
So, the above parameter settings will log all queries in the slow log file. If we want to reduce the overhead and disk space of logging every query, we can also add the parameters below.
SET GLOBAL log_slow_rate_type = 'query';
SET GLOBAL log_slow_rate_limit = 10;
Note: With log_slow_rate_limit set to 10, only one in every 10 sessions/queries (about 10%) will be logged, which reduces the overhead and disk space of logging every query. Once the slow log has been collected for a sufficient time, we can parse it and fetch the database objects actively used.
The tool prints a usage report for each table in every query, similar to the following:
$ sudo pt-table-usage /var/lib/mysql/ip-172-31-92-72-slow.log > table_usage.log
$ cat table_usage.log | head -10
Query_id: 0xA212AD93263CF26F.1
SELECT DUAL
Query_id: 0xC684EA1D78348D23.1
SELECT information_schema.innodb_metrics
WHERE information_schema.innodb_metrics
Query_id: 0x153F1CE7D660AE82.1
SELECT information_schema.processlist
WHERE information_schema.processlist
In the example above, pt-table-usage was run against the slow log (/var/lib/mysql/ip-172-31-92-72-slow.log), and its output, containing the query and database object details, was redirected to a log file.
Now parse the output table_usage.log using the below command to identify unique database objects.
$ cat table_usage.log | egrep "WHERE|JOIN" | grep -v "__SQ" | grep -v "DUAL" | awk '{print $2}' | sort | uniq -c | sort -nr
6703 information_schema.innodb_metrics
3352 performance_schema.table_lock_waits_summary_by_table
3352 information_schema.processlist
3352 INFORMATION_SCHEMA.PLUGINS
2790 information_schema.tables
558 performance_schema.table_io_waits_summary_by_table
558 performance_schema.table_io_waits_summary_by_index_usage
558 information_schema.schemata
5 authors
4 information_schema.table_statistics
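If you want schema-level rather than table-level counts, a small variation on the same pipeline can collapse the output. The function name below is ours; it reads the same table_usage.log file, and unqualified table names (those without a schema prefix, like "authors" above) are skipped.

```shell
# Collapse the per-table counts from pt-table-usage output into
# per-schema counts.
schema_counts() {
  egrep 'WHERE|JOIN' "$1" | grep -v '__SQ' | grep -v 'DUAL' \
    | awk '{print $2}' | awk -F. 'NF==2 {print $1}' \
    | sort | uniq -c | sort -nr
}

# schema_counts table_usage.log
```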
Identify databases using slow logs
Once we have collected slow logs for a certain period with the above slow log configuration capturing all queries, we can easily fetch the databases being read from the slow log file using Unix commands.
The command below lists all database names based on usage.
$ sudo grep Schema: /var/lib/mysql/mysql-slow.log| awk -F' ' '{print $4 $5}' | sort | uniq -c
2717 Schema:mytestdb
123 Schema:percona
762238 Schema:sbtest
Example for the 8.0 version (the position of the Schema: field in the slow log differs, hence the different awk columns):
$ sudo grep Schema: /var/lib/mysql/mysql-slow.log| awk -F' ' '{print $2 $3}' | sort | uniq -c
2717 Schema:mytestdb
273 Schema:percona
223567 Schema:sbtest
We can extract table names as well by exploring the slow log further with Unix grep or awk commands.
Identifying active users in MySQL
We have two methods to identify active users for a specific period.
- Parse Audit Logs to identify active users.
- Identify active users from processlist logs.
Disclaimer: The userstat plugin feature can also identify active users in Percona Server for MySQL (and MariaDB), but the statistics in the INFORMATION_SCHEMA.CLIENT_STATISTICS table only cover activity since the last MySQL service restart.
Parse Audit Logs to identify active users
The Percona Audit Log Plugin monitors and logs connection and query activity performed on a specific server. Information about the activity is stored in a log file. The Audit Log plugin is installed but, by default, is not enabled when you install Percona Server for MySQL. To check if the plugin is enabled, run the following commands:
mysql> SELECT * FROM information_schema.PLUGINS WHERE PLUGIN_NAME LIKE '%audit%';
Follow this link to enable audit_log_plugin and other options in Percona Server for MySQL.
We use audit_log_rotations and audit_log_rotate_on_size parameters to manage the disk space.
SET GLOBAL audit_log_rotate_on_size=1073741824;
SET GLOBAL audit_log_rotations=5;
Audit_log_rotate_on_size: This variable is measured in bytes and specifies the maximum size of the audit log file. Upon reaching this size, the audit log will be rotated. The rotated log files are present in the same directory as the current log file. The sequence number is appended to the log file name upon rotation. For this variable to take effect, set the audit_log_handler variable to FILE.
Audit_log_rotations: The variable is used to specify how many log files should be kept when the audit_log_rotate_on_size variable is set to a non-zero value. This variable has an effect only when audit_log_handler is set to FILE.
Once these variables are set, we can fetch the user details from these logs. You can use more filters to get specific users, database objects, host details, etc.
$ sudo grep -w 'USER' /var/lib/mysql/audit.log | tail
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm[pmm] @ localhost [127.0.0.1]"
USER="pmm"
USER="pmm"
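To turn that raw output into a per-account summary, a short pipeline over the USER="..." attribute works. The helper name is ours; it assumes the attribute format shown above.

```shell
# Count how often each account appears in the audit log, based on the
# USER="user[auth_user] @ host [ip]" attribute.
audit_users() {
  grep -o 'USER="[^"]*"' "$1" \
    | sed 's/^USER="//; s/"$//' \
    | awk '{print $1}' | sort | uniq -c | sort -nr
}

# audit_users /var/lib/mysql/audit.log
```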
Identify Active Users from processlist logs:
To identify active users from the processlist, we need data collected over time to get the most accurate picture possible. The processlist captures information only at a specific instant; if a user is not executing a query at that moment, there is a high chance of missing them. Thus, the granularity of processlist collection is critical to building an accurate user list.
So, here we will collect a processlist sample every five seconds via a scheduled script, save it in a daily file, and analyze it.
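A minimal collector sketch follows. Note that cron alone cannot fire more often than once per minute, so a small loop does the five-second sampling (started at boot or from a supervisor); the host label "DB1" and the use of mysqladmin with credentials from a defaults file are assumptions, not from the original setup.

```shell
# Daily processlist log file name for a given host label.
proclog_name() {
  echo "processlist_$1_$(date +%Y%m%d).log"
}

# Append a separator plus the current processlist and status every 5 seconds.
collect_processlist() {
  while true; do
    {
      echo "================================================================================"
      mysqladmin --verbose processlist status
    } >> "$(proclog_name DB1)"
    sleep 5
  done
}

# collect_processlist &   # e.g., launched from an @reboot cron entry
```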
processlist_DB1_20221215.log – Processlist file name collected for one day.
$ cat processlist_DB1_20221215.log | head
================================================================================
2022-12-15 00:00:05: Uptime: 8935629 Threads: 7 Questions: 5258388359 Slow queries: 96680 Opens: 4096 Flush tables: 1 Open tables: 1981
40 sampleuser 10.11.8.11:54540 sampledb Sleep 5 None 1 1
41 sampleuser 10.11.8.11:54542 sampledb Sleep 5 None 38 38
64 sbtest_user 10.11.8.11:54610 sbtest Sleep 5 None 3 3
65 sbtest_user 10.11.8.11:54612 sbtest Sleep 5 None 0 544
66 sbtest_user 10.11.8.11:54614 sbtest Sleep 5 None 1 15
101 sampleuser 10.11.8.11:54736 sampledb Sleep 5 None 2 14
106 percona 10.11.8.52:53386 percona Sleep 5 None 0 0
Once we have a collection of processlist logs for a particular period, we will fetch the active users information from them.
$ for f in `ls processlist_DB1*202212*`; do echo $f; cat $f | grep -e ^[1-9] | grep -vie "system\ user\|Uptime" | awk -F' ' '{print $2}' >> /tmp/USERZ;done; cat /tmp/USERZ | sort | uniq -c
11 mytestdb_user
42498 percona
8640 sbtest_user
processlist_DB1*202212* – Processlist file names collected for a specific period (here, December 2022).
This command lists the active users on this node/database. From this output, we can filter by user and then use awk and sort to fetch the list of users accessing a specific database.
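For the per-database filtering step, a sketch could look like this. The function name is ours, and the column layout ($2 = user, $4 = schema) assumes the processlist dump format shown above.

```shell
# List the users that touched a given schema in a processlist dump file.
users_for_db() {
  grep -e '^[1-9]' "$1" \
    | grep -vie 'system user\|Uptime' \
    | awk -v db="$2" '$4 == db {print $2}' \
    | sort | uniq -c | sort -nr
}

# users_for_db processlist_DB1_20221215.log sbtest
```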
At last comes the general log, the ruler of all logs: it records everything that happens in MySQL.
General Log: The General Log is another option for retrieving information about database activity. It logs all activity on the instance, but it has some limitations. Enabling the General Log can consume a lot of disk space and generate a high I/O load due to the constant writing of operations to the log.
Therefore, it is generally better to avoid using the General Log and to use the methods described above instead. It would also be helpful if MySQL had a system table that stored active/inactive information about databases, users, and other items, even after MySQL restarts.
Conclusion
Following the process described above, we can identify the most active databases and users. We can collect information over a specific period and follow the same process to find the list of inactive databases. Any databases and users that do not appear in the output can be considered inactive. However, it is essential to note that we should consult with the application team before marking any databases or users as inactive, as some jobs may only run periodically (e.g., quarterly or semi-annually).
Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!
Download Percona Distribution for MySQL Today