Jun 09, 2015

Auditing MySQL with McAfee and MongoDB

Greetings everyone! Let’s discuss a third-party auditing solution for MySQL and how we can leverage MongoDB® to make sense of all of that data.

The McAfee MySQL Audit plugin does a great job of capturing, at a low level, the activity within a MySQL server. It does this through some non-standard APIs, which is why installing and configuring the plugin can be a bit difficult. By default, the audit information is stored in JSON format in a text file.

There is one JSON object for each action that takes place within MySQL. If a user logs in, there’s an object. If that user queries a table, there’s an object. Imagine 1,000 active connections from an application, each doing 2 queries per second: that’s 2,000 JSON objects per second being written to the audit log, or about 172,800,000 audit entries after 24 hours (2,000 × 86,400 seconds).

How does one make sense of that many JSON objects? One option would be to write your own parser in $YOUR_FAVORITE_LANGUAGE to convert the JSON into INSERT statements and write the data back to MySQL (note: if you do this, you can whitelist the destination table within the plugin so that these INSERTs are not themselves audited; a rough sketch of this approach follows below). Or, we can use a system designed to import, store and query JSON objects, such as MongoDB.
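If you do roll your own, here is a minimal sketch in Go, assuming one JSON object per line in the audit file (the same layout mongoimport consumes later in this post). The audit.events destination table is hypothetical, and real code would need proper SQL escaping (ideally prepared statements) rather than this naive Printf:

// parse_audit.go: a minimal sketch, not production code. Field names follow
// the sample audit document shown later in this post.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// auditEvent captures just the fields we care about; extra fields are ignored.
type auditEvent struct {
	MsgType string `json:"msg-type"`
	Date    string `json:"date"`
	User    string `json:"user"`
	Host    string `json:"host"`
	Cmd     string `json:"cmd"`
	Query   string `json:"query"`
}

func main() {
	f, err := os.Open("mysql-audit.json")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long queries
	for scanner.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(scanner.Bytes(), &ev); err != nil {
			continue // skip partial or garbled lines
		}
		// The audit.events table is hypothetical, and this naive quoting would
		// break on queries containing quotes; use prepared statements for real.
		fmt.Printf("INSERT INTO audit.events (ts, user, host, cmd, qry) VALUES ('%s', '%s', '%s', '%s', '%s');\n",
			ev.Date, ev.User, ev.Host, ev.Cmd, ev.Query)
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}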

Install McAfee Audit Plugin

First we need to download the source code for the plugin, as well as the source code for the specific MySQL version you are running. This is not a complete step-by-step HOWTO on installing the plugin; just some high-level points.

My client for this exercise is still on Percona Server 5.1.73, so we need the source for that EXACT version from percona.com.

We can clone the mcafee/mysql-audit repository using git.

Unpack the MySQL source and compile it; don’t run “make install”, as only “./configure” and “make” are necessary.

Now compile the plugin. You may want to read the detailed instructions.

This next step is tricky, and really only necessary if you are not using vanilla MySQL. It is required to allow the plugin to use those non-standard APIs I mentioned earlier: you need to extract the symbol offsets for the plugin to work. Follow the instructions given; a rough example of what this looks like is below.
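For reference, the plugin source tree ships an offset-extract.sh helper for this. The invocation below is an assumption to verify against the plugin’s README, since exact paths and output vary by version:

# Hypothetical invocation; needs gdb and a mysqld binary with symbols.
./offset-extract.sh /path/to/Percona-Server-5.1.73/sql/mysqld

Per the plugin docs, the emitted offsets then go into the plugin’s audit_offsets variable in my.cnf.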

Once that is all done, you can:

INSTALL PLUGIN AUDIT SONAME 'libaudit_plugin.so';
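Assuming it loads cleanly, you can confirm with a standard plugin check (plain MySQL, nothing plugin-specific):

SELECT PLUGIN_NAME, PLUGIN_STATUS FROM information_schema.PLUGINS WHERE PLUGIN_NAME LIKE 'AUDIT%';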

If the plugin fails to load, check MySQL’s error log for the reason why and consult the plugin documentation on how to resolve it.

We now need to enable audit logging because nothing is enabled by default.

SET GLOBAL audit_record_cmds = "select,insert,update,delete";
SET GLOBAL audit_json_file = ON;
SET GLOBAL audit_record_objs = "*.*,{}";
SET GLOBAL audit_force_record_logins = ON;
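Note that SET GLOBAL settings do not survive a restart. To persist them, the equivalent options can go into my.cnf; the variable names below simply mirror the ones above, so double-check the exact syntax against the plugin docs:

[mysqld]
audit_json_file = on
audit_record_cmds = select,insert,update,delete
audit_record_objs = *.*,{}
audit_force_record_logins = on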

Look inside @@datadir and you should see a file called mysql-audit.json. You can tail -f this file if you’d like to watch it to make sure data is being written.

If you’d like some more background reading on the audit plugin, check out Fernando’s post on Experiences with McAfee Audit Plugin.

Setting Up MongoDB

Let me begin by stating that this is my first time dealing with MongoDB in any real sense. I spun up an EC2 instance in AWS (m3.large, CentOS 6) and installed MongoDB using yum and the Mongo repositories.

As the ephemeral storage for my instance had been mounted at /opt, I changed just this one option in the supplied /etc/mongod.conf and restarted mongo (service mongod restart).

dbpath=/opt/mongo

I then copied the mysql-audit.json from the MySQL host using SSH:

[percona@mysql-host ~]$ scp -i .ssh/amazon.pem /data/mysql/mysql-audit.json root@54.177.22.22:/tmp/

Then I imported this JSON file directly into MongoDB:

[root@ip-10-255-8-15 ~]# mongoimport --db test --collection audit --drop --file /tmp/mysql-audit.json

The above mongoimport command specifies the database to import into (test) and the collection (audit). I also specify --drop, which drops the collection before importing. This is necessary because the Audit Plugin appends to the JSON file; if we repeated these import steps without --drop, we would be duplicating data.

If there is enough interest, via the comments below, I will investigate the potential of using the socket functionality of the Audit Plugin to have the events stream directly into mongo.

For now, though, it’s a wash-rinse-repeat cycle, although the plugin can rotate the JSON audit log after a certain amount of time, letting you import each file on a daily basis, as sketched below.
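As a sketch of that daily cycle, a cron entry on the Mongo host might look like the following. The dated rotated-file name is an assumption; adjust it to however your rotation actually names files:

# Hypothetical: import yesterday's rotated audit log every night at 00:05
5 0 * * * mongoimport --db test --collection audit --file /tmp/mysql-audit.json.$(date -d yesterday +\%Y\%m\%d)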

Making Data Make Sense

Here is a sample “document” (i.e., an audit event) created by the Audit Plugin.

{
	"_id" : ObjectId("5571ea51b1e714b8d6d804c8"),
	"msg-type" : "activity",
	"date" : "1433438419388",
	"thread-id" : "10214180",
	"query-id" : "295711011",
	"user" : "activebatchSVC",
	"priv_user" : "activebatchSVC",
	"host" : "ecn.corp",
	"ip" : "10.2.8.9",
	"cmd" : "select",
	"objects" : [
		{
			"db" : "",
			"name" : "*",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "markets_source_tfutvol_eab",
			"obj_type" : "VIEW"
		},
		{
			"db" : "historical",
			"name" : "futureopt",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "securities_futures_optdef",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "markets_source_tfutvol_eab",
			"obj_type" : "VIEW"
		},
		{
			"db" : "historical",
			"name" : "futureopt",
			"obj_type" : "TABLE"
		},
		{
			"db" : "risque",
			"name" : "securities_futures_optdef",
			"obj_type" : "TABLE"
		}
	],
	"query" : "SELECT far, bar, baz FROM mytable"
}

!! MongoDB BUG !!

Notice that the last field in the document is named “query”. When I attempted some basic aggregate() functions on this field, I received errors about bad syntax. After much frustration, lots of Googling and repeated testing, I could only conclude that “query” is a reserved word in MongoDB. There is little to no documentation on this, aside from an almost three-year-old bug report that simply helped confirm my suspicion.

To work around this, let’s rename all of the “query” fields to “qry”:

db.audit.update({}, { $rename: { "query": "qry"} }, false, true);

Now we can begin.

Basic Command Counters

Using any of the “top level” fields in each document, we can run reports (called aggregates in Mongo). An easy one is to get a list of all unique “commands” and how many times each occurred.

> db.audit.aggregate([ { $group: { "_id": "$cmd", "count": { $sum: 1 } } } ]);
{ "_id" : "Failed Login", "count" : 2 }
{ "_id" : "select", "count" : 458366 }
{ "_id" : "Connect", "count" : 455090 }
{ "_id" : "insert", "count" : 2 }
{ "_id" : "Quit", "count" : 445025 }
{ "_id" : null, "count" : 1 }

Breaking down the command above, we are grouping all values in the “cmd” field and counting them up. The SQL equivalent would be:

SELECT cmd, count(cmd) FROM audit GROUP BY cmd;

User Counts

Let’s get a list and count of all user activities. This will include any of the commands listed in the previous aggregate.

> db.audit.aggregate([ { $group: { "_id": "$user", "count": { $sum: 1 } } } ]);
{ "_id" : "baw", "count" : 1883 }
{ "_id" : "eq_shrd", "count" : 1 }
{ "_id" : "reski", "count" : 3452 }
{ "_id" : "alin", "count" : 1 }
{ "_id" : "oey", "count" : 62 }
{ "_id" : "dule", "count" : 380062 }
{ "_id" : "ashi", "count" : 802 }
{ "_id" : "tech_shrd", "count" : 392464 }

A couple of interesting things come out here. First, the tech_shrd user performs the most ‘activities’ of all users. Is this expected? Is this normal? Your environment will determine that. (A sorted variant of this aggregation is sketched below.)
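To make the heavy hitters jump out immediately, the same pipeline just needs a $sort stage appended, as the table-activity example later in this post also does:

> db.audit.aggregate([
... { $group: { "_id": "$user", "count": { $sum: 1 } } },
... { $sort: { "count": -1 } }
... ]);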

Specific User Activities

Let’s pick a specific user and get their activity counts to make sure they aren’t doing anything weird.

> db.audit.aggregate([
... { $match: { "user": "tech_shrd" } },
... { $group: { "_id": "$cmd", "count": { $sum: 1 } } }
... ]);
{ "_id" : "select", "count" : 132970 }
{ "_id" : "Connect", "count" : 133919 }
{ "_id" : "Quit", "count" : 125575 }

The SQL equivalent:

SELECT cmd, count(cmd) FROM audit WHERE user = 'tech_shrd' GROUP BY cmd;

Activities By User

We saw above that there were 2 insert commands. Who ran those?

> db.audit.aggregate([
... { $match: { "cmd": "insert" } },
... { $group: { "_id": "$user", "count": { $sum: 1 } } }
... ]);
{ "_id" : "graz", "count" : 2 }

More simply, we could have just run the following to see the entire document/record, which would include the SQL the user executed, the timestamp, hostname, etc.

> db.audit.find({ "cmd": "insert" });

The SQL equivalents:

SELECT user, count(user) FROM audit WHERE cmd = 'insert' GROUP BY user;
SELECT * FROM audit WHERE cmd = 'insert';

Table Activity

The most complex example I could come up with was trying to find out how many times each table was referenced. In theory, with weeks or even months of audit data, we could decide which tables aren’t needed any longer by the application.

> db.audit.aggregate(
... { $unwind: "$objects" },
... { $group: { _id : "$objects.name", count: { $sum: 1 } } },
... { $sort: { "count": -1 } }
... );
{ "_id" : "*", "count" : 17359 }
{ "_id" : "swaps", "count" : 4392 }
{ "_id" : "futureopt", "count" : 3666 }
...(more)

You’ll notice in the sample document above that “objects” is an array of objects, with one element for each table/view referenced in the ‘qry’ field. We need to “unwind” this array into single elements before we can count them. (If someone knows a better way, please let me know.) The Audit Plugin uses “*” to represent a derived table from a sub-SELECT, which has no proper name. We can remove all of these using:

> db.audit.update({ }, { $pull: { "objects": { "name": "*" } } }, false, true);

Audit Plugin Caveat: The ‘objects’ array is not a distinct list of the tables involved. For example, a SELECT statement that self-joins twice would produce three identical elements in the ‘objects’ array for that audit record, which may skew results. If anyone knows a cool Mongo trick to remove duplicates, please share in the comments; one candidate is sketched below.
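One candidate trick (a sketch, not battle-tested): collapse each audit record’s object names into a set with $addToSet first, so a table counts at most once per statement, then unwind and count as before:

> db.audit.aggregate([
... { $unwind: "$objects" },
... { $group: { "_id": "$_id", "names": { $addToSet: "$objects.name" } } },
... { $unwind: "$names" },
... { $group: { "_id": "$names", "count": { $sum: 1 } } },
... { $sort: { "count": -1 } }
... ]);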

Conclusion

For a quick wrap-up: we installed the McAfee Audit Plugin, exported some audit data, set up a MongoDB instance in AWS and imported the audit data. As you can see, there are plenty of possibilities for the kinds of information you can gather. Feel free to comment on an aggregation you’d like to see if we were running this type of audit on your system.

Cheers,
Matthew


Oct 16, 2014

Percona Toolkit for MySQL with MySQL-SSL Connections

I recently had a client ask me how to use Percona Toolkit tools over an SSL connection to MySQL (MySQL-SSL). SSL connections aren’t widely used in MySQL because most installations live within an internal network. Still, there are cases where you could be accessing MySQL over the public internet, or even over a public “private” network (e.g., a WAN between two colo datacenters). To keep packet sniffers at bay, the connection to MySQL should be encrypted.

If you are connecting to Amazon RDS from home or the office (i.e., not from within the AWS network), you had better be encrypted!

As there is already a MySQL Performance Blog post on how to set up MySQL SSL connections, we can skip that and dive right in.

As you probably know, the mysql client can read multiple configuration files, the primary one being /etc/my.cnf. You probably also know that the client reads a config file in your $HOME directory: .my.cnf (that’s dot-my-dot-cnf). It is inside this file that we can set parameters for our shell user account when connecting to MySQL hosts.

Percona Toolkit uses Perl’s DBI with the DBD::mysql driver to make connections to MySQL hosts. This library is linked against the libmysqlclient C library, which is responsible for reading and parsing the global config file as well as your $HOME config file. Let’s set some options there that are not directly available in the toolkit scripts. Using $MY_FAVORITE_EDITOR, edit your $HOME/.my.cnf as such:

[client]
user = myuser
password = foobar
ssl-ca = /Users/drmac/ca-cert.pem

You must use the absolute path to the CA file. Relative paths won’t cut it:

ERROR 2026 (HY000): SSL connection error: SSL_CTX_set_default_verify_paths failed

Test your connection first using the mysql client:

asura:~ drmac$ mysql -h 74.13.19.17 -e "SHOW STATUS LIKE 'Ssl_cipher'"
+---------------+--------------------+
| Variable_name | Value              |
+---------------+--------------------+
| Ssl_cipher    | DHE-RSA-AES256-SHA |
+---------------+--------------------+

Excellent! Now we can use any Percona Toolkit script and connect via SSL:

asura:~ drmac$ pt-table-checksum -h 74.13.19.17 -d foo -t zipcodes
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-13T14:10:02      0      0    45358       7       0   5.959 foo.zipcodes

Sweet!
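One aside while we’re here (not toolkit-specific): if you want the server to refuse any unencrypted session for this account outright, MySQL can enforce SSL at the account level. A minimal example, assuming the myuser account from the config above:

GRANT USAGE ON *.* TO 'myuser'@'%' REQUIRE SSL;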

Unfortunately, Percona Toolkit scripts are hard-coded to read the [client] section of your .my.cnf. If you don’t want to overwrite any existing configuration that may be present, you can create a separate configuration file and pass it to any toolkit script using -F. Again, relative paths won’t work here; use the absolute path, even if you are in the same directory.

asura:~ drmac$ cp .my.cnf mytestconfig.cnf
asura:~ drmac$ rm .my.cnf
asura:~ drmac$ pt-table-checksum -h 74.13.19.17 -d foo -t zipcodes -F /Users/drmac/mytestconfig.cnf

Now you can continue using our awesome tools in a secure manner.

Cheers!
-Matthew


Jul 07, 2014

Looking out for max values in integer-based columns in MySQL

Yay! My first blog post! As long as at least one person finds it useful, I’ve done my job. ;) Recently, one of my long-term clients noticed that while their INSERTs were succeeding, a particular counter column was not incrementing. A quick investigation determined that the column was of type int(11) and had reached the maximum value of 2147483647. We fixed this by using pt-online-schema-change to change the column to int(10) unsigned, allowing values up to 4294967295.

My client was now concerned about all of his other integer-based columns and wanted me to check them all. So I wrote a quick-n-dirty script in Go that compares each integer-based column’s current maximum value against the maximum allowed for that column type.

You can find the full source code in my git repo.

Here’s a quick overview; the code is pretty simple.

First we connect to MySQL and verify the connection:

// Imports needed for this excerpt (the full source declares these, along with
// the mysqlUn, mysqlPw, hostToCheck and dbToCheck flags used below):
//
//	import (
//		"database/sql"
//		"fmt"
//		"os"
//
//		_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver for sql.Open
//	)

db, err := sql.Open("mysql", fmt.Sprintf("%s:%s@tcp(%s:3306)/%s", mysqlUn, mysqlPw, hostToCheck, dbToCheck))
if err != nil {
	fmt.Printf("Error connecting to MySQL on '%s': %s\n", hostToCheck, err)
	db.Close()
	os.Exit(1)
}
// Check connection is alive; sql.Open alone does not establish a connection.
err = db.Ping()
if err != nil {
	fmt.Printf("Unable to ping mysql at '%s': %s\n", hostToCheck, err)
	db.Close()
	os.Exit(1)
}

Next, we query the information_schema.columns table for the names of all integer-based columns and calculate the maximum value each can hold (credit for the clever SQL goes to Peter Boros). The CASE expression yields the unsigned maximum for each type, and the >> bit-shift halves it (shifting right by one bit) when the column is signed; a concrete example follows the snippet.

// Construct our base i_s query
var tableExtraSql string
if tableToCheck != "" {
	tableExtraSql = fmt.Sprintf("AND TABLE_NAME = '%s'", tableToCheck)
}
baseSql := fmt.Sprintf(`
	SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, (CASE DATA_TYPE
	  WHEN 'tinyint' THEN 255
	  WHEN 'smallint' THEN 65535
	  WHEN 'mediumint' THEN 16777215
	  WHEN 'int' THEN 4294967295
	  WHEN 'bigint' THEN 18446744073709551615
	END >> IF(LOCATE('unsigned', COLUMN_TYPE) > 0, 0, 1)) AS MAX_VALUE
	FROM information_schema.columns
	WHERE TABLE_SCHEMA = '%s' %s
	AND DATA_TYPE IN ('tinyint', 'smallint', 'int', 'mediumint', 'bigint')`, dbToCheck, tableExtraSql)
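To make the shift trick concrete, here is what it does for a signed versus unsigned int (plain MySQL arithmetic):

mysql> SELECT 4294967295 >> 1 AS signed_int_max, 4294967295 >> 0 AS unsigned_int_max;
+----------------+------------------+
| signed_int_max | unsigned_int_max |
+----------------+------------------+
|     2147483647 |       4294967295 |
+----------------+------------------+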

Now that we have the list of columns to check, we simply loop over the result set, get the MAX() of each column and print a pretty report.

// Loop over rows received from i_s query above.
for columnsToCheck.Next() {
	err := columnsToCheck.Scan(&tableName, &columnName, &columnType, &maxValue)
	if err != nil {
		log.Fatal("Scanning Row Error: ", err)
	}
	// Check this column
	query := fmt.Sprintf("SELECT MAX(%s), ROUND((MAX(%s)/%d)*100, 2) AS ratio FROM %s.%s",
		columnName, columnName, maxValue, dbToCheck, tableName)
	err = db.QueryRow(query).Scan(&currentValue, &ratio)
	if err != nil {
		fmt.Printf("Couldn't get MAX(%s.%s): %sn", tableName, columnName, err)
		fmt.Println("SQL: ", query)
		continue
	}
	// Print report
	if ratio.Valid && ratio.Float64 >= float64(reportPct) {
		fmt.Printf("'%s'.'%s' - Type: '%s' - ", tableName, columnName, columnType)
		fmt.Printf("ColumMax: '%d'", maxValue)
		fmt.Printf(" - CurVal: '%d'", currentValue.Int64)
		fmt.Printf(" - FillRatio: '%.2f'n", ratio.Float64)
	}
}

There are more options to the app that allow you to silence some of the verbosity and to only print report lines where the value-to-max ratio exceeds a user-defined threshold. If you have frequently changing schemas, this lets you cron the app and only receive email reports when there is a potential problem. Otherwise, the tool can be useful to run once a month or quarter, just to verify things are in good standing.

Like I said before, hopefully this helps at least one person catch a potential problem sooner rather than later.

