Creating GEO-enabled applications with MySQL 5.6

In my previous post I’ve showed some new MySQL 5.6 features which can be very helpful when creating geo-enabled applications. In this post I will show how we can obtain open-source GIS data, convert it to MySQL and use it in our GEO-enabled applications. I will also present at the upcoming Percona Live conference on this topic.

Data sources (US)

For the U.S. we may look at 2 major data sources:

1. ZIP codes with latitude, longitude and zip code boundaries (polygon). This can be downloaded from the U.S. Census website: US Zipcodes direct link

2. Point of interests, roads, boundaries, etc. The Openstreatmap website provides an excellent source of the GIS data. North American data can be downloaded here (updates frequently)

Data formats and conversion

U.S. Census data is stored in Shapefile (.shp, .shx, .dbf) format. Openstreetmap uses its own XML format (OSM)  and/or Protocolbuffer Binary Format. We can convert this to MySQL with GDAL server (on Linux) and ogr2ogr utility. To convert Shapefile any version of GDAL will work, however, for OSM/PBF we will need to use v. 1.10. The easiest way to get the GDAL 1.10 is to use Ubuntu + ubuntugis-unstable repo.

Here are the commands I’ve used to install:

apt-add-repository ppa:ubuntugis/ubuntugis-unstable
apt-get update
apt-get install gdal-bin

This will install gdal server. Make sure it is latest version and support OSM format:

ogr2ogr --version
GDAL 1.10.1, released 2013/08/26
ogrinfo --formats|grep OSM
-> "OSM" (readonly)

Now we can convert it to MySQL. First, make sure MySQL has the default storage engine = MyISAM (yes, GDAL will use MyISAM to be able to add a spatial index) and the max_allowed_packet is large enough:

mysql -e "set global max_allowed_packet = 16777216*10; set global default_storage_engine = MyISAM; "

ZIP codes and boundaries conversion

Now we can start conversion:

# ogr2ogr -overwrite -progress -f "MySQL" MYSQL:zcta,user=root tl_2013_us_zcta510.shp
0...10...20...30...40...50...60...70...80...90...100 - done.

The only thing we need to specify is db name and user name (assuming it will write to the localhost, otherwise specify the MySQL host). ogr2org will create all needed tables.

mysql> use zcta
Database changed
mysql> show tables;
| Tables_in_zcta     |
| geometry_columns   |
| spatial_ref_sys    |
| tl_2013_us_zcta510 |
3 rows in set (0.00 sec)

The  geometry_columns and spatial_ref_sys are the reference tables only. All zip codes and boundaries will be stored in tl_2013_us_zcta510 table:

mysql> show create table tl_2013_us_zcta510\G
*************************** 1. row ***************************
       Table: tl_2013_us_zcta510
Create Table: CREATE TABLE `tl_2013_us_zcta510` (
  `SHAPE` geometry NOT NULL,
  `zcta5ce10` varchar(5) DEFAULT NULL,
  `geoid10` varchar(5) DEFAULT NULL,
  `classfp10` varchar(2) DEFAULT NULL,
  `mtfcc10` varchar(5) DEFAULT NULL,
  `funcstat10` varchar(1) DEFAULT NULL,
  `aland10` double DEFAULT NULL,
  `awater10` double DEFAULT NULL,
  `intptlat10` varchar(11) DEFAULT NULL,
  `intptlon10` varchar(12) DEFAULT NULL,
1 row in set (0.00 sec)

Example 1. Selecting zip code boundaries for a given zipcode (Durham, NC):

mysql> select astext(shape) from zcta.tl_2013_us_zcta510 where zcta5ce10='27701'\G
*************************** 1. row ***************************
astext(shape): POLYGON((-78.902351 35.988107,-78.902436 35.988116,-78.902597 35.98814,-78.902725 35.988147,-78.902992 35.988143,-78.903117 35.988129,... -78.902351 35.988107))

Example 2. Find ZIP code for the given point (Lat, Lon): Percona HQ in Durham, NC

mysql> SELECT zcta5ce10 as ZIP
FROM tl_2013_us_zcta510
POINT(-78.90423, 36.004122));
| ZIP   |
| 27701 |
1 row in set (0.00 sec)

Converting OpenStreetMap (OSM) data 

Converting OSM is the same:

ogr2ogr -overwrite -progress -f "MySQL" MYSQL:osm,user=root north-america-latest.osm.pbf

Please note, that it will take a long time to convert (8-12+ hours, depends upon the hardware).


mysql> use osm
Database changed
mysql> show tables;
| Tables_in_osm    |
| geometry_columns |
| lines            |
| multilinestrings |
| multipolygons    |
| other_relations  |
| points           |
| spatial_ref_sys  |
7 rows in set (0.00 sec)

Points of interest are stored in “points” table. “Lines” and “multilinestrings” tables contain streets, hiking trails, bike paths, etc:

mysql> show create table points\G
*************************** 1. row ***************************
       Table: points
Create Table: CREATE TABLE `points` (
  `SHAPE` geometry NOT NULL,
  `osm_id` text,
  `name` text,
  `barrier` text,
  `highway` text,
  `ref` text,
  `address` text,
  `is_in` text,
  `place` text,
  `man_made` text,
  `other_tags` text,
1 row in set (0.00 sec)

“Shape” is the point (in spatial format) and other_tags will contain some additional format (in JSON format), this is how ogr2ogr converts it by default. See the GDAL documentation on the OSM driver for more information.

OSM data may contain the zip code, but this is not guaranteed.  Here is the example how we can find all coffee shops in ZIP code 27701:

mysql> select shape into @shape
from zcta.tl_2013_us_zcta510
where zcta5ce10='27701';
Query OK, 1 row affected (0.00 sec)
mysql> SELECT name, st_distance(shape, centroid(@shape) ) as dist
FROM points
WHERE st_within(shape, @shape)
and other_tags like '%"amenity"=>"cafe"%'  limit  10;
| name               | dist                 |
| Blue Coffee Cafe   |  0.00473103443182092 |
| Amelia Cafe        | 0.013825134250907745 |
| Serrano's Delicafe | 0.013472792849827055 |
| Blend              | 0.009123578862847042 |
4 rows in set (0.09 sec)

First, I have selected the ZIP code boundaries into MySQL variable (I could have used subquery, in MySQL 5.6 the performance will be very similar; this is a little bit outside of the current blog post topic, so I will not compare the 2 methods here).

Second I’ve used this variable to find all point which will fit into our boundaries and filter by ”amenity”=>”cafe”. I have to use like ‘%..%’ here, but I’m relying on the spatial index here. Explain plan:

mysql> explain SELECT name, st_distance(shape, centroid(@shape) ) as dist  FROM osm.points   WHERE st_within(shape, @shape)  and other_tags like '%"amenity"=>"cafe"%'  limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: points
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 10
        Extra: Using where


Using open source spatial data is a great way to enrich your application and add new features. You can store this data in MySQL so the application will be able to perform a join to the existing data. For example, if you store ZIP code for a user you can use OpenStreetMap data to show the appropriate content for this user. I will also provide more examples in my upcoming Talk @ Percona Live 2014 as well as share it in this blog in a future post.

I’ve also created a Public Amazon AMI: GIS-MySQL-Ubuntu – ami-ddfdf5b4. The AMI has the ZIP code and OSM data in MySQL 5.6 as well as the GDAL server installed (under /data, mounted on EBS). Please feel free to give it a try. As always I appreciate any comments/questions/thoughts/etc.

The post Creating GEO-enabled applications with MySQL 5.6 appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by