Feb
03
2016
--

New GIS Features in MySQL 5.7

MySQL 5.7

MySQL 5.7MySQL 5.7 has been released, and there are some exciting new features now available that I’m going to discuss in this blog — specifically around geographic information system (GIS).

I’ve used GIS features in MySQL for a long time. In my previous blog entries I’ve shown how to create geo-enabled applications with MySQL 5.6 and use MySQL 5.6 geo-spatial functions. In this blog post, I’ll look into what is new in MySQL 5.7 and how we can use those new features in practice for geo-enabled applications.

New in MySQL 5.7

MySQL 5.7 introduces the following major improvements and features for GIS:

  1. Spatial indexes for InnoDB. Finally it is here! This was a long overdue feature, which also prevented many companies from converting all tables to InnoDB.
  2. st_distance_sphere: native function to calculate a distance between two points on earth. Finally it is here as well! Like many others, I’ve created my stored procedure to calculate the distance between points on earth using haversine formula. The native function is ~20x faster than the stored procedure (in an artificial benchmark, see below). This is not surprising, as stored procedures are slow computationally – especially for trigonometrical functions.
  3. New functions: GeoHash and GeoJSON. With GeoJSON we can generate the results that are ready for visualizing on Google Maps.
  4. New GIS implementation based on Boost.Geometry library. This is great news, as originally GIS was implemented independently from scratch with a very limited set of features. Manyi Lu from MySQL server team provides more reasoning behind the choice of Boost.Geometry.

This is the great news. The bad news is that except for the st_distance_sphere, all other functions use planar geometry (no change since MySQL 5.6) and do not support Spatial Reference System Identifier (SRID). That means that if I want to calculate the distance of my favorite bike path in miles or kilometers, I’ll still have to use a stored function (see below for an example) or write an application code for that. Native function st_distance will ignore SRID for now and return a value which represents a distance on a planar – not very useful for our purposes (may be useful for order by / compare).

Distance on Sphere

MySQL 5.7 introduces the function st_distance_sphere, which uses a haversine formula to calculate distance. He is the example:

mysql> select st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954));
+--------------------------------------------------------------------------------+
| st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)) |
+--------------------------------------------------------------------------------+
|                                                             3855600.7928957273 |
+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The distance is in meters by default (you can also change the radius of the earth to meters using the 3rd optional parameter, default: 6,370,986). Although our earth is represented as an oblate spheroid, all practical applications use the distance on a sphere. The difference between the haversine formula and more precise (and much slower) functions is negligible for our purposes.

The st_distance_sphere is much faster than using stored routines. Here is the artificial benchmark:

mysql> select benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947));
+-----------------------------------------------------------------------------------------+
| benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947)) |
+-----------------------------------------------------------------------------------------+
|                                                                                       0 |
+-----------------------------------------------------------------------------------------+
1 row in set (22.55 sec)
mysql> select benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)));
+----------------------------------------------------------------------------------------------------+
| benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954))) |
+----------------------------------------------------------------------------------------------------+
|                                                                                                  0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (0.77 sec)

haversine_distance_sp is a stored routine implementation of the same algorithm.

InnoDB GIS example: find 10 restaurants near me 

In my previous blog post I’ve demonstrated how to use st_within function to find restaurants inside my zipcode (US postal code) and sort by distance. In MySQL 5.7 there will be 2 changes:

  1. We can use InnoDB table
  2. We can use st_distance_sphere function

For this example, I’ve converted Open Street Map data to MySQL and then created a new InnoDB table:

CREATE TABLE `points_new` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` text,
  `name` text,
  `barrier` text,
  `highway` text,
  `ref` text,
  `address` text,
  `is_in` text,
  `place` text,
  `man_made` text,
  `other_tags` text,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=InnoDB AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1

SHAPE is declared as geometry (and stores points in this table). We also have SPATIAL KEY SHAPE in the InnoDB table.

The following query will find all cafe or restaurants in Durham, NC (zipcode: 27701):

SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist
FROM points_new
WHERE st_within(shape,
      (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
	  and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
	  and name is not null
ORDER BY dist asc LIMIT 10;

Table tl_2013_us_zcta510 stores the shapes of polygons for all US zipcodes. (It needs to be converted to MySQL.) In this example I’m using st_within to filter only the POIs I need, and st_distance_sphere to get the distance from my location (-78.9064543 35.9975194 are the coordinates of Percona’s office in Durham) to the restaurants.

Explain plan:

mysql> EXPLAIN
    -> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    ->   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    ->   and name is not null
    -> ORDER BY dist asc LIMIT 10G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: points_new
   partitions: NULL
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 21
     filtered: 18.89
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: tl_2013_us_zcta510
   partitions: NULL
         type: ref
possible_keys: zcta5ce10
          key: zcta5ce10
      key_len: 8
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 1 warning (0.00 sec)

That looks pretty good: MySQL is using and index on the SHAPE field (even with the subquery, btw).

Results:

mysql> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist, st_astext(shape)
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    ->   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    ->   and name is not null
    -> ORDER BY dist asc LIMIT 10;
+------------+----------------------------+--------+--------------------------------------+
| osm_id     | name                       | dist   | st_astext(shape)                     |
+------------+----------------------------+--------+--------------------------------------+
| 880747417  | Pop's                      | 127.16 | POINT(-78.9071795 35.998501)         |
| 1520441350 | toast                      | 240.55 | POINT(-78.9039761 35.9967069)        |
| 2012463902 | Pizzeria Toro              | 256.44 | POINT(-78.9036457 35.997125)         |
| 398941519  | Parker & Otis              | 273.39 | POINT(-78.9088833 35.998997)         |
| 881029843  | Torero's                   | 279.96 | POINT(-78.90829140000001 35.9995516) |
| 299540833  | Fishmonger's               | 300.01 | POINT(-78.90850250000001 35.9996487) |
| 1801595418 | Lilly's Pizza              | 319.83 | POINT(-78.9094462 35.9990732)        |
| 1598401100 | Dame's Chicken and Waffles | 323.82 | POINT(-78.9031929 35.9962871)        |
| 685493947  | El Rodeo                   | 379.18 | POINT(-78.909865 35.999523)          |
| 685504784  | Piazza Italia              | 389.06 | POINT(-78.9096472 35.9998794)        |
+------------+----------------------------+--------+--------------------------------------+
10 rows in set (0.13 sec)

0.13 seconds response time on AWS t2.medium box sounds reasonable to me. The same query on the MyISAM table shows ~same response time: 0.14 seconds.

GeoJSON feature and Google Maps

Another nice feature of MySQL 5.7 GIS is GeoJSON function: you can convert your result set to GeoJSON, which can be used with other applications (for example Google Maps API).

Let’s say I want to visualize the above result set on Google Map. As the API requires a specific format, I can use concat / group_concat to apply the format inside the SQL:

SELECT CONCAT('{
  "type": "FeatureCollection",
  "features": [
  ',
   GROUP_CONCAT('{
   "type": "Feature",
      "geometry": ', ST_AsGeoJSON(shape), ',
      "properties": {}
   }'),
  ']
}') as j
FROM points_new
WHERE st_within(shape,
      (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
	  and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
	  and name is not null

I will get all the restaurants and cafes in zipcode 27701. Here I’m using ST_AsGeoJSON(shape) to convert to GeoJSON, and concat/group_concat to “nest” the whole result into the format suitable for Google Maps.

Result:

mysql> set group_concat_max_len = 1000000;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT CONCAT('{
    '>   "type": "FeatureCollection",
    '>   "features": [
    '>   ',
    ->    GROUP_CONCAT('{
    '>    "type": "Feature",
    '>       "geometry": ', ST_AsGeoJSON(shape), ',
    '>       "properties": {}
    '>    }'),
    ->   ']
    '> }') as j
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    ->   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    ->   and name is not null
*************************** 1. row ***************************
j: {
  "type": "FeatureCollection",
  "features": [
  {
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.890852, 35.9903403]},
      "properties": {}
   },{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.8980807, 35.9933562]},
      "properties": {}
   },{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.89972490000001, 35.995879]},
      "properties": {}
   } ... ,{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.9103211, 35.9998494]},
      "properties": {}
   },{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.9158326, 35.9967114]},
      "properties": {}
   }]
}
1 row in set (0.14 sec)

I did not include the full result set for the lack of space; I also had to change the group concat max length, otherwise MySQL will cut the result of the group_concat function.

Now I can visualize it:

 

 

 

 

 

 

 

 

 

 

Example: Find the longest bike path

MySQL 5.7 (as well as the older versions) supports st_length function to calculate a length of a linestring. However, even in MySQL 5.7 st_length can’t calculate the distance on earth. To find the distance of a linestring I’ve created a very simple stored procedure:

DELIMITER //
CREATE DEFINER=CURRENT_USER() FUNCTION `ls_distance_sphere`(ls GEOMETRY) RETURNS DECIMAL(20,8)
    DETERMINISTIC
BEGIN
DECLARE i, n INT DEFAULT 0;
DECLARE len DECIMAL(20,8) DEFAULT 0;
SET i = 1;
SET n = ST_NumPoints(ls);
 WHILE i < n DO
    SET len = len +  st_distance_sphere(st_pointN(ls, i), st_pointN(ls, i+1));
SET i = i + 2;
 END WHILE;
 RETURN len;
END //
DELIMITER ;

As the Open Street Map data has the information about roads in North America, we can use this function to calculate the length (in meters) for every road it stores:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null limit 10;
+---------------------------------------+---------------------------+
| name                                  | ls_distance_sphere(shape) |
+---------------------------------------+---------------------------+
| Highbury Park Drive Bypass            |                0.97386664 |
| Ygnacio Canal Trail                   |                0.86093199 |
| South Marion Parkway                  |                1.06723424 |
| New River Greenway                    |                1.65705401 |
| Northern Diversion Trail              |                2.08269808 |
| Gary L. Haller Trail;Mill Creek Trail |                2.09988209 |
| Loop 1                                |                2.05297129 |
| Bay Farm Island Bicycle Bridge        |                2.51141623 |
| Burrard Street                        |                1.84810259 |
| West 8th Avenue                       |                1.76338236 |
+---------------------------------------+---------------------------+
10 rows in set (0.00 sec)

Index the polygon/area distance using MySQL 5.7 virtual fields

To really answer the question “what is the longest bikepath (cyclepath) in North America?” we will have to order by stored function result. This will cause a full table scan and a filestort, which will be extremely slow for 30 millions of rows. The standard way to fix this is to materialize this road distance: add an additional field to the table and store the distance there.

In MySQL 5.7 we can actually use Generated (Virtual) Columns feature:

CREATE TABLE `lines_new` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` int(11) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  `highway` varchar(60) DEFAULT NULL,
  `waterway` text,
  `aerialway` text,
  `barrier` text,
  `man_made` text,
  `other_tags` text,
  `linestring_length` decimal(15,8) GENERATED ALWAYS AS (st_length(shape)) VIRTUAL,
  PRIMARY KEY (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`),
  KEY `linestring_length` (`linestring_length`),
  KEY `highway_len` (`highway`,`linestring_length`)
) ENGINE=InnoDB AUTO_INCREMENT=27077492 DEFAULT CHARSET=latin1

Unfortunately, MySQL 5.7 does not support non-native functions (stored procedures or UDF) in generated columns, so I have to use st_length in this example. Ordering by value of st_length may be OK though:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10;
+-----------------------------+---------------------------+
| name                        | ls_distance_sphere(shape) |
+-----------------------------+---------------------------+
| Confederation Trail         |            55086.92572725 |
| Cowboy Trail                |            43432.06768706 |
| Down East Sunrise Trail     |            42347.39791330 |
| Confederation Trail         |            29844.91038542 |
| Confederation Trail         |            26141.04655981 |
| Longleaf Trace              |            29527.66063726 |
| Cardinal Greenway           |            30613.24487294 |
| Lincoln Prairie Grass Trail |            19648.26787218 |
| Ghost Town Trail            |            25610.52158647 |
| Confederation Trail         |            27086.54829531 |
+-----------------------------+---------------------------+
10 rows in set (0.02 sec)

The query is very fast as it uses an index on both highway and linestring:

mysql> explain select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lines_new
   partitions: NULL
         type: ref
possible_keys: highway_len
          key: highway_len
      key_len: 63
          ref: const
         rows: 119392
     filtered: 90.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Conclusion

MySQL 5.7 contains a great set of features to work with geospatial data. Finally, spatial indexes are supported in InnoDB; st_distance_sphere as a native function is very useful. Unfortunately, other spatial functions only work with planar coordinates and do not support SRID. I hope this will be fixed in the new releases.

Mar
24
2014
--

Creating GEO-enabled applications with MySQL 5.6

In my previous post I’ve showed some new MySQL 5.6 features which can be very helpful when creating geo-enabled applications. In this post I will show how we can obtain open-source GIS data, convert it to MySQL and use it in our GEO-enabled applications. I will also present at the upcoming Percona Live conference on this topic.

Data sources (US)

For the U.S. we may look at 2 major data sources:

1. ZIP codes with latitude, longitude and zip code boundaries (polygon). This can be downloaded from the U.S. Census website: US Zipcodes direct link

2. Point of interests, roads, boundaries, etc. The Openstreatmap website provides an excellent source of the GIS data. North American data can be downloaded here (updates frequently)

Data formats and conversion

U.S. Census data is stored in Shapefile (.shp, .shx, .dbf) format. Openstreetmap uses its own XML format (OSM)  and/or Protocolbuffer Binary Format. We can convert this to MySQL with GDAL server (on Linux) and ogr2ogr utility. To convert Shapefile any version of GDAL will work, however, for OSM/PBF we will need to use v. 1.10. The easiest way to get the GDAL 1.10 is to use Ubuntu + ubuntugis-unstable repo.

Here are the commands I’ve used to install:

apt-add-repository ppa:ubuntugis/ubuntugis-unstable
apt-get update
apt-get install gdal-bin

This will install gdal server. Make sure it is latest version and support OSM format:

ogr2ogr --version
GDAL 1.10.1, released 2013/08/26
ogrinfo --formats|grep OSM
-> "OSM" (readonly)

Now we can convert it to MySQL. First, make sure MySQL has the default storage engine = MyISAM (yes, GDAL will use MyISAM to be able to add a spatial index) and the max_allowed_packet is large enough:

mysql -e "set global max_allowed_packet = 16777216*10; set global default_storage_engine = MyISAM; "

ZIP codes and boundaries conversion

Now we can start conversion:

# ogr2ogr -overwrite -progress -f "MySQL" MYSQL:zcta,user=root tl_2013_us_zcta510.shp
0...10...20...30...40...50...60...70...80...90...100 - done.

The only thing we need to specify is db name and user name (assuming it will write to the localhost, otherwise specify the MySQL host). ogr2org will create all needed tables.

mysql> use zcta
Database changed
mysql> show tables;
+--------------------+
| Tables_in_zcta     |
+--------------------+
| geometry_columns   |
| spatial_ref_sys    |
| tl_2013_us_zcta510 |
+--------------------+
3 rows in set (0.00 sec)

The  geometry_columns and spatial_ref_sys are the reference tables only. All zip codes and boundaries will be stored in tl_2013_us_zcta510 table:

mysql> show create table tl_2013_us_zcta510\G
*************************** 1. row ***************************
       Table: tl_2013_us_zcta510
Create Table: CREATE TABLE `tl_2013_us_zcta510` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `zcta5ce10` varchar(5) DEFAULT NULL,
  `geoid10` varchar(5) DEFAULT NULL,
  `classfp10` varchar(2) DEFAULT NULL,
  `mtfcc10` varchar(5) DEFAULT NULL,
  `funcstat10` varchar(1) DEFAULT NULL,
  `aland10` double DEFAULT NULL,
  `awater10` double DEFAULT NULL,
  `intptlat10` varchar(11) DEFAULT NULL,
  `intptlon10` varchar(12) DEFAULT NULL,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=MyISAM AUTO_INCREMENT=33145 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Example 1. Selecting zip code boundaries for a given zipcode (Durham, NC):

mysql> select astext(shape) from zcta.tl_2013_us_zcta510 where zcta5ce10='27701'\G
*************************** 1. row ***************************
astext(shape): POLYGON((-78.902351 35.988107,-78.902436 35.988116,-78.902597 35.98814,-78.902725 35.988147,-78.902992 35.988143,-78.903117 35.988129,... -78.902351 35.988107))

Example 2. Find ZIP code for the given point (Lat, Lon): Percona HQ in Durham, NC

mysql> SELECT zcta5ce10 as ZIP
FROM tl_2013_us_zcta510
WHERE
st_contains(shape,
POINT(-78.90423, 36.004122));
+-------+
| ZIP   |
+-------+
| 27701 |
+-------+
1 row in set (0.00 sec)

Converting OpenStreetMap (OSM) data 

Converting OSM is the same:

ogr2ogr -overwrite -progress -f "MySQL" MYSQL:osm,user=root north-america-latest.osm.pbf

Please note, that it will take a long time to convert (8-12+ hours, depends upon the hardware).

Tables:

mysql> use osm
Database changed
mysql> show tables;
+------------------+
| Tables_in_osm    |
+------------------+
| geometry_columns |
| lines            |
| multilinestrings |
| multipolygons    |
| other_relations  |
| points           |
| spatial_ref_sys  |
+------------------+
7 rows in set (0.00 sec)

Points of interest are stored in “points” table. “Lines” and “multilinestrings” tables contain streets, hiking trails, bike paths, etc:

mysql> show create table points\G
*************************** 1. row ***************************
       Table: points
Create Table: CREATE TABLE `points` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` text,
  `name` text,
  `barrier` text,
  `highway` text,
  `ref` text,
  `address` text,
  `is_in` text,
  `place` text,
  `man_made` text,
  `other_tags` text,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=MyISAM AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

“Shape” is the point (in spatial format) and other_tags will contain some additional format (in JSON format), this is how ogr2ogr converts it by default. See the GDAL documentation on the OSM driver for more information.

OSM data may contain the zip code, but this is not guaranteed.  Here is the example how we can find all coffee shops in ZIP code 27701:

mysql> select shape into @shape
from zcta.tl_2013_us_zcta510
where zcta5ce10='27701';
Query OK, 1 row affected (0.00 sec)
mysql> SELECT name, st_distance(shape, centroid(@shape) ) as dist
FROM points
WHERE st_within(shape, @shape)
and other_tags like '%"amenity"=>"cafe"%'  limit  10;
+--------------------+----------------------+
| name               | dist                 |
+--------------------+----------------------+
| Blue Coffee Cafe   |  0.00473103443182092 |
| Amelia Cafe        | 0.013825134250907745 |
| Serrano's Delicafe | 0.013472792849827055 |
| Blend              | 0.009123578862847042 |
+--------------------+----------------------+
4 rows in set (0.09 sec)

First, I have selected the ZIP code boundaries into MySQL variable (I could have used subquery, in MySQL 5.6 the performance will be very similar; this is a little bit outside of the current blog post topic, so I will not compare the 2 methods here).

Second I’ve used this variable to find all point which will fit into our boundaries and filter by ”amenity”=>”cafe”. I have to use like ‘%..%’ here, but I’m relying on the spatial index here. Explain plan:

mysql> explain SELECT name, st_distance(shape, centroid(@shape) ) as dist  FROM osm.points   WHERE st_within(shape, @shape)  and other_tags like '%"amenity"=>"cafe"%'  limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: points
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 10
        Extra: Using where

Conclusion

Using open source spatial data is a great way to enrich your application and add new features. You can store this data in MySQL so the application will be able to perform a join to the existing data. For example, if you store ZIP code for a user you can use OpenStreetMap data to show the appropriate content for this user. I will also provide more examples in my upcoming Talk @ Percona Live 2014 as well as share it in this blog in a future post.

I’ve also created a Public Amazon AMI: GIS-MySQL-Ubuntu – ami-ddfdf5b4. The AMI has the ZIP code and OSM data in MySQL 5.6 as well as the GDAL server installed (under /data, mounted on EBS). Please feel free to give it a try. As always I appreciate any comments/questions/thoughts/etc.

The post Creating GEO-enabled applications with MySQL 5.6 appeared first on MySQL Performance Blog.

Oct
21
2013
--

Using the new spatial functions in MySQL 5.6 for geo-enabled applications

Geo-enabled (or location enabled) applications are very common nowadays and many of them use MySQL. The common tasks for such applications are:

  • Find all points of interests (i.e. coffee shops) around (i.e. a 10 mile radius) the given location (latitude and longitude). For example we want to show this to a user of the mobile application when we know his/her approximate location. (This usually means we need to calculate a distance between 2 points on Earth).
  • Find a ZIP code (U.S. Postal address) for the given location or determine if this location is within the given area. Another example is to find a school district for the given property.

MySQL had the spatial functions originally (implementation follows a subset of OpenGIS standard). However, there are 2 major limitation of MySQL spatial functions that can make it difficult to use those functions in geo-enabled applications:

  • Distance between 2 points.  The “distance” function was not implemented before MySQL 5.6. In addition (even in MySQL 5.6), all calculations (e.g. distance between 2 points) are done using a planar coordinate system (Euclidean geometry). For the distance between 2 points on Earth this can produce incorrect results.
  • Determine if the point is inside a polygon. Before MySQL 5.6 the functions that test the spatial relationships between 2 geometries (i.e. find if the given point is within a polygon) only used a Minimum Bounding Rectangle (MBR). This is a major limitation for example #2 above (I will explain it below).

In my old presentation for the 2006 MySQL User Conference I  showed how to calculate distances on Earth in MySQL without using the MySQL spatial functions. In short, one can store the latitude and longitude coordinates directly in MySQL fields (decimal) and use a haversine  formula to calculate distance.

New MySQL 5.6 Geo Spatial Functions 

The good news is:

1) MySQL 5.6 adds a set of new functions (some of them are not 100% documented though) that use the object shapes rather than the MBR to calculate spatial relationships. Those new functions begins with “ST_”, i.e.

  • contains(g1, g2)  uses MBR only (not exact!)
  • st_contains(g1, g2) uses exact shapes

2) MySQL 5.6 implements st_distance(g1, g2) function that calculates the distance between 2 geometries, which is currently not documented (I’ve filed the feature request to document the st_distance function in MySQL)

The bad news is:

1) All functions still only use the planar system coordinates. Different SRIDs are not supported.

2) Spatial indexes (RTREE) are only supported for MyISAM tables. One can use the functions for InnoDB tables, but it will not use spatial keys.

Example of MySQL’s MBR “false positives”

To illustrate why we do not want to use MBR-based functions for geospatial search, I’ve generated 2 polygons that represent 2 zip code boundaries in San Francisco, CA and placed it on Google Maps.

The blue rectangle represents the Minimum Bounding Rectangle of Zip code “91102″ (I’ve used envelope() mysql function to obtain coordinates for the MBR). As we can see it covers both zip code 94103 and 94102. In this case if we have coordinates of a building in the city’s “south of market” district (ZIP 91103) and try to find a zip code it belongs to using the “contains()” function we will have a “false positives”:

mbr_example_sm

mysql>  select zip from postalcodes where contains(geom, point(-122.409153, 37.77765));
+-------+
| zip   |
+-------+
| 94102 |
| 94103 |
| 94158 |
+-------+
3 rows in set (0.00 sec)

In this particular example we got 3 zip codes as the MBR of 94158 also overlaps this area. Another point in “south of market” can actually produce 4 different zip codes. However, in MySQL 5.6 we can use the new st_contains function:

mysql> select zip from postalcodes where st_contains(geom, point(-122.409153, 37.77765));
+-------+
| zip   |
+-------+
| 94103 |
+-------+
1 row in set (0.00 sec)

As we can see st_contains() produces the correct results.

Find a ZIP code for the given location

Starting with MySQL 5.6 one can use the MySQL spatial functions st_contains or st_within to find if the given point is inside the given polygon. In our scenario we will need to find the zip code for the given latitude and longitude. To do that in MySQL we can perform the following steps:

  1. Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts). The data will be stored as MySQL Geometry object, to convert it to text we can use astext(geom) function.
  2. Use the st_contains() or st_within() functions:
    mysql> select zip from postalcodes where st_contains(geom, point(-122.409153, 37.77765));
    +-------+
    | zip   |
    +-------+
    | 94103 |
    +-------+
    1 row in set (0.00 sec)


    or

    mysql> select zip from postalcodes where st_within(point(-122.409153, 37.77765), geom);
    +-------+
    | zip   |
    +-------+
    | 94103 |
    +-------+
    1 row in set (0.00 sec)

Spatial Index for “ST_” functions

MyISAM tables support Spatial indexes, so the above queries will use those indexes. Example:

mysql> alter table postalcodes add spatial index zip_boundaries_spatial (geom);
Query OK, 35679 rows affected (5.30 sec)
Records: 35679  Duplicates: 0  Warnings: 0
mysql> explain select zip from postalcodes where st_contains(geom, point(-122.409153, 37.77765))\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: postalcodes
         type: range
possible_keys: zip_boundaries_spatial
          key: zip_boundaries_spatial
      key_len: 34
          ref: NULL
         rows: 1
        Extra: Using where
1 row in set (0.01 sec)

As we can see our spatial index is used for those functions. If we ignore or remove the index, the query will run significantly slower:

mysql> select zip from postalcodes where st_within(point(-122.409153, 37.77765), geom);
+-------+
| zip   |
+-------+
| 94103 |
+-------+
1 row in set (0.00 sec)
mysql> select zip from postalcodes ignore index (zip_boundaries_spatial) where st_contains(geom, point(-122.409153, 37.77765));
+-------+
| zip   |
+-------+
| 94103 |
+-------+
1 row in set (4.24 sec)

The InnoDB engine does not support spatial indexes, so those queries will be slow. As zip boundaries does not change often we can potentially use MyISAM tables for them.

Find all coffee shops in a 10-mile radius

MySQL 5.6 supports st_distance functions with 2 drawbacks:

  1. It only supports planar coordinates
  2. It does not use index

Given those major limitations, it is not very easy to use st_distance function for the geo enabled applications. If we simply need to find a distance between 2 points it is easier to store lat, lon directly and use harvesine expression (as described above).

However it is still possible to use the st_distance() if we do not need exact numbers for the distance between 2 points (i.e. we only need to sort by distance). In our example, to find all coffee shops we will need to:

  1. Get the 10 mile radius MBR and use “within()” or “st_within()” function
  2. Use st_distance function in the order by clause

First, we will calculate an envelope (square) to include approximately 10 miles, using the following approximations:

  • 1 degree of latitude ~= 69 miles
  • 1 degree of longitude ~= cos(latitude)*69 miles
set @lat= 37.615223;
set @lon = -122.389979;
set @dist = 10;
set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);

@lat and @lon in this example are the coordinates for the San Francisco International Airport (SFO).

This will give us a set of coordinates (points) for the lower left and upper right corner of our square. Then we can use a MySQL’s envelope function to generate the MBR (we use linestring to draw a line between the 2 generated points and then envelope to draw an square):

select astext(envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))));

The “envelope” will look like this:

envelope_example

This is not exactly a 10-mile radius, however it may be close enough. Now we can find all points around SFO airport and sort by distance.

mysql> select astext(shape), name from waypoints
where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by st_distance(point(@lon, @lat), shape) limit  10;
+--------------------------------+-------------------------------+
| astext(shape)                  | name                          |
+--------------------------------+-------------------------------+
| POINT(-122.3890954 37.6145378) | Tram stop:Terminal A          |
| POINT(-122.3899 37.6165902)    | Tram stop:Terminal G          |
| POINT(-122.3883973 37.6150806) | Fast Food Restaurant          |
| POINT(-122.388929 37.6164584)  | Restaurant:Ebisu              |
| POINT(-122.3885347 37.6138365) | Fast Food Restaurant:Firewood |
| POINT(-122.38893 37.6132399)   | Cafe:Amoura Café              |
| POINT(-122.3894594 37.6129537) | Currency exchange             |
| POINT(-122.39197849 37.614026) | Parking:Garage A              |
| POINT(-122.3919031 37.6138567) | Tram stop:Garage A            |
| POINT(-122.389176 37.612886)   | Public telephone              |
+--------------------------------+-------------------------------+
10 rows in set (0.02 sec)
mysql> explain select astext(shape), name from waypoints
where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by st_distance(point(@lon, @lat), shape) limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: waypoints
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 430
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

As we can see from the explain it will use the spatial key on SHAPE and will only scan 430 rows, rather than millions of POIs.

The query does not show the exact distance (this may be ok if we only need to output the points on the map).  If we need to show the distance we can use the harvesine formula to calculate that. For example we can create the following stored function to implement the calculations:

create DEFINER = CURRENT_USER function harvesine (lat1 double, lon1 double, lat2 double, lon2 double) returns double
 return  3956 * 2 * ASIN(SQRT(POWER(SIN((lat1 - abs(lat2)) * pi()/180 / 2), 2)
         + COS(abs(lat1) * pi()/180 ) * COS(abs(lat2) * pi()/180) * POWER(SIN((lon1 - lon2) * pi()/180 / 2), 2) )) ;

And then use it for both order by and to displaying the distance. This query will also filter by “coffee”:

mysql> select harvesine(y(shape), x(shape), @lat, @lon ) as dist,  name from waypoints
where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
and name like '%coffee%' 
order by dist limit 10;
+-------------------+----------------------------------+
| dist              | name                             |
+-------------------+----------------------------------+
| 3.462439728799387 | Cafe:Peet's Coffee               |
| 8.907725074619638 | Cafe:Nervous Dog Coffee          |
| 9.169043718528133 | Cafe:Peet's Coffee & Tea         |
| 9.252659680688794 | Cafe:Martha and Bros Coffee      |
| 9.492498547771854 | Cafe:Manor Coffee Shop           |
| 9.559275248726559 | Cafe:Dynamo Donut & Coffee       |
|  9.57775126039776 | Cafe:Starbucks Coffee            |
| 9.585378425394556 | Cafe:Muddy's Coffeehouse         |
|  9.66247951599322 | Cafe:Martha and Bros. Coffee Co. |
| 9.671254753804767 | Cafe:Starbucks Coffee            |
+-------------------+----------------------------------+
10 rows in set (0.02 sec)

Conclusion

MySQL 5.6 implements an additional set of functions that can help create geo-enabled applications with MySQL. Storing polygons boundaries (ZIP code boundaries for example) is efficient and the new spatial functions (st_within, st_contains, etc) will produce correct results and will use spatial (rtree) indexes (for MyISAM tables only). The OpenGIS standard is very common and it is easy to obtain the data in this format or use the standard application which can “talk” this language.

Unfortunately, st_distance function is not very usable for calculating distance between 2 points on Earth and it does not use an index. In this case it is still more feasible to calculate distances manually using the harvesine formula. Hopefully this will be fixed in the next mysql release.

There are also some other limitations, for example st_union() function only supports 2 arguments and does not support an array, so it can’t be used in a queries like “select st_union(geom) from zipcodes group by state”.

Links

And finally, let me know in the comments how you use MySQL for geo enabled applications. In my next post I will talk more about basics of the MySQL geo spatial extension as well as Sphinx Search‘s implementation of the Geospatial functions.

The post Using the new spatial functions in MySQL 5.6 for geo-enabled applications appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com