Mar
13
2015
--

MySQL and geospatial programming: An introduction to GIS

Geographic information systems (GIS) are used by application developers to incorporate geographic information into their datasets to create apps with cool features such finding the address to the best steakhouse in town or the locations of local points of historical interest (the list is endless). In a nutshell, GIS captures, store, checks and displays data related to positions on Earth’s surface.

Next Wednesday I’m going provide an introduction to GIS functionality in MySQL along an overview of recent changes. I invite you to register now for this free webinar titled “MySQL and Geospatial Programming.” It starts at 10 a.m. Pacific time on March 18.

MySQL and Geospatial ProgrammingIn this talk I’ll discuss:

  • The various tasks that deal with geocoding
  • How MySQL can solve problems such as points within a radius (e.g., “Find the 10 closest coffee shops) among others
  • Highlights from some of the new features bundled in the upcoming 5.7 release, and what benefits they can bring to your applications

I’m happy to field questions in advance in the comments section below. This webinar, like all Percona webinars, will be recorded. In addition to the video, my slides will also be available for download.

Register now and I’ll talk to you next week!

The post MySQL and geospatial programming: An introduction to GIS appeared first on MySQL Performance Blog.

Jun
19
2014
--

Using UDFs for geo-distance search in MySQL

In my previous post about geo-spatial search in MySQL I described (along with other things) how to use geo-distance functions. In this post I will describe the geo-spatial distance functions in more details.

If you need to calculate an exact distance between 2 points on Earth in MySQL (very common for geo-enabled applications) you have at least 3 choices.

  • Use stored function and implement haversine formula
  • Use UDF (user defined function) for haversine (see below)
  • In MySQL 5.6 you can use st_distance function (newly documented), however, you will get the distance on plane and not on earth; the value returned will be good for sorting by distance but will not represent actual miles or kilometers.

MySQL stored function for calculating distance on Earth

I previously gave an example for a MySQL-stored function which implements the haversine formula. However, the approach I used was not very precise: it was optimized for speed. If you need a more precise haversine formula implementation you can use this function (result will be in miles):

delimiter //
create DEFINER = CURRENT_USER function haversine_distance_sp (lat1 double, lon1 double, lat2 double, lon2 double) returns double
 begin
   declare R int DEFAULT 3958.76;
   declare phi1 double;
   declare phi2 double;
   declare d_phi double;
   declare d_lambda double;
   declare a double;
   declare c double;
   declare d double;
   set phi1 = radians(lat1);
   set phi2 = radians(lat2);
   set d_phi = radians(lat2-lat1);
   set d_lambda = radians(lon2-lon1);
   set a = sin(d_phi/2) * sin(d_phi/2) +
         cos(phi1) * cos(phi2) *
         sin(d_lambda/2) * sin(d_lambda/2);
   set c = 2 * atan2(sqrt(a), sqrt(1-a));
   set d = R * c;
   return d;
   end;
//
delimiter ;

(the algorithm is based on the standard formula, I’ve used the well-known Movable Type scripts calculator)

This is a slower implementation as it uses arctangent, however it is more precise.  

MySQL UDF for Haversine distance

Another approach, which will give you much more performance is to use UDF. There are a number of implementations, I’ve used lib_mysqludf_haversine.

Here is the simple steps to install it in MySQL 5.6 (will also work with earlier versions):

$ wget 'https://github.com/lucasepe/lib_mysqludf_haversine/archive/master.zip'
$ unzip master.zip
$ cd lib_mysqludf_haversine-master/
$ make
mysql> show global variables like 'plugin%';
+---------------+-------------------------+
| Variable_name | Value                   |
+---------------+-------------------------+
| plugin_dir    | /usr/lib64/mysql/plugin |
+---------------+-------------------------+
1 row in set (0.00 sec)
$ sudo cp lib_mysqludf_haversine.so /usr/lib64/mysql/plugin/
mysql> CREATE FUNCTION haversine_distance RETURNS REAL SONAME 'lib_mysqludf_haversine.so';
mysql> select haversine_distance(37.470295464, -122.572938858498, 37.760150536, -122.20701914150199, 'mi') as dist_in_miles;
+---------------+
| dist_in_miles |
+---------------+
|     28.330467 |
+---------------+
1 row in set (0.00 sec)

Please note:

  • Make sure you have the mysql-devel or percona-server-devel package installed (MySQL development libraries) before installing.
  • You will need to specify the last parameter to be “mi” if you want to get the results in miles, otherwise it will give you kilometers.

MySQL ST_distance function

In MySQL 5.6 you can use ST_distance function:

mysql> select st_distance(point(37.470295464, -122.572938858498), point( 37.760150536, -122.20701914150199)) as distance_plane;
+---------------------+
| distance_plane      |
+---------------------+
| 0.46681174155173943 |
+---------------------+
1 row in set (0.00 sec)

As we can see it does not give us an actual distance in mile or kilometers as it does not take into account that we have latitude and longitude, rather than X and Y on plane.

Geo Distance Functions Performance

The stored procedures and functions in MySQL are known to be slower, especially with trigonometrical functions. I’ve did a quick test, using MySQL function benchmark.

First I set 2 points (10 miles from SFO airport)

set @rlon1 = 122.572938858498;
set @rlat1 = 37.470295464;
set @rlon2 = -122.20701914150199;
set @rlat2 = 37.760150536;

Next I use 4 function to benchmark:

  • Less precise stored function (haversine)
  • More precise stored function (haversine)
  • UDF for haversine
  • MySQL 5.6 native ST_distance (plane)

The benchmark function will execute the above function 100000 times.

Here are the results:

mysql>  select benchmark(100000,  haversine_old_sp(@rlat1, @rlon1, @rlat2, @rlon2)) as less_precise_mysql_stored_proc;
+--------------------------------+
| less_precise_mysql_stored_proc |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (1.46 sec)
mysql>  select benchmark(100000,  haversine_distance_sp(@rlat1, @rlon1, @rlat2, @rlon2)) as more_precise_mysql_stored_proc;
+--------------------------------+
| more_precise_mysql_stored_proc |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (2.58 sec)
mysql>  select benchmark(100000,  haversine_distance(@rlat1, @rlon1, @rlat2, @rlon2, 'mi')) as udf_haversine_function;
+------------------------+
| udf_haversine_function |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0.17 sec)
mysql> select benchmark(100000, st_distance(point(@rlat1, @rlon1), point(@rlat2, @rlon1))) as mysql_builtin_st_distance;
+---------------------------+
| mysql_builtin_st_distance |
+---------------------------+
|                         0 |
+---------------------------+
1 row in set (0.10 sec)

As we can see the UDF gives much faster response time (which is comparable to built-in function).

Benchmark chart (smaller the better)

Conclusion

The lib_mysqludf_haversine UDF provides a good function for geo-distance search in MySQL. Please let me know in the comments what geo-distance functions or approaches do you use in your applications.

The post Using UDFs for geo-distance search in MySQL appeared first on MySQL Performance Blog.

Oct
21
2013
--

Using the new spatial functions in MySQL 5.6 for geo-enabled applications

Geo-enabled (or location enabled) applications are very common nowadays and many of them use MySQL. The common tasks for such applications are:

  • Find all points of interests (i.e. coffee shops) around (i.e. a 10 mile radius) the given location (latitude and longitude). For example we want to show this to a user of the mobile application when we know his/her approximate location. (This usually means we need to calculate a distance between 2 points on Earth).
  • Find a ZIP code (U.S. Postal address) for the given location or determine if this location is within the given area. Another example is to find a school district for the given property.

MySQL had the spatial functions originally (implementation follows a subset of OpenGIS standard). However, there are 2 major limitation of MySQL spatial functions that can make it difficult to use those functions in geo-enabled applications:

  • Distance between 2 points.  The “distance” function was not implemented before MySQL 5.6. In addition (even in MySQL 5.6), all calculations (e.g. distance between 2 points) are done using a planar coordinate system (Euclidean geometry). For the distance between 2 points on Earth this can produce incorrect results.
  • Determine if the point is inside a polygon. Before MySQL 5.6 the functions that test the spatial relationships between 2 geometries (i.e. find if the given point is within a polygon) only used a Minimum Bounding Rectangle (MBR). This is a major limitation for example #2 above (I will explain it below).

In my old presentation for the 2006 MySQL User Conference I  showed how to calculate distances on Earth in MySQL without using the MySQL spatial functions. In short, one can store the latitude and longitude coordinates directly in MySQL fields (decimal) and use a haversine  formula to calculate distance.

New MySQL 5.6 Geo Spatial Functions 

The good news is:

1) MySQL 5.6 adds a set of new functions (some of them are not 100% documented though) that use the object shapes rather than the MBR to calculate spatial relationships. Those new functions begins with “ST_”, i.e.

  • contains(g1, g2)  uses MBR only (not exact!)
  • st_contains(g1, g2) uses exact shapes

2) MySQL 5.6 implements st_distance(g1, g2) function that calculates the distance between 2 geometries, which is currently not documented (I’ve filed the feature request to document the st_distance function in MySQL)

The bad news is:

1) All functions still only use the planar system coordinates. Different SRIDs are not supported.

2) Spatial indexes (RTREE) are only supported for MyISAM tables. One can use the functions for InnoDB tables, but it will not use spatial keys.

Example of MySQL’s MBR “false positives”

To illustrate why we do not want to use MBR-based functions for geospatial search, I’ve generated 2 polygons that represent 2 zip code boundaries in San Francisco, CA and placed it on Google Maps.

The blue rectangle represents the Minimum Bounding Rectangle of Zip code “91102″ (I’ve used envelope() mysql function to obtain coordinates for the MBR). As we can see it covers both zip code 94103 and 94102. In this case if we have coordinates of a building in the city’s “south of market” district (ZIP 91103) and try to find a zip code it belongs to using the “contains()” function we will have a “false positives”:

mbr_example_sm

mysql>  select zip from postalcodes where contains(geom, point(-122.409153, 37.77765));
+-------+
| zip   |
+-------+
| 94102 |
| 94103 |
| 94158 |
+-------+
3 rows in set (0.00 sec)

In this particular example we got 3 zip codes as the MBR of 94158 also overlaps this area. Another point in “south of market” can actually produce 4 different zip codes. However, in MySQL 5.6 we can use the new st_contains function:

mysql> select zip from postalcodes where st_contains(geom, point(-122.409153, 37.77765));
+-------+
| zip   |
+-------+
| 94103 |
+-------+
1 row in set (0.00 sec)

As we can see st_contains() produces the correct results.

Find a ZIP code for the given location

Starting with MySQL 5.6 one can use the MySQL spatial functions st_contains or st_within to find if the given point is inside the given polygon. In our scenario we will need to find the zip code for the given latitude and longitude. To do that in MySQL we can perform the following steps:

  1. Load the zip code boundaries into MySQL as a multipoligon. There are a number of ways to get this done, one way is to download the shape files from the Census website and convert them to MySQL using org2org utility. (I will describe this in more detail in upcoming blog posts). The data will be stored as MySQL Geometry object, to convert it to text we can use astext(geom) function.
  2. Use the st_contains() or st_within() functions:
    mysql> select zip from postalcodes where st_contains(geom, point(-122.409153, 37.77765));
    +-------+
    | zip   |
    +-------+
    | 94103 |
    +-------+
    1 row in set (0.00 sec)


    or

    mysql> select zip from postalcodes where st_within(point(-122.409153, 37.77765), geom);
    +-------+
    | zip   |
    +-------+
    | 94103 |
    +-------+
    1 row in set (0.00 sec)

Spatial Index for “ST_” functions

MyISAM tables support Spatial indexes, so the above queries will use those indexes. Example:

mysql> alter table postalcodes add spatial index zip_boundaries_spatial (geom);
Query OK, 35679 rows affected (5.30 sec)
Records: 35679  Duplicates: 0  Warnings: 0
mysql> explain select zip from postalcodes where st_contains(geom, point(-122.409153, 37.77765))\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: postalcodes
         type: range
possible_keys: zip_boundaries_spatial
          key: zip_boundaries_spatial
      key_len: 34
          ref: NULL
         rows: 1
        Extra: Using where
1 row in set (0.01 sec)

As we can see our spatial index is used for those functions. If we ignore or remove the index, the query will run significantly slower:

mysql> select zip from postalcodes where st_within(point(-122.409153, 37.77765), geom);
+-------+
| zip   |
+-------+
| 94103 |
+-------+
1 row in set (0.00 sec)
mysql> select zip from postalcodes ignore index (zip_boundaries_spatial) where st_contains(geom, point(-122.409153, 37.77765));
+-------+
| zip   |
+-------+
| 94103 |
+-------+
1 row in set (4.24 sec)

The InnoDB engine does not support spatial indexes, so those queries will be slow. As zip boundaries does not change often we can potentially use MyISAM tables for them.

Find all coffee shops in a 10-mile radius

MySQL 5.6 supports st_distance functions with 2 drawbacks:

  1. It only supports planar coordinates
  2. It does not use index

Given those major limitations, it is not very easy to use st_distance function for the geo enabled applications. If we simply need to find a distance between 2 points it is easier to store lat, lon directly and use harvesine expression (as described above).

However it is still possible to use the st_distance() if we do not need exact numbers for the distance between 2 points (i.e. we only need to sort by distance). In our example, to find all coffee shops we will need to:

  1. Get the 10 mile radius MBR and use “within()” or “st_within()” function
  2. Use st_distance function in the order by clause

First, we will calculate an envelope (square) to include approximately 10 miles, using the following approximations:

  • 1 degree of latitude ~= 69 miles
  • 1 degree of longitude ~= cos(latitude)*69 miles
set @lat= 37.615223;
set @lon = -122.389979;
set @dist = 10;
set @rlon1 = @lon-@dist/abs(cos(radians(@lat))*69);
set @rlon2 = @lon+@dist/abs(cos(radians(@lat))*69);
set @rlat1 = @lat-(@dist/69);
set @rlat2 = @lat+(@dist/69);

@lat and @lon in this example are the coordinates for the San Francisco International Airport (SFO).

This will give us a set of coordinates (points) for the lower left and upper right corner of our square. Then we can use a MySQL’s envelope function to generate the MBR (we use linestring to draw a line between the 2 generated points and then envelope to draw an square):

select astext(envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))));

The “envelope” will look like this:

envelope_example

This is not exactly a 10-mile radius, however it may be close enough. Now we can find all points around SFO airport and sort by distance.

mysql> select astext(shape), name from waypoints
where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by st_distance(point(@lon, @lat), shape) limit  10;
+--------------------------------+-------------------------------+
| astext(shape)                  | name                          |
+--------------------------------+-------------------------------+
| POINT(-122.3890954 37.6145378) | Tram stop:Terminal A          |
| POINT(-122.3899 37.6165902)    | Tram stop:Terminal G          |
| POINT(-122.3883973 37.6150806) | Fast Food Restaurant          |
| POINT(-122.388929 37.6164584)  | Restaurant:Ebisu              |
| POINT(-122.3885347 37.6138365) | Fast Food Restaurant:Firewood |
| POINT(-122.38893 37.6132399)   | Cafe:Amoura Café              |
| POINT(-122.3894594 37.6129537) | Currency exchange             |
| POINT(-122.39197849 37.614026) | Parking:Garage A              |
| POINT(-122.3919031 37.6138567) | Tram stop:Garage A            |
| POINT(-122.389176 37.612886)   | Public telephone              |
+--------------------------------+-------------------------------+
10 rows in set (0.02 sec)
mysql> explain select astext(shape), name from waypoints
where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
order by st_distance(point(@lon, @lat), shape) limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: waypoints
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 430
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

As we can see from the explain it will use the spatial key on SHAPE and will only scan 430 rows, rather than millions of POIs.

The query does not show the exact distance (this may be ok if we only need to output the points on the map).  If we need to show the distance we can use the harvesine formula to calculate that. For example we can create the following stored function to implement the calculations:

create DEFINER = CURRENT_USER function harvesine (lat1 double, lon1 double, lat2 double, lon2 double) returns double
 return  3956 * 2 * ASIN(SQRT(POWER(SIN((lat1 - abs(lat2)) * pi()/180 / 2), 2)
         + COS(abs(lat1) * pi()/180 ) * COS(abs(lat2) * pi()/180) * POWER(SIN((lon1 - lon2) * pi()/180 / 2), 2) )) ;

And then use it for both order by and to displaying the distance. This query will also filter by “coffee”:

mysql> select harvesine(y(shape), x(shape), @lat, @lon ) as dist,  name from waypoints
where st_within(shape, envelope(linestring(point(@rlon1, @rlat1), point(@rlon2, @rlat2))))
and name like '%coffee%' 
order by dist limit 10;
+-------------------+----------------------------------+
| dist              | name                             |
+-------------------+----------------------------------+
| 3.462439728799387 | Cafe:Peet's Coffee               |
| 8.907725074619638 | Cafe:Nervous Dog Coffee          |
| 9.169043718528133 | Cafe:Peet's Coffee & Tea         |
| 9.252659680688794 | Cafe:Martha and Bros Coffee      |
| 9.492498547771854 | Cafe:Manor Coffee Shop           |
| 9.559275248726559 | Cafe:Dynamo Donut & Coffee       |
|  9.57775126039776 | Cafe:Starbucks Coffee            |
| 9.585378425394556 | Cafe:Muddy's Coffeehouse         |
|  9.66247951599322 | Cafe:Martha and Bros. Coffee Co. |
| 9.671254753804767 | Cafe:Starbucks Coffee            |
+-------------------+----------------------------------+
10 rows in set (0.02 sec)

Conclusion

MySQL 5.6 implements an additional set of functions that can help create geo-enabled applications with MySQL. Storing polygons boundaries (ZIP code boundaries for example) is efficient and the new spatial functions (st_within, st_contains, etc) will produce correct results and will use spatial (rtree) indexes (for MyISAM tables only). The OpenGIS standard is very common and it is easy to obtain the data in this format or use the standard application which can “talk” this language.

Unfortunately, st_distance function is not very usable for calculating distance between 2 points on Earth and it does not use an index. In this case it is still more feasible to calculate distances manually using the harvesine formula. Hopefully this will be fixed in the next mysql release.

There are also some other limitations, for example st_union() function only supports 2 arguments and does not support an array, so it can’t be used in a queries like “select st_union(geom) from zipcodes group by state”.

Links

And finally, let me know in the comments how you use MySQL for geo enabled applications. In my next post I will talk more about basics of the MySQL geo spatial extension as well as Sphinx Search‘s implementation of the Geospatial functions.

The post Using the new spatial functions in MySQL 5.6 for geo-enabled applications appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com