Apr
06
2021
--

Esri brings its flagship ArcGIS platform to Kubernetes

Esri, the geographic information system (GIS), mapping and spatial analytics company, is hosting its (virtual) developer summit today. Unsurprisingly, it is making a couple of major announcements at the event that range from a new design system and improved JavaScript APIs to support for running ArcGIS Enterprise in containers on Kubernetes.

The Kubernetes project was a major undertaking for the company, Esri Product Managers Trevor Seaton and Philip Heede told me. Traditionally, like so many similar products, ArcGIS was architected to be installed on physical boxes, virtual machines or cloud-hosted VMs. And while it doesn’t really matter to end-users where the software runs, containerizing the application means that it is far easier for businesses to scale their systems up or down as needed.

Esri ArcGIS Enterprise on Kubernetes deployment

Esri ArcGIS Enterprise on Kubernetes deployment. Image Credits: Esri

“We have a lot of customers — especially some of the larger customers — that run very complex questions,” Seaton explained. “And sometimes it’s unpredictable. They might be responding to seasonal events or business events or economic events, and they need to understand not only what’s going on in the world, but also respond to their many users from outside the organization coming in and asking questions of the systems that they put in place using ArcGIS. And that unpredictable demand is one of the key benefits of Kubernetes.”

Deploying Esri ArcGIS Enterprise on Kubernetes

Deploying Esri ArcGIS Enterprise on Kubernetes. Image Credits: Esri

The team could have chosen to go the easy route and put a wrapper around its existing tools to containerize them and call it a day, but as Seaton noted, Esri used this opportunity to re-architect its tools and break it down into microservices.

“It’s taken us a while because we took three or four big applications that together make up [ArcGIS] Enterprise,” he said. “And we broke those apart into a much larger set of microservices. That allows us to containerize specific services and add a lot of high availability and resilience to the system without adding a lot of complexity for the administrators — in fact, we’re reducing the complexity as we do that and all of that gets installed in one single deployment script.”

While Kubernetes simplifies a lot of the management experience, a lot of companies that use ArcGIS aren’t yet familiar with it. And as Seaton and Heede noted, the company isn’t forcing anyone onto this platform. It will continue to support Windows and Linux just like before. Heede also stressed that it’s still unusual — especially in this industry — to see a complex, fully integrated system like ArcGIS being delivered in the form of microservices and multiple containers that its customers then run on their own infrastructure.

Image Credits: Esri

In addition to the Kubernetes announcement, Esri also today announced new JavaScript APIs that make it easier for developers to create applications that bring together Esri’s server-side technology and the scalability of doing much of the analysis on the client-side. Back in the day, Esri would support tools like Microsoft’s Silverlight and Adobe/Apache Flex for building rich web-based applications. “Now, we’re really focusing on a single web development technology and the toolset around that,” Esri product manager Julie Powell told me.

A bit later this month, Esri also plans to launch its new design system to make it easier and faster for developers to create clean and consistent user interfaces. This design system will launch April 22, but the company already provided a bit of a teaser today. As Powell noted, the challenge for Esri is that its design system has to help the company’s partners put their own style and branding on top of the maps and data they get from the ArcGIS ecosystem.

 

Feb
03
2016
--

New GIS Features in MySQL 5.7

MySQL 5.7

MySQL 5.7MySQL 5.7 has been released, and there are some exciting new features now available that I’m going to discuss in this blog — specifically around geographic information system (GIS).

I’ve used GIS features in MySQL for a long time. In my previous blog entries I’ve shown how to create geo-enabled applications with MySQL 5.6 and use MySQL 5.6 geo-spatial functions. In this blog post, I’ll look into what is new in MySQL 5.7 and how we can use those new features in practice for geo-enabled applications.

New in MySQL 5.7

MySQL 5.7 introduces the following major improvements and features for GIS:

  1. Spatial indexes for InnoDB. Finally it is here! This was a long overdue feature, which also prevented many companies from converting all tables to InnoDB.
  2. st_distance_sphere: native function to calculate a distance between two points on earth. Finally it is here as well! Like many others, I’ve created my stored procedure to calculate the distance between points on earth using haversine formula. The native function is ~20x faster than the stored procedure (in an artificial benchmark, see below). This is not surprising, as stored procedures are slow computationally – especially for trigonometrical functions.
  3. New functions: GeoHash and GeoJSON. With GeoJSON we can generate the results that are ready for visualizing on Google Maps.
  4. New GIS implementation based on Boost.Geometry library. This is great news, as originally GIS was implemented independently from scratch with a very limited set of features. Manyi Lu from MySQL server team provides more reasoning behind the choice of Boost.Geometry.

This is the great news. The bad news is that except for the st_distance_sphere, all other functions use planar geometry (no change since MySQL 5.6) and do not support Spatial Reference System Identifier (SRID). That means that if I want to calculate the distance of my favorite bike path in miles or kilometers, I’ll still have to use a stored function (see below for an example) or write an application code for that. Native function st_distance will ignore SRID for now and return a value which represents a distance on a planar – not very useful for our purposes (may be useful for order by / compare).

Distance on Sphere

MySQL 5.7 introduces the function st_distance_sphere, which uses a haversine formula to calculate distance. He is the example:

mysql> select st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954));
+--------------------------------------------------------------------------------+
| st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)) |
+--------------------------------------------------------------------------------+
|                                                             3855600.7928957273 |
+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)

The distance is in meters by default (you can also change the radius of the earth to meters using the 3rd optional parameter, default: 6,370,986). Although our earth is represented as an oblate spheroid, all practical applications use the distance on a sphere. The difference between the haversine formula and more precise (and much slower) functions is negligible for our purposes.

The st_distance_sphere is much faster than using stored routines. Here is the artificial benchmark:

mysql> select benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947));
+-----------------------------------------------------------------------------------------+
| benchmark(1000000, haversine_distance_sp(37.60954, -122.38657, 35.890334, -78.7698947)) |
+-----------------------------------------------------------------------------------------+
|                                                                                       0 |
+-----------------------------------------------------------------------------------------+
1 row in set (22.55 sec)
mysql> select benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954)));
+----------------------------------------------------------------------------------------------------+
| benchmark(1000000, st_distance_sphere(point(-78.7698947, 35.890334), point(-122.38657, 37.60954))) |
+----------------------------------------------------------------------------------------------------+
|                                                                                                  0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (0.77 sec)

haversine_distance_sp is a stored routine implementation of the same algorithm.

InnoDB GIS example: find 10 restaurants near me 

In my previous blog post I’ve demonstrated how to use st_within function to find restaurants inside my zipcode (US postal code) and sort by distance. In MySQL 5.7 there will be 2 changes:

  1. We can use InnoDB table
  2. We can use st_distance_sphere function

For this example, I’ve converted Open Street Map data to MySQL and then created a new InnoDB table:

CREATE TABLE `points_new` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` text,
  `name` text,
  `barrier` text,
  `highway` text,
  `ref` text,
  `address` text,
  `is_in` text,
  `place` text,
  `man_made` text,
  `other_tags` text,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=InnoDB AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1

SHAPE is declared as geometry (and stores points in this table). We also have SPATIAL KEY SHAPE in the InnoDB table.

The following query will find all cafe or restaurants in Durham, NC (zipcode: 27701):

SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist
FROM points_new
WHERE st_within(shape,
      (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
	  and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
	  and name is not null
ORDER BY dist asc LIMIT 10;

Table tl_2013_us_zcta510 stores the shapes of polygons for all US zipcodes. (It needs to be converted to MySQL.) In this example I’m using st_within to filter only the POIs I need, and st_distance_sphere to get the distance from my location (-78.9064543 35.9975194 are the coordinates of Percona’s office in Durham) to the restaurants.

Explain plan:

mysql> EXPLAIN
    -> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    ->   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    ->   and name is not null
    -> ORDER BY dist asc LIMIT 10G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: points_new
   partitions: NULL
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 21
     filtered: 18.89
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: SUBQUERY
        table: tl_2013_us_zcta510
   partitions: NULL
         type: ref
possible_keys: zcta5ce10
          key: zcta5ce10
      key_len: 8
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set, 1 warning (0.00 sec)

That looks pretty good: MySQL is using and index on the SHAPE field (even with the subquery, btw).

Results:

mysql> SELECT osm_id, name, round(st_distance_sphere(shape, st_geomfromtext('POINT (-78.9064543 35.9975194)', 1) ), 2) as dist, st_astext(shape)
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    ->   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    ->   and name is not null
    -> ORDER BY dist asc LIMIT 10;
+------------+----------------------------+--------+--------------------------------------+
| osm_id     | name                       | dist   | st_astext(shape)                     |
+------------+----------------------------+--------+--------------------------------------+
| 880747417  | Pop's                      | 127.16 | POINT(-78.9071795 35.998501)         |
| 1520441350 | toast                      | 240.55 | POINT(-78.9039761 35.9967069)        |
| 2012463902 | Pizzeria Toro              | 256.44 | POINT(-78.9036457 35.997125)         |
| 398941519  | Parker & Otis              | 273.39 | POINT(-78.9088833 35.998997)         |
| 881029843  | Torero's                   | 279.96 | POINT(-78.90829140000001 35.9995516) |
| 299540833  | Fishmonger's               | 300.01 | POINT(-78.90850250000001 35.9996487) |
| 1801595418 | Lilly's Pizza              | 319.83 | POINT(-78.9094462 35.9990732)        |
| 1598401100 | Dame's Chicken and Waffles | 323.82 | POINT(-78.9031929 35.9962871)        |
| 685493947  | El Rodeo                   | 379.18 | POINT(-78.909865 35.999523)          |
| 685504784  | Piazza Italia              | 389.06 | POINT(-78.9096472 35.9998794)        |
+------------+----------------------------+--------+--------------------------------------+
10 rows in set (0.13 sec)

0.13 seconds response time on AWS t2.medium box sounds reasonable to me. The same query on the MyISAM table shows ~same response time: 0.14 seconds.

GeoJSON feature and Google Maps

Another nice feature of MySQL 5.7 GIS is GeoJSON function: you can convert your result set to GeoJSON, which can be used with other applications (for example Google Maps API).

Let’s say I want to visualize the above result set on Google Map. As the API requires a specific format, I can use concat / group_concat to apply the format inside the SQL:

SELECT CONCAT('{
  "type": "FeatureCollection",
  "features": [
  ',
   GROUP_CONCAT('{
   "type": "Feature",
      "geometry": ', ST_AsGeoJSON(shape), ',
      "properties": {}
   }'),
  ']
}') as j
FROM points_new
WHERE st_within(shape,
      (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
	  and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
	  and name is not null

I will get all the restaurants and cafes in zipcode 27701. Here I’m using ST_AsGeoJSON(shape) to convert to GeoJSON, and concat/group_concat to “nest” the whole result into the format suitable for Google Maps.

Result:

mysql> set group_concat_max_len = 1000000;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT CONCAT('{
    '>   "type": "FeatureCollection",
    '>   "features": [
    '>   ',
    ->    GROUP_CONCAT('{
    '>    "type": "Feature",
    '>       "geometry": ', ST_AsGeoJSON(shape), ',
    '>       "properties": {}
    '>    }'),
    ->   ']
    '> }') as j
    -> FROM points_new
    -> WHERE st_within(shape,
    ->       (select shape from zcta.tl_2013_us_zcta510 where zcta5ce10='27701') )
    ->   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')
    ->   and name is not null
*************************** 1. row ***************************
j: {
  "type": "FeatureCollection",
  "features": [
  {
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.890852, 35.9903403]},
      "properties": {}
   },{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.8980807, 35.9933562]},
      "properties": {}
   },{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.89972490000001, 35.995879]},
      "properties": {}
   } ... ,{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.9103211, 35.9998494]},
      "properties": {}
   },{
   "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-78.9158326, 35.9967114]},
      "properties": {}
   }]
}
1 row in set (0.14 sec)

I did not include the full result set for the lack of space; I also had to change the group concat max length, otherwise MySQL will cut the result of the group_concat function.

Now I can visualize it:

 

 

 

 

 

 

 

 

 

 

Example: Find the longest bike path

MySQL 5.7 (as well as the older versions) supports st_length function to calculate a length of a linestring. However, even in MySQL 5.7 st_length can’t calculate the distance on earth. To find the distance of a linestring I’ve created a very simple stored procedure:

DELIMITER //
CREATE DEFINER=CURRENT_USER() FUNCTION `ls_distance_sphere`(ls GEOMETRY) RETURNS DECIMAL(20,8)
    DETERMINISTIC
BEGIN
DECLARE i, n INT DEFAULT 0;
DECLARE len DECIMAL(20,8) DEFAULT 0;
SET i = 1;
SET n = ST_NumPoints(ls);
 WHILE i < n DO
    SET len = len +  st_distance_sphere(st_pointN(ls, i), st_pointN(ls, i+1));
SET i = i + 2;
 END WHILE;
 RETURN len;
END //
DELIMITER ;

As the Open Street Map data has the information about roads in North America, we can use this function to calculate the length (in meters) for every road it stores:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null limit 10;
+---------------------------------------+---------------------------+
| name                                  | ls_distance_sphere(shape) |
+---------------------------------------+---------------------------+
| Highbury Park Drive Bypass            |                0.97386664 |
| Ygnacio Canal Trail                   |                0.86093199 |
| South Marion Parkway                  |                1.06723424 |
| New River Greenway                    |                1.65705401 |
| Northern Diversion Trail              |                2.08269808 |
| Gary L. Haller Trail;Mill Creek Trail |                2.09988209 |
| Loop 1                                |                2.05297129 |
| Bay Farm Island Bicycle Bridge        |                2.51141623 |
| Burrard Street                        |                1.84810259 |
| West 8th Avenue                       |                1.76338236 |
+---------------------------------------+---------------------------+
10 rows in set (0.00 sec)

Index the polygon/area distance using MySQL 5.7 virtual fields

To really answer the question “what is the longest bikepath (cyclepath) in North America?” we will have to order by stored function result. This will cause a full table scan and a filestort, which will be extremely slow for 30 millions of rows. The standard way to fix this is to materialize this road distance: add an additional field to the table and store the distance there.

In MySQL 5.7 we can actually use Generated (Virtual) Columns feature:

CREATE TABLE `lines_new` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` int(11) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  `highway` varchar(60) DEFAULT NULL,
  `waterway` text,
  `aerialway` text,
  `barrier` text,
  `man_made` text,
  `other_tags` text,
  `linestring_length` decimal(15,8) GENERATED ALWAYS AS (st_length(shape)) VIRTUAL,
  PRIMARY KEY (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`),
  KEY `linestring_length` (`linestring_length`),
  KEY `highway_len` (`highway`,`linestring_length`)
) ENGINE=InnoDB AUTO_INCREMENT=27077492 DEFAULT CHARSET=latin1

Unfortunately, MySQL 5.7 does not support non-native functions (stored procedures or UDF) in generated columns, so I have to use st_length in this example. Ordering by value of st_length may be OK though:

mysql> select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10;
+-----------------------------+---------------------------+
| name                        | ls_distance_sphere(shape) |
+-----------------------------+---------------------------+
| Confederation Trail         |            55086.92572725 |
| Cowboy Trail                |            43432.06768706 |
| Down East Sunrise Trail     |            42347.39791330 |
| Confederation Trail         |            29844.91038542 |
| Confederation Trail         |            26141.04655981 |
| Longleaf Trace              |            29527.66063726 |
| Cardinal Greenway           |            30613.24487294 |
| Lincoln Prairie Grass Trail |            19648.26787218 |
| Ghost Town Trail            |            25610.52158647 |
| Confederation Trail         |            27086.54829531 |
+-----------------------------+---------------------------+
10 rows in set (0.02 sec)

The query is very fast as it uses an index on both highway and linestring:

mysql> explain select name, ls_distance_sphere(shape) from lines_new where highway = 'cycleway' and name is not null order by linestring_length desc limit 10G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: lines_new
   partitions: NULL
         type: ref
possible_keys: highway_len
          key: highway_len
      key_len: 63
          ref: const
         rows: 119392
     filtered: 90.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

Conclusion

MySQL 5.7 contains a great set of features to work with geospatial data. Finally, spatial indexes are supported in InnoDB; st_distance_sphere as a native function is very useful. Unfortunately, other spatial functions only work with planar coordinates and do not support SRID. I hope this will be fixed in the new releases.

Mar
13
2015
--

MySQL and geospatial programming: An introduction to GIS

Geographic information systems (GIS) are used by application developers to incorporate geographic information into their datasets to create apps with cool features such finding the address to the best steakhouse in town or the locations of local points of historical interest (the list is endless). In a nutshell, GIS captures, store, checks and displays data related to positions on Earth’s surface.

Next Wednesday I’m going provide an introduction to GIS functionality in MySQL along an overview of recent changes. I invite you to register now for this free webinar titled “MySQL and Geospatial Programming.” It starts at 10 a.m. Pacific time on March 18.

MySQL and Geospatial ProgrammingIn this talk I’ll discuss:

  • The various tasks that deal with geocoding
  • How MySQL can solve problems such as points within a radius (e.g., “Find the 10 closest coffee shops) among others
  • Highlights from some of the new features bundled in the upcoming 5.7 release, and what benefits they can bring to your applications

I’m happy to field questions in advance in the comments section below. This webinar, like all Percona webinars, will be recorded. In addition to the video, my slides will also be available for download.

Register now and I’ll talk to you next week!

The post MySQL and geospatial programming: An introduction to GIS appeared first on MySQL Performance Blog.

Jun
19
2014
--

Using UDFs for geo-distance search in MySQL

In my previous post about geo-spatial search in MySQL I described (along with other things) how to use geo-distance functions. In this post I will describe the geo-spatial distance functions in more details.

If you need to calculate an exact distance between 2 points on Earth in MySQL (very common for geo-enabled applications) you have at least 3 choices.

  • Use stored function and implement haversine formula
  • Use UDF (user defined function) for haversine (see below)
  • In MySQL 5.6 you can use st_distance function (newly documented), however, you will get the distance on plane and not on earth; the value returned will be good for sorting by distance but will not represent actual miles or kilometers.

MySQL stored function for calculating distance on Earth

I previously gave an example for a MySQL-stored function which implements the haversine formula. However, the approach I used was not very precise: it was optimized for speed. If you need a more precise haversine formula implementation you can use this function (result will be in miles):

delimiter //
create DEFINER = CURRENT_USER function haversine_distance_sp (lat1 double, lon1 double, lat2 double, lon2 double) returns double
 begin
   declare R int DEFAULT 3958.76;
   declare phi1 double;
   declare phi2 double;
   declare d_phi double;
   declare d_lambda double;
   declare a double;
   declare c double;
   declare d double;
   set phi1 = radians(lat1);
   set phi2 = radians(lat2);
   set d_phi = radians(lat2-lat1);
   set d_lambda = radians(lon2-lon1);
   set a = sin(d_phi/2) * sin(d_phi/2) +
         cos(phi1) * cos(phi2) *
         sin(d_lambda/2) * sin(d_lambda/2);
   set c = 2 * atan2(sqrt(a), sqrt(1-a));
   set d = R * c;
   return d;
   end;
//
delimiter ;

(the algorithm is based on the standard formula, I’ve used the well-known Movable Type scripts calculator)

This is a slower implementation as it uses arctangent, however it is more precise.  

MySQL UDF for Haversine distance

Another approach, which will give you much more performance is to use UDF. There are a number of implementations, I’ve used lib_mysqludf_haversine.

Here is the simple steps to install it in MySQL 5.6 (will also work with earlier versions):

$ wget 'https://github.com/lucasepe/lib_mysqludf_haversine/archive/master.zip'
$ unzip master.zip
$ cd lib_mysqludf_haversine-master/
$ make
mysql> show global variables like 'plugin%';
+---------------+-------------------------+
| Variable_name | Value                   |
+---------------+-------------------------+
| plugin_dir    | /usr/lib64/mysql/plugin |
+---------------+-------------------------+
1 row in set (0.00 sec)
$ sudo cp lib_mysqludf_haversine.so /usr/lib64/mysql/plugin/
mysql> CREATE FUNCTION haversine_distance RETURNS REAL SONAME 'lib_mysqludf_haversine.so';
mysql> select haversine_distance(37.470295464, -122.572938858498, 37.760150536, -122.20701914150199, 'mi') as dist_in_miles;
+---------------+
| dist_in_miles |
+---------------+
|     28.330467 |
+---------------+
1 row in set (0.00 sec)

Please note:

  • Make sure you have the mysql-devel or percona-server-devel package installed (MySQL development libraries) before installing.
  • You will need to specify the last parameter to be “mi” if you want to get the results in miles, otherwise it will give you kilometers.

MySQL ST_distance function

In MySQL 5.6 you can use ST_distance function:

mysql> select st_distance(point(37.470295464, -122.572938858498), point( 37.760150536, -122.20701914150199)) as distance_plane;
+---------------------+
| distance_plane      |
+---------------------+
| 0.46681174155173943 |
+---------------------+
1 row in set (0.00 sec)

As we can see it does not give us an actual distance in mile or kilometers as it does not take into account that we have latitude and longitude, rather than X and Y on plane.

Geo Distance Functions Performance

The stored procedures and functions in MySQL are known to be slower, especially with trigonometrical functions. I’ve did a quick test, using MySQL function benchmark.

First I set 2 points (10 miles from SFO airport)

set @rlon1 = 122.572938858498;
set @rlat1 = 37.470295464;
set @rlon2 = -122.20701914150199;
set @rlat2 = 37.760150536;

Next I use 4 function to benchmark:

  • Less precise stored function (haversine)
  • More precise stored function (haversine)
  • UDF for haversine
  • MySQL 5.6 native ST_distance (plane)

The benchmark function will execute the above function 100000 times.

Here are the results:

mysql>  select benchmark(100000,  haversine_old_sp(@rlat1, @rlon1, @rlat2, @rlon2)) as less_precise_mysql_stored_proc;
+--------------------------------+
| less_precise_mysql_stored_proc |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (1.46 sec)
mysql>  select benchmark(100000,  haversine_distance_sp(@rlat1, @rlon1, @rlat2, @rlon2)) as more_precise_mysql_stored_proc;
+--------------------------------+
| more_precise_mysql_stored_proc |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (2.58 sec)
mysql>  select benchmark(100000,  haversine_distance(@rlat1, @rlon1, @rlat2, @rlon2, 'mi')) as udf_haversine_function;
+------------------------+
| udf_haversine_function |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0.17 sec)
mysql> select benchmark(100000, st_distance(point(@rlat1, @rlon1), point(@rlat2, @rlon1))) as mysql_builtin_st_distance;
+---------------------------+
| mysql_builtin_st_distance |
+---------------------------+
|                         0 |
+---------------------------+
1 row in set (0.10 sec)

As we can see the UDF gives much faster response time (which is comparable to built-in function).

Benchmark chart (smaller the better)

Conclusion

The lib_mysqludf_haversine UDF provides a good function for geo-distance search in MySQL. Please let me know in the comments what geo-distance functions or approaches do you use in your applications.

The post Using UDFs for geo-distance search in MySQL appeared first on MySQL Performance Blog.

Mar
24
2014
--

Creating GEO-enabled applications with MySQL 5.6

In my previous post I’ve showed some new MySQL 5.6 features which can be very helpful when creating geo-enabled applications. In this post I will show how we can obtain open-source GIS data, convert it to MySQL and use it in our GEO-enabled applications. I will also present at the upcoming Percona Live conference on this topic.

Data sources (US)

For the U.S. we may look at 2 major data sources:

1. ZIP codes with latitude, longitude and zip code boundaries (polygon). This can be downloaded from the U.S. Census website: US Zipcodes direct link

2. Point of interests, roads, boundaries, etc. The Openstreatmap website provides an excellent source of the GIS data. North American data can be downloaded here (updates frequently)

Data formats and conversion

U.S. Census data is stored in Shapefile (.shp, .shx, .dbf) format. Openstreetmap uses its own XML format (OSM)  and/or Protocolbuffer Binary Format. We can convert this to MySQL with GDAL server (on Linux) and ogr2ogr utility. To convert Shapefile any version of GDAL will work, however, for OSM/PBF we will need to use v. 1.10. The easiest way to get the GDAL 1.10 is to use Ubuntu + ubuntugis-unstable repo.

Here are the commands I’ve used to install:

apt-add-repository ppa:ubuntugis/ubuntugis-unstable
apt-get update
apt-get install gdal-bin

This will install gdal server. Make sure it is latest version and support OSM format:

ogr2ogr --version
GDAL 1.10.1, released 2013/08/26
ogrinfo --formats|grep OSM
-> "OSM" (readonly)

Now we can convert it to MySQL. First, make sure MySQL has the default storage engine = MyISAM (yes, GDAL will use MyISAM to be able to add a spatial index) and the max_allowed_packet is large enough:

mysql -e "set global max_allowed_packet = 16777216*10; set global default_storage_engine = MyISAM; "

ZIP codes and boundaries conversion

Now we can start conversion:

# ogr2ogr -overwrite -progress -f "MySQL" MYSQL:zcta,user=root tl_2013_us_zcta510.shp
0...10...20...30...40...50...60...70...80...90...100 - done.

The only thing we need to specify is db name and user name (assuming it will write to the localhost, otherwise specify the MySQL host). ogr2org will create all needed tables.

mysql> use zcta
Database changed
mysql> show tables;
+--------------------+
| Tables_in_zcta     |
+--------------------+
| geometry_columns   |
| spatial_ref_sys    |
| tl_2013_us_zcta510 |
+--------------------+
3 rows in set (0.00 sec)

The  geometry_columns and spatial_ref_sys are the reference tables only. All zip codes and boundaries will be stored in tl_2013_us_zcta510 table:

mysql> show create table tl_2013_us_zcta510\G
*************************** 1. row ***************************
       Table: tl_2013_us_zcta510
Create Table: CREATE TABLE `tl_2013_us_zcta510` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `zcta5ce10` varchar(5) DEFAULT NULL,
  `geoid10` varchar(5) DEFAULT NULL,
  `classfp10` varchar(2) DEFAULT NULL,
  `mtfcc10` varchar(5) DEFAULT NULL,
  `funcstat10` varchar(1) DEFAULT NULL,
  `aland10` double DEFAULT NULL,
  `awater10` double DEFAULT NULL,
  `intptlat10` varchar(11) DEFAULT NULL,
  `intptlon10` varchar(12) DEFAULT NULL,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=MyISAM AUTO_INCREMENT=33145 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Example 1. Selecting zip code boundaries for a given zipcode (Durham, NC):

mysql> select astext(shape) from zcta.tl_2013_us_zcta510 where zcta5ce10='27701'\G
*************************** 1. row ***************************
astext(shape): POLYGON((-78.902351 35.988107,-78.902436 35.988116,-78.902597 35.98814,-78.902725 35.988147,-78.902992 35.988143,-78.903117 35.988129,... -78.902351 35.988107))

Example 2. Find ZIP code for the given point (Lat, Lon): Percona HQ in Durham, NC

mysql> SELECT zcta5ce10 as ZIP
FROM tl_2013_us_zcta510
WHERE
st_contains(shape,
POINT(-78.90423, 36.004122));
+-------+
| ZIP   |
+-------+
| 27701 |
+-------+
1 row in set (0.00 sec)

Converting OpenStreetMap (OSM) data 

Converting OSM is the same:

ogr2ogr -overwrite -progress -f "MySQL" MYSQL:osm,user=root north-america-latest.osm.pbf

Please note, that it will take a long time to convert (8-12+ hours, depends upon the hardware).

Tables:

mysql> use osm
Database changed
mysql> show tables;
+------------------+
| Tables_in_osm    |
+------------------+
| geometry_columns |
| lines            |
| multilinestrings |
| multipolygons    |
| other_relations  |
| points           |
| spatial_ref_sys  |
+------------------+
7 rows in set (0.00 sec)

Points of interest are stored in “points” table. “Lines” and “multilinestrings” tables contain streets, hiking trails, bike paths, etc:

mysql> show create table points\G
*************************** 1. row ***************************
       Table: points
Create Table: CREATE TABLE `points` (
  `OGR_FID` int(11) NOT NULL AUTO_INCREMENT,
  `SHAPE` geometry NOT NULL,
  `osm_id` text,
  `name` text,
  `barrier` text,
  `highway` text,
  `ref` text,
  `address` text,
  `is_in` text,
  `place` text,
  `man_made` text,
  `other_tags` text,
  UNIQUE KEY `OGR_FID` (`OGR_FID`),
  SPATIAL KEY `SHAPE` (`SHAPE`)
) ENGINE=MyISAM AUTO_INCREMENT=13660668 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

“Shape” is the point (in spatial format) and other_tags will contain some additional format (in JSON format), this is how ogr2ogr converts it by default. See the GDAL documentation on the OSM driver for more information.

OSM data may contain the zip code, but this is not guaranteed.  Here is the example how we can find all coffee shops in ZIP code 27701:

mysql> select shape into @shape
from zcta.tl_2013_us_zcta510
where zcta5ce10='27701';
Query OK, 1 row affected (0.00 sec)
mysql> SELECT name, st_distance(shape, centroid(@shape) ) as dist
FROM points
WHERE st_within(shape, @shape)
and other_tags like '%"amenity"=>"cafe"%'  limit  10;
+--------------------+----------------------+
| name               | dist                 |
+--------------------+----------------------+
| Blue Coffee Cafe   |  0.00473103443182092 |
| Amelia Cafe        | 0.013825134250907745 |
| Serrano's Delicafe | 0.013472792849827055 |
| Blend              | 0.009123578862847042 |
+--------------------+----------------------+
4 rows in set (0.09 sec)

First, I have selected the ZIP code boundaries into MySQL variable (I could have used subquery, in MySQL 5.6 the performance will be very similar; this is a little bit outside of the current blog post topic, so I will not compare the 2 methods here).

Second I’ve used this variable to find all point which will fit into our boundaries and filter by ”amenity”=>”cafe”. I have to use like ‘%..%’ here, but I’m relying on the spatial index here. Explain plan:

mysql> explain SELECT name, st_distance(shape, centroid(@shape) ) as dist  FROM osm.points   WHERE st_within(shape, @shape)  and other_tags like '%"amenity"=>"cafe"%'  limit  10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: points
         type: range
possible_keys: SHAPE
          key: SHAPE
      key_len: 34
          ref: NULL
         rows: 10
        Extra: Using where

Conclusion

Using open source spatial data is a great way to enrich your application and add new features. You can store this data in MySQL so the application will be able to perform a join to the existing data. For example, if you store ZIP code for a user you can use OpenStreetMap data to show the appropriate content for this user. I will also provide more examples in my upcoming Talk @ Percona Live 2014 as well as share it in this blog in a future post.

I’ve also created a Public Amazon AMI: GIS-MySQL-Ubuntu – ami-ddfdf5b4. The AMI has the ZIP code and OSM data in MySQL 5.6 as well as the GDAL server installed (under /data, mounted on EBS). Please feel free to give it a try. As always I appreciate any comments/questions/thoughts/etc.

The post Creating GEO-enabled applications with MySQL 5.6 appeared first on MySQL Performance Blog.

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com