Nov
13
2013
--

MySQL & Sphinx: Register now for Nov. 20 Webinar

Sphinx is a free, open-source search server that integrates nicely with MySQL. It provides a fast, scalable, and pluggable search framework, along with a variety of tools that let you customize how searching and indexing interact with, or become a part of, your environment.

Join me and Sphinx Search CEO/CTO Andrew Aksyonoff, the founder and creative force behind Sphinx, on Wednesday, November 20th at 10 a.m. PST as we discuss how to get started with Sphinx and seamlessly integrate it with your applications and MySQL. The title of our webinar is “How to Optimally Configure Sphinx Search for MySQL,” and you can register here.

Attendees of this webinar will also get an understanding of tuning and optimization basics, tips for high availability, and how Sphinx scales.

Feel free to ask questions in advance here in the comments section. The webinar will be recorded and available for replay here shortly afterward.

We hope to see you on Wednesday!

The post MySQL & Sphinx: Register now for Nov. 20 Webinar appeared first on MySQL Performance Blog.

Jun
19
2013
--

What technologies are you running alongside MySQL?

In many environments, MySQL is not the only technology used to store and process data.

Quite frequently, especially with large-scale or complicated applications, we use MySQL alongside other technologies for tasks such as reporting and caching, as well as the main data store for portions of the application.

What technologies for data storage and processing do you use alongside MySQL in your environment? Please feel free to elaborate in the comments about your use case and experiences!

Jan
16
2013
--

Sphinx search performance optimization: multi-threaded search

Queries in MySQL, Sphinx, and many other database or search engines are typically single-threaded. That is, when you issue a single query on your brand-new r910 with 32 CPU cores and 16 disks, at most 1 CPU core and 1 disk will be used to process it at any given point – and in fact, only one or the other.

Seriously: if the query is CPU-intensive, it is only going to use 3% of the available CPU capacity (on that same 32-core machine). If it is disk-IO-intensive, it will use 6% of the available IO capacity (on the 16-disk RAID10 – or RAID0, for that matter).

Let me put it another way: if your MySQL or Sphinx query takes 10s to run on a machine with a single CPU core and a single disk, moving it to a machine with 32 such cores and 16 such disks will not make it any faster.

But you knew this already. The question is – can you do something about it?

In the case of Sphinx – indeed you can! And with very little effort. In fact, it does not require any changes to your application or database whatsoever; it is only a matter of small changes to the Sphinx configuration.

The Plan

First of all, let me explain what we are trying to achieve here.

Sphinx has the ability to do distributed search out of the box – it was designed to scale out that way very early on. If your Sphinx index does not fit on one machine, you index different parts of it on different machines, and then an aggregator node receives the request from the application, issues search requests to all data nodes in parallel, merges the results from all of the data nodes, and returns the combined result to the application as if a single server had served the request in the first place.
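
To make the scale-out idea concrete, here is a minimal sketch of such a distributed index on the aggregator node (the host names, port, and shard index names are hypothetical):

```
index books_dist
{
	type  = distributed
	agent = node1.example.com:9312:books_shard1
	agent = node2.example.com:9312:books_shard2
}
```

The application simply queries books_dist on the aggregator; searchd fans the query out to both agents in parallel and merges their results.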

Well, guess what – you can actually utilize this feature to your advantage even if your data easily fits on one machine and all you want is for your queries to be many times faster. Even better, Sphinx now supports this out of the box, so you don’t need to pretend you are querying remote nodes.

Also, you get a bonus: once you configure the server for distributed search, you can do indexing in parallel too!

A word of caution: while this technique will improve most types of search queries, some are not going to benefit greatly from parallel execution. The reason is that while the search itself is done on the data nodes (even if they are local) in parallel, merging of the results is done by the aggregator and is therefore single-threaded. Merging includes some CPU-intensive operations such as ranking, ordering, or even COUNT with GROUP BY, so if the data nodes return large amounts of data to post-process, the aggregator may well become a bottleneck due to its single-threaded nature.

This is, however, easy to check – look at the number of results matched per query in your Sphinx query log; that should give you a clue.
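
For reference, a line in the plain-format Sphinx query log looks roughly like this (the timestamp, timing, match count, and query text below are made up for illustration) – the total match count is the number right before the (offset,limit) pair:

```
[Fri Nov 16 16:04:38 2012] 0.125 sec [ext/1/rel 2853467 (0,20)] [idx1] some popular keyword
```

A query like this one, matching millions of documents, is exactly the kind that keeps the single-threaded aggregator busy during merging.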

Execution

Let us assume you have this one index on one server with the following basic configuration (many irrelevant details omitted):

source src1
{
	type = mysql
	sql_query = SELECT id, text FROM table
}

index idx1
{
	type = plain
	source = src1
}

searchd
{
	dist_threads = 0 # default
}

And now we want this index of ours – idx1 – to utilize 3 CPU cores and/or disks on the local machine. Here’s how we would change the configuration:

source src1
{
	type = mysql
	sql_query = SELECT id, text FROM table
}

source src1p0 : src1
{
	sql_query = SELECT id, text FROM table WHERE id % 3 = 0;
}

source src1p1 : src1
{
	sql_query = SELECT id, text FROM table WHERE id % 3 = 1;
}

source src1p2 : src1
{
	sql_query = SELECT id, text FROM table WHERE id % 3 = 2;
}

index idx1_template
{
	type = plain
	source = src1
}

index idx1p0 : idx1_template
{
	source = src1p0
}

index idx1p1 : idx1_template
{
	source = src1p1
}

index idx1p2 : idx1_template
{
	source = src1p2
}

index idx1
{
	type = distributed
	local = idx1p0
	local = idx1p1
	local = idx1p2
}

searchd
{
	dist_threads = 3
}

And you’re done. Of course, you need to reindex all of the indexes, but now you can do it in parallel – just run a separate indexer process for each of idx1p0 through idx1p2.
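
For instance, assuming a default sphinx.conf location, the three shards could be rebuilt concurrently from the shell (a sketch only – add --rotate if searchd is already running, so it picks up the new index files):

```
indexer idx1p0 &
indexer idx1p1 &
indexer idx1p2 &
wait
```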

BTW, using the modulo operator is not necessarily the best way to split data; you may want to use ranges instead – for example, by using a helper table in MySQL to define the ranges, mixing in sql_query_range, or something else, depending on what your data looks like.
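
A ranged sketch of one shard might look like this (the boundary values and step size here are hypothetical – in practice you would derive them from your data or a helper table):

```
source src1p0 : src1
{
	# shard 0 covers the first third of the id space
	sql_query_range = SELECT 1, 3333333
	sql_range_step  = 100000
	sql_query       = SELECT id, text FROM table \
		WHERE id BETWEEN $start AND $end
}
```

sql_query_range tells indexer the overall id range for this shard, and the rows are then fetched in sql_range_step-sized batches, which also keeps MySQL from building one huge result set.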

Finishing line

I have always loved how easily Sphinx scales out to as many machines as you need, and I have been running it this way for many years now. However, I don’t utilize this feature to make queries even faster on a one-machine setup nearly as often as I should. Well, it’s not like Sphinx is slow or anything, but queries are never too fast, are they? :)

Jan
15
2013
--

Sphinx search performance optimization: attribute-based filters

One of the most common causes of poor Sphinx search performance I find our customers face is misuse of search filters. In this article I will cover how Sphinx attributes (which are normally used for filtering) work, when they are a good idea to use, and what to do when they are not – but you still want to take advantage of otherwise superb Sphinx performance.

The Problem

While Sphinx is great for full text search, you can certainly go beyond that – but before you do, it is a good idea to make sure you’re doing it the right way.

In Sphinx, columns are basically one of two kinds:

a) full text
b) attributes

Speaking in MySQL terms, full text columns are always indexed, and using the very powerful extended query syntax you can search against certain columns only, or against all of them – the fewer results your full text query matches, the faster the query will be. That’s self-evident, I guess. Every time a keyword matches a document, Sphinx needs to resolve the appropriate document and evaluate the result. If your keywords match all of your 100M records, that is a lot of work; with just a few hundred thousand matching records, it is going to be much, much faster.

Attributes, on the other hand, are sort of like unindexed MySQL columns. They are the extra details you may want to filter by – usually things like gender, status, age, group_id, etc. The effect of their being unindexed is that whenever you use attributes, Sphinx does a full scan of that attribute for all the records that matched the full text query. For a few hundred thousand matched records, checking attributes is not going to slow down queries significantly. But – and this is the misuse I see a lot – when you are NOT doing full text search, using only attribute-based filters means a full scan of that attribute across all records.

Because attributes are not B-tree structured (and are therefore slow to search through), by default Sphinx actually stores them in memory. In fact, it requires that all attributes fit in memory, or server performance will be simply unbearable. However, that still does not mean you can use attributes to find all the records that match group_id = 10 – that query will have to check all 100M records.

BTW, internally there are some differences between numeric attributes, character-based attributes, and multi-value attributes (MVAs), but for the purpose of our discussion it is enough to know that attributes are not indexed.

For example..

Now let me give you a few examples so this isn’t just empty talk. For the examples below I will be using the SphinxQL protocol, which looks like talking to a MySQL server – but it’s not: it is me talking to the Sphinx server.

First of all, let us see how many records we have in this index and how long a full scan takes:

sphinx> select * from catalog;
...
20 rows in set (0.70 sec)

sphinx> show meta;
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| total         | 1000     |
| total_found   | 10309972 |
| time          | 0.700    |
+---------------+----------+
3 rows in set (0.00 sec)

Note this is a real index used in production – a catalog of books – so if the same query happens to give slightly different results, it could be because indexing occurred between iterations.

If you are seeing SphinxQL output for the first time, it may be a little confusing, so let me explain. The query returned 20 rows because, unless you specify an explicit LIMIT, it defaults to 20. Total says 1000 because, by default, a query is limited to processing the 1000 best results (it still searches the entire index, though).

Otherwise, the takeaway is that this index has 10M records and a full scan takes 700ms.

Now, let us find all records that match user_id = 50:

sphinx> select * from catalog where user_id = 50;
...
20 rows in set (0.16 sec)

sphinx> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 287   |
| total_found   | 287   |
| time          | 0.155 |
+---------------+-------+
3 rows in set (0.00 sec)

Pretty bad, isn’t it? 287 records returned in 155ms. Doing the same thing in MySQL – assuming user_id is indexed and in cache – would take less than a millisecond, so this is definitely not the best use case for Sphinx.

When your full text search keywords match many documents (and the query is therefore slow), using attributes may significantly reduce the number of results matched – but not the amount of time it takes:

sphinx> select * from catalog where match('Great') and user_id = 50;
...
5 rows in set (0.10 sec)

sphinx> show meta;
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| total         | 5      |
| total_found   | 5      |
| time          | 0.096  |
| keyword[0]    | great  |
| docs[0]       | 200084 |
| hits[0]       | 216948 |
+---------------+--------+
6 rows in set (0.00 sec)

Solution

The solution may not be obvious at first, but you will see that it makes sense. The strength of Sphinx is full text search, so I suggest we exploit that to get good performance on attributes that are highly selective, such as user_id in the example above. In fact, I have been using this technique with great success for many years now.

First, I would add the following extra item to fetch when indexing the catalog:

CONCAT('userkey_', user_id) userkey
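
In the context of the indexing query, that would look something like this (the other column names here are hypothetical):

```
sql_query = SELECT id, title, author, \
	CONCAT('userkey_', user_id) AS userkey \
	FROM catalog
```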

And now I have an extra column in the full text index that I can use for filtering:

sphinx> select * from catalog where match('@userkey userkey_50');
...
20 rows in set (0.00 sec)

sphinx> show meta;
+---------------+------------+
| Variable_name | Value      |
+---------------+------------+
| total         | 287        |
| total_found   | 287        |
| time          | 0.000      |
| keyword[0]    | userkey_50 |
| docs[0]       | 287        |
| hits[0]       | 287        |
+---------------+------------+
6 rows in set (0.00 sec)

That looks much better and I can mix it with other search keywords:

sphinx> select * from catalog where match('Great @userkey userkey_50');
...
5 rows in set (0.01 sec)

sphinx> show meta;
+---------------+------------+
| Variable_name | Value      |
+---------------+------------+
| total         | 5          |
| total_found   | 5          |
| time          | 0.013      |
| keyword[0]    | great      |
| docs[0]       | 200088     |
| hits[0]       | 216952     |
| keyword[1]    | userkey_50 |
| docs[1]       | 287        |
| hits[1]       | 287        |
+---------------+------------+
9 rows in set (0.00 sec)

Highly selective columns only

I thought I would emphasize: while this is a neat performance optimization for highly selective attributes, it is certainly not something you want to use for every attribute. There are a few reasons for that:

  • it does use more disk space for the index (although it’s not as bad as you might think)
  • attributes are still a good way to filter out data when your search queries don’t match many records
  • in fact, it could reduce performance of queries that otherwise match few records

Let me illustrate that. I have created another full text indexed column for a skewed boolean attribute, “ancient”, which identifies whether a book was published before or after the year 1500, and now I will run some different queries using both approaches:


sphinx> select * from catalog where ancient = 1;
| total_found   | 1499266 | 14%
| time          | 0.552   |

sphinx> select * from catalog where ancient = 0;
| total_found   | 8852086 | 86%
| time          | 0.662   |

sphinx> select * from catalog where match('ancientkey_1');
| total_found   | 1499266 |
| time          | 0.227   |

sphinx> select * from catalog where match('ancientkey_0');
| total_found   | 8852086 |
| time          | 1.309   |

sphinx> select * from catalog where match('solar');
| total_found   | 2510  |
| time          | 0.001 |

sphinx> select * from catalog where match('solar @ancientkey ancientkey_0');
| total_found   | 2176  |
| time          | 0.434 |

sphinx> select * from catalog where match('solar @ancientkey ancientkey_1');
| total_found   | 334   |
| time          | 0.077 |

sphinx> select * from catalog where match('solar') and ancient = 1;
| total_found   | 334   |
| time          | 0.002 |

sphinx> select * from catalog where match('solar') and ancient = 0;
| total_found   | 2176  |
| time          | 0.003 |

What you can see here is that while there is very little difference when filtering on “ancient” alone – it is very slow in both cases – when doing a selective search first and then filtering, using an attribute is orders of magnitude better than using a full text key.

That being said, for highly selective columns – even more so if they will be used in queries without full text search keywords – having them full text indexed is a very good way to improve Sphinx search query performance.

Aug
21
2012
--

Webinar for Full Text Search Throwdown

Tomorrow, August 22 at 10:00am PDT, I’ll present a webinar called Full Text Search Throwdown. This is a no-nonsense performance comparison of solutions for full text indexing for MySQL applications, including:

  • LIKE predicates and regular expressions
  • MyISAM FULLTEXT indexes
  • InnoDB FULLTEXT indexes
  • Apache Solr
  • Sphinx Search
  • Trigraphs

I’ll compare the performance for building indexes and querying indexes.

If you’re developing an application with text search features, this will be a very practical and informative overview of your technology options!

Register for this free webinar at http://www.percona.com/webinars/2012-08-22-full-text-search-throwdown

Sep
10
2010
--

First Sphinx Conference is Announced

The first-ever Sphinx Users Conference has been announced; it will take place in Moscow, Russia on October 24th – the Sunday before the Highload.ru conference – so if you are attending that, you may just drop by this event too. It is going to be a free, technically focused event, close in spirit to the Percona Performance Conference we organized back in 2009.

I am going to give a talk at this conference focused on using Sphinx together with, and instead of, MySQL. There are going to be a number of talks describing Sphinx deployments alongside MySQL, so this conference should be interesting to a lot of MySQL users.

This is going to be a predominantly Russian-speaking event. Sphinx founder Andrew Aksyonoff speaks Russian, and so do I. At the same time, we expect some talks from English speakers as well. The call for papers is still open, and if you have something to say about Sphinx, we welcome you to submit a talk!


Entry posted by peter

Apr
14
2010
--

Slides from my Sphinx talk at RIT++ 2010

While most of the Percona gang travelled to California for the MySQL event of the year, I headed in the opposite direction: to Moscow for the RIT++ 2010 conference, where I presented a talk on Sphinx. You can get the PDF file here – Improving MySQL-based applications performance with Sphinx.

I have been invited to talk at Open Source Data Center Conference in Nürnberg, Germany in June this year, so I hope I can meet some of you there.


Entry posted by Maciej Dobrzanski

Dec
04
2009
--

Sphinx 0.9.9 is finally here.

Sphinx 0.9.9 has finally been released to General Availability. The previous version, 0.9.9-rc2, was released back in April, so there was quite a gap. This release fixes about 40 bugs, most of which are rare or insignificant. Fixing bugs in 0.9.9 was the smaller portion of the work done on Sphinx during the last half year – 0.9.9-rc2 was quite stable, and in fact a lot of users were running the 0.9.9-x branch instead of the old 0.9.8 branch in production. Most of the work went into new features for 0.9.10, as well as live updates.


Entry posted by peter
