Oct 24, 2014

Sundar Pichai’s Ascension At Google Could Herald New Platform Unification

In a sweeping transfer of power, Google CEO Larry Page will bequeath operational leadership of a host of core products to his lieutenant Sundar Pichai. Pichai, a well-known Google executive, will add control of Google+, search, maps, infrastructure and ads to his portfolio, which already includes Chrome’s browser and operating system efforts, Google Apps and Android.

Oct 24, 2014

Terminal’s Containers Pioneer A New Way Of Developing Apps From The Cloud

Terminal, a San Francisco-based startup founded by some ex-Facebook and Google technical talent, is trying to transform the way we do software development. They’ve built a system supporting containers, or ultra-fast virtual machines, that will let developers write, ship and collaborate on code directly from the browser. While there are other somewhat comparable startups like Docker,…

Oct 24, 2014

IBM Earnings Reflect Just How Difficult Transformation Really Is

The news came out this week that IBM had another bad quarterly report card. Unfortunately for IBM, it marked the 10th straight quarter of falling revenue, indicating that maybe something’s not quite right at Big Blue. IBM’s earnings illustrate the challenge the company faces as it transforms itself into a cloud-centric company. It could be that it will take some time for that…

Oct 23, 2014

MySQL 5.6 Full Text Search Throwdown: Webinar Q&A

Yesterday (Oct. 22) I gave a presentation titled “MySQL 5.6 Full Text Search Throwdown.” If you missed it, you can still register to view the recording and my slides (http://www.percona.com/resources/mysql-webinars/mysql-56-full-text-search-throwdown).

Thanks to everyone who attended, and especially to folks who asked the great questions. I answered as many as we had time for during the session, but here are all the questions with my complete answers:

Q: Does Solr automatically maintain its index against MySQL? Do you have to hit the Solr server with a specific query to keep the index ‘warm’?

There are several strategies for updating a Solr index. In my examples, I showed only a “full import,” which is what you would do to create an index by reading all the source data.

You can also perform a “delta import” periodically to add a subset of the source data to an existing index, for example the data that has changed since the last time you updated the Solr index. See the documentation for Using delta-import command (http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command) and also Using query attribute for both full and delta import (http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport).

You would typically invoke the delta import from a cron job, perhaps every hour. But that means a Solr search might not find data that has changed in MySQL more recently than the last delta import. Depending on the application, a delay of up to 60 minutes might be acceptable, or you might have strict requirements that all data be in sync instantly.
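As a minimal sketch of the cron-driven approach, the Python below asks Solr's DataImportHandler to run a delta import over HTTP. It assumes the handler is registered at `/dataimport` in `solrconfig.xml` (a common setup, but check your own config); the host and the core name `posts` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr location and core name; adjust for your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "posts"


def dataimport_url(base: str, core: str, command: str = "delta-import") -> str:
    """Build a DataImportHandler URL for a delta or full import.

    Assumes the handler is mapped to /dataimport in solrconfig.xml.
    """
    if command not in ("delta-import", "full-import"):
        raise ValueError(f"unsupported command: {command}")
    # commit=true asks Solr to commit when the import finishes,
    # so the newly imported documents become visible to searches.
    params = urlencode({"command": command, "commit": "true"})
    return f"{base}/{core}/dataimport?{params}"


def trigger_delta_import() -> int:
    """Send the delta-import request and return the HTTP status of Solr's reply."""
    with urlopen(dataimport_url(SOLR_BASE, CORE)) as resp:
        return resp.status
```

A small wrapper script that calls `trigger_delta_import()`, scheduled with a crontab entry such as `0 * * * * python3 /path/to/solr_delta.py` (hypothetical path), would run it hourly, which is where the up-to-60-minute staleness comes from.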

You could also update the Solr index one document at a time using its Java API or web service API. This requires writing code in your application: every time you INSERT, UPDATE or DELETE a document in MySQL that you want to keep in sync with the Solr index, you would write more code to do the corresponding operation in the Solr index. That way every single text change would be searchable nearly immediately.
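For the one-document-at-a-time approach, the Solr side might look something like the sketch below. It builds JSON requests for Solr's `/update` handler and uses the `commitWithin` parameter so each change becomes searchable within roughly the given number of milliseconds. The host, core name and document fields are hypothetical; you would send each request with `urlopen()` right after the matching INSERT, UPDATE or DELETE succeeds in MySQL.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr location and core name; adjust for your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "posts"


def _update_request(payload, commit_within_ms: int) -> Request:
    """Build a POST of a JSON payload to the core's /update handler."""
    # commitWithin asks Solr to make the change visible within this many ms.
    url = f"{SOLR_BASE}/{CORE}/update?" + urlencode({"commitWithin": commit_within_ms})
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upsert_request(doc: dict, commit_within_ms: int = 1000) -> Request:
    """Add a document, or replace the existing one with the same unique key."""
    return _update_request([doc], commit_within_ms)


def delete_request(doc_id: str, commit_within_ms: int = 1000) -> Request:
    """Remove a document by its unique key."""
    return _update_request({"delete": {"id": doc_id}}, commit_within_ms)
```

In application code, a call like `urlopen(upsert_request({"id": "42", "body": new_text}))` would sit next to the MySQL UPDATE. This trades a little extra latency on every write for an index that stays nearly current; if Solr is unreachable, you still need a retry or a fallback such as the periodic delta import.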

Q: Did you test Elasticsearch? (several people asked about this)

I did not test Elasticsearch (http://www.elasticsearch.org/overview/elasticsearch/), but according to its technology overview, “Elasticsearch uses Lucene under the covers.” So I expect that this part of Elasticsearch performs similarly to what I saw from Apache Solr, which also uses Lucene internally.

Q: One thing I could not understand: how do you keep a Sphinx index in sync with the data? Can it be done in real time?

The Sphinx Search index does not automatically refresh as your MySQL data changes. You have to write application code to invoke the indexing process. The Sphinx Search documentation has a page on Live Index Updates (http://sphinxsearch.com/docs/2.2.5/live-updates.html) that gives an overview of the two methods and links to further reading.

This is definitely the most inconvenient aspect of Sphinx Search. Queries are very fast, but it’s expensive to do incremental updates to an index. So it’s ideal for indexing an archive of text that doesn’t change very frequently, but not as easy to use for indexing rapidly changing content.
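One of those methods, the “main+delta” scheme, can be illustrated with a toy in-memory model. This is plain Python, not Sphinx: a large “main” index is rebuilt rarely and remembers the highest document ID it covered, a small “delta” index is rebuilt often from rows with a larger ID, and searches consult both. In a real sphinx.conf, the delta source’s SQL would typically use a `WHERE id > ...` condition fed by a counter table; the class and method names here are hypothetical.

```python
from collections import defaultdict


class MainDeltaIndex:
    """Toy model of a main+delta full-text index pair (illustrative only)."""

    def __init__(self):
        self.max_main_id = 0          # highest doc id covered by main
        self.main = defaultdict(set)  # word -> set of doc ids
        self.delta = defaultdict(set)

    @staticmethod
    def _index(rows):
        """Build a simple inverted index from (doc_id, text) rows."""
        inverted = defaultdict(set)
        for doc_id, text in rows:
            for word in text.lower().split():
                inverted[word].add(doc_id)
        return inverted

    def rebuild_main(self, rows):
        """Expensive, infrequent full rebuild; resets the delta."""
        rows = list(rows)
        self.main = self._index(rows)
        self.max_main_id = max((doc_id for doc_id, _ in rows), default=0)
        self.delta = defaultdict(set)

    def rebuild_delta(self, rows):
        """Cheap, frequent rebuild covering only rows newer than main."""
        self.delta = self._index(r for r in rows if r[0] > self.max_main_id)

    def search(self, word):
        """Query both indexes and merge the matching doc ids."""
        word = word.lower()
        return sorted(self.main.get(word, set()) | self.delta.get(word, set()))
```

This simple model only picks up newly inserted rows; updates to or deletions of rows already in the main index need extra bookkeeping (Sphinx provides kill-lists for that), which is part of why frequently changing content is harder to handle.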

Q: I have over 800,000 PDF documents to index (in several languages); any recommendations?

During the webinar I said I recalled that there are tools to extract searchable text from a PDF file. I found one such project, Apache PDFBox (https://pdfbox.apache.org/index.html), that includes this capability, and it has a page describing a helper class for PDF parsing and extraction combined with Lucene indexing (https://pdfbox.apache.org/cookbook/textextraction.html). I haven’t used it myself, so I can’t comment on its performance for indexing 800,000 PDF documents, but it seems like you could write a Java program to iterate over your collection of PDFs and index them using this class.
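The outer loop for a collection that large might look like the sketch below. The extractor is passed in as a function because that step is the part that would come from PDFBox or a similar tool; `extract_text` and the batch size are hypothetical placeholders. Batching lets you hand a few hundred documents at a time to whatever index you use, and a bad file is logged and skipped instead of aborting the whole run.

```python
import logging
from pathlib import Path
from typing import Callable, Iterator, List, Tuple

Batch = List[Tuple[str, str]]


def iter_pdf_batches(
    root: str,
    extract_text: Callable[[Path], str],
    batch_size: int = 500,
) -> Iterator[Batch]:
    """Walk `root` recursively and yield lists of (path, extracted_text)."""
    batch: Batch = []
    # rglob is case-sensitive on POSIX systems, so "*.PDF" files would be missed.
    for path in sorted(Path(root).rglob("*.pdf")):
        try:
            text = extract_text(path)
        except Exception as exc:  # one corrupt file should not stop the run
            logging.warning("skipping %s: %s", path, exc)
            continue
        batch.append((str(path), text))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch
```

Because it is a generator, only one batch of extracted text is held in memory at a time, which matters more at 800,000 files than raw loop speed.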

Q: What do you suggest for using Sphinx Search for single-column searches?

You can use any SQL query in the sphinx.conf to define the source data to index. You can select one column, multiple columns, or even multiple columns from joined tables. The result from any SQL query you write can be used as the data source.
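To illustrate, here is a small Python helper that renders a `source`/`index` pair for sphinx.conf from any SQL query. The directive names (`type`, `sql_host`, `sql_user`, `sql_pass`, `sql_db`, `sql_query`, `source`, `path`) are standard Sphinx settings, but the helper itself, the credentials, the index directory and the table and column names in the examples are hypothetical. Sphinx expects the first column of `sql_query` to be a unique integer document ID; the remaining columns become full-text fields unless you declare them as attributes.

```python
def sphinx_conf_block(
    name: str,
    sql_query: str,
    db: str,
    user: str = "sphinx",
    password: str = "",
    host: str = "localhost",
    index_dir: str = "/var/lib/sphinx",
) -> str:
    """Render a sphinx.conf source + index definition for one SQL query."""
    return "\n".join([
        f"source {name}_src",
        "{",
        "    type      = mysql",
        f"    sql_host  = {host}",
        f"    sql_user  = {user}",
        f"    sql_pass  = {password}",
        f"    sql_db    = {db}",
        # The first selected column must be the unique integer document ID.
        f"    sql_query = {sql_query}",
        "}",
        "",
        f"index {name}",
        "{",
        f"    source = {name}_src",
        f"    path   = {index_dir}/{name}",
        "}",
    ])


# A single-column index (hypothetical table and columns).
single = sphinx_conf_block("post_titles", "SELECT id, title FROM posts", db="stackoverflow")

# Columns from joined tables work the same way.
joined = sphinx_conf_block(
    "posts_with_authors",
    "SELECT p.id, p.title, p.body, u.display_name "
    "FROM posts p JOIN users u ON u.id = p.owner_id",
    db="stackoverflow",
)
```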

Q: Which modules did you use with Sphinx Search? Did you use its built-in stemmers and metaphone package, etc.?

I installed the default modules. I don’t know if there is a significant performance difference from using optional packages.

Q: What about quality of results from each solution? I remember reading an article on percona.com several months ago comparing MyISAM fulltext vs InnoDB fulltext, and there were concerns about the results from InnoDB. Did you do any testing on this?

Indeed, here’s a link to the excellent blog post by my colleague Ernie Souhrada, in which he found some surprises in the results from InnoDB FTS: InnoDB Full-text Search in MySQL 5.6: Part 2, The Queries! (http://www.percona.com/blog/2013/03/04/innodb-full-text-search-in-mysql-5-6-part-2-the-queries/)

This time I was only comparing performance in the current MySQL 5.7 milestone; I didn’t compare the query results.

Q: Is there any full text search in Percona Server with XtraDB?

Percona Server is based on the upstream MySQL Community Edition of the respective version number. So Percona Server has the built-in FULLTEXT index types for MyISAM and InnoDB, and we have not changed this part of the code. Percona Server does not bundle Sphinx Search, but it’s not too difficult to install Sphinx Search as a complementary technology, just as you would install other packages that are commonly used parts of an application infrastructure, for example Memcached or HAProxy.

Q: Is MySQL going to improve the built-in InnoDB FTS in the near future?

They are continuing to add features that improve FTS, for example:

  • You can now write your own plugins for fulltext parsing (that is, parsing the input data to identify “words” to index; you may have your own idea about how to split text into words).
  • Both B-tree and full-text index types now use bulk loading to make it faster and more efficient to build the index.

I’m not aware of any work to improve the performance of fulltext queries significantly.

Q: What is the performance comparison between MyISAM and InnoDB for inline index updating?

I didn’t test performance of incremental index updates this time. I only populated my tables from the StackOverflow data using LOAD XML, and then I created fulltext indexes on the populated tables. But I generally favor moving all important data to InnoDB and not using MyISAM tables. It’s hard to imagine that the performance of index updates would be so much better that it would convince me to use MyISAM. It’s more likely that the accuracy of search results would be a good reason to use MyISAM. Even then, I’d keep the original data in InnoDB and use MyISAM only as a copy of the data, to create a disposable fulltext index.
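The “disposable MyISAM copy” idea can be sketched as a few SQL statements issued from Python. The table and column names (`posts`, `posts_fts`, `title`, `body`, `ft_title_body`) are hypothetical, and `execute` is any function that runs one statement, such as `cursor.execute` from a MySQL DB-API driver. The InnoDB table stays the source of truth; the MyISAM table can be dropped and rebuilt at any time.

```python
from typing import Callable, List

# The %s placeholder style matches MySQL drivers such as PyMySQL or mysqlclient.
SEARCH_SQL = (
    "SELECT id FROM posts_fts "
    "WHERE MATCH(title, body) AGAINST (%s IN NATURAL LANGUAGE MODE)"
)


def rebuild_myisam_fulltext_copy(
    execute: Callable[[str], object],
    source: str = "posts",
    copy: str = "posts_fts",
) -> List[str]:
    """Recreate a MyISAM copy of the text columns and give it a FULLTEXT index.

    Table names are interpolated directly, so pass only trusted identifiers.
    """
    statements = [
        f"DROP TABLE IF EXISTS {copy}",
        # CREATE TABLE ... SELECT copies the rows into a new MyISAM table.
        f"CREATE TABLE {copy} (PRIMARY KEY (id)) ENGINE=MyISAM "
        f"SELECT id, title, body FROM {source}",
        # Index after loading the data, mirroring the load-then-index order.
        f"ALTER TABLE {copy} ADD FULLTEXT INDEX ft_title_body (title, body)",
    ]
    for sql in statements:
        execute(sql)
    return statements
```

For a natural-language query against a MyISAM table, the column list in `MATCH()` has to match the FULLTEXT index definition, which is why `SEARCH_SQL` names `title, body` in the same order as the index.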

Thanks again for attending my webinar! For more great content, please join Percona and the MySQL community at our conference events (http://www.percona.com/live/conferences). The next one is Percona Live London 2014 (http://www.percona.com/live/london-2014/) on November 3-4. We also look forward to OpenStack Live 2015 (http://www.percona.com/live/openstack-live-2015/) in Santa Clara, California, April 13-14, at the same venue as the Percona Live MySQL Conference and Expo 2015 (http://www.percona.com/live/mysql-conference-2015/), April 13-16.

Also watch for more webinars (http://www.percona.com/resources/mysql-webinars) from Percona in the future!

The post MySQL 5.6 Full Text Search Throwdown: Webinar Q&A (http://www.percona.com/blog/2014/10/23/mysql-5-6-full-text-search/) appeared first on MySQL Performance Blog (http://www.percona.com/blog).

Oct 23, 2014

Jostle Raises $2M For Its Intranet Platform

If you have ever worked for a large corporation, then you have probably seen how outdated some of the intranet platforms on the market today are. The Vancouver-based startup Jostle wants to put a fresh face on intranet software. The service aims to make it easier for companies to publish news stories and announcements, host online discussions and share other…

Oct 23, 2014

Google’s DeepMind Acqui-Hires Two AI Teams In The UK, Partners With Oxford

Earlier this year Google acquired DeepMind in the UK to expand the work that it is doing in artificial intelligence, and today the company announced that it is making some more significant moves to build this out even further. It is acqui-hiring the two academic teams of founders, seven people in all, behind Dark Blue Labs and Vision Factory, two deep learning startups based in the UK, and…

Oct 22, 2014

Verizon Leads Flint Mobile’s $9.4M Round As Payment Service Comes To The Web

Flint Mobile, a point-of-sale mobile payments solution originally built around snapping photos of your credit card instead of using dongles or other hardware to make payments, is today announcing that it has raised another $9.4 million in funding led by new, strategic investor Verizon via its Verizon Ventures arm, as well as an expansion of its service to the wider internet in a new…

Oct 22, 2014

Progress Software Buys Telerik for $262.5M As Buying Spree Continues

In a deal today involving two companies with links to the Boston area, Progress Software, which helps companies build business software, announced it was buying Telerik for $262.5M. Telerik’s portfolio includes a .NET toolbox, a mobile development platform and a CMS called Sitefinity, as well as access to a developer community of over 1.4M people. According to Karen Tegan…
