Sep
27
2018
--

Dropbox overhauls internal search to improve speed and accuracy

Over the last several months, Dropbox has been undertaking an overhaul of its internal search engine for the first time since 2015. Today, the company announced that the new version, dubbed Nautilus, is ready for the world. The latest search tool takes advantage of a new architecture powered by machine learning to help pinpoint the exact piece of content a user is looking for.

While an individual user may have a much smaller body of documents to search across than the World Wide Web, the paradox of enterprise search says that the fewer documents you have, the harder it is to locate the correct one. Yet Dropbox faces a host of additional challenges when it comes to search. It has more than 500 million users and hundreds of billions of documents, making finding the correct piece for a particular user even more difficult. The company had to take all of this into consideration when it was rebuilding its internal search engine.

One way for the search team to attack a problem of this scale was to bring machine learning to bear on it, but it required more than an underlying level of intelligence to make this work. It also required completely rethinking the entire search tool at an architectural level.

That meant separating two main pieces of the system: indexing and serving. The indexing piece is, of course, crucial in any search engine. A system of this size and scope needs a fast indexing engine to cover the number of documents in a whirl of changing content. This is the piece that’s hidden behind the scenes. The serving side of the equation is what end users see when they query the search engine and the system generates a set of results.

Nautilus Architecture Diagram: Dropbox

Dropbox described the indexing system in a blog post announcing the new search engine: “The role of the indexing pipeline is to process file and user activity, extract content and metadata out of it, and create a search index.” They added that the easiest way to index a corpus of documents would be to just keep checking and iterating, but that couldn’t keep up with a system this large and complex, especially one that is focused on a unique set of content for each user (or group of users in the business tool).

They account for that in a couple of ways. They create offline builds every few days, but they also watch as users interact with their content and try to learn from that. As that happens, Dropbox creates what it calls “index mutations,” which they merge with the running indexes from the offline builds to help provide ever more accurate results.
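The interplay between offline builds and index mutations can be illustrated with a small sketch. This is not Dropbox's implementation: the `Mutation` and `LiveIndex` names, the in-memory dictionaries and the term-set document model are all assumptions made for illustration. The idea shown is only that a periodically rebuilt snapshot is overlaid with changes recorded since the last build.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass
class Mutation:
    """Hypothetical index mutation: add/update or delete one document."""
    doc_id: str
    terms: Optional[FrozenSet[str]]  # None means the document was deleted


@dataclass
class LiveIndex:
    """Offline snapshot plus an overlay of mutations applied since the build."""
    snapshot: Dict[str, FrozenSet[str]]  # doc_id -> terms, from the offline build
    overlay: Dict[str, Optional[FrozenSet[str]]] = field(default_factory=dict)

    def apply(self, mutation: Mutation) -> None:
        # A later mutation for the same document overwrites an earlier one.
        self.overlay[mutation.doc_id] = mutation.terms

    def lookup(self, term: str) -> Set[str]:
        hits = set()
        # Snapshot entries count only if no newer mutation shadows them.
        for doc_id, terms in self.snapshot.items():
            if doc_id not in self.overlay and term in terms:
                hits.add(doc_id)
        # Overlay entries count unless they record a deletion.
        for doc_id, terms in self.overlay.items():
            if terms is not None and term in terms:
                hits.add(doc_id)
        return hits

    def rebuild(self) -> None:
        """Fold the overlay into a fresh snapshot, as a new offline build would."""
        merged = {d: t for d, t in self.snapshot.items() if d not in self.overlay}
        merged.update({d: t for d, t in self.overlay.items() if t is not None})
        self.snapshot, self.overlay = merged, {}
```

In this sketch, queries see fresh content immediately through the overlay, while `rebuild()` plays the role of the periodic offline build that resets it.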

The indexing process has to take into account the textual content, assuming it’s a document, but it also has to look at the underlying metadata as a clue to the content. They use this information to feed a retrieval engine, whose job is to find as many documents as it can, as fast as it can, and worry about accuracy later.

It has to make sure it checks all of the repositories. For instance, Dropbox Paper is a separate repository, so the answer could be found there. It also has to take into account the access-level security, only displaying content that the person querying has the right to access.
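A recall-first retrieval step with an access check might look roughly like the following. Everything here is hypothetical: the `Doc` fields, the repository labels and the whitespace tokenizer are stand-ins chosen for the sketch, not details from Dropbox's blog post.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Doc:
    doc_id: str
    repository: str                 # illustrative labels such as "files" or "paper"
    text: str                       # extracted textual content
    metadata: str                   # e.g. words from the file name or folder path
    allowed_users: FrozenSet[str]   # users permitted to read this document


def retrieve(query: str, user: str, repositories: Dict[str, List[Doc]]) -> List[Doc]:
    """Return every document the user may read that matches any query term."""
    terms = set(query.lower().split())
    candidates = []
    for docs in repositories.values():      # check every repository
        for doc in docs:
            if user not in doc.allowed_users:
                continue                    # never surface content the user cannot access
            haystack = set((doc.text + " " + doc.metadata).lower().split())
            if terms & haystack:            # any overlap counts; precision comes later
                candidates.append(doc)
    return candidates
```

Matching on any shared term keeps recall high at the cost of precision, which fits the "find as many as possible, rank later" division of labor described above.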

Once it has a set of possible results, it uses machine learning to pinpoint the correct content. “The ranking engine is powered by a [machine learning] model that outputs a score for each document based on a variety of signals. Some signals measure the relevance of the document to the query (e.g., BM25), while others measure the relevance of the document to the user at the current moment in time,” they explained in the blog post.
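To make the two kinds of signal concrete, here is a minimal sketch that scores documents with the standard Okapi BM25 formula and then blends in a recency signal. The blend is an assumption made for illustration: the fixed `recency_weight` and the `days_since_opened` input stand in for a learned model and its features, which the blog post does not spell out.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization: long documents need more hits to score the same.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query, docs, days_since_opened, recency_weight=0.5):
    """Return document positions, best first, blending BM25 with recency.

    The hand-picked weight is a placeholder for a trained ranking model.
    """
    text_scores = bm25_scores(query, docs)
    blended = [
        s + recency_weight / (1 + days)
        for s, days in zip(text_scores, days_since_opened)
    ]
    return sorted(range(len(docs)), key=lambda i: blended[i], reverse=True)
```

With two documents whose text matches the query equally well, the one opened more recently ranks first, which is the kind of "relevance to the user at the current moment" signal the quote refers to.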

After the system has a list of potential candidates, it ranks them and displays the results for the end user in the search interface, but a lot of work goes into that from the moment the user types the query until it displays a set of potential files. This new system is designed to make that process as fast and accurate as possible.

Jul
25
2018
--

Google brings its search technology to the enterprise

One of Google’s first hardware products was its search appliance, a custom-built server that allowed businesses to bring Google’s search tools to the data behind their firewalls. That appliance is no more, but Google today announced the spiritual successor to it with an update to Cloud Search. Until today, Cloud Search only indexed G Suite data. Now, it can pull in data from a variety of third-party services that can run on-premises or in the cloud, making the tool far more useful for large businesses that want to make all of their data searchable by their employees.

“We are essentially taking all of Google expertise in search and are applying it to your enterprise content,” Google said.

One of the launch customers for this new service is Whirlpool, which built its own search portal and indexed more than 12 million documents from more than a dozen services using this new service.

“This is about giving employees access to all the information from across the enterprise, even if it’s traditionally siloed data, whether that’s in a database or a legacy productivity tool and make all of that available in a single index,” Google explained.

To enable this functionality, Google is making a number of software adapters available that will bridge the gap between these third-party services and Cloud Search. Over time, Google wants to add support for more services and bring this cloud-based technology on par with what its search appliance was once capable of.

The service is now rolling out to a select number of users. Over time, it’ll become available both to G Suite users and as a standalone version.

Jul
10
2018
--

Box acquires Butter.ai to make search smarter

Box announced today that it has acquired Butter.ai, a startup that helps customers search for content intelligently in the cloud. The terms of the deal were not disclosed, but the Butter.AI team will be joining Box.

Butter.AI was started by two ex-Evernote employees, Jack Hirsch and Adam Walz. The company was partly funded by Evernote founder and former CEO Phil Libin’s Turtle Studios. The latter is a firm established with a mission to use machine learning to solve real business problems like finding the right document wherever it is.

Box has been adding intelligence to its platform for some time, and this acquisition brings the Butter.AI team on board and gives Box more machine learning and artificial intelligence know-how while helping to enhance search inside of the Box product.

“The team from Butter.ai will help Box to bring more intelligence to our Search capabilities, enabling Box’s 85,000 customers to more easily navigate through their unstructured information — making searching for files in Box more contextualized, predictive and personalized,” Box’s Jeetu Patel wrote in a blog post announcing the acquisition.

That means taking into account the context of the search and delivering documents that make sense given your role and how you work. For instance, if you are a salesperson and you search for a contract, you probably want a sales contract and not one for a freelancer or business partnership.

For Butter, the chance to have access to all those customers was too good to pass up. “We started Butter.ai to build the best way to find documents at work. As it turns out, Box has 85,000 customers who all need instant access to their content. Joining Box means we get to build on our original mission faster and at a massive scale,” company CEO and co-founder Jack Hirsch said.

The company launched in September 2017, and up until now it has acted as a search assistant inside Slack you can call upon to search for documents and find them wherever they live in the cloud. The company will be winding down that product as it becomes part of the Box team.

As is often the case in these deals, the two companies have been working closely together, and it made sense for Box to bring the Butter.AI team into the fold where it can bring its technology to bear on the Box platform.

“After launching in September 2017 our customers were loud and clear about wanting us to integrate with Box and we quickly delivered. Since then, our relationship with Box has deepened and now we get to build on our vision for a MUCH larger audience as part of the Box team,” the founders wrote in a Medium post announcing the deal.

The company raised $3.3 million over two seed rounds. Investors included Slack and General Catalyst.

Mar
06
2018
--

Lucidworks launches site search as a service tool

Lucidworks has been helping large organizations like Reddit with complex content build search tools that reach across massive content stores, but the company wanted to make the underlying search technology available to a wider market. Today, it released Lucidworks Site Search, a cloud service that enables companies to embed Lucidworks search in any application or website with a couple of lines of code.

It’s more of a pre-packaged solution, but it still takes advantage of the same natural language processing (NLP) and machine learning as its more complex and flexible cousin. It has been tuned specifically to engage the user in your site or application, and designed to provide a quick way to narrow their search based on factors you might know about them.

CEO Will Hayes says the company wanted to take the power of Fusion search and apply it to applications, particularly around site search. “What we have done is turn this into SaaS service as a way to consume the Fusion data,” he said. “We have been building a smart data platform and search is how you engage and ranking and relevance is how you push the best user experiences,” he added.

The approach is to make it as simple as possible to insert Lucidworks search into an application or website by adding a couple of lines of JavaScript and then connecting some data. As soon as the data sources are configured, it’s basically ready to go, he said.

The underlying artificial intelligence also monitors what it knows about the visitor to help customize the content that it surfaces for that person. “Better data experience is low hanging fruit in terms of uplift. You can always enhance that experience by providing better data. Let us crawl your content, and look at web logs and user behavior and we will start displaying better content for your users.”

In terms of privacy, especially in light of the upcoming GDPR regulations in the EU, Hayes says his company has been working for some time with enterprise companies that have needed to do things like isolate personally identifiable information (PII) and enforce policies around geography, so they are as ready for that as anyone.

Hayes says this is just the first of many tools the company plans to roll out in the future built on top of the Lucidworks platform.

Sep
07
2017
--

Reddit teams with Lucidworks to build new search framework

Reddit revealed today that it has teamed with Lucidworks to provide a long-needed, modern search tool for the immensely popular online discussion platform. When you face the kind of scale that Reddit does with over 300 million monthly active users generating 5 million comments and a staggering 40 million searches every day across more than a million communities, it’s a daunting task…

May
03
2017
--

Slack beefs up its search to find the right person to ask a question

Slack wants to become a kind of lexicon of information, with everything easily accessible, and it’s starting to do that today with a big update to its search function.

Oct
26
2016
--

Elastic brings order to its product line with Elastic Stack

For the last several years, Elastic has offered a range of analytics and visualization tools to go with its open source search engine. Today, it announced it was pulling those pieces together into an integrated stack. The new product, known as Elastic Stack, includes all of the company’s products: Elasticsearch, Kibana, Logstash and Beats. It’s available for download or as part of…

Jul
12
2016
--

Google acqui-hires deep search engine Kifi to enhance its Spaces group chat app

Google has made another small acquisition to help it continue building out its latest efforts in social apps. The search and Android giant has hired the team behind Kifi, a startup that was building extensions to collect and search links shared in social apps, as well as provide recommendations for further links, such as this tool, Kifi for Twitter. Terms of the deal are not…

Mar
09
2016
--

Cisco bringing advanced search to Spark platform with Synata acquisition

Cisco announced it has purchased Synata, a search startup that allows users to search across on-premises or cloud repositories simultaneously. Cisco plans to integrate the Synata technology into its Spark communications platform (which should not be confused with Apache Spark). While Cisco usually reveals purchase prices, because it was under $100 million, it chose to keep this one…
