MachEye raises $4.6M for its business intelligence platform

We’ve seen our fair share of business intelligence (BI) platforms that aim to make data analysis accessible to everybody in a company. Most of them are still fairly complicated, no matter what their marketing copy says. MachEye, which is launching its AI-powered BI platform today, is offering a new twist on this genre. In addition to its official launch, the company also today announced a previously unreported $4.6 million seed funding round led by Canaan Partners with participation from WestWave Capital.

MachEye is not just what its founder and CEO Ramesh Panuganty calls a “low-prep, no-prep” BI platform; it also uses natural language processing to let anybody query data in plain language, and it can then automatically generate interactive data stories on the fly that put the answer into context. That’s quite a different approach from its more dashboard-centric competition.

“I have seen the business intelligence problems in the past,” Panuganty said. “And I saw that traditional BI, even though it has existed for 30 or 40 years, had this paradigm of ‘what you ask is what you get.’ So the business user asks for something, either in an email, on the phone or in person, and then he gets an answer to that question back. That essentially has these challenges of being dependent on the experts and there is a time that is lost to get the answers, and then there’s a lack of exploratory capabilities for the business user. And the bigger problem is that they don’t know what they don’t know.”

Panuganty’s background includes time at Sun Microsystems and Bell Labs, working on their operating systems before becoming an entrepreneur. He built three companies over the last 12 years or so. The first was a cloud management platform, Cloud360, which was acquired by Cognizant. The second was analytics company Drastin, which got acquired by Splunk in 2017, and the third was the AI-driven educational platform SelectQ, which Thinkster acquired this April. He also holds 15 patents related to machine learning, analytics and natural language processing.

Given that track record, it’s probably no surprise that VCs wanted to invest in his new startup, too. Panuganty tells me that when he met with Canaan Partners, he wasn’t really looking for an investment. He had already talked to the team while building SelectQ, but Canaan never got to make an investment because the company got acquired before it needed to raise more funding. But after an informal meeting that ended up lasting most of the day, he received an offer the next morning.

MachEye’s approach is definitely unique. “Generating audio-visuals on enterprise data, we are probably the only company that does it,” Panuganty said. But it’s important to note that it also offers all of the usual trappings of a BI service. If you really want dashboards, you can build those, and developers can use the company’s APIs to use their data elsewhere, too. The service can pull in data from most of the standard databases and data warehousing services, including AWS Redshift, Azure Synapse, Google BigQuery, Snowflake and Oracle. The company promises that it only takes 30 minutes from connecting a data source to being able to ask questions about that data.

Interestingly, MachEye’s pricing plan is per seat and doesn’t limit how much data you can query. There’s a free plan, which lacks the natural search and query capabilities, and an $18/month/user plan that adds those capabilities along with additional search features, but it takes the enterprise plan to get the audio narrations and other advanced features. The team is able to use this pricing model because it can quickly spin up the container infrastructure to answer a query and then immediately shut it down again, all within about two minutes.


Outlier raises $6.2M Series A to change how companies use data

Traditionally, companies have gathered data from a variety of sources, then used spreadsheets and dashboards to try and make sense of it all. Outlier wants to change that and deliver a handful of insights right to your inbox that matter most for your job, company and industry. Today the company announced a $6.2 million Series A to further develop that vision.

The round was led by Ridge Ventures with assistance from 11.2 Capital, First Round Capital, Homebrew, Susa Ventures and SV Angel. The company has raised over $8 million.

The startup is trying to solve a difficult problem around delivering meaningful insight without requiring the customer to ask the right questions. With traditional BI tools, you get your data and you start asking questions and seeing if the data can give you some answers. Outlier wants to bring a level of intelligence and automation by pointing out insights without the customer having to explicitly ask the right question.

Company founder and CEO Sean Byrnes says his previous company, Flurry, helped deliver mobile analytics to customers, but in his travels meeting customers in that previous iteration, he always came up against the same question: “This is great, but what should I look for in all that data?”

It was such a compelling question that after he sold Flurry to Yahoo in 2014 for more than $200 million, it stuck in the back of his mind and he decided to start a business to solve it. He contends that the first 15 years of BI were about getting answers to basic questions about company performance, but the next 15 will be about finding a way to get the software to ask good questions based on the huge amounts of data.

Byrnes admits that when he launched, he didn’t have much sense of how to put this notion into action, and most people he approached didn’t think it was a great idea. He says he heard “No” from a fair number of investors early on because the artificial intelligence required to fuel a solution like this really wasn’t ready in 2015 when he started the company.

He says that it took four or five iterations to get to today’s product, which lets you connect to various data sources and, using artificial intelligence and machine learning, delivers a list of four or five relevant questions to the user’s email inbox that point out data you might not have noticed, what he calls “shifts below the surface.” If you’re a retailer, that could be changing market conditions that signal you might want to change your production goals.

Outlier email example. Photo: Outlier

The company launched in 2015. It took some time to polish the product, but today they have 14 employees and 14 customers including Jack Rogers, Celebrity Cruises and Swarovski.

This round should allow them to continue working to grow the company. “We feel like we hit the right product-market fit because we have customers [generating] reproducible results and really changing the way people use the data,” he said.


More Rain For Cloud Business Intelligence As Birst Raises $65M

Birst, a cloud-based business intelligence (BI) platform, has raised another $65 million in funding, a Series F round that CEO Jay Larson said will be “the last one” before it gears up for an IPO. “We think it will not be this year, we’re not giving specific direction,” he said. “But the combination of the size of the BI market and us, we think…


Intro to OLAP

This is the first of a series of posts about business intelligence tools, particularly OLAP (or online analytical processing) tools using MySQL and other free open source software. OLAP tools are a part of the larger topic of business intelligence, a topic that has not had a lot of coverage on MPB. Because of this, I am going to start out talking about these topics in general, rather than getting right to the gritty details of their performance.

I plan on covering the following topics:

  1. Introduction to OLAP and business intelligence. (this post)
  2. Identifying the differences between a data warehouse and a data mart.
  3. Introduction to MDX queries and the kind of SQL which a ROLAP tool must generate to answer those queries.
  4. Performance challenges with larger databases, and some ways to help performance using aggregation.
  5. Using materialized views to automate that aggregation process.
  6. Comparing the performance of OLAP with and without aggregation over multiple MySQL storage engines at various data scales.

What is BI?
Chances are that you have heard the term business intelligence. Business intelligence (or BI) is a term which encompasses many different tools and methods for analyzing data, usually presenting it in a way that is easily consumed by upper management. This analysis is often used to determine how effective the business has been at meeting certain performance goals, and to forecast how it will do in the future. To put it another way, the tools are designed to provide insight about the business process, hence the name. Probably the most popular BI activity for web sites is click analysis.

As far as BI is concerned, this series of posts focuses on OLAP analysis and, to a lesser extent, on data warehousing. Data warehouses often provide the information upon which OLAP analysis is performed, but more on this in post #2.

OLAP? What is that?
OLAP is an acronym which stands for online analytical processing. OLAP analysis, which is really just another name for multidimensional analysis, consists of displaying summary aggregations of the data broken down into different groups. A typical OLAP analysis might show “sale total, by year, by sales rep, by product category”. OLAP analysis is usually used for reporting on current data, looking at historical trends and trying to make predictions about future trends.

Multidimensional Analysis
Multidimensional analysis is a form of statistical analysis. In multidimensional analysis samples representing a particular measure are compared or broken down into different dimensions. For example, in a sales analysis, the “sale amount” is a measure. Measures are always aggregated values. That is, total sales might be expressed as SUM(sale_amt). This is because the SUM of the individual sales will be grouped along different dimensions, such as by year or by product. I’m getting a little ahead of myself. Before we talk about measures and dimensions, we should talk about the two ways in which this information can be stored.
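
To make the idea of a measure aggregated along a dimension concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table name, column names and rows are made up for illustration; only SUM(sale_amt) comes from the text above.

```python
import sqlite3

# Hypothetical detail rows: (sale_year, product, sale_amt).
rows = [
    (2009, "widget", 10.0),
    (2009, "gadget", 25.0),
    (2010, "widget", 15.0),
    (2010, "widget", 5.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_year INTEGER, product TEXT, sale_amt REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def total_sales_by(dimension):
    """Aggregate the sale_amt measure along one dimension column."""
    # Whitelist the column name, since it is interpolated into the SQL text.
    if dimension not in ("sale_year", "product"):
        raise ValueError("unknown dimension: " + dimension)
    query = (
        f"SELECT {dimension}, SUM(sale_amt) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()


by_year = total_sales_by("sale_year")
by_product = total_sales_by("product")
print(by_year)
print(by_product)
```

The same measure gives a different summary depending on which dimension it is grouped by, which is all that "multidimensional" means here.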

There are two main ways to store multidimensional data for OLAP analysis
OLAP servers typically come in two basic flavors. Some servers have specialized data stores which store data in a form which is highly effective for multidimensional analysis. These servers are termed MOLAP and they tend to have exceptional performance due to their specialized data store. Almost all MOLAP solutions pre-compute many (or even all) of the possible answers to multidimensional queries. Palo is an example of an open source version of this technology. Essbase is an example of a closed source product. MOLAP servers often feature extensive compression of data which can improve performance. Loading data into a MOLAP server usually takes a very long time because many of the answers in the cube must be calculated. The extra time spent during the load is usually called “processing” time.

A relational OLAP (or ROLAP) server uses data stored in an RDBMS. These systems trade the performance of a multidimensional store for the convenience of an RDBMS. These servers almost always query over a database which is structured as a star or snowflake type schema. To go back to the sales analysis example above, in a star schema the facts about the sales would be stored in the fact table, and the list of customers and products would be stored in separate dimension tables. Some ROLAP servers support the aggregation of data into additional tables, and can use the tables automatically. These servers can approach the performance of MOLAP with the convenience of ROLAP, but there are still challenges with this approach. The biggest challenges are the amount of time that it takes to keep the tables updated and the complexity of the many scripts or jobs which might be necessary to keep the tables in sync. Part five of my series will introduce materialized views which attempt to address these challenges in a manageable way.
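
The aggregate table idea, and the problem of keeping it in sync, can be sketched as follows. This is an illustration using Python's sqlite3 module, not how any particular ROLAP server does it; the table names sales_fact and sales_by_year_agg and the helper rebuild_aggregate are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (sale_year INTEGER, product_id INTEGER, sale_amt REAL);
INSERT INTO sales_fact VALUES
    (2009, 1, 10.0), (2009, 1, 20.0), (2009, 2, 5.0), (2010, 1, 7.5);
""")

YEAR_TOTAL_FROM_AGG = "SELECT total_amt FROM sales_by_year_agg WHERE sale_year = ?"


def rebuild_aggregate(conn):
    """Recompute the summary table from scratch from the fact table."""
    conn.executescript("""
    DROP TABLE IF EXISTS sales_by_year_agg;
    CREATE TABLE sales_by_year_agg AS
        SELECT sale_year, SUM(sale_amt) AS total_amt, COUNT(*) AS sale_cnt
        FROM sales_fact
        GROUP BY sale_year;
    """)


rebuild_aggregate(conn)

# The small aggregate table answers the same question as the detail query.
from_agg = conn.execute(
    "SELECT sale_year, total_amt FROM sales_by_year_agg ORDER BY sale_year"
).fetchall()
from_fact = conn.execute(
    "SELECT sale_year, SUM(sale_amt) FROM sales_fact "
    "GROUP BY sale_year ORDER BY sale_year"
).fetchall()

# A new sale makes the aggregate stale until some job refreshes it.
conn.execute("INSERT INTO sales_fact VALUES (2010, 2, 2.5)")
stale_2010 = conn.execute(YEAR_TOTAL_FROM_AGG, (2010,)).fetchone()[0]
rebuild_aggregate(conn)
fresh_2010 = conn.execute(YEAR_TOTAL_FROM_AGG, (2010,)).fetchone()[0]
print(from_agg, stale_2010, fresh_2010)
```

A full rebuild like this is the simplest refresh strategy, and also the slowest one on a large fact table, which is where the time and complexity concerns mentioned above come from.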

What makes a ROLAP so great?
An OLAP server usually returns information to the user as a ‘pivot table’ or ‘pivot report’. While you could create such a report in a spreadsheet, the ROLAP tool is designed to deal with millions or even billions of rows of data, much more than a spreadsheet can usually handle. MOLAP servers usually require that all, or almost all, of the data must fit in memory. Another difference is the ease with which this analysis is constructed. You don’t necessarily have to write queries or drag and drop a report together in order to analyze multidimensional data using an OLAP tool.

Data before pivoting:
Example image from Wikimedia commons showing detail data for sales

Data summarized in pivot form:
Wikimedia commons image showing data summarized in pivot format
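
As a small stand-in for the two pictures, the following sketch pivots a handful of detail rows into a summary keyed by two dimensions. The rows, the column names and the pivot helper are invented for illustration and are not the data from the Wikimedia images.

```python
from collections import defaultdict

# Hypothetical detail rows, one per sale.
detail = [
    {"region": "East", "style": "Tee", "units": 12},
    {"region": "East", "style": "Golf", "units": 10},
    {"region": "West", "style": "Tee", "units": 11},
    {"region": "East", "style": "Tee", "units": 3},
]


def pivot(rows, row_dim, col_dim, measure):
    """Sum `measure` for every (row_dim, col_dim) combination present in rows."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[measure]
    # Convert to plain dicts so the result prints and compares cleanly.
    return {key: dict(cells) for key, cells in table.items()}


summary = pivot(detail, "region", "style", "units")
print(summary)
```

Each key of the outer dictionary is one row of the pivot report and each inner key is one column; an OLAP tool does the same thing, only over far more rows than you would want to hold in a Python list.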

ROLAP tools use star schema
As I said before, a sale amount would be considered a measure, and it would usually be aggregated with SUM. The other information about the sale, such as the product, when it was sold and to whom it was sold would be defined in dimension tables. The fact table contains columns which are joined to the dimension tables, such as product_id and customer_id. These are often defined as foreign keys from the fact table to the dimension tables.
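
Here is a minimal sketch of such a star schema, again using Python's sqlite3 module instead of MySQL. The column names product_id and customer_id come from the paragraph above; the table names, the other columns and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT
);
-- The fact table holds the measure plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    sale_date   TEXT,
    sale_amt    REAL
);
INSERT INTO dim_product VALUES
    (1, 'Road bike', 'Bikes'),
    (2, 'Helmet', 'Accessories'),
    (3, 'Mountain bike', 'Bikes');
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO fact_sales VALUES
    (1, 1, '2009-01-15', 500.0),
    (2, 1, '2009-01-15', 40.0),
    (3, 2, '2009-02-01', 700.0),
    (2, 2, '2009-02-03', 40.0);
""")

# Join the fact table to a dimension and aggregate the measure with SUM.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sale_amt)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

sales_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.sale_amt)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(sales_by_category)
print(sales_by_customer)
```

Every report has the same shape: join the fact table to one or more dimension tables, group by dimension columns, and aggregate the measure.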

A note about degenerate dimensions:
Any values in the fact table that don’t join to dimensions are either considered degenerate dimensions or measures. In the example below the status of the order is a degenerate dimension. A degenerate dimension is stored as an ENUM in many cases. Note that in the example below there is no actual dimension table which includes the two different order statuses. Such a dimension would add an extra join, which is expensive. Any yes/no field and/or fields with a very low cardinality (such as gender or order status) will probably be stored in the fact table instead of in a dedicated dimension. In the “pivot data” example above, all the dimensions are degenerate: gender, region, style, date.

Star schema with degenerate dimension

Example star schema about sales.
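
A degenerate dimension can be sketched in a few lines. SQLite has no ENUM type, so this example uses a CHECK constraint as a stand-in for the MySQL ENUM mentioned above; the table, the column names and the two status values ('open' and 'closed') are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_orders (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER,  -- would join to a customer dimension table
    -- Degenerate dimension: stored in the fact table, no dimension table.
    order_status TEXT CHECK (order_status IN ('open', 'closed')),
    order_amt    REAL
);
INSERT INTO fact_orders VALUES
    (1, 10, 'open', 30.0),
    (2, 11, 'closed', 45.0),
    (3, 10, 'closed', 5.0);
""")

# Grouping by the degenerate dimension needs no join at all.
by_status = conn.execute("""
    SELECT order_status, SUM(order_amt)
    FROM fact_orders
    GROUP BY order_status
    ORDER BY order_status
""").fetchall()
print(by_status)
```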

Often a dimension will include redundant information to make reporting easier, a process called “denormalization”. Hierarchical information may be stored in a single dimension. For example, a dimension for products may include both the category AND a sub-category. A time dimension includes year, month and quarter. You can create multiple different hierarchies from a single dimension. This allows ‘drill down’ into the dimension. By default the data would be summarized by year, but you can drill down to quarter or month level aggregation.
Sample date hierarchy, showing quarter, month, year and day hierarchies.
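
Drill down over a date hierarchy amounts to adding one more level of the hierarchy to the GROUP BY each time. The sketch below shows this with Python's sqlite3 module; the dim_date layout, the column names and the summarize helper are assumptions made for illustration, not what any particular OLAP tool generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized date dimension: year, quarter and month live in one table.
CREATE TABLE dim_date (
    date_id     INTEGER PRIMARY KEY,
    cal_year    INTEGER,
    cal_quarter INTEGER,
    cal_month   INTEGER
);
CREATE TABLE fact_sales (
    date_id  INTEGER REFERENCES dim_date(date_id),
    sale_amt REAL
);
INSERT INTO dim_date VALUES
    (1, 2009, 1, 1), (2, 2009, 1, 3), (3, 2009, 2, 4), (4, 2010, 1, 2);
INSERT INTO fact_sales VALUES
    (1, 10.0), (2, 20.0), (3, 5.0), (3, 15.0), (4, 8.0);
""")

# One hierarchy through the dimension, from coarsest to finest level.
LEVELS = ["cal_year", "cal_quarter", "cal_month"]


def summarize(depth):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    if not 1 <= depth <= len(LEVELS):
        raise ValueError("depth must be between 1 and %d" % len(LEVELS))
    cols = ", ".join(f"d.{name}" for name in LEVELS[:depth])
    query = (
        f"SELECT {cols}, SUM(f.sale_amt) "
        "FROM fact_sales AS f JOIN dim_date AS d ON d.date_id = f.date_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(query).fetchall()


by_year = summarize(1)      # default view: summarized by year
by_quarter = summarize(2)   # drill down to quarter
by_month = summarize(3)     # drill down to month
print(by_year)
print(by_quarter)
print(by_month)
```

Because year, quarter and month are all columns of the same denormalized dimension table, each drill-down level needs only the one join.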

The screenshots here in the jPivot (an OLAP cube browser) documentation can give you a better idea about the display of data. The examples break down sales by product, by category, and by region.

The information is presented in such a fashion that it can be “drilled into” and “filtered on” to provide an easy to use interface to the underlying data. Graphical display of the data as pie, line or bar charts is possible.

Focusing on ROLAP.
This is the MySQL Performance Blog, and as such an in-depth discussion of MOLAP technology is not particularly warranted here. Our discussion will focus on Mondrian. Mondrian is an open source ROLAP server featuring an in-memory OLAP cache. Mondrian is part of the Pentaho open source business intelligence suite. Mondrian is also used by other projects such as Wabit and Jaspersoft. If you are using open source BI then you are probably already using Mondrian. Closed source ROLAP servers include MicroStrategy, Microsoft Analysis Services and Oracle BI.

Mondrian speaks MDX, olap4j and XML for Analysis. This means that there is a very high chance that your existing BI tools (if you have them) will work with it. MDX is a query language that looks similar to SQL but is actually very different. Olap4j is an OLAP interface for Java applications. XML for Analysis (XMLA) is an industry standard analytical interface originally created by Microsoft, SAS and Hyperion.

What’s next?
Next we’ll talk about the difference between data marts and data warehouses. The former are usually used for OLAP analysis, but they can be fundamentally related to a warehouse.

Entry posted by Justin Swanhart