Nov
26
2019
--

New Amazon capabilities put machine learning in reach of more developers

Today, Amazon announced a new approach that it says will put machine learning technology in reach of more developers and line of business users. Amazon has been making a flurry of announcements ahead of its re:Invent customer conference next week in Las Vegas.

While the company offers plenty of tools for data scientists to build machine learning models and to process, store and visualize data, it wants to put that capability directly in the hands of developers with the help of the popular database query language, SQL.

By taking advantage of tools like Amazon QuickSight, Aurora and Athena in combination with SQL queries, developers can have much more direct access to machine learning models and underlying data without any additional coding, says Matt Wood, VP of artificial intelligence at AWS.

“This announcement is all about making it easier for developers to add machine learning predictions to their products and their processes by integrating those predictions directly with their databases,” Wood told TechCrunch.

For starters, Wood says developers can take advantage of Aurora, the company’s MySQL- and Postgres-compatible database, to build a simple SQL query into an application; the query automatically pulls the data into the application and runs whatever machine learning model the developer associates with it.

The second piece involves Athena, the company’s serverless query service. As with Aurora, developers can write a SQL query — in this case, against any data store — and based on a machine learning model they choose, return a set of data for use in an application.
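
The article doesn’t spell out the query shape, but based on Athena’s SageMaker-backed ML inference feature, a query along these lines is plausible; the function, endpoint, table and column names here are hypothetical, and the exact preview syntax may differ:

-- Hypothetical Athena query invoking a SageMaker endpoint for inference.
USING FUNCTION predict_lead_score(company_size INT, page_visits INT)
    RETURNS DOUBLE
    TYPE SAGEMAKER_INVOKE_ENDPOINT WITH (sagemaker_endpoint = 'lead-scoring-endpoint')
SELECT lead_id,
       predict_lead_score(company_size, page_visits) AS score
FROM leads
ORDER BY score DESC
LIMIT 100;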

The final piece is QuickSight, Amazon’s data visualization tool. After using one of the other tools to return a set of data, developers can build visualizations on top of it inside whatever application they are creating.

“By making sophisticated ML predictions more easily available through SQL queries and dashboards, the changes we’re announcing today help to make ML more usable and accessible to database developers and business analysts. Now anyone who can write SQL can make — and importantly use — predictions in their applications without any custom code,” Amazon’s Matt Asay wrote in a blog post announcing these new capabilities.

Asay added that this approach is far easier than what developers had to do in the past to achieve this. “There is often a large amount of fiddly, manual work required to take these predictions and make them part of a broader application, process or analytics dashboard,” he wrote.

As an example, Wood offers a lead-scoring model you might use to pick the most likely sales targets to convert. “Today, in order to do lead scoring you have to go off and wire up all these pieces together in order to be able to get the predictions into the application,” he said. With this new capability, you can get there much faster.

“Now, as a developer I can just say that I have this lead scoring model which is deployed in SageMaker, and all I have to do is write literally one SQL statement that I do all day long into Aurora, and I can start getting back that lead scoring information. And then I just display it in my application and away I go,” Wood explained.
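
To make that concrete, here is a rough sketch of what such a statement could look like with Aurora MySQL’s SageMaker integration; the function, endpoint, table and column names are hypothetical, and the exact syntax may vary by Aurora version:

-- Hypothetical mapping of a SQL function to a SageMaker endpoint (Aurora MySQL).
CREATE FUNCTION lead_score (company_size BIGINT, page_visits BIGINT)
RETURNS DOUBLE
ALIAS AWS_SAGEMAKER_INVOKE_ENDPOINT
ENDPOINT NAME 'lead-scoring-endpoint';

-- The "one SQL statement" the application then runs.
SELECT lead_id, lead_score(company_size, page_visits) AS score
FROM leads
ORDER BY score DESC;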

As for the machine learning models, these can come pre-built from Amazon, be developed by an in-house data science team or purchased in a machine learning model marketplace on Amazon, says Wood.

Today’s announcements from Amazon are designed to simplify machine learning and data access, and to reduce the amount of coding required to get from query to answer.

Nov
04
2019
--

Microsoft’s Azure Synapse Analytics bridges the gap between data lakes and warehouses

At its annual Ignite conference in Orlando, Fla., Microsoft today announced a major new Azure service for enterprises: Azure Synapse Analytics, which Microsoft describes as “the next evolution of Azure SQL Data Warehouse.” Like SQL Data Warehouse, it aims to bridge the gap between data warehouses and data lakes, which are often completely separate. Synapse also taps into a wide variety of other Microsoft services, including Power BI and Azure Machine Learning, as well as a partner ecosystem that includes Databricks, Informatica, Accenture, Talend, Attunity, Pragmatic Works and Adatis. It’s also integrated with Apache Spark.

The idea here is that Synapse allows anybody working with data in those disparate places to manage and analyze it from within a single service. It can be used to analyze relational and unstructured data, using standard SQL.
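
As a hedged illustration of that idea, Synapse’s serverless SQL capability can query files sitting in the data lake with ordinary T-SQL; the storage path, container and column names below are hypothetical:

-- Hypothetical query over Parquet files in a data lake, via the serverless SQL endpoint.
SELECT result.region, COUNT(*) AS orders
FROM OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/sales/orders/*.parquet',
        FORMAT = 'PARQUET'
     ) AS result
GROUP BY result.region;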


Microsoft also highlights Synapse’s integration with Power BI, its easy-to-use business intelligence and reporting tool, as well as Azure Machine Learning for building models.

With the Azure Synapse studio, the service provides data professionals with a single workspace for prepping and managing their data, as well as for their big data and AI tasks. There’s also a code-free environment for managing data pipelines.

As Microsoft stresses, businesses that want to adopt Synapse can continue to use their existing workloads in production with Synapse and automatically get all of the benefits of the service. “Businesses can put their data to work much more quickly, productively, and securely, pulling together insights from all data sources, data warehouses, and big data analytics systems,” writes Microsoft CVP of Azure Data, Rohan Kumar.

In a demo at Ignite, Kumar also benchmarked Synapse against Google’s BigQuery. Synapse ran the same query over a petabyte of data in 75% less time. He also noted that Synapse can handle thousands of concurrent users — unlike some of Microsoft’s competitors.

Jun
12
2019
--

Apollo raises $22M for its GraphQL platform

Apollo, a San Francisco-based startup that provides a number of developer and operator tools and services around the GraphQL query language, today announced that it has raised a $22 million growth funding round co-led by Andreessen Horowitz and Matrix Partners. Existing investors Trinity Ventures and Webb Investment Network also participated in this round.

Today, Apollo is probably the biggest player in the GraphQL ecosystem. At its core, the company’s services allow businesses to use the Facebook-incubated GraphQL technology to shield their developers from the patchwork of legacy APIs and databases as they look to modernize their technology stacks. The team argues that while REST APIs that talked directly to other services and databases still made sense a few years ago, that approach no longer holds up now that the number of API endpoints keeps increasing rapidly.

Apollo replaces this with what it calls the Data Graph. “There is basically a missing piece where we think about how people build apps today, which is the piece that connects the billions of devices out there,” Apollo co-founder and CEO Geoff Schmidt told me. “You probably don’t just have one app anymore, you probably have three, for the web, iOS and Android. Or maybe six. And if you’re a two-sided marketplace you’ve got one for buyers, one for sellers and another for your ops team.”

Managing the interfaces between all of these apps quickly becomes complicated and means you have to write a lot of custom code for every new feature. The promise of the Data Graph is that developers can use GraphQL to query the data in the graph and move on, all without having to write the boilerplate code that typically slows them down. At the same time, the ops teams can use the Graph to enforce access policies and implement other security features.

“If you think about it, there’s a lot of analogies to what happened with relational databases in the ’80s,” Schmidt said. “There is a need for a new layer in the stack. Previously, your query planner was a human being, not a piece of software, and a relational database is a piece of software that would just give you a database. And you needed a way to query that database, and that syntax was called SQL.”

Geoff Schmidt, Apollo CEO, and Matt DeBergalis, CTO

GraphQL itself, of course, is open source. Apollo is now building a lot of the proprietary tools around this idea of the Data Graph that make it useful for businesses. There’s a cloud-hosted graph manager, for example, that lets you track your schema, a dashboard to track performance, and integrations with continuous integration services. “It’s basically a set of services that keep track of the metadata about your graph and help you manage the configuration of your graph and all the workflows and processes around it,” Schmidt said.

The development of Apollo didn’t come out of nowhere. The founders previously launched Meteor, a framework and set of hosted services that allowed developers to write their apps in JavaScript, both on the front-end and back-end. Meteor was tightly coupled to MongoDB, though, which worked well for some use cases but also held the platform back in the long run. With Apollo, the team decided to go in the opposite direction and instead build a platform that makes being database agnostic the core of its value proposition.

The company also recently launched Apollo Federation, which makes it easier for businesses to work with a distributed graph. Sometimes, after all, your data lives in lots of different places. Federation allows for a distributed architecture that combines all of the different data sources into a single schema that developers can then query.

Schmidt tells me the company started to get some serious traction last year and by December, it was getting calls from VCs that heard from their portfolio companies that they were using Apollo.

The company plans to use the new funding to build out its technology and to scale its field team to support the enterprises that bet on its technology, including the open-source technologies that power its services.

“I see the Data Graph as a core new layer of the stack, just like we as an industry invested in the relational database for decades, making it better and better,” Schmidt said. “We’re still finding new uses for SQL and that relational database model. I think the Data Graph is going to be the same way.”

Jun
10
2019
--

Qubole launches Quantum, its serverless database engine

Qubole, the data platform founded by Apache Hive creator and former head of Facebook’s Data Infrastructure team Ashish Thusoo, today announced the launch of Quantum, its first serverless offering.

Qubole may not necessarily be a household name, but its customers include the likes of Autodesk, Comcast, Lyft, Nextdoor and Zillow. For these users, Qubole has long offered a self-service platform that allowed their data scientists and engineers to build their AI, machine learning and analytics workflows on the public cloud of their choice. The platform sits on top of open-source technologies like Apache Spark, Presto and Kafka, for example.

Typically, enterprises have to provision a considerable amount of infrastructure to give these platforms the resources they need. Those resources often go unused, and the infrastructure can quickly become complex.

Qubole already abstracts most of this away, offering what is essentially a serverless platform. With Quantum, however, it is going a step further by launching a high-performance serverless SQL engine that allows users to query petabytes of data with nothing else but ANSI-SQL, giving them the choice between using a Presto cluster or a serverless SQL engine to run their queries, for example.

The data can be stored on AWS and users won’t have to set up a second data lake or move their data to another platform to use the SQL engine. Quantum automatically scales up or down as needed, of course, and users can still work with the same metastore for their data, no matter whether they choose the clustered or serverless option. Indeed, Quantum is essentially just another SQL engine within Qubole’s overall suite of engines.
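
As a hedged illustration, the kind of interactive query an analyst might run through Quantum against a table already registered in the shared metastore is plain ANSI SQL; the table and column names here are made up:

-- Hypothetical interactive aggregation over data sitting in an object store.
SELECT event_date,
       COUNT(*)           AS rides,
       AVG(trip_distance) AS avg_distance
FROM rides
WHERE event_date BETWEEN DATE '2019-05-01' AND DATE '2019-05-31'
GROUP BY event_date
ORDER BY event_date;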

Typically, Qubole charges enterprises by compute minutes. When using Quantum, the company uses the same metric, but enterprises pay for the execution time of the query. “So instead of the Qubole compute units being associated with the number of minutes the cluster was up and running, it is associated with the Qubole compute units consumed by that particular query or that particular workload, which is even more fine-grained,” Thusoo explained. “This works really well when you have to do interactive workloads.”

Thusoo notes that Quantum is targeted at analysts who often need to perform interactive queries on data stored in object stores. Qubole integrates with services like Tableau and Looker (which Google is now in the process of acquiring). “They suddenly get access to very elastic compute capacity, but they are able to come through a very familiar user interface,” Thusoo noted.

 

May
02
2019
--

Microsoft brings Azure SQL Database to the edge (and Arm)

Microsoft today announced an interesting update to its database lineup with the preview of Azure SQL Database Edge, a new tool that brings the same database engine that powers Azure SQL Database in the cloud to edge computing devices, including, for the first time, Arm-based machines.

Azure SQL Database Edge, Azure corporate vice president Julia White writes in today’s announcement, “brings to the edge the same performant, secure and easy to manage SQL engine that our customers love in Azure SQL Database and SQL Server.”

The new service, which will also run on x64-based devices and edge gateways, promises to bring low-latency analytics to edge devices as it allows users to work with streaming data and time-series data, combined with the built-in machine learning capabilities of Azure SQL Database. Like its larger brethren, Azure SQL Database Edge will also support graph data and comes with the same security and encryption features that can, for example, protect the data at rest and in motion, something that’s especially important for an edge device.
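
To give a flavor of the low-latency analytics in question, here is a minimal sketch of a time-series rollup an edge device could run locally. It sticks to standard T-SQL constructs available across the SQL Server engine family; the table and column names are hypothetical:

-- Hypothetical five-minute average of a sensor reading, computed on the device.
SELECT DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, reading_time) / 5) * 5, 0) AS bucket_start,
       sensor_id,
       AVG(temperature) AS avg_temperature
FROM dbo.sensor_readings
WHERE reading_time >= DATEADD(HOUR, -1, SYSUTCDATETIME())
GROUP BY DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, reading_time) / 5) * 5, 0), sensor_id
ORDER BY bucket_start, sensor_id;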

As White rightly notes, this also ensures that developers only have to write an application once and then deploy it to platforms that feature Azure SQL Database, good old SQL Server on premises and this new edge version.

SQL Database Edge can run in both connected and fully disconnected fashion, something that’s also important for many use cases where connectivity isn’t always a given, yet where users need the kind of data analytics capabilities to keep their businesses (or drilling platforms, or cruise ships) running.

Aug
29
2017
--

Salesforce is using AI to democratize SQL so anyone can query databases in natural language

 SQL is about as easy as it gets in the world of programming, and yet its learning curve is still steep enough to prevent many people from interacting with relational databases. Salesforce’s AI research team took it upon itself to explore how machine learning might be able to open doors for those without knowledge of SQL. Read More

May
16
2017
--

With version 2.0, Crate.io’s database tools put an emphasis on IoT

 Crate.io, the winner of our Disrupt Europe 2014 Battlefield, is launching version 2.0 of its CrateDB database today. The tool, which is available in both an open source and enterprise version, started out as a general-purpose but highly scalable SQL database. Over time, though, the team found that many of its customers were using the service for managing their machine data. Read More

Mar
27
2017
--

What’s Next for SQL Databases?

In this blog, I’ll go over my thoughts on what we can expect in the world of SQL databases.

After reading Baron’s prediction on databases, here:

https://www.xaprb.com/blog/defining-moments-in-database-history/

I want to provide my own view on what’s coming up next for SQL databases. I think we live in interesting times, when we can see the beginning of the next generation of RDBMSs.

There are defining characteristics of such databases:

  1. Auto-scaling. The ability to add and use resources depending on the current load and database size. This is done transparently for users and DBAs.
  2. Auto-healing. The automatic handling of node failures.
  3. Multi-regional, cloud-agnostic, geo-distributed. The ability to support multiple data centers and multiple clouds, in different parts of the world.
  4. Transactional. All of the above, with the ability to support multi-statement transactional workloads.
  5. Strong consistency. The full definition of strong consistency is pretty involved. For simplicity, let’s say it means that reads (in the absence of ongoing writes) will return the same data, regardless of which region or data center you read from. A simple counter-example is the familiar MySQL asynchronous replication, where (given slave delay) reading from a slave can return very outdated data. I am focusing on reads because, in a distributed environment, it is the performance of consistent reads that will be affected. This is where network latency (often limited by the speed of light) will define performance.
  6. SQL language. SQL, despite being old and widely criticized, is not going anywhere. This is a universal language for app developers to access data.

With this in mind, I see the following interesting projects:

  • Google Cloud Spanner (https://cloud.google.com/spanner/). Recently announced and still in the Beta stage. Definitely an interesting project, with the obvious limitation of running only in Google Cloud.
  • FaunaDB (https://fauna.com/). Also very recently announced, so it is hard to say how it performs. The major downside I see is that it does not provide SQL access, but uses a custom language.
  • Two open source projects:
    • CockroachDB (https://www.cockroachlabs.com/). This is still in the Beta stage, but definitely an interesting project to follow. Initially, the project planned to support only key-value access, but later they made a very smart decision to provide SQL access via a PostgreSQL-compatible protocol.
    • TiDB (https://github.com/pingcap/tidb). Right now in the RC stage, with the target of providing SQL access over a MySQL-compatible protocol (and later the PostgreSQL protocol).

Protocol compatibility is a wise approach, although not strictly necessary. It lowers the entry barrier for existing applications.

Both CockroachDB and TiDB, at the moment of this writing, still have rough edges and can’t be used in serious deployments (from my experience). I expect both projects will make big progress in 2017.

What shared characteristics can we expect from these systems?

As I mentioned above, we may see read performance degrade (as latency increases), and it will often be determined more by network performance than anything else. Storage I/O and CPU cycles will be secondary factors. There will be more work on understanding and tuning network traffic.

We may need to get used to the fact that point or small-range selects become much slower. Right now, we see very fast point selects in traditional RDBMSs (MySQL, PostgreSQL, etc.).

Heavy writes will also be problematic, because all writes need to go through the consistency protocol. Write-optimized storage engines will help (both CockroachDB and TiDB use RocksDB in the storage layer).

Long transactions (say, changing 100,000 or more rows) will also be problematic. There are simply too many network round-trips and too much housekeeping work on each node, making long transactions an issue for distributed systems.

Another shared property (at least between CockroachDB and TiDB) is the active use of the Raft protocol to achieve consistency. So it will be important to understand how this protocol works to use it effectively. You can find a good overview of the Raft protocol here: http://container-solutions.com/raft-explained-part-1-the-consenus-problem/.

There are probably more NewSQL technologies than I have mentioned here, but I do not think any of them has captured critical market- or mind-share. So we are at the beginning of interesting times…

What about MySQL? Can MySQL become the database that provides all these characteristics? It is possible, but I do not think it will happen anytime soon. MySQL would need to provide automatic sharding to do this, which would be very hard to implement given its current internal design. It may happen in the future, though it will require a lot of engineering effort to make it work properly.

Dec
29
2016
--

Query Language Type Overview

This blog provides a query language type overview.

The idea for this blog originated from some customers asking me questions. When working in a particular field, you often use a dedicated vocabulary that makes sense to your peers. It often includes phrases and abbreviations because that’s efficient. It’s no different in the database world. Much of this language might make sense to DBAs, but it might sound like “voodoo” to people not used to it. The overview below covers the basic types of query languages inside SQL. I hope it clarifies what they mean, how they’re used and how you should interpret them.

DDL (Data Definition Language)

A database schema is a representation of your information. It contains the data structure, separated into table structures, views and anything else that gives structure to your data. It defines how you want to store and visualize the information.

It’s like a skeleton, defining how data is organized. Any action that creates/updates/changes this skeleton is DDL.

Do you remember spreadsheets? A table definition describes something like:

Column            Attributes
Account number    Sorted ascending
Account name      Unique, indexed
Account owner
Creation date     Date, indexed
Amount            Number, linked with transactions


Whenever you want to create a table like this, you must use a DDL query. For example:

CREATE TABLE Accounts
(
Account_number Bigint(16),
Account_name varchar(255),
Account_owner varchar(255),
Creation_date date,
Amount Bigint(16),
PRIMARY KEY (Account_number),
UNIQUE(Account_name),
FOREIGN KEY (Amount) REFERENCES transactions(Balancevalue)
);

CREATE, ALTER, DROP, etc.: all of these types of structure modification queries are DDL queries!

Defining the structure of your tables is important, as it determines how you can access the information stored in the database and how you might visualize it.

Why should you care that much?

DDL queries define the structure on which you develop your application. Your structure will also define how the database server searches for information in a table, and how it is linked to other tables (using foreign keys, for example).

You must design your MySQL schema before adding information to it (unlike NoSQL solutions such as MongoDB). MySQL might be more rigid in this manner, but it often makes sense to design the pattern for how you want to store your information and query it properly.

Due to the rigidity of an RDBMS, changing the data structure (or table schema) requires the system to rebuild the actual table in most cases. This is potentially problematic for performance or table availability (locking). Since MySQL 5.6, this is often an online (“hot”) procedure, requiring no downtime for active operations. Additionally, tools like pt-osc (pt-online-schema-change) or other open-source solutions can be used to migrate the data structure to a new format without requiring downtime.

An example:

ALTER TABLE accounts ADD COLUMN wienietwegisisgezien varchar(20)

DML (Data Manipulation Language)

Data manipulation is what it sounds like: working with information inside a structure. Inserting information and deleting information (adding rows, deleting rows) are examples of data manipulation.

An example:

INSERT INTO resto_visitor VALUES (5, 'Julian', 'highway 5', 12);
UPDATE resto_visitor SET name='Evelyn', age=17 WHERE id=103;

Sure, but why should I use it?

Having a database environment makes no sense unless you insert and fetch information out of it. Remember that databases are plentiful in the world: whenever you click on a link on your favorite blog website, it probably means you are fetching information out of a database (and that data was at one time inserted or modified).

Interacting with a database requires that you write DML queries.

DCL (Data Control Language)

Data control language is anything that is used for administering access to the database content. For example, GRANT queries:

GRANT ALL PRIVILEGES ON database.table TO 'jeffbridges'@'ourserver';

Well that’s all fine, but why another subset “language” in SQL?

As a user of database environments, at some point you’ll get access permission from someone performing a DCL query. Data control language is used to define authorization rules for accessing the data structures (tables, views, variables, etc.) inside MySQL.

TCL (Transaction Control Language) Queries

Transaction control language queries are used to control transactional processing in a database. What do we mean by transactional processes? Transactional processes are typically bundled DML queries. For example (the table and column names below are illustrative):

START TRANSACTION;
-- fetch the relevant rows from table b and insert them into table a
INSERT INTO a (id, amount)
SELECT id, amount FROM b WHERE processed = 0;
-- remove the stale rows from b
DELETE FROM b WHERE processed = 0;
COMMIT; -- or ROLLBACK;

This gives you the ability to perform or roll back a complete action. Only storage engines offering transaction support (like InnoDB) can work with TCL.

Yet another term, but why?

Ever wanted to combine information and perform it as one transaction? In some circumstances, for example, it makes sense to make sure you perform an insert first and then perform an update. If you don’t use transactions, the insert might fail and the associated update might be an invalid entry. Transactions make sure that either the complete transaction (a group of DML queries) takes place, or it’s completely rolled back (this is also referred to as atomicity).

Conclusion

Hopefully this blog post helps you understand some of the “insider” database speech. Post comments below.

Mar
10
2016
--

With SQL Server 2016, Microsoft focuses on speed, security and luring customers away from Oracle

A few days ago, Microsoft shocked us when it announced that it would soon bring its SQL Server database to Linux. It’ll take until 2017 before SQL Server will be available on Linux, though. Until then, the company’s database focus remains squarely on the upcoming release of SQL Server 2016, which is currently available as a release candidate and which will become generally… Read More
