Jun
12
2019
--

Apollo raises $22M for its GraphQL platform

Apollo, a San Francisco-based startup that provides a number of developer and operator tools and services around the GraphQL query language, today announced that it has raised a $22 million growth funding round co-led by Andreessen Horowitz and Matrix Partners. Existing investors Trinity Ventures and Webb Investment Network also participated in this round.

Today, Apollo is probably the biggest player in the GraphQL ecosystem. At its core, the company’s services allow businesses to use the Facebook-incubated GraphQL technology to shield their developers from the patchwork of legacy APIs and databases as they look to modernize their technology stacks. The team argues that while REST APIs that talked directly to other services and databases still made sense a few years ago, that approach no longer works now that the number of API endpoints keeps increasing rapidly.

Apollo replaces this with what it calls the Data Graph. “There is basically a missing piece where we think about how people build apps today, which is the piece that connects the billions of devices out there,” Apollo co-founder and CEO Geoff Schmidt told me. “You probably don’t just have one app anymore, you probably have three, for the web, iOS and Android. Or maybe six. And if you’re a two-sided marketplace you’ve got one for buyers, one for sellers and another for your ops team.”

Managing the interfaces between all of these apps quickly becomes complicated and means you have to write a lot of custom code for every new feature. The promise of the Data Graph is that developers can use GraphQL to query the data in the graph and move on, all without having to write the boilerplate code that typically slows them down. At the same time, the ops teams can use the Graph to enforce access policies and implement other security features.

“If you think about it, there’s a lot of analogies to what happened with relational databases in the ’80s,” Schmidt said. “There is a need for a new layer in the stack. Previously, your query planner was a human being, not a piece of software, and a relational database is a piece of software that would just give you a database. And you needed a way to query that database, and that syntax was called SQL.”

Geoff Schmidt, Apollo CEO, and Matt DeBergalis, CTO

GraphQL itself, of course, is open source. Apollo is now building a lot of the proprietary tools around this idea of the Data Graph that make it useful for businesses. There’s a cloud-hosted graph manager, for example, that lets you track your schema, a dashboard to track performance and integrations with continuous integration services. “It’s basically a set of services that keep track of the metadata about your graph and help you manage the configuration of your graph and all the workflows and processes around it,” Schmidt said.

The development of Apollo didn’t come out of nowhere. The founders previously launched Meteor, a framework and set of hosted services that allowed developers to write their apps in JavaScript, both on the front-end and back-end. Meteor was tightly coupled to MongoDB, though, which worked well for some use cases but also held the platform back in the long run. With Apollo, the team decided to go in the opposite direction and instead build a platform that makes being database agnostic the core of its value proposition.

The company also recently launched Apollo Federation, which makes it easier for businesses to work with a distributed graph. Sometimes, after all, your data lives in lots of different places. Federation allows for a distributed architecture that combines all of the different data sources into a single schema that developers can then query.

Schmidt tells me the company started to get some serious traction last year, and by December it was getting calls from VCs who had heard from their portfolio companies that they were using Apollo.

The company plans to use the new funding to build out its technology, including the open-source technologies that power its services, and to scale its field team to support the enterprises that bet on it.

“I see the Data Graph as a core new layer of the stack, just like we as an industry invested in the relational database for decades, making it better and better,” Schmidt said. “We’re still finding new uses for SQL and that relational database model. I think the Data Graph is going to be the same way.”

May
02
2019
--

Couchbase’s mobile database gets built-in ML and enhanced synchronization features

Couchbase, the company behind the eponymous NoSQL database, announced a major update to its mobile database today that brings some machine learning smarts, as well as improved synchronization features and enhanced stats and logging support, to the software.

“We’ve led the innovation and data management at the edge since the release of our mobile database five years ago,” Couchbase’s VP of Engineering Wayne Carter told me. “And we’re excited that others are doing that now. We feel that it’s very, very important for businesses to be able to utilize these emerging technologies that do sit on the edge to drive their businesses forward, and both making their employees more effective and their customer experience better.”

The latter part is what drove a lot of today’s updates, Carter noted. He also believes that the database is the right place to do some machine learning. So with this release, the company is adding predictive queries to its mobile database. This new API allows mobile apps to take pre-trained machine learning models and run predictive queries against the data that is stored locally. This would allow a retailer to create a tool that can use a phone’s camera to figure out what part a customer is looking for.

To support these predictive queries, Couchbase mobile is also getting support for predictive indexes. “Predictive indexes allow you to create an index on prediction, enabling correlation of real-time predictions with application data in milliseconds,” Carter said. In many ways, that’s also the unique value proposition for bringing machine learning into the database. “What you really need to do is you need to utilize the unique values of a database to be able to deliver the answer to those real-time questions within milliseconds,” explained Carter.

The other major new feature in this release is delta synchronization, which allows businesses to push far smaller updates to the databases on their employees’ mobile devices, because the devices only have to receive the information that changed instead of a full updated database. Carter says this was a highly requested feature, but until now, the company always had to prioritize work on other components of Couchbase.

This is an especially useful feature for the company’s retail customers, a vertical where it has been quite successful. These users need to keep their catalogs up to date, and quite a few of them supply their employees with mobile devices to help shoppers. Rumor has it that Apple, too, is a Couchbase user.

The update also includes a few new features that will be more of interest to operators, including advanced stats reporting and enhanced logging support.

Feb
21
2019
--

Redis Labs changes its open-source license — again

Redis Labs, fresh off its latest funding round, today announced a change to how it licenses its Redis Modules. This may not sound like a big deal, but in the world of open-source projects, licensing is currently a big issue. That’s because organizations like Redis, MongoDB, Confluent and others have recently introduced new licenses that make it harder for their competitors to take their products and sell them as rebranded services without contributing back to the community (and most of these companies point directly at AWS as the main offender here).

“Some cloud providers have repeatedly taken advantage of successful open-source projects, without significant contributions to their communities,” the Redis Labs team writes today. “They repackage software that was not developed by them into competitive, proprietary service offerings and use their business leverage to reap substantial revenues from these open source projects.”

The point of these new licenses is to put a stop to this.

This is not the first time Redis Labs has changed how it licenses its Redis Modules (and I’m stressing the “Redis Modules” part here because this is only about modules from Redis Labs and does not have any bearing on how the Redis database project itself is licensed). Back in 2018, Redis Labs changed its license from AGPL to Apache 2 modified with Commons Clause. The “Commons Clause” is the part that places commercial restrictions on top of the license.

That created quite a stir, as Redis Labs co-founder and CEO Ofer Bengal told me a few days ago when we spoke about the company’s funding.

“When we came out with this new license, there were many different views,” he acknowledged. “Some people condemned that. But after the initial noise calmed down — and especially after some other companies came out with a similar concept — the community now understands that the original concept of open source has to be fixed because it isn’t suitable anymore to the modern era where cloud companies use their monopoly power to adopt any successful open source project without contributing anything to it.”

The way the code was licensed, though, created a bit of confusion, the company now says, because some users thought they were only bound by the terms of the Apache 2 license. Some terms in the Commons Clause, too, weren’t quite clear (including the meaning of “substantial,” for example).

So today, Redis Labs is introducing the Redis Source Available License. This license, too, only applies to certain Redis Modules created by Redis Labs. Users can still get the code, modify it and integrate it into their applications — but that application can’t be a database product, caching engine, stream processing engine, search engine, indexing engine or ML/DL/AI serving engine.

By definition, an open-source license can’t restrict what users are allowed to do with the software. This new license does, so it’s technically not an open-source license. In practice, the company argues, it’s quite similar to other permissive open-source licenses and shouldn’t really affect most developers who use the company’s modules (and these modules are RedisSearch, RedisGraph, RedisJSON, RedisML and RedisBloom).

This is surely not the last we’ve heard of this. Sooner or later, more projects will follow the same path. By then, we’ll likely see more standard licenses that address this issue, so other companies won’t have to change their licenses multiple times. Ideally, though, we won’t need any of this because everybody will play nice — but since we’re not living in a utopia, that’s not likely to happen.

Jan
31
2019
--

Google’s Cloud Firestore NoSQL database hits general availability

Google today announced that Cloud Firestore, its serverless NoSQL document database for mobile, web and IoT apps, is now generally available. In addition, Google is also introducing a few new features and bringing the service to 10 new regions.

With this launch, Google is giving developers the option to run their databases in a single region. During the beta, developers had to use multi-region instances, and, while that obviously has some advantages with regard to resilience, it’s also more expensive and not every app needs to run in multiple regions.

“Some people don’t need the added reliability and durability of a multi-region application,” Google product manager Dan McGrath told me. “So for them, having a more cost-effective regional instance is very attractive, as well as data locality and being able to place a Cloud Firestore database as close as possible to their user base.”

The new regional instance pricing is up to 50 percent cheaper than the current multi-region instance prices. Which option you pick does influence the SLA guarantee Google gives you, though. While the regional instances are still replicated within multiple zones inside the region, all of the data is still within a limited geographic area. Hence, Google promises 99.999 percent availability for multi-region instances and 99.99 percent availability for regional instances.

Speaking of regions, Cloud Firestore is now available in 10 new regions around the world. Firestore launched with a single location and added two more during the beta. With this, Firestore is now available in 13 locations (including the North America and Europe multi-region offerings). McGrath tells me Google is still in the planning stages for deciding the next phase of locations, but he stressed that the current set provides pretty good coverage across the globe.

Also new in this release is deeper integration with Stackdriver, the Google Cloud monitoring service, which can now monitor read, write and delete operations in near-real time. McGrath also noted that Google plans to add the ability to query documents across collections and increment database values without needing a transaction.

It’s worth noting that while Cloud Firestore falls under the Google Firebase brand, which typically focuses on mobile developers, Firestore offers all of the usual server-side client libraries for Compute Engine or Kubernetes Engine applications, too.

“If you’re looking for a more traditional NoSQL document database, then Cloud Firestore gives you a great solution that has all the benefits of not needing to manage the database at all,” McGrath said. “And then, through the Firebase SDK, you can use it as a more comprehensive back-end as a service that takes care of things like authentication for you.”

One of the advantages of Firestore is that it has extensive offline support, which makes it ideal for mobile developers but also IoT solutions. Maybe it’s no surprise, then, that Google is positioning it as a tool for both Google Cloud and Firebase users.

Nov
28
2018
--

AWS launches new time series database

AWS announced a new time series database today at AWS re:Invent in Las Vegas. The new product, called Amazon Timestream, is a fully managed database designed to track items over time, which can be particularly useful for Internet of Things scenarios.

“With time series data each data point consists of a timestamp and one or more attributes and it really measures how things change over time and helps drive real time decisions,” AWS CEO Andy Jassy explained.

He sees a problem, though, with existing open-source and commercial solutions, which he says don’t scale well and are hard to manage. This is, of course, the kind of problem that a cloud service like AWS often helps solve.

Not surprisingly, as customers were looking for a good time series database solution, AWS decided to build one itself.

Jassy said that AWS built the new database from the ground up, with an architecture that organizes data by time intervals and enables time-series-specific data compression, which leads to less scanning and faster performance.

He claims it will be a thousand times faster at a tenth of the cost, and of course it scales up and down as required and includes the analytics capabilities you need to understand the data you are tracking.

This new service is available across the world starting today.


Sep
24
2018
--

Microsoft updates its planet-scale Cosmos DB database service

Cosmos DB is undoubtedly one of the most interesting products in Microsoft’s Azure portfolio. It’s a fully managed, globally distributed multi-model database that offers throughput guarantees, a number of different consistency models and high read and write availability guarantees. Now that’s a mouthful, but basically, it means that developers can build a truly global product, write database updates to Cosmos DB and rest assured that every other user across the world will see those updates within 20 milliseconds or so. And to write their applications, they can pretend that Cosmos DB is a SQL- or MongoDB-compatible database, for example.

Cosmos DB officially launched in May 2017, though in many ways it’s an evolution of Microsoft’s existing DocumentDB product, which was far less flexible. Today, a lot of Microsoft’s own products run on Cosmos DB, including the Azure Portal itself, as well as Skype, Office 365 and Xbox.

Today, Microsoft is extending Cosmos DB by moving its multi-master replication feature into general availability and adding support for the Cassandra API, giving developers yet another option to bring existing products to Cosmos DB, in this case those written for Cassandra.

Microsoft now also promises 99.999 percent read and write availability. Previously, its read availability promise was 99.99 percent. And while that may not seem like a big difference, it does show that after more than a year of operating Cosmos DB with customers, Microsoft now feels more confident that it’s a highly stable system. In addition, Microsoft is also updating its write latency SLA and now promises less than 10 milliseconds at the 99th percentile.

“If you have write-heavy workloads, spanning multiple geos, and you need this near real-time ingest of your data, this becomes extremely attractive for IoT, web, mobile gaming scenarios,” Microsoft CosmosDB architect and product manager Rimma Nehme told me. She also stressed that she believes Microsoft’s SLA definitions are far more stringent than those of its competitors.

The highlight of the update, though, is multi-master replication. “We believe that we’re really the first operational database out there in the marketplace that runs on such a scale and will enable globally scalable multi-master available to the customers,” Nehme said. “The underlying protocols were designed to be multi-master from the very beginning.”

Why is this such a big deal? With this, developers can designate every region they run Cosmos DB in as a master in its own right, making for a far more scalable system in terms of being able to write updates to the database. There’s no need to first write to a single master node, which may be far away, and then have that node push the update to every other region. Instead, applications can write to the nearest region, and Cosmos DB handles everything from there. If there are conflicts, the user can decide how those should be resolved based on their own needs.

Nehme noted that all of this still plays well with CosmosDB’s existing set of consistency models. If you don’t spend your days thinking about database consistency models, then this may sound arcane, but there’s a whole area of computer science that focuses on little else but how to best handle a scenario where two users virtually simultaneously try to change the same cell in a distributed database.

Unlike other databases, Cosmos DB allows for a variety of consistency models, ranging from strong to eventual, with three intermediary models. And it actually turns out that most CosmosDB users opt for one of those intermediary models.

Interestingly, when I talked to Leslie Lamport, the Turing award winner who developed some of the fundamental concepts behind these consistency models (and the popular LaTeX document preparation system), he wasn’t all that sure that the developers are making the right choice. “I don’t know whether they really understand the consequences or whether their customers are going to be in for some surprises,” he told me. “If they’re smart, they are getting just the amount of consistency that they need. If they’re not smart, it means they’re trying to gain some efficiency and their users might not be happy about that.” He noted that when you give up strong consistency, it’s often hard to understand what exactly is happening.

But strong consistency comes with its drawbacks, too, which leads to higher latency. “For strong consistency there are a certain number of roundtrip message delays that you can’t avoid,” Lamport noted.

The CosmosDB team isn’t just building on some of the fundamental work Lamport did around databases, but it’s also making extensive use of TLA+, the formal specification language Lamport developed in the late 90s. Microsoft, as well as Amazon and others, are now training their engineers to use TLA+ to describe their algorithms mathematically before they implement them in whatever language they prefer.

“Because [CosmosDB is] a massively complicated system, there is no way to ensure the correctness of it because we are humans, and trying to hold all of these failure conditions and the complexity in any one person’s — one engineer’s — head, is impossible,” Microsoft Technical Fellow Dharma Shukla noted. “TLA+ is huge in terms of getting the design done correctly, specified and validated using the TLA+ tools even before a single line of code is written. You cover all of those hundreds of thousands of edge cases that can potentially lead to data loss or availability loss, or race conditions that you had never thought about, but that two or three years after you have deployed the code can lead to some data corruption for customers. That would be disastrous.”

“Programming languages have a very precise goal, which is to be able to write code. And the thing that I’ve been saying over and over again is that programming is more than just coding,” Lamport added. “It’s not just coding, that’s the easy part of programming. The hard part of programming is getting the algorithms right.”

Lamport also noted that he deliberately chose to make TLA+ look like mathematics, not like another programming language. “It really forces people to think above the code level,” Lamport noted, adding that engineers often tell him that it changes the way they think.

As for those companies that don’t use TLA+ or a similar methodology, Lamport says he’s worried. “I’m really comforted that [Microsoft] is using TLA+ because I don’t see how anyone could do it without using that kind of mathematical thinking — and I worry about what the other systems that we wind up using built by other organizations — I worry about how reliable they are.”


Sep
20
2018
--

MariaDB acquires Clustrix

MariaDB, the company behind the eponymous MySQL drop-in replacement database, today announced that it has acquired Clustrix, which itself is a MySQL drop-in replacement database, but with a focus on scalability. MariaDB will integrate Clustrix’s technology into its own database, which will allow it to offer its users a more scalable database service in the long run.

That by itself would be an interesting development for the popular open source database company. But there’s another angle to this story, too. In addition to the acquisition, MariaDB also today announced that cloud computing company ServiceNow is investing in MariaDB, an investment that helped it get to today’s acquisition. ServiceNow doesn’t typically make investments, though it has made a few acquisitions. It is a very large MariaDB user, though, and it’s exactly the kind of customer that will benefit from the Clustrix acquisition.

MariaDB CEO Michael Howard tells me that ServiceNow currently supports about 80,000 instances of MariaDB. With this investment (which is actually an add-on to MariaDB’s 2017 Series C round), ServiceNow’s SVP of Development and Operations Pat Casey will join MariaDB’s board.

Why would MariaDB acquire a company like Clustrix, though? When I asked Howard about the motivation, he noted that he’s now seeing more companies like ServiceNow that are looking at a more scalable way to run MariaDB. Howard noted that it would take years to build a new database engine from the ground up.

“You can hire a lot of smart people individually, but not necessarily have that experience built into their profile,” he said. “So that was important and then to have a jumpstart in relation to this market opportunity — this mandate from our market. It typically takes about nine years to get a brand new, thorough database technology off the ground. It’s not like a SaaS application where you can get a front-end going in about a year or so.”

Howard also stressed that the fact that the teams at Clustrix and MariaDB share the same vocabulary, given that they both work on similar problems and aim to be compatible with MySQL, made this a good fit.

While integrating the Clustrix database technology into MariaDB won’t be trivial, Howard stressed that the database was always built to accommodate external storage engines. MariaDB will have to make some changes to its APIs to be ready for the clustering features of Clustrix. “It’s not going to be a 1-2-3 effort,” he said. “It’s going to be a heavy-duty effort for us to do this right. But everyone on the team wants to do it because it’s good for the company and our customers.”

MariaDB did not disclose the price of the acquisition. Since it was founded in 2006, though, the Y Combinator-incubated Clustrix had raised just under $72 million. MariaDB has raised just under $100 million so far, so it’s probably a fair guess that Clustrix didn’t necessarily sell for a large multiple of that.

May
15
2018
--

MemSQL raises $30M Series D round for its real-time database

MemSQL, a company best known for the real-time capabilities of its eponymous in-memory database, today announced that it has raised a $30 million Series D round, bringing the company’s overall funding to $110 million. The round was led by GV (the firm you probably still refer to as Google Ventures) and Glynn Capital. Existing investors Accel, Caffeinated Capital, Data Collective and IA Ventures also participated.

MemSQL offers a distributed, relational database that uses standard SQL drivers and queries for transactions and analytics. Its defining feature is its data ingestion technology, which allows users to push millions of events per day into the service while querying the records in real time. The company recently showed that its tools can deliver a scan rate of over a trillion rows per second on a cluster with 12 servers.

The database is available for deployments on the major public clouds and on-premises.

MemSQL recently announced that its fourth-quarter commercial bookings hit 200 percent year-over-year growth — and that’s typically the kind of growth that investors like to see, even as MemSQL plays in a very competitive market with plenty of incumbents, startups and even open-source projects. Current MemSQL users include the likes of Uber, Akamai, Pinterest, Dell EMC and Comcast.

“MemSQL has achieved strong enterprise traction by delivering a database that enables operational analysis at unique speed and scale, allowing customers to create dynamic, intelligent applications,” said Adam Ghobarah, general partner at GV, in today’s announcement. “The company has demonstrated measurable success with its growing enterprise customer base and we’re excited to invest in the team as they continue to scale.”

Mar
08
2018
--

Binlog Encryption with Percona Server for MySQL


In this blog post, we’ll look at how to turn on binlog encryption in Percona Server for MySQL.

Why do I need this?

As you probably know, the binlogs in Percona Server for MySQL contain sensitive information. Replication uses binlogs to copy events between servers; they contain all the information from one server needed to apply the same changes on another. In other words, if somebody has access to a binlog, they effectively have access to all the data in the server. Moreover, such a person (or “hacker”) could create a clone copy of our server simply by setting up a replica fed from that binlog. This shows how important protecting binlogs really is: a leaked binlog doesn’t just expose a particular table, tablespace or group of tables, it puts literally the whole server at risk. The same is true of relay logs, since a relay log is really a copy of the binlog on the slave server.

But have no fear: a new feature comes to the rescue, binary log encryption. Since Percona Server for MySQL version 5.7.20-19 (where the feature is in beta), it is possible to enable encryption for all the binlogs and relay logs produced by the server.

How do you turn it on?

To start binlog encryption, you need to start the server with --encrypt-binlog=1. This, in turn, requires --master_verify_checksum and --binlog_checksum both to be ON. Also, you need to install one of the keyring plugins.
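For illustration, here is a minimal sketch of the relevant my.cnf settings, assuming the keyring_file plugin and with the file path as a placeholder (keyring_vault is another option; check the Percona Server for MySQL documentation for your exact version):

  [mysqld]
  # Load a keyring plugin early so the encryption key is available at startup.
  early-plugin-load = keyring_file.so
  keyring_file_data = /var/lib/mysql-keyring/keyring

  # Encrypt the binlogs (and relay logs) this server produces.
  encrypt-binlog = ON

  # Both checksum settings are required for the authenticated encryption discussed below.
  binlog_checksum        = CRC32
  master_verify_checksum = ON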

From now on all the binlogs and relay logs produced by the server get encrypted. However, for the replication to be safe as a whole the connection between servers also has to be encrypted. See https://dev.mysql.com/doc/refman/5.7/en/replication-solutions-encrypted-connections.html for details on how to do this.

Please note that this does not mean that all binlogs in our replication topology get encrypted automatically. You need to turn on encrypt-binlog on the slave servers too, even if they do not produce binlog files themselves, because slaves still produce relay logs when replicating from a master server, and those relay logs should be encrypted as well.

How does this work in the big picture?

The master encrypts each event before writing it into the binlog. The slave connects to the master and asks for events. The master decrypts the events from the binary log and sends them over to the slave.

Note that events sent between the master and slave servers are not encrypted! This is why the connection between the master and slave needs to use a secure channel, i.e., TLS.

The slave receives events from the master, encrypts them and writes them down into the relay log.

That is why we need to enable encrypt-binlog on a slave. The relay log has to get encrypted too.

Next, the slave decrypts events from the relay log and applies them. After applying an event, the slave encrypts it and writes it into its own binlog file (assuming binlog is enabled on the slave).

In summary, to make our replication secure, we need to do the following (a configuration sketch follows the list):

  • Turn on encrypt-binlog on the master
  • Turn on encrypt-binlog on the slave
  • Use TLS for the connection between the master and the slave
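A minimal sketch of the slave side, assuming encrypt-binlog and the checksum options are already set in the slave’s my.cnf, and with the host name and replication account below as placeholders, is to point the replication channel at the master over TLS:

  CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = '...',
    MASTER_SSL = 1;
  START SLAVE;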

It’s worth noting that servers in replication have no idea if other servers are encrypted or not.

Why do master_verify_checksum and binlog_checksum need to be turned ON?

This is needed for “authenticated encryption”. Simply put, this is how we make sure that what we decrypt has not been changed by a third party. It also checks that the key used to decrypt the event was the correct one.
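As a quick sanity check, both settings can be inspected (and, since they are dynamic variables, changed) from a client session; they can of course also live in my.cnf:

  SHOW GLOBAL VARIABLES LIKE 'binlog_checksum';
  SHOW GLOBAL VARIABLES LIKE 'master_verify_checksum';

  SET GLOBAL binlog_checksum = 'CRC32';
  SET GLOBAL master_verify_checksum = ON;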

Digging deeper with mysqlbinlog

The mysqlbinlog utility is a standalone application that lets you read binlog files. As I write this blog post, it is not capable of decrypting binary logs, at least not by itself. However, it can still read encrypted binlog files through a running Percona Server for MySQL. Use the --read-from-remote-server option to read the binary log produced by a given server.

Let’s see what happens when we try to read an encrypted binlog with mysqlbinlog without the --read-from-remote-server option. You will get something like this:

As you can see, it is only possible to read the binary log until event type 9f gets read. This event is the Start_encryption_event. After this event, the rest of the binlog is encrypted. One thing to note is that Start_encryption_event is never propagated in replication. For instance, say the master server runs with --encrypt-binlog. This means the server writes Start_encryption_event to its binary logs, but the event is never sent to the slave server (the slave has no idea whether the master is encrypted).

Another option you can use with mysqlbinlog is --force. It forces mysqlbinlog to read all the events from the binlog, even if they are encrypted. You will see something like this in the output:

As you can see, it is still only possible to read the first two events, up to the Start_encryption_event. However, this time we can see that there are other events that follow, which are encrypted.

Running mysqlbinlog (without --read-from-remote-server) on encrypted binary logs only really makes sense if we want to check whether a given binary log is encrypted. For point-in-time recovery, and for other purposes that require reading an encrypted binlog, we would use mysqlbinlog with the --read-from-remote-server option.

For instance, if we want to read the binlog master-bin.000001, and Percona Server for MySQL is running on 127.0.0.1, port 3033, with user robert and password hard_password, we would use mysqlbinlog like this:

mysqlbinlog --read-from-remote-server --protocol=tcp --host=127.0.0.1 --port=3033 --user=robert --password=hard_password master-bin.000001

When you look at the output of this command, you see something like this:

You can now see the decrypted binlog. One interesting thing to note here is that we do not see our Start_encryption_event (type 9f). This proves my point: Start_encryption_event never leaves the server (we are reading from the server now, as we use --read-from-remote-server).

For more information on how to use mysqlbinlog for point-in-time recovery, see https://dev.mysql.com/doc/refman/5.7/en/point-in-time-recovery.html.
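For example, a point-in-time recovery run over an encrypted binlog could look roughly like the following, reusing the connection options from above and with the time window and target server as placeholders:

  mysqlbinlog --read-from-remote-server --protocol=tcp --host=127.0.0.1 --port=3033 \
    --user=robert --password=hard_password \
    --start-datetime="2018-03-01 12:00:00" --stop-datetime="2018-03-01 12:30:00" \
    master-bin.000001 | mysql --user=root -p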

However, for more modern approaches for point-in-time recovery that do not use mysqlbinlog and make use of parallel appliers, see here:

Have fun with binlog encryption!

Mar
07
2018
--

Percona Live 2018 Featured Talk: Securing Your Data on PostgreSQL with Payal Singh


Welcome to another interview blog for the rapidly approaching Percona Live 2018. Each post in this series highlights a Percona Live 2018 featured talk at the conference and gives a short preview of what attendees can expect to learn from the presenter.

This blog post highlights Payal Singh, DBA at OmniTI Computer Consulting Inc. Her talk is titled Securing Your Data on PostgreSQL. There is often a lack of understanding about how best to manage minimum basic application security features – especially with major security features being released with every major version of PostgreSQL. In our conversation, we discussed how Payal works to improve application security using Postgres:

Percona: Who are you, and how did you get into databases? What was your path to your current responsibilities?

Payal: I’m primarily a data addict. I fell in love with databases when they were first taught to me in high school. The declarative SQL syntax was intuitive to me, and efficient compared to other languages I had used (C and C++). I realized that if given the opportunity, I’d choose to become a database administrator. I joined OmniTI in the summer of 2012 as a web engineer intern during my Masters, but grabbed the chance to work on an internal database migration project. Working with the DBA team gave me a lot of new insight and exposure, especially into open source databases. The more I learned, the more I loved my job. Right after completing my Masters I joined OmniTI as a full-time database administrator, and never looked back!

Percona: Your talk is titled “Securing Your Data on PostgreSQL”. Why do you think that security (or the lack of it) is such an issue?

Payal: Securing your data is critical. In my experience, the one reason people using commercial databases are apprehensive of switching to open source alternatives is a lack of exposure to security features. If you look at open source databases today, specifically PostgreSQL, it has the most advanced security features: data encryption, auditing and row-level security, to name a few. People don’t know about them, though. As a FOSS project, we don’t have a centralized marketing team to advertise these features to our potential user base, which makes it necessary to spread information through other channels. Speaking about it at a popular conference like Percona Live is one of them!

In addition to public awareness, Postgres is advancing at a lightning pace. With each new major version released every year, a bunch of new security features and major improvements to existing ones are added, so much so that it becomes challenging to keep up, even for existing Postgres users. My talk on Postgres security aims to inform current as well as prospective Postgres users about the advanced security features that exist and their use cases, useful tips for using them, the gotchas, what’s lacking and what’s currently under development.

Percona: Is PostgreSQL better or worse with security and security options than either MySQL or MongoDB? Why?

Payal: I may be a little biased, but I think Postgres is the best database from a security point of view. MySQL is pretty close, though! There are quite a few reasons why I consider Postgres to be the best, but I’d like to save that discussion for my talk at Percona Live! For starters though, I think that Postgres’s authentication and role architecture is significantly clearer and more straightforward than MySQL’s implementation. Focusing strictly on security, I’d also say that access control and management is more granular and customizable in Postgres than it is in MySQL, although here I’d have to say MySQL’s ACL is easier and more intuitive to manage.

Percona: What is the biggest challenge for database security we are facing?

Payal: For all the databases? I’d say with the rapid growth of IoT, encrypted data processing is a huge requirement that none of the well-known databases currently provide. Even encryption of data at rest outside of the IoT context requires more attention. It is one of the few things that a DBMS can do as a last-ditch effort to protect its data in SQL injection attacks, if all other layers of security (network, application layer, etc.) have failed (which very often is the case).

Percona: Why should people attend your talk? What do you hope people will take away from it? 

Payal: My talk is a run-through of all current and future Postgres security features, from the basic to the very advanced and niche. It is not an isolated talk that assumes Postgres is the only database in the world. I often compare and contrast other database implementations of similar security features as well. Not only is it a decent one-hour primer for people new and interested in Postgres, but also a good way to weigh the pros and cons among databases from a security viewpoint.

Percona: What are you looking forward to at Percona Live (besides your talk)?

Payal: I’m looking forward to all the great talks! I got a lot of information out of the talks at Percona Live last year. The tutorials on new MySQL features were especially great!

Want to find out more about this Percona Live 2018 featured talk, and Payal and PostgreSQL security? Register for Percona Live 2018, and see her talk Securing Your Data on PostgreSQL. Register now to get the best price! Use the discount code SeeMeSpeakPL18 for 10% off.

Percona Live Open Source Database Conference 2018 is the premier open source event for the data performance ecosystem. It is the place to be for the open source community. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

The Percona Live Open Source Database Conference will be April 23-25, 2018 at the Hyatt Regency Santa Clara & The Santa Clara Convention Center.
