Percona Is a Finalist for Best Use of Open Source Technologies in 2021!


Percona has been named a finalist in the Computing Technology Product Awards for Best Use of Open Source Technologies. If you’re a customer, partner, or just a fan of Percona and what we stand for, we’d love your vote.

With Great Power…

You know the phrase. We’re leaving it to you and your peers in the tech world to push us to the top.

Computing’s Technology Product Awards are open to a public vote until October 29. Vote Here!


Thank you for supporting excellence in the open source database industry. We look forward to the awards ceremony on Friday, November 26, 2021.

Why We’re an Open Source Finalist

A contributing factor to our success has been Percona Monitoring and Management (PMM), an open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical MySQL, MongoDB, PostgreSQL, and MariaDB database environments, no matter where they are located or deployed. It’s impressing customers, and even competitors, in the industry.

If you want to see how Percona became a finalist, learn more about Percona Monitoring and Management, and be sure to follow @Percona on all platforms.

Vote Today!


Do You Believe in the Future of Open Source Databases? Tell Us Your Views!


Complete the 2021 Open Source Data Management Software Survey to share your knowledge and experience, and help inform the open source database community.

In 2020 we ran our second Open Source Data Management Software Survey. This resulted in some interesting data on the state of the open source database market. 

Some key statistics:

  • 41% of buying decisions are now made by architects, giving them significant power over software adoption within a company.
  • 81% of respondents gave cost savings as the most important reason for adoption. In this challenging economic climate, many companies are actively avoiding vendor license costs and lock-in.
  • 82% of respondents reported at least a 5% database footprint growth over the last year, with 62% reporting more significant growth and 12% growing over 50%.
  • Although promoted as a cheap and convenient alternative, cloud costs can spiral, with 22% of companies spending more on cloud hosting than planned.

To see how the landscape has changed over the last (turbulent) year, we are pleased to announce the launch of our third annual Open Source Data Management Software Survey.

The final results will be 100% anonymous and made freely available via Creative Commons Attribution ShareAlike (CC BY-SA).

Access to Accurate Market Data Is Important

There are millions of open source projects currently running, and most are dependent on databases. 

Accurate data helps people track the popularity of different databases, and see how and where these databases are running successfully. 

The information helps people build better software and take advantage of shifting trends. It also helps businesses understand industry direction and make informed decisions on the software and services they choose.

We want to help developers, architects, DBAs, and any other users choose the best database or tool for their business, and understand how and where to deploy it. 

How Can You Help This Survey Succeed?

Firstly, we would love you to complete the survey and share your insight into current trends and new developments in open source database software. 

Secondly, we hope you will share this survey with other people who use open source database software and encourage them to contribute.

The more responses we receive, the more useful this data will be to the community. If there is anything we missed or you would like to ask in the future, we welcome your feedback.

The survey should take around 10 minutes to complete. 

Click here to complete the survey!


Lessons From Peter Zaitsev…


Peter Zaitsev, known to all as Percona’s founder and CEO, asked me to pen a few reminiscences as I retire.  How do I sum up 22 years in the MySQL world?  How do I even begin?  I best start with gratitude. I’ve had the amazing privilege of being on the inside of the leadership teams of two highly impactful software startups, MySQL and Percona.

I now find myself as Percona’s first-ever retiree.  So I write as PZ requested of me, with the caveat that these reflections are wholly my own.  They do not necessarily reflect Percona policy or even Percona’s current conditions, as I’ve been less on the front lines in recent times.

Tom, Monty Widenius (MySQL Founder), Heikki Tuuri (InnoDB Creator), & Peter, at the 2nd ever MySQL User Conference, 2004

I became a MySQL DBA in late 1999.  My then-boss in Maryland agreed to buy the highest tier support offered by the fledgling MySQL company in Finland.  A $12,000 annual payment got you the personal telephone numbers of everyone in the company, then maybe a half dozen persons, and quick answers to your emails often direct from company founder Monty Widenius.  The passionate intensity of MySQL support was unlike anything I’d ever experienced.  I loved it, while the speed and simplicity of MySQL delighted me.

Monty Widenius and co-founder David Axmark pioneered the now-familiar pattern of open source software employment – no in-person job interviews, work at home from anywhere via the internet, then fly all over the world for meetings and conferences.  But when Monty invited me to join MySQL Ab in 2001, this was still pretty weird stuff, at least in the USA.  My wife was incredulous that I’d take a job from some guy in Finland whom I’d never met and who gave away his software for free.  As I once wrote in 2007:

I had a much better Oracle DBA offer from a large bank, and six kids (ages 2 to 13) plus my wife at home.  To her the bank was a sure bet and a virtual boss in Finland an absurd risk.  But MySQL struck me as an exciting company that kept its promises.  What to do?  I asked a trusted (Catholic) priest for his opinion.  He surprised me and voted for MySQL.  I borrowed a bedroom from the kids as my office and joined MySQL as employee #11.

Monty, Vadim, & Peter share dinner, 2008

Monty appointed me Director of Support.  This ultimately grew to be a team of 60 high-end experts.  This was another of Monty’s innovations – technical support was not an entry-level role but a senior-level one, a prestigious career destination for top experts.

Twenty years seems not that long ago.  Yet so much that’s considered ordinary now wasn’t then.  Neighbors assumed I was really unemployed until I started to travel to Europe pretty regularly.  Everything inside MySQL was transacted by email.  No Zoom, no Slack, no Confluence, nor any other of today’s common tools were then known.  Maybe within a year of my joining MySQL, we began using Internet Relay Chat, or IRC.  Having real-time chat among a global team was breakthrough technology.  Yet, the IRC command line was too geeky for admin staff, so it never became a company-wide tool.

Another novelty, though now commonplace, was a 100% global workforce.  I had been in my mid-30s when the Cold War ended in 1991.  Throughout my formative years, Russians, Ukrainians, Estonians, Bulgarians, and Serbs lived trapped behind the Iron Curtain, far distant from ordinary Americans like me.  Now they had become my daily co-workers and personal friends.  For me, it was another bit of cultural whiplash of the MySQL era.

MySQL experts were then extremely scarce.   In 2002 Monty called me excitedly.  A Russian MySQL prodigy had just accepted his job offer.  His name was Peter Zaitsev, and Monty wanted me to be his manager.  That year the entire MySQL company fit into one bus and rode from Helsinki to St. Petersburg to meet our new Russian colleagues.  Peter looked to me like a teenager.  In reality, he was age 20, married with one child, had a Master’s Degree in Computer Science, and was already a serial entrepreneur. He’d built his Russian startup Spylog around InnoDB, making himself one of the world’s first experts on this now ubiquitous storage engine.

Peter & Monty at the MySQL User Conference, 2008

I created the “High-Performance Group” within MySQL Support as a home for Peter to run.  He was in demand all over the globe to troubleshoot difficult MySQL cases.  He badly needed a deputy, and this is how Vadim Tkachenko entered my life.  It also became the first of many business lessons I would learn from Peter.  Peter is young enough to be my son.  But regarding entrepreneurship, it has been the reverse: Peter the father, teaching me, the son.

One lesson personified in Vadim was how demanding Peter was in his hiring standards.  He disqualified many applicants who struck me as very well qualified.  Peter painstakingly probed and tested Vadim’s coding expertise.  But the fruit of that care has shown in their long partnership and the technical excellence that Percona is known for.  More troubling is that when I approved Vadim’s hiring, I had no warning of his wry sense of humor or that I would ever become its victim.  How often I have paid for this oversight in the ensuing years.

In 2006 Peter and Vadim left MySQL to launch Percona and invited me to join them.  MySQL had by now become well established, with 400+ staff and strong VC backing.  Peter had only Vadim and a good reputation, but no money.  Peter argued loudly with me – he is known for this – that Percona would become the future of MySQL.  Peter said trends were shifting in his favor, and if I had any sense, I’d see it and get onboard.

I was then comfortable and established in MySQL Ab.  I saw no need at age 53 to risk everything (again) on a two-person startup.  But in 2008, Sun acquired MySQL for $1 billion. Sun’s culture was big corporate America and so unlike the freewheeling Scandinavian culture of MySQL.  And being the CEO’s friend at a tiny startup seemed a better place to live than the anonymous middle tiers of a downsizing megafirm.

So I quit Sun and became Percona’s first COO, a title I held for seven years.  Later, Peter named me Percona’s Chief of Staff, a role I held until my retirement.  In total, I was Peter’s boss for four years, and then he was my boss for 13 years.  Vadim went from my second-level report to being my second boss. The arrangements of fate are indeed curious.

So what have I witnessed at Percona, and what have I learned from it all?  

Foremost is that in a certain sense, Percona is a web of friendship. It’s a nexus of skilled people who cooperate, communicate, help, and labor hard as friends around a common endeavor.  Seeing Percona as a family is an exaggeration, but seeing it as a community of highly interdependent friends is not.  The enjoyment of friendships I’ve seen among staff, especially at conferences and meetings, has been deep and real.

Tom jumps off a cliff with Percona colleagues in Cancun, 2011

My memories of Percona include events in California, Texas, New York, North Carolina, Quebec, Mexico, the Dominican Republic, Ireland, the United Kingdom, the Netherlands, Portugal, Switzerland, Estonia, Ukraine, Croatia, and Montenegro.  These remain precious memories, and I’m certain I’m not alone in this.  The relationships nurtured in these events build up reservoirs of trust and goodwill.  This is what oils the progress made during the long months of working in isolation from home.

Percona also welcomed families to most Percona events.  My wife and I shared seven trips with Percona to attractive locales.  She made Percona friends of her own with whom she’d enjoy reunions.  Kids came too, a few times.  We also hosted Percona guests in our home.

Vadim with Tom & his wife Kathleen at their home in Maryland, 2007.

Consciously making Percona a family affair was another of Peter’s ideas and a very good one for business.  That my family’s welfare was directly tied to Percona’s success was understood by each of us.  When I had to upend our schedule for the Percona crisis du jour, family complaints were few.  My wife and children’s connectedness with Percona also made working remotely less isolating for me.

I think the interruption of professional friendships has been the worst consequence of the Covid pandemic for Percona.  Friendships can be built online too, but it’s much harder when done among those who’ve never once met. There’s something about relating in person that’s unique.  The virtual world can imitate but not replace it.

When friends are dealing with friends, all problems get resolved without managerial escalation or bureaucratic morass.  All sorts of improvements get implemented with minimal friction.  Peter usually fosters debate and invites dissenting opinions, as did Monty and MySQL CEO Marten Mickos before him.  Yet honest debate and credible decisions require trust.  And trust is best fostered within an atmosphere of authentic friendships.

This dynamic was at play inside MySQL Ab, and I think a big factor in powering MySQL to prominence.  Percona, at its best, has continued with this and many other MySQL traditions.  For instance, Percona’s monthly All Company Zoom calls are where the entire staff is trusted with batches of corporate metrics, both good and bad, and staff can directly question the CEO.

Yet friendship is not indulgence.  Another PZ business lesson I’ve imbibed is that deserved firing is an act of friendship towards the company as a whole.  I didn’t fully grasp this pre-Percona.  Not everyone makes a worthy friend, and some need to depart before they drag down the whole.  Friendship is also not indiscreet.  Transparency has proper limits, and some things cannot be explained or even acknowledged.

Another PZ lesson is that what (or who) got you here isn’t what will get you where you need to go.  At a certain point, Percona outgrew me as COO.  I had the work ethic but not the skills or experience for Percona’s larger corporate stage.  Peter helpfully explained this to me at one of our less agreeable meetings.  Needful directness is another of his business attributes.

A few years ago, a high-priced consultant flattered our Executive Team by explaining that only 0.4% of all startups reach Percona’s age and size.  That means 99.6% of startups don’t survive at all or end up far smaller than Percona if they endure.

The absolute necessity of constant adaptation to a swirl of change became another lesson.  I saw in Peter a constant lookout for the next opportunity, the next danger, the next process to strengthen, et cetera.  My temperament favors stability and predictability, but I learned its limits in a competitive marketplace.

Seizing an unexpected opportunity was part of this lesson. It’s the lesson of taking action now when an opportunity appears, not when it’s convenient. It’s hard to believe, but Percona software wasn’t, at first, part of any grand master plan.  Percona did consulting, period.  But MySQL Ab had let InnoDB development languish for tangled reasons.  Desperate for a workaround, some experts inside MySQL privately asked us to release all of the InnoDB performance patches Percona had accumulated.

These patches were at first bundled as XtraDB, a drop-in replacement for the InnoDB engine.  Later this grew into Percona Server for MySQL, a drop-in replacement for MySQL.  Ultimately it grew into a full-fledged engineering team with Percona versions of MySQL, MongoDB, and PostgreSQL, plus the Percona Monitoring and Management (PMM) dashboard.  But in a certain sense, Percona’s entire future sprang from seizing one unexpected and seemingly small opportunity.

Peter awards Tom a Percona University Ph.D.!

What does the CEO’s life consist of?  What I witnessed includes –  Endless meetings.  Upset customers.  Aggressive competitors.  Flaming disagreements.  Sudden resignations.  Regulatory swamps.  No money for payroll.  Paying others but not yourself.  Failing products.  Whispering critics.  Continual interruptions.

Crisis after crisis gets lobbed at the CEO, often several at once.   It takes guts just to persevere and to improvise when no visible solution exists.  I recall an early Executive Team meeting that Peter ran for sixteen straight hours trying to deal with everything; then he reconvened us for more of the same after a few hours of sleep.

I hope Peter and Vadim get rich.  Guts should have its reward.  I might not have thought this pre-Percona.  But life in its trenches convinced me.  Expert tech opinion bet heavily against Percona early on.  Voices said there’s no profit in consulting, you’re too niche to survive, VC money is your only hope, and similar refrains.  It takes some guts to stand alone and not cave in to the doubters, especially when there’s a faction on your own leadership team chanting “give up now.”  Simply surviving is underappreciated for the victory it represents.

How do you explain business risk to those who’ve never lived it?  What does it feel like to have hundreds of families depend on your payroll?  How do you cope knowing that other people pay hard for your own mistakes?  Percona once misjudged and had to quickly lay off 20 people.  It was painful not only for them, since Peter as CEO bore conspicuous responsibility.

I, too, dreamed of VC money as easier than the painful austerity of a bootstrap.  Peter was a fanatic about keeping Percona independent.  Only gradually did I see how Percona’s freedom from VC interference was paying off.  A near-term VC exit strategy looks very different from a lifetime venture strategy.  Perhaps VC oversight would have given Percona more internal discipline and consistency, or had other benefits.  But I doubt it would have been worth it.

My career ended as it began.  My boss told me he’d hired a Russian prodigy and asked me to become his manager.  The boss this time was Peter, and the employee was Daniil Bazhenov.  Daniil barely spoke English, but Peter assured me he was a quick learner and that we’d find a way.  We did, along with some amusement.  In another bit of cultural whiplash, Daniil worked for me from Ulyanovsk, Lenin’s hometown.

Tom & Daniil on the beach in Punta Cana, 2020

I could ramble on and on.  But I close my blog and my professional life with the images of so many wonderful people flooding my mind.

I think of Monty, who opened the door for my MySQL future, and of my wife Kathleen, who trusted me to walk through it.  I think of those very first MySQL employees, Sinisa Milivojevic, Jani Tolonen, Tim Smith, Matt Wagner, Jeremy Cole, and Indrek Siitan, who first wooed me with MySQL’s beauty and excellence.

I think of the MySQL Support Team leaders to whom I owe so much:  Dean Ellis, Salle Kermidarski, Lachlan Mulcahy, Bryan Alsdorf, and Miguel Solorzano.  Joined with them are Mark Leith, Tonci Grgin, Victoria Reznichenko, Todd Farmer, Geert Vanderkelen, Domas Mituzas, Harrison Fisk, Hartmut Holzgraefe, Kolbe Kegel, Matt Lord, Shawn Green, Ligaya Isler-Turmelle, and many others to whom I owe a great debt, including Ulf Sandberg, my boss.

Finally, from my MySQL years, I think of David Axmark, Marten Mickos, Kaj Arno, Zack Urlocker, Edwin Desouza, Brian Aker, Boel Larsen, and other key players who, with Monty, navigated MySQL to stunning success. Please forgive me if the fading of time means I’ve omitted a name I should never forget.

And of my Percona years, how can I begin to name with gratitude everyone I ought to, everyone with whom my Percona life has intersected?  I can only say thank you and again ask forgiveness for any omissions.

First, I think of those who were part of my teams over the years or whom I helped recruit:  Mark Sexton, RIP.  Svetlana Prozhogin.  Kortney Runyan.  Natalie Kesler.  Andrey Maksimov.  Agustin Gallego.  Colin Charles.  Drew Sieman.  Lorraine Pocklington.  Daniil Bazhenov.  Laura Byrnes.  Aleksandra Abramova.  Fredel Mamindra.  Jana Carmack.

I think of those Percona experts with whom I had the chance to closely interact at different times:  Michal Coburn. George Lorch.  Alexander Rubin.  Ovais Tariq.  Marcos Albe.  Alkin Tezuysal.  Marco Tusa.  Liz van Dijk.  Kenny Gryp.  Dimitry Vanoverbeke.  Przemek Malkowski.  Lenz Grimmer.  Ibrar Ahmed.  Tate McDaniel.  Yves Trudeau.  Yura Surokin.  Mykola Marhazan.

I think of those EMT colleagues with whom I shared so many long meetings:  Baron Schwartz.  Bill Schuler.  Ann Schlemmer.  John Breitenfeld.  Matt Yonkovit.  Sam Duffort.  Bennie Grant.  Jim Doherty.  Plus Keith Moulsdale of Whiteford, Taylor, Preston, Percona’s expert legal counsel for many years.

I think of those who’ve moved on but remained friends of Percona:  Peter Farkas.  Ignacio Nin.  Bill Karwin.  Aurimas Mikalauskas.  Sasha Pachev.  Raghu Prabhu.  Peter Schwaller.  Roel van de Paar.  Evgeniy Stepchenko.  Morgan Tocker.  Brian Walters.  Ewen Fortune.  Ryan Lowe.

Tom thanks Vadim for his gift of retirement guidance!

And Vadim Tkachenko, how could I almost forget you?!

And Peter Zaitsev.  I am so glad we met.


Modern Web-Based Application Architecture 101


This article provides a high-level overview of how a web-based application is commonly structured nowadays. Keep in mind that the topic is presented in a very simplified way; it is meant as an introduction only, and will hopefully encourage the reader to investigate some of the concepts in more depth.

Monolith vs Microservices

With the rise of containerization and microservices, applications nowadays tend to be composed of loosely coupled services that interact with each other. For example, an e-commerce site would have separate “microservices” for things like orders, questions, payments, and so on.

In addition to this, data is oftentimes geographically distributed in an effort to bring it closer to the end-user.

For example, instead of having a single database backend, we can have N smaller database backends distributed across different regions. Each of these would hold a portion of the data set related to the users located closer to it.
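To make the idea concrete, here is a minimal Python sketch of region-based routing. The region names and hostnames are purely hypothetical, and a real deployment would use a proper routing/proxy layer rather than an in-application lookup:

```python
import hashlib

# Hypothetical regional shards; the hostnames are illustrative only.
SHARDS = {
    "us-east": "db-us-east.example.internal",
    "eu-west": "db-eu-west.example.internal",
    "ap-south": "db-ap-south.example.internal",
}

def shard_for(user_region, user_id):
    """Route a user to the shard in their region; fall back to a
    deterministic hash of the user id when the region is unknown."""
    if user_region in SHARDS:
        return SHARDS[user_region]
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    regions = sorted(SHARDS)
    return SHARDS[regions[int(digest, 16) % len(regions)]]

print(shard_for("eu-west", 42))  # db-eu-west.example.internal
```

The hash fallback keeps the mapping deterministic, so the same user always lands on the same backend even when no region is recorded.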

The Many Faces of Caching

As traffic grows, the combination of database and application servers eventually stops being the most cost-effective way to scale. Instead, we can scale by introducing some additional elements, each targeting a specific part of the user experience.

Here’s how things look from a single microservice’s perspective:

Modern Application Architecture

Starting from what’s closest to the end-user, let’s briefly describe all these components.

Content Delivery Networks

You can think of the content delivery network (CDN) as a geo-distributed network of proxies with the goal of improving the response times for the end-user.

CDNs were traditionally used to cache “static” assets like web objects, downloadable files, streaming media, etc. Nowadays they can also deliver some dynamic content.

Popular providers include Akamai, Cloudflare, and Fastly.

HTTP Caches

The HTTP cache or web accelerator layer is typically deployed between the web servers and the CDN.

Its purpose is to reduce access times using a variety of techniques (such as caching, prefetching, and compression) while also reducing resource utilization on the application servers.

For example, Varnish is a web application accelerator for content-heavy dynamic websites that caches HTTP requests and functions as a reverse proxy. Other examples include nginx and squid.
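As a simplified illustration of the kind of decision such a layer makes (this is a sketch of honoring the Cache-Control response header, not Varnish's actual logic), consider:

```python
import re

def cacheable_ttl(headers):
    """Decide whether, and for how long, to cache a response.
    Simplified sketch in the spirit of what a web accelerator does."""
    cc = headers.get("Cache-Control", "")
    if "no-store" in cc or "private" in cc:
        return None                     # never cache these responses
    match = re.search(r"max-age=(\d+)", cc)
    if match:
        return int(match.group(1))      # honor the origin's stated TTL
    return 120                          # fallback default TTL in seconds

print(cacheable_ttl({"Cache-Control": "public, max-age=300"}))  # 300
print(cacheable_ttl({"Cache-Control": "no-store"}))             # None
```

Real accelerators consult many more signals (Vary, cookies, request method), but the shape of the decision is the same.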

Database Caches

Next comes the database cache layer. This usually consists of an in-memory key/value store that holds the results of database read queries, making it possible to scale reads without introducing additional database servers.

Cache invalidation can be performed explicitly from the application or simply by defining a TTL for each object.

One important point to consider with regard to caching is that some application flows may require strict read-after-write semantics, which a stale cache cannot provide.

Some commonly used solutions for database caching are Redis and Memcached.
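Here is a minimal, self-contained sketch of the cache-aside pattern with TTL-based expiry. A dictionary stands in for Redis or Memcached, and `query_db` is a hypothetical placeholder for a real database read:

```python
import time

class TTLCache:
    """In-memory stand-in for Redis/Memcached, illustrating TTL expiry."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # expired: drop and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)

cache = TTLCache()
db_reads = 0

def query_db(user_id):
    """Hypothetical placeholder for a real (slow) database read."""
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    row = cache.get(key)
    if row is None:                 # cache miss: read DB, then populate cache
        row = query_db(user_id)
        cache.set(key, row, ttl=60)
    return row

get_user(7)
get_user(7)
print(db_reads)  # 1 -- the second call was served from the cache
```

With Redis the `set` call would become `SET key value EX 60`, and the server would handle expiry for you.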

The Data Layer

Finally, the data layer can include online transaction processing (OLTP) and analytics/reporting (DW) components.

To scale the database, we have to consider reads and writes separately. We can scale reads by introducing replicas, while for writes we can do data partitioning or sharding.
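A simplified sketch of read/write splitting follows; the statement-prefix check and the server names are illustrative only (production setups typically rely on a proxy such as ProxySQL or HAProxy rather than application-side routing):

```python
import itertools

class QueryRouter:
    """Sketch: send writes to the primary, spread reads across replicas."""
    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def pick(self, sql):
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in ("insert", "update", "delete"):
            return self.primary        # all writes go to the primary
        return next(self._replicas)    # reads round-robin across replicas

router = QueryRouter("primary", ["replica-1", "replica-2"])
print(router.pick("SELECT * FROM orders"))           # replica-1
print(router.pick("SELECT * FROM orders"))           # replica-2
print(router.pick("INSERT INTO orders VALUES (1)"))  # primary
```

Note that replication lag means a read routed to a replica may not yet see the latest write, the same read-after-write caveat mentioned for caching above.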

For the data warehouse, there are many possibilities, including using a general-purpose database like MySQL. It is also worth considering columnar storage (e.g., ClickHouse), which is typically better suited for analytic queries.

The data warehouse is refreshed periodically via an ETL process. Some of the available solutions for ETL include Fivetran and Debezium.


There are some special considerations for write-intensive services. For example, data sources sending a continuous stream of data, or things like incoming orders which might have “spikes” at certain times.

It is a common pattern to deploy a queueing system (like ActiveMQ), or a more complex stream-processing system (like Apache Kafka), in front of the database layer.

The queue acts as a “buffer” for the incoming data sent by the application. Since we can control the amount of write activity the queue does, we avoid overloading the database when there is a spike of requests.
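As an illustration of the buffering idea (ignoring the durability and concurrency guarantees that a real queue like Kafka or ActiveMQ provides), here is a sketch where writes are enqueued instantly and flushed to the database in bounded batches:

```python
from collections import deque

class WriteBuffer:
    """Sketch: absorb write spikes in a queue, flush in bounded batches."""
    def __init__(self, flush_fn, batch_size=100):
        self._queue = deque()
        self._flush_fn = flush_fn
        self._batch_size = batch_size

    def enqueue(self, event):
        self._queue.append(event)      # fast: never touches the database

    def drain_once(self):
        """Flush at most one batch; return how many events were written."""
        batch = [self._queue.popleft()
                 for _ in range(min(self._batch_size, len(self._queue)))]
        if batch:
            self._flush_fn(batch)      # one bounded write to the database
        return len(batch)

written = []
buffer = WriteBuffer(written.extend, batch_size=3)
for i in range(7):                     # a "spike" of 7 incoming orders
    buffer.enqueue({"order_id": i})
while buffer.drain_once():
    pass
print(len(written))  # 7
```

Because the consumer controls the batch size and drain rate, a spike of incoming requests translates into a steady, bounded write load on the database.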

Final Words

These are just some of the patterns used to architect modern web applications in a cost-effective and scalable way, some of the challenges frequently encountered, and a few of the available solutions to help overcome them.


Mirantis launches cloud-native data center-as-a-service software

Mirantis has been around the block, starting way back as an OpenStack startup, but a few years ago the company began to embrace cloud-native development technologies like containers, microservices and Kubernetes. Today, it announced Mirantis Flow, a fully managed open source set of services designed to help companies manage a cloud-native data center environment, whether your infrastructure lives on-prem or in a public cloud.

“We’re about delivering to customers an open source-based cloud-to-cloud experience in the data center, on the edge, and interoperable with public clouds,” Adrian Ionel, CEO and co-founder at Mirantis explained.

He points out that the biggest companies in the world, the hyperscalers like Facebook, Netflix and Apple, have all figured out how to manage in a hybrid cloud-native world, but most companies lack the resources of these large organizations. Mirantis Flow is aimed at putting these same types of capabilities that the big companies have inside these more modest organizations.

While the large infrastructure cloud vendors like Amazon, Microsoft and Google have been designed to help with this very problem, Ionel says that these tend to be less open and more proprietary. That can lead to lock-in, which today’s large organizations are looking desperately to avoid.

“[The large infrastructure vendors] will lock you into their stack and their APIs. They’re not based on open source standards or technology, so you are locked in your single source, and most large enterprises today are pursuing a multi-cloud strategy. They want infrastructure flexibility,” he said. He added, “The idea here is to provide a completely open and flexible zero lock-in alternative to the [big infrastructure providers, but with the] same cloud experience and same pace of innovation.”

They do this by putting together a stack of open source solutions in a single service. “We provide virtualization on top as part of the same fabric. We also provide software-defined networking, software-defined storage and CI/CD technology with DevOps as a service on top of it, which enables companies to automate the entire software development pipeline,” he said.

As the company describes the service in a blog post published today, it includes “Mirantis Container Cloud, Mirantis OpenStack and Mirantis Kubernetes Engine, all workloads are available for migration to cloud native infrastructure, whether they are traditional virtual machine workloads or containerized workloads.”

For companies worried about migrating their VMware virtual machines to this solution, Ionel says early customers have already been able to move their VMs to the Mirantis platform. “This is a very, very simple conversion of the virtual machine from VMware standard to an open standard, and there is no reason why any application and any workload should not run on this infrastructure — and we’ve seen it over and over again in many many customers. So we don’t see any bottlenecks whatsoever for people to move right away,” he said.

It’s important to note that this solution does not include hardware. It’s about bringing your own hardware infrastructure, either physical or as a service, or using a Mirantis partner like Equinix. The service is available now for $15,000 per month or $180,000 annually, which includes: 1,000 core/vCPU licenses for access to all products in the Mirantis software suite plus support for 20 virtual machine (VM) migrations or application onboarding and unlimited 24×7 support. The company does not charge any additional fees for control plane and management software licenses.


Confluent CEO Jay Kreps is coming to TC Sessions: SaaS for a fireside chat

As companies process ever-increasing amounts of data, moving it in real time is a huge challenge for organizations. Confluent is a streaming data platform built on top of the open source Apache Kafka project that’s been designed to process massive numbers of events. To discuss this, and more, Confluent CEO and co-founder Jay Kreps will be joining us at TC Sessions: SaaS on Oct 27th for a fireside chat.

Data is a big part of the story we are telling at the SaaS event, as it has such a critical role in every business. Kreps has said in the past that data streams are at the core of every business, from sales to orders to customer experiences. As he wrote in a company blog post announcing the company’s $250 million Series E in April 2020, Confluent is working to process all of this data in real time — and that was a big reason why investors were willing to pour so much money into the company.

“The reason is simple: though new data technologies come and go, event streaming is emerging as a major new category that is on a path to be as important and foundational in the architecture of a modern digital company as databases have been,” Kreps wrote at the time.

The company’s streaming data platform takes a multi-faceted approach to streaming and builds on the open source Kafka project. While anyone can download and use Kafka, as with many open source projects, companies may lack the resources or expertise to deal with the raw open source code. Many a startup has been built on open source to help simplify whatever the project does, and Confluent and Kafka are no different.

Kreps told us in 2017 that companies using Kafka as a core technology include Netflix, Uber, Cisco and Goldman Sachs. But those companies have the resources to manage complex software like this. Mere mortal companies can pay Confluent to access a managed cloud version or they can manage it themselves and install it in the cloud infrastructure provider of choice.

The project was actually born at LinkedIn in 2011, when its engineers were tasked with building a tool to process the enormous number of events flowing through the platform. The company eventually open sourced the technology it had created, and Apache Kafka was born.

Confluent launched in 2014 and raised over $450 million along the way. In its last private round in April 2020, the company scored a $4.5 billion valuation on a $250 million investment. As of today, it has a market cap of over $17 billion.

In addition to our discussion with Kreps, the conference will also include Google’s Javier Soltero, Amplitude’s Olivia Rose, as well as investors Kobie Fuller and Casey Aylward, among others. We hope you’ll join us. It’s going to be a thought-provoking lineup.

Buy your pass now to save up to $100 when you book by October 1. We can’t wait to see you in October!


October 13, 2021: Open Mic on Open Source Takes on MongoDB

Percona Open Source MongoDB

Have a burning question about MongoDB?

Database experts will be leading an open forum discussion based on attendees’ interests. Are you ahead of the curve? Just trying to keep up? Get the best of MongoDB.

Live Stream: October 13 at 11:30 am CT (30 min)

Watch this upcoming Open Mic on Open Source to learn the latest from the experts. Will Fromme, Percona Solutions Engineer, and Michal Nosek, Percona Enterprise Architect, will share their insights into what’s facing MongoDB admins right now.

Will and Michal will bring you up to speed on:

  • How to convert a standalone MongoDB Community Edition to Percona Server for MongoDB
  • How to convert an entire replica set running MongoDB Community Edition to Percona Server for MongoDB
  • How to run Kubernetes Operators with Percona Server for MongoDB

Percona’s pro pairing will bring together in-depth operational knowledge of MongoDB, Percona’s open source tools, and a fully supported MongoDB Community distribution. They’ll also remind you how to best scale a cluster, use the self-healing feature, back up and restore your database, and modify parameters.

Hear what other open source enthusiasts want to know about!

This is an open forum for a reason. If you get stumped sometimes, don’t worry; you’re not alone! Our hosts will take your questions in real time, and you can even remain anonymous when you ask a question.

Register to Attend!

When you register, our hosts will make sure they provide a webinar worth the half-hour!

Space is limited. Secure your spot now and we’ll see you on October 13 at 11:30 am.


Explosion snags $6M on $120M valuation to expand machine learning platform

Explosion, a company that has combined an open source machine learning library with a set of commercial developer tools, announced a $6 million Series A today on a $120 million valuation. The round was led by SignalFire, and the company reported that today’s investment represents 5% of its value.

Oana Olteanu from SignalFire will be joining the board under the terms of the deal, which includes warrants of $12 million in additional investment at the same price.

“Fundamentally, Explosion is a software company and we build developer tools for AI and machine learning and natural language processing. So our goal is to make developers more productive and more focused on their natural language processing, so basically understanding large volumes of text, and training machine learning models to help with that and automate some processes,” company co-founder and CEO Ines Montani told me.

The company started in 2016 when Montani met her co-founder, Matthew Honnibal, in Berlin, where he was working on the spaCy open source machine learning library. Since then, that open source project has been downloaded over 40 million times.

In 2017, they added Prodigy, a commercial product for generating data for the machine learning model. “Machine learning is code plus data, so to really get the most out of the technologies you almost always want to train your models and build custom systems because what’s really most valuable are problems that are super specific to you and your business and what you’re trying to find out, and so we saw that the area of creating training data, training these machine learning models, was something that people didn’t pay very much attention to at all,” she said.

The next step is a product called Prodigy Teams, which is a big reason the company is taking on this investment. “Prodigy Teams is [a hosted service that] adds user management and collaboration features to Prodigy, and you can run it in the cloud without compromising on what people love most about Prodigy, which is the data privacy, so no data ever needs to get seen by our servers,” she said. They do this by letting the data sit on the customer’s private cluster in a private cloud, and then using Prodigy Teams’ management features in the public cloud service.

Today, they have 500 companies using Prodigy including Microsoft and Bayer in addition to the huge community of millions of open source users. They’ve built all this with just six early employees, a number that has grown to 17 recently (they hope to reach 20 by year’s end).

She believes if you’re thinking too much about diversity in your hiring process, you probably have a problem already. “If you go into hiring and you’re thinking like, oh, how can I make sure that the way I’m hiring is diverse, I think that already shows that there’s maybe a problem,” she said.

“If you have a company, and it’s 50 dudes in their 20s, it’s not surprising that you might have problems attracting people who are not white dudes in their 20s. But in our case, our strategy is to hire good people and good people are often very diverse people, and again if you play by the [startup] playbook, you could be limited in a lot of other ways.”

She said that they have never seen themselves as a traditional startup following some conventional playbook. “We didn’t raise any investment money [until now]. We grew the team organically, and we focused on being profitable and independent [before we got outside investment],” she said.

But more than the money, Montani says that they needed to find an investor that would understand and support the open source side of the business, even while they got capital to expand all parts of the company. “Open source is a community of users, customers and employees. They are real people, and [they are not] pawns in [some] startup game, and it’s not a game. It’s real, and these are real people,” she said.

“They deserve more than just my eyeballs and grand promises. […] And so it’s very important that even if we’re selling a small stake in our company for some capital [to build our next] product [that open source remains at] the core of our company and that’s something we don’t want to compromise on,” Montani said.


myloader Stops Causing Data Fragmentation

During the development of the myloader --innodb-optimize-keys option, which was released in version 0.10.7, we found several issues and opportunities to improve the process. We had to change the approach, reimplement some of the core functionality, and add a couple of data structures. That allowed us to implement, at really low cost, a feature that executes the files containing INSERT statements sorted by Primary Key. This is desirable to reduce page splits, which cause on-disk tablespace fragmentation.
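The sorting idea is simple to sketch. The Python below is only an illustration of the concept, not myloader's actual C implementation: if each dump file is tagged with the minimum primary-key value it contains, importing the files in ascending order of that value keeps INSERTs close to append-only (the min_id values here are taken from the import log shown later in this post):

```python
def sort_files_by_min_pk(files):
    """files: iterable of (filename, min_id) pairs; returns the
    filenames ordered by ascending minimum primary-key value."""
    return [name for name, _ in sorted(files, key=lambda f: f[1])]

files = [
    ("test.perf_test.01476.sql", 31381237),
    ("test.perf_test.00087.sql", 1849708),
    ("test.perf_test.01067.sql", 22685488),
]
print(sort_files_by_min_pk(files))
# ['test.perf_test.00087.sql', 'test.perf_test.01067.sql', 'test.perf_test.01476.sql']
```

With multiple threads the order can still interleave, which is why the single-threaded graphs later in the post look cleaner than the four-threaded ones.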

In this blog post, I will present the differences in data fragmentation for each version.

Test Details

These are local VM tests; the intention is not to show performance gains, only relative behavior.

The table that I used is:

CREATE TABLE `perf_test` (
 `id` int unsigned NOT NULL AUTO_INCREMENT,
 `val` varchar(108) DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `val` (`val`(2)),
 KEY `val_2` (`val`(4)),
 KEY `val_3` (`val`(8))
) ENGINE=InnoDB;
And I inserted the data by repeatedly doubling the table with:

INSERT INTO perf_test(val) SELECT concat(uuid(),uuid(),uuid()) FROM perf_test;

The graphs below were made with innodb_ruby (more info about it in this blog post) and are based on a table of 131K rows dumped with --rows 100. The intention of this test was to create a lot of files, spreading the Primary Key ranges across them. The timings are over the same table structure, but with 32M rows. Finally, I performed the test with 1 and 4 threads, and with --innodb-optimize-keys when possible.

Tests Performed

In myloader v0.10.5 there was no file sorting, which is why we can see that lower Primary Key values were updated recently:


Regardless of the number of threads, we can see pages across the whole file being updated at any point in time.

This is happening because mydumper exported the files in order with these min_id and max_id values:

File min_id max_id
test.perf_test.00000.sql 1 21261
test.perf_test.00001.sql 21262 42522
test.perf_test.00002.sql 42523 49137
test.perf_test.00003.sql 65521 85044
test.perf_test.00004.sql 85045 98288
test.perf_test.00006.sql 131056 148827
test.perf_test.00007.sql 148828 170088
test.perf_test.00008.sql 170089 191349
test.perf_test.00009.sql 191350 196591
test.perf_test.00012.sql 262126 276393

But during import there was no order; let’s look at the log:

** Message: 12:55:12.267: Thread 3 restoring `test`.`perf_test` part 1476. Progress 1 of 1589 .
** Message: 12:55:12.269: Thread 1 restoring `test`.`perf_test` part 87. Progress 2 of 1589 .
** Message: 12:55:12.269: Thread 2 restoring `test`.`perf_test` part 1484. Progress 3 of 1589 .
** Message: 12:55:12.269: Thread 4 restoring `test`.`perf_test` part 1067. Progress 4 of 1589 .
** Message: 12:55:13.127: Thread 1 restoring `test`.`perf_test` part 186. Progress 5 of 1589 .
** Message: 12:55:13.128: Thread 4 restoring `test`.`perf_test` part 1032. Progress 6 of 1589 .

With these min_id and max_id per file:

File min_id max_id
test.perf_test.01476.sql 31381237 31402497
test.perf_test.00087.sql 1849708 1870968
test.perf_test.01484.sql 31551325 31572585
test.perf_test.01067.sql 22685488 22706748
test.perf_test.00186.sql 3954547 3975807
test.perf_test.01032.sql 21941353 21962613

With this kind of insert order, you can only imagine the number of page splits that caused the fragmentation in the InnoDB datafile.
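To build intuition for why this insert order hurts, here is a toy Python simulation. It is emphatically not InnoDB's real algorithm, just a caricature: leaf "pages" are sorted runs with a fixed capacity, inserting into a full page forces a middle split (rows move), while appending past the rightmost page simply allocates a fresh page, which is the sequential-insert case:

```python
import bisect
import random

PAGE_CAP = 8  # tiny pages so the effect is easy to see

def insert_all(keys):
    """Toy model of B-tree leaf pages. Returns (splits, page_count):
    middle splits move rows and fragment the file; append-only page
    allocations do not."""
    pages = [[]]
    splits = 0
    for k in keys:
        # target page: last page whose smallest key is <= k
        i = 0
        for j in range(1, len(pages)):
            if pages[j] and pages[j][0] <= k:
                i = j
        page = pages[i]
        if len(page) >= PAGE_CAP:
            if i == len(pages) - 1 and k > page[-1]:
                pages.append([k])          # sequential case: no rows move
                continue
            mid = len(page) // 2           # middle split: rows move
            pages[i:i + 1] = [page[:mid], page[mid:]]
            splits += 1
            if k >= pages[i + 1][0]:
                i += 1
            page = pages[i]
        bisect.insort(page, k)
    return splits, len(pages)

keys = list(range(1000))
print("sequential:", insert_all(keys))     # (0, 125): no splits, full pages
random.seed(1)
random.shuffle(keys)
print("shuffled:  ", insert_all(keys))     # many splits, many more pages
```

Even in this crude model, shuffled inserts leave pages roughly two-thirds full and force splits throughout the file, which mirrors the fragmentation the graphs above show.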

Timings were:

0.10.5/mydumper/myloader  -t 1 6:52
0.10.5/mydumper/myloader  -t 4 4:55

In v0.10.7-2 we have the same behavior:


But we have a small performance increase:

0.10.7-2/mydumper/myloader  -t 1 6:49 
0.10.7-2/mydumper/myloader  -t 4 4:47

We see the same pattern, even if we use --innodb-optimize-keys:


The main difference is the index creation stage.

0.10.7-2/mydumper/myloader --innodb-optimize-keys -t 1 6:07 
0.10.7-2/mydumper/myloader --innodb-optimize-keys -t 4 5:53

Now, in v0.10.9, where we have table and file sorting, the graphs have a significant change: 


The difference between the two graphs is also a bit shocking: not the color trend, but the number of pages used, which indicates high fragmentation when multiple threads are used.

master/mydumper/myloader  -t 1 5:50 
master/mydumper/myloader  -t 4 4:29

Let’s check now with --innodb-optimize-keys:


This is what we are looking for! As you can see, with 1 thread the layout is perfect; with 4 threads there is some odd distribution, but it is certainly much better than the other options.

However, the timings are not the best:

master/mydumper/myloader --innodb-optimize-keys -t 1 5:33 
master/mydumper/myloader --innodb-optimize-keys -t 4 5:10

Let’s compare them:

Data       | Index      | Total      | Table 
0 00:05:50 | 0 00:00:00 | 0 00:05:50 | `test`.`perf_test`  -t 1 
0 00:04:29 | 0 00:00:00 | 0 00:04:29 | `test`.`perf_test`  -t 4 
0 00:02:33 | 0 00:02:59 | 0 00:05:33 | `test`.`perf_test`  -t 1 --innodb-optimize-keys 
0 00:02:01 | 0 00:03:09 | 0 00:05:10 | `test`.`perf_test`  -t 4 --innodb-optimize-keys

But that makes sense if you read this blog post. Actually, it would be really nice to have a feature that determines when --innodb-optimize-keys needs to be used.
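Such a feature could start as a simple heuristic. The sketch below is purely hypothetical and not an existing mydumper/myloader feature; the function name and the thresholds are illustration values, not anything the tool implements:

```python
def should_optimize_keys(row_count, secondary_index_count):
    """Hypothetical heuristic: with no secondary indexes there is
    nothing to defer, and on small tables the separate index-build
    stage may cost more than it saves. The 1M-row threshold is an
    arbitrary illustration value."""
    return secondary_index_count >= 1 and row_count >= 1_000_000

print(should_optimize_keys(32_000_000, 3))  # this post's test table: True
print(should_optimize_keys(10_000, 3))      # tiny table: False
```

In practice the decision is subtler: the timings above show the option losing with 4 threads on this very table, so thread count and index sizes would have to enter the heuristic as well.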


Version 0.10.9 of MyDumper allows myloader to insert data more efficiently than previous versions. Multithreaded inserts sorted by Primary Key are now possible and faster than ever!


Would the math work if Databricks were valued at $38B?

Databricks, the open-source data lake and data management powerhouse, has been on quite a financial run lately. Today Bloomberg reported the company could be raising a new round worth at least $1.5 billion at an otherworldly $38 billion valuation. That price tag is up $10 billion from its last fundraise in February, when it snagged $1 billion at a $28 billion valuation.

Databricks declined to comment on the Bloomberg post and its possible new valuation.

The company has been growing like gangbusters, giving credence to the investor thesis that the more your startup makes, the more it is likely to make. Consider that Databricks closed 2020 with $425 million in annual recurring revenue, which in itself was up 75% from the previous year.

As revenue goes up so does valuation, and Databricks is a great example of that rule in action. In October 2019, the company raised $400 million at a seemingly modest $6.2 billion valuation (if a valuation like that can be called modest). By February 2021, that had ballooned to $28 billion, and today it could be up to $38 billion if that rumor turns out to be true.

One of the reasons that Databricks is doing so well is it operates on a consumption model. The more data you move through the Databricks product family, the more money it makes, and with data exploding, it’s doing quite well, thank you very much.

It’s worth noting that Databricks’s primary competitor, Snowflake, went public last year and has a market cap of almost $83 billion. In that context, the new figure doesn’t feel quite so outrageous. But what does it mean in terms of revenue to warrant a valuation like that? Let’s find out.

Valuation math

Let’s rewind the clock and observe the company’s recent valuation marks and various revenue results at different points in time:

  • Q3 2019: $200 million run rate, $6.2 billion valuation
  • Q3 2020: $350 million run rate, no known valuation change
  • EoY 2020: $425 million run rate, $28 billion valuation (Q1 valuation)
  • Q3 2021: Unclear run rate, possible $38 billion valuation

The company’s 2019 venture round gave Databricks a 31x run rate multiple. By the first quarter of 2021, that had swelled to a roughly 66x multiple if we compare its final 2020 revenue pace to its then-fresh valuation. Certainly software multiples were higher at the start of 2021 than they were in late 2019, but Databricks’s $28 billion valuation was still more than impressive; investors were betting on the company like it was going to be a key breakout winner, and a technology company that would go public eventually in a big way.

To see the company possibly raise more funds would therefore not be surprising. Presumably the company has had a good few quarters since its last round, given its history of revenue accretion. And there’s even more money available today for growing software companies than before.

But what to make of the $38 billion figure? If Databricks merely held onto its early 2021 run rate multiple, the company would need to have reached a roughly $575 million run rate, give or take. That would work out to around 36% growth in the last two-and-a-bit quarters. That works out to less than $75 million in new run rate per quarter since the end of 2020.

Is that possible? Yeah. The company added $75 million in run rate between Q3 2020 and the end of the year. So you can back-of-the-envelope the company’s growth to make a $38 billion valuation somewhat reasonable at a flat multiple. (There’s some fuzz in all of our numbers, as we are discussing rough timelines from the company; we’ll be able to go back and do more precise math once we get the Databricks S-1 filing in due time.)
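That back-of-the-envelope math can be reproduced in a few lines (all figures in millions of dollars, as reported in this article):

```python
# Reproduce the article's back-of-the-envelope valuation math ($ millions).
valuation_feb_2021 = 28_000   # February 2021 round
run_rate_end_2020 = 425       # end-of-2020 annual recurring revenue

multiple = valuation_feb_2021 / run_rate_end_2020
print(f"early-2021 run rate multiple: {multiple:.0f}x")                 # ~66x

rumored_valuation = 38_000
implied_run_rate = rumored_valuation / multiple
print(f"run rate needed at a flat multiple: ${implied_run_rate:.0f}M")  # ~$577M

implied_growth = implied_run_rate / run_rate_end_2020 - 1
print(f"implied growth since end of 2020: {implied_growth:.0%}")        # ~36%
```

As noted, these inputs are rough timelines from the company, so treat the outputs as approximations rather than precise figures.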

All this raises the question of whether Databricks should be able to command such a high multiple. There’s some precedent. Freshly public software company Monday.com, for example, has a run rate multiple north of 50x. It earned that mark on the back of a strong first quarter as a public company.

Databricks securing a higher multiple while private is not crazy, though we wonder if the data-focused company is managing a similar growth rate. Monday.com grew 94% on a year-over-year basis in its most recent quarter.

All this is to say that you can make the math shake out for Databricks to raise at a $38 billion valuation, but built into that price is quite a lot of anticipated growth. Top quartile public software companies today trade for around 23x their forward revenues, and around 27x their present-day revenues, per Bessemer. Defending its possible new valuation once public, then, leaves quite a lot of work ahead for Databricks.

The company’s CEO, Ali Ghodsi, will join us at TC Sessions: SaaS on October 27th, and we should know by then if this rumor is, indeed, true. Either way, you can be sure we are going to ask him about it.

