Jan
22
2021
--

Drupal’s journey from dorm-room project to billion-dollar exit

Twenty years ago Drupal and Acquia founder Dries Buytaert was a college student at the University of Antwerp. He wanted to put his burgeoning programming skills to work by building a communications tool for his dorm. That simple idea evolved over time into the open-source Drupal web content management system, and eventually a commercial company called Acquia built on top of it.

Buytaert would later raise over $180 million and exit in 2019 when the company was acquired by Vista Equity Partners for $1 billion, but it took 18 years of hard work to reach that point.

When Drupal came along in the early 2000s, it wasn’t the only open-source option, but it was part of a major movement toward giving companies options by democratizing web content management.

Many startups are built on open source today, but back in the early 2000s, there were only a few trailblazers and none that had taken the path that Acquia took. Buytaert and his co-founders decided to reduce the complexity of configuring a Drupal installation by building a hosted cloud service.

That seems like a no-brainer now, but consider that at the time, in 2009, AWS was still a fledgling side project at Amazon, not the $45 billion behemoth it is today. In 2021, building a startup on top of an open-source project with a SaaS version is a proven and common strategy. Back then, nobody else had done it. As it turned out, taking the path less traveled worked out well for Acquia.

Moving from dorm room to billion-dollar exit is the dream of every startup founder. Buytaert got there by being bold, working hard and thinking big. His story is compelling, and it offers lessons for startup founders who also want to build something big.

Born in the proverbial dorm room

In the days before everyone had internet access and a phone in their pockets, Buytaert simply wanted to build a way for him and his friends to communicate in a centralized way. “I wanted to build kind of an internal message board really to communicate with the other people in the dorm, and it was literally talking about things like ‘Hey, let’s grab a drink at 8:00,’” Buytaert told me.

He also wanted to hone his programming skills. “At the same time I wanted to learn about PHP and MySQL, which at the time were emerging technologies, and so I figured I would spend a few evenings putting together a basic message board using PHP and MySQL, so that I could learn about these technologies, and then actually have something that we could use.”

The resulting product served its purpose well, but when graduation beckoned, Buytaert realized if he unplugged his PC and moved on, the community he had built would die. At that point, he decided to move the site to the public internet and named it drop.org, which was actually an accident. Originally, he meant to register dorp.org because “dorp” is Dutch for “village or small community,” but he mistakenly inverted the letters during registration.

Buytaert continued adding features to drop.org like diaries (a precursor to blogging) and RSS feeds. Eventually, he came up with the idea of open-sourcing the software that ran the site, calling it Drupal.

The birth of web content management

About the same time Buytaert was developing the basis of what would become Drupal, web content management (WCM) was a fresh market. Early websites had been fairly simple and straightforward, but they were growing more complex in the late 90s and a bunch of startups were trying to solve the problem of managing them. Buytaert likely didn’t know it, but there was an industry waiting for an open-source tool like Drupal.

Jan
20
2021
--

Drain Kubernetes Nodes… Wisely

What is Node Draining?

Anyone who has ever worked with containers knows how ephemeral they are. In Kubernetes, not only can containers and pods be replaced, but the nodes as well. Nodes in Kubernetes are VMs, servers, and other entities with computational power where pods and containers run.

Node draining is the mechanism that allows users to gracefully move all containers from one node to other nodes. There are multiple use cases:

  • Server maintenance
  • Autoscaling of the k8s cluster – nodes are added and removed dynamically
  • Preemptible or spot instances that can be terminated at any time

Why Drain?

Kubernetes can automatically detect node failure and reschedule the pods to other nodes. The only problem here is the time between the node going down and the pod being rescheduled. Here’s how it goes without draining:

  1. Node goes down – someone pressed the power button on the server.
  2. kube-controller-manager, the service which runs on the masters, cannot get the NodeStatus from the kubelet on that node. By default it tries to get the status every 5 seconds; this is controlled by the --node-monitor-period parameter of the controller.
  3. Another important parameter of the kube-controller-manager is --node-monitor-grace-period, which defaults to 40s. It controls how fast the node will be marked as NotReady by the master.
  4. So after ~40 seconds kubectl get nodes shows one of the nodes as NotReady, but the pods are still there and shown as running. This leads us to --pod-eviction-timeout, which is 5 minutes by default (!). It means that only after the node has been marked NotReady for 5 minutes does Kubernetes start to evict the pods.


So if someone shuts down the server, only after almost six minutes (with default settings) does Kubernetes start to reschedule the pods to other nodes. This timing is also valid for managed k8s clusters, like GKE.

These defaults might seem too high, but they exist to prevent frequent pod flapping, which might impact your application and infrastructure in a far more negative way.
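
To make the timeline above concrete, here is a minimal sketch of where these knobs live; it assumes kubectl access and a self-managed control plane (managed offerings such as GKE do not expose these flags):

    # Watch a node transition from Ready to NotReady after it goes down
    kubectl get nodes -w

    # The timings discussed above map to kube-controller-manager flags (defaults shown):
    #   --node-monitor-period=5s
    #   --node-monitor-grace-period=40s
    #   --pod-eviction-timeout=5m0s
    # On a kubeadm-based control plane these are typically set in
    # /etc/kubernetes/manifests/kube-controller-manager.yaml (the path is an
    # assumption that depends on how your control plane is deployed).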

Okay, Draining How?

As mentioned before, draining is the graceful method to move the pods to another node. Let’s see how draining works and what pitfalls there are.

Basics

The kubectl drain {NODE_NAME} command most likely will not work on its own. There are at least two flags that need to be set explicitly (see the example command after this list):

  • --ignore-daemonsets – it is not possible to evict pods that run under a DaemonSet; this flag tells the command to ignore them.
  • --delete-emptydir-data – an acknowledgment of the fact that data in EmptyDir ephemeral storage will be gone once the pods are evicted.
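
For example, a full drain invocation might look like this minimal sketch ({NODE_NAME} is a placeholder for your node's name):

    # Cordon the node and evict every evictable pod from it
    #   --ignore-daemonsets: skip pods managed by DaemonSets (they cannot be evicted)
    #   --delete-emptydir-data: accept that emptyDir contents will be lost
    kubectl drain {NODE_NAME} --ignore-daemonsets --delete-emptydir-data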

Once the drain command is executed the following happens:

  1. The node is cordoned, which means that no new pods can be placed on it. In the Kubernetes world, this is a taint node.kubernetes.io/unschedulable:NoSchedule placed on the node, which most pods do not tolerate (DaemonSet pods are a notable exception).
  2. Pods, except the ones that belong to DaemonSets, are evicted and hopefully scheduled on another node.

Pods are evicted and now the server can be powered off. Wrong.

DaemonSets

If for some reason your application or service uses a DaemonSet primitive, its pods were not drained from the node. That means they can still perform their function and even receive traffic from the load balancer or the service.

The best way to ensure this does not happen is to delete the node from Kubernetes itself:

  1. Stop the kubelet on the node.
  2. Delete the node from the cluster with kubectl delete node {NODE_NAME}.

If the kubelet is not stopped, the node will appear again after the deletion.
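
A minimal sketch of those two steps, assuming a systemd-managed kubelet and cluster-admin access ({NODE_NAME} is a placeholder):

    # On the node itself: stop the kubelet so the node cannot re-register
    sudo systemctl stop kubelet

    # From a machine with cluster-admin access: remove the node object
    kubectl delete node {NODE_NAME}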

Pods are evicted, node is deleted, and now the server can be powered off. Wrong again.

Load Balancer

Here is quite a standard setup:

(Diagram: an external load balancer in front of the Kubernetes nodes)

The external load balancer sends traffic to all Kubernetes nodes. kube-proxy and the Container Network Interface internals deal with routing the traffic to the correct pod.

There are various ways to configure the load balancer, but as you can see, it might still be sending traffic to the node. Make sure that the node is removed from the load balancer before powering it off. For example, the AWS Node Termination Handler does not remove the node from the load balancer, which causes a short packet loss in the event of node termination.
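
As an illustration only (this step is not spelled out in the original post), on AWS with a target-group-based load balancer the deregistration might look like the following sketch; the target group ARN and instance ID are placeholders:

    # Deregister the instance backing the node from the target group
    aws elbv2 deregister-targets \
        --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/k8s-nodes/0123456789abcdef \
        --targets Id=i-0123456789abcdef0

    # Wait until connection draining finishes before powering the node off
    aws elbv2 wait target-deregistered \
        --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/k8s-nodes/0123456789abcdef \
        --targets Id=i-0123456789abcdef0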

Conclusion

Microservices and Kubernetes shifted the paradigm of systems availability. SRE teams focus on resilience more than on stability. Nodes, containers, and load balancers can fail, and the teams are ready for it. Kubernetes is an orchestration and automation tool that helps a lot here, but there are still pitfalls that must be taken care of to meet SLAs.

Jan
20
2021
--

Keeping Open Source Open, or, Why Open is Better


Last week Elastic announced that they were “Doubling Down” on open source by changing their licensing to a non-open license – MongoDB’s Server Side Public License, or SSPL. Let me be clear: in my opinion this is not doubling down – unless, as our good friend @gabidavila highlighted, the thinking was that a double negative makes a positive? VM Brasseur posted on her blog that she feels Elastic and Kibana are now a business risk for enterprises. Peter Zaitsev had penned why he felt SSPL was bad for you before this announcement, and then sat down with me to discuss his thoughts last week as well.

This is not the first and, regrettably, it won’t be the last “open” company to run away from open source or from the model which got them as a company to where they are today.

Why Do Some Companies Have Open Source Problems Today?

I personally have no direct problem with companies selling proprietary software. I have no problem with companies selling software with new licensing, and I have no problem with cloud services. I simply feel that open is better. Open gives not only paying customers but the entire community a better and more viable product. It’s ok if you disagree with that. It is ok if you as a company, consumer, or other entity choose proprietary software or some other model; I won’t hate on you. What gets me angry, though, is when people wrap themselves in the veil of openness, ride the coattails of open software, pretend to be something they are not, and fool customers into getting locked in – companies that use open source as a gateway drug to get your applications completely dependent on paying them forever, or face massive costs migrating away.

Let Me Give You an “Outside of Tech” Example

My father-in-law worked on the line for GM for over 30 years. One of the things he really enjoyed when he was not working was golfing. He traveled the state he lived in and tried all kinds of courses, but there was one in particular he really liked. It was a resort in northern Michigan that had four golf courses with plans to add a couple more. He decided to plan his retirement by buying a lot on one of the soon-to-be-built golf courses after painstaking research. He was going to build his dream house and spend his retirement golfing. He was so thrilled about the plans he would drive me up and force me to trudge into the woods to show me his future spot.

A few years later, as he was nearing retirement, the resort announced that they would no longer be building the extra courses and would not invest in the infrastructure to build the homes/community where he had invested. My father-in-law, who had invested so much in this, was left with worthless land in the middle of nowhere. Sure, he could retire somewhere else, but he had invested here, and now there is a terrible cost to try that again.

It’s similar to the current trend in open source.  You adopt open source because you have options, you can get things started easily, and if you love the product you can expand.  You do your research, you test, you weigh your options, finally, you commit.  You invest in building out the infrastructure using the open products, knowing you have options. Open source allows you to be agile, to contribute, to fix, and enhance.  Then, one day, the company says, “Sorry, we are now only sort of open and you have to play by the new rules.”

The Impact of Financing

So what the hell is happening? A few things actually.  The first is that open source companies scored some big exits and valuations over the last 15 years. Everyone loves a winner. The rush to emulate the success of those early companies and models was viewed by many entrepreneurs and investors as the new digital gold rush. Let’s look at this from an investor’s point of view for a second:

  • Free labor from the community = higher profit margins
  • Free advertising from the community = higher profit margins
  • Grassroots adoption and product evolution = better products and higher profit margins
  • Community-led support = better profit margins
  • People adopt, then they need help or features … so we can sell them a better more stable version (open core) = better profit margins

So investors poured in. Many open source startups were started not because the founders believed in open source, but because it was a quick and very popular way to get funding. They could potentially gain a huge audience and quickly monetize that audience. Don’t take my word for it; the CEO of MongoDB Inc. admits it:

“[W]e didn’t open source it to get help from the community, to make the product better. We open sourced as a freemium strategy; to drive adoption.” – MongoDB CEO Dev Ittycheria

If you look at their publicly available Wall Street analyst briefings, they talk about their strategy around “Net Expansion,” and their model is predicated on getting 20-25% expansion from their existing customer base. Unless customers spend more, the model breaks. The stock price continues to rise and the valuation of the company continues to grow despite the company not being profitable. In the stock market today, growth – delivered by getting more users, beating earnings, squeezing your customer base, and locking you in – is rewarded with bigger paydays. Again, let’s pull a quote, this time from a Wall Street analyst:

In Billy Duberstein’s article, If You Invested $1000 in MongoDB’s IPO, This Is How Much Money You’d Have Now, he wrote:

 “…the database is a particularly attractive product, even by enterprise-software standards, because it stores and organizes all or part of a large corporation’s most important data. If a company wanted to switch vendors, it would have to lift all of that data from the old database and insert it into a new one. That’s not only a huge pain; it’s also terribly risky, should any data get lost. Therefore, most companies tend to stick with their database vendor over time, even if that vendor raises prices. That’s how Oracle became such a tech powerhouse throughout the 1990s.”

Essentially, database companies are a good investment because, as well as storing your most important data, once your data is captured it’s painful and risky for users to switch. While this is great for investors, it is not always good news for enterprise customers or any consumers.

Making the Right Decisions for Your Customers

This brings in the debate around retention, expansion, and stickiness. Retention in a pure open source business is difficult. I won’t lie about it.  You have to continually up your game in support, services, features, and tooling.  It is hard!  Really hard!  Because in a pure open-source model, the user can always choose to self-support.  So your offering has to be so compelling that companies want to pay for it (or they view it as “insurance”).

This has led to a discussion that occurs at every open source company in the world: how do we generate “stickiness”? Stickiness is the term given to the efforts around keeping paying customers as paying customers. Here’s an example: MySQL AB had a problem with the “stickiness” of MySQL Enterprise, so they developed the MySQL Enterprise Monitor. People who used this tool loved it, so they kept paying for MySQL Enterprise. This approach is the right one, as it helps customers that need functionality or services without penalizing the community.

Open core is another form of “Stickiness” – If you want to use these features you have to pay. While stickiness is not always bad, if you have such an awesome product and service offering that people want to pay you, then great. However, oftentimes this stickiness is basically vendor lock-in. It’s a fine line between creating compelling and awesome features and locking your customers in, but that line is there.

Ask yourself: if you wanted to do this yourself without paying the licensing fees, how hard would it be? If you would have to rearchitect or redesign large portions of your application, or reinstall everything, you are locked in – in other words, the company is doing something unique that the community cannot replicate. If it would merely take some additional time and resources, it is a value-added service that makes you more efficient. Using the MySQL Enterprise Monitor example: you can monitor MySQL without this specific tool, it was just a value add that made it easier and more efficient. It created stickiness for the product, but without lock-in.

Companies start to be focused on shareholder value and increasing the stock price at all costs.  The mantra I have heard from CEOs is, “I work for the shareholders, I do what’s best for them.”  In many areas, this conflicts with open source communities and culture.

When I work on an open-source project I view it like I am part of the project or part of the team, even if I am not paid.  I am contributing because I love the product, I love the community, and I want to make a difference. As a user of open source, my motivations tend to be a bit different.  I want software that is evolving, that is tested by millions around the world. I want the option and freedom to deploy anywhere at any time. I want incremental improvement, and I only want to pay if I need to.

This is where the disconnect between shareholders and the community and users comes in. If you’re looking at the priority list, shareholder value in many of these companies will come at the expense of the needs of the community or the users. Peter Zaitsev founded Percona in 2006 because MySQL AB started that shift to shareholder-first, and it deeply upset him.

When Circumstances Change

MongoDB started as a shareholder value first company, focused on the “freemium” model.  I think Elastic falls into a second category of companies.  What are some of the characteristics of this second category?

These companies start off as open. They do build the company on free and open principles, and they put users and community as a higher priority. But as they take funding or go public, the shareholders’ demands for revenue, profit, and increased expansion get louder and louder. These companies start bringing in industry veterans from big companies that “know how software companies should be built.” Oftentimes these executives have little or no background in open source software development or community development.

Executive pedigree matters a lot in valuation and Wall Street acceptance, so coming from a multi-billion dollar company makes a huge difference.  Every hire a company makes changes culture. These companies end up not only having external shareholder pressure but also pressure from the executive management team to focus on increasing profits and revenue.

Keep in mind that this shareholder pressure is nothing new. This is how “Big” business tends to work.  In the 1970s and 1980s, American car companies sacrificed quality and safety for better margins. In the 90s and 00s, airlines reduced seat size, eliminated “amenities” like free baggage and magazines, and added charges for things that used to be included.  Sacrificing the benefits and features of the users or customers to make more money is nothing new.  The pressure to get more growth or revenue is always there. In fact today we can often see companies “Meet Earning Expectations”  or in some cases beat expectations – and still lose value.

As I outlined above, we are now in an environment where the expectation is to beat revenue estimates, continue to show massive growth, and focus on exceeding the expectations of your shareholders. Outside risk factors that could cause volatility or risk slowing growth need to be addressed. For many database companies, the cloud is both a massive risk as well as a massive opportunity to deliver shareholder value.  This is especially true for companies that are not profitable. Wall Street is banking on eventual profitability just like eventual consistency.  If you look at the margins and spending from both MongoDB and Elastic, they are losing money.  They are spending on sales and marketing to capture a larger market share which they hope will lead to expansion dollars and then profitability.

Why Is This Market Special?

Database vendors have been calling out cloud vendors for a couple of years. The big complaint has been that cloud vendors take advantage of open source licensing by offering “Database as a Service” using their software without paying anything back. The open nature of some licensing puts cloud providers completely within their rights to do this. At the same time, however, cloud providers offer other open source software (like Linux) through their services without the same level of blowback.

In the case of MongoDB and Elastic (and others I am sure), it is not a simple matter of database providers withering and dying due to the stranglehold of the cloud.  In fact, they probably make more money from the cloud being available than they would without it.  More than half the open source databases we support here at Percona are in the cloud. Coupled with the fact that both MongoDB and Elastic are growing their quarterly and yearly revenue, it hardly seems like they are on the ropes.  So why the angst?

A couple of theories: first, the loss of “potential revenue” – seeing the money the other guys are making and not getting any of it. Second, back to growth: increasing shareholder value and getting to the magic break-even number is job #1. If they can win or claim the business other vendors have, they can accelerate that growth. As mentioned, both companies are overspending for growth now, so the faster they can reach profitability, the better.

In my opinion, the cloud is the convenient excuse here for trying to accelerate growth and increase shareholder value. “If the cloud played fair, our growth would be 40% instead of 30%! Think of how much more our stock would be worth!”

Let’s not lie about it. Companies are switching licensing not for the good of the community or their users but for the good of their bottom line, in an effort to appease their true overlords: shareholders and investors. There is nothing wrong with a for-profit company focused on shareholders. Just don’t lie about it and pretend to be something you are not. In my opinion, open source driven by an active community of contributors, users, and inventors leads to better quality, faster innovation, and more freedom.

The user base, the contributors, and the “community” as a whole got the most successful open source companies where they are today. Communities are built on trust and goodwill.  Open Source is about innovation, freedom, and collaboration.  These recent changes hurt all three.  They turn what were iconic open source companies into just another company focused on the shareholder over the users, customers, and community that got them to where they are.

I will leave you with this.  As a user or a paying customer of any product, but certainly, as an active member of a community, my position is built on trust.  I may not agree with all changes a community, project, or company makes but I want them to be open, honest, and transparent.  I also want companies to show me with consistent actions, and not tell me in empty words, how I will be impacted.  It is very hard to trust and build respect when you say one thing and do another.

Using Elastic as an example here – and thanks to Corey Quinn on his Twitter feed for pointing this out –  over a year ago Elastic changed their licensing to a more open core model.  At the time they told us this via Github release notes:

(Screenshot: Elastic GitHub release notes)

The last time they made changes that upset the community, they promised they would leave these products open. A year later, the license changed. This leaves users and community members of the product wondering: what will change next? Can we trust that this is it? Can we trust them when they say SSPL is “just as good”? And that they won’t move the line of what is acceptable in the future?

I am going to stand up (as many of you are) for more innovation, more freedom of choice, and certainly more truly open software!

Jan
14
2021
--

Thinking About Deploying MongoDB? Read This First.

Are you thinking about deploying MongoDB? Is it the right choice for you?

Choosing a database is an important step when designing an application. A wrong choice can have a negative impact on your organization in terms of development and maintenance. Also, the wrong choice can lead to poor performance.

Generally speaking, any kind of database can manage any kind of workload, but every database has specific workloads that fit it better than others.

You shouldn’t consider MongoDB just because it’s cool and there are already a lot of companies using it. You need to understand if it fits properly with your workload and expectations. So, choose the right tool for the job.

In this article, we are going to discuss a few things you need to know before choosing and deploying MongoDB.

MongoDB Manages JSON-style Documents and Developers Appreciate That

The basic component of a MongoDB database is a JSON-style document. Technically it is BSON, which adds some extra datatypes (e.g., datetime) that aren’t part of standard JSON.

We can consider the document the same as a record for a relational database. The documents are put into a collection, the same concept as a relational table.

JSON-style documents are widely used by a lot of programmers worldwide to implement web services, applications, and exchange data. Having a database that is able to manage that data natively is really effective.

MongoDB is often appreciated by developers because they can start using it without having specific knowledge about database administration and design and without studying a complex query language. Indeed, the MongoDB query language is also represented by JSON documents.

Developers can create, save, retrieve, and update their JSON-style documents with ease. Great! This usually leads to a significant reduction in development time.
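
As a small, hypothetical illustration (the "shop" database and "orders" collection are made-up names), inserting and reading back a document from the mongosh command line looks like this:

    # Insert a JSON-style document and read it back; no schema needs to exist beforehand
    mongosh "mongodb://localhost:27017/shop" --eval '
      db.orders.insertOne({ customer: "ACME", total: 99.90, items: ["bolt", "nut"] });
      printjson(db.orders.findOne({ customer: "ACME" }));
    '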

MongoDB is Schemaless

Are you familiar with relational databases? For sure you are, as relational databases have been used and studied for a long time at school and at university. Relational databases are still the most widely used in the market nowadays.

You know that a relational schema needs a predefined and fixed structure for the tables. Any time you add or change a column, you need to run a DDL query, and additional time is necessary to also change your application code to manage the new structure. In the case of a massive change that requires multiple column changes and/or the creation of new tables, the application changes could be substantial. MongoDB’s lack of schema enforcement means none of that is required.

You just insert a document into a collection and that’s all. Let’s suppose you have a collection with user data. If at some point you need to add, for example, a new “date_of_birth” field, you simply start to insert the new JSON documents with the additional field. That’s all. No need to change anything in the schema.
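
Continuing the “date_of_birth” example as a hypothetical sketch (the "users" collection and connection string are illustrative): newer documents simply carry the extra field, older ones lack it, and no DDL is involved:

    mongosh "mongodb://localhost:27017/app" --eval '
      // an "old" document without the new field
      db.users.insertOne({ name: "Alice", email: "alice@example.com" });
      // a "new" document with the additional date_of_birth field
      db.users.insertOne({ name: "Bob", email: "bob@example.com", date_of_birth: ISODate("1990-04-01") });
      // count how many documents already have the new field
      print(db.users.countDocuments({ date_of_birth: { $exists: true } }));
    '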

You can insert into the same collection even completely different JSON documents, representing different entities. Well, this is technically feasible, but not recommended, anyway.

MongoDB greatly shortens the cycle of application development for a non-technology reason – it removes the need to coordinate a schema change migration project with the DBA team. There is no need to wait until the DBA team does a QA dress-rehearsal and then the production release (with rollback plans) that, often as not, requires some production downtime.

MongoDB Has No Foreign Keys, Stored Procedures, or Triggers. Joins Are Supported, but Untypical.

Relational database design relies on SQL queries being able to join multiple tables on specific fields. It may also require foreign keys to assure the consistency of the data and to run automatic changes on semantically connected fields.

What about stored procedures? They can be useful to embed into the database some application logic to simplify some tasks or to improve the security.

And what about triggers? They are useful to automatically “trigger” changes on the data based on specific events, like adding/changing/deleting a row. They help to manage the consistency of the data and, in some cases, to simplify the application code.

Well, none of them is available on MongoDB. So, be aware of that.

Note: to be honest, there’s an aggregation stage ($lookup) that can implement the equivalent of a LEFT JOIN, but that is the only case.

How to survive without JOIN?

Managing JOINs must be done in your application code. If you need to join two collections, you read the first one, select the join field, and use it to query the second collection, and so on. This seems to be expensive in terms of application development, and it could also lead to more queries being executed. Indeed it is, but the good news is that in many cases you don’t have to manage joins at all.

Remember that MongoDB is a schemaless database; it doesn’t require normalization. If you are able to properly design your collections, you can embed and duplicate data in a single collection without the need to create an additional collection. This way you won’t need to run any join, because all the data you need is already in one collection.
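
A hypothetical sketch of this embedding approach (collection and field names are made up): the order document carries its customer data and line items, so a single query returns everything that a relational design would need several joins for:

    mongosh "mongodb://localhost:27017/shop" --eval '
      db.orders.insertOne({
        _id: 1001,
        customer: { name: "ACME", country: "US" },           // embedded instead of referenced
        items: [ { sku: "bolt", qty: 10 }, { sku: "nut", qty: 10 } ]
      });
      printjson(db.orders.findOne({ _id: 1001 }));           // one query, no join
    '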

Foreign keys are not available, but as long as you can embed multiple documents into the same collection, you don’t really need them.

Stored procedures can be implemented easily as external scripts you can write in your preferred language. Triggers can be implemented externally the same way, but with the help of the Change Stream API feature connected to a collection.

If you have a lot of collections with referenced fields, you have to implement in your code a lot of joins or you have to do a lot of checks to assure consistency. This is possible but at a higher cost in terms of development. MongoDB could be the wrong choice in such a case.

MongoDB Replication and Sharding Are Easy to Deploy

MongoDB was natively designed not as a standalone application. It was designed instead to be a piece of a larger puzzle. A mongod server is able to work together with other mongod instances in order to implement replication and sharding efficiently and without the need for any additional third-party tool.

A Replica Set is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability by design. With caveats regarding potentially stale data, you also get read scalability for free. It should be the basis for all production deployments.

The Sharding Cluster is deployed as a group of several Replica Sets with the capability to split and distribute the data evenly on them. The Sharding Cluster provides write scalability in addition to redundancy, high availability, and read scalability. The sharding topology is suitable for the deployment of very large data sets. The number of shards you can add is, in theory, unlimited.

Both topologies can be expanded at any time by adding more servers and shards. More importantly, no changes are required in the application, since each topology is completely transparent from the application’s perspective.

Finally, the deployment of such topologies is straightforward. You need to spend some time at the beginning to understand a few basic concepts, but then, in a matter of a few hours, you can deploy even a very large sharded cluster. In the case of several servers, instead of doing everything manually, you can automate a lot of things using Ansible playbooks or other similar tools.

Further readings:

Deploy a MongoDB Replica Set with Transport Encryption (Part 1)

MongoDB Sharding 101 Webinar

MongoDB Has Indexes and They Are Really Important

MongoDB allows you to create indexes on the JSON documents’ fields. Indexes are used the same way as in a relational database. They are useful to resolve queries faster and to decrease the usage of machine resources: memory, CPU time, and disk IOPS.

You should create all the indexes that will help any of the regularly executed queries, updates, or deletes from your application.

MongoDB has really advanced indexing capabilities. It provides TTL indexes, geospatial indexes, indexes on array elements, and partial and sparse indexes. If you need more details about the available index types, you can take a look at the following articles:

MongoDB Index Types and MongoDB explain() (part 1)

Using Partial and Sparse Indexes in MongoDB

Create all the indexes you need for your collections. They will help you a lot to improve the overall performance of your database.
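
As a hypothetical sketch (names are illustrative), creating a compound index and a TTL index, and then listing them, looks like this from mongosh:

    mongosh "mongodb://localhost:27017/shop" --eval '
      // compound index matching a frequent query pattern
      db.orders.createIndex({ customer: 1, created_at: -1 });
      // TTL index: documents expire when the time in expire_at is reached
      db.orders.createIndex({ expire_at: 1 }, { expireAfterSeconds: 0 });
      printjson(db.orders.getIndexes());
    '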

MongoDB is Memory Intensive

MongoDB is memory intensive; it needs a lot. This is the same for many other databases. Memory is the most important resource, most of the time.

MongoDB uses the RAM for caching the most frequently and recently accessed data and indexes. The larger this cache, the better the overall performance will be, because MongoDB will be able to retrieve a lot of data faster. Also, MongoDB writes are only committed to memory before client confirmation is returned, at least by default. Writes to disk are done asynchronously – first to the journal file (typically within 50ms), and later into the normal data files (once per min).

The storage engine used by MongoDB today is WiredTiger. In the past there was also MMAPv1, but it is no longer available in more recent versions. The WiredTiger storage engine uses a large memory cache (the WiredTiger Cache) for caching data and indexes.

Besides the WiredTiger Cache, MongoDB relies on the OS file system cache for accessing disk pages. This is another important optimization, and significant memory may be required for it as well.

In addition, MongoDB needs memory for managing other things like client connections, in-memory sorts, temporary data saved when executing aggregation pipelines, and other minor items.

In the end, be prepared to provide enough memory to MongoDB.

But how much memory do I need? The rule of thumb is to evaluate the “working set” size.

The “working set” is the amount of data that is most frequently requested by your application. Usually, an application needs a limited amount of data; it doesn’t need to read the entire data set during normal operations. For example, in the case of time-series data, most probably you need to read only the last few hours’ or last few days’ entries. Only on a few occasions will you need to read legacy data. In such a case, your working set is just those few days of data.

Let’s suppose your data set is 100GB and you evaluated your working set is around 20%, then you need to provide at least 20GB for the WTCache.

Since MongoDB uses by default 50% of the RAM for the WTCache (we usually suggest not to increase it significantly), you should provide around 40GB of memory in total for your server.
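
If you prefer to size the cache explicitly rather than rely on the default, a minimal sketch for the 100GB / 20% example above would be (the data path is a placeholder):

    # Start mongod with a 20GB WiredTiger cache to match the estimated working set
    mongod --dbpath /var/lib/mongo --wiredTigerCacheSizeGB 20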

Every case is different, and sometimes it can be difficult to evaluate the working set size correctly. Anyway, the main recommendation is to spend a significant part of your budget on providing as much memory as you can. This will surely be beneficial for MongoDB.

What Are the Suitable Use Cases for MongoDB?

Actually, a lot. I have seen MongoDB deployed in a wide variety of environments.

For example, MongoDB is suitable for:

  • event logging
  • content management
  • gaming applications
  • payment applications
  • real-time analytics
  • Internet of Things applications
  • content caching
  • time-series data applications

And many others.

We can say that you can use MongoDB for basically everything; it is a general-purpose database. The key point is instead the way you use it.

For example, if you plan to use MongoDB the same way as a relational database, with data normalized, a lot of collections around, and a myriad of joins to be managed by the application, then MongoDB is not the right choice for sure. Use a relational database.

The best way to use MongoDB is to adhere to a few best practices and to model the collections keeping in mind some basic rules, like embedding documents instead of creating multiple referenced collections.

Percona Server for MongoDB: The Enterprise-Class Open Source Alternative

Percona develops and deploys its own open source version of MongoDB: the Percona Server for MongoDB (PSMDB).

PSMDB is a drop-in replacement for MongoDB Community and it is 100% compatible. The great advantage provided by PSMDB is that you can get enterprise-class features for free, like:

  • Encryption at rest
  • Audit logging
  • LDAP authentication
  • LDAP authorization
  • Log redaction
  • Kerberos authentication
  • Hot backup
  • In-memory storage engine

Without PSMDB all these advanced features are available only in the MongoDB Enterprise subscription.

Please take a look at the following links for more details about PSMDB:

Percona Server for MongoDB Feature Comparison

Percona Server for MongoDB

Remember you can get in touch with Percona at any time for any details or for getting help.

Conclusion

Let’s have a look at the following list of the most important things you need to check before choosing MongoDB as the backend database for your applications. For each item, consider whether MongoDB is a good choice: a poor fit, a possible fit but with some limitations or potential bottlenecks, or a very good fit.

  • Your applications primarily deal with JSON documents
  • Your data has unpredictable and frequent schema changes over time
  • You have several collections with a lot of external references for assuring consistency, and the majority of the queries need joins
  • You need to replicate stored procedures and triggers you have in your relational database
  • You need HA and read scalability
  • You need to scale your data to a very large size
  • You need to scale because of a huge amount of writes

And finally, remember the following:

Take a look at Percona Server for MongoDB 

Jan
13
2021
--

Running Kubernetes on the Edge

What is Edge?

Edge is a buzzword that, behind the curtain, means moving private or public clouds closer to the end devices. End devices, such as Internet of Things devices (from a doorbell to a VoIP station), are becoming more complex and require more computational power. There is constant growth in connected devices: by the end of 2025, there will be 41.6 billion of them, generating 69.4 zettabytes of data.

Latency, the speed of data processing, or security concerns may not allow computation to happen in the cloud. Businesses rely on edge computing or micro clouds, which can run closer to the end devices. All of this constitutes the Edge.

How Kubernetes Helps Here

Containers are portable and quickly becoming a de facto standard to ship software. Kubernetes is a container orchestrator with robust built-in scaling capabilities. This gives the perfect toolset for businesses to shape their Edge computing with ease and without changing existing processes.

The cloud-native landscape has various small Kubernetes distributions that were designed and built for the Edge: k3s, microk8s, minikube, k0s, and the newly released EKS Distro. They are lightweight, can be deployed with a few commands, and are fully conformant. Projects like KubeEdge bring even more simplicity and standardization into the Kubernetes ecosystem on the Edge.
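
As an illustration of how lightweight these distributions are, the sketch below installs a single-node k3s cluster using the project’s install script (run on the edge device, root privileges assumed):

    # Install k3s (server + agent on one machine) and check that the node is Ready
    curl -sfL https://get.k3s.io | sh -
    sudo k3s kubectl get nodes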

Running Kubernetes on the Edge also poses the challenge of managing hundreds or thousands of clusters. Google Anthos, Azure Arc, and VMware Tanzu allow you to run your clusters anywhere and manage them through a single interface with ease.

Topologies

We are going to review various topologies that Kubernetes provides for the Edge to bring computation and software closer to the end devices.

The end device is a Kubernetes cluster


Some devices run complex software and require multiple components to operate – web servers, databases, built-in data-processing, etc. Using packages is an option, but compared to containers and automated orchestration, it is slow and sometimes turns the upgrade process into a nightmare. In such cases, it is possible to run a Kubernetes cluster on each end device and manage software and infrastructure components using well-known primitives.

The drawback of this solution is the overhead that comes from running etcd and the control plane components on every device.

The end device is a node


In this case, you can manage each end device through a single Kubernetes control plane. Deploying software to support and run your phones, printers or any other devices can be done through standard Kubernetes primitives.

Micro-clouds


This topology is all about moving computational power closer to the end devices by creating micro-clouds on the Edge. A micro-cloud is formed by Kubernetes nodes on a server farm on the customer’s premises. Running your AI/ML (like Kubeflow) or any other resource-heavy application in your own micro-cloud is done with Kubernetes and its primitives.

How Percona Addresses Edge Challenges

We at Percona continue to invest in the Kubernetes ecosystem and expand our partnership with the community. Our Kubernetes Operators for Percona XtraDB Cluster and MongoDB are open source and enable anyone to run production-ready MySQL and MongoDB databases on the Edge.

Check out how easy it is to deploy our operators on Minikube or EKS Distro (which is similar to microk8s). We are working on simplifying Day 2 operations further, and in future blog posts you will see how to deploy and manage databases on multiple Kubernetes clusters with KubeApps.

Jan
13
2021
--

Percona 2020 Recap: Great Content and Software Releases

The Percona team provided the community with some excellent content and several new releases in 2020. I wanted to highlight some of your favorites (based on popularity) if you missed them.

First up is our most-read blog from last year, which ironically was published before 2020. Ananias Tsalouchidis’s blog on when you should use Aurora and when you should use RDS MySQL continued to attract readers all year long. People don’t always understand the key differences between the two, so having a guide is great and timely for many.

What about the most read blogs or watched videos published in 2020?

PostgreSQL Takes Our Most-Read Spot of 2020

The Percona blog is known for its great in-depth MySQL coverage, but experts in the MongoDB and PostgreSQL space have also written some quality content over the last few years. It is exciting to see that the most popular blog published last year was outside of MySQL: Ibrar Ahmed’s deep dive into handling null values in PostgreSQL.

Interested in the top six PostgreSQL reads from 2020? Here they are:

We also had some fantastic conference talks this year you may want to check out. Here are the most-watched PostgreSQL videos of 2020:

Awesome PostgreSQL talks and blogs from the community:

Our Percona University Online posted its first PostgreSQL training last year; if you are looking for a deeper understanding of indexes (and who isn’t), check out our training, Deep Dive Into PostgreSQL Indexes.

MySQL is Still as Popular as Ever

Even though PostgreSQL took this year’s top spot, not too far behind was a great blog series by our CEO Peter Zaitsev on solving MySQL bottlenecks. His three-part series, 18 things you can do to remove MySQL bottlenecks caused by high traffic, was not only highly read, it also spawned one of the most-watched webinars of the year. Scalability and performance are critical and can mean life or death for any application. A vital read and a great link to bookmark for when you have one of those odd performance issues you cannot seem to find!

Interested in the top five MySQL reads from 2020? Here they are:

Interested in watching some outstanding MySQL sessions? Check out some of the most-watched MySQL sessions of 2020:

Awesome MySQL talks and blogs from the community:

Our Percona University Online posted its first MySQL training; if you are looking at how to upgrade to MySQL 8, it is worth watching. Check out the training, How to Upgrade to MySQL 8.0.

The Staying Power of MongoDB is Undeniable

MongoDB’s growth in 2020 was undeniable, which is why it’s no surprise that another one of our top blogs was on MongoDB. Percona’s most-read tech blog on MongoDB published in 2020 was Vinicius Grippa’s must-read work outlining the best practices for running MongoDB. Whether you are new to MongoDB or a longtime user, it is worth reading and double-checking to ensure you have MongoDB optimized.

Interested in the top five MongoDB reads from 2020? Here they are:

Interested in watching some MongoDB sessions? Check out some of the most-watched MongoDB sessions of 2020:

Awesome MongoDB talks and blogs from the community:

More Popular Blogs and Discussions

Sometimes topics cross databases and delve into general advice. Let’s look at some of the more popular talks and blogs that are not tied to a specific database.

If you like videos, you may want to check out these great Percona Live Sessions from last year:

Other Popular Blogs:

Finally, Some Great Percona Software Released This Year

Here is the list of interesting software changes and news on Percona software in 2020:

Percona Distributions for MongoDB and MySQL:

  • What are Percona distributions? We take the best components from the community and ensure they work together. This way, you know your backup, HA, monitoring, etc., will all work together seamlessly.

Percona XtraDB Cluster 8.0 (PXC) was released, with improved performance, scalability, and security. Long sought after features include:

  • Streaming replication to support larger transactions
  • More granular and improved logging and troubleshooting options
  • Multiple system tables that help you find out more about the state of the cluster
  • Percona XtraDB Cluster 8.0 now automatically launches the upgrade as needed (even for minor releases), avoiding manual intervention and simplifying operation in the cloud.

Percona Distribution for PostgreSQL 13. Version 13 of PostgreSQL was a leap forward, and our distribution was updated to support all the additional functionality. Better indexing, better performance, and better security! Sign me up!

Percona Monitoring And Management (PMM) jumped forward from 2.2 to 2.13 adding some very cool features like:

  • Alert manager integration and integrated alerting
  • A brand new Query Analyzer with awesome features to allow you to find problem queries quicker and more efficiently
  • Enhanced metrics for AWS RDS monitoring
  • Added support for External Exporters so you can monitor 3rd party and custom services through the installed PMM-agent
  • New security threat tool allows for alerts and visibility into the most common security issues
  • Support for group replication
  • Better MongoDB and PostgreSQL monitoring
  • Better support for larger environments (Monitor More Stuff Faster)
  • Plus a ton of misc small enhancements!

Percona Kubernetes Operator for Percona XtraDB Cluster continued to evolve with several new features helping users build their own DIY DBaaS:

  • Auto-Tuning MySQL Parameters
  • Integration with Percona Monitoring and Management
  • Full data encryption at rest
  • Support for Percona XtraDB Cluster 8.0
  • Support for the latest version of OpenShift and Amazon’s Elastic Container Service
  • Dual support for ProxySQL and HA Proxy
  • Automated minor upgrades
  • Clone backups to set up a new PXC cluster on a different Kubernetes cluster

Percona Kubernetes Operator for Percona Server for MongoDB added several features, including:

  • Support for Percona Server for MongoDB 4.4
  • Automated management of system users
  • Support for the latest version of OpenShift and Amazon’s Elastic Container Service
  • Automated minor upgrades

While 2020 was far from the best year for many of us and we are glad it is behind us, it did generate some good content that we can use in 2021 and going forward to help us better manage and run our databases. Thanks for reading and happy database tuning!

Jan
12
2021
--

MySQL Backup and Recovery Best Practices

In this blog, we will review backup and restore strategies for MySQL, a cornerstone of operating any application. There are a few options, depending on your topology, MySQL versions, etc. And based on that, there are some questions we need to ask ourselves to make sure we make the right choices.

How many backups do we need to keep, and what’s the best retention policy for us?

This means the number of backups to safeguard, whether local or remote (external fileserver, cloud). The retention policy can be daily, weekly, or monthly, depending on the free space available.

What is the Recovery Time Objective?

The Recovery Time Objective (RTO) refers to the amount of time that may pass during a disruption before it exceeds the maximum allowable threshold specified in the Business Continuity Plan.

The key question related to RTO is, “How quickly must the data on this system be restored?”

What is the Recovery Point Objective?

The Recovery Point Objective (RPO) is the maximum amount of data – measured as a period of time – that a business process can afford to lose after a disaster before there are unacceptable consequences associated with a break in continuity.

The key question related to RPO is, “How much data can we lose?”

Different Types of Backups

There are two backup types: physical and logical.

  • Physical (Percona XtraBackup, RDS/LVM snapshots, MySQL Enterprise Backup); you can also use cp or rsync to copy the datadir, as long as MySQL is down/stopped.
  • Logical (mysqldump, mydumper, mysqlpump, MySQL Shell – the latter for MySQL 8.0 only)

It is also recommended to keep a copy of the binlog files. Why? Because they allow us to recover data up to the last transaction.

Why are backups needed?

Backups are needed in case of multiple problems:

  • Host failure: we can run into problems from stalled or broken disks; also, with cloud services, our DB instance can break and become inaccessible.
  • Corrupted data: this can happen during a power outage when MySQL isn’t able to write correctly and close its files; sometimes MySQL cannot start again because of the corrupted data and the crash recovery process cannot fix it.
  • Inconsistent data: a human mistake, such as deleting or updating the wrong data on the primary or a replica node.
  • Data center failure: a power outage or internet provider issues.
  • Legislation/regulation: to provide consistent business value and customer satisfaction.

Now let me explain the different types of backups mentioned above. But before I continue, it’s important to note that you should configure a new, dedicated replica node for backup purposes (AKA a backup server), because of the high CPU load backups generate, to avoid any issue on the other replica nodes.

Logical Backup

This is a dump of the logical database structure (CREATE DATABASE, CREATE TABLE statements) and content (INSERT statements). It is recommended for smaller amounts of data. The disadvantage of this method is that it is slower (for both backup and restore) compared with physical backups. Using mydumper you can back up and restore a single database or a single table if needed, which is useful for copying some data to a different environment to run tests. Mydumper can also take a consistent backup (as long as all the tables use the InnoDB engine) and provides accurate master and slave log positions.

The output is larger than for a physical backup, particularly when saved in text format, but it can be compressed on the fly depending on the software you are using: mydumper can compress natively, while mysqldump’s output needs to be piped to gzip, for example.

Logical backups are used to address data corruption or the need to restore a subset of tables.
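
A minimal sketch of both logical backup tools mentioned above; the user, password, database name, and paths are placeholders you would adapt to your environment:

    # mydumper: parallel, compressed, consistent logical backup of one database
    mydumper --user=backup --password=secret --database=app \
             --threads=4 --compress --outputdir=/backups/logical/$(date +%F)

    # mysqldump equivalent for a single database, piped through gzip
    mysqldump --user=backup --password=secret --single-transaction \
              --routines --triggers app | gzip > /backups/app-$(date +%F).sql.gz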

Physical (Raw) Backup

In short, this consists of exact copies of the database directories and files. It can be a copy of all or part of the MySQL datadir. This kind of backup is most often used to restore or create a new replica node easily and quickly, and it addresses host failure. It’s recommended to restore using the same MySQL version. I recommend using Percona XtraBackup because it can also include related files such as configuration (cnf) files.
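
A minimal sketch with Percona XtraBackup (credentials and target directory are placeholders); the second step prepares the backup so it is consistent and ready to restore:

    # Take a full physical backup of the running server
    xtrabackup --backup --user=backup --password=secret \
               --target-dir=/backups/full/$(date +%F)

    # Apply the redo log so the backup is consistent and restorable
    xtrabackup --prepare --target-dir=/backups/full/$(date +%F)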

Snapshot Backups

Some file system implementations enable “snapshots” to be taken. These provide logical copies of the file system at a given point in time, without requiring a physical copy of the entire file system. MySQL itself does not provide the capability for taking file system snapshots but it is available using third-party solutions such as LVM or ZFS.

The disadvantage is that sometimes physical backups do not compress much, because data is usually in a binary format and sometimes the table is already compressed.

Binary Log Backups

Binlog backups specifically address RPO. Binary log files contain records of each SQL statement that made changes to the data.

From MySQL 5.6 on, you can use mysqlbinlog to stream binary logs from a remote server. You can combine binlog backups with Percona XtraBackup or mydumper backup to allow restoration up to the end of the most-recently-backed-up binary log.
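
A minimal sketch of streaming binlogs from a remote server with mysqlbinlog; the host, credentials, and the name of the first binary log are placeholders:

    # Continuously stream binary logs from the source server into the current directory
    mysqlbinlog --read-from-remote-server --host=db1.example.com \
                --user=backup --password=secret \
                --raw --stop-never mysql-bin.000001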

Incremental / Differential Backups

An incremental backup is a backup of everything that has changed since the last backup (a binary log backup is a special case of an incremental backup). This is a very good option if the dataset size is huge, as you can take a full backup at the beginning of the week and run incremental backups per day. Also, the backup size is smaller than the full backup.

The main risks associated with incremental backups are:

– A single corrupt incremental backup may invalidate all the others

– Incremental backups typically negatively affect the RTO

A differential backup copies the differences since your last full backup. The advantage is that a lot of data does not change from one backup to the next, so the result can be significantly smaller backups. This saves disk space.

Percona XtraBackup supports both incremental and differential backups.
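
A minimal sketch of a weekly full backup plus daily incrementals with Percona XtraBackup (paths and credentials are placeholders):

    # Sunday: full backup (the base for the chain)
    xtrabackup --backup --user=backup --password=secret --target-dir=/backups/full

    # Monday: incremental backup containing changes since the full backup
    xtrabackup --backup --user=backup --password=secret \
               --target-dir=/backups/inc1 --incremental-basedir=/backups/full

    # Tuesday: incremental backup containing changes since Monday's incremental
    xtrabackup --backup --user=backup --password=secret \
               --target-dir=/backups/inc2 --incremental-basedir=/backups/inc1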

Offsite Storage

It’s highly recommended to copy the backups (whatever method you use) to another place, like the cloud or an external file server, so that in case of host failure or data center failure you have another copy.

Not all the backup files need to be uploaded to the cloud; sometimes the time you would spend downloading them is longer than the time you can afford for the recovery process.

A good approach is to keep 1-7 days locally on the backup server in case a fast recovery is needed, and this depends on your business regulations.

Encryption

Backups contain sensitive data, so it’s highly recommended to encrypt them, especially for offsite storage. This adds more time when you need to restore a backup, but it keeps your data safe.

GPG is a good option to encrypt backups, and if you use this option or some other alternative, don’t forget to get a copy of the keys/passphrase. If you lose it, your backups will be useless.
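
A minimal sketch using symmetric GPG encryption (file names are placeholders; store the passphrase safely and separately from the backups):

    # Encrypt a backup file; gpg will prompt for a passphrase
    gpg --symmetric --cipher-algo AES256 app-2021-01-12.sql.gz
    # -> produces app-2021-01-12.sql.gz.gpg

    # Decrypt it again before restoring
    gpg --decrypt app-2021-01-12.sql.gz.gpg > app-2021-01-12.sql.gz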

Restore Testing

Depending on your business, it’s highly recommended to test your backups at least once per month. This validates that your backups are not corrupted and provides critical metrics on recovery time. The process should be automated: take the full backup, restore it, and finally configure the restored server as a replica of the current primary or another replica. This also validates that the replication process has no errors.

Many customers are using this methodology to refresh their QA/STG environment to have fresh data from production backups.

In addition to the above, it is recommended to create a manual or automated restore documentation process to keep all the steps together, so in case of disaster, you can follow it without wasting time.

Retention Requirements

Last but not least, it is very important to keep multiple copies of different backup types.

Our best recommendation is:

  • One or two physical backups locally on the backup server (as long as space allows it).
  • Seven daily and four weekly logical backups locally on the backup server.
  • 30 days of binlog backups locally on the backup server.
  • For offsite backups (like S3, Google Cloud, etc.), keep monthly backups for one year or more.

For local backups, keep in mind that you will need a minimum of 2.5 times the current dataset size as free disk space to meet these retention policies. And don’t forget to encrypt all backup types!
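
A hedged sketch of enforcing the local part of such a policy with simple filesystem pruning (the directory layout is an assumption; many teams rely on their backup tool’s own retention features instead):

  # Keep 7 days of daily logical backups, 4 weeks of weekly logical backups, and 30 days of binlog backups
  find /backups/logical/daily -type f -mtime +7 -delete
  find /backups/logical/weekly -type f -mtime +28 -delete
  find /backups/binlog -type f -mtime +30 -delete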

Legal or regulatory requirements may also dictate how long data must be archived.

Percona Can Help

Percona can help you choose, implement, and optimize the most appropriate MySQL backup and recovery solution for your MySQL ecosystem. If your current solution unexpectedly fails, we can facilitate your recovery with onsite, remote, or emergency consulting services. We can also help you take steps to prevent another occurrence. Every situation is unique and we will work with you to create the most effective solution for your business.

Contact Us

Jan
12
2021
--

Slim.ai announces $6.6M seed to build container DevOps platform

We are more than seven years into the era of modern containerization, and it still requires a complex set of tools and a high level of knowledge of how containers work. The DockerSlim open source project emerged several years ago from a desire to remove some of that complexity for developers.

Slim.ai, a new startup that wants to build a commercial product on top of the open source project, announced a $6.6 million seed round today from Boldstart Ventures, Decibel Partners, FXP Ventures and TechAviv Founder Partners.

Company co-founder and CEO John Amaral says he and fellow co-founder and CTO Kyle Quest have worked together for years, but it was Quest who started and nurtured DockerSlim. “We started coming together around a project that Kyle built called DockerSlim. He’s the primary author, inventor and up until we started doing this company, the sole proprietor of that community,” Amaral explained.

At the time Quest built DockerSlim in 2015, he was working with Docker containers and he wanted a way to automate some of the lower level tasks involved in dealing with them. “I wanted to solve my own pain points and problems that I had to deal with, and my team had to deal with dealing with containers. Containers were an exciting new technology, but there was a lot of domain knowledge you needed to build production-grade applications and not everybody had that kind of domain expertise on the team, which is pretty common in almost every team,” he said.

He originally built the tool to optimize container images, but he began looking at other aspects of the DevOps lifecycle, including the author, build, deploy and run phases. As he did, he saw the possibility of building a commercial company on top of the open source project.

Quest says that while the open source project is a starting point, he and Amaral see a lot of areas to expand. “You need to integrate it into your developer workflow and then you have different systems you deal with, different container registries, different cloud environments and all of that. […] You need a solution that can address those needs and doing that through an open source tool is challenging, and that’s where there’s a lot of opportunity to provide premium value and have a commercial product offering,” Quest explained.

Ed Sim, founder and general partner at Boldstart Ventures, one of the seed investors, sees a company bringing innovation to an area of technology where it has been lacking, while putting more control in the hands of developers. “Slim can shift that all left and give developers the power through the Slim tools to answer all those questions, and then, boom, they can develop containers, push them into production and then DevOps can do their thing,” he said.

They are just 15 people right now including the founders, but Amaral says building a diverse and inclusive company is important to him, and that’s why one of his early hires was head of culture. “One of the first two or three people we brought into the company was our head of culture. We actually have that role in our company now, and she is a rock star and a highly competent and focused person on building a great culture. Culture and diversity to me are two sides of the same coin,” he said.

The company is still in the very early stages of developing that product. In the meantime, they continue to nurture the open source project and to build a community around that. They hope to use that as a springboard to build interest in the commercial product, which should be available some time later this year.

Jan
12
2021
--

The Preview of Database as a Service (DBaaS) in Percona Monitoring and Management is Now Live!


This week we officially kick off the Preview of Database as a Service (DBaaS) in Percona Monitoring and Management. We are still looking for users to test and provide feedback during this year-long program, and we would love you to participate!

Our vision is to deliver a truly open source solution that won’t lock you in: a single pane of glass to easily manage your open source database infrastructure, and a self-service experience enabling fast and consistent open source database deployment.

Our goal is to deliver the enterprise benefits our customers are looking for, including:

  • A single interface to deploy and manage your open source databases on-premises, in the cloud, or across hybrid and multi-cloud environments.
  • The ability to configure a database once and deploy it anywhere. 
  • Critical database management operations, such as backup, recovery, and patching.
  • Enhanced automation and advisory services that help you find, eliminate, and prevent outages, security issues, and slowdowns.
  • A viable alternative to public cloud and large enterprise database vendor DBaaS offerings, allowing you to eliminate vendor lock-in.

Percona applies a user-driven product development process. So, we hope our user community will get involved in the Preview of Database as a Service (DBaaS) in Percona Monitoring and Management and help inform the design and development of this new software functionality.

The Preview is a year-long program consisting of four phases. Each three-month phase will focus on a different area of participant feedback. Preview participants can be involved in as many phases as they like.

Phase One Details for Interested Participants

Phase one will focus on:

  1. Gathering feedback that allows us to understand the applicable user personas and the goals and objectives of their day-to-day roles.
  2. Gathering feedback on the user experience, specifically involving creating, editing, and deleting database clusters and the databases within those clusters. We will also gather feedback on the management of those clusters and the monitoring of added database servers and nodes.

We are starting with a focus on database deployment and management features, as they help users improve their productivity. 

Other details to note…

  • Phase one of the Preview will run until April 2021
  • Phase one requires around 10 hours of self-paced activities, facilitated through the Percona Forum
    • All Community Preview participant feedback will be captured within the Percona Forum
    • Community Preview participant questions will be facilitated through the Percona Forum.

So make sure to sign up to participate in the Preview of Database as a Service (DBaaS) in Percona Monitoring and Management and become a crucial participant in this initiative, helping shape future users’ experience as we develop and test this new software functionality! 

Register Now!

Jan
12
2021
--

Cockroach Labs scores $160M Series E on $2B valuation

Cockroach Labs, makers of CockroachDB, have been on a fundraising roll for the last couple of years. Today the company announced a $160 million Series E on a fat $2 billion valuation. The round comes just eight months after the startup raised an $86.6 million Series D.

The latest investment was led by Altimeter Capital, with participation from new investors Greenoaks and Lone Pine, along with existing investors Benchmark, Bond, FirstMark, GV, Index Ventures and Tiger Global. The round doubled the company’s previous valuation and increased the amount raised to $355 million.

Co-founder and CEO Spencer Kimball says the company’s revenue more than doubled in 2020 in spite of COVID, and that caught the attention of investors. He attributed the growth to the rapid shift to the cloud brought on by the pandemic, a shift that many people in the industry have seen.

“People became more aggressive with what was already underway, a real move to embrace the cloud to build the next generation of applications and services, and that’s really fundamentally where we are,” Kimball told me.

As that happened, the company began a shift in thinking. While it has embraced an open-source version of CockroachDB along with a 30-day free trial on the company’s cloud service as ways to attract new customers to the top of the funnel, it wants to try a new approach.

In fact, it plans to replace the 30-day trial later this year with a new free tier that has no time limits. It believes this will attract more developers to the platform and let them explore the full set of features without having to enter credit card information. What’s more, this approach should end up costing the company less money to support the free tier.

“What we expect is that you can do all kinds of things on that free tier. You can do a hackathon, any kind of hobby project […] or even a startup that has ambitions to be the next DoorDash or Airbnb,” he said. As he points out, there’s a point where early-stage companies don’t have many users, and can remain in the free tier until they achieve product-market fit.

“That’s when they put a credit card down, and they can extend beyond the free tier threshold and pay for what they use,” he said. The newer free tier is still in the beta testing phase, but will be rolled out during this year.

Kimball says the company wasn’t necessarily looking to raise, although he knew that it would continue to need more cash on the balance sheet to run with giant competitors like Oracle, AWS and the other big cloud vendors, along with a slew of other database startups. As the company’s revenue grows, he certainly sees an IPO in its future, but he doesn’t see it happening this year.

The startup ended the year with 200 employees and Kimball expects to double that by the end of this year. He says growing a diverse group of employees takes good internal data and a welcoming, inclusive culture.

“I think the starting point for anything you want to optimize in a business is to make sure that you have the metrics in front of you, and that you’re constantly looking at them […] in order to measure how you’re doing,” he explained.

He added, “The thing that we’re most focused on in terms of action is really building the culture of the company appropriately and that’s something we’ve been doing for all six years we’ve been around. To the extent that you have an inclusive environment where people actually really view the value of respect, that helps with diversity.”

Kimball says he sees a different approach to running the business when the pandemic ends, with some small percentage going into the office regularly and others coming for quarterly visits, but he doesn’t see a full return to the office post-pandemic.
