Jul
19
2023
--

Deploy PostgreSQL on Kubernetes Using GitOps and ArgoCD


In the world of modern DevOps, deployment automation tools have become essential for streamlining processes and ensuring consistent, reliable deployments. GitOps and ArgoCD are at the cutting edge of deployment automation, making it easy to deploy complex applications and reducing the risk of human error in the deployment process. In this blog post, we will explore how to deploy the Percona Operator for PostgreSQL v2 using GitOps and ArgoCD.


The setup we are looking for is the following:

  1. Teams or CI/CD pipelines push the manifests to GitHub
  2. ArgoCD reads the changes and compares them to what is running in Kubernetes
  3. ArgoCD creates or modifies the Percona Operator and PostgreSQL custom resources
  4. Percona Operator takes care of day-1 and day-2 operations based on the changes pushed by ArgoCD to custom resources

Prerequisites:

  1. Kubernetes cluster
  2. GitHub repository. You can find my manifests here.

Start it up

Deploy and prepare ArgoCD

ArgoCD has quite detailed documentation explaining the installation process. I did the following:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Expose the ArgoCD server. You might want to use ingress or some other approach. I’m using a Load Balancer in a public cloud:

kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

Get the Load Balancer endpoint; we will use it later:

kubectl -n argocd get svc argocd-server
NAME            TYPE           CLUSTER-IP    EXTERNAL-IP      PORT(S)                      AGE
argocd-server   LoadBalancer   10.88.1.239   123.21.23.21   80:30480/TCP,443:32623/TCP   6h28m

I’m not a big fan of web user interfaces, so I took the path of using the argocd CLI. Install it by following the CLI installation documentation.

Retrieve the admin password to log in using the CLI:

argocd admin initial-password -n argocd

Login to the server. Use the Load Balancer endpoint from above:

argocd login 123.21.23.21

PostgreSQL

Github and manifests

Put YAML manifests into the github repository. I have two:

  1. bundle.yaml – deploys Custom Resource Definitions, Service Account, and the Deployment of the Operator
  2. argo-test.yaml – deploys the PostgreSQL Custom Resource manifest that Operator will process

There are some changes that you would need to make to ensure that ArgoCD works with Percona Operator.

Server Side sync

Percona relies on OpenAPI v3.0 validation for Custom Resources. Done thoroughly, this validation significantly increases the size of the Custom Resource Definition (CRD) manifests in some cases. As a result, you might see the following error when you apply the bundle:

kubectl apply -f deploy/bundle.yaml
...
Error from server (Invalid): error when creating "deploy/bundle.yaml": CustomResourceDefinition.apiextensions.k8s.io "perconapgclusters.pg.percona.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

To avoid it, use Server-Side Apply, which ArgoCD supports as a sync option. In the manifests, I enabled it through annotations on the CustomResourceDefinition objects:

kind: CustomResourceDefinition
metadata:
  annotations:
    ...
    argocd.argoproj.io/sync-options: ServerSideApply=true
  name: perconapgclusters.pgv2.percona.com

Phases and waves

ArgoCD comes with Phases and Waves that allow you to apply manifests, and the resources within them, in a specific order. You should use Waves for two reasons:

  1. To deploy the Operator before the Percona PostgreSQL Custom Resource
  2. To delete the Custom Resource first when removing the application (operations are performed in reverse order)

I added the waves through annotations to the resources.

  • All resources in bundle.yaml are assigned to wave “1”:
kind: CustomResourceDefinition
metadata:
  annotations:
    ...
    argocd.argoproj.io/sync-wave: "1"
  name: perconapgclusters.pgv2.percona.com

  • PostgreSQL Custom Resource in argo-test.yaml has wave “5”:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "5"
  name: argo-test

The higher the wave number, the later the resource is created. In our case, the PerconaPGCluster resource will be created after the Custom Resource Definitions from bundle.yaml.

Deploy with ArgoCD

Create the application in ArgoCD:

argocd app create percona-pg --repo https://github.com/spron-in/blog-data.git --path gitops-argocd-postgresql --dest-server https://kubernetes.default.svc --dest-namespace default

The command does the following (an equivalent declarative Application manifest is sketched after this list):

  • Creates an application called percona-pg
  • Uses the GitHub repo and a folder in it as a source of YAML manifests
  • Uses local k8s API. It is possible to have multiple k8s clusters.
  • Deploys into default namespace
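For completeness, the same application can be defined declaratively instead of via the CLI. Below is a minimal sketch of an equivalent Application manifest; the field values simply mirror the CLI flags above, and targetRevision is an assumption (HEAD is the default):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: percona-pg
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/spron-in/blog-data.git
    path: gitops-argocd-postgresql
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: default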

Now perform a manual sync. The command will roll out manifests:

argocd app sync percona-pg

Check for pg object in the default namespace:

kubectl get pg
NAME        ENDPOINT          STATUS   POSTGRES   PGBOUNCER   AGE
argo-test   33.22.11.44       ready    3          3           80s

Operational tasks

Now let’s say we want to change something. The change should be merged into git, and ArgoCD will detect it. The sync interval is 180 seconds by default; you can change it in the argocd-cm ConfigMap if needed.
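For example, here is a sketch of lowering the polling interval through the argocd-cm ConfigMap; timeout.reconciliation is the relevant key, and the 60s value below is only an illustration (depending on the ArgoCD version, you may need to restart its components for the new interval to take effect):

kubectl patch configmap argocd-cm -n argocd --type merge -p '{"data":{"timeout.reconciliation":"60s"}}'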

Even when ArgoCD detects the change, it only marks the application as out of sync. For example, I reduced the number of CPUs for pgBouncer, and ArgoCD detected the change:

argocd app get percona-pg
...
2023-07-07T10:11:22+03:00  pgv2.percona.com      PerconaPGCluster             default             argo-test                                OutOfSync                       perconapgcluster.pgv2.percona.com/argo-test configured

Now I can manually sync the change again. To automate the whole flow, just set the sync policy:

argocd app set percona-pg --sync-policy automated

Now all the changes in git will be automatically synced with Kubernetes once ArgoCD detects them.
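If you prefer the declarative route, roughly the same behavior can be expressed in the Application manifest with a syncPolicy block; prune and selfHeal are optional extras shown purely as an illustration:

spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true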

Conclusion

GitOps, combined with Kubernetes and the Percona Operator for PostgreSQL, provides a powerful toolset for rapidly deploying and managing complex database infrastructures. By leveraging automation and deployment best practices, teams can reduce the risk of human error, increase deployment velocity, and focus on delivering business value. Additionally, the ability to declaratively manage infrastructure and application state enables teams to quickly recover from outages and respond to changes with confidence.

Try out the Operator by following the quickstart guide here.   

You can get early access to new product features, invite-only “ask me anything” sessions with Percona Kubernetes experts, and monthly swag raffles. Interested? Fill in the form at percona.com/k8s.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Dec
29
2021
--

Q & A on Webinar “MySQL Performance for DevOps”


First I want to thank everyone who attended my November 16, 2021, webinar “MySQL Performance for DevOps“. The recording and slides are available on the webinar page.

Here are answers to the questions from participants which I was not able to provide during the webinar.

Q: Hi! We have troubles with DELETE queries. We have to remove some data periodically (like, hourly, daily) and we have short-term server stalls during these DELETEs. Server is running on modern NVMe’s so we wonder why do we have this situation. Those DELETE’s are not so large, like 10 000 – 15 000 records, but tables on which DELETE’s are performed update frequently.

A: I would test whether a similar DELETE statement is slow when you run it on a development server in an isolated environment, with no other session connected to the MySQL server instance. If it is slow in this case too, check whether MySQL uses indexes to resolve the WHERE condition of the DELETE statement. You can use the EXPLAIN statement for DELETE, or convert the DELETE into a similar SELECT query and experiment.

If the DELETE statement runs fast in the isolated environment, check how parallel sessions affect its performance. If the tables you are deleting from are updated frequently, DELETE statements can both cause and be affected by locking conflicts. To resolve this situation, study how MySQL works with locks. A great presentation about InnoDB locks, "InnoDB Locking Explained with Stick Figures", can be found at https://www.slideshare.net/billkarwin/innodb-locking-explained-with-stick-figures. Then optimize the DELETE and UPDATE statements so they finish faster. Alternatively, you can separate them in time so they have less effect on each other. You may also split DELETE statements so each one updates fewer records at a time.
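To illustrate the EXPLAIN approach with a hypothetical table and retention condition (the table and column names below are made up for the example):

-- Check whether the condition can use an index
EXPLAIN DELETE FROM events WHERE created_at < NOW() - INTERVAL 7 DAY;

-- The same condition rewritten as a SELECT, to experiment more freely
EXPLAIN SELECT COUNT(*) FROM events WHERE created_at < NOW() - INTERVAL 7 DAY;

-- Splitting the DELETE so each run touches fewer rows
DELETE FROM events WHERE created_at < NOW() - INTERVAL 7 DAY LIMIT 1000;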

Q: Question 2. We have innodb_buffer_size set around 260Gb on the dedicated server with about 320Gb of total RAM. Still, we have 99.9% memory full and there are no other large memory consumers, only MySQL (Percona 8.0.23). The server starts and around 3 hours it takes all available memory regardless of the innodb_buffer_size setting. We never had something like this with 5.7. Do you have any ideas?

A: MySQL uses memory not only for the InnoDB buffer pool but also for other data, such as session-based and operation-based buffers. For example, if you have 100 connections that use internal temporary tables to resolve queries, and you set the size of the internal temporary table to 100MB, you will use around 10G of additional memory for these tables. Query the memory digest tables in Performance Schema, and the views on these tables in the sys schema, to find the operations that allocate memory in your MySQL server.
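As a starting point, here are a couple of example queries against the sys schema views (available in MySQL 5.7 and 8.0; verify against your version):

-- Top memory consumers by instrumented event, server-wide
SELECT * FROM sys.memory_global_by_current_bytes LIMIT 10;

-- Memory usage aggregated per user
SELECT * FROM sys.memory_by_user_by_current_bytes LIMIT 10;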

Q: Can we get a copy of this presentation?

A: You should have received a copy of the slides. If you did not, they are attached to this blog post: DevOps_Perf_202111

Q: buffer_pool_size should be what percentage of the host RAM?

A: The percentage of the host RAM is a very rough estimation of the ideal amount of memory to allocate for the InnoDB buffer pool. For example, the MySQL user manual used to recommend an InnoDB buffer pool size of up to 80% of the available RAM. But 80% of RAM is very different if the host has, say, 8G or 1024G. In the former case, 80% is 6.4G and the host is left with 1.6G for other MySQL buffers and the operating system, which may not be enough. In the latter case, 80% is 819.2G and the host is left with 204.8G for other needs; depending on your workload, that could be a huge waste of resources. I recommend reading this blog post: https://www.percona.com/blog/2015/06/02/80-ram-tune-innodb_buffer_pool_size/ and following the links at the end, then choosing a size appropriate for your data set and workload.
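For reference, a small sketch of checking and resizing the buffer pool online (online resizing works in MySQL 5.7 and later; the 6G target below is only an example):

-- Current buffer pool size, in GiB
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gib;

-- Resize online; pick a value that fits your data set and leaves room for other buffers
SET GLOBAL innodb_buffer_pool_size = 6 * 1024 * 1024 * 1024;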

Q: How can we fit RAM size to data size?

Example: if I have 1G of data, how much RAM do I need to get 100 QPS, and if I have 100G of data, how much RAM do I need to get 100 QPS?

A: RAM size, data set size, and the number of queries per second your server can handle are not directly related. You need to test your queries and observe how they are executed. For example, if you select everything from an InnoDB table holding either 1G or 100G of data, and you do not access any other table on the server, the very first run will be slower than the following ones because InnoDB reads the data into the buffer pool. After that, performance and the number of queries per second will be limited only by the network speed and bandwidth between your client and server, provided you can allocate about 100G for your buffer pool. The cached size will stay almost the same as the table size no matter how many connections you have; your MySQL server will only use a small amount of memory for new connection buffers.

In another case, however, you may have a comparatively small table that you access with a quite complicated query. For example, if you try to repeat the test case for the still-valid https://bugs.mysql.com/bug.php?id=29423, a single query on a 184M table runs for much longer than you would expect. In this case, the number of queries per second will also be very low.

Q: Do you have a recommendation parameter list for MySQL RDS on AWS?

A: It is the same as for a dedicated MySQL server, but you may not be able to change some of the options.

Q: If you know you have SSD’s, but ROTA = 1, what has to be configured to make use of the SSDs?

A: For an SSD, ROTA should be 0. If you are sure you have SSDs but they are shown as rotational disks, your storage is configured incorrectly. Depending on the configuration, you may still get the same performance as if the disks were recognized properly. If not, check your storage, RAID controller, and system configuration.

MySQL just issues system calls for reading, writing, and syncing data. It does not care whether the disk is rotational or not, so for MySQL performance the value of ROTA does not really matter.

Q: If you believed you tuned both master and slave for the best performance, but seconds behind master continues to increase, you decide to split the shard, but xtrabackup fails with log wrap.  But even if you were to get a good backup, once it is online, the slave will never catch up.  The Kobayashi Maru, a no win situation – have you been there?  What did you do?

A: First, make sure you have configured a multi-threaded replica. If you use the parallel type LOGICAL_CLOCK, study the option binlog_transaction_dependency_tracking and how it behaves in practice when set to WRITESET or WRITESET_SESSION. To avoid log wrap during backup, increase the redo log file size. If you can stop the source server, stop it and set up the replica by copying the datadir: this is faster than using XtraBackup, because you do not need to copy the changes accumulating in the redo log files while the backup is running.

Q: In MySQL 5.7, the tmp tablespace is now InnoDB, how can you tune tmp to take advantage of RAM and not use disk?

A: The tablespace file on disk is used only when the in-memory table is converted into a disk-based table. Otherwise, temporary tables continue using memory.
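To see the thresholds involved and how often in-memory temporary tables spill to disk, something like the following can be checked (a sketch; defaults differ between versions):

-- In-memory internal temporary tables are limited by the smaller of these two
SHOW VARIABLES WHERE Variable_name IN ('tmp_table_size', 'max_heap_table_size');

-- How many internal temporary tables were created, and how many went to disk
SHOW GLOBAL STATUS WHERE Variable_name IN ('Created_tmp_tables', 'Created_tmp_disk_tables');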

Q: What are the top 6 variables to get the best performance, how can you verify how effective their setting are, looking at the global status, when can you know when those variables can be increased to get the best utilization from CPUs/RAM/Disk/Network.

A: While I showed variables that can improve performance in most cases on my “Conclusion” slides, I recommend starting from the issue you are trying to solve and adjusting variables only when you understand what you are doing.

Some of these variables can be measured for effectiveness. For example, if the number of free buffers in the output of SHOW ENGINE INNODB STATUS is small, and the buffer pool hit rate shows that the number of disk accesses is consistently greater than the number of buffer pool hits, it indicates that the buffer pool size may be too small for your workload and data.
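The same signal is visible in the status counters; as a sketch, a consistently high ratio of Innodb_buffer_pool_reads (reads that had to go to disk) to Innodb_buffer_pool_read_requests points at a buffer pool that is too small:

SHOW GLOBAL STATUS WHERE Variable_name IN ('Innodb_buffer_pool_reads', 'Innodb_buffer_pool_read_requests');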

Regarding CPU, if the number of active threads is high, and you see performance drop when concurrency increases while the operating system shows low CPU usage, it may be a symptom that either:

– you limited the upper limit of the number of active engine threads

– disk does not support so many parallel operations and active threads are waiting for IO

Another issue with CPU performance could happen if the upper limit of the number of active engine threads is not set or too high and threads are spending time doing nothing while waiting in the priority queue.

The only option that directly limits IO activity is innodb_io_capacity, which limits the speed of background InnoDB operations. If set too low, InnoDB may underuse your fast disks; if set too high, InnoDB could start writing too fast, so each write request wastes time waiting in its queue.

Q: What was the last InnoDB setting, the one which should be set up to the number of CPU cores?

A: This is innodb_thread_concurrency, which limits the number of InnoDB threads that can run in parallel. You should set it either to 0 or to the number of CPU cores.

Q: Which is more secure and faster community MySQL or Percona MySQL or aws rds?

A: Percona MySQL has performance and diagnostic improvements, as well as enterprise-level features, available as open source. AWS RDS supports hardware scaling on demand and physical replication that uses InnoDB redo log files instead of binary logs; however, it does not give you the same control over the server as your own physical instance. Community MySQL works on a larger number of platforms and thus uses function calls that work on all of them, where Percona MySQL or AWS RDS may use optimized variants. So each of them has its own advantages and disadvantages.

Q: In case with open tables >>> open_files (and cannot change open_files) how to set table_open_cache? “as big as possible”?

A: The status variable Open_files is “the number of files that are open. This count includes regular files opened by the server. It does not include other types of files such as sockets or pipes. Also, the count does not include files that storage engines open using their own internal functions rather than asking the server level to do so.” (https://dev.mysql.com/doc/refman/8.0/en/server-status-variables.html#statvar_Open_files) The status variable Open_tables is “the number of tables that are open”. They are not related to each other. You need to watch the value of Opened_tables (“the number of tables that have been opened”): if it keeps growing well beyond Open_tables, tables are constantly being reopened and the cache may be too small.

There is an operating system limit, “open files”, that is visible if you run the command ulimit -n. This limit should be greater than the maximum number of files that your MySQL instance can open simultaneously. Speaking about Open_tables: you cannot have this value set to a number larger than the operating system “open files” limit unless your tables are stored in a shared or general tablespace.
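A quick way to watch these counters and the related limits, sketched below:

-- Currently open vs. cumulatively opened tables, and open files
SHOW GLOBAL STATUS WHERE Variable_name IN ('Open_tables', 'Opened_tables', 'Open_files');

-- Size of the table open cache and the server-side file limit
SHOW VARIABLES WHERE Variable_name IN ('table_open_cache', 'open_files_limit');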

Q: How to tell if we should tune join_buffer_size? wait events anywhere?

A: Tune it if you use JOIN queries that do not use indexes and they perform slowly because of this. Start with regular query tuning: use the slow query log, Performance Schema, and Query Analyzer in PMM to find queries that require optimization. In Query Analyzer, add the “Full Join” column to your query list. In Performance Schema, search for statements where the value of SELECT_FULL_JOIN is greater than 0 in the events_statements_* tables.
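A sketch of such a Performance Schema query, using the statement digest summary (column names as in MySQL 5.7/8.0; verify against your version):

SELECT DIGEST_TEXT, COUNT_STAR, SUM_SELECT_FULL_JOIN
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_SELECT_FULL_JOIN > 0
ORDER BY SUM_SELECT_FULL_JOIN DESC
LIMIT 10;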

Check also my “Introduction to MySQL Query Tuning for Dev[Op]s” webinar.

Q: How to measure memory consumption of table_open_cache? 15K/table? FRM-related? some way to estimate?

A: This is the event “memory/sql/TABLE_SHARE::mem_root”. Check also this blog post.
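One way to look at that event, sketched as an example (assuming memory instrumentation is enabled, which it is by default in recent MySQL versions):

SELECT EVENT_NAME, CURRENT_NUMBER_OF_BYTES_USED
FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME = 'memory/sql/TABLE_SHARE::mem_root';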

Q: Hello guys!

Do we need to prepare different optimization depends on MySQL engine e.g. XtraDB, InnoDB? If yes, could you please explain differences?

Best regards,

Oleg Stelmach

A: XtraDB is an enhanced version of InnoDB in Percona Server: https://www.percona.com/doc/percona-server/8.0/percona_xtradb.html. The differences are the features added in Percona Server, namely the options that exist in Percona Server and do not exist in upstream Community MySQL.

Q: Regarding threads. Do better to use hyperthreading\multithreading for MySQL instance or we need to turn off this function?

Best regards,

Oleg Stelmach

A: You do not have to turn this option off, but you may see that MySQL performance is not linearly predictable under highly concurrent workloads. I recommend checking this blog post with hyperthreading benchmarks on MySQL, and the comments on it, for a better understanding of how hyperthreading can affect MySQL performance.

Q: Besides setting OS swappiness correctly, would you also recommend enabling memlock in my.cnf?

A: Normally you do not need it.

Jul
27
2021
--

Blameless raises $30M to guide companies through their software lifecycle

Site reliability engineering platform Blameless announced Tuesday it raised $30 million in a Series B funding round, led by Third Point Ventures with participation from Accel, Decibel and Lightspeed Venture Partners, to bring total funding to over $50 million.

Site reliability engineering (SRE) is an extension of DevOps designed for more complex environments.

Blameless, based in San Mateo, California, emerged from stealth in 2019 after raising both a seed and Series A round, totaling $20 million. Since then, it has turned its business into a blossoming software platform.

Blameless’ platform provides the context, guardrails and automated workflows so engineering teams are unified in the way they communicate and interact, especially to resolve issues quicker as they build their software systems.

It originally worked with tech-forward teams at large companies, like Home Depot, that were “dipping [their toes] into the space and now [want] to double down,” co-founder and CEO Lyon Wong told TechCrunch.

The company still works with those tech-forward teams, but in the past two years, more companies sought out resident SRE architect Kurt Anderson to advise them, causing Blameless to change up its business approach, Wong said.

Other companies are also seeing a trend of customers asking for support — for example, in March, Google Cloud unveiled its Mission Critical Services support option for SRE to serve in a similar role as a consultant as companies move toward readiness with their systems. And in February, Nobl9 raised a $21 million Series B to provide enterprises with the tools they need to build service-level-objective-centric operations, which is part of a company’s SRE efforts.

Blameless now has interest from more mainstream companies in the areas of enterprise, logistics and healthcare. These companies aren’t necessarily focused on technology, but see a need for SRE.

“Companies recognize the shortfall in reliability, and then the question they come to us with is how do they get from where they are to where they want to be,” Anderson said. “Often companies that don’t have a process respond with ‘all hands on deck’ all the time, but instead need to shift to the right people responding.”

Wong plans to use the new funding to fill key leadership roles and to invest in the company’s go-to-market strategy and product development, enabling the company to go after larger enterprises.

Blameless doubled its revenue in the last year and will expand to service all customer segments, adding small and emerging businesses to its roster of midmarket and large companies. The company also expects to double headcount in the next three quarters.

As part of the funding announcement, Third Point Ventures partner Dan Moskowitz will join Blameless’ board of directors with Wong, Accel partner Vas Natarajan and Lightspeed partner Ravi Mhatre.

“Freeing up engineering to focus on shipping code is exactly what Blameless achieves,” said Moskowitz in a written statement. “The Blameless market opportunity is big as we see teams struggle and resort to creating homegrown playbooks and point solutions that are incomplete and costly.”

 

Jun
29
2021
--

DevOps platform JFrog acquires AI-based IoT and connected device security specialist Vdoo for $300M

JFrog, the company best known for a platform that helps developers continuously manage software delivery and updates, is making a deal to help it expand its presence and expertise in an area that has become increasingly connected to DevOps: security. The company is acquiring Vdoo, which has built an AI-based platform that can be used to detect and fix vulnerabilities in the software systems that work with and sit on IoT and connected devices. The deal — in a mix of cash and stock — is valued at approximately $300 million, JFrog confirmed to me.

Sunnyvale-based, Israeli-founded JFrog is publicly traded on Nasdaq, where it went public last September, and currently it has a market cap of $4.65 billion. Vdoo, meanwhile, had raised about $70 million from investors that include NTT, Dell, GGV and Verizon (disclaimer: Verizon owns TechCrunch), and when we covered its most recent funding round, we estimated that the valuation was somewhere between $100 million and $200 million, making this a decent return.

Shlomi Ben Haim, JFrog’s co-founder and CEO, said that his company’s turn to focusing deeper on security, and making this acquisition in particular to fill out that strategy, are a natural progression in its aim to build out an end-to-end platform for the DevOps team.

“When we started JFrog, the main challenge was to educate the market on what we saw as most important priorities when it comes to building, testing and deploying software,” he said. Then sometime around 2015-2016 he said they started to realize there was a “crack” in the system, “a crack called security.” InfoSec engineers and developers sometimes work at cross purposes, as “developers became too fast” the work they were doing has inadvertently led to a lot of security vulnerabilities.

JFrog has been building a number of tools since then to address that and to bring the collective priorities together, such as its X-ray product. And indeed, Vdoo is not JFrog’s first foray into security, but it represents a significant step deeper into the hardware and systems that are being run on software. “It’s a very important leap forward,” Ben Haim said.

For its part, Vdoo was born out of a realization as well as a challenging mission: IoT and other connected devices — a universe of some 50 billion pieces of hardware as of last year — represents a massive security headache, and not just because of the volume of devices: Each object uses and interacts with software in the cloud and so each instance represents a potential vulnerability, with zero-day vulnerabilities, CVEs, configuration and hardening issues, and standard non-compliance among some of the most common.

While connected-device security up to now has typically focused on monitoring activity on the hardware, how data is moving in and out of it, Vdoo’s approach has been to build a platform that monitors the behavior of the devices themselves on top of that, using AI to compare that behavior to identify when something is not working as it should. Interestingly, this mirrors the kind of binary analysis that JFrog provides in its DevOps platform, making the two complementary to each other.

But what’s notable is that this will give JFrog a bigger play at the edge, since part of Vdoo’s platform works on devices themselves, “micro agents” as the company has described them to me previously, to detect and repair vulnerabilities on endpoints.

While JFrog has built a lot of its own business from the ground up, it has made a number of acquisitions to bolt on technology (one example: Shippable, which it used to bring continuous integration and delivery into its DevOps platform). In this case, Netanel Davidi, the co-founder and CEO of Vdoo (who previously co-founded and sold another security startup, Cyvera, to Palo Alto Networks) said that this was a good fit because the two companies are fundamentally taking the same approaches in their work (another synergy and justification for DevOps and InfoSec being more closely knitted together too I might add).

“In terms of the fit between the companies, it’s about our approach to binaries,” Davidi said in an interview, noting that the two being on the same page with this approach was fundamental to the deal. “That’s only the way to cover the entire pipeline from the very beginning, when they go you develop something, all the way to the device or to the server or to the application or to the mobile phone. That’s the only way to truly understand the context and contextual risk.”

He also made a note not just of the tech but of the talent that is coming on with the acquisition: 100 people joining JFrog’s 800.

“If JFrog chose to build something like this themselves, they could have done it,” he said. “But the uniqueness here is that we have built the best security team, the best security researchers, the best vulnerability researchers, the best reverse engineers, which focus not only on embedded systems, and IoT, which is considered to be the hardest thing to learn and to analyze, but also in software artifacts. We are bringing this knowledge along with us.”

JFrog said that Vdoo will continue to operate as a standalone SaaS product for the time being. Updates that are made will be in aid of supporting the JFrog platform and the two aim to have a fully integrated, “holistic” product by 2022.

Along with the deal, JFrog reiterated financial guidance for the next quarter that will end June 30, 2021. It expects revenues of $47.6 million to $48.6 million, with non-GAAP operating income of $0.5 million to $1.5 million and non-GAAP EPS of $0.00 to $0.01, assuming approximately 104 million weighted average diluted shares outstanding. For Full Year 2021, revenues are expected to be $198 million to $204 million, with non-GAAP operating income between $5 million and $7 million and an approximately 3% increase in weighted average diluted shares. JFrog anticipates consolidated operating expenses to increase by approximately $9-10 million for the remainder of 2021, subject to the acquisition closing.

May
11
2021
--

Cycode raises $20M to secure DevOps pipelines

Israeli security startup Cycode, which specializes in helping enterprises secure their DevOps pipelines and prevent code tampering, today announced that it has raised a $20 million Series A funding round led by Insight Partners. Seed investor YL Ventures also participated in this round, which brings the total funding in the company to $24.6 million.

Cycode’s focus was squarely on securing source code in its early days, but thanks to the advent of infrastructure as code (IaC), policies as code and similar processes, it has expanded its scope. In this context, it’s worth noting that Cycode’s tools are language and use case agnostic. To its tools, code is code.

“This ‘everything as code’ notion creates an opportunity because the code repositories, they become a single source of truth of what the operation should look like and how everything should function,” Cycode CTO and co-founder Ronen Slavin told me. “So if we look at that and we understand it — the next phase is to verify this is indeed what’s happening, and then whenever something deviates from it, it’s probably something that you should look at and investigate.”


Cycode Dashboard. Image Credits: Cycode

The company’s service already provides tools for managing code governance, leak detection, secret detection and access management. Recently it added features for securing the code that defines a business’ infrastructure; looking ahead, the team plans to add features like drift detection, integrity monitoring and alert prioritization.

“Cycode is here to protect the entire CI/CD pipeline — the development infrastructure — from end to end, from code to cloud,” Cycode CEO and co-founder Lior Levy told me.

“If we look at the landscape today, we can say that existing solutions in the market are kind of siloed, just like the DevOps stages used to be,” Levy explained. “They don’t really see the bigger picture, they don’t look at the pipeline from a holistic perspective. Essentially, this is causing them to generate thousands of alerts, which amplifies the problem even further, because not only don’t you get a holistic view, but also the noise level that comes from those thousands of alerts causes a lot of valuable time to get wasted on chasing down some irrelevant issues.”

What Cycode wants to do then is to break down these silos and integrate the relevant data from across a company’s CI/CD infrastructure, starting with the source code itself, which ideally allows the company to anticipate issues early on in the software life cycle. To do so, Cycode can pull in data from services like GitHub, GitLab, Bitbucket and Jenkins (among others) and scan it for security issues. Later this year, the company plans to integrate data from third-party security tools like Snyk and Checkmarx as well.

“The problem of protecting CI/CD tools like GitHub, Jenkins and AWS is a gap for virtually every enterprise,” said Jon Rosenbaum, principal at Insight Partners, who will join Cycode’s board of directors. “Cycode secures CI/CD pipelines in an elegant, developer-centric manner. This positions the company to be a leader within the new breed of application security companies — those that are rapidly expanding the market with solutions which secure every release without sacrificing velocity.”

The company plans to use the new funding to accelerate its R&D efforts, and expand its sales and marketing teams. Levy and Slavin expect that the company will grow to about 65 employees this year, spread between the development team in Israel and its sales and marketing operations in the U.S.

Apr
28
2021
--

Opsera raises $15M for its continuous DevOps orchestration platform

Opsera, a startup that’s building an orchestration platform for DevOps teams, today announced that it has raised a $15 million Series A funding round led by Felicis Ventures. New investor HMG Ventures, as well as existing investors Clear Ventures, Trinity Partners and Firebolt Ventures also participated in this round, which brings the company’s total funding to $19.3 million.

Founded in January 2020, Opsera lets developers provision their CI/CD tools through a single framework. Using this framework, they can then build and manage their pipelines for a variety of use cases, including their software delivery lifecycle, infrastructure as code and their SaaS application releases. With this, Opsera essentially aims to help teams set up and operate their various DevOps tools.

The company’s two co-founders, Chandra Ranganathan and Kumar Chivukula, originally met while working at Symantec a few years ago. Ranganathan then spent the last three years at Uber, where he ran that company’s global infrastructure. Meanwhile, Chivukula ran Symantec’s hybrid cloud services.


“As part of the transformation [at Symantec], we delivered over 50+ acquisitions over time. That had led to the use of many cloud platforms, many data centers,” Ranganathan explained. “Ultimately we had to consolidate them into a single enterprise cloud. That journey is what led us to the pain points of what led to Opsera. There were many engineering teams. They all had diverse tools and stacks that were all needed for their own use cases.”

The challenge then was to still give developers the flexibility to choose the right tools for their use cases, while also providing a mechanism for automation, visibility and governance — and that’s ultimately the problem Opsera now aims to solve.


“In the DevOps landscape, […] there is a plethora of tools, and a lot of people are writing the glue code,” Opsera co-founder Chivukula noted. “But then they’re not they don’t have visibility. At Opsera, our mission and goal is to bring order to the chaos. And the way we want to do this is by giving choice and flexibility to the users and provide no-code automation using a unified framework.”

Wesley Chan, a managing director for Felicis Ventures who will join the Opsera board, also noted that he believes that one of the next big areas for growth in DevOps is how orchestration and release management is handled.

“We spoke to a lot of startups who are all using black-box tools because they’ve built their engineering organization and their DevOps from scratch,” Chan said. “That’s fine, if you’re starting from scratch and you just hired a bunch of people outside of Google and they’re all very sophisticated. But then when you talk to some of the larger companies. […] You just have all these different teams and tools — and it gets unwieldy and complex.”

Unlike some other tools, Chan argues, Opsera allows its users the flexibility to interface with this wide variety of existing internal systems and tools for managing the software lifecycle and releases.

“This is why we got so interested in investing, because we just heard from all the folks that this is the right tool. There’s no way we’re throwing out a bunch of our internal stuff. This would just wreak havoc on our engineering team,” Chan explained. He believes that building with this wide existing ecosystem in mind — and integrating with it without forcing users onto a completely new platform — and its ability to reduce friction for these teams, is what will ultimately make Opsera successful.

Opsera plans to use the new funding to grow its engineering team and accelerate its go-to-market efforts.

Mar
29
2021
--

Testing platform Tricentis acquires performance testing service Neotys

If you develop software for a large enterprise company, chances are you’ve heard of Tricentis. If you don’t develop software for a large enterprise company, chances are you haven’t. The software testing company with a focus on modern cloud and enterprise applications was founded in Austria in 2007 and grew from a small consulting firm to a major player in this field, with customers like Allianz, BMW, Starbucks, Deutsche Bank, Toyota and UBS. In 2017, the company raised a $165 million Series B round led by Insight Venture Partners.

Today, Tricentis announced that it has acquired Neotys, a popular performance testing service with a focus on modern enterprise applications and a tests-as-code philosophy. The two companies did not disclose the price of the acquisition. France-based Neotys launched in 2005 and raised about €3 million before the acquisition. Today, it has about 600 customers for its NeoLoad platform. These include BNP Paribas, Dell, Lufthansa, McKesson and TechCrunch’s own corporate parent, Verizon.

As Tricentis CEO Sandeep Johri noted, testing tools were traditionally script-based, which also meant they were very fragile whenever an application changed. Early on, Tricentis introduced a low-code tool that made the automation process both easier and resilient. Now, as even traditional enterprises move to DevOps and release code at a faster speed than ever before, testing is becoming both more important and harder for these companies to implement.

“You have to have automation and you cannot have it be fragile, where it breaks, because then you spend as much time fixing the automation as you do testing the software,” Johri said. “Our core differentiator was the fact that we were a low-code, model-based automation engine. That’s what allowed us to go from $6 million in recurring revenue eight years ago to $200 million this year.”

Tricentis, he added, wants to be the testing platform of choice for large enterprises. “We want to make sure we do everything that a customer would need, from a testing perspective, end to end. Automation, test management, test data, test case design,” he said.

The acquisition of Neotys allows the company to expand this portfolio by adding load and performance testing as well. It’s one thing to do the standard kind of functional testing that Tricentis already did before launching an update, but once an application goes into production, load and performance testing becomes critical as well.

“Before you put it into production — or before you deploy it — you need to make sure that your application not only works as you expect it, you need to make sure that it can handle the workload and that it has acceptable performance,” Johri noted. “That’s where load and performance testing comes in and that’s why we acquired Neotys. We have some capability there, but that was primarily focused on the developers. But we needed something that would allow us to do end-to-end performance testing and load testing.”

The two companies already had an existing partnership and had integrated their tools before the acquisition — and many of its customers were already using both tools, too.

“We are looking forward to joining Tricentis, the industry leader in continuous testing,” said Thibaud Bussière, president and co-founder at Neotys. “Today’s Agile and DevOps teams are looking for ways to be more strategic and eliminate manual tasks and implement automated solutions to work more efficiently and effectively. As part of Tricentis, we’ll be able to eliminate laborious testing tasks to allow teams to focus on high-value analysis and performance engineering.”

NeoLoad will continue to exist as a stand-alone product, but users will likely see deeper integrations with Tricentis’ existing tools over time, such as with Tricentis Analytics.

Johri tells me that he considers Tricentis one of the “best kept secrets in Silicon Valley” because the company not only started out in Europe (even though its headquarters is now in Silicon Valley) but also because it hasn’t raised a lot of venture rounds over the years. But that’s very much in line with Johri’s philosophy of building a company.

“A lot of Silicon Valley tends to pay attention only when you raise money,” he told me. “I actually think every time you raise money, you’re diluting yourself and everybody else. So if you can succeed without raising too much money, that’s the best thing. We feel pretty good that we have been very capital efficient and now we’re recognized as a leader in the category — which is a huge category with $30 billion spend in the category. So we’re feeling pretty good about it.”

Jan
15
2021
--

GitLab oversaw a $195 million secondary sale that values the company at $6 billion

GitLab has confirmed with TechCrunch that it oversaw a $195 million secondary sale that values the company at $6 billion. CNBC broke the story earlier today.

The company’s impressive valuation comes after its most recent round, a 2019 Series E in which it raised $268 million at a $2.75 billion valuation, an increase of $3.25 billion in under 18 months. Company co-founder and CEO Sid Sijbrandij believes the increase is due to his company’s progress adding functionality to the platform.

“We believe the increase in valuation over the past year reflects the progress of our complete DevOps platform towards realizing a greater share of the growing, multi-billion dollar software development market,” he told TechCrunch.

While the startup has raised over $434 million, this round involved buying employee stock options, a move that allows the company’s workers to cash in some of their equity prior to going public. CNBC reported that the firms buying the stock included Alta Park, HMI Capital, OMERS Growth Equity, TCV and Verition.

The next logical step would appear to be an IPO, something the company has never shied away from. In fact, at one point it included the proposed date of November 18, 2020 as a target IPO date on the company wiki. While they didn’t quite make that goal, Sijbrandij still sees the company going public at some point. He’s just not being as specific as in the past, suggesting that the company has plenty of runway left from the last funding round and can go public when the timing is right.

“We continue to believe that being a public company is an integral part of realizing our mission. As a public company, GitLab would benefit from enhanced brand awareness, access to capital, shareholder liquidity, autonomy and transparency,” he said.

He added, “That said, we want to maximize the outcome by selecting an opportune time. Our most recent capital raise was in 2019 and contributed to an already healthy balance sheet. A strong balance sheet and business model enables us to select a period that works best for realizing our long-term goals.”

GitLab has published not only its IPO goals on its wiki, but also its entire company philosophy, goals and OKRs for everyone to see. Sijbrandij told TechCrunch’s Alex Wilhelm at a TechCrunch Disrupt panel in September that he believes that transparency helps attract and keep employees. It doesn’t hurt that the company was and remains a fully remote organization, even pre-COVID.

“We started [this level of] transparency to connect with the wider community around GitLab, but it turned out to be super beneficial for attracting great talent as well,” Sijbrandij told Wilhelm in September.

The company, which launched in 2014, offers a DevOps platform to help move applications through the programming lifecycle.

Update: The original headline of this story has been changed from ‘GitLab raises $195M in secondary funding on $6 billion valuation.’

 

Dec
07
2020
--

3 questions to ask before adopting microservice architecture

As a product manager, I’m a true believer that you can solve any problem with the right product and process, even one as gnarly as the multiheaded hydra that is microservice overhead.

Working for Vertex Ventures US this summer was my chance to put this to the test. After interviewing 30+ industry experts from a diverse set of companies — Facebook, Fannie Mae, Confluent, Salesforce and more — and hosting a webinar with the co-founders of PagerDuty, LaunchDarkly and OpsLevel, we were able to answer three main questions:

  1. How do teams adopt microservices?
  2. What are the main challenges organizations face?
  3. Which strategies, processes and tools do companies use to overcome these challenges?

How do teams adopt microservices?

Out of dozens of companies we spoke with, only two had not yet started their journey to microservices, but both were actively considering it. Industry trends mirror this as well. In an O’Reilly survey of 1500+ respondents, more than 75% had started to adopt microservices.

It’s rare for companies to start building with microservices from the ground up. Of the companies we spoke with, only one had done so. Some startups, such as LaunchDarkly, planned to build their infrastructure using microservices but turned to a monolith once they realized the high cost of the overhead.

“We were spending more time effectively building and operating a system for distributed systems versus actually building our own services so we pulled back hard,” said John Kodumal, CTO and co-founder of LaunchDarkly.

“As an example, the things we were trying to do in mesosphere, they were impossible,” he said. “We couldn’t do any logging. Zero downtime deploys were impossible. There were so many bugs in the infrastructure and we were spending so much time debugging the basic things that we weren’t building our own service.”

As a result, it’s more common for companies to start with a monolith and move to microservices to scale their infrastructure with their organization. Once a company reaches ~30 developers, most begin decentralizing control by moving to a microservice architecture.


Large companies with established monoliths are keen to move to microservices, but costs are high and the transition can take years. Atlassian’s platform infrastructure is in microservices, but legacy monoliths in Jira and Confluence persist despite ongoing decomposition efforts. Large companies often get stuck in this transition. However, a combination of strong top-down strategy and bottom-up dev team support can help companies, such as Freddie Mac, make substantial progress.

Some startups, like Instacart, first shifted to a modular monolith that allows the code to reside in a single repository while beginning the process of distributing ownership of discrete code functions to relevant teams. This enables them to mitigate the overhead associated with a microservice architecture by balancing the visibility of having a centralized repository and release pipeline with the flexibility of discrete ownership over portions of the codebase.

What challenges do teams face?

Teams may take different routes to arrive at a microservice architecture, but they tend to face a common set of challenges once they get there. John Laban, CEO and co-founder of OpsLevel, which helps teams build and manage microservices told us that “with a distributed or microservices based architecture your teams benefit from being able to move independently from each other, but there are some gotchas to look out for.”

Indeed, the linked O’Reilly chart shows how the top 10 challenges organizations face when adopting microservices are shared by 25%+ of respondents. While we discussed some of the adoption blockers above, feedback from our interviews highlighted issues around managing complexity.

The lack of a coherent definition for a service can cause teams to generate unnecessary overhead by creating too many similar services or spreading related services across different groups. One company we spoke with went down the path of decomposing their monolith and took it too far. Their service definitions were too narrow, and by the time decomposition was complete, they were left with 4,000+ microservices to manage. They then had to backtrack and consolidate down to a more manageable number.

Defining too many services creates unnecessary organizational and technical silos while increasing complexity and overhead. Logging and monitoring must be present on each service, but with ownership spread across different teams, a lack of standardized tooling can create observability headaches. It’s challenging for teams to get a single-pane-of-glass view with too many different interacting systems and services that span the entire architecture.

Dec
01
2020
--

AWS announces DevOps Guru to find operational issues automatically

At AWS re:Invent today, Andy Jassy announced DevOps Guru, a new tool to help the operations side of DevOps teams find issues that could be having an impact on application performance. Consider it the sibling of CodeGuru, the service the company announced last year to find issues in your code before you deploy.

It works in a similar fashion using machine learning to find issues on the operations side of the equation. “I’m excited to launch a new service today called Amazon DevOps Guru, which is a new service that uses machine learning to identify operational issues long before they impact customers,” Jassy said today.

The way it works is that it collects and analyzes data from application metrics, logs, and events “to identify behavior that deviates from normal operational patterns,” the company explained in the blog post announcing the new service.

This service essentially gives AWS a product that competes with companies like Sumo Logic, Datadog or Splunk by providing deep operational insight into problems that could be having an impact on your application, such as misconfigurations or resources that are over capacity.

When it finds a problem, the service can send an SMS, Slack message or other communication to the team and provides recommendations on how to fix the problem as quickly as possible.

What’s more, you pay for the data analyzed by the service, rather than a monthly fee. The company says this means that there is no upfront cost or commitment involved.
