Jul
30
2021
--

Improve PostgreSQL Query Performance Insights with pg_stat_monitor

Improve PostgreSQL Query Performance pg_stat_monitor

Understanding query performance patterns is essentially the foundation for query performance tuning. It, in many ways, dictates how a database cluster evolves. And then there are obviously direct and indirect cost connotations as well.

PostgreSQL provides very detailed statistics through a number of catalog views and extensions that can be easily added to provide more detailed query statistics. With each view focused on a particular aspect, the picture almost always needs to be stitched together by combining different datasets. That requires effort and still, the whole picture might not be complete.

The pg_stat_monitor extension attempts to provide a more holistic picture by providing much-needed query performance insights in a single view. The extension has been evolving over the past year and is now nearing the GA release.

Some Useful Extensions

Currently, you may be relying on a number of extensions to understand how a query behaves, the time taken in planning and execution phases, min/max/meantime values, index hits, query plan, and client application details. Here are some extensions that you might already be very familiar with.

pg_stat_activity

This view is available by default with PostgreSQL. It provides one row per server process along with current activity and query text.

In case you’d like to learn more about it, hop over to the official PostgreSQL documentation here.

pg_stat_statements

This extension is part of the contrib packages provided with the PostgreSQL server. However, you’d have to create the extension manually. It’s a query-wise aggregation of statistical data with min/max/mean/standard deviation for execution and planning times and various useful information and query text.

You can read more about pg_stat_statements at the official PostgreSQL documentation site.

auto_explain

Another useful extension is provided by the PostgreSQL server. It dumps query plans in the server log for any query exceeding a time threshold specified by a GUC

(Grand Unified Configuration).

You can find more about auto_explain here.

pg_stat_monitor

Whilst all previously mentioned views/extensions are great in their own right, one needs to manually combine client/connection information from pg_stat_activity, statistical data from pg_stat_statements, and query plan from auto_analyze to complete the dataset to understand query performance patterns

And that’s precisely the pain that pg_stat_monitor alleviates.

The feature set has been growing over the past year, with it providing, in a single view, all performance-related information that you may need to debug a low performant query. For more information about the extension see our GitHub repository, or for user-specific documentation, see our user guide.

Feature Set

Some features that were part of earlier releases are already discussed in this blog, however, for completeness, I’m going to discuss those here as well.

  • Time Interval Grouping: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals; time buckets. This allows for much better data accuracy, especially in the case of high resolution or unreliable networks.
  • Multi-Dimensional Grouping:  While pg_stat_statements groups counters by (userid, dbid, queryid),  pg_stat_monitor uses a more detailed group for higher precision:
    • Bucket ID (bucket),
    • User ID (userid),
    • Database ID (dbid),
    • Query ID (queryid),
    • Client IP Address (client_ip),
    • Plan ID (planid),
    • Application Name (application_name).

This allows you to drill down into the performance of queries coming from particular client addresses and applications, which we at Percona have found to be very valuable in a number of cases.

  • Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose if you want to see queries with placeholders for parameters or actual query examples.
  • Query Plan: Each SQL is now accompanied by its actual plan that was constructed for its execution. Also, we found having query parameter values is very helpful, as you can run EXPLAIN on it, or easily play with modifying the query to make it run better, as well as making communication about the query clearer when discussing with other DBAs and application developers.
  • Tables Access Statistics for a Statement: This allows us to easily identify all queries that accessed a given table. This set is at par with the information provided by the pg_stat_statements.
  • Histogram: Visual representation is very helpful when it can help identify issues. With the help of the histogram function, you can now view a timing/calling data histogram in response to a SQL query. And yes, it even works in psql.
SELECT * FROM histogram(0, 'F44CD1B4B33A47AF') AS a(range TEXT, freq INT, bar TEXT);
       range        | freq |              bar
--------------------+------+--------------------------------
  (0 - 3)}          |    2 | ??????????????????????????????
  (3 - 10)}         |    0 |
  (10 - 31)}        |    1 | ???????????????
  (31 - 100)}       |    0 |
  (100 - 316)}      |    0 |
  (316 - 1000)}     |    0 |
  (1000 - 3162)}    |    0 |
  (3162 - 10000)}   |    0 |
  (10000 - 31622)}  |    0 |
  (31622 - 100000)} |    0 |
(10 rows)

  • Functions: It may come as a surprise, but we do understand that functions may internally execute statements!!! To help ease the tracking and analysis, pg_stat_monitor now provides a column that specifically helps keep track of the top query for a statement so that you can backtrack to the originating function.
  • Relation Names: Relations used in a query are available in the “relations” column in the pg_stat_monitor view. This reduces work at your and makes analysis simpler and quicker.
  • Query Types: With query classification as SELECT, INSERT, UPDATE or DELETE, analysis becomes simpler. It’s another effort reduced at your end, and another simplification by pg_stat_monitor.
SELECT bucket, substr(query,0, 50) AS query, cmd_type FROM pg_stat_monitor WHERE elevel = 0;
 bucket |                       query                       | cmd_type 
--------+---------------------------------------------------+----------
      4 | END                                               | 
      4 | SELECT abalance FROM pgbench_accounts WHERE aid = | SELECT
      4 | vacuum pgbench_branches                           | 
      4 | select count(*) from pgbench_branches             | SELECT
      4 | UPDATE pgbench_accounts SET abalance = abalance + | UPDATE
      4 | truncate pgbench_history                          | 
      4 | INSERT INTO pgbench_history (tid, bid, aid, delta | INSERT

  • Query Metadata: Google’s Sqlcommenter is a useful tool that in a way bridges that gap between ORM libraries and understanding database performance. And we support it. So, you can now put any key value data in the comments in /* … */ syntax (see Sqlcommenter documentation for details) in your SQL statements, and the information will be parsed by pg_stat_monitor and made available in the comments column in pg_stat_monitor view.
CREATE EXTENSION hstore;
CREATE FUNCTION text_to_hstore(s text) RETURNS hstore AS $$
BEGIN
    RETURN hstore(s::text[]);
EXCEPTION WHEN OTHERS THEN
    RETURN NULL;
END; $$ LANGUAGE plpgsql STRICT;


SELECT 1 AS num /* { "application", java_app, "real_ip", 192.168.1.1} */;
 num 
-----
   1
(1 row)

SELECT query, text_to_hstore(comments)->'real_ip' AS real_ip from pg_stat_monitor;
query                                                                       |  real_ip 
----------------------------------------------------------------------------+-------------
 SELECT $1 AS num /* { "application", psql_app, "real_ip", 192.168.1.3) */  | 192.168.1.1

  • Logging Error and Warning: As seen in different monitoring/statics collector tools, most of the tools/extensions only monitor the successful queries. But in many cases, monitoring ERROR, WARNING, and LOG give meaningful information to debug the issue. pg_stat_monitor not only monitors the ERROR/WARNINGS/LOG but also collects the statistics about these queries. In PostgreSQL queries with ERROR/WARNING there is an error level (elevel), SQL Code (sqlcode), and an error message is attached. Pg_stat_monitor collects all this information along with its aggregates.
SELECT substr(query,0,50) AS query, decode_error_level(elevel) AS elevel,sqlcode, calls, substr(message,0,50) message 
FROM pg_stat_monitor;
                       query                       | elevel | sqlcode | calls |                      message                      
---------------------------------------------------+--------+---------+-------+---------------------------------------------------
 select substr(query,$1,$2) as query, decode_error |        |       0 |     1 | 
 select bucket,substr(query,$1,$2),decode_error_le |        |       0 |     3 | 
 select 1/0;                                       | ERROR  |     130 |     1 | division by zero

We’ve Come a Long Way

What started as a concept is now nearing its final approach. The pg_stat_monitor extension has evolved and has become very feature-rich. We have no doubt about its usefulness for DBAs, performance engineers, application developers, and anyone who needs to look at query performance. We believe it can help save many hours and help identify unexpected query behaviors. 

pg_stat_monitor is available on Github. We are releasing it to get feedback from the community on what we’re doing right and what we should do differently before we release pg_stat_monitor as a generally available version to be supported for years to come. Please check it out,  drop us a note, file an issue, or make a pull request!

Try Percona Distribution for PostgreSQL today; it’s free to download and use! 

Jul
30
2021
--

Percona Monthly Bug Report: July 2021

July 2021 Percona Bug Report

July 2021 Percona Bug ReportHere at Percona, we operate on the premise that full transparency makes a product better. We strive to build the best open-source database products, but also to help you manage any issues that arise in any of the databases that we support. And, in true open-source form, report back on any issues or bugs you might encounter along the way.

We constantly update our bug reports and monitor other boards to ensure we have the latest information, but we wanted to make it a little easier for you to keep track of the most critical ones. These posts are a central place to get information on the most noteworthy open and recently resolved bugs. 

In this July 2021 edition of our monthly bug report, we have the following list of bugs:

Percona Server for MySQL/MySQL Bugs

 MySQL#77239: When using multi-threaded replication, SQL thread in replica stop with the following error sometimes, and further SQL Thread does not preserve this information about the error in its status and replica starts without any error.

Last_SQL_Errno: 1756

Last_SQL_Error: … The slave coordinator and worker threads are stopped, possibly leaving data in an inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases, you have to examine your data (see the documentation for details).

There are few cases mentioned in the bug report which will trigger this error, one of the cases is when regularly stopping replication occasionally it throws 1756 error for no apparent reason, even if resuming replication works without problems later. 

In my test, I’m not able to reproduce this issue with the Percona Server for MySQL.

Affects Version/s: 8.0 [Tested/Reported version 8.0.25]

 

PS-7778 (MySQL#104168): The prepare statement can be failed when triggers with DEFINER are used. This is a regression bug introduce after WL#9384 implementation. Issue not reproducible in lower versions, tested with 8.0.20

Affects Version/s: 8.0 [Tested/Reported version 8.0.22, 8.0.25]

 

MySQL#102586:  When doing a multiple-table DELETE that is anticipating a foreign key ON DELETE CASCADE, the statements work on the primary but it breaks row-based replication.

Affects Version/s: 8.0, 5.7  [Tested/Reported version 8.0.23, 5.7.33]

 

Percona XtraDB Cluster

PXC-3449: When ALTER TABLE (TOI) is executed in a user session, sometimes it happens that it conflicts (MDL) with high priority transaction, which causes BF-BF to abort and server termination.

Affects Version/s: 8.0  [Tested/Reported version 8.0.21]

Fixed Version/s: 8.0.25

 

Percona XtraBackup

PXB-2375:  In some cases, XtraBackup will write the wrong binlog filename, pos, and GTID combination info in xtrabackup_binlog_info. Due to this, XtraBackup might not work as expected with GTID.

If we are using this backup with GTID position details in xtrabackup_binlog_info to create a new replica, then most likely replication will break due to incorrect GTID position.

Looks like the GTID position is not consistent with binlog file pos, they are captured differently and later printed together in xtrabackup_binlog_info  file.

Workaround to avoid this bug,

  •  Use binary log coordinates 
  • Take a backup in non-peak hours since this issue mostly occurred when MySQL is under heavy write operations.

Affects Version/s:  8.0 [Tested/Reported version 8.0.14]

 

Percona Toolkit

PT-1889: Incorrect output when using pt-show-grants for users based on MySQL roles, and as a result, they can not be applied back properly on MySQL server. Due to this, we can not use pt-show-grants for MySQL roles until this issue is fixed.

Affects Version/s:  3.2.1

 

PT-1747: pt-online-schema-change was bringing the database into a broken state when applying the “rebuild_constraints” foreign keys modification method if any of the child tables were blocked by the metadata lock.

Affects Version/s:  3.0.13

Fixed Version: 3.3.2

 

PMM  [Percona Monitoring and Management]

PMM-8004: When mongos connection terminates/interrupted could be kill -9 , timeout, network issue, etc, if at this time mongodb_exporter is trying to get DB status or databases list from mongos then it crashes at runtime with panic error.

Affects Version/s: 2.x  [Tested/Reported version 2.18,2.19]

 

PMM-8307: Users unable to use PMM with a large number of DB servers (500+) added for monitoring due to default configuration limitations for allowed numbers of connections.

As a workaround, we can increase worker_connections on Nginx.

Affects Version/s:  2.x  [Tested/Reported version 2.18.0]

Fixed version: 2.21.0

 

PMM-7846:  Adding MongoDB instance via pmm-admin with tls option is not working and failing with error Connection check failed: timeout (context deadline exceeded)

Affects Version/s: 2.x  [Tested/Reported version 2.13, 2.16]

 

PMM-4665: Frequent error messages in pmm-agent.log for components like tokudb storage engine which are not supported by upstream MySQL. As a result, it increases the overall log file size due to these additional messages.

Affects Version/s:  2.x  [Tested/Reported version 2.13.0]

Fixed version: 2.19.0

 

Summary

We welcome community input and feedback on all our products. If you find a bug or would like to suggest an improvement or a feature, learn how in our post, How to Report Bugs, Improvements, New Feature Requests for Percona Products.

For the most up-to-date information, be sure to follow us on Twitter, LinkedIn, and Facebook.

Quick References:

Percona JIRA  

MySQL Bug Report

Report a Bug in a Percona Product

MySQL 8.0.24 Release notes

___

About Percona:

As the only provider of distributions for all three of the most popular open source databases—PostgreSQL, MySQL, and MongoDB—Percona provides expertise, software, support, and services no matter the technology.

Whether it’s enabling developers or DBAs to realize value faster with tools, advice, and guidance, or making sure applications can scale and handle peak loads, Percona is here to help.

Percona is committed to being open source and preventing vendor lock-in. Percona contributes all changes to the upstream community for possible inclusion in future product releases.

Jul
29
2021
--

4 key areas SaaS startups must address to scale infrastructure for the enterprise

Startups and SMBs are usually the first to adopt many SaaS products. But as these customers grow in size and complexity — and as you rope in larger organizations — scaling your infrastructure for the enterprise becomes critical for success.

Below are four tips on how to advance your company’s infrastructure to support and grow with your largest customers.

Address your customers’ security and reliability needs

If you’re building SaaS, odds are you’re holding very important customer data. Regardless of what you build, that makes you a threat vector for attacks on your customers. While security is important for all customers, the stakes certainly get higher the larger they grow.

Given the stakes, it’s paramount to build infrastructure, products and processes that address your customers’ growing security and reliability needs. That includes the ethical and moral obligation you have to make sure your systems and practices meet and exceed any claim you make about security and reliability to your customers.

Here are security and reliability requirements large customers typically ask for:

Formal SLAs around uptime: If you’re building SaaS, customers expect it to be available all the time. Large customers using your software for mission-critical applications will expect to see formal SLAs in contracts committing to 99.9% uptime or higher. As you build infrastructure and product layers, you need to be confident in your uptime and be able to measure uptime on a per customer basis so you know if you’re meeting your contractual obligations.

While it’s hard to prioritize asks from your largest customers, you’ll find that their collective feedback will pull your product roadmap in a specific direction.

Real-time status of your platform: Most larger customers will expect to see your platform’s historical uptime and have real-time visibility into events and incidents as they happen. As you mature and specialize, creating this visibility for customers also drives more collaboration between your customer operations and infrastructure teams. This collaboration is valuable to invest in, as it provides insights into how customers are experiencing a particular degradation in your service and allows for you to communicate back what you found so far and what your ETA is.

Backups: As your customers grow, be prepared for expectations around backups — not just in terms of how long it takes to recover the whole application, but also around backup periodicity, location of your backups and data retention (e.g., are you holding on to the data too long?). If you’re building your backup strategy, thinking about future flexibility around backup management will help you stay ahead of these asks.

Jul
29
2021
--

ConverseNow is targeting restaurant drive-thrus with new $15M round

One year after voice-based AI technology company ConverseNow raised a $3.3 million seed round, the company is back with a cash infusion of $15 million in Series A funding in a round led by Craft Ventures.

The Austin-based company’s AI voice ordering assistants George and Becky work inside quick-serve restaurants to take orders via phone, chat, drive-thru and self-service kiosks, freeing up staff to concentrate on food preparation and customer service.

Joining Craft in the Series A round were LiveOak Venture Partners, Tensility Venture Partners, Knoll Ventures, Bala Investments, 2048 Ventures, Bridge Investments, Moneta Ventures and angel investors Federico Castellucci and Ashish Gupta. This new investment brings ConverseNow’s total funding to $18.3 million, Vinay Shukla, co-founder and CEO of ConverseNow, told TechCrunch.

As part of the investment, Bryan Rosenblatt, partner at Craft Ventures, is joining the company’s board of directors, and said in a written statement that “post-pandemic, quick-service restaurants are primed for digital transformation, and we see a unique opportunity for ConverseNow to become a driving force in the space.”

At the time when ConverseNow raised its seed funding in 2020, it was piloting its technology in just a handful of stores. Today, it is live in over 750 stores and grew seven times in revenue and five times in headcount.

Restaurants were some of the hardest-hit industries during the pandemic, and as they reopen, Shukla said their two main problems will be labor and supply chain, and “that is where our technology intersects.”

The AI assistants are able to step in during peak times when workers are busy to help take orders so that customers are not waiting to place their orders, or calls get dropped or abandoned, something Shukla said happens often.

It can also drive more business. ConverseNow said it is shown to increase average orders by 23% and revenue by 20%, while adding up to 12 hours of extra deployable labor time per store per week.

Company co-founder Rahul Aggarwal said more people prefer to order remotely, which has led to an increase in volume. However, the more workers have to multitask, the less focus they have on any one job.

“If you step into restaurants with ConverseNow, you see them reimagined,” Aggarwal said. “You find workers focusing on the job they like to do, which is preparing food. It is also driving better work balance, while on the customer side, you don’t have to wait in the queue. Operators have more time to churn orders, and service time comes down.”

ConverseNow is one of the startups within the global restaurant management software market that is forecasted to reach $6.94 billion by 2025, according to Grand View Research. Over the past year, startups in the space attracted both investors and acquirers. For example, point-of-sale software company Lightspeed acquired Upserve in December for $430 million. Earlier this year, Sunday raised $24 million for its checkout technology.

The new funding will enable ConverseNow to continue developing its line-busting technology and invest in marketing, sales and product innovation. It will also be working on building a database from every conversation and onboarding new customers quicker, which involves inputting the initial menu.

By leveraging artificial intelligence, the company will be able to course-correct any inconsistencies, like background noise on a call, and better predict what a customer might be saying. It will also correct missing words and translate the order better. In the future, Shukla and Aggarwal also want the platform to be able to tell what is going on around the restaurant — what traffic is like, the weather and any menu promotions to drive upsell.

 

Jul
29
2021
--

PostgreSQL PL/Java – A How-To, Part 1

PostgreSQL PL:Java

We’ve recently received some questions regarding PL/Java and I found it hard to get clear instructions searching on the internet. It’s not that there is no good information out there, but most of it is either incomplete, outdated, or confusing and I decided to create this short “how-to” and show how to install it and how to get it running with few examples.

Installation

I will show here how to install it from sources, first because my platform doesn’t have the compiled binaries, and second because if your platform has the binaries from the package manager you can just install it from there, for example using YUM or APT. Also, note that I’m using PL/Java without the designation “TRUSTED” and a Postgres database superuser for simplicity. I would recommend reading the documentation about users and privileges here[1].

The versions of the software I’m using here are:

  • PostgreSQL 12.7
  • PL/Java 1.6.2
  • OpenJDK 11
  • Apache Maven 3.6.3

I downloaded the sources from “https://github.com/tada/pljava/releases“, unpackaged and compiled with maven:

wget https://github.com/tada/pljava/archive/refs/tags/V1_6_2.tar.gz
tar -xf V1_6_2.tar.gz
cd pljava-1_6_2
mvn clean install
java -jar pljava-packaging/target/pljava-pg12.jar

I’ll assume here that you know maven enough and won’t go through the “mvn” command. The “java -jar pljava-packaging/target/pljava-pg12.jar” will copy/install the needed files and packages into Postgres folders. Note that maven used my Postgres version and created the jar file with the version: “pljava-pg12.jar“, so pay attention to the version you have there as the jar file will change if you have a different Postgres version!

I can now create the extension into the database I will use it. I’m using the database “demo” in this blog:

$ psql demo
psql (12.7)
Type "help" for help.

demo=# CREATE EXTENSION pljava;
WARNING: Java virtual machine not yet loaded
DETAIL: libjvm: cannot open shared object file: No such file or directory
HINT: SET pljava.libjvm_location TO the correct path to the jvm library (libjvm.so or jvm.dll, etc.)
ERROR: cannot use PL/Java before successfully completing its setup
HINT: Check the log for messages closely preceding this one, detailing what step of setup failed and what will be needed, probably setting one of the "pljava." configuration variables, to complete the setup. If there is not enough help in the log, try again with different settings for "log_min_messages" or "log_error_verbosity".

Not exactly what I was expecting but I got a good hint: “HINT: SET pljava.libjvm_location TO the correct path to the jvm library (libjvm.so or jvm.dll, etc.)“. Ok, I had to find the libjvm my system is using to configure Postgres. I used the SET command to do it online:

demo=# SET pljava.libjvm_location TO '/usr/lib/jvm/java-11-openjdk-11.0.11.0.9-5.fc34.x86_64/lib/server/libjvm.so';
NOTICE: PL/Java loaded
DETAIL: versions:
PL/Java native code (1.6.2)
PL/Java common code (1.6.2)
Built for (PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3), 64-bit)
Loaded in (PostgreSQL 12.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.1.1 20210531 (Red Hat 11.1.1-3), 64-bit)
OpenJDK Runtime Environment (11.0.11+9)
OpenJDK 64-Bit Server VM (11.0.11+9, mixed mode, sharing)
NOTICE: PL/Java successfully started after adjusting settings
HINT: The settings that worked should be saved (using ALTER DATABASE demo SET ... FROM CURRENT or in the "/v01/data/db/pg12/postgresql.conf" file). For a reminder of what has been set, try: SELECT name, setting FROM pg_settings WHERE name LIKE 'pljava.%' AND source = 'session'
NOTICE: PL/Java load successful after failed CREATE EXTENSION
DETAIL: PL/Java is now installed, but not as an extension.
HINT: To correct that, either COMMIT or ROLLBACK, make sure the working settings are saved, exit this session, and in a new session, either: 1. if committed, run "CREATE EXTENSION pljava FROM unpackaged", or 2. if rolled back, simply "CREATE EXTENSION pljava" again.
SET

Also used the “ALTER SYSTEM” to make it persistent across all my databases as it writes the given parameter setting to the “postgresql.auto.conf” file, which is read in addition to “postgresql.conf“:

demo=# ALTER SYSTEM SET pljava.libjvm_location TO '/usr/lib/jvm/java-11-openjdk-11.0.11.0.9-5.fc34.x86_64/lib/server/libjvm.so';
ALTER SYSTEM

Now we have it installed we can check the system catalog if it is indeed there:

demo=# SELECT * FROM pg_language WHERE lanname LIKE 'java%';
oid    | lanname | lanowner | lanispl | lanpltrusted | lanplcallfoid | laninline | lanvalidator | lanacl
-------+---------+----------+---------+--------------+---------------+-----------+--------------+-------------------
16428  | java    | 10       | t       | t            | 16424         | 0         | 16427        | {charly=U/charly}
16429  | javau   | 10       | t       | f            | 16425         | 0         | 16426        |
(2 rows)

And test if it is working:

demo=# CREATE FUNCTION getProperty(VARCHAR)
RETURNS VARCHAR
AS 'java.lang.System.getProperty'
LANGUAGE java;
CREATE FUNCTION
demo=# SELECT getProperty('java.version');
getproperty
-------------
11.0.11
(1 row)

It’s working! Time to try something useful.

Accessing Database Objects with PL/Java

The majority of the examples I found showed how to do a “hello world” from a Java class or how to calculate a Fibonacci sequence but nothing how to access database objects. Well, nothing wrong with those examples but I suppose that one who installs PL/Java in his database would like to access database objects from inside of a Java function and this is what we gonna do here.

I will use the sample database “pagila” that can be found here[2] for our tests in this post.

For this first example, I will create a simple class with a static method that will be accessed outside like any Postgres function. The function will receive an integer argument and use it to search the table “customer”, column “customer_id” and will print the customer’s id, full name, email,  and address:

package com.percona.blog.pljava;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Customers {
	private static String m_url = "jdbc:default:connection";

	public static String getCustomerInfo(Integer id) throws SQLException {
		Connection conn = DriverManager.getConnection(m_url);
		String query = "SELECT c.customer_id, c.last_name ||', '|| c.first_name as full_name, "
				+ " c.email, a.address, ci.city, a.district "
				+ " FROM customer c"
				+ "	 JOIN address a on a.address_id = c.address_id "
				+ "	JOIN city ci on ci.city_id = a.city_id "
				+ " WHERE customer_id = ?";

		PreparedStatement stmt = conn.prepareStatement(query);
		stmt.setInt(1, id);
		ResultSet rs = stmt.executeQuery();
		rs.next();
		String ret; 
		ret = "- ID: " + rs.getString("customer_id") ;
		ret += "\n- Name: " + rs.getString("full_name");
		ret += "\n- Email: " + rs.getString("email");
		ret += "\n- Address: " + rs.getString("address");
		ret += "\n- City: " + rs.getString("city");
		ret += "\n- District: " + rs.getString("district");
		ret += "\n--------------------------------------------------------------------------------";

		stmt.close();
		conn.close();

		return (ret);
	}
}

I’ve compiled and created the “jar” file manually with the below commands:

javac com/percona/blog/pljava/Customers.java
jar -c -f /app/pg12/lib/demo.jar com/percona/blog/pljava/Customers.class

Note that I’ve created the jar file inside the folder “/app/pg12/lib”, keep notes because we’ll use this information in the next step, loading the jar file inside Postgres:

demo=# SELECT sqlj.install_jar( 'file:///app/pg12/lib/demo.jar', 'demo', true );
 install_jar 
-------------
(1 row)

demo=# SELECT sqlj.set_classpath( 'public', 'demo' );
 set_classpath 
---------------
(1 row)

The install_jar function has the signature “install_jar(<jar_url>, <jar_name>, <deploy>)” and it loads a jar file from a location appointed by an URL into the SQLJ jar repository. It is an error if a jar with the given name already exists in the repository or if the jar doesn’t exist in the URL or the database isn’t able to read it:

demo=# SELECT sqlj.install_jar( 'file:///app/pg12/lib/demo2.jar', 'demo', true );
ERROR:  java.sql.SQLException: I/O exception reading jar file: /app/pg12/lib/demo2.jar (No such file or directory)
demo=# SELECT sqlj.install_jar( 'file:///app/pg12/lib/demo.jar', 'demo', true );
 install_jar 
------------- 
(1 row)

demo=# SELECT sqlj.install_jar( 'file:///app/pg12/lib/demo.jar', 'demo', true );
ERROR:  java.sql.SQLNonTransientException: A jar named 'demo' already exists

The function set_classpath defines a classpath for the given schema, in this example the schema “public”. A classpath consists of a colon-separated list of jar names or class names. It’s an error if the given schema does not exist or if one or more jar names references non-existent jars.

The next step is to create the Postgres functions:

demo=# CREATE FUNCTION getCustomerInfo( INT ) RETURNS CHAR AS 
    'com.percona.blog.pljava.Customers.getCustomerInfo( java.lang.Integer )'
LANGUAGE java;
CREATE FUNCTION

We can now use it:

demo=# SELECT getCustomerInfo(100);
                                 getcustomerinfo                                  
----------------------------------------------------------------------------------
 - ID: 100                                                                       +
 - Name: HAYES, ROBIN                                                            +
 - Email: ROBIN.HAYES@sakilacustomer.org                                         +
 - Address: 1913 Kamakura Place                                                  +
 - City: Jelets                                                                  +
 - District: Lipetsk                                                             +
 --------------------------------------------------------------------------------
(1 row)

Sweet, we have our first Java function inside our Postgres demo database.

Now, in our last example here I will add another method to this class, now to list all the payments from a given customer and calculate its total:

package com.percona.blog.pljava;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Customers {
	private static String m_url = "jdbc:default:connection";

	public static String getCustomerInfo(Integer id) throws SQLException {
		Connection conn = DriverManager.getConnection(m_url);
		String query = "SELECT c.customer_id, c.last_name ||', '|| c.first_name as full_name, "
				+ " c.email, a.address, ci.city, a.district "
				+ " FROM customer c"
				+ "	 JOIN address a on a.address_id = c.address_id "
				+ "	JOIN city ci on ci.city_id = a.city_id "
				+ " WHERE customer_id = ?";

		PreparedStatement stmt = conn.prepareStatement(query);
		stmt.setInt(1, id);
		ResultSet rs = stmt.executeQuery();
		rs.next();
		String ret; 
		ret = "- ID: " + rs.getString("customer_id") ;
		ret += "\n- Name: " + rs.getString("full_name");
		ret += "\n- Email: " + rs.getString("email");
		ret += "\n- Address: " + rs.getString("address");
		ret += "\n- City: " + rs.getString("city");
		ret += "\n- District: " + rs.getString("district");
		ret += "\n--------------------------------------------------------------------------------";

		stmt.close();
		conn.close();

		return (ret);
	}

	public static String getCustomerTotal(Integer id) throws SQLException {
		Connection conn;
		PreparedStatement stmt;
		ResultSet rs;
		String result;
		double total;

		conn = DriverManager.getConnection(m_url);
		stmt = conn.prepareStatement(
				"SELECT c.customer_id, c.first_name, c.last_name FROM customer c WHERE c.customer_id = ?");
		stmt.setInt(1, id);
		rs = stmt.executeQuery();
		if (rs.next()) {
			result = "Customer ID  : " + rs.getInt("customer_id");
			result += "\nCustomer Name: " + rs.getString("last_name") + ", " + rs.getString("first_name");
			result += "\n--------------------------------------------------------------------------------------------------------";
		} else {
			return null;
		}

		stmt = conn.prepareStatement("SELECT p.payment_date, p.amount FROM payment p WHERE p.customer_id = ? ORDER BY 1");
		stmt.setInt(1, id);
		rs = stmt.executeQuery();
		total = 0;

		while (rs.next()) {
			result += "\nPayment date: " + rs.getString("payment_date") + ",    Value: " + rs.getString("amount");
			total += rs.getFloat("amount");
		}
		result += "\n--------------------------------------------------------------------------------------------------------";
		result += "\nTotal: " +String.format("%1$,.2f",  total);
		
		stmt.close();
		conn.close();
		return (result);
	}
}

Same instructions to compile:

javac com/percona/blog/pljava/Customers.java 
jar -c -f /app/pg12/lib/demo.jar com/percona/blog/pljava/Customers.class

Then we need to replace the loaded jar file for the newly created and create the function inside Postgres:

demo=# SELECT sqlj.replace_jar( 'file:///app/pg12/lib/demo.jar', 'demo', true );
 replace_jar 
-------------
(1 row)

demo=# CREATE FUNCTION getCustomerTotal( INT ) RETURNS CHAR AS 
    'com.percona.blog.pljava.Customers.getCustomerTotal( java.lang.Integer )'
LANGUAGE java;
CREATE FUNCTION

And the result is:

demo=# SELECT getCustomerInfo(100);
                                 getcustomerinfo                                  
----------------------------------------------------------------------------------
 - ID: 100                                                                       +
 - Name: HAYES, ROBIN                                                            +
 - Email: ROBIN.HAYES@sakilacustomer.org                                         +
 - Address: 1913 Kamakura Place                                                  +
 - City: Jelets                                                                  +
 - District: Lipetsk                                                             +
 --------------------------------------------------------------------------------
(1 row)

We finish this part here and with this last example. At this point, we are able to access objects, loop through a resultset, and return the result back as a single object like a TEXT. I will discuss how to return an array/resultset, how to use PL/Java functions within triggers, and how to use external resources in part two and part three of this article, stay tuned!

[1] https://tada.github.io/pljava/use/policy.html
[2] https://www.postgresql.org/ftp/projects/pgFoundry/dbsamples/pagila/

Jul
29
2021
--

Homebase raises $71M for a team management platform aimed at SMBs and their hourly workers

Small and medium enterprises have become a big opportunity in the world of B2B technology in the last several years, and today a startup that’s building tools aimed at helping them manage their teams of workers is announcing some funding that underscores the state of that market.

Homebase, which provides a platform that helps SMBs manage various services related to their hourly workforces, has closed $71 million in funding, a Series C that values the company between $500 million and $600 million, according to sources close to the startup.

The round has a number of big names in it that are as much a sign of how large VCs are valuing the SMB market right now as it is of the strategic interest of the individuals who are participating. GGV Capital is leading the round, with past backers Bain Capital Ventures, Baseline Ventures, Bedrock, Cowboy Ventures and Khosla Ventures also participating. Individuals include Focus Brands President Kat Cole; Jocelyn Mangan, a board member at Papa John’s and Chownow and former COO of Snag; former CFO of payroll and benefits company Gusto, Mike Dinsdale; Guild Education founder Rachel Carlson; star athletes Jrue and Lauren Holiday; and alright alright alright actor and famous everyman and future political candidate Matthew McConaughey.

Homebase has raised $108 million to date.

The funding is coming on the heels of strong growth for Homebase (which is not to be confused with the U.K./Irish home improvement chain of the same name, nor the YC-backed Vietnamese proptech startup).

The company now has some 100,000 small businesses, with 1 million employees in total, on its platform. Businesses use Homebase to manage all manner of activities related to workers that are paid hourly, including (most recently) payroll, as well as shift scheduling, timeclocks and timesheets, hiring and onboarding, communication and HR compliance.

John Waldmann, Homebase’s founder and CEO, said the funding will go toward both continuing to bring on more customers as well as expanding the list of services offered to them, which could include more features geared to frontline and service workers, as well as features for small businesses who might also have some “desk” workers who might still work hourly.

The common thread, Waldmann said, is not the exact nature of those jobs, but the fact that all of them, partly because of that hourly aspect, have been largely underserved by tech up to now.

“From the beginning, our mission was to help local businesses and their teams,” he said. Part of his inspiration came from people he knew: a childhood friend who owned an independent, expanding restaurant chain, and was going through the challenges of managing his teams there, carrying out most of his work on paper; and his sister, who worked in hospitality, which didn’t look all that different from his restaurant friend’s challenges. She had to call in to see when she was working, writing her hours in a notebook to make sure she got paid accurately. 

“There are a lot of tech companies focused on making work easier for folks that sit at computers or desks, but are building tools for these others,” Waldmann said. “In the world of work, the experience just looks different with technology.”

Homebase currently is focused on the North American market — there are some 5 million small businesses in the U.S. alone, and so there is a lot of opportunity there. The huge pressure that many have experienced in the last 16 months of COVID-19 living, leading some to shut down altogether, has also focused them on how to manage and carry out work much more efficiently and in a more organized way, ensuring you know where your staff is and that your staff knows what it should be doing at all times.

What will be interesting is to see what kinds of services Homebase adds to its platform over time: In a way, it’s a sign of how hourly wage workers are becoming a more sophisticated and salient aspect of the workforce, with their own unique demands. Payroll, which is now live in 27 states, also comes with pay advances, opening the door to other kinds of financial services for Homebase, for example.

“Small businesses are the lifeblood of the American economy, with more than 60% of Americans employed by one of our 30 million small businesses. In a post-pandemic world, technology has never been more important to businesses of all sizes, including SMBs,” Jeff Richards, managing partner at GGV Capital and new Homebase board member, said in a statement. “The team at Homebase has worked tirelessly for years to bring technology to SMBs in a way that helps drive increased profitability, better hiring and growth. We’re thrilled to see Homebase playing such an important role in America’s small business recovery and thrilled to be part of the mission going forward.”

It’s interesting to see McConaughey involved in this round, given that he’s most recently made a turn toward politics, with plans to run for governor of Texas in 2022.

“Hardworking people who work in and run restaurants and local businesses are important to all of us,” he said in a statement. “They play an important role in giving our cities a sense of livelihood, identity and community. This is why I’ve invested in Homebase. Homebase brings small business operations into the modern age and helps folks across the country not only continue to work harder, but work smarter.”

Jul
29
2021
--

Coralogix logs $55M for its new take on production analytics, now valued at $300M-$400M

Data may be the new oil, but it’s only valuable if you make good use of it. Today, a startup that has built a new kind of production analytics platform for developers, security engineers, and data scientists to track and better understand how data is moving around their networks is announcing a round of funding that underscores the demand for their technology.

Coralogix, which provides stateful streaming services to engineering teams, has picked up $55 million in a Series C round of funding.

The round was led by Greenfield Partners, with Red Dot Capital Partners, StageOne Ventures, Eyal Ofer’s O.G. Tech, Janvest Capital Partners, Maor Investments and 2B Angels also participating.

This Series C is coming about 10 months after the company’s Series B of $25 million, and from what we understand, Coralogix’s valuation is now in the range of $300 million to $400 million, a big jump for the startup, coming on the back of it growing 250% since this time last year, racking up some 2,000 paying customers, with some small teams paying as little as $100/year and large enterprises paying $1.5 million/year.

Previously, Coralogix — founded in Tel Aviv and with an HQ also in San Francisco — had also raised a round of $10 million.

Coralogix got its start as a platform aimed at quality assurance support for R&D and engineering teams. The focus here is on log analytics and metrics for platform engineers, and this still forms a big part of its business today. Added to that, in recent years, Coralogix’s tools are being applied to cloud security services, contributing to a company’s threat intelligence by providing a way to observe data for any inconsistencies that typically might point to a breach or another incident. (It integrates with Alien Vault and others for this purpose.)

The third area that is just picking up now and will be developed further — one of the uses of this investment, in fact — will be to develop how Coralogix is used for business intelligence. This is a particularly interesting area because it plays into how Coralogix is built, to provide analytics on data before it is indexed.

“It’s about high-volume, but low-value data,” Ariel Assaraf, Coralogix’s CEO, said in an interview. “Customers don’t want to store the data [or index it] but want to view it live and visualize it. We are starting to see a use case where business information and our analytics come together for sentiment analysis and other areas.”

There are dozens of strong companies providing tools these days to cover log analytics and data observability, underscoring the general growth and importance of DevOps these days. They include companies like DataDog, Sumo Logic and Splunk.

However, Assaraf believes that what sets his company apart is its approach: Essentially, it has devised a way of observing and analyzing data streams before they get indexed, giving engineers more flexibility to query the data in different ways and glean more insights, faster. The other issue with indexing, he said, is that it impacts latency, which also has a big impact on overall costs for an organization.

For many of Coralogix’s competitors, turning around the nature of the business to focus not first on indexing would be akin to completely rebuilding the business, hard to do at their scale (although this is what Coralogix did when it pivoted as a small company several years ago, which is when Assaraf took on the role of CEO). One company he believes might be more of a direct rival is Confluent.

“I think we will see Confluent getting into observability very soon because they have the streaming capabilities,” he said, “but not the tools we have.” Another potential competitor looming on the horizon: Salesforce, and its potential move into that area, underscores the shifting sands of what is powering enterprise IT investment decisions today.

Salesforce already has Heroku, Slack and Tableau, three major tools developers use for tracking and working with data, Assaraf pointed out, and there were strong rumors of it trying to buy DataDog, “so we definitely see where they are going. For sure, they understand the way things are changing. All the budgets when Salesforce first started were in marketing and sales. Now you sell to IT. Salesforce understands that shift to developers, and so that is where they are going.”

It makes for a very interesting landscape and future for companies like Coralogix, one that investors believe the startup will continue to shape as it has up to now.

“The dramatic shift in digital transformation is generating an explosion of data, which until now has forced enterprises to decide between cost and coverage,” said Shay Grinfeld, managing partner at Greenfield Partners. “Coralogix’s real-time streaming analytics pipeline employs proprietary algorithms to break this tradeoff and generate significant cost savings. Coralogix has built a customer roster that comprises some of the largest and most innovative companies in the world. We’re thrilled to partner with Ariel and the Coralogix team on their journey to reinvent the future of data observability.”

Jul
28
2021
--

MongoDB: Modifying Documents Using Aggregation Pipelines and Update Expressions

MongoDB Modifying Documents

MongoDB Modifying DocumentsUpdating documents on MongoDB prior to version 4.2 was quite limited. It was not possible to set values to a conditional expression, combine fields, or update a field based on the value of another field on the server-side. Tracing a parallel to the SQL update statements, for example, it wasn’t possible to do something like the following:

Update t1 set t1.f1 = t1.f2 where…

It wasn’t possible to use a conditional expression either, something easily achieved with SQL standards:

UPDATE t1 SET t1.f1 = CASE WHEN f2 = 1 THEN 1 WHEN f2 = 2 THEN 5 END WHERE…

If something similar to both examples above was required and the deployment was 3.4+, probably the usage of $addFields would be an alternative way to accomplish it. However, it would not touch the current document because the $out output destination could only be a different collection.

With older versions, the only way around was creating a cursor with aggregation pipelines and iterating it on the client side. Inside the loop, it was possible to update using the proper $set values. It was a difficult and tedious task, which would result in a full javascript code.

With MongoDB 4.2 and onwards, it is possible to use an aggregation pipeline to update MongoDB documents conditionally, allowing the update/creation of a field based on another field. This article presents some very common/basic operations which are easily achieved with SQL databases.

Field Expressions in Update Commands (v4.2+)

Updating a field with the value of some other field:

This is similar to the classic example of an SQL command: update t1 set t1.f1 = t1.f2 + t1.f3

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.update({_id:3},[{$set:{result:{$add: [ "$f2", "$f3" ] } }} ]);
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest2.find({_id:3});
{ "_id" : 3, "f1" : 30, "f2" : 300, "f3" : 3000, "result" : 3300 }

The key point is the “$” on the front of the field names being referenced (“f2” and “f3” in this example). These are the simplest type of field path expression, as they’re called in the MongoDB documentation. You’ve probably seen them in the aggregation pipeline before, but it was only in v4.2 that you could also use them in a normal update command.

Applying “CASE” conditions:

It is quite suitable now to determine conditions for a field value while updating a collection:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({_id:3});
{ "_id" : 3, "grade" : 8 }


replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.update(
  { _id: 3}, 
  [
    { $set: {result : { 
      $switch: {branches: [
        { case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, 
        { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, 
        { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } 
      ] } 
    } } } 
  ] 
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })


replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({ _id: 3});
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Adding new fields for a specific filtered doc:

Let’s say that you want to stamp a document with the updated date = NOW and add a simple comment field:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})
{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z") }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.update( 
  { _id: 3 }, 
  [ 
    { $set: { "comment": "Comment3", mod_date: "$$NOW"} } 
  ] 
)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest.find({_id:3})
{ "_id" : 3, "description" : "Field 3", "rating" : 2, "updt_date" : ISODate("2021-05-06T22:00:00Z"), "comment" : "Comment3", "mod_date" : ISODate("2021-07-05T18:48:44.710Z") }

Reaching several Docs with the same expression:

It is possible now to either use the command updateMany() and reach multiple docs with the same pipeline. 

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});
{ "_id" : 1, "grade" : 8}
{ "_id" : 2, "grade" : 5}
{ "_id" : 3, "grade" : 8}

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.updateMany({}, 
  [
    { $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } 
  ] 
)
{ "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 2 }

replset:PRIMARY> db.getSiblingDB("dbtest").colltest3.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5, "result" : "NOPE" }
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Or use the command option { multi: true } if you want to stick to using the original db.collection.update() command. Note that the default is false, which means that only the first occurrence will be updated.

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.update({}, [{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } ] )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5 }
{ "_id" : 3, "grade" : 8 }

When specifying {multi:true} the expected outcome is finally achieved:

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.update({}, [{ $set: {result : { $switch: {branches: [{ case: { $gte: [ "$grade", 7 ] }, then: "PASSED" }, { case: { $lte: [ "$grade", 5 ] }, then: "NOPE" }, { case: { $eq: [ "$grade", 6 ] }, then: "UNDER ANALYSIS" } ] } } } } ],{multi:true} )
WriteResult({ "nMatched" : 3, "nUpserted" : 0, "nModified" : 2 })

replset:PRIMARY> db.getSiblingDB("dbtest").colltest4.find({});
{ "_id" : 1, "grade" : 8, "result" : "PASSED" }
{ "_id" : 2, "grade" : 5, "result" : "NOPE" }
{ "_id" : 3, "grade" : 8, "result" : "PASSED" }

Update by $merge Stage in the Aggregation Pipeline

Prior to version 4.2, addressing the result of an aggregate pipeline to a new collection was achieved by using $out. Starting on version 4.2 it is possible to use $merge which is way more flexible considering that while using $out, the entire collection will be replaced, and with merge, it is possible to replace a single document and a few or more things. You may want to refer to the comparison table described here:

https://docs.mongodb.com/manual/reference/operator/aggregation/merge/#std-label-merge-out-comparison

With MongoDB 4.4 and onwards, it is allowed to update a collection directly on the aggregate pipeline through the $merge stage. The magic happens after determining the output collection with the same name as the one being aggregated. The example below illustrates how to flag the max grade of the student Rafa in math class:

  • Original document
replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"})
{ "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math" }

  • The aggregation pipeline
replset:PRIMARY> db.getSiblingDB("dbtest").students2.aggregate( [{ $match : { "name": "Rafa","class":"math" } }, {$project:{maxGrade:{$max:"$grades"}}}, {$merge : { into: { db: "dbtest", coll: "students2" }, on: "_id",  whenMatched: "merge", whenNotMatched:"discard"} } ]);

  • Checking the result
replset:PRIMARY> db.getSiblingDB("dbtest").students2.find({"name": "Rafa","class":"math"})
{ "_id" : ObjectId("6100081e21f08fe0d19bda41"), "name" : "Rafa", "grades" : [ 4, 5, 6, 9 ], "class" : "math", "maxGrade" : 9 }

Note that the maxGrade field was merged into the doc, flagging that the max grade achieved by that student in math was 9.

Watch out: behind the scenes, the merge will trigger an update against the same collection. If that update changes the physical location of the document, the update might revisit the same document multiple times or even get into an infinite loop (Halloween Problem)

The other cool thing is using the $merge stage to work exactly how a SQL command INSERT AS SELECT works (and this is possible with MongoDB 4.2 and onwards). The example below demonstrates how to fill the collection colltest_reporting with the result of an aggregation hit against colltest5.

replset:PRIMARY> db.getSiblingDB("dbtest").colltest5.aggregate( [{ $match : { class: "A" } }, { $group: { _id: "$class",maxGrade: { $max: "$grade" } }},  {$merge : { into: { db: "dbtest", coll: "colltest_reporting" }, on: "_id",  whenMatched: "replace", whenNotMatched: "insert" } } ] );
replset:PRIMARY> db.getSiblingDB("dbtest").colltest_reporting.find()
{ "_id" : "A", "maxGrade" : 8 }

Conclusion

There are plenty of new possibilities which will make a developer’s life easier (especially the life of those developers who are coming from SQL databases) considering that the aggregation framework provides several operators and various different stages to play. Although, it is important to highlight that the complexity of a pipeline may incur performance degradation (that may be a topic for another blog post). For more information on updates with aggregation pipelines, please refer to the official documentation.

Jul
28
2021
--

Building and Testing Percona Distribution for MongoDB Operator

Testing Percona Distribution for MongoDB Operator

Testing Percona Distribution for MongoDB OperatorRecently I wanted to play with the latest and greatest Percona Distribution for MongoDB Operator which had a bug fix I was interested in. The bug fix was merged in the main branch of the git repository, but no version of the Operator that includes this fix was released yet. I started the Operator by cloning the main branch, but the bug was still reproducible. The reason was simple – the main branch had the last released version of the Operator in bundle.yaml, instead of the main branch build:

  spec:
      containers:
        - name: percona-server-mongodb-operator
          image: percona/percona-server-mongodb-operator:1.9.0

instead of 

   spec:
      containers:
        - name: percona-server-mongodb-operator
          image: perconalab/percona-server-mongodb-operator:main

Then I decided to dig deeper to see how hard it is to do a small change in the Operator code and test it.

This blog post is a beginner contributor guide where I tried to follow our CONTRIBUTING.md and Building and testing the Operator manual to build and test Percona Distribution for MongoDB Operator.

Requirements

The requirements section was the first blocker for me as I’m used to running Ubuntu, but examples that we have are for CentOS and MacOS. For all Ubuntu fans below are the instructions: 

echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
sudo apt-get update
sudo apt-get install -y google-cloud-sdk docker.io kubectl jq
sudo snap install helm --classic
sudo snap install yq --channel=v3/stable
curl -s -L https://github.com/openshift/origin/releases/download/v3.11.0/openshift-origin-client-tools-v3.11.0-0cbc58b-linux-64bit.tar.gz | sudo tar -C /usr/bin --strip-components 1 --wildcards -zxvpf - '*/oc'

I have also prepared a Pull Request to fix our docs and drafted a cloud-init file to simplify environment provisioning.

Build

Get the code from GitHub main branch:

git clone https://github.com/percona/percona-server-mongodb-operator

Change some code. Now it is time to build the Operator image and push it to the registry. DockerHub is a nice choice for beginners as it does not require any installation or configuration, but for keeping it local you might want to install your own registry. See Docker Registry, Harbor, Trow.

./e2e-tests/build

command builds the image and pushes it to the registry which you specify in IMAGE environment variable like this:

export IMAGE=bob/my_repository_for_test_images:K8SPSMDB-372-fix-feature-X

Fixing the Issues

For me the execution of the build command failed for multiple reasons:

1. Most probably you need to run it as root to get access to docker unix socket or just add the user to the docker group:

Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock

2. Once I ran it with root I got the following error:

"--squash" is only supported on a Docker daemon with experimental features enabled

It is quite easy to fix it by adding the experimental flag into /etc/docker/daemon.json file:

{
    "experimental": true
}

I have added it into the cloud-init file and will fix it in the same PR in the docs.

3. The third failure was on the last stage of pushing the image: 

denied: requested access to the resource is denied

Obviously, you should be authorized to push to the registry.

docker login

fixed it for me just fine.

Finally, the image is built and pushed to the registry:

The push refers to repository [docker.io/bob/my_repository_for_test_images]
0014bf17d462: Pushed
...
K8SPSMDB-372-fix-feature-X: digest: sha256:458066396fdd6ac358bcd78ed4d8f5279ff0295223f1d7fbec0e6d429c01fb16 size: 949

Test

./e2e-tests/run

command executes the tests in e2e-tests folder one-by-one, as you see there are multiple scenarios:

"$dir/init-deploy/run" || fail "init-deploy"
"$dir/limits/run" || fail "limits"
"$dir/scaling/run" || fail "scaling"
"$dir/monitoring/run" || fail "monitoring"
"$dir/monitoring-2-0/run" || fail "monitoring-2-0"
"$dir/liveness/run" || fail "liveness"
"$dir/one-pod/run" || fail "one-pod"
"$dir/service-per-pod/run" || fail "service-per-pod"
"$dir/arbiter/run" || fail "arbiter"
"$dir/demand-backup/run" || fail "demand-backup"
"$dir/demand-backup-sharded/run" || fail "demand-backup-sharded"
"$dir/scheduled-backup/run" || fail "scheduled-backup"
"$dir/security-context/run" || fail "security-context"
"$dir/storage/run" || fail "storage"
"$dir/self-healing/run" || fail "self-healing"
"$dir/self-healing-chaos/run" || fail "self-healing-chaos"
"$dir/operator-self-healing/run" || fail "operator-self-healing"
"$dir/operator-self-healing-chaos/run" || fail "operator-self-healing-chaos"
"$dir/smart-update/run" || fail "smart-update"
"$dir/version-service/run" || fail "version-service"
"$dir/users/run" || fail "users"
"$dir/rs-shard-migration/run" || fail "rs-shard-migration"
"$dir/data-sharded/run" || fail "data-sharded"
"$dir/upgrade/run" || fail "upgrade"
"$dir/upgrade-sharded/run" || fail "upgrade-sharded"
"$dir/upgrade-consistency/run" || fail "upgrade-consistency"
"$dir/pitr/run" || fail "pitr"
"$dir/pitr-sharded/run" || fail "pitr-sharded"

Obviously, it is possible to run the tests one by one.

It is required to have kubectl configured and pointing to the working Kubernetes cluster. If something is missing or not working the tests are going to tell you that.

The only issue I faced is the readability of the test results. The logging of the test execution is pretty verbose, so I would recommend redirecting the output to some file for further debugging purposes. 

`./e2e-tests/run >> /tmp/e2e-tests.out 2>&1

We in Percona rely on Jenkins to automatically test and verify the results for each Pull Request.

Conclusion

Contribution guides are written for developers by developers, so they often have some gaps or unclear instructions which sometimes require experience to resolve. Such minor issues might scare off potential contributors and as a result, the project does not get the Pull Request with an awesome implementation of the brightest idea. Percona embraces open source culture and values contributors by providing simple tools to develop and test the ideas.

Writing this blog post resulted in two Pull Requests:

  1. Use
    :main

     tag for container images in the main branch (link)

  2. Removing some gaps in the docs (link)

There is always room for improvement and a time to find a better way. Please let us know if you face any issues with contributing your ideas to Percona products. You can do that on the Community Forum or JIRA. Read more about contribution guidelines for Percona Distribution for MongoDB Operator in CONTRIBUTING.md.

Jul
28
2021
--

Atera raises $77M at a $500M valuation to help SMBs manage their remote networks like enterprises do

When it comes to software to help IT manage workers’ devices wherever they happen to be, enterprises have long been spoiled for choice — a situation that has come in especially handy in the last 18 months, when many offices globally have gone remote and people have logged into their systems from home. But the same can’t really be said for small and medium enterprises: As with so many other aspects of tech, they’ve long been overlooked when it comes to building modern IT management solutions tailored to their size and needs.

But there are signs of that changing. Today, a startup called Atera that has been building remote, and low-cost, predictive IT management solutions specifically for organizations with less than 1,000 employees, is announcing a funding round of $77 million — a sign of the demand in the market, and Atera’s own success in addressing it. The investment values Atera at $500 million, the company confirmed.

The Tel Aviv-based startup has amassed some 7,000 customers to date, managing millions of endpoints — computers and other devices connected to them — across some 90 countries, providing real-time diagnostics across the data points generated by those devices to predict problems with hardware, software and network, or with security issues.

Atera’s aim is to use the funding both to continue building out that customer footprint, and to expand its product — specifically adding more functionality to the AI that it currently uses (and for which Atera has been granted patents) to run predictive analytics, one of the technologies that today are part and parcel of solutions targeting larger enterprises but typically are absent from much of the software out there aimed at SMBs.

“We are in essence democratizing capabilities that exist for enterprises but not for the other half of the economy, SMBs,” said Gil Pekelman, Atera’s CEO, in an interview.

The funding is being led by General Atlantic, and it is notable for being only the second time that Atera has ever raised money — the first was earlier this year, a $25 million round from K1 Investment Management, which is also in this latest round. Before this year, Atera, which was founded in 2016, turned profitable in 2017 and then intentionally went out of profit in 2019 as it used cash from its balance sheet to grow. Through all of that, it was bootstrapped. (And it still has cash from that initial round earlier this year.)

As Pekelman — who co-founded the company with Oshri Moyal (CTO) — describes it, Atera’s approach to remote monitoring and management, as the space is typically called, starts first with software clients installed at the endpoints that connect into a network, which give IT managers the ability to monitor a network, regardless of the actual physical range, as if it’s located in a single office. Around that architecture, Atera essentially monitors and collects “data points” covering activity from those devices — currently taking in some 40,000 data points per second.

To be clear, these data points are not related to what a person is working on, or any content at all, but how the devices behave, and the diagnostics that Atera amasses and focuses on cover three main areas: hardware performance, networking and software performance and security. Through this, Atera’s system can predict when something might be about to go wrong with a machine, or why a network connection might not be working as it should, or if there is some suspicious behavior that might need a security-oriented response. It supplements its work in the third area with integrations with third-party security software — Bitdefender and Acronis among them — and by issuing updated security patches for devices on the network.

The whole system is built to be run in a self-service way. You buy Atera’s products online, and there are no salespeople involved — in fact most of its marketing today is done through Facebook and Google, Pekelman said, which is one area where it will continue to invest. This is one reason why it’s not really targeting larger enterprises (the others are the level of customization that would be needed; as well as more sophisticated service level agreements). But it is also the reason why Atera is so cheap: it costs $89 per month per IT technician, regardless of the number of endpoints that are being managed.

“Our constituencies are up to 1,000 employees, which is a world that was in essence quite neglected up to now,” Pekelman said. “The market we are targeting and that we care about are these smaller guys and they just don’t have tools like these today.” Since its model is $89 dollars per month per technician using the software, it means that a company with 500 people with four technicians is paying $356 per month to manage their networks, peanuts in the greater scheme of IT services, and one reason why Atera has caught on as more and more employees have gone remote and are looking like they will stay that way.

The fact that this model is thriving is also one of the reason and investors are interested.

“Atera has developed a compelling all-in-one platform that provides immense value for its customer base, and we are thrilled to be supporting the company in this important moment of its growth trajectory,” said Alex Crisses, MD, global head of New Investment Sourcing and co-head of Emerging Growth at General Atlantic, in a statement. “We are excited to work with a category-defining Israeli company, extending General Atlantic’s presence in the country’s cutting-edge technology sector and marking our fifth investment in the region. We look forward to partnering with Gil, Oshri and the Atera team to help the company realize its vision.”

Powered by WordPress | Theme: Aeros 2.0 by TheBuckmaker.com