May 28, 2021
--

Once a buzzword, digital transformation is reshaping markets

The notion of digital transformation evolved from a buzzword joke to a critical and accelerating fact during the COVID-19 pandemic. The changes wrought by a global shift to remote work and schooling are myriad, but in the business realm they have yielded a change in corporate behavior and consumer expectation — changes that showed up in a bushel of earnings reports this week.

TechCrunch may tend to have a private-company focus, but we do keep tabs on public companies in the tech world, as they often provide hints, notes and other pointers on how startups may be faring. In this case, however, we’re working in reverse; startups have told us for several quarters now that their markets are picking up momentum as customers shake up their buying behavior, to the distinct advantage of companies that help customers move into the digital realm. And public company results are now confirming the startups’ perspective.

The accelerating digital transformation is real, and we have the data to support the point.

What follows is a digest of notes concerning the recent earnings results from Box, Sprout Social, Yext, Snowflake and Salesforce. We’ll take each in brief to save time, but as always there’s more digging to be done if you have time. Let’s go!

Enterprise earnings go up

Kicking off with Yext, the company beat expectations in its most recent quarter. Today its shares are up 18%. And a call with the company’s CEO Howard Lerman underscored our general thesis regarding the digital transformation’s acceleration.

In brief, Yext’s evolution from a company that plugged corporate information into external search engines to building and selling search tech itself has been resonating in the market. Why? Lerman explained that consumers more and more expect digital service in response to their questions — “who wants to call a 1-800 number,” he asked rhetorically — which is forcing companies to rethink the way they handle customer inquiries.

In turn, those companies are looking to companies like Yext that offer technology to better answer customer queries in a digital format. It’s customer-friendly, and could save companies money as call centers are expensive. A change in behavior accelerated by the pandemic is forcing companies to adapt, driving their purchase of more digital technologies like this.

It’s proof that a transformation doesn’t have to be dramatic to have pretty strong impacts on how corporations buy and sell online.

May 28, 2021
--

Talking Drupal #296 – Linux 4 Everyone

Today joining us is Jason Evangelho to talk about his journey to Linux and the community he is building.

www.talkingdrupal.com/296

Topics

  • Guest Host Fatima joining us
  • Guest Jason Evangelho
  • Thanks to vaccines, Fatima was able to visit a friend she had not seen in a while
  • John back from vacation
  • Nic is painting his basement
  • Stephen’s first week not producing the show
  • Jason bought tickets to see the Foo Fighters live… in 2022
  • Now known as a “Linux guy”, what triggered your switch to Linux?
  • What is Linux for Everyone all about?
  • Proton
  • Linux for Everyone podcast, YouTube, and community
  • Where do you stand on the scale between always using FOSS and using proprietary software?
  • Nic and Stephen advocate for Linux on Talking Drupal
  • What holds the Linux market share back?
  • EDU and Gov in Linux
  • Best benefit of using Linux
  • Top tips for switching to Linux

Resources

  • Ditching Windows: 2 Weeks With Ubuntu Linux On The Dell XPS 13
  • Linux for Everyone
  • Proton
  • Voice.com
  • Make it Linux
  • Lenovo Hardware Video
  • Gnu Health

Guests

Jason Evangelho linux4everyone.com @Linux4Everyone

Hosts

Stephen Cross – www.stephencross.com @stephencross

Nic Laflin – www.nLighteneddevelopment.com @nicxvan

John Picozzi – www.oomphinc.com @johnpicozzi

Fatima Sarah Khalid – @sugaroverflow

May 27, 2021
--

Box beats expectations, raises guidance as it looks for a comeback

Box executives have been dealing with activist investor Starboard Value over the last year, along with fighting through the pandemic like the rest of us. Today the company reported earnings for the first quarter of its fiscal 2022. Overall, it was a good quarter for the cloud content management company.

The firm reported revenue of $202.4 million, up 10% compared to its year-ago result, numbers that beat Box’s projection of between $200 million and $201 million. Yahoo Finance reports the analyst consensus was $200.5 million, so the company also bested street expectations.

The company has faced strong headwinds the past year, in spite of a climate that has been generally favorable to cloud companies like Box. A report like this was badly needed by the company as it faces a board fight with Starboard over its direction and leadership.

Company co-founder and CEO Aaron Levie is hoping this report will mark the beginning of a positive trend. “I think you’ve got a better economic climate right now for IT investment. And then secondarily, I think the trends of hybrid work, and the sort of long term trends of digital transformation are very much supportive of our strategy,” he told TechCrunch in a post-earnings interview.

While Box acquired e-signature startup SignRequest in February, it won’t actually be incorporating that functionality into the platform until this summer. Levie said that what’s been driving the modest revenue growth is Box Shield, the company’s content security product, and the platform tools, which enable customers to customize workflows and build applications on top of Box.

The company is also seeing success with large accounts. Levie says that he saw the number of customers spending more than $100,000 with it grow by nearly 50% compared to the year-ago quarter. One of Box’s growth strategies has been to expand the platform and then upsell additional platform services over time, and those numbers suggest that the effort is working.

While Levie was keeping his M&A cards close to the vest, he did say if the right opportunity came along to fuel additional growth through acquisition, he would definitely give strong consideration to further inorganic growth. “We’re going to continue to be very thoughtful on M&A. So we will only do M&A that we think is attractive in terms of price and the ability to accelerate our roadmap, or the ability to get into a part of a market that we’re not currently in,” Levie said.

A closer look at the financials

Box managed modest growth acceleration for the quarter, though the acceleration exists only if we consider the company’s results on a sequential basis. In simpler terms, Box’s newly reported 10% growth in the first quarter of its fiscal 2022 was better than the 8% growth it earned during the fourth quarter of its fiscal 2021, but worse than the 13% growth it managed in its year-ago Q1.

With Box, however, instead of judging it by normal rules, we’re hunting in its numbers each quarter for signs of promised acceleration. By that standard, Box met its own goals.

How did investors react? Shares of the company were mixed after-hours, including a sharp dip and recovery in the value of its equity. The street appears confused by the results, weighing whether the moderately accelerating growth is enticing enough to warrant holding onto the stock, or, more perversely, whether that growth is not expansive enough to fend off external parties hunting for more dramatic changes at the firm.

Sticking to a high-level view of Box’s results, apart from its growth numbers Box has done a good job shaking fluff out of its operations. The company’s operating margins (GAAP and not) both improved, and cash generation also picked up.

Perhaps most importantly, Box raised its guidance from “the range of $840 million to $848 million” to “$845 million to $853 million.” Is that a lot? No. It’s +$5 million to both the lower and upper bounds of its targets. But if you squint, the company’s Q4-to-Q1 revenue acceleration and upgraded guidance could be an early indicator of a return to form.

Levie admitted that 2020 was a tough year for Box. “Obviously, last year was a complicated year in terms of the macro environment, the pandemic, just lots of different variables to deal with…” he said. But the CEO continues to think that his organization is set up for future growth.

Will Box manage to perform well enough to keep activist shareholders content? Levie thinks if he can string together more quarters like this one, he can keep Starboard at bay. “I think when you look at the next three quarters, the ability to guide up on revenue, the ability to guide up on profitability. We think it’s a very very strong earnings report and we think it shows a lot of the momentum in the business that we have right now.”

May 27, 2021
--

New Features in PostgreSQL 14: Bulk Inserts for Foreign Data Wrappers

Foreign data wrappers, based on SQL/MED, are one of the coolest features of PostgreSQL, and their feature set has been expanding since version 9.1. The PostgreSQL 14 beta is out and GA will be available shortly, so it is a good time to study the upcoming features, which include several improvements to foreign data wrappers. One new performance feature, “Bulk Insert”, was added in PostgreSQL 14: the API has been extended to allow bulk inserts into a foreign table, so any foreign data wrapper can now implement bulk insert, which is definitely more efficient than inserting individual rows.

The API contains two new functions that can be used to implement the bulk insert.

There is no need to explain these functions in detail here; they are mainly of interest to people who want to add this functionality to their own foreign data wrapper, such as mysql_fdw, mongo_fdw, or oracle_fdw, and they are described in the PostgreSQL documentation. The good news is that postgres_fdw already implements them in PostgreSQL 14.

A new server option, batch_size, has been added; you can specify it when creating the foreign server or the foreign table (and, as sketched after the examples below, you can also add or change it later with ALTER SERVER or ALTER FOREIGN TABLE).

  • Create a postgres_fdw extension
CREATE EXTENSION postgres_fdw;

  • Create a foreign server without batch_size
CREATE SERVER postgres_svr 
       FOREIGN DATA WRAPPER postgres_fdw 
       OPTIONS (host '127.0.0.1');

CREATE USER MAPPING FOR vagrant
       SERVER postgres_svr
       OPTIONS (user 'postgres', password 'pass');

CREATE FOREIGN TABLE foo_remote (a INTEGER,
                                 b CHAR,
                                 c TEXT,
                                 d VARCHAR(255))
       SERVER postgres_svr
       OPTIONS(table_name 'foo_local');

EXPLAIN (VERBOSE, COSTS OFF) insert into foo_remote values (generate_series(1, 1), 'c', 'text', 'varchar');
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Insert on public.foo_remote
   Remote SQL: INSERT INTO public.foo_local(a, b, c, d) VALUES ($1, $2, $3, $4)
   Batch Size: 1
   ->  ProjectSet
         Output: generate_series(1, 1), 'c'::character(1), 'text'::text, 'varchar'::character varying(255)
         ->  Result
(6 rows)

  • Execution time with batch_size not specified
EXPLAIN ANALYZE 
        INSERT INTO foo_remote
        VALUES (generate_series(1, 100000000),
                'c',
                'text',
                'varchar');
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
Insert on foo_remote  (cost=0.00..500000.02 rows=0 width=0) (actual time=4591443.250..4591443.250 rows=0 loops=1)
   ->  ProjectSet  (cost=0.00..500000.02 rows=100000000 width=560) (actual time=0.006..31749.132 rows=100000000 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 4.988 ms
Execution Time: 4591447.101 ms -- timing is important
(5 rows)

  • Create a foreign table with batch_size = 10, for the case where no batch_size was specified at server creation
CREATE FOREIGN TABLE foo_remote (a INTEGER,
                                 b CHAR,
                                 c TEXT,
                                 d VARCHAR(255))
        SERVER postgres_svr OPTIONS(table_name 'foo_local', batch_size '10');

  • Create a foreign server with batch_size = 10; every table on that server will then use a batch_size of 10
CREATE SERVER postgres_svr_bulk
       FOREIGN DATA WRAPPER postgres_fdw
       OPTIONS (host '127.0.0.1', batch_size = '10'); -- new option batch_size

CREATE USER MAPPING FOR vagrant
       SERVER postgres_svr_bulk
       OPTIONS (user 'postgres', password 'pass');

CREATE FOREIGN TABLE foo_remote_bulk (a INTEGER,
                                      b CHAR,
                                      c TEXT,
                                      d VARCHAR(255))
        SERVER postgres_svr_bulk OPTIONS(table_name 'foo_local_bulk');

EXPLAIN (VERBOSE, COSTS OFF) insert into foo_remote_bulk values (generate_series(1, 1), 'c', 'text', 'varchar');
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Insert on public.foo_remote_bulk
   Remote SQL: INSERT INTO public.foo_local_bulk(a, b, c, d) VALUES ($1, $2, $3, $4)
   Batch Size: 10
   ->  ProjectSet
         Output: generate_series(1, 1), 'c'::character(1), 'text'::text, 'varchar'::character varying(255)
         ->  Result
(6 rows)

  • Execution time with batch_size = 10:
EXPLAIN ANALYZE
        INSERT INTO foo_remote_bulk
        VALUES (generate_series(1, 100000000),
                'c',
                'text',
                'varchar');
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Insert on foo_remote_bulk  (cost=0.00..500000.02 rows=0 width=0) (actual time=822224.678..822224.678 rows=0 loops=1)
   ->  ProjectSet  (cost=0.00..500000.02 rows=100000000 width=560) (actual time=0.005..10543.845 rows=100000000 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
 Planning Time: 0.250 ms
 Execution Time: 822239.178 ms -- timing is important
(5 rows)
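If the server or the foreign table already exists, batch_size can also be added or changed later. A minimal sketch, assuming the objects created above:

ALTER SERVER postgres_svr_bulk
      OPTIONS (SET batch_size '100');

ALTER FOREIGN TABLE foo_remote_bulk
      OPTIONS (ADD batch_size '100');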

Conclusion

PostgreSQL keeps expanding the feature set of foreign data wrappers, and Bulk Insert is another good addition. Now that this capability is in the core API, I hope all other foreign data wrappers will implement it as well.

May 27, 2021
--

Breinify announces $11M seed to bring data science to the marketing team

Breinify is a startup working to apply data science to personalization, and to do it in a way that makes it accessible to nontechnical marketing employees so they can build more meaningful customer experiences. Today the company announced a funding round totaling $11 million.

The investment was led by Gutbrain Ventures and PBJ Capital with participation from Streamlined Ventures, CXO Fund, Amino Capital, Startup Capital Ventures and Sterling Road.

Breinify co-founder and CEO Diane Keng says that she and co-founder and CTO Philipp Meisen started the company to bring predictive personalization based on data science to marketers with the goal of helping them improve a customer’s experience by personalizing messages tailored to individual tastes.

“We’re big believers that the world, especially consumer brands, really need strong predictive personalization. But when you think about consumer big brands or the retailers that you buy from, most of them aren’t data scientists, nor do they really know how to activate [machine learning] at scale,” Keng told TechCrunch.

She says that she wanted to make this type of technology more accessible by hiding the complexity behind the algorithms powering the platform. “Instead of telling you how powerful the algorithms are, we show you [what that means for the] consumer experience, and in the end what that means for both the consumer and you as a marketer individually,” she said.

That involves the kind of customizations you might expect around website messaging, emails, texts or whatever channel a marketer might be using to communicate with the buyer. “So the AI decides you should be shown these products, this offer, this specific promotion at this time, [whether it’s] the web, email or SMS. So you’re not getting the same content across different channels, and we do all that automatically for you, and that’s [driven by the algorithms],” she said.

Breinify launched in 2016 and participated in the TechCrunch Disrupt Startup Battlefield competition in San Francisco that year. She said it was early days for the company, but it helped them focus their approach. “I think it gave us a huge stage presence. It gave us a chance to test out the idea just to see where the market was in regards to needing a solution like this. We definitely learned a lot. I think it showed us that people were interested in personalization,” she said. And although the company didn’t win the competition, it ended up walking away with a funding deal.

Today the startup is growing fast and has 24 employees, up from 10 last year. Keng, who is an Asian woman, places a high premium on diversity.

“We partner with about four different kinds of diversity groups right now to source candidates, but at the end of the day, I think if you are someone that’s eager to learn, and you might not have all the skills yet, and you’re [part of an under-represented] group we encourage everyone to apply as much as possible. We put a lot of work into trying to create a really well-rounded group,” she said.

May 27, 2021
--

How InnoDB Handles TEXT/BLOB Columns

Recently we had a debate in the consulting team about how InnoDB handles TEXT/BLOB columns. More specifically, the argument was around the Barracuda file format with dynamic rows.

In the InnoDB official documentation, you can find this extract:

When a table is created with ROW_FORMAT=DYNAMIC, InnoDB can store long variable-length column values (for VARCHAR, VARBINARY, and BLOB and TEXT types) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page.

Whether columns are stored off-page depends on the page size and the total size of the row. When a row is too long, the longest columns are chosen for off-page storage until the clustered index record fits on the B-tree page. TEXT and BLOB columns that are less than or equal to 40 bytes are stored in line.

The first paragraph indicates InnoDB can store long columns fully off-page, in an overflow page. A little further down in the documentation, the behavior of short values, the ones with lengths less than 40 bytes, is described. These short values are stored in the row.

Our argument was about what happens for lengths above 40 bytes: are these values stored in the row or in overflow pages? I don’t really need a reason to start digging into a topic, so imagine when I have one. With simple common tools, let’s experiment and find out.

Experimentation with TEXT/BLOB

For our experiment, we need a MySQL database server, a table with a TEXT column, and the hexdump utility. I spun up a simple LXC instance in my lab with Percona Server 8.0 and created the following table:

CREATE TABLE `testText` (
`id` int(11) unsigned NOT NULL,
`before` char(6) NOT NULL DEFAULT 'before',
`data` text NOT NULL,
`after` char(5) NOT NULL DEFAULT 'after',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The before and after columns are there to help locate the TEXT column in the data file page. We can insert a few rows into the table; a TEXT value whose size is a multiple of 16 is convenient with hexdump, since successive identical lines are replaced by ‘*’.

mysql> insert into testText (id,data) values (1,repeat('0123456789abcdef',1));
Query OK, 1 row affected (0.01 sec)
mysql> insert into testText (id,data) values (2,repeat('0123456789abcdef',2));
Query OK, 1 row affected (0.00 sec)
mysql> insert into testText (id,data) values (3,repeat('0123456789abcdef',3));
Query OK, 1 row affected (0.01 sec)

Then we use the hexdump utility at the shell level; the first leaf page starts at offset 0xc000:

# hexdump -C /var/lib/mysql/test/testText.ibd
...
0000c060  02 00 1b 69 6e 66 69 6d  75 6d 00 04 00 0b 00 00  |...infimum......|
0000c070  73 75 70 72 65 6d 75 6d  10 00 00 10 00 32 00 00  |supremum.....2..|
0000c080  00 01 00 00 00 05 74 3f  d9 00 00 01 5d 01 10 62  |......t?....]..b|
0000c090  65 66 6f 72 65 30 31 32  33 34 35 36 37 38 39 61  |efore0123456789a|
0000c0a0  62 63 64 65 66 61 66 74  65 72 20 00 00 18 00 42  |bcdefafter ....B|
0000c0b0  00 00 00 02 00 00 00 05  74 40 da 00 00 01 5e 01  |........t@....^.|
0000c0c0  10 62 65 66 6f 72 65 30  31 32 33 34 35 36 37 38  |.before012345678|
0000c0d0  39 61 62 63 64 65 66 30  31 32 33 34 35 36 37 38  |9abcdef012345678|
0000c0e0  39 61 62 63 64 65 66 61  66 74 65 72 30 00 00 20  |9abcdefafter0.. |
0000c0f0  ff 7e 00 00 00 03 00 00  00 05 74 45 dd 00 00 01  |.~........tE....|
0000c100  62 01 10 62 65 66 6f 72  65 30 31 32 33 34 35 36  |b..before0123456|
0000c110  37 38 39 61 62 63 64 65  66 30 31 32 33 34 35 36  |789abcdef0123456|
*
0000c130  37 38 39 61 62 63 64 65  66 61 66 74 65 72 00 00  |789abcdefafter..|

Clearly, the values are stored inside the row, even the third one with a length of 48.

Cutoff to an Overflow Page

If we continue to increase the length, the behavior stays the same up to a length of 8080 bytes (505 repetitions). If we add 16 bytes to the length, the row becomes larger than half of the available space on the page. At this point, the TEXT value is moved to an overflow page and replaced by a 20-byte pointer in the row itself.
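A quick way to reproduce the cutoff with the same testText table (the ids here are arbitrary): 505 repetitions of the 16-byte pattern keep the value inline, while 506 push it to an overflow page.

insert into testText (id,data) values (4,repeat('0123456789abcdef',505)); -- 8080 bytes, stored inline
insert into testText (id,data) values (5,repeat('0123456789abcdef',506)); -- 8096 bytes, moved off-page

The pointer in the row looks like this: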

0000c060 02 00 1c 69 6e 66 69 6d 75 6d 00 02 00 0b 00 00 |...infimum......|
0000c070 73 75 70 72 65 6d 75 6d 14 c0 00 00 10 ff f1 00 |supremum........|
0000c080 00 00 01 00 00 00 05 74 80 a4 00 00 01 fe 01 10 |.......t........|
0000c090 62 65 66 6f 72 65 00 00 01 02 00 00 00 04 00 00 |before..........|
0000c0a0 00 26 00 00 00 00 00 00 1f a0 61 66 74 65 72 00 |.&........after.|
0000c0b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

The 20-byte pointer is: 00 00 01 02 00 00 00 04 00 00 00 26 00 00 00 00 00 00 1f a0
Space ID: 00 00 01 02
First overflow page: 00 00 00 04 (4 x 0x4000 = 0x10000)
Offset in overflow page: 00 00 00 26
Version number: 00 00 00 00
Total length of the TEXT value: 00 00 1f a0 (0x1fa0 = 8096 = 16*506)

The overflow page:

00010000 c1 06 3d 24 00 00 00 04 00 00 00 00 00 00 00 00 |..=$............|
00010010 00 00 00 00 74 dd 8e 3f 00 0a 00 00 00 00 00 00 |....t..?........|
00010020 00 00 00 00 01 02 00 00 1f a0 ff ff ff ff 30 31 |..............01|
00010030 32 33 34 35 36 37 38 39 61 62 63 64 65 66 30 31 |23456789abcdef01|
*
00011fc0 32 33 34 35 36 37 38 39 61 62 63 64 65 66 00 00 |23456789abcdef..|
00011fd0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

The value header starting @ 0x10026: 00 00 1f a0 ff ff ff ff
Length in that overflow page: 00 00 1f a0 (0x1fa0 = 8096 = 16*506)
Next overflow page: ff ff ff ff (this means it is the last)

From the above, one can make a few interesting observations:

  • The 20 bytes pointer includes a space_id value so in theory, the overflow page could be in another tablespace (ibd file).
  • The total length is using 4 bytes even though this is a TEXT and not a LONGTEXT column. 2 bytes would have been sufficient.
  • The length of an overflow page chunk is also using 4 bytes for length even though the largest possible InnoDB page size is 64KB.

Performance Impacts

TEXT/BLOB columns are the source of a number of performance-related impacts. Let’s review the most obvious ones.

Storage

The first performance impact is related to storage inefficiency. TEXT/BLOB values are stored in chunks of 16KB if the default InnoDB page size is used. That means, on average, about 8KB per value is lost when overflow pages are used. This leads to larger data files and less efficient caching.

Reads

The presence of TEXT/BLOB columns can significantly increase the number of read IOPs required for queries. For example, let’s consider the following simple query:

SELECT * from myTable limit 10;

If we ignore caching, without TEXT/BLOB columns, the above query would require only one read IOP per level in the primary key btree of myTable. For a small table, this could be one or two read IOPs. Now, if each row has a 1MB TEXT/BLOB column, the same simple query would require in excess of 640 read IOPs since each TEXT/BLOB value is a chain of 64 pages of 16KB.

Writes

For this section, let’s assume a worst-case scenario with row-based replication enabled and a full row image. Now, if we insert a row with a 1MB value in a longText column on a Percona Server 8.0 instance, we have:

+----------------------------------+---------------------------+
| FILE_NAME                        | SUM_NUMBER_OF_BYTES_WRITE |
+----------------------------------+---------------------------+
| /var/lib/mysql/#ib_16384_0.dblwr |                   1572864 |
| /var/lib/mysql/ib_logfile0       |                   1174528 |
| /var/lib/mysql/test/testText.ibd |                   1130496 |
| /var/lib/mysql/binlog.000003     |                   1048879 |
| /tmp/MLfd=41                     |                   1048783 |
| /var/lib/mysql/undo_001          |                    278528 |
| /var/lib/mysql/undo_002          |                    131072 |
| /var/lib/mysql/ibdata1           |                     16384 |
| /var/lib/mysql/mysql.ibd         |                     16384 |
+----------------------------------+---------------------------+

For a total of around 6.4 MB. This is not surprising: there are two log files, and the data is also written twice because of the doublewrite buffer. The temporary file is used for the disk binlog cache and, unless the transaction is very long, it won’t actually be written to storage.
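These per-file write totals come from performance_schema; a query along the following lines (my assumption, as the post doesn’t show the exact statement) returns them, and the counters can be reset between tests by truncating the summary table:

SELECT FILE_NAME, SUM_NUMBER_OF_BYTES_WRITE
FROM performance_schema.file_summary_by_instance
WHERE SUM_NUMBER_OF_BYTES_WRITE > 0
ORDER BY SUM_NUMBER_OF_BYTES_WRITE DESC;

-- reset the counters before the next measurement
TRUNCATE TABLE performance_schema.file_summary_by_instance;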

Anyway, this is just to set the stage for what happens after an update. If I just change the last letter of the longText value, the amount of data written rises to approximately 10 MB.

+----------------------------------+---------------------------+
| FILE_NAME                        | SUM_NUMBER_OF_BYTES_WRITE |
+----------------------------------+---------------------------+
| /var/lib/mysql/#ib_16384_0.dblwr |                   2260992 |
| /var/lib/mysql/test/testText.ibd |                   2211840 |
| /var/lib/mysql/binlog.000003     |                   2097475 |
| /tmp/MLfd=41                     |                   2097379 |
| /var/lib/mysql/ib_logfile0       |                   1129472 |
| /var/lib/mysql/undo_001          |                     32768 |
| /var/lib/mysql/ibdata1           |                     16384 |
+----------------------------------+---------------------------+

The longText value is not modified in place, it is copied to a new set of overflow pages. The new and old overflow pages then need to be flushed to storage. Also, since we use the worst-case scenario of the full row image, the binary log entry has the old and new value stored but the InnoDB log files only have the new version.

I hope this illustrates why storing mutable data in a TEXT/BLOB column is a bad idea.

JSON

Although columns stored using the MySQL JSON data type are stored as a TEXT/BLOB column, MySQL 8.0 added some logic to allow in-place updates. The impact of an update to a large JSON column in 8.0 is not as severe as in 5.7.
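As a rough illustration (the table and column names are hypothetical), only updates expressed through functions such as JSON_SET, JSON_REPLACE, or JSON_REMOVE applied to the column itself are candidates for the in-place, partial update path:

-- rewrite a single member of the document instead of the whole value
UPDATE myDocs
   SET doc = JSON_SET(doc, '$.status', 'done')
 WHERE id = 42;

-- with row-based replication, binlog_row_value_options = 'PARTIAL_JSON'
-- additionally lets the binary log record only the changed portion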

How to Best Deal with TEXT/BLOB Columns?

Data Compression

Data compression is a very compelling option for TEXT/BLOB values. By definition, those values are normally large and thus usually compress well. This is nothing new and has been covered in previous posts. PostgreSQL, for one, compresses its TEXT/BLOB columns (called TOAST) by default.

The best place to do compression is always the application, when that is possible, of course. As we have just seen, this reduces the write load and spares the database the CPU burden of compression.

Another option with MySQL is the InnoDB Barracuda table compression. When used, TEXT/BLOB values are compressed before being written to the overflow pages. This is much more efficient than compressing the pages one at a time.

Finally, if you are using Percona Server or MariaDB, you have access to transparent column compression. This is the second-best option, performance-wise, when compression by the application is not possible.
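A minimal sketch of both approaches; the table names are made up, and the exact column-compression syntax differs between Percona Server (shown here) and MariaDB:

-- Percona Server transparent column compression
CREATE TABLE testTextColCompressed (
  id INT UNSIGNED NOT NULL,
  data LONGTEXT NOT NULL COLUMN_FORMAT COMPRESSED,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- classic InnoDB (Barracuda) table compression
CREATE TABLE testTextRowCompressed (
  id INT UNSIGNED NOT NULL,
  data LONGTEXT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;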

Avoid Returning TEXT/BLOB Columns

When there are large TEXT/BLOB columns in a table, the cost of accessing those columns is high. Because of this, it is important to select only the columns that are needed and avoid the default use of “select * from”. Unfortunately, many ORM frameworks instantiate objects by grabbing all the columns. When using such frameworks, you should consider storing the TEXT/BLOB in a separate table with a loose 1:1 relationship with the original table. That way, the ORM is able to instantiate the object without necessarily forcing a read of the TEXT/BLOB columns.
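For example, a hypothetical split keeps the hot columns in the main table and the large payload in a companion table sharing the same primary key; the ORM maps to the main table, and the body is fetched only when explicitly requested:

CREATE TABLE article (
  id INT UNSIGNED NOT NULL,
  title VARCHAR(255) NOT NULL,
  published_at DATETIME NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE article_body (
  article_id INT UNSIGNED NOT NULL,
  body LONGTEXT NOT NULL,
  PRIMARY KEY (article_id),
  CONSTRAINT fk_article_body FOREIGN KEY (article_id) REFERENCES article (id)
) ENGINE=InnoDB;

-- fetch the large column only when it is actually needed
SELECT body FROM article_body WHERE article_id = 42;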

Conclusion

I hope this article improved your understanding of TEXT/BLOB values in InnoDB. When used correctly, TEXT/BLOB columns can be useful and have a limited impact on the performance of the database.

May 27, 2021
--

mmhmm, the video conferencing software, kicks off summer with a bunch of new features

mmhmm, the communications platform developed by Phil Libin and the All Turtles team, is getting a variety of new features. According to Libin, there are parts of video communication today that can not only match what we get in the real world, but exceed it.

That’s what this next iteration of mmhmm is meant to deliver.

The new headline feature is mmhmm Chunky, which allows the presenter to break up their script and presentation into “chunks.” Think of chunks the same way you think of slides in a deck: each one gets the full edit treatment and final polish. With Chunky, mmhmm users can break up their presentation into chunks to perfect each individual bit of information.

A presenter can switch between live and pre-recorded chunks in a presentation. So you can imagine a salesman making a pitch and switching over to his explanation of the pricing as a pre-recorded piece of his pitch, or a teacher who has a pre-recorded chunk on a particular topic can throw to that mid-class.

But mmhmm didn’t just think about the creation side; it also considered the consumption side. Folks in the audience can jump around between chunks and slides to catch up, or even view in a sped-up mode to consume more quickly. Presenters can see where folks in the audience are as they present, or later on.

Libin sees this feature as a way to supercharge time.

“At mmhmm, we stopped doing synchronous updates with our fully distributed team,” said Libin. “We don’t have meetings anymore where people take turns updating each other because it’s not very efficient. Now the team just sends around their quick presentations, and I can watch it in double speed because people can listen faster than people can talk. But we don’t have to do it at the same time. Then, when we actually talk synchronously, it’s reserved for that live back-and-forth about the important stuff.”

mmhmm is also announcing that it has developed its own video player, allowing folks to stream their mmhmm presentations to whichever website they’d like. As per usual, mmhmm will still work with Zoom, Google Meet, etc.

The new features list also includes an updated version of Copilot. For folks who remember, Copilot allowed one person to present and another person to “drive,” or art direct, the presentation from the background. Copilot 2.0 lets two people essentially video chat side by side, in whatever environment they’d like.

Libin showed me a presentation/conversation he did with a friend where they were both framed up in Libin’s house. He clarified that this feature works best with one-on-one conversations, or, one-on-one conversations in front of a large audience, such as a fireside chat.

Alongside mmhmm Chunky, streaming and Copilot 2.0, the platform is also doing a bit of spring cleaning with regards to organization. Users will have a Presentation Library where they can save and organize their best takes, and organizations can also use “Loaf” to store all the best videos and presentations company-wide for consumption later. The team also revamped Presets to make it easier to apply a preset to a bunch of slides at once or switch between presets more easily.

A couple other notes: mmhmm is working to bring the app to both iOS and Android very soon, and launch out of beta on Windows.

Libin explained that not every single feature described here will launch today, but rather you’ll see features trickle out each week as we head into summer. He’ll be giving a keynote on the new features here at 10 a.m. PT/1 p.m. ET.

May 27, 2021
--

Australian startup Pyn raises $8M seed to bring targeted communication in-house

Most marketers today know how to send targeted communications to customers, and there are many tools to help, but when it comes to sending personalized in-house messages, there aren’t nearly as many options. Pyn, an early-stage startup based in Australia, wants to change that, and today it announced an $8 million seed round.

Andreessen Horowitz led the investment with help from Accel and Ryan Sanders (the co-founder of BambooHR) and Scott Farquhar (co-founder and co-CEO at Atlassian).

That last one isn’t a coincidence, as Pyn co-founder and CEO Joris Luijke used to run HR at the company and later at Squarespace and other companies, and he saw a common problem trying to provide more targeted messages when communicating internally.

“I’ve been trying to do this my entire professional life, trying to personalize the communication that we’re sending to our people. So that’s what Pyn does. In a nutshell, we radically personalize employee communications,” Luijke explained. His co-founder Jon Williams was previously a co-founder at Culture Amp, an employee experience management platform he helped launch in 2011 (and which raised more than $150 million), so the two of them have been immersed in this idea.

They bring personalization to Pyn by tracking information in existing systems that companies already use, such as Workday, BambooHR, Salesforce or Zendesk, and they can use this data much in the same way a marketer uses various types of information to send more personalized messages to customers.

That means you can cut down on the company-wide emails that might not be relevant to everyone and send messages that should matter more to the people receiving them. And as with a marketing communications tool, you can track how many people have opened the emails and how successful you were in hitting the mark.

David Ulevitch, general partner at a16z and lead investor in this deal, points out that Pyn also provides a library of customizable communications materials to help build culture and set policy across an organization. “It also treats employee communication channels as the rails upon which to orchestrate management practices across an organization [by delivering] a library of management playbooks,” Ulevitch wrote in a blog post announcing the investment.

The startup, which launched in 2019, currently has 10 employees, with teams working in Australia and the Bay Area in California. Williams says that already half the team is female and the plan is to continue putting diversity front and center as they build the company.

“Joris has mentioned ‘radical personalization’ as this specific mantra that we have, and I think if you translate that into an organization, that is all about inclusion in reality, and if we want to be able to cater for all the specific needs of people, we need to understand them. So [diversity is essential] to us,” Williams said.

While the company isn’t ready to discuss specifics in terms of customer numbers, it cites Shopify, Rubrik and Carta as early customers, and the founders say there was a lot of interest when the pandemic hit last year and the need for more frequent and meaningful types of communication became even more paramount.

 

May 27, 2021
--

Webinar June 29: Unlocking the Mystery of MongoDB Shard Key Selection

Do You Know How to Choose the Right MongoDB Shard Key for Your Business?

In our upcoming panel, Percona MongoDB experts Mike Grayson, Kimberly Wilkins, and Vinicius Grippa will discuss the complex issue of MongoDB shard key selection and offer advice on the measures to take if things go wrong.

Selecting the right shard key is one of the most important MongoDB design decisions you make, as it impacts performance and data management. Choosing the wrong shard key can have a disastrous effect on both.

MongoDB 5.0 is due to be released this summer, and it is likely to include another big change around sharding. Following last year’s 4.4 release that included refinable shard keys, we expect to see a new feature that allows for fully changeable shard keys for the first time.

But, even with refinable and changeable options, shard key selection will continue to be a crucial MongoDB task.

Join us for the panel, where Mike, Kimberly, and Vinicius will highlight some of the perils and pitfalls to avoid, as well as offering shard key best practices such as:

* Factors to consider when selecting your shard key

* Hidden “gotchas” around shard selection and architecture

* Examples of the worst shard keys our experts have seen

* What happens when you have a busted shard key, and how you can mitigate the impact

* What’s happening “under MongoDB’s hood” with all the changes?

* The future of shard keys for MongoDB

Please join Percona Technical Experts Mike Grayson, Kimberly Wilkins, and Vinicius Grippa on June 29, 2021, at 1 pm EDT for their webinar Unlocking the Mystery of MongoDB Shard Key Selection.

Register for Webinar

If you can’t attend, sign up anyway, and we’ll send you the slides and recording afterward.

May 26, 2021
--

Databricks introduces Delta Sharing, an open-source tool for sharing data

Databricks launched its fifth open-source project today, a new tool called Delta Sharing designed to be a vendor-neutral way to share data with any cloud infrastructure or SaaS product, so long as you have the appropriate connector. It’s part of the broader Databricks open-source Delta Lake project.

As CEO Ali Ghodsi points out, data is exploding, and moving data from Point A to Point B is an increasingly difficult problem to solve with proprietary tooling. “The number one barrier for organizations to succeed with data is sharing data, sharing it between different views, sharing it across organizations — that’s the number one issue we’ve seen in organizations,” Ghodsi explained.

Delta Sharing is an open-source protocol designed to solve that problem. “This is the industry’s first-ever open protocol, an open standard for sharing a data set securely. […] They can standardize on Databricks or something else. For instance, they might have standardized on using AWS Data Exchange, Power BI or Tableau — and they can then access that data securely.”

The tool is designed to work with multiple cloud infrastructure and SaaS services, and out of the gate there are multiple partners involved, including the Big Three cloud infrastructure vendors Amazon, Microsoft and Google, as well as data visualization and management vendors like Qlik, Starburst, Collibra and Alation, and data providers like Nasdaq, S&P and Foursquare.

Ghodsi said the key to making this work is the open nature of the project. By doing that and donating it to The Linux Foundation, he is trying to ensure that it can work across different environments. Another big aspect of this is the partnerships and the companies involved. When you can get big-name companies involved in a project like this, it’s more likely to succeed because it works across this broad set of popular services. In fact, there are a number of connectors available today, but Databricks expects that number to increase over time as contributors build more connectors to other services.

Databricks operates on a consumption pricing model much like Snowflake, meaning the more data you move through its software, the more money it’s going to make, but the Delta Sharing tool means you can share with anyone, not just another Databricks customer. Ghodsi says that the open-source nature of Delta Sharing means his company can still win, while giving customers more flexibility to move data between services.

The infrastructure vendors also love this model because the cloud data lake tools move massive amounts of data through their services and they make money too, which probably explains why they are all on board with this.

One of the big fears of modern cloud customers is being tied to a single vendor as they often were in the 1990s and early 2000s when most companies bought a stack of services from a single vendor like Microsoft, IBM or Oracle. On one hand, you had the veritable single throat to choke, but you were beholden to the vendor because the cost of moving to another one was prohibitively high. Companies don’t want to be locked in like that again and open source tooling is one way to prevent that.

Databricks was founded in 2013 and has raised almost $2 billion. The latest round was in February for $1 billion at a $28 billion valuation, an astonishing number for a private company. Snowflake, a primary competitor, went public last September. As of today, it has a market cap of over $66 billion.
