A quick script to ZSTD all your shredded tables

The approach in Mike’s recent post about compressing Snowplow tables works great for atomic.events, with clients seeing the table shrink to around 30% of its original size. But what about all your shredded tables?

For now you have to convert the output from igluctl manually while we wait for our pull request to make it into a release; from then on this will be automatic. There’s also a pull request in to change the default compression for atomic.events.

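Run the following command from the root of your schema definitions and it’ll rewrite the column encodings in place: root_id and root_tstamp columns are set to RAW, and every other encoded column is converted to ZSTD. It relies on GNU sed options (-s, -r, -i), so other sed implementations may need adjusting. This was written by one of our staff.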

sed -s -r -i -e '/root_(id|tstamp)/s/ENCODE (BYTEDICT|DELTA|DELTA32K|LZO|MOSTLY8|MOSTLY16|MOSTLY32|RUNLENGTH|TEXT255|TEXT32K|ZSTD)/ENCODE RAW/' -e '/root_(id|tstamp)/!s/ENCODE (BYTEDICT|DELTA|DELTA32K|LZO|MOSTLY8|MOSTLY16|MOSTLY32|RAW|RUNLENGTH|TEXT255|TEXT32K)/ENCODE ZSTD/' -- sql/com.yourcompany/*.sql
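Because the command above edits files in place, you may want to see what it would change first. The following is a minimal sketch of a dry run, not part of the original script: it applies the same two sed expressions without -i and pipes the result into diff. The function name preview_zstd is hypothetical, and it assumes GNU sed and a diff that accepts - for standard input.

```shell
# Hypothetical helper: show what the conversion would change, without editing any file.
# The two sed expressions are copied unchanged from the command above.
preview_zstd() {
  for f in "$@"; do
    # No -i here, so sed writes the converted DDL to stdout and diff compares it to the file.
    # diff exits with status 1 when it finds differences, hence the "|| true".
    sed -r \
      -e '/root_(id|tstamp)/s/ENCODE (BYTEDICT|DELTA|DELTA32K|LZO|MOSTLY8|MOSTLY16|MOSTLY32|RUNLENGTH|TEXT255|TEXT32K|ZSTD)/ENCODE RAW/' \
      -e '/root_(id|tstamp)/!s/ENCODE (BYTEDICT|DELTA|DELTA32K|LZO|MOSTLY8|MOSTLY16|MOSTLY32|RAW|RUNLENGTH|TEXT255|TEXT32K)/ENCODE ZSTD/' \
      -- "$f" | diff -u "$f" - || true
  done
}

# Example call, from the root of your schema definitions:
#   preview_zstd sql/com.yourcompany/*.sql
```

Lines prefixed with + in the output are what the in-place command would write. If a file prints nothing, it already has the target encodings.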
