clickhouse is a column-oriented OLAP database. i started using it about half a year ago and i'm impressed. i had read about it earlier on percona's blog but did not fully grasp how performant it was. i finally tried it after i got tired of wrestling with MariaDB's query planner on a table with ~100M rows, where every query took minutes to finish. clickhouse still feels like magic – the data size went down from tens of GB to a few GB, and query execution time from minutes to low seconds.
some useful queries:
get the size of each of a table's columns:
SELECT
    name,
    type,
    formatReadableSize(data_compressed_bytes) AS size_compressed,
    formatReadableSize(data_uncompressed_bytes) AS size_raw,
    floor(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio,
    compression_codec
FROM system.columns
WHERE table = 't';
get on-disk size of all tables [src]:
select
    concat(database, '.', table) as table,
    formatReadableSize(sum(bytes)) as size,
    sum(rows) as rows,
    max(modification_time) as latest_modification,
    sum(bytes) as bytes_size,
    any(engine) as engine,
    formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;
a variation of the above, including TTL information where set:
with details as (
    select
        concat(database, '.', table) as table,
        formatReadableSize(sum(bytes)) as size,
        sum(rows) as rows,
        max(modification_time) as latest_modification,
        sum(bytes) as bytes_size,
        any(engine) as engine,
        formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
        formatReadableSize(sum(bytes_on_disk)) as bytes_on_disk,
        formatReadableSize(sum(data_uncompressed_bytes)) as data_uncompressed_bytes
    from system.parts
    where active and database != 'system'
    group by database, table
    order by bytes_size desc
)
select
    details.*,
    extract(tables.engine_full, '.*(TTL.*)SETTINGS.*') as ttl_cleanup
from details
join system.tables on concat(tables.database, '.', tables.name) = details.table;
another take [src]:
select
    parts.*,
    columns.compressed_size,
    columns.uncompressed_size,
    columns.ratio
from (
    select
        table,
        formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
        formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
        sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
    from system.columns
    where database = currentDatabase()
    group by table
) columns
right join (
    select
        table,
        sum(rows) as rows,
        max(modification_time) as latest_modification,
        formatReadableSize(sum(bytes)) as disk_size,
        formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
        any(engine) as engine,
        sum(bytes) as bytes_size
    from system.parts
    where active and database = currentDatabase()
    group by database, table
) parts on columns.table = parts.table
order by parts.bytes_size desc;
things to adjust after the installation, based on https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-settings-to-adjust/ , https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk/
prevent never-ending growth of a few stats/logs tables:
ALTER TABLE system.asynchronous_metric_log MODIFY TTL event_date + INTERVAL 2 week;
ALTER TABLE system.query_thread_log MODIFY TTL event_date + INTERVAL 2 week;
ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 1 week;
ALTER TABLE system.metric_log MODIFY TTL event_date + INTERVAL 1 week;
ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 1 week;
looks like the above has to be repeated after each clickhouse-server upgrade. also note – disk space is not released immediately after cleanup; you have to wait ~8 minutes for that to happen (the MergeTree old_parts_lifetime setting defaults to 480 seconds) – see here.
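alternatively, newer clickhouse versions let you set the TTL for the system log tables in the server config itself, which should survive upgrades. a sketch for query_log only, assuming a config.d override file (the filename is arbitrary; repeat the pattern for the other *_log sections as needed):

```xml
<!-- /etc/clickhouse-server/config.d/query_log_ttl.xml (hypothetical filename) -->
<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <!-- TTL expression applied when the table is (re)created -->
        <ttl>event_date + INTERVAL 1 WEEK DELETE</ttl>
    </query_log>
</clickhouse>
```

note the config only takes effect when the table is created, so for an existing installation you still have to run the ALTER (or drop/rename the old table) once.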