clickhouse

clickhouse is a column-oriented OLAP database. i started using it about half a year ago, and i'm impressed. i'd read about it earlier on percona's blog but didn't fully grasp how performant it was. i finally tried it after i got tired of wrestling with MariaDB's query planner on a table with ~100M rows, where each query took minutes to finish. clickhouse still feels like magic – the data size went down from tens of GB to a few GB, and query execution time from minutes to low seconds.

some useful queries:

get the size of each of a table's columns:

select name,
       type,
       formatReadableSize(data_compressed_bytes)                 as size_compressed,
       formatReadableSize(data_uncompressed_bytes)               as size_raw,
       floor(data_uncompressed_bytes / data_compressed_bytes, 2) as ratio,
       compression_codec
from system.columns
where table = 't';
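note that system.columns spans all databases, so if a table named 't' exists in more than one of them the numbers get mixed together. a variant scoped to the current database (assuming that's where your table lives) avoids this:

```sql
-- same column-size query, restricted to one database so that
-- identically-named tables elsewhere don't pollute the result
select name,
       type,
       formatReadableSize(data_compressed_bytes)                 as size_compressed,
       formatReadableSize(data_uncompressed_bytes)               as size_raw,
       floor(data_uncompressed_bytes / data_compressed_bytes, 2) as ratio,
       compression_codec
from system.columns
where database = currentDatabase()
  and table = 't';
```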

get on-disk size of all tables [src]:

select concat(database, '.', table)                         as table,
       formatReadableSize(sum(bytes))                       as size,
       sum(rows)                                            as rows,
       max(modification_time)                               as latest_modification,
       sum(bytes)                                           as bytes_size,
       any(engine)                                          as engine,
       formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size
from system.parts
where active
group by database, table
order by bytes_size desc;

variation of the above, including information about TTL, where it is set:

with details as (
    select concat(database, '.', table)                         as table,
           formatReadableSize(sum(bytes))                       as size,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           sum(bytes)                                           as bytes_size,
           any(engine)                                          as engine,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           formatReadableSize(sum(bytes_on_disk))               as bytes_on_disk,
           formatReadableSize(sum(data_uncompressed_bytes))     as data_uncompressed_bytes
    from system.parts
    where active and database != 'system'
    group by database, table
    order by bytes_size desc
)
select details.*,
       extract(tables.engine_full, '.*(TTL.*)SETTINGS.*') as ttl_cleanup
from details
join system.tables on concat(tables.database, '.', tables.name) = details.table;

another take [src]:

select parts.*,
       columns.compressed_size,
       columns.uncompressed_size,
       columns.ratio
from (
         select table,
                formatReadableSize(sum(data_uncompressed_bytes))          AS uncompressed_size,
                formatReadableSize(sum(data_compressed_bytes))            AS compressed_size,
                sum(data_compressed_bytes) / sum(data_uncompressed_bytes) AS ratio
         from system.columns
         where database = currentDatabase()
         group by table
         ) columns
         right join (
    select table,
           sum(rows)                                            as rows,
           max(modification_time)                               as latest_modification,
           formatReadableSize(sum(bytes))                       as disk_size,
           formatReadableSize(sum(primary_key_bytes_in_memory)) as primary_keys_size,
           any(engine)                                          as engine,
           sum(bytes)                                           as bytes_size
    from system.parts
    where active and database = currentDatabase()
    group by database, table
    ) parts on columns.table = parts.table
order by parts.bytes_size desc;

things to adjust after installation, based on https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-settings-to-adjust/ and https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk/

prevent never-ending growth of a few stats/logs tables:

ALTER TABLE system.asynchronous_metric_log MODIFY TTL  event_date + INTERVAL 2 week;
ALTER TABLE system.query_thread_log MODIFY TTL event_date + INTERVAL 2 week;
ALTER TABLE system.trace_log MODIFY TTL event_date + INTERVAL 1 week;
ALTER TABLE system.metric_log MODIFY TTL event_date + INTERVAL 1 week;
ALTER TABLE system.query_log MODIFY TTL event_date + INTERVAL 1 week;
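to confirm the TTL actually took effect, you can inspect the table definitions – the TTL clause shows up in the engine definition (a generic check, nothing specific to these tables):

```sql
-- the TTL clause should now appear in the CREATE statement
SHOW CREATE TABLE system.query_log;

-- or check all system tables that have a TTL at once
select name, engine_full
from system.tables
where database = 'system' and engine_full like '%TTL%';
```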

looks like the above has to be redone after each clickhouse-server upgrade. also note – disk space is not released immediately after cleanup; you have to wait ~8 minutes (the default old_parts_lifetime) for that to happen – see here.
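one way to avoid redoing the ALTERs after every upgrade (a sketch based on the server config documentation – i haven't verified it across versions) is to declare the TTL in the server config instead; recent clickhouse versions accept a ttl element per system log table, e.g. in a file under /etc/clickhouse-server/config.d/:

```xml
<!-- assumed config fragment: TTL for system log tables declared in
     server config, so it survives clickhouse-server upgrades -->
<clickhouse>
    <query_log>
        <ttl>event_date + INTERVAL 1 WEEK DELETE</ttl>
    </query_log>
    <trace_log>
        <ttl>event_date + INTERVAL 1 WEEK DELETE</ttl>
    </trace_log>
</clickhouse>
```

older releases use a yandex root element instead of clickhouse; check your version's docs before relying on this.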
