using vector.dev to get haproxy and nginx logs into ClickHouse

at work we have a few reverse proxies exposing some APIs. ClickHouse turned out to be a great tool for analyzing logs, running ad-hoc research or periodically gathering stats. below are the configs that let me get HTTP requests passing through those proxies, including POST bodies, into ClickHouse. i’m using vector.dev to fetch, transform and ship the logs. ...

manticore search available via ClickHouse

i’m a big fan of ClickHouse; over time it has become the central hub for all sorts of business intelligence analytics. at work we replicate data to it and use dictionaries loaded over http and from mysql; recently we’ve also started using the MySQL table or database functions to run queries with joins across data from various ...

using clickhouse-local to analyze archived log files

at work we’re hoarding log files. it’s a low-cost, low-tech solution: btrfs plus a python script that archives /var/log/*.log and *.1 from hundreds of servers. we have peace of mind that whatever it is, as long as it’s logging to that folder, we’ll have an archive of it. until now, whenever there was a need ...

ClickHouse – dictionary with string keys

i wanted to create a clickhouse dictionary that used a String as the key, not an int. the docs say: “A composite key can consist of a single element. This makes it possible to use a string as the key, for instance”. i tried this and failed a few times. creation worked but i could not ...

clickhouse – getting the n most significant digits of a number

i’m testing a wild idea for finding matches in quite a large dataset. part of the problem is that the scaling of the input is unknown: 123 might be the correct match for 1.23 or for 12.3. here’s an expression that returns the n=4 most significant digits of each float in the input array:
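the ClickHouse expression itself isn't part of this excerpt, so what follows is not it. as a hedged illustration only, here's a minimal python sketch of the same normalization idea. it assumes the digits are truncated rather than rounded, ignores the sign, and pads short values with zeros, so 123, 1.23 and 12.3 all map to 1230 for n=4. the names `leading_digits` and `leading_digits_all` are hypothetical.

```python
import math
from decimal import Decimal
from typing import Iterable, List


def leading_digits(x: float, n: int = 4) -> int:
    """Return the first n significant decimal digits of x as an integer.

    Assumption: digits are truncated (not rounded), the sign is ignored,
    and short values are right-padded with zeros (1.2 -> 1200 for n=4).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not math.isfinite(x):
        raise ValueError("x must be a finite number")
    if x == 0:
        return 0
    # str() gives the shortest repr that round-trips, so 1.23 stays "1.23"
    # instead of exposing binary noise like 1.2299999...
    # as_tuple().digits holds the coefficient digits only (no sign,
    # no leading zeros), e.g. 0.0123 -> (1, 2, 3).
    digits = Decimal(str(x)).as_tuple().digits
    head = digits[:n]
    # pad short coefficients so every value has exactly n digits
    head = head + (0,) * (n - len(head))
    return int("".join(str(d) for d in head))


def leading_digits_all(values: Iterable[float], n: int = 4) -> List[int]:
    """Apply leading_digits to every element, like mapping over an array."""
    return [leading_digits(v, n) for v in values]


if __name__ == "__main__":
    print(leading_digits_all([123, 1.23, 12.3, 0.0123, 98765.4321]))
```

going through `str()` and `Decimal` sidesteps binary floating-point noise (1.23 is actually stored as 1.2299999…), which an approach based purely on `log10` and `floor` would have to guard against.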

clickhouse

clickhouse is a column-oriented OLAP database. i started using it about half a year ago and i’m impressed. i had read about it earlier on percona’s blog but did not fully grasp how performant it was. i tried it when wrestling with MariaDB’s query planner on a table with ~100M rows had tired me out, and each ...