at work we have a few reverse proxies exposing some APIs. ClickHouse turned out to be a great tool for analyzing their logs, running ad-hoc research or gathering stats periodically. below are the configs that let me get the HTTP requests passing through those proxies, including POST bodies, into ClickHouse. i'm using vector.dev to fetch, transform and ship the logs.
case 1 – reverse proxy on nginx:
/etc/nginx/conf.d/logformat.conf – defines a log format that emits well-formed newline-delimited JSON (note: $request_body is only filled in when nginx reads the body into memory, which it does here because of proxy_pass):
log_format json_combined escape=json '{'
    '"ts":"$time_iso8601",'
    '"remote_addr":"$remote_addr",'
    '"request":"$request",'
    '"status":"$status",'
    '"body_bytes_sent":"$body_bytes_sent",'
    '"request_body":"$request_body",'
    '"request_time":"$request_time",'
    '"http_user_agent":"$http_user_agent"'
'}';
the vhost that proxies traffic to the backend – /etc/nginx/sites-enabled/vhost:
server {
    listen 80 default_server;
    server_name some.name.com;
    access_log /var/log/nginx/some.name.com-access.log json_combined;
    root /var/www/html;

    location / {
        proxy_pass http://127.0.0.1:1234/;
        proxy_connect_timeout 10s;
        proxy_read_timeout 300s;
    }
}
a single line from /var/log/nginx/some.name.com-access.log, pretty-printed:
{
  "ts": "2025-11-21T16:29:41+00:00",
  "remote_addr": "10.16.10.100",
  "request": "POST /something... HTTP/1.1",
  "status": "200",
  "body_bytes_sent": "539041",
  "request_body": "........post body......",
  "request_time": "11.257",
  "http_user_agent": "some.agent"
}
vector.dev config to get this into ClickHouse:
sources:
  nginx_logs:
    type: file
    include:
      - /var/log/nginx/some.name.com-access.log
    read_from: end

transforms:
  process:
    type: remap
    inputs:
      - nginx_logs
    source: |
      . = parse_json!(.message)

sinks:
  clickhouse:
    date_time_best_effort: true
    auth:
      strategy: basic
      user: clickhouse_user
      password: ................
    compression: none
    type: clickhouse
    inputs:
      - process
    endpoint: http://address.of.clickhouse.server:8123
    database: mylogs
    table: nginx_log
    skip_unknown_fields: true
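side note: as far as i understand, the sink just POSTs events to ClickHouse's HTTP interface as JSONEachRow, and the skip_unknown_fields / date_time_best_effort options map to the input_format_skip_unknown_fields and date_time_input_format settings. so what it does is roughly equivalent to doing this by hand in clickhouse-client (a sketch with a made-up row, against the mylogs.nginx_log table shown right below):

-- lenient timestamp parsing, so "2025-11-21T16:29:41+00:00" fits into a DateTime column
SET date_time_input_format = 'best_effort';
-- ignore JSON keys that have no matching column instead of failing the insert
SET input_format_skip_unknown_fields = 1;

INSERT INTO mylogs.nginx_log FORMAT JSONEachRow
{"ts":"2025-11-21T16:29:41+00:00","remote_addr":"10.16.10.100","request":"POST /something HTTP/1.1","status":"200","body_bytes_sent":"539041","request_body":"...","request_time":"11.257","http_user_agent":"some.agent"}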
table structure in ClickHouse – mylogs.nginx_log:
CREATE TABLE mylogs.nginx_log
(
    `ts` DateTime,
    `remote_addr` LowCardinality(String),
    `request` LowCardinality(String),
    `status` LowCardinality(String),
    `body_bytes_sent` UInt32,
    `request_body` String,
    `request_time` Float32,
    `http_user_agent` LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY ts
TTL ts + toIntervalDay(90)
SETTINGS index_granularity = 8192
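with that in place, the ad-hoc analysis part is just SQL. a hypothetical example (not part of the setup, adjust to taste) – status code breakdown, slowest request and bytes sent over the last hour:

SELECT
    status,
    count() AS hits,
    max(request_time) AS slowest_s,
    formatReadableSize(sum(body_bytes_sent)) AS bytes_out
FROM mylogs.nginx_log
WHERE ts > now() - INTERVAL 1 HOUR
GROUP BY status
ORDER BY hits DESC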
case 2 – doing the same with haproxy
the part of haproxy.cfg defining the log format (note: the %[capture.req.hdr(0)] and %[capture.req.hdr(1)] fetches assume the request body and the User-Agent header are captured in the frontend – those capture directives aren't shown here):
log-format "{\"timestamp\": \"[%t]\", \"backend_name\": \"%b\", \"backend_ip_port\": \"%si:%sp\", \"remote_addr\": \"%ci\", \"request\": \"%r\", \"status\": \"%ST\", \"body_bytes_sent\": \"%B\", \"request_time\":\"%TR\", \"total_session_time\": \"%Tt\", \"server_response_time\": \"%Tr\", \"http_user_agent\": \"%[capture.req.hdr(1)]\", \"request_body\": \"%[capture.req.hdr(0),json(utf8s)]\"}"
/etc/rsyslog.d/49-haproxy.conf – routes haproxy logs to a local file:
$AddUnixListenSocket /var/lib/haproxy/dev/log

:programname, startswith, "haproxy" {
  /var/log/haproxy.log
  stop
}
sample log line in /var/log/haproxy.log:
2025-11-21T16:48:01.651481+00:00 serverName haproxy[487893]: {"timestamp": "[21/Nov/2025:16:48:00.441]", "backend_name": "bl_workers", "backend_ip_port": "10.7.37.11:443", "remote_addr": "172.16.1.11", "request": "POST /.... HTTP/1.1", "status": "200", "body_bytes_sent": "345", "request_time":"0", "total_session_time": "1209", "server_response_time": "1156", "http_user_agent": "....", "request_body": "................."}
config for vector.dev – we need to strip the syslog prefix from each line to get at the actual JSON message, then set a ts field for the table and convert the backend response time to seconds:
sources:
  haproxy_logs:
    type: file
    include:
      - /var/log/haproxy.log
    read_from: end

transforms:
  strip_and_parse_json:
    type: remap
    inputs:
      - haproxy_logs
    source: |
      # drop the syslog prefix ("<timestamp> <host> haproxy[pid]:") and keep the rest
      json, err = parse_regex(.message, r'^(?:\S+\s+){3}(?P<message>.*)$')
      if err == null {
        json = parse_json!(json.message)
        json.ts = parse_timestamp!(.timestamp, format: "%d/%b/%Y:%H:%M:%S.%3f")
        # %Tr is in milliseconds; store seconds to match the nginx request_time column
        json.request_time = parse_int!(json.server_response_time) / 1000.0
        . = json
      }

sinks:
  console:
    type: console
    inputs:
      - strip_and_parse_json
    encoding:
      codec: text
  clickhouse:
    date_time_best_effort: true
    auth:
      strategy: basic
      user: clickhouse_user
      password: .............
    compression: none
    type: clickhouse
    inputs:
      - strip_and_parse_json
    endpoint: http://address.of.clickhouse.server:8123
    database: mylogs
    table: haproxy_log
    skip_unknown_fields: true
the ClickHouse schema:
CREATE TABLE mylogs.haproxy_log
(
    `ts` DateTime,
    `remote_addr` LowCardinality(String),
    `request` LowCardinality(String),
    `status` LowCardinality(String),
    `body_bytes_sent` UInt32,
    `request_body` String,
    `request_time` Float32,
    `http_user_agent` LowCardinality(String),
    `backend_ip_port` LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY ts
TTL ts + toIntervalDay(90)
SETTINGS index_granularity = 8192
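and again an example of the kind of ad-hoc query this enables (hypothetical, adjust to your data) – response-time percentiles per backend over the last day:

SELECT
    backend_ip_port,
    count() AS requests,
    quantiles(0.5, 0.95, 0.99)(request_time) AS response_time_quantiles_s
FROM mylogs.haproxy_log
WHERE ts > now() - INTERVAL 1 DAY
GROUP BY backend_ip_port
ORDER BY requests DESC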
while getting the vector.dev config to work, it was helpful to run ngrep and see directly in the terminal what vector was actually sending to ClickHouse. i was using this syntax:
ngrep -W byline "POST" tcp and host address.of.clickhouse.server -c 250
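and on the ClickHouse side, a trivial sanity check (just an example query, here against the nginx table) to confirm rows are actually arriving and how fresh they are:

SELECT count() AS rows_last_10m, max(ts) AS newest
FROM mylogs.nginx_log
WHERE ts > now() - INTERVAL 10 MINUTE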