{"id":3517,"date":"2023-06-15T05:22:09","date_gmt":"2023-06-15T04:22:09","guid":{"rendered":"https:\/\/kudzia.eu\/b\/?p=3517"},"modified":"2023-06-15T07:41:45","modified_gmt":"2023-06-15T06:41:45","slug":"using-vector-to-send-json-formatted-nginx-logs-to-clickhouse","status":"publish","type":"post","link":"https:\/\/kudzia.eu\/b\/2023\/06\/using-vector-to-send-json-formatted-nginx-logs-to-clickhouse\/","title":{"rendered":"using vector to send JSON-formatted nginx logs to ClickHouse"},"content":{"rendered":"\n<p>for some particular case i need to log and later analyze POST bodies of HTTP requests sent via nginx reverse proxy. ClickHouse will store the logs and be useful in analysis.<\/p>\n\n\n\n<!--more-->\n\n\n\n<p>nginx configuration:<\/p>\n\n\n\n<p>in \/etc\/nginx\/conf.d\/pQd-logformat.conf i&#8217;m defining log format that&#8217;s ndjson \/ newline delimited json:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>log_format json_combined escape=json '{'\n        '\"ts\":\"$time_iso8601\",'\n        '\"remote_addr\":\"$remote_addr\",'\n        '\"request\":\"$request\",'\n        '\"status\": \"$status\",'\n        '\"body_bytes_sent\":\"$body_bytes_sent\",'\n        '\"request_body\":\"$request_body\",'\n        '\"request_time\":\"$request_time\",'\n        '\"http_user_agent\":\"$http_user_agent\"'\n'}';\n<\/code><\/pre>\n\n\n\n<p>in vhost definition &#8211; kept in \/etc\/nginx\/sites-enabled\/my.vhost.com i&#8217;m referring to that format:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>server {\n        server_name my.vhost.com;\n        access_log  \/var\/log\/nginx\/my.vhost.com-access.log json_combined;\n        root \/var\/www\/html;\n        index index.html index.htm index.nginx-debian.html;\n\n        location \/ {\n                proxy_pass https:\/\/original.address\/;\n                proxy_connect_timeout 10s;\n                proxy_read_timeout 300s;\n        }\n\n}\n<\/code><\/pre>\n\n\n\n<p>after a restart nginx is producing this in \/var\/log\/nginx\/my.vhost.com-access.log <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\"ts\":\"2023-06-15T04:09:36+00:00\",\"remote_addr\":\"10.12.40.5\",\"request\":\"POST \/something HTTP\/1.1\",\"status\": \"400\",\"body_bytes_sent\":\"138\",\"request_body\":\"some payload\",\"request_time\":\"0.809\",\"http_user_agent\":\"\"}\n{\"ts\":\"2023-06-15T04:09:42+00:00\",\"remote_addr\":\"10.12.40.5\",\"request\":\"POST \/somethingelse HTTP\/1.1\",\"status\": \"400\",\"body_bytes_sent\":\"138\",\"request_body\":\"another payload\",\"request_time\":\"0.798\",\"http_user_agent\":\"\"}\n<\/code><\/pre>\n\n\n\n<p>clickhouse setup:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE DATABASE nginx;\nCREATE TABLE nginx.log\n(\n    `ts` DateTime,\n    `remote_addr` LowCardinality(String),\n    `request` LowCardinality(String),\n    `status` LowCardinality(String),\n    `body_bytes_sent` UInt32,\n    `request_body` String,\n    `request_time` Float32,\n    `http_user_agent` LowCardinality(String)\n)\nENGINE = MergeTree\nORDER BY ts\nSETTINGS index_granularity = 8192;\n\nCREATE USER nginx_log_importer IDENTIFIED WITH plaintext_password BY 'somepass';\nGRANT INSERT ON nginx.log TO nginx_log_importer;\n<\/code><\/pre>\n\n\n\n<p>vector, which i&#8217;ll use to transfer logs from nginx log files to ClickHouse, will use ClickHouse&#8217;s http interface on port 8123 TCP. i&#8217;m ensuring that firewalls allow for communication from server with nginx+vector to one with ClickHouse.<\/p>\n\n\n\n<p>lastly setup of vector, running on machine with nginx &#8211; kept in \/etc\/vector\/vector.toml<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;sources.nginx_logs]\ntype = \"file\"\ninclude = &#91; \"\/var\/log\/nginx\/my.vhost.com-access.log\" ]\nread_from = \"end\"\n\n&#91;transforms.process]\ntype = \"remap\"\ninputs = &#91;\"nginx_logs\"]\nsource = '''\n.=parse_json!(.message)\n'''\n\n&#91;sinks.print]\ntype = \"console\"\ninputs = &#91;\"process\"]\nencoding.codec = \"json\"\n\n&#91;sinks.clickhouse]\ndate_time_best_effort=true\nauth.strategy=\"basic\"\nauth.user=\"nginx_log_importer\"\nauth.password=\"somepass\"\ncompression=\"none\"\ntype = \"clickhouse\"\ninputs = &#91;\"process\"]\nendpoint = \"http:\/\/addressOfClickhouseServer:8123\"\ndatabase = \"nginx\"\ntable = \"log\"\nskip_unknown_fields = true\n<\/code><\/pre>\n\n\n\n<p>i&#8217;m running vector manually to see if all is working:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>vector --config \/etc\/vector\/vector.toml<\/code><\/pre>\n\n\n\n<p>and i&#8217;m making http requests to my.vhost.com to generate some traffic. logs nicely get into ClickHouse&#8217;s new table:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>select * from browserless_log order by ts\\G\nRow 20:\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nts:              2023-06-15 04:09:36\nremote_addr:     10.0.00.5\nrequest:         POST \/somethin HTTP\/1.1\nstatus:          400\nbody_bytes_sent: 138\nrequest_body:    some payload\nrequest_time:    0.809\nhttp_user_agent:\n\nRow 21:\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\nts:              2023-06-15 04:09:42\nremote_addr:     10.0.0.5\nrequest:         POST \/somethingelse  HTTP\/1.1\nstatus:          400\nbody_bytes_sent: 138\nrequest_body:    another payload\nrequest_time:    0.798\nhttp_user_agent:<\/code><\/pre>\n\n\n\n<p>to start vector as a background service:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>chown vector:vector -R \/var\/lib\/vector\/\nsystemctl enable --now vector\nsystemctl start vector<\/code><\/pre>\n\n\n\n<p>once i see all works fine &#8211; i also remove [sinks.print] section from \/etc\/vector\/vector.toml<\/p>\n\n\n\n<p>above is based on:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>https:\/\/clickhouse.com\/docs\/en\/integrations\/vector<\/li>\n\n\n\n<li>https:\/\/medium.com\/datadenys\/using-vector-to-feed-nginx-logs-to-clickhouse-in-real-time-197745d9e88b<\/li>\n\n\n\n<li>https:\/\/vector.dev\/docs\/reference\/configuration\/sinks\/clickhouse\/<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>for some particular case i need to log and later analyze POST bodies of HTTP requests sent via nginx reverse proxy. ClickHouse will store the logs and be useful in analysis.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[112,63],"class_list":["post-3517","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-clickhouse","tag-nginx"],"_links":{"self":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3517","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/comments?post=3517"}],"version-history":[{"count":4,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3517\/revisions"}],"predecessor-version":[{"id":3522,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/posts\/3517\/revisions\/3522"}],"wp:attachment":[{"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/media?parent=3517"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/categories?post=3517"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kudzia.eu\/b\/wp-json\/wp\/v2\/tags?post=3517"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}