getting unique string stats for a large file where the data has small cardinality

#!/bin/bash

path=/some/path/to/logs

# For each compressed access log matching the 2022110* date prefix, print the
# request path (field 7) with any query string stripped off.
(
for f in $( ls -1 "$path" | grep access.log | grep 2022110 ); do
  zcat "$path/$f" | awk '{ split($7, q, "?"); print q[1] }'
done
# Count how many times each unique servlet path was seen, then sort by count.
) | awk '{ unique_servlets[$0]++ } END { for (servlet_name in unique_servlets) print unique_servlets[servlet_name], servlet_name }' | sort -n
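
the awk associative array is the point here: it keeps one bucket per unique value, so memory stays tiny no matter how big the files are, as long as the cardinality is small. the classic sort | uniq -c idiom gives the same counts, but it has to sort the entire stream first. a minimal sketch of that alternative, piping the same extracted values through it (the glob is an assumption about the filenames, adjust as needed):

# Same counting, but sort must buffer/merge the whole stream (O(n log n))
# before uniq -c can count; fine for small inputs, painful for huge logs.
zcat "$path"/*access.log* \
  | awk '{ split($7, q, "?"); print q[1] }' \
  | sort | uniq -c | sort -n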

sadly, not all of our logs are in ClickHouse, so chewing through them can be time-consuming and not so fun.
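
for the logs that do live in ClickHouse, the same stat is a single query. a rough sketch, assuming a hypothetical access_logs table with url and event_time columns (the real schema will differ):

# Hypothetical table/column names; the startsWith filter mirrors the
# grep 2022110 above, and splitByChar mirrors the awk query-string split.
clickhouse-client --query "
  SELECT count() AS hits, splitByChar('?', url)[1] AS servlet_name
  FROM access_logs
  WHERE startsWith(toString(toYYYYMMDD(event_time)), '2022110')
  GROUP BY servlet_name
  ORDER BY hits
"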
