getting unique string stats for a large file where the data has small cardinality

#!/bin/bash

path=/some/path/to/logs

# For each compressed access log matching the 2022110* date prefix, print the
# request path (field 7) with any query string stripped off.
(
for f in $( ls -1 "$path" | grep access.log | grep 2022110 ); do
  zcat "$path/$f" | awk '{ split($7, q, "?"); print q[1] }'
done
# Count how many times each unique servlet path was seen, then sort by count.
) | awk '{ unique_servlets[$0]++ } END { for (servlet_name in unique_servlets) print unique_servlets[servlet_name], servlet_name }' | sort -n
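
the awk associative array is the point here: it keeps one bucket per unique value, so memory stays tiny no matter how big the files are, as long as the cardinality is small. the classic sort | uniq -c idiom gives the same counts, but it has to sort the entire stream first. a minimal sketch of that alternative, piping the same extracted values through it (the glob is an assumption about the filenames, adjust as needed):

# Same counting, but sort must buffer/merge the whole stream (O(n log n))
# before uniq -c can count; fine for small inputs, painful for huge logs.
zcat "$path"/*access.log* \
  | awk '{ split($7, q, "?"); print q[1] }' \
  | sort | uniq -c | sort -n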

sadly, not all of our logs are in ClickHouse, so chewing through them can be time-consuming and not so fun.
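
for the logs that do live in ClickHouse, the same stat is a single query. a rough sketch, assuming a hypothetical access_logs table with url and event_time columns (the real schema will differ):

# Hypothetical table/column names; the startsWith filter mirrors the
# grep 2022110 above, and splitByChar mirrors the awk query-string split.
clickhouse-client --query "
  SELECT count() AS hits, splitByChar('?', url)[1] AS servlet_name
  FROM access_logs
  WHERE startsWith(toString(toYYYYMMDD(event_time)), '2022110')
  GROUP BY servlet_name
  ORDER BY hits
"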
