Fast grepping (avoiding UTF overhead)

export LANG=C; grep string longBigFile.log
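This runs grep in the C locale, so it treats the input as plain single-byte (ASCII) text and skips the overhead of decoding multibyte UTF-8 characters. Some stats:

$ export LANG=C; time grep -c Quit /var/log/mysqld.log
7432

real    0m0.191s
user    0m0.112s
sys     0m0.079s

$ export LANG=en_US.UTF-8; time grep -c Quit /var/log/mysqld.log
7432

real    0m13.462s
user    0m9.485s
sys     0m3.977s

Same count, roughly 70 times less wall-clock time in the C locale. Try strace-ing grep with and without LANG=C to see where the difference comes from.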
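To repeat the comparison on your own files, here is a minimal bash sketch. The function names count_in_locale and compare_locales are hypothetical, and it sets LC_ALL rather than LANG (a swap from the original tip) because LC_ALL overrides every other locale variable. It also assumes en_US.UTF-8 is installed (see `locale -a`); if it is not, grep stays in the C locale and both runs will look the same.

```bash
# Hypothetical helper: count matches of PATTERN in FILE under locale LOC.
# LC_ALL is used instead of LANG so that a stray LC_ALL in the
# environment cannot silently override the setting being tested.
count_in_locale() {
  local loc=$1 pattern=$2 file=$3
  LC_ALL=$loc grep -c -- "$pattern" "$file"
}

# Hypothetical helper: time the same grep -c under C and en_US.UTF-8.
compare_locales() {
  local pattern=$1 file=$2 loc
  # Only print elapsed (real) seconds; local keeps the caller's TIMEFORMAT intact.
  local TIMEFORMAT='%R s elapsed'
  # Read the file once up front so disk caching does not favour
  # whichever timed run happens to go second.
  cat -- "$file" > /dev/null
  for loc in C en_US.UTF-8; do
    printf '%s: ' "$loc"
    # bash's time keyword reports on stderr after the count is printed.
    time count_in_locale "$loc" "$pattern" "$file"
  done
}
```

Usage: `compare_locales Quit /var/log/mysqld.log`. Both lines should report the same count; only the timings are expected to differ, and on some systems they may not differ at all.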

2009-07-14 12:48:02

What Others Think

Tried, saw no effect.
penpen · 466 weeks and 5 days ago
No difference here either, on Fedora 11 or Ubuntu 9.04.
flatcap · 466 weeks and 5 days ago
Maybe you're still in LANG=C ;) Try setting another LANG and grepping again.
ioggstream · 466 weeks and 5 days ago
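If LANG=C seems to have no effect, one thing worth checking (a suggestion, not something from the thread) is whether LC_ALL or LC_CTYPE is set, since either one takes precedence over LANG. A minimal bash sketch follows; effective_ctype is a hypothetical name, and it only reports which locale was requested, not whether that locale is actually installed.

```bash
# Hypothetical helper: which locale setting governs character handling?
# POSIX precedence is LC_ALL, then LC_CTYPE, then LANG.
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    # Nothing set: the implementation default applies (C on glibc).
    printf '%s\n' default
  fi
}

effective_ctype
```

If this prints something other than what you put in LANG, the LANG=C trick is being masked by LC_ALL or LC_CTYPE.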
I tried setting LANG to both values and, once the cache is warm, I see no difference under Ubuntu.
bwoodacre · 466 weeks and 4 days ago
Test made on RHEL. The same applies to many *nixes, but it seems to be fixed on Ubuntu 9.04.
ioggstream · 466 weeks and 4 days ago
You don't have to actually do the export. If you drop "export" and the semicolon and just put LANG=C in front of the command, the LANG environment variable is set to C only for as long as that grep runs, which is enough to get the quicker mode on some distros. Your shell's own LANG is left alone:

$ echo $LANG
en_US.utf8
$ LANG=C grep 'foo' /var/log/whatever.log.0
$ echo $LANG
en_US.utf8
coffeeaddict_nl · 462 weeks and 5 days ago
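The scoping described in this comment can be checked directly. A minimal bash sketch, where lang_seen_by_child is a hypothetical helper that starts a child `sh` with a prefix assignment and prints the LANG the child actually received:

```bash
# Hypothetical helper: print LANG as seen by a child process started
# with a prefix assignment (VAR=value command).
lang_seen_by_child() {
  LANG=$1 sh -c 'printf "%s\n" "$LANG"'
}

outer=${LANG-}            # remember the current shell's value
lang_seen_by_child C      # the child prints: C
# A prefix assignment on an external command is not kept by the shell,
# so the parent's LANG is unchanged afterwards.
[ "${LANG-}" = "$outer" ] && echo "parent LANG unchanged: ${outer:-<unset>}"
```

This holds for external commands such as grep. For shell functions and special builtins the rules differ between shells (POSIX leaves some of it unspecified), so the prefix form is safest on a plain command.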

