Fast grepping (avoiding UTF overhead)

export LANG=C; grep string longBigFile.log
Runs grep in the C locale, so it matches plain ASCII bytes and skips the overhead of handling multibyte UTF-8 characters. Some stats:

$ export LANG=C; time grep -c Quit /var/log/mysqld.log
7432

real 0m0.191s
user 0m0.112s
sys 0m0.079s

$ export LANG=en_US.UTF-8; time grep -c Quit /var/log/mysqld.log
7432

real 0m13.462s
user 0m9.485s
sys 0m3.977s

Try strace-ing grep with and without LANG=C.
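To try the comparison without a large log to hand, a minimal sketch along these lines may help. It builds a throwaway sample file (the file contents and the variable names `sample`, `count_c` and `count_default` are made up for illustration, not taken from the original post) and checks that grep reports the same count in the C locale as in the shell's current locale. It uses `LC_ALL` rather than `LANG` because `LC_ALL`, when set, takes precedence over `LANG`, so the test still works on machines where `LC_ALL` is already exported.

```shell
#!/bin/sh
# Hypothetical sample data; the original post used /var/log/mysqld.log.
sample=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i Quit"
    echo "line $i Query"
    i=$((i + 1))
done > "$sample"

# Count matching lines with grep running in the C locale.
count_c=$(LC_ALL=C grep -c Quit "$sample")

# Count matching lines in whatever locale the shell currently uses.
count_default=$(grep -c Quit "$sample")

echo "C locale: $count_c, current locale: $count_default"
rm -f "$sample"
```

On a file this small any timing difference is likely to be negligible; to measure it, point the two grep calls at a genuinely large file and prefix each with `time`, as in the stats above.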

2009-07-14 12:48:02

What Others Think

Tried, saw no effect.
penpen · 487 weeks and 5 days ago
No difference here either in Fedora 11, or Ubuntu 9.04.
flatcap · 487 weeks and 5 days ago
Maybe you're still in LANG=C ;) Try setting another LANG and grep.
ioggstream · 487 weeks and 5 days ago
I tried setting LANG to both values, and once the cache is hot I see no difference under Ubuntu.
bwoodacre · 487 weeks and 4 days ago
The test was made on RHEL. The same applies to many *nixes; see https://bugzilla.redhat.com/show_bug.cgi?id=499220. It seems to be fixed on Ubuntu 9.04.
ioggstream · 487 weeks and 4 days ago
You don't have to actually do the export. If you remove the export and the semicolon around LANG=C, the LANG environment variable will be C only for as long as the grep command runs:

echo $LANG
gives en_US.utf8

LANG=C grep 'foo' /var/log/whatever.log.0
runs in a quicker mode on some distros

echo $LANG
still gives en_US.utf8
coffeeaddict_nl · 483 weeks and 6 days ago
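
The scoping described in the last comment can be checked with a short sketch like the one below. The variable names `before`, `inside` and `after` are illustrative only, and a child `sh` stands in for grep so that the value it sees can be printed.

```shell
#!/bin/sh
# Remember the current value (empty if LANG is unset).
before=${LANG-}

# The assignment prefix applies only to this one command's environment.
inside=$(LANG=C sh -c 'echo "$LANG"')

# The calling shell's own LANG is untouched.
after=${LANG-}

echo "during the command: $inside"
echo "before: $before, after: $after"
```

The child process sees `C`, while the calling shell keeps whatever value it had, which is why the prefix form avoids leaving the whole session in the C locale.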
