count the number of specific characters in a file or text stream

find /some/path -type f -and -iregex '.*\.mp3$' -and -print0 | tr -d -c '\000' |wc -c
In this example, the command will recursively find files (-type f) under /some/path, where the path ends in .mp3, case insensitive (-iregex). It will then output a single line of output (-print0), with results terminated by a the null character (octal 000). Suitable for piping to xargs -0. This type of output avoids issues with garbage in paths, like unclosed quotes. The tr command then strips away everything but the null chars, finally piping to wc -c, to get a character count. I have found this very useful, to verify one is getting the right number of before you actually process the results through xargs or similar. Yes, one can issue the find without the -print0 and use wc -l, however if you want to be 1000% sure your find command is giving you the expected number of results, this is a simple way to check. The approach can be made in to a function and then included in .bashrc or similar. e.g. count_chars() { tr -d -c "$1" | wc -c; } In this form it provides a versatile character counter of text streams :)
Sample Output
8848

1
By: kyle0r
2012-03-31 21:57:33

These Might Interest You

  • I find this terribly useful for grepping through a file, looking for just a block of text. There's "grep -A # pattern file.txt" to see a specific number of lines following your pattern, but what if you want to see the whole block? Say, the output of "dmidecode" (as root): dmidecode | awk '/Battery/,/^$/' Will show me everything following the battery block up to the next block of text. Again, I find this extremely useful when I want to see whole blocks of text based on a pattern, and I don't care to see the rest of the data in output. This could be used against the '/etc/securetty/user' file on Unix to find the block of a specific user. It could be used against VirtualHosts or Directories on Apache to find specific definitions. The scenarios go on for any text formatted in a block fashion. Very handy.


    85
    awk '/start_pattern/,/stop_pattern/' file.txt
    atoponce · 2009-03-28 14:28:59 7
  • Count how many times a pattern is present into a stream. It can be one or more lines. No overlapping. It means searching for aa on aaa will output 1 not 2.


    0
    echo something | awk '{ total += gsub(/yourstring/,"") } END { print total }'
    bugmenot · 2014-12-16 20:58:42 0

  • 5
    awk '{count[length]++}END{for(i in count){printf("%d: %d\n", count[i], i)}}'
    bpfx · 2009-02-05 14:20:39 3
  • Ever need to get some text that is a specific number of characters long? Use this function to easily generate it! Doesn't look pretty, but sure does work for testing purposes! Show Sample Output


    0
    genRandomText() { a=( a b c d e f g h i j k l m n o p q r s t u v w x y z );f=0;for i in $(seq 1 $(($1-1))); do r=$(($RANDOM%26)); if [ "$f" -eq 1 -a $(($r%$i)) -eq 0 ]; then echo -n " ";f=0;continue; else f=1;fi;echo -n ${a[$r]};done;echo"";}
    bbbco · 2012-01-20 21:18:16 0

What do you think?

Any thoughts on this command? Does it work on your machine? Can you do the same thing with only 14 characters?

You must be signed in to comment.

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands



Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: