Delimiter Hunting

comm -13 <(od -vw1 -tu1 dummy.txt|cut -c9-|sort -u) <(seq 0 127|sort)|perl -pe '$_=chr($_)'|od -c
Search in decimal rather than hex. od dumps the character list, cut to remove offsets, sort -u gives the used characters. seq gives the comparison list, but we need this sorted alphabetically for comm, which does the filtering. I drop to perl to convert back to characters (is there a better way?) and then use od to dump them in a print-safe format.
Sample Output
0000000  \0 001  \n   j   k  \v  \f   z   {   |   }   ~ 177  \r 016 017
0000020 020 021 022 023 002 024 025 026 027 030 031 032 033 034 035 003
0000040 036 037       !   "   #   $   %   &   ' 004   (   )   *   +   ,
0000060   -   .   /   0   1 005   2   3   4   5   6   7   8   9   :   ;
0000100 006   <   =   >   ?   @   A   B   C   D   E  \a   F   G   H   I
0000120   J   K   L   M   N   O  \b   P   Q   R   S   T   U   V   W   X
0000140   Y  \t   Z   [   \   ]   ^   _   `   a   b   c
0000154

0
By: bazzargh
2012-01-09 01:32:20

2 Alternatives + Submit Alt

  • Scan a file and print out a list of ASCII characters that are not used in the file which can then be safely used to delimit fields. Useful when needing to convert CSV files using "," to a single character delimiter. Piping it into less at the end (which could be redundant) stops the command characters being interpreted by the terminal.


    0
    for i in `seq 0 9` A B C D E F; do for j in `seq 0 9` A B C D E F; do HEX=\$\'\\x${i}${j}\'; if ! eval grep -qF "$HEX" file; then eval echo $HEX \\x${i}${j}; fi; done; done 2> /dev/null | less
    moogmusic · 2012-01-05 10:09:07 0
  • Here's a perl version that only considers printable characters. Change the regex /[[:print:]]/ to look for different sets of delimiter characters.


    0
    perl -e '$f = join("", <>); for (0..127) {$_ = chr($_); if (/[[:print:]]/) {print if index($f, $_) < 0}} print "\n"'
    putnamhill · 2012-01-05 23:38:06 0

What do you think?

Any thoughts on this command? Does it work on your machine? Can you do the same thing with only 14 characters?

You must be signed in to comment.

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands



Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: