Delimiter Hunting

for i in `seq 0 9` A B C D E F; do for j in `seq 0 9` A B C D E F; do HEX=\$\'\\x${i}${j}\'; if ! eval grep -qF "$HEX" file; then eval echo $HEX \\x${i}${j}; fi; done; done 2> /dev/null | less
Scan a file and print out a list of ASCII characters that are not used in the file which can then be safely used to delimit fields. Useful when needing to convert CSV files using "," to a single character delimiter. Piping it into less at the end (which could be redundant) stops the command characters being interpreted by the terminal.

0
By: moogmusic
2012-01-05 10:09:07

2 Alternatives + Submit Alt

  • Here's a perl version that only considers printable characters. Change the regex /[[:print:]]/ to look for different sets of delimiter characters.


    0
    perl -e '$f = join("", <>); for (0..127) {$_ = chr($_); if (/[[:print:]]/) {print if index($f, $_) < 0}} print "\n"'
    putnamhill · 2012-01-05 23:38:06 0
  • Search in decimal rather than hex. od dumps the character list, cut to remove offsets, sort -u gives the used characters. seq gives the comparison list, but we need this sorted alphabetically for comm, which does the filtering. I drop to perl to convert back to characters (is there a better way?) and then use od to dump them in a print-safe format. Show Sample Output


    0
    comm -13 <(od -vw1 -tu1 dummy.txt|cut -c9-|sort -u) <(seq 0 127|sort)|perl -pe '$_=chr($_)'|od -c
    bazzargh · 2012-01-09 01:32:20 0

What do you think?

Any thoughts on this command? Does it work on your machine? Can you do the same thing with only 14 characters?

You must be signed in to comment.

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands



Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: