Rebuild a Delimited File with a Unique Delimiter

sed 's/$/uniqueString/' file.old | sed 's/,/\n/g' | sed ':loop;/^\"[^\"]*$/N;s/\n/,/;/[^\"]$/t loop' | sed ':loop;N;s/\n/@/g;/uniqueString$/!b loop;s/uniqueString$//' > file.new
Useful for CSV files. In this example the file is comma-delimited, has double-quoted fields that themselves contain commas, and contains no @ symbols (as confirmed with http://www.commandlinefu.com/commands/view/9998/delimiter-hunting). The command converts the delimiting commas to @ while preserving the commas inside quoted fields. "uniqueString" marks the end of each original line so that the records can be reassembled after every comma has been turned into a line break.

0
By: moogmusic
2012-01-06 10:06:40
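
As a rough illustration of the command above, here is a small worked run. The sample rows and the scratch directory are made up for the example, GNU sed is assumed (the pipeline relies on \n in the replacement text and on labels terminated by semicolons), and every record is assumed to have at least two fields.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: comma-delimited, with commas inside double-quoted fields.
printf '%s\n' 'id,"Smith, John",NY' 'id2,plain,"a, b, c"' > file.old

# The same pipeline as above, one stage per line (GNU sed assumed):
#   1. tag the end of every record with uniqueString
#   2. break every comma into a newline
#   3. rejoin the pieces of a quoted field, restoring their commas
#   4. join the fields of a record with @ and drop the end-of-record tag
sed 's/$/uniqueString/' file.old |
  sed 's/,/\n/g' |
  sed ':loop;/^\"[^\"]*$/N;s/\n/,/;/[^\"]$/t loop' |
  sed ':loop;N;s/\n/@/g;/uniqueString$/!b loop;s/uniqueString$//' > file.new

cat file.new
```

With this sample, file.new should contain:

    id@"Smith, John"@NY
    id2@plain@"a, b, c"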

These Might Interest You

  • In this example, we have a text file with several entries like:
    ---
    c1 c2 c3 c4
    this is some data
    HOME /dir1/dir2/.../dirN/somefile1.xml
    HOME /dir1/dir2/somefile2.xml
    some more data
    ---
    For lines starting with HOME, we extract the second field, which is a file path including the file name, and from that keep only the file name, ignoring the slash-delimited path. The output would be:
    somefile1.xml
    somefile2.xml
    (If you vote this down, please give your reasons as well and enlighten the souls :-) )


    -3
    grep '^HOME' data.txt | awk '{print $2}' | awk 'BEGIN{FS="/"}{print $NF}' OR USE ALTERNATE WAY awk '/^HOME/ {print $2}' data.txt | awk -F'/' '{print $NF}'
    rommelsharma · 2009-03-05 07:28:26 7
  • Removes duplicates in the specified field/column while outputting entire lines. An elegant command for processing tab (or otherwise) delimited data.


    1
    awk '!array[$1]++' file.txt
    bede · 2012-08-23 21:04:51 0
  • * Find all file sizes and file names from the current directory down (replace "." with a target directory as needed).
    * Sort the file sizes in numeric order.
    * List only the duplicated file sizes.
    * Drop the file sizes so there is simply a list of files (retain order).
    * Calculate md5sums on all of the files.
    * Replace the first instance of two spaces (md5sum output) with a \0.
    * Drop the unique md5sums so only duplicate files remain listed.
    * Use AWK to aggregate identical files on one line.
    * Remove the blank line from the beginning (this could be done more efficiently by putting another "IF" into the AWK command, but then the whole line exceeded the 255 char limit).
    Each output line contains the md5sum and then all of the files that have that identical md5sum. All fields are \0 delimited. All records are \n delimited.


    0
    find . -type f -not -empty -printf "%-25s%p\n"|sort -n|uniq -D -w25|cut -b26-|xargs -d"\n" -n1 md5sum|sed "s/  /\x0/"|uniq -D -w32|awk -F"\0" 'BEGIN{l="";}{if(l!=$1||l==""){printf "\n%s\0",$1}printf "\0%s",$2;l=$1}END{printf "\n"}'|sed "/^$/d"
    alafrosty · 2013-10-22 13:34:19 0
  • Scan a file and print a list of the ASCII characters that do not occur in it; any of these can then safely be used to delimit fields. Useful when converting CSV files that use "," to a different single-character delimiter. Piping the result into less at the end (which could be redundant) stops control characters from being interpreted by the terminal.


    0
    for i in `seq 0 9` A B C D E F; do for j in `seq 0 9` A B C D E F; do HEX=\$\'\\x${i}${j}\'; if ! eval grep -qF "$HEX" file; then eval echo $HEX \\x${i}${j}; fi; done; done 2> /dev/null | less
    moogmusic · 2012-01-05 10:09:07 0
