Commands by zhangweiwu (9)

  • The exported TSV file of Google Adwords' first five columns are text, they usually should collapse into one cell, a multi-line text cell, but there is no guaranteed way to represent line-break within cells for .tsv file format, thus Google split it to 5 columns. The problem is, with 5 columns of text, there are hardly space to put additional fields while maintain printable output. This script collapses the first five columns of each row into one single multi-line text cell, for console output or direct send to printer.


    -1
    awk -F $'\t' '{printf $1 LS $2 LS $3 LS $4 LS $5; for (i = 7; i < NF; i++) printf $i "\t"; printf "\n--\n";}' LS=$'\n' 'Ad report.tsv' | column -t -s $'\t'
    zhangweiwu · 2011-02-28 10:52:16 4
  • The exported TSV file of Google Adwords' first five columns are text, they usually should collapse into one cell, a multi-line text cell, but there is no guaranteed way to represent line-break within cells for .tsv file format, thus Google split it to 5 columns. The problem is, with 5 columns of text, there are hardly space to put additional fields while maintain printable output. This script collapses the first five columns of each row into one single multi-line text cell. new line character we use Line-Separator character (unicode U+2028), which is respected by gnumeric. It outputs a new .tsv file that opens in gnumeric.


    0
    awk -F $'\t' '{printf $1 LS $2 LS $3 LS $4 LS $5; for (i = 7; i < NF; i++) printf $i "\t"; printf "\n";}' LS=`env printf '\u2028'` 'Ad report.tsv'
    zhangweiwu · 2011-02-28 10:48:46 3
  • To rip DVD movie to ogg format using ffmpeg, follow these steps. 1) find the vob files on the mounted video DVD in VIDEO_TS that stores the movie itself. There would be a few other VOB files that stores splash screen or special features, the vob files for the movie itself can be identified by its superior size. You can verify these vob files by playing them directly with a player (e.g. mplayer) 2) concatenate all such vob files, pipe to ffmpeg 3) calculate the video size and crop size. The ogg video size must be multiple of 16 on both width and height, this is inherit limitation of theora codec. In my case I took 512x384. The -vcodec parameter is necessary because ffmpeg doesn't support theora by itself. -acodec is necessary otherwise ffmpeg uses flac by default.


    5
    cat VIDEO_TS/VTS_01_[1234].VOB | nice ffmpeg -i - -s 512x384 -vcodec libtheora -acodec libvorbis ~/Videos/dvd_rip.ogg
    zhangweiwu · 2010-09-14 14:45:27 23
  • This command is suitable to use as application launching command for a desktop shortcut. It checks if the application is already running by pgrepping its process ID, and offer user to kill the old process before starting a new one. It is useful for a few x11 application that, if re-run, is more likely a mistake. In my example, x2vnc is an x11 app that does not quit when its connection is broken, and would not work well when a second process establish a second connection after the first broken one. The LC_ALL=C for xmesseng is necessary for OpenSUSE systems to avoid a bug. If you don't find needing it, remove the "env LC_ALL=C" part


    0
    sh -c 'if pgrep x2vnc && env LC_ALL=C xmessage -button "Kill it:0,Ignore it:1" "Another connection is already running. Should I kill it instead of ignoring it?"; then killall x2vnc; fi; x2vnc -passwd /home/Ariel/.vnc/passwd -east emerson:0'
    zhangweiwu · 2010-07-06 09:11:12 3
  • You might want to check what file and directory names would be renamed or chopped if you create iso 9660 level 2 image out of them. Use this command to check first. Show Sample Output


    1
    find . -regextype posix-extended -not -regex '.*/[A-Za-z_]*([.][A-Za-z_]*)?'
    zhangweiwu · 2010-06-25 00:27:09 3
  • This is useful when you got a reserved IP address like 192.168.0.100 and want to find out what IP address is used to access the Internet. You have to know a server with 'efingerd -n' configured, like www.linuxbanks.cn as above. Other method to find out this information are for example access www.tell-my-ip.com and grep the output. The finger method have the advantage that it is easy to deploy a service like www.tell-my-ip.com, as you only need to get efingerd installed.


    -4
    finger @www.linuxbanks.cn | grep -oE '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | head -n1
    zhangweiwu · 2010-05-05 14:58:55 5
  • in "a.html", find all images referred as relative URI in an HTML file by "src" attribute of "img" element, replace them with "data:" URI. This useful to create single HTML file holding all images in it, as a replacement of the IE-created .mht file format. The generated HTML works fine on every other browser except IE, as well as many HTML editors like kompozer, while the .mht format only works for IE, but not for every other browser. Compare to the KDE's own single-file-web-page format "war" format, which only opens correctly on KDE, the HTML file with "data:" URI is more universally supported. The above command have many bugs. My commandline-fu is too limited to fix them: 1. it assume all URLs are relative URIs, thus works in this case: <img src="images/logo.png"/> but does not work in this case: <img src="http://www.my_web_site.com/images/logo.png" /> This may not be a bug, as full URIs perhaps should be ignored in many use cases. 2. it only work for images whoes file name suffix is one of .jpg, .gif, .png, albeit images with .jpeg suffix and those without extension names at all are legal to HTML. 3. image file name is not allowed to contain "(" even though frequently used, as in "(copy of) my car.jpg". Besides, neither single nor double quotes are allowed. 4. There is infact a big flaw in this, file names are actually used as regular expression to be replaced with base64 encoded content. This cause the script to fail in many other cases. Example: 'D:\images\logo.png', where backward slash have different meaning in regular expression. I don't know how to fix this. I don't know any command that can do full text (no regular expression) replacement the way basic editors like gedit does. 5. The original a.html are not preserved, so a user should make a copy first in case things go wrong.


    4
    grep -ioE "(url\(|src=)['\"]?[^)'\"]*" a.html | grep -ioE "[^\"'(]*.(jpg|png|gif)" | while read l ; do sed -i "s>$l>data:image/${l/[^.]*./};base64,`openssl enc -base64 -in $l| tr -d '\n'`>" a.html ; done;
    zhangweiwu · 2010-05-05 14:07:51 13
  • Sometimes you want to work on data sheets by using heirloom unic commands like cut, paste, sed, sort, wc and good old awk. But your user works on Microsoft Excel spreadsheet. The idea: 1) ask your user to save it as "Unicode Text" from Microsoft Excel and send the document to you; 2) use the given command to convert it to UTF-8 text. We carefully convert "\r\n" to local end-of-line character; and to convert "\n" (in Excel, means linebreak within the table cell") to "\r", which is carrier return but not end-of-line in Unix. If the "\n" is not replaced with "\r", for example, wc -l will report incorrect column number.


    0
    iconv -f UTF16LE -t UTF-8 < SOURCE | awk 'BEGIN { RS="\r\n";} { gsub("\n", "\r"); print;}' > TARGET
    zhangweiwu · 2010-04-04 07:16:57 3
  • The command gives size of all files smaller than 1024k, this information, together with disk usage, can help determin file system parameter (e.g. block size) or storage device (e.g. SSD v.s. HDD). Note if you use awk instead of "cut| dc", you easily breach maximum allowed number of records in awk. Show Sample Output


    1
    find dir -size -1024k -type f | xargs -d $'\n' -n1 ls -l | cut -d ' ' -f 5 | sed -e '2,$s/$/+/' -e '$ap' | dc
    zhangweiwu · 2009-12-28 04:23:01 5

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands


Check These Out

Display Spinner while waiting for some process to finish
alternatively, run the spinner for 5 seconds: timeout 5 bash -c 'spinner=( Ooooo oOooo ooOoo oooOo ooooO oooOo ooOoo oOooo); while true; do for i in ${spinner[@]}; do for j in seq 0 ${#i}; do echo -en "\b\b"; done; echo -ne "${i}"; sleep 0.2; done; done'

Find partition name using mount point
lsblk | grep mountpoint

list all opened ports on host

Get your outgoing IP address
should be very consistent cause it's google :-)

Get ethX mac addresses
I much prefer using /sbin/ip over /sbin/ifconfig for most everything. I find the interface and output to be much more consistent and it has many abilities that ifconfig, route, etc. do not. To get the mac address for only one interface, add 'show dev [interface]' to the 'ip link' part of the command: ip link show dev eth0 | grep 'link/ether' | awk '{print $2}' . Also, both this command and the ifconfig one do not require root access to run, so the sudo is not necessary.

get all Amazon cloud (amazonws etc) ipv4 subnets

Which processes are listening on a specific port (e.g. port 80)
swap out "80" for your port of interest. Can use port number or named ports e.g. "http"

For a $FILE, extracts the path, filename, filename without extension and extension.
Useful for use in other scripts for renaming, testing for extensions, etc.

Show changed files, ignoring permission, date and whitespace changes
Only shows files with actual changes to text (excluding whitespace). Useful if you've messed up permissions or transferred in files from windows or something like that, so that you can get a list of changed files, and clean up the rest.

delete all leading and trailing whitespace from each line in file


Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: