Commands by zhangweiwu (9)

  • The exported TSV file of Google Adwords' first five columns are text, they usually should collapse into one cell, a multi-line text cell, but there is no guaranteed way to represent line-break within cells for .tsv file format, thus Google split it to 5 columns. The problem is, with 5 columns of text, there are hardly space to put additional fields while maintain printable output. This script collapses the first five columns of each row into one single multi-line text cell, for console output or direct send to printer.


    -1
    awk -F $'\t' '{printf $1 LS $2 LS $3 LS $4 LS $5; for (i = 7; i < NF; i++) printf $i "\t"; printf "\n--\n";}' LS=$'\n' 'Ad report.tsv' | column -t -s $'\t'
    zhangweiwu · 2011-02-28 10:52:16 5
  • The exported TSV file of Google Adwords' first five columns are text, they usually should collapse into one cell, a multi-line text cell, but there is no guaranteed way to represent line-break within cells for .tsv file format, thus Google split it to 5 columns. The problem is, with 5 columns of text, there are hardly space to put additional fields while maintain printable output. This script collapses the first five columns of each row into one single multi-line text cell. new line character we use Line-Separator character (unicode U+2028), which is respected by gnumeric. It outputs a new .tsv file that opens in gnumeric.


    0
    awk -F $'\t' '{printf $1 LS $2 LS $3 LS $4 LS $5; for (i = 7; i < NF; i++) printf $i "\t"; printf "\n";}' LS=`env printf '\u2028'` 'Ad report.tsv'
    zhangweiwu · 2011-02-28 10:48:46 4
  • To rip DVD movie to ogg format using ffmpeg, follow these steps. 1) find the vob files on the mounted video DVD in VIDEO_TS that stores the movie itself. There would be a few other VOB files that stores splash screen or special features, the vob files for the movie itself can be identified by its superior size. You can verify these vob files by playing them directly with a player (e.g. mplayer) 2) concatenate all such vob files, pipe to ffmpeg 3) calculate the video size and crop size. The ogg video size must be multiple of 16 on both width and height, this is inherit limitation of theora codec. In my case I took 512x384. The -vcodec parameter is necessary because ffmpeg doesn't support theora by itself. -acodec is necessary otherwise ffmpeg uses flac by default.


    5
    cat VIDEO_TS/VTS_01_[1234].VOB | nice ffmpeg -i - -s 512x384 -vcodec libtheora -acodec libvorbis ~/Videos/dvd_rip.ogg
    zhangweiwu · 2010-09-14 14:45:27 26
  • This command is suitable to use as application launching command for a desktop shortcut. It checks if the application is already running by pgrepping its process ID, and offer user to kill the old process before starting a new one. It is useful for a few x11 application that, if re-run, is more likely a mistake. In my example, x2vnc is an x11 app that does not quit when its connection is broken, and would not work well when a second process establish a second connection after the first broken one. The LC_ALL=C for xmesseng is necessary for OpenSUSE systems to avoid a bug. If you don't find needing it, remove the "env LC_ALL=C" part


    0
    sh -c 'if pgrep x2vnc && env LC_ALL=C xmessage -button "Kill it:0,Ignore it:1" "Another connection is already running. Should I kill it instead of ignoring it?"; then killall x2vnc; fi; x2vnc -passwd /home/Ariel/.vnc/passwd -east emerson:0'
    zhangweiwu · 2010-07-06 09:11:12 5
  • You might want to check what file and directory names would be renamed or chopped if you create iso 9660 level 2 image out of them. Use this command to check first. Show Sample Output


    1
    find . -regextype posix-extended -not -regex '.*/[A-Za-z_]*([.][A-Za-z_]*)?'
    zhangweiwu · 2010-06-25 00:27:09 4
  • This is useful when you got a reserved IP address like 192.168.0.100 and want to find out what IP address is used to access the Internet. You have to know a server with 'efingerd -n' configured, like www.linuxbanks.cn as above. Other method to find out this information are for example access www.tell-my-ip.com and grep the output. The finger method have the advantage that it is easy to deploy a service like www.tell-my-ip.com, as you only need to get efingerd installed.


    -4
    finger @www.linuxbanks.cn | grep -oE '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | head -n1
    zhangweiwu · 2010-05-05 14:58:55 7
  • in "a.html", find all images referred as relative URI in an HTML file by "src" attribute of "img" element, replace them with "data:" URI. This useful to create single HTML file holding all images in it, as a replacement of the IE-created .mht file format. The generated HTML works fine on every other browser except IE, as well as many HTML editors like kompozer, while the .mht format only works for IE, but not for every other browser. Compare to the KDE's own single-file-web-page format "war" format, which only opens correctly on KDE, the HTML file with "data:" URI is more universally supported. The above command have many bugs. My commandline-fu is too limited to fix them: 1. it assume all URLs are relative URIs, thus works in this case: <img src="images/logo.png"/> but does not work in this case: <img src="http://www.my_web_site.com/images/logo.png" /> This may not be a bug, as full URIs perhaps should be ignored in many use cases. 2. it only work for images whoes file name suffix is one of .jpg, .gif, .png, albeit images with .jpeg suffix and those without extension names at all are legal to HTML. 3. image file name is not allowed to contain "(" even though frequently used, as in "(copy of) my car.jpg". Besides, neither single nor double quotes are allowed. 4. There is infact a big flaw in this, file names are actually used as regular expression to be replaced with base64 encoded content. This cause the script to fail in many other cases. Example: 'D:\images\logo.png', where backward slash have different meaning in regular expression. I don't know how to fix this. I don't know any command that can do full text (no regular expression) replacement the way basic editors like gedit does. 5. The original a.html are not preserved, so a user should make a copy first in case things go wrong.


    4
    grep -ioE "(url\(|src=)['\"]?[^)'\"]*" a.html | grep -ioE "[^\"'(]*.(jpg|png|gif)" | while read l ; do sed -i "s>$l>data:image/${l/[^.]*./};base64,`openssl enc -base64 -in $l| tr -d '\n'`>" a.html ; done;
    zhangweiwu · 2010-05-05 14:07:51 14
  • Sometimes you want to work on data sheets by using heirloom unic commands like cut, paste, sed, sort, wc and good old awk. But your user works on Microsoft Excel spreadsheet. The idea: 1) ask your user to save it as "Unicode Text" from Microsoft Excel and send the document to you; 2) use the given command to convert it to UTF-8 text. We carefully convert "\r\n" to local end-of-line character; and to convert "\n" (in Excel, means linebreak within the table cell") to "\r", which is carrier return but not end-of-line in Unix. If the "\n" is not replaced with "\r", for example, wc -l will report incorrect column number.


    0
    iconv -f UTF16LE -t UTF-8 < SOURCE | awk 'BEGIN { RS="\r\n";} { gsub("\n", "\r"); print;}' > TARGET
    zhangweiwu · 2010-04-04 07:16:57 4
  • The command gives size of all files smaller than 1024k, this information, together with disk usage, can help determin file system parameter (e.g. block size) or storage device (e.g. SSD v.s. HDD). Note if you use awk instead of "cut| dc", you easily breach maximum allowed number of records in awk. Show Sample Output


    1
    find dir -size -1024k -type f | xargs -d $'\n' -n1 ls -l | cut -d ' ' -f 5 | sed -e '2,$s/$/+/' -e '$ap' | dc
    zhangweiwu · 2009-12-28 04:23:01 6

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands


Check These Out

Simple but useful code
I'd be glad to explain the code, but I can't access or display links due to security reasons. Assuming the kw code represents an HTML anchor tag, here's a breakdown: HTML Anchor Tag: : This is the opening tag for the anchor element. href="link": This attribute specifies the hyperlink destination URL (link). When clicked, the user's browser navigates to the provided URL. kw: This is the content displayed within the anchor tag. It can be text or an image. In this case, it's likely text that says "kw". : This is the closing tag for the anchor element.

Delete specific sender in mailq

Recursively execute command on directories (.svn, permissions, etc)
The above command will set the GID bit on all directories named .svn in the current directory recursively. This makes the group ownership of all .svn folders be the group ownership for all files created in that folder, no matter the user. This is useful for me as the subversion working directory on my server is also the live website and needs to be auto committed to subversion every so often via cron as well as worked on by multiple users. Setting the GID bit on the .svn folders makes sure we don't have a mix of .svn metadata created by a slew of different users.

Open Remote Desktop (RDP) from command line having a custom screen size
This example uses xfreerdp, which builds upon the development of rdesktop. This example usage will also send you the remote machine's sound.

[git] Output remote origin from within a local repository
Great way to quickly grasp if a locally cloned repository originates from e.g. github or elsewhere.

Insert the last argument of the previous command

list block devices
Shows all block devices in a tree with descruptions of what they are.

Add to Instapaper
Adds URL to Instapaper. Usage: instapaper-add user@example.com 12345 http://www.commandlinefu.com/

Which processes are listening on a specific port (e.g. port 80)
swap out "80" for your port of interest. Can use port number or named ports e.g. "http"

pretend to be busy in office to enjoy a cup of coffee
Create a progress dialog with custom title and text using zenity.


Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: