transform relative URLs (shoddy hack but it works)

wget -k $URL
The "-k" flag will tell wget to convert links for local browsing; it works with mirroring (ie with "-r") or single-file downloads.

-2
By: minnmass
2010-08-21 17:39:53

These Might Interest You

  • in "a.html", find all images referred as relative URI in an HTML file by "src" attribute of "img" element, replace them with "data:" URI. This useful to create single HTML file holding all images in it, as a replacement of the IE-created .mht file format. The generated HTML works fine on every other browser except IE, as well as many HTML editors like kompozer, while the .mht format only works for IE, but not for every other browser. Compare to the KDE's own single-file-web-page format "war" format, which only opens correctly on KDE, the HTML file with "data:" URI is more universally supported. The above command have many bugs. My commandline-fu is too limited to fix them: 1. it assume all URLs are relative URIs, thus works in this case: <img src="images/logo.png"/> but does not work in this case: <img src="http://www.my_web_site.com/images/logo.png" /> This may not be a bug, as full URIs perhaps should be ignored in many use cases. 2. it only work for images whoes file name suffix is one of .jpg, .gif, .png, albeit images with .jpeg suffix and those without extension names at all are legal to HTML. 3. image file name is not allowed to contain "(" even though frequently used, as in "(copy of) my car.jpg". Besides, neither single nor double quotes are allowed. 4. There is infact a big flaw in this, file names are actually used as regular expression to be replaced with base64 encoded content. This cause the script to fail in many other cases. Example: 'D:\images\logo.png', where backward slash have different meaning in regular expression. I don't know how to fix this. I don't know any command that can do full text (no regular expression) replacement the way basic editors like gedit does. 5. The original a.html are not preserved, so a user should make a copy first in case things go wrong.


    4
    grep -ioE "(url\(|src=)['\"]?[^)'\"]*" a.html | grep -ioE "[^\"'(]*.(jpg|png|gif)" | while read l ; do sed -i "s>$l>data:image/${l/[^.]*./};base64,`openssl enc -base64 -in $l| tr -d '\n'`>" a.html ; done;
    zhangweiwu · 2010-05-05 14:07:51 2
  • Flips the y-axis to emulate a real mirror, uses low resolution for speed, this will also hide blemishes and the like :)


    2
    cvlc v4l2:// :vout-filter=transform :transform-type=vflip :v4l2-width=320 :v4l2-height=240 -f &
    Escher · 2009-09-16 15:30:52 0

  • 1
    cat urls.txt | wget -i- -T 10 -t 3 --waitretry 1
    kev · 2012-05-14 06:41:14 2
  • This will extract all of the urls from a firefox session (including urls in a tab's history). The sessionstore.js file is in ~/.mozilla/firefox/{firefox profile}


    2
    sed -e "s/\[{/\n/g" -e "s/}, {/\n/g" sessionstore.js | grep url | awk -F"," '{ print $1 }'| sed -e "s/url:\"\([^\"]*\)\"/\1/g" -e "/^about:blank/d" > session_urls.txt
    birnam · 2009-06-14 15:08:31 2

What do you think?

Any thoughts on this command? Does it work on your machine? Can you do the same thing with only 14 characters?

You must be signed in to comment.

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands



Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: