embed referred images in HTML files

grep -ioE "(url\(|src=)['\"]?[^)'\"]*" a.html | grep -ioE "[^\"'(]*.(jpg|png|gif)" | while read l ; do sed -i "s>$l>data:image/${l/[^.]*./};base64,`openssl enc -base64 -in $l| tr -d '\n'`>" a.html ; done;
in "a.html", find all images referred as relative URI in an HTML file by "src" attribute of "img" element, replace them with "data:" URI. This useful to create single HTML file holding all images in it, as a replacement of the IE-created .mht file format. The generated HTML works fine on every other browser except IE, as well as many HTML editors like kompozer, while the .mht format only works for IE, but not for every other browser. Compare to the KDE's own single-file-web-page format "war" format, which only opens correctly on KDE, the HTML file with "data:" URI is more universally supported. The above command have many bugs. My commandline-fu is too limited to fix them: 1. it assume all URLs are relative URIs, thus works in this case: <img src="images/logo.png"/> but does not work in this case: <img src="http://www.my_web_site.com/images/logo.png" /> This may not be a bug, as full URIs perhaps should be ignored in many use cases. 2. it only work for images whoes file name suffix is one of .jpg, .gif, .png, albeit images with .jpeg suffix and those without extension names at all are legal to HTML. 3. image file name is not allowed to contain "(" even though frequently used, as in "(copy of) my car.jpg". Besides, neither single nor double quotes are allowed. 4. There is infact a big flaw in this, file names are actually used as regular expression to be replaced with base64 encoded content. This cause the script to fail in many other cases. Example: 'D:\images\logo.png', where backward slash have different meaning in regular expression. I don't know how to fix this. I don't know any command that can do full text (no regular expression) replacement the way basic editors like gedit does. 5. The original a.html are not preserved, so a user should make a copy first in case things go wrong.

4
2010-05-05 14:07:51

What Others Think

A braver soul could also try an mht2html converter script, for those non-IE user who received .mht file from other places and could not use them. I am not sure if it can be done in one line. I would be glad to write a script to do it and publish the script as a piece of software, if proven not possible to do it with commandline-fu.
zhangweiwu · 498 weeks ago
Brilliant. When turning it into a script, first ``cd `dirname $1```.
cayhorstmann · 219 weeks and 5 days ago

What do you think?

Any thoughts on this command? Does it work on your machine? Can you do the same thing with only 14 characters?

You must be signed in to comment.

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands



Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: