Mirroring / copying a whole website for offline use locally with wget
--mirror >>> all pages
--random-wait >>> makes it look like not a bot
--recursive >>> all pages, follow links
-e robots=off >>> ignore the site's no-robots request
-U mozilla >>> makes it look like a real user on a browser, not the command line
-R >>> reject these file types; we only want HTML
-c >>> continue. In case you had to stop the wget, you can pick it right back up!
--reject-regex '(.*)\?(.*)' >>> skip URLs with parameters (don't download the same page a million times)
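Putting those flags together, a sketch of the full invocation might look like this (the URL and the rejected extensions are placeholders, and --mirror already implies recursion):
wget --mirror --random-wait -c -e robots=off -U mozilla -R "jpg,png,gif" --reject-regex '(.*)\?(.*)' http://example.com/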
Download all PDF files off of a website using wget. You can change the file type to download by changing the extension; for example, swap pdf for txt in the command.
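The command itself isn't reproduced here, but a typical form looks like this (the URL is a placeholder):
wget -r -l1 -nd -A pdf http://example.com/page-with-pdfs/
To grab text files instead, change -A pdf to -A txt.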
Bash process substitution which curls the website 'hashbang.sh' and executes the shell script embedded in the page. This is obviously not the most secure way to run something like this, and we will scold you if you try. The smarter way would be:
Download locally over SSL:
> curl https://hashbang.sh >> hashbang.sh
Verify integrity with GPG (if available):
> gpg --recv-keys 0xD2C4C74D8FAA96F5
> gpg --verify hashbang.sh
Inspect the source code:
> less hashbang.sh
Run it:
> chmod +x hashbang.sh
> ./hashbang.sh
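The process-substitution one-liner being described is not shown above; it is along the lines of:
sh <(curl -s https://hashbang.sh)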
Download the latest NVIDIA GeForce x64 Windows 7-8 driver from NVIDIA's website. Pulls the latest download version (which includes betas). This is the "English" version; the following command includes a 'sed' step to replace "english" with "international" if needed. You can also replace the starting subdomain with "eu.", "uk.", and others. Enjoy this one-liner! 1 character under the max :)
wget "us.download.nvidia.com$(wget -qO- "$(wget -qO- "nvidia.com/Download/processFind.aspx?psid=95&pfid=695&osid=19&lid=1&lang=en-us" | awk '/driverResults.aspx/ {print $4}' | cut -d "'" -f2 | head -n 1)" | awk '/url=/ {print $2}' | sed -e "s/english/international/" | cut -d '=' -f3 | cut -d '&' -f1)"
This will download all files of the type specified after "-A" from a website. Here is a breakdown of the options:
-r turns on recursion and downloads all links on the page
-l1 goes only one level of links into the page (this is really important when using -r)
-H spans hosts, meaning it will follow links to sites that don't have the same domain
-nd puts all the downloads in the current directory instead of recreating the site's directory tree
-A mp3 filters to only download links that are mp3s (this can be a comma-separated list of file formats to grab multiple types)
-e robots=off just means to ignore the robots.txt file, which stops programs like wget from crashing the site... sorry http://example/url lol..
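Assembled from the options above, the command looks roughly like this (the URL is the placeholder from the description):
wget -r -l1 -H -nd -A mp3 -e robots=off http://example/url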
This command downloads the current 20 most popular pictures from the website 500px. It saves them under random names because the pictures on 500px are all stored with the same name. UPDATED: it doesn't work if no referrer is specified: --referer='http://500px.com/'
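The original command isn't reproduced here, but the two tricks it relies on, sending a referrer header and saving under a random name, look like this in wget (the image URL is hypothetical):
wget --referer='http://500px.com/' -O "500px_$RANDOM.jpg" 'https://example.com/photo.jpg'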
--mirror : turn on options suitable for mirroring.
-p : download all files that are necessary to properly display a given HTML page.
--convert-links : after the download, convert the links in the documents for local viewing.
-P ./LOCAL-DIR : save all the files and directories to the specified directory.
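These are the options of the usual wget mirroring one-liner; a sketch with a placeholder URL and target directory:
wget --mirror -p --convert-links -P ./LOCAL-DIR http://example.com/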
yt-mp3chanrip() { for count in 1 51 101 151 201 251 301; do for i in $(curl -s http://gdata.youtube.com/feeds/api/users/"$1"/uploads\?start-index="$count"\&max-results=50 | grep -Eo "watch\?v=[^[:space:]\"\'\\]{11}" | uniq); do ffmpeg -i $(wget http://youtube.com/"$i" -qO- | sed -n "/fmt_url_map/{s/[\'\"\|]/\n/g;p}" | sed -n '/^fmt_url_map/,/videoplayback/p' | sed -e :a -e '$q;N;5,$D;ba' | tr -d '\n' | sed -e 's/\(.*\),\(.\)\{1,3\}/\1/') -vn -ab 128k "$(youtube-dl -e http://youtube.com/"$i").mp3"; done; done; unset count i; }
Create the function and run it with:
yt-mp3chanrip YoutubeUsername
Great for channels like ukfDrumAndBass that only post music. No more need for third-party browser plugins or websites that only convert one video at a time. It'll convert and save to the CWD up to 300 of a user's videos as mp3s, one at a time. To increase that, just extend the list of $count start indexes. This is a concoction of commands #7718 and #7752, so it uses ffmpeg, wget, curl, sed, and youtube-dl -- youtube-dl is only used to get the title of the video, which is used to name the mp3 file. You can use a different naming method if you want and the function should still work.
Download websites down to 5 levels of links and browse them offline!
-k -> convert links (to browse offline)
-r -> recursive download
-l 5 -> recursion level 5
Example: http://gentoo-install.com :-)
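Putting those three flags together, using the example site from the description:
wget -k -r -l 5 http://gentoo-install.com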
Replace *** with the appropriate values
Missed a class at UTOSC 2010? Need a refresher? Use this to curl down all the presentations from the UTOSC website (http://2010.utosc.com). NOTE/WARNING: this will dump them in the current directory, there are around 37 of them, and some are big. Tested on OS X 10.6.1.
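The original command isn't reproduced here, but curl's bracket globbing is the usual way to pull down a numbered series of files in one go; the path and range below are hypothetical:
curl "http://2010.utosc.com/presentation/[1-40]/slides.pdf" -o "utosc_#1.pdf"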
Refresh the font directory cache. Useful after you download fonts (.ttf or others) from various websites and don't want to reboot or re-log in. Close your word processor before running the command; after the refresh, reopen it and the new fonts will be available!
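The command being described is presumably fc-cache; after dropping new .ttf files into your font directory (for example ~/.fonts), rebuild the cache with:
fc-cache -f -v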