Commands tagged regex (60)

  • If you want all the URLs from all the sessions, you can use : perl -lne 'print for /url":"\K[^"]+/g' ~/.mozilla/firefox/*/sessionstore.js Thanks to tybalt89 ( idea of the "for" statement ). For perl purists, there's JSON and File::Slurp modules, buts that's not installed by default.


    0
    perl -lne 'print for /url":"\K[^"]+/g' $(ls -t ~/.mozilla/firefox/*/sessionstore.js | sed q)
    sputnick · 2009-12-14 00:51:54 2
  • Say you have a directory structure like "foo/, foo/data/, bar/, bar/data/". If you just want to ignore 'bar/data' and you use "ack --ignore-dir=data pattern" it will ignore both foo/data and bar/data and 'ignore-data=bar/data' etc won't work.


    0
    ack -a -G '^(?!.*bar/data.*).*$' pattern
    rkulla · 2010-05-10 00:13:11 4
  • I used to do a lot of path manipulation to set up my development environment (PATH, LD_LIBRARY_PATH, etc), and one part of my environment wasn't always aware of what the rest of the environment needed in the path. Thus resetting the entire PATH variable wasn't an option; modifying it made sense. The original version of the functions used sed, which turned out to be really slow when called many times from my bashrc, and it could take up to 10 seconds to login. Switching to parameter substitution sped things up significantly. The commands here don't clean up the path when they are done (so e.g. the path gets cluttered with colons). But the code is easy to read for a one-liner. The full function looks like this: remove_path() { eval PATHVAL=":\$$1:" PATHVAL=${PATHVAL//:$2:/:} # remove $2 from $PATHVAL PATHVAL=${PATHVAL//::/:} # remove any double colons left over PATHVAL=${PATHVAL#:} # remove colons from the beginning of $PATHVAL PATHVAL=${PATHVAL%:} # remove colons from the end of $PATHVAL export $1="$PATHVAL" } append_path() { remove_path "$1" "$2" eval PATHVAL="\$$1" export $1="${PATHVAL}:$2" } prepend_path() { remove_path "$1" "$2" eval PATHVAL="\$$1" export $1="$2:${PATHVAL}" } I tried using regexes to make this into a cleaner one-liner, but remove_path ended up being cryptic and not working as well: rp() { eval "[[ ::\$$1:: =~ ^:+($2:)?((.*):$2:)?(.*):+$ ]]"; export $1=${BASH_REMATCH[3]}:${BASH_REMATCH[4]}; }; Show Sample Output


    0
    rp() { local p; eval p=":\$$1:"; export $1=${p//:$2:/:}; }; ap() { rp "$1" "$2"; eval export $1=\$$1$2; }; pp() { rp "$1" "$2"; eval export $1=$2:\$$1; }
    cout · 2010-07-15 18:52:01 79
  • grep multiline in Perl regexp syntax with pcregrep Show Sample Output


    0
    pcregrep --color -M -N CRLF "owa_pattern\.\w+\W*\([^\)]*\)" source.sql
    hute37 · 2010-11-11 12:53:40 10

  • 0
    perl -e "tr/[A-Z]/[a-z]/;" -pi.save $(find . -type f)
    miccaman · 2010-11-25 12:48:39 4
  • Some source package have many 'README' kind of files, among many other regular files/directories. This command could be useful when one wants to list only 'README' kind of files among jungle of other files. (e.g. I came across this situation after downloading source for module-init-tools) Warning: This command would miss a file like => README.1 (or one with spaces in-between) Corrections welcome. Show Sample Output


    0
    ls | grep '^[A-Z0-9]*$'
    b_t · 2010-12-19 21:45:53 4
  • # Search for an available package on Debian systems using a regex so it only matches packages starting with 'tin'.


    0
    aptitude search ^tin
    defiantredpill · 2011-10-20 17:51:36 3
  • Uses sed with a regex to move the linenumbers to the line end. The plain regex (w/o escapes) looks like that: ^([^:]*):(.*) Show Sample Output


    0
    grep -n log4j MainPm.java | sed -e 's/^\([^:]*\):\(.*\)/\2 \1/'
    bash_vi · 2011-10-21 12:50:30 3

  • 0
    yum remove \*.i\?86
    niek · 2011-10-27 13:56:24 3
  • With thanks to dew on Efnet's #regex, back in 2005. This version indents subsequent lines after the first by one space, to make paragraphs visibly obvious -- remove the \3 to prevent this behavior. Lines are only broken at spaces: long strings with no spaces will not wrap, so URLs are safe. Replace the "75"s to make the regex linewrap to other amounts. From the unix commandline, "fold" is likely your better choice, but this snippet is handy in editors which allow regular expressions, in scripting, and other such situations where "fold" is unavailable. Show Sample Output


    0
    s/(?=.{75,})(?:(.{0,75})(?:\r\n?|\n\r?)|(.{0,75}))[ ]/\1\2\n /g
    DewiMorgan · 2012-02-28 02:27:20 3
  • Sometimes, especially when parsing HTML, you want "all text between two tags, that doesn't contain another tag". For example, to grab only the contents of the innermost <div>s, something like: /<div\b[^>]*>((?:(?!<div).)*)</div>/ ...may be your best option to capture that text. It's not always needed, but is a powerful arrow in your regex quiver in those cases when you do need it. Note that, in general, regular expressions are the Wrong Choice for parsing HTML, anyway. Better approaches are solutions which let you navigate the HTML as a proper DOM. But sometimes, you just need to use the tools available to you. If you don't, then you have two problems.


    0
    Opening_tag((?:(?!Unwanted_tag).)*)Closing_tag
    DewiMorgan · 2012-02-28 02:54:57 3
  • People are *going* to post the wrong ways to do this. It's one of the most common form-validation tasks, and also one of the most commonly messed up. Using a third party tool or library like exim means that you are future-proofing yourself against changes to the email standard, and protecting yourself against the fact that actually checking whether an email address is valid is *not possible*. Still, perhaps your boss is insisting you really do need to check them internally. OK. Read the RFCs. The bet before the @ is specified by RFC2821 and RFC2822. The domain name part is specified by RFC1035, RFC1101, RFC1123 and RFC2181. Generally, when people say "email address", they mean that part of the address that the RFC terms the "addr-spec": the "blah@domain.tld" address, with no display names, comments, quotes, etc. Also "root@localhost" and "root" should be invalid, as should arbitrary addressing schemes specified by a protocol indicator, like "jimbo@myprotocol:foo^bar^baz". So... With the smallest poetic license for readability (allowing underscores in domain names so we can use "\w" instead of "[a-z0-9]"), the RFCs give us: ^(?:"(?:[^"\\]|\\.)+"|[-^!#\$%&'*+\/=?`{|}~.\w]+)@(?=.{3,255}$)(?:[\w][\w-]{0,62}\.){1,128}[\w][\w-]{0,62}$ Not perfect, but the best I can come up with, and most compliant I've found. I'd be interested to see other people's ideas, though. It's still not going to verify you an address fersure, properly, 100% guaranteed legit, though. What else can you do? Well, you could also: * verify that the address is either a correct dotted-decimal IP, or contains letters. * remove reserved domains (.localhost, .example, .test, .invalid), reserved IP ranges, and so forth from the address. * check for banned domains (whitehouse.gov, example.com...) * check for known TLDs including alt tlds. * see if the domain has an MX record set up: if so, connect to that host, else connect to the domain. * see if the given address is accepted by the server as a recipient or sender (this fails for yahoo.*, which blocks after a few attempts, assuming you are a spammer, and for other domains like rediffmail.com, home.com). But these are moving well out of the realm of generic regex checks and into the realm of application-specific stuff that should be done in code instead - especially the latter two. Hopefully, this is all you needed to point out to your boss "hey, email validation this is a dark pit with no bottom, we really just want to do a basic check, then send them an email with a link in it: it's the industry standard solution." Of course, if you want to go nuts, here's an idea that you could do. Wouldn't like to do it myself, though: I'd rather just trust them until their mail bounces too many times. But if you want it, this (untested) code checks to see if the mail domain works. It's based on a script by John Coggeshall and Jesse Houwing that also asked the server if the specific email address existed, but I disliked that idea for several reasons. I suspect: it will get you blocked as a spambot address harvester pretty quick; a lot of servers would lie to you; it would take too much time; this way you can cache domains marked as "OK"; and I suspect it would add little to the reliability test. // Based on work by: John Coggeshall and Jesse Houwing. // http://www.zend.com/zend/spotlight/ev12apr.php mailRegex = '^(?:"(?:[^"\\\\]|\\\\.)+"|[-^!#\$%&\'*+\/=?`{|}~.\w]+)'; mailRegex .= '@(?=.{3,255}$)(?:[\w][\w-]{0,62}\.){1,128}[\w][\w-]{0,62}$'; function ValidateMail($address) {   global $mailRegex; // Yes, globals are evil. Put it inline if you want.   if (!preg_match($mailRegex)) {     return false;   }   list ( $localPart, $Domain ) = split ("@",$Email);   // connect to the first available MX record, or to domain if no MX record.   $ConnectAddress = new Array();   if (getmxrr($Domain, $MXHost)) {     $ConnectAddress = $MXHost;   } else {     $ConnectAddress[0] = $Domain;   }   // check all MX records in case main server is down - may take time!   for ($i=0; $i < count($ConnectAddress); $i++ ) {     $Connect = fsockopen ( $ConnectAddress[$i], 25 );     if ($Connect){       break;     }   }   if ($Connect) {     socket_set_blocking($Connect,0);     // Only works if socket_blocking is off.     if (ereg("^220", $Out = fgets($Connect, 1024))) {       fclose($Connect); // Unneeded, but let's help the gc.       return true;     }     fclose($Connect); // Help the gc.   }   return false; } Show Sample Output


    0
    perl -e "print 'yes' if `exim -bt $s_email_here | grep -c malformed`;"
    DewiMorgan · 2012-02-28 04:42:41 3
  • Reciprocally, we could get the node name from a give Tor IP address => ip2node() { curl -s -d "QueryIP=$1" http://torstatus.blutmagie.de/tor_exit_query.php | grep -oP "Server name:.*'>\K\w+" ; } ip2node 204.8.156.142 BostonUCompSci Show Sample Output


    0
    curl -s -d "CSField=Name" -d "CSInput=BostonUCompSci" http://torstatus.blutmagie.de/index.php | grep -oP "ip=\K(\d+)(\.\d+){3}"
    JisSey · 2012-03-09 16:52:27 3
  • Runs a diff on two files ignore comments and blank lines (diff -I=RE does not work as expected). Adapted from a post found on stackexchange.


    0
    diff -u <(grep -vE '^(#|$)' file1) <(grep -vE '^(#|$)' file2)
    taintedkernel · 2013-02-12 13:59:39 36

  • 0
    diff -BI '^#' file{1,2}
    zlemini · 2013-02-17 11:53:41 38
  • set BLOCK to "title" or any other HTML / RSS / XML tag and curl URL to get everything in-between e.g. some text


    0
    curl ${URL} 2>/dev/null|grep "<${BLOCK}>"|sed -e "s/.*\<${BLOCK}\>\(.*\)\<\/${BLOCK}\>.*/\1/g"
    c3w · 2013-08-31 14:53:54 0
  • Will find and list all core files from the current directory on. You can pass | xargs rm -i to be prompted for the removal if you'd like to double check before removal.


    0
    find . -type f -regex ".*/core.[0-9][0-9][0-9][0-9]$"
    H3liUS · 2014-01-17 16:44:47 7
  • Usage: command | hl 'regex'


    0
    hl() { while read -r; do printf '%s\n' "$(perl -p -e 's/('"$1"')/\a\e[7m$1\e[0m/g' <<< "$REPLY")"; done; }
    nyuszika7h · 2014-08-05 22:29:08 8
  • Thx Mass1 for the sharing


    0
    ifconfig | egrep -A2 "eth|wlan" | tr -d "\n"| sed 's/\-\-/\n/g'|awk '{print "mac: "$5 " " $7}' | sed 's/addr:/addr: /g'
    Koobiac · 2014-09-11 08:18:18 9
  • Its possible to user a simple regex to extract de username from the finger command. The final echo its optional, just for remove the initial space Show Sample Output


    0
    finger $(whoami) | egrep -o 'Name: [a-zA-Z0-9 ]{1,}' | cut -d ':' -f 2 | xargs echo
    swebber · 2014-09-24 01:22:07 9

  • 0
    finger $(whoami) | perl -ne '/Name: ([a-zA-Z0-9 ]{1,})/ && print "$1\n"'
    zil0g · 2014-09-30 11:37:47 11
  • Command returns valid IP addresses. Append the following regex to additionally filter out NAT and reserved IP addresses | grep -Ev "^0|\.0[0-9]|^10\.|^127\.|^169\.|^172\.(1[6-9]|2[0-9]|3[01])|^192.168.|^2(2[4-9]|3[0-9])|^2(4[0-9]|5[0-5])"


    0
    grep -Eoa "\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" Filetosearch.txt
    jsbrown · 2014-11-02 19:50:54 8
  • I personally like it very much and have wrapped it into a function, named "apt-propos" ;), also you can use --names-only option for a sort-of "apt-whatis"


    0
    apt-cache search byo | sed "s/^\([[:alnum:]\.-]*\) - /\1=%%%=- /" | column -s '=%%%=' -t
    VoidDroid · 2015-10-18 10:55:34 11
  • seed the random number generator, find all matches in a file put all matches from the capture group into an array return a random element from the array Show Sample Output


    0
    awk 'BEGIN{srand()} match($0, /DELTA=([0-9]+);/, a) {w[i++]=a[1]} END {print w[int(rand()*i)]}' file.name
    jkirchartz · 2015-11-13 17:56:34 11

  • 0
    echo "File;Creator;Producer";find . -name '*.pdf' -print0 | while IFS= read -d $'\0' line;do echo -n "$line;";pdfinfo "$line"|perl -ne 'if(/^(Creator|Producer):\s*(.*)$/){print"$2";if ($1 eq "Producer"){exit}else{print";"}}';echo;done 2>/dev/null
    langec · 2020-12-17 21:08:55 936
  •  < 1 2 3 > 

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands


Check These Out

list block level layout

awk date convert
Convert readable date/time with `date` command

Alias TAIL for automatic smart output
Run the alias command, then issue $ps aux | tail and resize your terminal window (putty/console/hyperterm/xterm/etc) then issue the same command and you'll understand. $ ${LINES:-`tput lines 2>/dev/null||echo -n 12`} Insructs the shell that if LINES is not set or null to use the output from `tput lines` ( ncurses based terminal access ) to get the number of lines in your terminal. But furthermore, in case that doesn't work either, it will default to using the default of 80. The default for TAIL is to output the last 10 lines, this alias changes the default to output the last x lines instead, where x is the number of lines currently displayed on your terminal - 7. The -7 is there so that the top line displayed is the command you ran that used TAIL, ie the prompt. Depending on whether your PS1 and/or PROMPT_COMMAND output more than 1 line (mine is 3) you will want to increase from -2. So with my prompt being the following, I need -7, or - 5 if I only want to display the commandline at the top. ( http://www.askapache.com/linux/bash-power-prompt.html ) 275MB/748MB [7995:7993 - 0:186] 06:26:49 Thu Apr 08 [askapache@n1-backbone5:/dev/pts/0 +1] ~ $ In most shells the LINES variable is created automatically at login and updated when the terminal is resized (28 linux, 23/20 others for SIGWINCH) to contain the number of vertical lines that can fit in your terminal window. Because the alias doesn't hard-code the current LINES but relys on the $LINES variable, this is a dynamic alias that will always work on a tty device.

generate a unique and secure password for every website that you login to
usage: sitepass MaStErPaSsWoRd example.com description: An admittedly excessive amount of hashing, but this will give you a pretty secure password, It also eliminates repeated characters and deletes itself from your command history. tr '!-~' 'P-~!-O' # this bit is rot47, kinda like rot13 but more nerdy rev # this avoids the first few bytes of gzip payload, and the magic bytes.

Route outbound SMTP connections through a addtional IP address rather than your primary

View all images
So you are in directory with loads of pictures laying around and you need to quickly scan through them all

Renaming a file without overwiting an existing file name
Sometimes in a hurry you may move or copy a file using an already existent file name. If you aliased the cp and mv command with the -i option you are prompted for a confirmation before overwriting but if your aliases aren't there you will loose the target file! The -b option will force the mv command to check if the destination file already exists and if it is already there a backup copy with an ending ~ is created.

Extract the MBR ID of a device
Useful when you want to know the mbrid of a device - for the purpose of making it bootable. Certain hybridiso distros, for eg the OpenSUSE live ISO uses the mbrid to find the live media. Use this command to find out the mbrid of your USB drive and then edit the /grub/mbrid file to match it.

Convert CSV to JSON
Replace 'csv_file.csv' with your filename.

Find files that were modified by a given command
Traces the system calls of a program. See http://linuxhelp.blogspot.com/2006/05/strace-very-powerful-troubleshooting.html for more information.


Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: