Commands by DewiMorgan (8)

  • People are *going* to post the wrong ways to do this. It's one of the most common form-validation tasks, and also one of the most commonly messed up. Using a third-party tool or library like exim means that you are future-proofing yourself against changes to the email standard, and protecting yourself against the fact that actually checking whether an email address is valid is *not possible*.

    Still, perhaps your boss is insisting you really do need to check them internally. OK. Read the RFCs. The bit before the @ is specified by RFC2821 and RFC2822. The domain name part is specified by RFC1035, RFC1101, RFC1123 and RFC2181.

    Generally, when people say "email address", they mean the part of the address that the RFC terms the "addr-spec": the "blah@domain.tld" address, with no display names, comments, quotes, etc. Also, "root@localhost" and "root" should be invalid, as should arbitrary addressing schemes specified by a protocol indicator, like "jimbo@myprotocol:foo^bar^baz".

    So... with the smallest poetic license for readability (allowing underscores in domain names so we can use "\w" instead of "[a-z0-9]"), the RFCs give us:

        ^(?:"(?:[^"\\]|\\.)+"|[-^!#\$%&'*+\/=?`{|}~.\w]+)@(?=.{3,255}$)(?:[\w][\w-]{0,62}\.){1,128}[\w][\w-]{0,62}$

    Not perfect, but the best I can come up with, and the most compliant I've found. I'd be interested to see other people's ideas, though.

    It's still not going to verify an address for you fersure, properly, 100% guaranteed legit, though. What else can you do? Well, you could also:
    * verify that the address is either a correct dotted-decimal IP, or contains letters.
    * remove reserved domains (.localhost, .example, .test, .invalid), reserved IP ranges, and so forth from the address.
    * check for banned domains (whitehouse.gov, example.com...)
    * check for known TLDs including alt tlds.
    * see if the domain has an MX record set up: if so, connect to that host, else connect to the domain.
    * see if the given address is accepted by the server as a recipient or sender (this fails for yahoo.*, which blocks after a few attempts, assuming you are a spammer, and for other domains like rediffmail.com, home.com).

    But these are moving well out of the realm of generic regex checks and into the realm of application-specific stuff that should be done in code instead - especially the latter two.

    Hopefully, this is all you needed to point out to your boss "hey, email validation is a dark pit with no bottom, we really just want to do a basic check, then send them an email with a link in it: it's the industry standard solution."

    Of course, if you want to go nuts, here's an idea that you could do. Wouldn't like to do it myself, though: I'd rather just trust them until their mail bounces too many times. But if you want it, this (untested) code checks to see if the mail domain works. It's based on a script by John Coggeshall and Jesse Houwing that also asked the server if the specific email address existed, but I disliked that idea for several reasons. I suspect: it will get you blocked as a spambot address harvester pretty quick; a lot of servers would lie to you; it would take too much time; this way you can cache domains marked as "OK"; and I suspect it would add little to the reliability test.

        // Based on work by: John Coggeshall and Jesse Houwing.
        // http://www.zend.com/zend/spotlight/ev12apr.php
        // preg_match() needs delimiters around the pattern, hence the slashes.
        $mailRegex  = '/^(?:"(?:[^"\\\\]|\\\\.)+"|[-^!#\$%&\'*+\/=?`{|}~.\w]+)';
        $mailRegex .= '@(?=.{3,255}$)(?:[\w][\w-]{0,62}\.){1,128}[\w][\w-]{0,62}$/';

        function ValidateMail($address) {
          global $mailRegex; // Yes, globals are evil. Put it inline if you want.
          if (!preg_match($mailRegex, $address)) {
            return false;
          }
          // The regex allows no "@" in the domain, so the last "@" is the separator.
          $Domain = substr(strrchr($address, "@"), 1);
          // Connect to the first available MX record, or to the domain if no MX record.
          $ConnectAddress = array();
          if (getmxrr($Domain, $MXHost)) {
            $ConnectAddress = $MXHost;
          } else {
            $ConnectAddress[0] = $Domain;
          }
          // Check all MX records in case the main server is down - may take time!
          $Connect = false;
          for ($i = 0; $i < count($ConnectAddress); $i++) {
            // Ten-second connect timeout; "@" hides the warning on failure.
            $Connect = @fsockopen($ConnectAddress[$i], 25, $errno, $errstr, 10);
            if ($Connect) {
              break;
            }
          }
          if ($Connect) {
            // Wait for the greeting, but not forever.
            stream_set_timeout($Connect, 10);
            $Out = fgets($Connect, 1024);
            if ($Out !== false && preg_match('/^220/', $Out)) {
              fclose($Connect); // Unneeded, but let's help the gc.
              return true;
            }
            fclose($Connect); // Help the gc.
          }
          return false;
        }


    0
    perl -e "print 'yes' if `exim -bt $s_email_here | grep -c malformed`;"
    DewiMorgan · 2012-02-28 04:42:41 0
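
The syntactic part of the check is easy to try out on its own. Here is a minimal sketch in Python (swapped in purely for illustration; the entry itself uses PHP and Perl), assuming the regex exactly as quoted in the description. The names `MAIL_RE` and `looks_like_addr_spec` are hypothetical, and none of the DNS or SMTP steps are attempted.

```python
import re

# The addr-spec pattern quoted in the description, split in two for readability.
# re.ASCII keeps \w to [A-Za-z0-9_], which is what the description assumes.
MAIL_RE = re.compile(
    r'''^(?:"(?:[^"\\]|\\.)+"|[-^!#\$%&'*+\/=?`{|}~.\w]+)'''
    r'''@(?=.{3,255}$)(?:[\w][\w-]{0,62}\.){1,128}[\w][\w-]{0,62}$''',
    re.ASCII,
)

def looks_like_addr_spec(address):
    """Return True if address passes the basic syntactic check only."""
    # fullmatch() also rejects a trailing newline, which a bare "$" would allow.
    return MAIL_RE.fullmatch(address) is not None

for candidate in ("blah@domain.tld", "root@localhost", "root",
                  "jimbo@myprotocol:foo^bar^baz"):
    print(candidate, looks_like_addr_spec(candidate))
```

Passing this check says nothing about whether the mailbox exists, which is why the description still recommends sending a confirmation mail.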
  • With code, the only way to have spaces parsed correctly in any kind of portable way is to use ... but then long lines will not wrap. That's kinda important for low-res screens or smaller windows.

    Ideally, code blocks would be wrapped to the screen width, with line numbers and syntax highlighting, so that if someone does "view source", they'd see the unadulterated code for cutting and pasting. Then have all formatting done by CSS and JavaScript, such that:
    - if someone copies the text as displayed in any browser, they'll get properly formatted code in their clipboard, without wrapping
    - it fails gracefully so that it looks at least reasonable in all browsers, even those that don't know CSS/JS and can't do colours (eg lynx, screen readers)

    If anyone knows a way, that would make me happy. Until then, I am stuck with the above as the best I can do. For example, in LiveJournal, something like this:

        <div width="100%" style="(the code above)"><pre>Code goes here</pre>

    ... will look considerably better and more readable than the default <blockquote><pre></pre></blockquote>. It's not perfect, of course. If you have enough control to create your own CSS file, you should definitely do that instead.


    0
    overflow:auto;padding:5px;border-style:double;font-weight:bold;color:#00ff00;background-color:0;"><pre style="white-space:pre-wrap;white-space:-moz-pre-wrap !important;white-space:-pre-wrap;white-space:-o-pre-wrap;word-wrap:break-word;_white-space:pre;
    DewiMorgan · 2012-02-28 04:14:11 0
  • Sometimes, you don't want to just replace the spaces in the current folder, but through the whole folder tree - such as your whole music collection, perhaps. Or maybe you want to do some other renaming operation throughout a tree - this command's useful for that, too.

    To rename stuff through a whole directory tree, you might expect this to work:

        for a in `find . -name '* *'`;do mv -i "$a" ${a// /_};done

    No such luck. The shell splits the unquoted output of the backticks on whitespace before "for" ever sees it, so given a file "foo bar", the above would not try to move the file "foo bar" to "foo_bar" but rather the file "foo" to "foo", and the file "bar" to "bar".

    Instead, find's -execdir and -depth arguments need to be used, to set a variable to the filename, and rename files within the directory before we rename the directory. It has to be -execdir and won't work with just -exec - that would try to rename "foo bar/baz quux" to "foo_bar/baz_quux" in one step, rather than going into "foo bar/", changing "baz quux" to "baz_quux", then stepping out and changing "foo bar/" into "foo_bar/".

    To rename just files, or just directories, you can put "-type f" or "-type d" after the "-depth" param.

    You could probably safely replace the "mv" part of the line with a "rename" command, like rename 'y/ /_/' *, but I haven't tried, since that's way less portable.


    1
    find . -depth -name '* *' -execdir bash -c 'a=$1; mv -f "$a" "${a// /_}"' _ {} \;
    DewiMorgan · 2012-02-28 04:03:40 1
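
The children-before-parents ordering is the heart of this entry, and it can be sketched outside the shell too. The following is not the find command itself but a Python stand-in for the same idea, assuming `os.walk(..., topdown=False)` as the equivalent of `-depth`; `rename_spaces` is a hypothetical helper name. Like `mv -f`, `os.rename` may silently replace an existing file on POSIX systems, so treat this as a sketch rather than a safe tool.

```python
import os

def rename_spaces(root):
    """Replace spaces with underscores in every name below root, deepest first."""
    renamed = []
    # topdown=False yields a directory's contents before the directory itself,
    # so "foo bar/baz quux" is renamed before "foo bar/" is.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if " " in name:
                src = os.path.join(dirpath, name)
                dst = os.path.join(dirpath, name.replace(" ", "_"))
                os.rename(src, dst)
                renamed.append((src, dst))
    return renamed
```

Because each rename happens inside the directory currently being visited, no path ever refers to a parent that has already been renamed, which is the same reason the shell version needs -execdir rather than -exec.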
  • Sometimes, I just want to back up a single client's databases. Fortunately, all clients have a set prefix to their database names, so I just use 'CLIENTNAME_%' as my MYSQL_PATTERN in this command, and my life is suddenly easy.
    mysqldump params:
      -e - (optional) use extended insert.
      -B - what follows is a list of databases
      -v - (optional) give verbose output
    mysql params:
      -N - don't write column names in output (prevents us trying to back up a database called "Database").


    0
    mysqldump -eBv `echo "show databases like 'MYSQL_PATTERN'"|mysql -N`> OUTPUTFILE
    DewiMorgan · 2012-02-28 03:28:43 0
  • The normal output of 'diff' is a wonderful thing. But just sometimes, you want something that is a little more... well... readable. This is that command.
      -d - (optional) find the minimal set of changes
      -b - (optional) ignore changes in the amount of whitespace
      -B - (optional) ignore changes that just insert or delete blank lines
      -y - this is where the magic happens! Use the side-by-side output format.
      -W $COLUMNS - more magic! Instead of using the default width (130 columns in GNU diff), use the current width of the terminal. Note the capital W: a lowercase -w means "ignore all whitespace" and takes no argument.


    0
    diff -dbByW $COLUMNS FILE1 FILE2
    DewiMorgan · 2012-02-28 03:19:20 0

  • 0
    :g/SEARCH/s//REPLACEMENT/g
    DewiMorgan · 2012-02-28 03:14:11 0
  • Sometimes, especially when parsing HTML, you want "all text between two tags, that doesn't contain another tag". For example, to grab only the contents of the innermost <div>s, something like:

        /<div\b[^>]*>((?:(?!<\/?div\b).)*)<\/div>/

    ...may be your best option to capture that text. The lookahead rejects "</div" as well as "<div", so the capture stops at the first closing tag instead of running on to a later one. It's not always needed, but is a powerful arrow in your regex quiver in those cases when you do need it.

    Note that, in general, regular expressions are the Wrong Choice for parsing HTML, anyway. Better approaches are solutions which let you navigate the HTML as a proper DOM. But sometimes, you just need to use the tools available to you. If you don't, then you have two problems.


    0
    Opening_tag((?:(?!Unwanted_tag).)*)Closing_tag
    DewiMorgan · 2012-02-28 02:54:57 0
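
To see the technique at work, here is a small Python sketch (Python chosen only for illustration) that applies the <div> form of the pattern, with a lookahead that rejects both "<div" and "</div". The names `INNERMOST_DIV` and `innermost_div_text` and the sample markup are made up for the demonstration.

```python
import re

# After the opening tag, consume one character at a time, but only while the
# upcoming text does not start another "<div" or the closing "</div".
INNERMOST_DIV = re.compile(r'<div\b[^>]*>((?:(?!</?div\b).)*)</div>', re.DOTALL)

def innermost_div_text(html):
    """Return the contents of every <div> that contains no other <div>."""
    return INNERMOST_DIV.findall(html)

sample = '<div>outer <div class="a">inner one</div> mid <div>inner two</div></div>'
print(innermost_div_text(sample))  # ['inner one', 'inner two']
```

The outer <div> never matches, because the lookahead stops the capture at the nested "<div" and no "</div>" follows at that point.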
  • With thanks to dew on Efnet's #regex, back in 2005. This version indents subsequent lines after the first by one space, to make paragraphs visibly obvious -- remove the space after the \n in the replacement to prevent this behavior. Lines are only broken at spaces: long strings with no spaces will not wrap, so URLs are safe. Replace the "75"s to make the regex linewrap to other amounts. From the unix commandline, "fold" is likely your better choice, but this snippet is handy in editors which allow regular expressions, in scripting, and other such situations where "fold" is unavailable.


    0
    s/(?=.{75,})(?:(.{0,75})(?:\r\n?|\n\r?)|(.{0,75}))[ ]/\1\2\n /g
    DewiMorgan · 2012-02-28 02:27:20 0
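
As an example of the "in scripting" case, the same substitution can be sketched in Python, assuming the pattern and replacement are carried over unchanged at width 75. `WRAP_RE` and `wrap` are hypothetical names, and the sample text is generated just for the demonstration.

```python
import re

# Same pattern as the s/// command: only lines with at least 75 characters left
# are touched, and the break always lands on a space.
WRAP_RE = re.compile(r'(?=.{75,})(?:(.{0,75})(?:\r\n?|\n\r?)|(.{0,75}))[ ]')

def wrap(text):
    """Break long lines at spaces; continuation lines get a one-space indent."""
    # Whichever group did not take part in the match expands to an empty
    # string in re.sub (Python 3.5 and later).
    return WRAP_RE.sub(r'\1\2\n ', text)

text = " ".join(f"word{i}" for i in range(60))
wrapped = wrap(text)
print(wrapped)
```

Each break replaces exactly one space with a newline plus a space, so turning every "\n " back into " " recovers the original text.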
