ugrep heart
The output will look like this:
☙ U+2619 REVERSED ROTATED FLORAL HEART BULLET
♡ U+2661 WHITE HEART SUIT
♥ U+2665 BLACK HEART SUIT
❣ U+2763 HEAVY HEART EXCLAMATION MARK ORNAMENT
❤ U+2764 HEAVY BLACK HEART
❥ U+2765 ROTATED HEAVY BLACK HEART BULLET
❦ U+2766 FLORAL HEART
❧ U+2767 ROTATED FLORAL HEART BULLET
⺖ U+2E96 CJK RADICAL HEART ONE
⺗ U+2E97 CJK RADICAL HEART TWO
⼼ U+2F3C KANGXI RADICAL HEART
You can, of course, use regular expressions. For example, if you are looking for the "pi" symbol, you could do this:
ugrep '\bpi\b'
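To see why the word boundaries matter, here is a small sketch that runs against a few hand-picked sample names rather than the real CharName.pm. It assumes a grep that supports the \b extension (GNU grep does), and the helper name count_matches is made up for illustration.

```shell
# Sample names only; the real script searches CharName.pm instead.
names='GREEK SMALL LETTER PI
GREEK PI SYMBOL
PILCROW SIGN
SPIDER'

# count_matches is a hypothetical helper: it counts the sample lines
# that match the given pattern, ignoring case.
count_matches() {
    printf '%s\n' "$names" | grep -ic -- "$1"
}

count_matches 'pi'       # unanchored: also hits PILCROW and SPIDER
count_matches '\bpi\b'   # whole word only
```

The unanchored pattern matches every sample line, while the anchored one keeps only the names where "pi" stands alone as a word.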
REQUIREMENTS: Although this is written in Bash, it assumes you have Perl installed because it greps through the Perl Unicode character name module (/usr/lib/perl5/Unicode/CharName.pm). Note that it would not have made more sense to write this in Perl, since the CharName.pm module doesn't actually include a subroutine for looking up a character based on the description. (Weird.)
BUGS: To fit this script within the commandlinefu length limit, a couple of bugs were added. ① Astral characters beyond the BMP (basic multilingual plane) are not displayed correctly, but see below. ② Perl code from the Perl module being grepped is sometimes extraneously matched.
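One possible way to suppress the stray Perl code (bug ②) is to keep only the lines that begin with a code point. This is a sketch, not part of the original one-liner: it assumes each data line starts with 4 to 6 hexadecimal digits followed by whitespace, and the function name only_codepoints and the sample lines are made up for illustration.

```shell
# only_codepoints is a hypothetical filter: it passes through lines that
# start with 4-6 hex digits followed by whitespace and drops everything else.
only_codepoints() {
    grep -E '^[0-9A-Fa-f]{4,6}[[:space:]]'
}

# Made-up sample: two data lines and one line of Perl code that also
# happens to contain the search term.
printf '%s\n' \
    '2665 BLACK HEART SUIT' \
    'my $heart = shift;' \
    '1F493 BEATING HEART' | only_codepoints
```

A filter like this would sit between the grep and the loop that prints each character. It is not airtight, since a line of code that happens to start with a hex-looking word would still slip through.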
MISFEATURES: Bash's printf cannot, given a Unicode code point, print the resulting character to the terminal. GNU's coreutils printf (usually "/usr/bin/printf") can do so, but it is brokenly pedantic about how many hexadecimal digits follow the escape sequence and will actually die with an error if you give the wrong number. This is especially annoying since Unicode code points are usually written at variable length with implied leading zeros. The CharName.pm file represents BMP characters as 4 hexits, but astral characters as 5. In the actual version of this script that I use, I've kludged around this misfeature by zero-padding to 8 hexits like so:
/usr/bin/printf "\U$(printf "%08x" 0x$hex)"
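The padding step on its own can be sketched as below. The helper name pad8 is made up for illustration; it only builds the eight-digit string, and turning that string into a character still relies on GNU printf and, presumably, a UTF-8 locale.

```shell
# pad8 is a hypothetical helper: it zero-pads a hexadecimal code point
# to the eight digits that the \U escape expects.
pad8() {
    printf '%08x' "0x$1"
}

pad8 2665;  echo    # BMP code point, 4 hexits in
pad8 1F493; echo    # astral code point, 5 hexits in

# Assumed usage with GNU coreutils printf in a UTF-8 locale:
#   /usr/bin/printf "\U$(pad8 1F493)\n"
```

Because both inputs come out at the same width, the same \U escape works for BMP and astral characters alike.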
TIP 1: The author recommends "xsel" for command line cut-and-paste. For example,
ugrep biohazard | xsel
TIP 2: In Emacs, instead of running this command in a subshell, you can type Unicode code points directly by pressing Control-Q first, but you'll likely want to change the default input from octal to hexadecimal with (setq read-quoted-char-radix 16).
TIP 3: Of course, if you're using X, and you want to type one of the more common unusual characters, it's easiest of all to do it with your Compose (aka Multi) key. For example, hitting [Compose] <3 types ♥.
[NOTE: commandlinefu is showing the first column as HTML entities, but in reality the literal character is shown.]
No need for an extra file descriptor or parameter substitution to split each line. Simply use "read a b".
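A minimal sketch of that idea, assuming each input line has the form "<hex> <name>" (the actual layout of the file may differ): read assigns the first word to the first variable and the rest of the line to the last one. The function name format_lines and the variable names hex and name are made up for illustration.

```shell
# format_lines is a hypothetical helper: it reads "<hex> <name>" lines
# from stdin and prints them as "U+<hex><TAB><name>".
format_lines() {
    # read puts the first word in hex and the remainder of the line in name.
    while read -r hex name; do
        printf 'U+%s\t%s\n' "$hex" "$name"
    done
}

# Made-up sample input piped straight into the loop.
printf '%s\n' '2665 BLACK HEART SUIT' '2764 HEAVY BLACK HEART' | format_lines
```

Piping into the loop also avoids opening a separate file descriptor with exec.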
sudo apt-get install libunicode-string-perl
Then I had to run updatedb to let locate know where CharName.pm lives (well, actually it was getting late and I was lazy, so I let the updatedb cron job take care of this for me). I would suggest making a shell function of this:
ugrep() { exec 5< <(grep -i "$*" $(locate CharName.pm));while read <&5;do h=${REPLY%% *};/usr/bin/printf "\u$h\tU+%s\t%s\n" "$h" "${REPLY##$h }";done; }
The other thing that got me is that in Unicode land, an umlaut is called a diaeresis. Anyway, nicely done.
xmodmap -e "keysym Menu = Multi_key"
That will let you hit (one at a time) the Menu key, a letter key, and then a double quote to get an umlaut. For example, Multi_key o " → ö. (And to get the right-pointing arrow I just typed, I used Multi_key, minus, greater than.)
locate -l 1 CharName.pm
Otherwise, great command.