Look up a unicode character by name

egrep -i "^[0-9a-f]{4,} .*$*" $(locate CharName.pm) | while read h d; do /usr/bin/printf "\U$(printf "%08x" 0x$h)\tU+%s\t%s\n" $h "$d"; done

[Update! Thanks to a tip from ioggstream, I've fixed both of the bugs mentioned below.]

You, yes, 𝙔𝙊𝙐, can be the terror of the Internet! Why use normal, boring bullet points in your text when you could use a ROTATED HEAVY BLACK HEART BULLET (❥)!? (Which would also be an awesome band name, by the way.) This script makes it easy to find unusual characters from the command line. You can then cut and paste them or, if you're using a GTK application, type Control+Shift+U followed by the code point number (e.g., 2765) and then a SPACE.

USAGE: Put this script in a file (I called mine "ugrep") and make it executable. Run it from the command line like so:

    ugrep heart

The output will look like this:

    ☙	U+2619	REVERSED ROTATED FLORAL HEART BULLET
    ♡	U+2661	WHITE HEART SUIT
    ♥	U+2665	BLACK HEART SUIT
    ❣	U+2763	HEAVY HEART EXCLAMATION MARK ORNAMENT
    ❤	U+2764	HEAVY BLACK HEART
    ❥	U+2765	ROTATED HEAVY BLACK HEART BULLET
    ❦	U+2766	FLORAL HEART
    ❧	U+2767	ROTATED FLORAL HEART BULLET
    ⺖	U+2E96	CJK RADICAL HEART ONE
    ⺗	U+2E97	CJK RADICAL HEART TWO
    ⼼	U+2F3C	KANGXI RADICAL HEART

You can, of course, use regular expressions. For example, if you are looking for the "pi" symbol, you could do this:

    ugrep '\bpi\b'

REQUIREMENTS: Although this is written in Bash, it assumes you have Perl installed because it greps through the Perl Unicode character name module (/usr/lib/perl5/Unicode/CharName.pm). Note that it would not have made more sense to write this in Perl, since the CharName.pm module doesn't actually include a subroutine for looking up a character based on the description. (Weird.)

BUGS: In order to fit this script in the commandlinefu limits, a couple of bugs were added.
① Astral characters beyond the BMP (basic multilingual plane) are not displayed correctly, but see below.
② Perl code from the Perl module being grepped is sometimes extraneously matched.

MISFEATURES: Bash's printf cannot, given a Unicode code point, print the resulting character to the terminal. GNU's coreutils printf (usually "/usr/bin/printf") can do so, but it is brokenly pedantic about how many hexadecimal digits follow the escape sequence and will actually die with an error if you give the wrong number. This is especially annoying since Unicode code points are usually variable length with implied leading zeros. The CharName.pm file represents BMP characters as 4 hexits, but astral characters as 5. In the actual version of this script that I use, I've kludged around this misfeature by zero-padding to 8 hexits like so:

    /usr/bin/printf "\U$(printf "%08x" 0x$hex)"

TIP 1: The author recommends "xsel" for command line cut-and-paste. For example:

    ugrep biohazard | xsel

TIP 2: In Emacs, instead of running this command in a subshell, you can type Unicode code points directly by pressing Control-Q first, but you'll likely want to change the default input from octal to hexadecimal:

    (setq read-quoted-char-radix 16)

TIP 3: Of course, if you're using X, and you want to type one of the more common unusual characters, it's easiest of all to do it with your Compose (aka Multi) key. For example, hitting [Compose] <3 types ♥.
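The zero-padding workaround described above can be pulled out into a small helper. This is only a sketch: the names pad_codepoint and show_char are made up for illustration, and show_char assumes GNU coreutils printf is installed at /usr/bin/printf and that the terminal is in a UTF-8 locale.

```shell
#!/bin/bash
# pad_codepoint HEX: zero-pad a hex code point to the 8 digits that \U expects.
# (Hypothetical helper name, not part of the original script.)
pad_codepoint() {
    printf '%08x' "0x$1"
}

# show_char HEX: print the character itself.
# Assumes GNU coreutils printf at /usr/bin/printf and a UTF-8 locale.
show_char() {
    /usr/bin/printf "\U$(pad_codepoint "$1")\n"
}

pad_codepoint 2765; echo     # 4-hexit BMP code point
pad_codepoint 1F4A9; echo    # 5-hexit astral code point
# Usage in a UTF-8 terminal: show_char 2765
```

Padding every code point to 8 hexits means the 4-hexit and 5-hexit entries go through the same \U escape, so the script never has to choose between \u and \U.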

Sample Output

$ ugrep heart
☙	U+2619	REVERSED ROTATED FLORAL HEART BULLET
♡	U+2661	WHITE HEART SUIT
♥	U+2665	BLACK HEART SUIT
❣	U+2763	HEAVY HEART EXCLAMATION MARK ORNAMENT
❤	U+2764	HEAVY BLACK HEART
❥	U+2765	ROTATED HEAVY BLACK HEART BULLET
❦	U+2766	FLORAL HEART
❧	U+2767	ROTATED FLORAL HEART BULLET
⺖	U+2E96	CJK RADICAL HEART ONE
⺗	U+2E97	CJK RADICAL HEART TWO
⼼	U+2F3C	KANGXI RADICAL HEART

By: hackerb9

2010-12-31 16:47:59

egrep locate read bash perl grep unicode code points GNU coreutils

Alternatives

Look up a unicode character by name

No need for further filedes or substitution for splitting. Simply use read a b

grep -i "$*" /usr/lib/perl5/Unicode/CharName.pm | while read a b; do /usr/bin/printf "\u$a\tU+%s\t%s\n" "$a" "$b"; done

ioggstream · 2011-01-04 11:30:16 6
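A plain grep over the whole module can also match lines of Perl code (bug ② in the original description). Here is a minimal sketch of anchoring the pattern to lines that begin with a hex code point, the way the top command does. It runs against a small stand-in file rather than the real CharName.pm; the file contents and the ugrep_file name are assumptions for illustration only.

```shell
#!/bin/bash
# ugrep_file FILE WORDS...: list "U+HEX<TAB>NAME" for matching data lines.
# Hypothetical helper; the real script reads the installed CharName.pm.
ugrep_file() {
    local file=$1; shift
    # Only lines that start with 4+ hex digits and a space count as data.
    grep -Ei "^[0-9a-f]{4,} .*$*" "$file" |
    while read -r hex name; do
        printf 'U+%s\t%s\n' "$hex" "$name"
    done
}

# Stand-in data: two character entries plus a line of Perl mentioning "heart".
sample=$(mktemp)
cat > "$sample" <<'EOF'
my $heart = "not a character entry";
2764 HEAVY BLACK HEART
03C0 GREEK SMALL LETTER PI
EOF

ugrep_file "$sample" heart    # prints only the U+2764 line
rm -f "$sample"
```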

What Others Think

For some reason this module wasn't included with Ubuntu's perl, so I had to install it using

    sudo apt-get install libunicode-string-perl

Then I had to run updatedb to let locate know where CharName.pm lives (well, actually it was getting late and I was lazy, so I let the updatedb cron job take care of this for me). I would suggest making a shell function of this:

ugrep() { exec 5< <(grep -i "$*" $(locate CharName.pm));while read <&5;do h=${REPLY%% *};/usr/bin/printf "\u$h\tU+%s\t%s\n"  "$h"  "${REPLY##$h }";done; }

The other thing that got me is that in unicode land, an umlaut is called a diaeresis. Anyway, nicely done.

bartonski · 694 weeks and 4 days ago

Thanks for the compliment, Bartonski, and the suggestion. I don't use shell functions because I don't like all my programs munged together in a single file. It's cleaner and easier to just toss shell fragments like this into a ~/bin directory and chmod 755 them. Also, although the cost is infinitesimal, each shell function in a .profile slows down the startup time of every login shell. If this were a command I used all the time and it was important that it not fork, for example "ls", I would use a shell function. But for the other 99.9% of commands, I use script files so they are only loaded when needed.

hackerb9 · 694 weeks and 3 days ago
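For anyone who wants to go the script-file route described above, here is a rough sketch. The install_ugrep helper is hypothetical, and it simply wraps the one-liner from the top of the page; the target directory (typically ~/bin) is assumed to be on your PATH.

```shell
#!/bin/bash
# install_ugrep DIR: write the one-liner to DIR/ugrep and make it executable.
# (Hypothetical helper; DIR would typically be ~/bin and must be on PATH.)
install_ugrep() {
    local dir=$1
    mkdir -p "$dir" || return
    cat > "$dir/ugrep" <<'EOF'
#!/bin/bash
egrep -i "^[0-9a-f]{4,} .*$*" $(locate CharName.pm) |
while read h d; do
    /usr/bin/printf "\U$(printf "%08x" 0x$h)\tU+%s\t%s\n" $h "$d"
done
EOF
    chmod 755 "$dir/ugrep"
}

# Usage: install_ugrep "$HOME/bin"
```

The quoted 'EOF' keeps the shell from expanding $* and $(...) while the file is being written, so the script lands on disk exactly as typed.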

P.S. If you find yourself needing umlauts, you'll definitely be happier once you enable a Compose key. Both Gnome and KDE have an option to do so in their keyboard control panels. If you don't use either of those, you can change it the "old fashioned" way:

    xmodmap -e "keysym Menu = Multi_key"

That will let you hit (one at a time) the Menu key, a letter key, and then a double quote to get an umlaut. For example, Multi_key o " → ö. (And to get the right-pointing arrow I just typed, I used Multi_key, minus, greater than.)

hackerb9 · 694 weeks and 3 days ago

Somehow I have multiple copies of CharName.pm on my system, so I needed to limit the results to one:

    locate -l 1 CharName.pm

Otherwise, great command.

billmakesbooks · 691 weeks and 1 day ago
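The reason multiple copies cause trouble is that grep prefixes each matching line with the file name when it is given more than one file, so the first field read by the loop is no longer a bare hex number. A small sketch of resolving a single path first; charname_path is a made-up name, and it assumes your locate supports the -l (limit) option mentioned above.

```shell
#!/bin/bash
# charname_path: print a single path to CharName.pm, or return non-zero.
# Hypothetical helper name; relies on locate's -l (limit) option.
charname_path() {
    local path
    path=$(locate -l 1 CharName.pm 2>/dev/null)
    [ -n "$path" ] && printf '%s\n' "$path"
}

# Example use: search the one copy that was found, if any.
if file=$(charname_path); then
    grep -Ei "^[0-9a-f]{4,} .*heart" "$file"
else
    echo "CharName.pm not found; try running updatedb" >&2
fi
```

Another option is to keep the original locate call and pass -h to grep, which suppresses the file name prefix; that would search every copy and may print duplicate entries.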

Awesome command! By the way, if you want to have even more Unicode characters available via XCompose, check out this project to add more key bindings: https://github.com/kragen/xcompose

hfs · 641 weeks and 4 days ago

