Detect encoding of a text file

file -i <textfile>
This command reports the charset of a text file, which is handy when you have no idea what encoding the file uses.
Sample Output
file -i data.sql
data.sql: text/x-c charset=iso-8859-1

By: juvenn
2009-09-08 01:33:19
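
For scripts that need only the charset value, a minimal sketch is to pull the charset= field out of the `file -i` output. The function name charset_from_mime is made up for illustration and is not part of `file`.

```shell
# Hypothetical helper (not part of file): read `file -i` output on stdin
# and print only the value that follows "charset=".
charset_from_mime() {
    sed -n 's/.*charset=\([^ ;]*\).*/\1/p'
}

# Intended usage, assuming data.sql exists:
#   file -i data.sql | charset_from_mime
```

Fed the sample output line above, it prints iso-8859-1.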

These Might Interest You

  • Just for fun, I looked for a simple way to encrypt some text. Simple base64 encoding seemed a good start, so I decided to "amplify" it by applying base64 encoding repeatedly. Of course, this is not really secure, but it can be useful for hiding data from most people ;). Do not hesitate to suggest better solutions.

    str=password; for i in $(seq 1 10); do echo -e "$str\n"; str="$(base64 <<< "$str")"; done
    n3wborn · 2011-10-04 18:01:54 0
  • Converts a file to Unix line endings and UTF-8 encoding. Useful for data files that contain what would be usable ASCII text but are encoded as mpeg or some other encoding that prevents common manipulations such as 'sed'.

    ex some_file "+set ff=unix fileencoding=utf-8" "+x"
    nottings · 2009-02-19 16:23:21 1

  • 4
    iconv -f utf8 -t utf16 /path/to/file
    imsaar · 2009-12-01 21:02:58 0
  • Takes all the .3gp files in the directory, rotates them by 90 degrees, and saves them in the lossless ffv1 encoding. If this rotates in the wrong direction, you may want transpose=1. Re-encoding to ffv1 may significantly increase file size, as it is a lossless format. Other applications may not recognize ffv1 if they don't use ffmpeg code; "huffyuv" might be another option for lossless saving of your transformations. The audio may be re-encoded as well if the encoding used by your 3gp file doesn't work in an AVI container.

    mkdir rotated; for v in *.3gp; do ffmpeg -i "$v" -vf transpose=2 -vcodec ffv1 "rotated/${v/3gp/avi}"; done
    keturn · 2012-02-04 18:20:04 0

What Others Think

This command gets the charset wrong from time to time. For example, when I create a text file encoded in GBK (a charset for simplified Chinese) and test it with the file command, it reports iso-8859-1, which is an encoding for Western text. I found this problem while trying to write a shell script that converts the charset of a text file without knowing the current charset in advance. Maybe we should try enca.
digglife · 371 weeks ago
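
One possible sketch of the conversion script this comment describes is below. It assumes a shell that supports `local` (bash, dash) and that `iconv` is installed; the function name to_utf8 and its optional second argument are made up for illustration. Because `file` can misreport encodings such as GBK, the caller may pass the source charset explicitly, in which case detection is skipped.

```shell
# Hypothetical helper: to_utf8 FILE [FROM_CHARSET]
# Writes FILE converted to UTF-8 on stdout.
to_utf8() {
    local src=$1
    local from=${2:-}

    # No charset given: fall back to whatever file guesses.
    # The guess can be wrong (e.g. GBK reported as iso-8859-1).
    if [ -z "$from" ]; then
        from=$(file -b --mime-encoding "$src") || return 1
    fi

    # Refuse to convert when file could not name a text charset.
    case $from in
        binary|unknown-8bit|'')
            echo "to_utf8: cannot determine charset of $src" >&2
            return 1
            ;;
    esac

    iconv -f "$from" -t UTF-8 "$src"
}
```

For a file known to be GBK, something like `to_utf8 notes.txt GBK > notes.utf8.txt` avoids relying on the guess; `to_utf8 notes.txt` alone trusts the output of `file`.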
