commandlinefu.com is the place to record those command-line gems that you return to again and again.
You can sign-in using OpenID credentials, or register a traditional username and password.
Subscribe to the feed for:
There are 10 alternatives - vote for the best!
PDF files are simultaneously wonderful and heinous. They are wonderful in being ubiquitous and mostly being cross platform. They are heinous in being very difficult to work with from the command line, search, grep, use only the text inside the PDF, or use outside of proprietary products.
xpdf is a wonderful set of PDF tools. It is on many linux distros and can be installed on OS X. While primarily an open PDF viewer for X, xpdf has the tool "pdftotext" that can extract formated or unformatted text from inside a PDF that has text. This text stream can then be further processed by grep or other tool. The '-' after the file name directs output to stdout rather than to a text file the same name as the PDF.
Make sure you use version 3.02 of pdftotext or later; earlier versions clipped lines.
The lines extracted from a PDF without the "-layout" option are very long. More paragraphs. Use just to test that a pattern exists in the file. With "-layout" the output resembles the lines, but it is not perfect.
xpdf is available open source at http://www.foolabs.com/xpdf/
This is a good alternative to pdf2text for Ubuntu. To install it:
sudo apt-get install python-pdfminer
If you can do better, submit your command here.
You must be signed in to comment.