GREP a PDF file.

pdftotext [file] - | grep 'YourPattern'
PDF files are simultaneously wonderful and heinous. They are wonderful in being ubiquitous and mostly being cross platform. They are heinous in being very difficult to work with from the command line, search, grep, use only the text inside the PDF, or use outside of proprietary products. xpdf is a wonderful set of PDF tools. It is on many linux distros and can be installed on OS X. While primarily an open PDF viewer for X, xpdf has the tool "pdftotext" that can extract formated or unformatted text from inside a PDF that has text. This text stream can then be further processed by grep or other tool. The '-' after the file name directs output to stdout rather than to a text file the same name as the PDF. Make sure you use version 3.02 of pdftotext or later; earlier versions clipped lines. The lines extracted from a PDF without the "-layout" option are very long. More paragraphs. Use just to test that a pattern exists in the file. With "-layout" the output resembles the lines, but it is not perfect. xpdf is available open source at http://www.foolabs.com/xpdf/

27
By: drewk
2010-02-14 21:42:35

3 Alternatives + Submit Alt

What Others Think

While reading the content under the topic GREP a PDF file I got a clear idea about the xpdf file which is explained in such a way that everything is very well explained. Thanks for the information you have shared here diamonds rings for cheap
Kaitlyn · 28 weeks and 5 days ago
How to grep a PDF national margarita day file and such details are given here for your kind information. These details given here will be helpful for all those people who are working in the IT field and studying in this field.
leonamargret · 4 weeks and 1 day ago

What do you think?

Any thoughts on this command? Does it work on your machine? Can you do the same thing with only 14 characters?

You must be signed in to comment.

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands



Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: