Convert PDF to Text on Linux

Sometimes you need to read a PDF file from a terminal where no GUI is available. This is where pdftotext comes in handy. The utility converts a PDF file to plain text that you can view in any text editor, and it can also convert only selected pages. pdftotext should be installed by default on Ubuntu Hardy. If it is missing on your machine, install the poppler-utils package, which provides it:
user@www$ sudo apt-get install poppler-utils
Now you are ready to convert PDF files. To convert myfile.pdf to myfile.txt:
user@www$ pdftotext myfile.pdf myfile.txt
You can also omit the last parameter:
user@www$ pdftotext myfile.pdf
In that case pdftotext names the output file after the input, so the text ends up in myfile.txt. To convert from page 2 onwards:
user@www$ pdftotext -f 2 myfile.pdf
To convert only up to and including page 3:
user@www$ pdftotext -l 3 myfile.pdf
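The -f and -l options can be combined to pick out a page range, and passing - as the output file name sends the text to standard output instead of a file. Here is a minimal sketch of that idea; the function name pdf_pages is made up for illustration and is not part of poppler-utils.

```shell
# Print pages FIRST..LAST of a PDF to standard output.
# pdf_pages is a hypothetical helper name, not a poppler-utils command.
pdf_pages() {
    first=$1
    last=$2
    file=$3
    # -f sets the first page, -l the last page;
    # "-" as the output name writes the text to stdout.
    pdftotext -f "$first" -l "$last" "$file" -
}
```

For example, pdf_pages 2 3 myfile.pdf | less would page through the text of pages 2 and 3 without creating a file.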
To set the end-of-line convention of the output to unix, dos or mac:
user@www$ pdftotext -eol unix myfile.pdf
To see more help for pdftotext:
user@www$ man pdftotext
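The single-file commands above can be wrapped in a loop to convert every PDF in a directory. This is only a sketch: the function name convert_all_pdfs is hypothetical, and it relies on the default naming described earlier, where pdftotext writes the .txt file next to the PDF when no output name is given.

```shell
# Convert every .pdf in a directory to a .txt file beside it.
# convert_all_pdfs is a hypothetical helper name used for illustration.
convert_all_pdfs() {
    dir=${1:-.}   # default to the current directory
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory has no PDFs.
        [ -e "$pdf" ] || continue
        # With no output name, pdftotext derives it from the input name.
        pdftotext "$pdf"
    done
}
```

Calling convert_all_pdfs ~/docs would then leave one .txt file beside each PDF in ~/docs.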
