PDFs to TXTs in Ubuntu


By Technology Last updated Saturday, May/26/2029

Obtaining pdftotext

Install the "poppler-utils" package, which provides the "pdftotext" command, from the Ubuntu repositories with the command:

sudo apt-get install poppler-utils

Ensure that the package installs correctly before attempting to use it.
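One way to confirm the installation is to check that the command is on your PATH and ask it for its version. This is only a minimal sketch; the variable name "installed" is chosen here for illustration.

```shell
# Check that the pdftotext binary is on the PATH and report its version.
if command -v pdftotext >/dev/null 2>&1; then
    # pdftotext prints its version banner on stderr, so redirect it.
    pdftotext -v 2>&1 | head -n 1
    installed=yes
else
    echo "pdftotext not found; re-run: sudo apt-get install poppler-utils" >&2
    installed=no
fi
```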

pdftotext Man Page

Learn how the pdftotext command works and familiarize yourself with the available command-line options. To read the man page, type "man pdftotext" at the shell prompt and press "Enter". Each option is a dash followed by a short name, such as "-l" (the last page to convert), and each performs a different function.

The basic form of the command is "pdftotext <pdffile> <textfile>" (without quotes), where <pdffile> is the name of the PDF file to convert, such as "report.pdf", and <textfile> is the name of the text output file, such as "report.txt". You can use any output name you like. If you leave <textfile> out, pdftotext writes to a file with the same name as the PDF and a ".txt" extension.
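As an illustration of how options combine with the basic form, the sketch below wraps one invocation in a small shell function. The function name "extract_first_pages" is hypothetical; "-f", "-l" and "-layout" are options listed in the pdftotext man page (first page, last page, and preserve the original physical layout).

```shell
# Extract pages 1 to 3 of a PDF, keeping the original physical layout.
# $1 = input PDF (for example report.pdf), $2 = output text file.
extract_first_pages() {
    pdftotext -f 1 -l 3 -layout "$1" "$2"
}
```

You would call it as: extract_first_pages report.pdf report.txt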

Batch PDF Conversion

Test the command by trying it on a few PDF files individually. If the results look correct, you can automate the conversion of many PDF files with a shell script. A typical script is shown below:

for i in *.pdf
do
    pdftotext "$i" "$i.txt"
done

This script converts every PDF file in the current directory and writes the text to a file named after the original, so "report.pdf" becomes "report.pdf.txt".
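If you would rather get "report.txt" than "report.pdf.txt", the loop can strip the ".pdf" suffix before adding ".txt". The following is a sketch under that assumption; the function name "convert_all_pdfs" and its optional directory argument are illustrative and not part of pdftotext.

```shell
# Convert every PDF in a directory, writing "name.txt" next to "name.pdf".
# $1 = directory to scan (defaults to the current directory).
convert_all_pdfs() {
    dir="${1:-.}"
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when no PDF files match.
        [ -e "$pdf" ] || continue
        # Remove the trailing ".pdf" and append ".txt".
        txt="${pdf%.pdf}.txt"
        if ! pdftotext "$pdf" "$txt"; then
            echo "Failed to convert: $pdf" >&2
        fi
    done
}
```

Quoting "$pdf" and "$txt" keeps file names that contain spaces intact, and reporting failures instead of stopping lets the rest of the batch continue.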

Protected PDF Files

Some PDFs are protected, either with a password or with settings that prevent text from being exported. This is usually done to protect copyright, so if a document is protected, consider whether converting it is legally permitted. If you have the password for a PDF file, you can pass it to "pdftotext" through its command-line options.
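The man page lists two password options: "-upw" for the user password and "-opw" for the owner password. A minimal sketch of the user-password case follows; the function name "convert_protected_pdf" is hypothetical, and it assumes you are entitled to convert the document.

```shell
# Convert a password-protected PDF using its user password.
# $1 = input PDF, $2 = output text file, $3 = user password.
convert_protected_pdf() {
    pdftotext -upw "$3" "$1" "$2"
}
```

Keep in mind that a password given on the command line may be visible to other users in the process list while the command runs.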
