PDF Text Extractor

Pull all the text out of a PDF and copy or download it as a .txt file.

The PDF Text Extractor opens any PDF in your browser and pulls out every line of selectable text, ready to copy to your clipboard or download as a clean .txt file. It is built for the everyday job of getting words out of a PDF: quoting a contract, moving a report into a document, feeding text to a translator or an AI assistant, or searching a paper that your viewer lets you read but not select. Because everything happens locally, even confidential PDFs stay on your machine: there is no upload, no sign-up and no server involved at any point.

Unlike a naive “dump the bytes” converter, this tool reconstructs readable, ordered text. PDFs do not store paragraphs — they store thousands of tiny positioned glyph runs, often in a scrambled internal order. The extractor reads each run’s coordinates, groups them into lines, sorts lines top-to-bottom and runs left-to-right, and inserts spaces where there are real gaps. The result reads the way the page looks, including multi-column layouts and headings, rather than a jumble of fragments.
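As a rough illustration of the grouping and sorting described above, here is a minimal Python sketch. It is not the tool's actual code: the Run type, the rebuild_lines name and both tolerance values are assumptions made for this example, and y is assumed to be measured downward from the top of the page.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One positioned glyph run (hypothetical shape, for illustration only)."""
    text: str
    x: float      # left edge of the run
    y: float      # baseline, assumed to grow downward from the top of the page
    width: float  # horizontal extent of the run


def rebuild_lines(runs, line_tolerance=2.0, gap_threshold=1.5):
    """Group runs into lines, order them, and insert spaces at real gaps."""
    lines = []
    # Sort top-to-bottom, then attach each run to the current line if its
    # baseline is within the tolerance of that line's first run.
    for run in sorted(runs, key=lambda r: r.y):
        if lines and abs(lines[-1][0].y - run.y) <= line_tolerance:
            lines[-1].append(run)
        else:
            lines.append([run])

    result = []
    for line in lines:
        line.sort(key=lambda r: r.x)  # left-to-right within the line
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            # Only add a space when the horizontal gap is wider than the threshold.
            if cur.x - (prev.x + prev.width) > gap_threshold:
                parts.append(" ")
            parts.append(cur.text)
        result.append("".join(parts))
    return result
```

With runs stored out of order, such as "world", "Hel", "lo" on one baseline and "Second" further down, the sketch joins the adjacent fragments "Hel" and "lo" without a space and puts a space before "world", because only that gap exceeds the threshold.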

How it works

  1. Open a PDF. The file is read into memory in your browser. A bundled PDF engine parses the document structure — no network request is made.
  2. Pick a scope. Extract the whole document or a page expression like 1-3,5,8-. You can also toggle Preserve layout (keep line and column structure), Page markers (insert --- Page N --- separators) and Re-join hyphens (stitch words split across line wraps).
  3. Get your text. The combined text appears in a panel with a live word and character count, plus a per-page length breakdown that flags any image-only pages. One click copies everything; another downloads a .txt. Your last set of options is remembered for next time.
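A page expression such as 1-3,5,8- can be expanded into a list of page numbers in a few lines. The sketch below shows one plausible reading, not the tool's confirmed behaviour: parse_pages is a hypothetical name, an open-ended range like 8- is assumed to run to the last page, -3 is assumed to start at page 1, and out-of-range pages are assumed to be dropped silently.

```python
def parse_pages(expr, page_count):
    """Expand a page expression like '1-3,5,8-' into sorted page numbers."""
    pages = set()
    for part in expr.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            # Assumption: a missing bound means "from the first page"
            # or "to the last page".
            start = int(start_s) if start_s.strip() else 1
            end = int(end_s) if end_s.strip() else page_count
        else:
            start = end = int(part)
        # Clamp to the document so out-of-range pages are ignored.
        start = max(start, 1)
        end = min(end, page_count)
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For a 10-page document, parse_pages("1-3,5,8-", 10) gives pages 1, 2, 3, 5, 8, 9 and 10. Using a set means overlapping ranges such as 1-3,2-4 do not produce duplicates.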

Example

Suppose you have a 12-page invoice PDF and only need the line-item table on pages 4 to 6. Type 4-6 in the Pages box, leave Preserve layout on so the columns stay aligned, and the tool returns just those three pages of text. The header shows something like 3 of 12 pages · 512 words · 3,140 chars. Click Copy all text and paste it straight into a spreadsheet or email. If page 6 turns out to be a scanned signature image, it is reported as empty in the per-page breakdown so you know nothing was silently dropped.
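The header line and the empty-page report in this example could be produced along the following lines. This is only a sketch under stated assumptions: summarize is a hypothetical name, words are assumed to be whitespace-separated tokens, the character count is assumed to include the line breaks that join pages, and a page is assumed to be image-only when it yields no non-whitespace text.

```python
def summarize(page_texts, total_pages):
    """Return a header string and the list of pages that produced no text.

    page_texts maps page number -> extracted text for the selected pages.
    """
    combined = "\n".join(page_texts.values())
    words = len(combined.split())   # whitespace-separated tokens
    chars = len(combined)           # includes the joining line breaks
    # Pages with nothing but whitespace are reported rather than dropped.
    empty_pages = [n for n, text in page_texts.items() if not text.strip()]
    header = (
        f"{len(page_texts)} of {total_pages} pages"
        f" · {words:,} words · {chars:,} chars"
    )
    return header, empty_pages
```

Calling it with three selected pages of a 12-page file, the last of which is empty, returns a header starting with "3 of 12 pages" and the list [6], which matches the idea that a scanned page is flagged instead of silently skipped.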

For a research paper, turning on Re-join hyphens converts a word split across a line break, such as micro-\nscope (where \n marks the break), back into microscope, which makes the downloaded text easier to search and cleaner to quote. Every count and every character is produced in your browser; nothing is sent anywhere.
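One simple way to stitch wrapped words back together is a regular expression that removes a hyphen followed by a line break when both neighbours are lowercase letters. This is a sketch of the general idea, not necessarily how the tool does it; rejoin_hyphens is a hypothetical name and the lowercase-only rule is an assumption chosen to leave number ranges alone.

```python
import re

# A hyphen at the end of a line, with a lowercase letter on each side.
# Assumption: restricting to lowercase letters avoids touching ranges
# like "4-\n6" or names like "Smith-\nJones".
_WRAPPED_HYPHEN = re.compile(r"(?<=[a-z])-\n(?=[a-z])")


def rejoin_hyphens(text):
    """Remove line-wrap hyphens so split words become whole again."""
    return _WRAPPED_HYPHEN.sub("", text)
```

Here rejoin_hyphens("micro-\nscope") returns "microscope". The trade-off is that a genuinely hyphenated word that happens to wrap at its hyphen, such as well-\nknown, would also be joined, which is one reason to keep this behaviour behind a toggle.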
