Extract Text (OCR)
Pull text out of images and scanned PDFs — entirely in your browser. Nothing is uploaded.
Scanned contracts, photographed receipts, screenshots with text you can't select — OCR turns them back into copyable, searchable words. This tool runs Tesseract.js inside a Web Worker entirely in your browser, so no image or document ever leaves your device. The LSTM-based engine handles printed text in nine languages with high accuracy, and the recognised text can be copied directly or downloaded as a plain .txt file.
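As a rough illustration of the recognition step, here is a minimal TypeScript sketch. It is not this tool's actual code. `OcrWorker`, `extractText`, and `makeWorker` are hypothetical names, and the interface only mirrors the part of the worker API the sketch needs. In recent Tesseract.js releases (v5 and later, as far as we know), `createWorker("eng")` resolves to a worker with `recognize` and `terminate` methods of this shape; older releases initialise the worker differently.

```typescript
// Minimal shape of the worker this sketch depends on (an assumption,
// modelled on the object that Tesseract.js's createWorker resolves to).
interface OcrWorker {
  recognize(image: Blob | string): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Recognise one image and return its text.
// `makeWorker` is injected so the sketch stays independent of the library;
// in a browser you might pass (lang) => createWorker(lang) from "tesseract.js".
async function extractText(
  makeWorker: (lang: string) => Promise<OcrWorker>,
  image: Blob | string,
  lang: string,
): Promise<string> {
  const worker = await makeWorker(lang);
  try {
    const { data } = await worker.recognize(image);
    // Trim surrounding whitespace; this is a presentation choice, not a library requirement.
    return data.text.trim();
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
}
```

Because the image is handed to a worker created in the page itself, nothing in this flow involves a network upload of the document.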
How to Use
1. Drop an image (PNG, JPEG, WebP) or a scanned PDF onto the upload area.
2. Choose the language of the text in your document from the dropdown.
3. Click "Extract Text". Recognition runs locally and typically takes a few seconds per page.
4. Copy the extracted text directly, or click "Download .txt" to save it as a file.
Frequently Asked Questions
How accurate is the recognition?
For clear, high-resolution printed text, accuracy typically exceeds 95%. Blurry or low-resolution scans produce more errors. For scanned PDFs, you can improve accuracy by first rendering the pages with the PDF to JPG tool at 2× or 3× scale and then extracting text from the resulting images.
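To see why a 2× or 3× scale helps, the sketch below converts a PDF page size into pixel dimensions. It assumes that 1× means one pixel per PDF point (72 points per inch), which is how common PDF renderers define scale. Whether the PDF to JPG tool uses the same convention is an assumption, and `renderSize` is a hypothetical helper, not part of either tool.

```typescript
// PDF page sizes are expressed in points, with 72 points per inch.
const POINTS_PER_INCH = 72;

interface RenderSize {
  widthPx: number;
  heightPx: number;
  dpi: number;
}

// Pixel dimensions and effective resolution of a page rendered at `scale`,
// assuming scale 1 maps one PDF point to one pixel.
function renderSize(widthPt: number, heightPt: number, scale: number): RenderSize {
  return {
    widthPx: Math.round(widthPt * scale),
    heightPx: Math.round(heightPt * scale),
    dpi: POINTS_PER_INCH * scale,
  };
}

// US Letter is 612 x 792 points.
console.log(renderSize(612, 792, 1)); // { widthPx: 612, heightPx: 792, dpi: 72 }
console.log(renderSize(612, 792, 3)); // { widthPx: 1836, heightPx: 2376, dpi: 216 }
```

Under that assumption, 3× gives roughly 216 DPI, much closer to the 300 DPI range that Tesseract's own guidance suggests for printed text than the 72 DPI of a 1× render.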
Does it handle handwriting?
Tesseract is optimised for printed text. Very neat, large handwriting may be partially recognised, but accuracy on handwriting is generally much lower than on printed documents.
Why does it take a moment before it starts?
The language model is about 4 MB and is downloaded once on first use. Subsequent runs use the cached version and start immediately.
Which languages are supported?
English, French, German, Spanish, Portuguese, Italian, Simplified Chinese, Japanese, and Arabic. Select your language from the dropdown before extracting.
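For readers scripting Tesseract themselves, each of these languages corresponds to a standard Tesseract language code. The mapping below is a sketch: the codes are the usual Tesseract traineddata names, but `LANGUAGE_CODES` and `languageCode` are hypothetical, and how this tool wires its dropdown internally is not documented here.

```typescript
// Dropdown label -> standard Tesseract language code.
const LANGUAGE_CODES = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Portuguese: "por",
  Italian: "ita",
  "Simplified Chinese": "chi_sim",
  Japanese: "jpn",
  Arabic: "ara",
} as const;

type LanguageName = keyof typeof LANGUAGE_CODES;

// Look up the code for a supported language; unsupported names fail at compile time.
function languageCode(name: LanguageName): string {
  return LANGUAGE_CODES[name];
}

console.log(languageCode("Simplified Chinese")); // chi_sim
```

Typing the lookup key as `LanguageName` means a misspelled or unsupported language is caught by the compiler rather than at recognition time.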