Convert Scanned PDF to Searchable PDF with OCR in Java


Some PDF files are created from images captured by a scanner or a camera, so their pages contain pictures of text rather than actual text. To search, select, or copy that text, you need to convert the scanned PDF to a searchable PDF with OCR (optical character recognition). This article explains how to convert a scanned PDF to a searchable PDF programmatically using Java.

Scanned PDF to Searchable PDF by OCR – Java API Installation

You can recognize the text in a scanned PDF file with the OCR feature of the Aspose.OCR for Java API. Install the API by downloading the JAR file from the New Releases section, or add the following Maven configuration to your pom.xml:

Repository

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>

Dependency

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-ocr</artifactId>
    <version>21.12</version>
</dependency>

Convert Scanned PDF to Searchable PDF Programmatically using Java

To convert a scanned PDF file to a searchable PDF document with OCR, follow these steps:

  1. Create an AsposeOCR class object.
  2. Specify the pages to recognize using the DocumentRecognitionSettings class.
  3. Recognize the text in the scanned PDF with the RecognizePdf method.
  4. Save the OCR result as a searchable PDF file.

The following code snippet shows how to convert a scanned PDF to a searchable PDF file programmatically in Java:
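Here is a minimal sketch of those steps. It assumes the class and method names AsposeOCR, DocumentRecognitionSettings (with setStartPage and setPagesNumber), RecognizePdf, RecognitionResult, the static SaveMultipageDocument method, and Format.Pdf as found in Aspose.OCR for Java 21.x; please verify them against the API reference for the version you install. The class ScannedToSearchablePdf, the convert helper, the file names Sample.pdf and Searchable.pdf, and the page range (start page 2, five pages) are hypothetical placeholders. Whether the start page is counted from zero or from one is also worth confirming in the documentation.

```java
// Assumed API: class and method names recalled from Aspose.OCR for Java 21.x
import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.DocumentRecognitionSettings;
import com.aspose.ocr.Format;
import com.aspose.ocr.RecognitionResult;

import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;

// Hypothetical wrapper class used only for this example
public class ScannedToSearchablePdf {

    /**
     * Runs OCR on a range of pages in a scanned PDF and saves the result
     * as a searchable PDF. Returns the number of recognition results.
     */
    public static int convert(String inputPdf, String outputPdf,
                              int startPage, int pagesNumber) throws Exception {
        // Fail early with a clear message if the input file is missing
        if (!Files.exists(Paths.get(inputPdf))) {
            throw new FileNotFoundException("Input PDF not found: " + inputPdf);
        }

        // Create the OCR engine
        AsposeOCR api = new AsposeOCR();

        // Choose which pages of the PDF to recognize
        DocumentRecognitionSettings settings = new DocumentRecognitionSettings();
        settings.setStartPage(startPage);
        settings.setPagesNumber(pagesNumber);

        // Recognize the text on the selected scanned pages
        ArrayList<RecognitionResult> results = api.RecognizePdf(inputPdf, settings);

        // Save the recognition results as a searchable PDF file
        AsposeOCR.SaveMultipageDocument(outputPdf, Format.Pdf, results);
        return results.size();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; pass your own paths as arguments
        String input = args.length > 0 ? args[0] : "Sample.pdf";
        String output = args.length > 1 ? args[1] : "Searchable.pdf";
        int count = convert(input, output, 2, 5);
        System.out.println("Recognized " + count + " page(s); saved " + output);
    }
}
```

Run it as `java ScannedToSearchablePdf input.pdf output.pdf` with the Aspose.OCR JAR on the classpath. The helper checks the input path before the OCR engine is created, so a mistyped file name fails with a plain FileNotFoundException.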

Get Free Evaluation License

You can evaluate text recognition in scanned PDF files without any evaluation limitations by requesting a free temporary license.
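Once you have the license file, it is typically applied once at startup, before any recognition calls. The sketch below assumes a com.aspose.ocr.License class with a setLicense(String) method, following the usual Aspose licensing pattern; confirm the exact class in the Aspose.OCR for Java licensing documentation. The class ApplyOcrLicense, the applyLicenseIfPresent helper, and the file name Aspose.OCR.Java.lic are hypothetical.

```java
// Assumed API: com.aspose.ocr.License with setLicense(String)
import com.aspose.ocr.License;

import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class used only for this example
public class ApplyOcrLicense {

    /**
     * Applies the license if the file exists and returns true.
     * Returns false without touching the API when no license file is found,
     * in which case the library stays in evaluation mode.
     */
    public static boolean applyLicenseIfPresent(String licensePath) throws Exception {
        if (!Files.exists(Paths.get(licensePath))) {
            return false;
        }
        License license = new License();
        license.setLicense(licensePath);
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder license file name; pass the real path as an argument
        String path = args.length > 0 ? args[0] : "Aspose.OCR.Java.lic";
        boolean licensed = applyLicenseIfPresent(path);
        System.out.println(licensed ? "License applied" : "Running in evaluation mode");
    }
}
```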

Conclusion

In this article, you have learned how to convert a scanned PDF file to a searchable PDF document with OCR programmatically in Java. You can explore other OCR features of the API in the documentation. If you have any questions, feel free to write to us on the forum.

See Also

Recognize Text by Performing OCR on Image from URL with Java