While parsing PDF files, you may need to extract images as well as text from the documents. This article shows how to extract images from PDF documents programmatically using Java. The step-by-step guide, along with API references and a code sample, demonstrates the complete image extraction procedure.
Java API to Extract Images from PDF
To extract images from PDF files, we’ll use Aspose.PDF for Java, a powerful PDF manipulation API that provides a wide range of features for creating and processing PDF files. You can either download the API or install it by adding the following configuration to your project’s pom.xml.
<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>https://repository.aspose.com/repo/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-pdf</artifactId>
        <version>21.5</version>
    </dependency>
</dependencies>
Extract Images from a PDF Document
The following are the steps to extract images from a PDF document using Java.
- Load the PDF document using the Document class.
- Iterate through the page collection returned by the Document.getPages() method.
- For each Page, loop through its collection of XImage objects, returned by the Page.getResources().getImages() method.
- Create a FileOutputStream for each image and save the image to it using the XImage.save() method.
The following code sample shows how to extract images from a PDF document.
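A minimal sketch of these steps follows, assuming Aspose.PDF for Java (21.5, as in the Maven configuration) is on the classpath. It walks the 1-based page and image collections with size() and get_Item(), and writes each image with XImage.save(OutputStream). The path input.pdf and the helper outputFileName are hypothetical; the helper puts the page number into every file name so that images from different pages do not overwrite one another. The .jpg extension follows Aspose's own examples, so check the output format of save(OutputStream) in the API reference for your version.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.XImage;
import com.aspose.pdf.XImageCollection;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ExtractImages {

    // Hypothetical naming helper: one unique file name per (page, image) pair,
    // so images from different pages never overwrite each other.
    static String outputFileName(int pageNumber, int imageNumber) {
        return String.format("page%d_image%d.jpg", pageNumber, imageNumber);
    }

    public static void main(String[] args) throws IOException {
        // Load the PDF document ("input.pdf" is a placeholder path).
        Document document = new Document("input.pdf");

        // Aspose collections are 1-based, so both loops start at 1.
        for (int p = 1; p <= document.getPages().size(); p++) {
            Page page = document.getPages().get_Item(p);

            // The images referenced by this page's resources.
            XImageCollection images = page.getResources().getImages();

            for (int i = 1; i <= images.size(); i++) {
                XImage image = images.get_Item(i);

                // try-with-resources closes the stream even if saving fails.
                try (OutputStream out = new FileOutputStream(outputFileName(p, i))) {
                    image.save(out);
                }
            }
        }
    }
}
```

Run from a directory containing input.pdf, a two-page document with one image per page would produce page1_image1.jpg and page2_image1.jpg.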
Get a Free License
You can use Aspose.PDF for Java without evaluation limitations using a temporary license.
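Once you have a license file, it must be applied in code before the library drops its evaluation limitations. The sketch below assumes the com.aspose.pdf.License class and its setLicense(String) method described in Aspose's licensing documentation; the file name Aspose.PDF.Java.lic and the helper applyLicense are hypothetical. It checks that the file exists before calling setLicense and treats any exception as a failed license, in which case the library presumably keeps running in evaluation mode.

```java
import com.aspose.pdf.License;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ApplyLicense {

    // Hypothetical helper: applies an Aspose.PDF license file and reports success.
    static boolean applyLicense(String licensePath) {
        Path path = Paths.get(licensePath);

        // Fail fast with a clear message if the license file is missing.
        if (!Files.isRegularFile(path)) {
            System.err.println("License file not found: " + path);
            return false;
        }

        try {
            License license = new License();
            license.setLicense(licensePath);
            return true;
        } catch (Exception e) {
            // Assumption: if setLicense throws, no license is in effect.
            System.err.println("License not applied: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // "Aspose.PDF.Java.lic" is a placeholder file name.
        boolean licensed = applyLicense("Aspose.PDF.Java.lic");
        System.out.println("Licensed: " + licensed);
    }
}
```

Calling applyLicense once at application startup, before any Document is loaded, keeps the license in effect for all later operations.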
Conclusion
Images often need to be extracted from PDF documents. In this article, you have learned how to extract images from PDF files in Java using Aspose.PDF for Java. You can explore more about the Java PDF API in the documentation. You can also post your queries on our forum.