Images are commonly used to convey important information in Word documents, and placing them alongside text makes the content more appealing. In certain cases, you may need to extract the images embedded in a Word document programmatically. This article shows how to extract images from Word documents using Java.
Java API to Extract Images from Word Documents
Aspose.Words for Java is a feature-rich API for creating, manipulating, and converting MS Word documents. We will use this API to extract images from MS Word DOCX/DOC documents. You can download the API’s JAR or add it to a Maven-based Java application using the following repository and dependency configuration.
<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://repository.aspose.com/repo/</url>
</repository>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-words</artifactId>
    <version>21.11</version>
    <type>pom</type>
</dependency>
How to Extract Images from a Word Document
The images in a Word document are represented using shape objects. Therefore, to retrieve images, you will have to process every shape in the document. The following are the steps to extract images from a Word DOCX document in Java.
- First, load the Word file using the Document class.
- Then, get all the shapes into a NodeCollection<Shape> object using the Document.getChildNodes(NodeType.SHAPE, boolean) method.
- Loop through the retrieved shapes.
- In each iteration, check if the shape has an image using the Shape.hasImage() method.
- Finally, extract the image and save it using the Shape.getImageData().save(String) method.
The following code sample shows how to extract images from a DOCX document in Java.
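The listing below is a minimal sketch of the steps above, not a definitive implementation. The class name ExtractWordImages, the input file input.docx, and the output folder output are hypothetical placeholders. The cast to NodeCollection<Shape> and the use of FileFormatUtil.imageTypeToExtension() to pick a file extension are assumptions that go beyond the listed steps and may need adjusting for the Aspose.Words version you have installed.

```java
import com.aspose.words.Document;
import com.aspose.words.FileFormatUtil;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Shape;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ExtractWordImages {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        // Hypothetical locations: adjust to your own input file and output folder.
        String inputPath = "input.docx";
        Path outputDir = Paths.get("output");

        // Make sure the output folder exists before saving into it.
        Files.createDirectories(outputDir);

        // Load the Word document.
        Document doc = new Document(inputPath);

        // Collect every shape in the document; "true" searches the whole node tree.
        NodeCollection<Shape> shapes =
                (NodeCollection<Shape>) doc.getChildNodes(NodeType.SHAPE, true);

        int imageIndex = 0;
        for (Shape shape : shapes) {
            // Only shapes that actually hold image data are of interest.
            if (shape.hasImage()) {
                // Derive the file extension (with leading dot) from the stored image type.
                String extension = FileFormatUtil.imageTypeToExtension(
                        shape.getImageData().getImageType());

                // Save the image under a sequential, hypothetical file name.
                Path target = outputDir.resolve("image_" + imageIndex + extension);
                shape.getImageData().save(target.toString());
                imageIndex++;
            }
        }

        System.out.println("Extracted " + imageIndex + " image(s).");
    }
}
```

Numbering the output files keeps images from overwriting each other, and taking the extension from the stored image type avoids guessing the format of each embedded picture.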
Get a Free API License
Get a free temporary license to use Aspose.Words for Java without evaluation limitations.
Info: You may be interested in another Java API, Aspose.Slides for Java, which allows you to convert presentations (into PDFs, Word documents, etc.) and import images or other documents into presentations.
Conclusion
In this article, you have learned how to extract images from a Word document using Java: load the document, iterate over its shapes, and save the image data of each shape that holds an image to the desired location. Aspose.Words for Java also provides a wide range of other document manipulation features, which you can explore in the documentation. If you have questions, you can ask them via our forum.