Extract Images from Word Documents in Python


A picture is worth a thousand words, which is why images are an integral part of many documents, including Word documents. They make the content more attractive and easier to follow. When parsing Word documents, you may need to extract the images they contain. This article shows how to do that programmatically in Python.

Info: If you ever need to get a Word document from a PowerPoint presentation, you can use Aspose Presentation to Word Document converter.

Python Library to Extract Images from Word Documents

Aspose.Words for Python is a library for creating and manipulating Word documents. We will use it to extract images from DOCX or DOC files. You can install it from PyPI using the following pip command.

pip install aspose-words

Extracting Images from Word Documents in Python

Images in a Word document are represented by Shape nodes. Therefore, to retrieve the images from a document, you have to go through its shapes. The following steps show how to extract images from a Word document in Python.

  • First, load the Word document using the Document class.
  • Then, retrieve all the shapes into an object using the Document.get_child_nodes(NodeType.SHAPE, True) method.
  • Loop through the shapes and, for each shape, perform the following operations:
    • Cast the node to the Shape type using the as_shape() method.
    • Check whether the shape has an image using the Shape.has_image property.
    • Save the shape's image to a file using the Shape.image_data.save(string) method.

The following code sample shows how to extract images from a DOCX document in Python.
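Here is a minimal sketch of the steps above, assuming the input file is named input.docx and that the aspose-words package is installed. The helper image_file_name, the output folder extracted_images, and the file naming scheme are illustrative choices, not part of the library. The file extension is taken from the image type via FileFormatUtil.image_type_to_extension.

```python
from pathlib import Path


def image_file_name(index: int, extension: str) -> str:
    """Build an output file name such as 'image_0.png' (illustrative naming scheme)."""
    # Accept extensions with or without a leading dot.
    return f"image_{index}.{extension.lstrip('.')}"


def extract_images(doc_path: str, out_dir: str = "extracted_images") -> list:
    """Save every image found in the document and return the saved file paths."""
    # Imported here so the naming helper above works without the library installed.
    import aspose.words as aw

    output = Path(out_dir)
    output.mkdir(parents=True, exist_ok=True)

    # Load the Word document and collect all shape nodes, including nested ones.
    doc = aw.Document(doc_path)
    shapes = doc.get_child_nodes(aw.NodeType.SHAPE, True)

    saved = []
    for node in shapes:
        shape = node.as_shape()
        # Skip shapes that do not carry image data (text boxes, lines, and so on).
        if not shape.has_image:
            continue
        # Derive the file extension from the stored image type.
        extension = aw.FileFormatUtil.image_type_to_extension(
            shape.image_data.image_type
        )
        target = output / image_file_name(len(saved), extension)
        shape.image_data.save(str(target))
        saved.append(str(target))
    return saved


if __name__ == "__main__":
    # "input.docx" is a placeholder; point this at your own document.
    for path in extract_images("input.docx"):
        print(path)
```

Numbering the files by the count of images saved so far keeps the names contiguous even when the document contains shapes without images.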

Get a Free API License

You can get a temporary license to use Aspose.Words for Python without evaluation limitations.
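Once you have a license file, it is typically applied once at startup, before any document is loaded. The sketch below assumes the file is saved as Aspose.Words.lic next to your script. That file name and the apply_license helper are hypothetical. The License class and its set_license method come from the library.

```python
from pathlib import Path


def apply_license(license_path: str) -> None:
    """Apply an Aspose.Words license file before working with documents."""
    # Fail early with a clear message if the file is missing.
    if not Path(license_path).is_file():
        raise FileNotFoundError(f"License file not found: {license_path}")

    # Imported here so the path check above runs even without the library installed.
    import aspose.words as aw

    aw.License().set_license(license_path)


if __name__ == "__main__":
    # Hypothetical file name; use the path of the license you received.
    apply_license("Aspose.Words.lic")
```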

Conclusion

Images are commonly used in Word documents to make the content more appealing, and in many cases they need to be extracted from a document along with its text. In this article, you have learned how to extract images from Word documents in Python. You can also explore the documentation of Aspose.Words for Python. If you have any questions, feel free to let us know via our forum.
