We can automate MS Word to create new Word documents (DOC or DOCX), edit existing ones, or convert them into other formats without using Microsoft Office. Python MS Word automation lets us perform programmatically the actions that we would otherwise perform through the user interface of MS Word. In this article, we will learn how to automate MS Word to create, edit, or convert Word documents using Python.
This article covers the basic features required for generating and manipulating Word documents programmatically using Python. It includes the following topics:
- Python MS Word Automation API to Create, Edit, or Convert Word Documents
- Create Word Documents
- Edit or Modify Word Documents
- Find and Replace Text in Word Documents
- Convert Word Documents
- Parse Word Documents
Python MS Word Automation API to Create, Edit, or Convert Word Documents
For automating Word, we will be using the Aspose.Words for Python API. It is a feature-rich Word automation solution to create, edit, or analyze Word documents programmatically. The Document class of the API represents a Word document. The API provides the DocumentBuilder class, which offers various methods to insert text, images, and other content in the document. This class also allows specifying the font, paragraph, and section formatting. The Run class of the API represents a run of characters with the same font formatting. Please install the library in your Python environment from PyPI using the following pip command.
pip install aspose-words
Create Word Documents using Python
We can create Word documents programmatically by following the steps given below:
- Firstly, create an instance of the Document class.
- Next, create an instance of the DocumentBuilder class with the Document object as an argument.
- After that, insert/write elements to add some text, paragraphs, tables, or images using the DocumentBuilder object.
- Finally, call the save() method with the output file path as an argument to save the created file.
The following code sample shows how to create a Word document (DOCX) using Python.
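A minimal sketch of the steps above, assuming the `aspose-words` package is installed and importable as `aspose.words`. The function name `create_document`, the sample text, and the font sizes are illustrative choices, not part of the API.

```python
def create_document(output_path: str) -> None:
    """Create a small Word document and save it to output_path."""
    # Imported inside the function so this file can be loaded even where
    # aspose-words is not installed (an assumption of this sketch).
    import aspose.words as aw

    doc = aw.Document()                # new, empty document
    builder = aw.DocumentBuilder(doc)  # writer bound to the document

    # Font settings apply to the text written after they are set.
    builder.font.size = 16
    builder.font.bold = True
    builder.writeln("Hello from Python")

    builder.font.size = 11
    builder.font.bold = False
    builder.writeln("This paragraph was written with DocumentBuilder.")

    # The output format is inferred from the file extension.
    doc.save(output_path)
```

Calling `create_document("hello.docx")` should write a two-paragraph DOCX file to the current directory.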
Edit or Modify Word Documents using Python
In the previous section, we created a Word document. Now, let’s edit it and change the content of the document. We can edit Word documents by following the steps given below:
- Firstly, load an existing Word document using the Document class.
- Next, access the specific section by its index.
- Then, access the first run of the first paragraph as an object of the Run class.
- After that, set the new text on the accessed Run object.
- Finally, call the save() method with the output file path to save the updated file.
The following code sample shows how to edit a Word document (DOCX) using Python.
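The steps above can be sketched as follows, assuming the first paragraph of the first section contains at least one run. The function name `edit_first_run` and its parameters are hypothetical names used only for illustration.

```python
def edit_first_run(input_path: str, output_path: str, new_text: str) -> None:
    """Replace the text of the first run in the first paragraph."""
    # Lazy import: an assumption of this sketch so the file loads without
    # aspose-words installed.
    import aspose.words as aw

    doc = aw.Document(input_path)   # load the existing document
    section = doc.sections[0]       # access the section by index
    paragraph = section.body.first_paragraph

    # Guard against an empty body or a paragraph without any runs.
    if paragraph is None or paragraph.runs.count == 0:
        raise ValueError("The first paragraph has no text runs to edit.")

    run = paragraph.runs[0]         # first Run of the paragraph
    run.text = new_text             # only this run changes; others are kept
    doc.save(output_path)
```

Note that a paragraph with mixed formatting consists of several runs, so this sketch changes only the first formatted fragment, not the whole paragraph.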
Find and Replace Text in Word Documents using Python
We can also find any text and replace it with new text by following the steps given below:
- Firstly, load a Word document using the Document class.
- Next, create an instance of the FindReplaceOptions class.
- After that, call the replace() method. It takes the search string, the replace string, and the FindReplaceOptions object as arguments.
- Finally, call the save() method with the output file path to save the updated file.
The following code sample shows how to find and replace specific text in a Word document (DOCX) using Python.
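A minimal sketch of these steps, assuming the replacement runs over the whole document through its `range` property. The function name `replace_text` and the chosen option values are illustrative.

```python
def replace_text(input_path: str, output_path: str, old: str, new: str) -> int:
    """Replace every occurrence of old with new; return the number replaced."""
    # Lazy import: an assumption of this sketch so the file loads without
    # aspose-words installed.
    import aspose.words as aw

    doc = aw.Document(input_path)

    options = aw.replacing.FindReplaceOptions()
    options.match_case = True              # treat "Word" and "word" as different
    options.find_whole_words_only = False  # also match inside longer words

    # replace() takes the search string, the replacement, and the options.
    replaced = doc.range.replace(old, new, options)

    doc.save(output_path)
    return replaced
```

Returning the count makes it easy to check whether anything was actually replaced before relying on the output file.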
Convert Word Documents using Python
We can convert Word documents to other formats such as PDF, XPS, EPUB, HTML, JPG, PNG, etc. Please follow the steps given below to convert a Word document to an HTML web page:
- Firstly, load a Word document using the Document class.
- Next, create an instance of the HtmlSaveOptions class.
- After that, specify the css_style_sheet_type, export_font_resources, resource_folder, and resource_folder_alias properties.
- Finally, call the save() method with the output file path and HtmlSaveOptions object as arguments to save the converted HTML file.
The following code sample shows how to convert a Word document (DOCX) to HTML using Python.
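The conversion steps can be sketched as below, assuming `HtmlSaveOptions` is created with its default constructor and the stylesheet is exported as an external CSS file. The function name `convert_to_html` and its parameters are hypothetical.

```python
def convert_to_html(input_path: str, output_path: str,
                    resource_folder: str, resource_alias: str) -> None:
    """Convert a Word document to an HTML page with external resources."""
    # Lazy import: an assumption of this sketch so the file loads without
    # aspose-words installed.
    import aspose.words as aw

    doc = aw.Document(input_path)

    options = aw.saving.HtmlSaveOptions()
    # Write CSS to a separate file instead of inlining it in the HTML.
    options.css_style_sheet_type = aw.saving.CssStyleSheetType.EXTERNAL
    # Export the fonts used by the document as resource files.
    options.export_font_resources = True
    # Folder on disk where images, fonts, and CSS are written.
    options.resource_folder = resource_folder
    # Prefix used in the generated HTML to reference those resources.
    options.resource_folder_alias = resource_alias

    doc.save(output_path, options)
```

For example, `convert_to_html("in.docx", "out.html", "resources", "https://example.com/resources")` would write the page and its resources, with the HTML linking to them under the placeholder URL.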
Similarly, we can also convert Word documents to other supported formats. Please read more about how to convert Word documents to EPUB, PDF, Markdown, or JPG and PNG images in the documentation.
Parse Word Documents using Python
We can parse Word documents and extract the content as plain text by following the steps given below:
- Firstly, load a Word document using the Document class.
- Next, extract and print the text.
- Finally, call the save() method to save the Word document as a text file. This method takes the path of the output file as an argument.
The following code sample shows how to parse a Word document (DOCX) using Python.
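A minimal sketch of the parsing steps, assuming the plain text is obtained with `to_string()` and the TEXT save format. The function name `extract_text` is illustrative, and stripping the surrounding whitespace is a choice of this sketch.

```python
def extract_text(input_path: str, text_output_path: str) -> str:
    """Return the document's plain text and also save it as a text file."""
    # Lazy import: an assumption of this sketch so the file loads without
    # aspose-words installed.
    import aspose.words as aw

    doc = aw.Document(input_path)

    # Render the whole document as plain text.
    text = doc.to_string(aw.SaveFormat.TEXT)

    # Saving with a .txt extension writes the same content to disk.
    doc.save(text_output_path)

    # Trim leading and trailing whitespace for convenience.
    return text.strip()
```

The returned string can then be printed or passed on, for example `print(extract_text("in.docx", "out.txt"))`.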
Get a Free License
You can get a free temporary license to try the library without evaluation limitations.
Conclusion
In this article, we have learned how to:
- automate MS Word using Python;
- create and edit Word documents programmatically;
- parse or convert DOCX files;
- find and replace text in Word documents using Python.
You can learn more about the Aspose.Words for Python API in the documentation. If you have any questions, please feel free to contact us on the forum.