Convert Word Document to HTML using Python


Word to HTML conversion is required in various cases, such as embedding a document’s content in web pages. In this article, you will learn how to convert MS Word DOCX or DOC documents to HTML using Python. Moreover, you will learn how to customize the Word to HTML conversion using different save options.

Python Word to HTML Converter API

To convert Word documents to HTML, we will use Aspose.Words for Python. It is a powerful and feature-rich API for creating and manipulating Word documents, and it also provides high-fidelity conversion of Word documents to other formats. Aspose.Words for Python is available on PyPI, and you can install it using the following pip command.

pip install aspose-words

Convert a Word Document to HTML in Python

The following are the steps to convert a Word document to an HTML file using Python.

  • Load the Word document using the Document class.
  • Create an object of the HtmlSaveOptions class.
  • Enable the export of font resources using the HtmlSaveOptions.export_font_resources property.
  • Convert the Word document to HTML using the Document.save() method and pass the HTML file’s name and HtmlSaveOptions as parameters.

The following code sample shows how to convert a DOCX file to HTML in Python.

Customize Word to HTML Conversion in Python

Aspose.Words for Python also provides different options to customize the Word to HTML conversion. For example, you can convert documents with round-trip information, specify the folder to save the resource files, and so on.

Convert a Word Document with Round-trip Information

HTML doesn’t support all the features provided by MS Word. Therefore, additional information, termed round-trip information, can be saved along with the HTML so that these features can be restored when the HTML file is loaded back into a Word document. The following are the steps to turn on the export of round-trip information in Word to HTML conversion.

  • Load the Word document using the Document class.
  • Create an object of the HtmlSaveOptions class and set the HtmlSaveOptions.export_roundtrip_information property to True.
  • Convert the Word document to HTML using the Document.save() method and pass the HTML file’s name and HtmlSaveOptions as parameters.

The following code sample shows how to export round-trip information in Word to HTML conversion.

Word to HTML: Specify a Folder for Resources

You can also specify a folder where you want to store all the resources, such as images, CSS files, and fonts. For this, you can use the HtmlSaveOptions.resource_folder property. You can also specify separate folders for fonts and images using the HtmlSaveOptions.fonts_folder and HtmlSaveOptions.images_folder properties, respectively. The following are the steps to use a separate folder to save resources in Word to HTML conversion.

  • Load the Word document using the Document class.
  • Create an object of the HtmlSaveOptions class and set the HtmlSaveOptions.export_font_resources property to True.
  • Specify the name of the resource folder using the HtmlSaveOptions.resource_folder property.
  • Convert the Word document to HTML using the Document.save() method and pass the HTML file’s name and HtmlSaveOptions as parameters.

The following code sample shows how to specify a resource folder in Word to HTML conversion.

Get a Free API License

You can get a free temporary license to use Aspose.Words for Python without evaluation limitations.

Info: You may be interested in another Python API (Aspose.Slides for Python via .NET) that allows you to convert presentations to images and import images into presentations.

Conclusion

In this article, you have learned how to convert Word documents to HTML using Python. You have also seen how to customize the conversion with HtmlSaveOptions, for example, to export round-trip information or to save resources in a separate folder. You can explore other features of Aspose.Words for Python using the documentation and ask your questions via our forum.

See Also