Microsoft Word's DOC and DOCX formats are widely used because the word processor offers many features for organizing and presenting information. HTML, in turn, is the standard format for displaying information in web applications. In this article, you will learn how to convert Word files (DOC/DOCX) to HTML, HTML5, or MHTML using Java. The following use cases are covered:
- Convert Word (DOC/DOCX) to HTML using Java
- Convert DOCX to HTML5 using Java
- Convert Password-Protected Word file to HTML using Java
- Convert Word to MHTML using Java
Java DOCX to HTML or HTML5 Converter – Installation
To get started, add the Aspose.Words for Java API to your application. You can download the JAR file from the New Releases section, where the APIs are updated almost every month. Alternatively, all Java APIs offered by Aspose are hosted in the Aspose Maven repository, so you can add Aspose.Words for Java to a Maven project with the following configuration:
Repository
<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>https://repository.aspose.com/repo/</url>
    </repository>
</repositories>
Dependency
<dependencies>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>20.6</version>
        <classifier>jdk17</classifier>
    </dependency>
    <dependency>
        <groupId>com.aspose</groupId>
        <artifactId>aspose-words</artifactId>
        <version>20.6</version>
        <classifier>javadoc</classifier>
    </dependency>
</dependencies>
Now we are all set for DOCX to HTML conversion in a Java application.
Convert Word (DOC/DOCX) to HTML using Java
You can convert Word to HTML by following the steps below:
- Load source Word file with DOC or DOCX extension
- Save the file as output HTML
The code sample below shows how to convert DOCX to HTML using Java:
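A minimal sketch of these two steps, assuming the Aspose.Words for Java JAR configured above is on the classpath. The class name `WordToHtml` and the file paths are illustrative placeholders, not names taken from the API or from this article.

```java
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

// Hypothetical helper class; the name and paths are placeholders.
class WordToHtml {

    // Loads a DOC or DOCX file and writes it out as HTML.
    static void convert(String inputPath, String outputPath) throws Exception {
        // The Document constructor detects the input format from the file itself.
        Document doc = new Document(inputPath);
        // SaveFormat.HTML selects HTML output explicitly.
        doc.save(outputPath, SaveFormat.HTML);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths: replace with your own files.
        convert("input.docx", "output.html");
    }
}
```

The same `convert` call should work for a `.doc` input as well, since only the path passed to the `Document` constructor changes.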
Input DOCX file Preview
Output HTML file Preview
These screenshots show the high fidelity of the document rendering. The API converts text, images, tables, and other document elements.
Convert DOCX to HTML5 using Java
HTML5 is the latest version of HTML. Following repeated requests for HTML5 support in the Aspose.Words API, DOCX to HTML5 conversion is supported, and you can convert files by following these steps:
- Load the input DOCX file
- Create an HtmlSaveOptions object, specifying the SaveFormat
- Set the HTML version to the HtmlVersion.HTML_5 enumeration value
- Save the output file
The code snippet below shows how to convert DOCX to HTML5 in Java:
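A minimal sketch of the four steps, assuming the same Aspose.Words for Java setup as above. The class name `WordToHtml5` and the file paths are illustrative placeholders.

```java
import com.aspose.words.Document;
import com.aspose.words.HtmlSaveOptions;
import com.aspose.words.HtmlVersion;
import com.aspose.words.SaveFormat;

// Hypothetical helper class; the name and paths are placeholders.
class WordToHtml5 {

    // Loads a DOCX file and saves it as HTML5 markup.
    static void convert(String inputPath, String outputPath) throws Exception {
        // Step 1: load the input document.
        Document doc = new Document(inputPath);
        // Step 2: create the save options for the HTML format.
        HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.HTML);
        // Step 3: request HTML5 output.
        options.setHtmlVersion(HtmlVersion.HTML_5);
        // Step 4: save using the configured options.
        doc.save(outputPath, options);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths: replace with your own files.
        convert("input.docx", "output-html5.html");
    }
}
```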
Convert Password-Protected Word file to HTML using Java
DOC and DOCX files are sometimes encrypted with a password. You can convert such files to HTML as well, but you need to supply the password when loading the Word file. Follow the steps below to convert a password-protected DOCX file to HTML:
- Initialize an object of the LoadOptions class
- Set the password
- Load the encrypted DOCX file
- Convert DOCX to HTML
The following code sample shows how to convert a password-protected DOCX file to HTML using Java:
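A minimal sketch of these steps, assuming the same Aspose.Words for Java setup as above. The class name `ProtectedWordToHtml`, the file paths, and the password value are illustrative placeholders.

```java
import com.aspose.words.Document;
import com.aspose.words.LoadOptions;
import com.aspose.words.SaveFormat;

// Hypothetical helper class; the name, paths, and password are placeholders.
class ProtectedWordToHtml {

    // Opens an encrypted Word file with the given password and saves it as HTML.
    static void convert(String inputPath, String password, String outputPath) throws Exception {
        // Steps 1 and 2: create the load options and set the password.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setPassword(password);
        // Step 3: load the encrypted document using those options.
        Document doc = new Document(inputPath, loadOptions);
        // Step 4: save as HTML.
        doc.save(outputPath, SaveFormat.HTML);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values: replace with your own file and password.
        convert("protected.docx", "your-password", "output.html");
    }
}
```

In real code, the password would normally come from configuration or user input rather than a string literal.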
Convert Word to MHTML using Java
An MHTML file is a single file that contains the page together with its embedded content and media. You can convert Word files (DOC/DOCX) to MHTML with the following steps:
- Load the input DOCX file
- Save the output MHTML file using SaveFormat.MHTML
The code snippet below follows these steps and shows how to convert DOCX to MHTML with Java:
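A minimal sketch of the two steps, assuming the same Aspose.Words for Java setup as above. The class name `WordToMhtml` and the file paths are illustrative placeholders.

```java
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

// Hypothetical helper class; the name and paths are placeholders.
class WordToMhtml {

    // Loads a DOC or DOCX file and writes it out as a single MHTML file.
    static void convert(String inputPath, String outputPath) throws Exception {
        Document doc = new Document(inputPath);
        // SaveFormat.MHTML selects MHTML output explicitly.
        doc.save(outputPath, SaveFormat.MHTML);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths: replace with your own files.
        convert("input.docx", "output.mht");
    }
}
```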
Conclusion
In this article, we have learned how to convert Word documents to HTML, HTML5, or MHTML without needing Microsoft Word. The screenshots show that the conversion is performed with high fidelity and compatibility between the file formats. You can try the API in your own Java environment, and if you face any problem while setting up or testing it, you can get in touch with us via the Free Support Forums!