PDF is one of the most commonly used formats for sending documents to third parties. Its popularity comes from its compatibility across platforms, independent of any particular hardware or software. However, in some cases you may need to convert a PDF into an editable format, and Word's DOC and DOCX formats are the usual choice. To automate this conversion, this article shows how to convert PDF to Word programmatically in Java.
In this article, you will learn how to:
- Convert PDF to DOC using Java.
- Convert PDF to DOCX format using Java.
- Customize PDF to Word (DOC/DOCX) conversion.
Java PDF to Word Converter Library
This article uses Aspose.PDF for Java, a PDF manipulation API that makes it easy to convert PDF files to a variety of other formats, including DOC and DOCX. You can download the API’s JAR file and add it to your project, or reference it using the following Maven configuration:
Repository
<repository>
<id>AsposeJavaAPI</id>
<name>Aspose Java API</name>
<url>https://repository.aspose.com/repo/</url>
</repository>
Dependency
<dependency>
<groupId>com.aspose</groupId>
<artifactId>aspose-pdf</artifactId>
<version>19.12</version>
</dependency>
Convert PDF to DOC using Java
Once you have referenced Aspose.PDF for Java in your application, you can convert any PDF document to DOC format in a couple of lines of code. The following are the steps required to perform this conversion.
- Create an instance of the Document class and initialize it with the input PDF file’s path.
- Call the Document.save() method with the output DOC file’s name and SaveFormat.Doc as arguments.
The following code sample shows how to convert PDF to DOC in Java.
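Here is a minimal sketch of those two steps. It assumes the aspose-pdf dependency shown above is on the classpath; the class name PdfToDoc, the convert helper, and the file paths are hypothetical placeholders, not part of the API.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;

public class PdfToDoc {

    // Hypothetical helper: converts the PDF at inputPath to a DOC file at outputPath.
    public static void convert(String inputPath, String outputPath) {
        // Step 1: create a Document initialized with the input PDF file's path
        Document pdfDocument = new Document(inputPath);

        // Step 2: save it in Word 97-2003 (DOC) format
        pdfDocument.save(outputPath, SaveFormat.Doc);
    }

    public static void main(String[] args) {
        // Placeholder paths; replace them with your own files
        convert("input.pdf", "output.doc");
    }
}
```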
Convert PDF to DOCX using Java
DOCX has been the default format for Word documents since Word 2007. Unlike the binary DOC format, DOCX is based on Office Open XML: a ZIP-compressed package of XML files (plus any embedded media). To convert PDF to DOCX, pass SaveFormat.DocX instead of SaveFormat.Doc to the Document.save() method.
The following code sample shows how to convert PDF to DOCX in Java.
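A minimal sketch follows; the only change from the DOC conversion is the format argument. As before, it assumes the aspose-pdf dependency is available, and PdfToDocx, convert, and the paths are hypothetical names used for illustration.

```java
import com.aspose.pdf.Document;
import com.aspose.pdf.SaveFormat;

public class PdfToDocx {

    // Hypothetical helper: converts the PDF at inputPath to a DOCX file at outputPath.
    public static void convert(String inputPath, String outputPath) {
        // Load the input PDF from its file path
        Document pdfDocument = new Document(inputPath);

        // SaveFormat.DocX selects the Office Open XML (DOCX) output format
        pdfDocument.save(outputPath, SaveFormat.DocX);
    }

    public static void main(String[] args) {
        // Placeholder paths; replace them with your own files
        convert("input.pdf", "output.docx");
    }
}
```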
Convert PDF to Word with Additional Options
Aspose.PDF for Java also provides additional options for PDF to Word conversion, such as the output format, image resolution, and the distance between text lines. These options are set through the DocSaveOptions class, which offers the following methods:
- setFormat(int value) – To set the output format (DOC or DOCX).
- setAddReturnToLineEnd(boolean value) – To add paragraph or line breaks.
- setImageResolutionX(int value) – To set the X resolution for images.
- setImageResolutionY(int value) – To set the Y resolution for images.
- setMaxDistanceBetweenTextLines(float value) – To set the maximum distance between text lines that are grouped into one paragraph.
- setMode(int value) – To set the recognition mode (for example, Textbox or Flow).
- setRecognizeBullets(boolean value) – To turn bullet recognition on or off.
- setRelativeHorizontalProximity(float value) – To set the width of space between different text elements in the input PDF file.
The following code sample shows how to use the DocSaveOptions class in PDF to DOCX conversion using Java.
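A minimal sketch of a customized conversion is shown here, assuming the aspose-pdf dependency is on the classpath. The class name PdfToDocxWithOptions, the convert helper, and the paths are hypothetical, and the numeric option values are illustrative rather than recommended settings; tune them for your own documents.

```java
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

public class PdfToDocxWithOptions {

    // Hypothetical helper: converts a PDF to DOCX using customized DocSaveOptions.
    public static void convert(String inputPath, String outputPath) {
        // Load the input PDF
        Document pdfDocument = new Document(inputPath);

        DocSaveOptions saveOptions = new DocSaveOptions();
        // Produce DOCX rather than DOC
        saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
        // Flow mode favors editability over an exact visual match
        // (Textbox mode keeps the original look but limits editing)
        saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
        // Illustrative values only
        saveOptions.setRelativeHorizontalProximity(2.5f);
        saveOptions.setRecognizeBullets(true);
        saveOptions.setImageResolutionX(300);
        saveOptions.setImageResolutionY(300);

        // Save with the options object instead of a SaveFormat constant
        pdfDocument.save(outputPath, saveOptions);
    }

    public static void main(String[] args) {
        // Placeholder paths; replace them with your own files
        convert("input.pdf", "output.docx");
    }
}
```

Because the output format is carried by DocSaveOptions, switching this example to DOC output only requires setting DocSaveOptions.DocFormat.Doc and changing the file extension.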
Conclusion
In this article, you have learned how to convert PDF documents to Word formats using Java. You can convert PDF to either DOC or DOCX, depending on your requirements, and customize the conversion with the DocSaveOptions class. You can learn more about converting PDF to other formats from the documentation.