OneNote documents let us collect and organize notes as text, drawings, screen clippings, and audio commentaries. Occasionally, we need to extract text or images from these documents programmatically in Java applications so that the content can be reused elsewhere. In this article, we will learn how to extract text and images from OneNote documents using Java.
The following topics are covered in this article:
- Java API to Extract Text or Images from OneNote
- Extract All the Text from OneNote using Java
- Get Text from Specific Pages of OneNote in Java
- Extract Images from OneNote using Java
Java API to Extract Text or Images from OneNote
To extract text and images from OneNote documents, we will use the Aspose.Note for Java API. It allows creating, reading, and converting OneNote documents programmatically without using MS OneNote. Either download the JAR of the API or add the following pom.xml configuration to a Maven-based Java application.
<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>http://repository.aspose.com/repo/</url>
</repository>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-note</artifactId>
    <version>22.1</version>
    <classifier>jdk17</classifier>
</dependency>
Extract All the Text from OneNote Document using Java
We can extract all the text from a OneNote document by following the steps given below:
- Firstly, load a OneNote file using the Document class.
- After that, call the getChildNodes method with RichText.class as an argument to get all the text nodes.
- Finally, print the extracted text.
The following code sample shows how to extract all the text from a OneNote file using Java.
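As a library-free illustration of the steps above, the sketch below models the document as a small tree and collects every text node with a type-filtered `getChildNodes` call. The `Node`, `Document`, `Page`, and `RichText` classes here are hypothetical stand-ins defined inside the example, not the Aspose.Note classes. With the actual API, the equivalent would presumably be to construct `Document` from a file path, call `getChildNodes(RichText.class)`, and read each node's text; the `getText()` accessor name is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class ExtractAllTextSketch {

    // Hypothetical stand-in for a node in a OneNote document tree.
    static class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Returns every descendant of the given type, in depth-first document order.
        <T extends Node> List<T> getChildNodes(Class<T> type) {
            List<T> found = new ArrayList<>();
            collect(this, type, found);
            return found;
        }

        private static <T extends Node> void collect(Node node, Class<T> type, List<T> out) {
            for (Node child : node.children) {
                if (type.isInstance(child)) {
                    out.add(type.cast(child));
                }
                collect(child, type, out);
            }
        }
    }

    static class Document extends Node { }

    static class Page extends Node { }

    static class RichText extends Node {
        private final String text;

        RichText(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }

    // Steps 2 and 3: gather all RichText nodes and join their text, one node per line.
    static String extractAllText(Document doc) {
        StringBuilder sb = new StringBuilder();
        for (RichText richText : doc.getChildNodes(RichText.class)) {
            sb.append(richText.getText()).append('\n');
        }
        return sb.toString();
    }

    // Step 1 stand-in: builds an in-memory document instead of loading a .one file.
    static Document sampleDocument() {
        Document doc = new Document();
        doc.add(new Page()
                .add(new RichText("Title"))
                .add(new Node().add(new RichText("First note"))));
        doc.add(new Page().add(new RichText("Second note")));
        return doc;
    }

    public static void main(String[] args) {
        System.out.print(extractAllText(sampleDocument()));
    }
}
```

Because the traversal is recursive, text nested inside intermediate containers is picked up without walking the pages explicitly.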
Get Text from Specific Pages of OneNote Document in Java
We can extract text from specific pages of a OneNote document by following the steps given below:
- Firstly, load a OneNote file using the Document class.
- Next, call the getChildNodes method with Page.class as an argument to get the list of pages.
- Then, get a specific page by its index from the list of pages.
- After that, get a list of text items for the page using the getChildNodes method with RichText.class as an argument.
- Finally, print the extracted text.
The following code sample shows how to extract text from a specific page of a OneNote file using Java.
We may iterate over all the pages one by one and extract the text for each page as shown in the code sample given below:
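Both variants, a single page selected by index and a loop over all pages, can be sketched together. As before, this is a minimal self-contained illustration: `Node`, `Document`, `Page`, and `RichText` are hypothetical stand-ins declared in the example, and `getText()` is an assumed accessor name. The page index is treated as zero-based, matching ordinary Java list indexing.

```java
import java.util.ArrayList;
import java.util.List;

class PageTextSketch {

    // Hypothetical stand-in for a node in a OneNote document tree.
    static class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Returns every descendant of the given type, in depth-first order.
        <T extends Node> List<T> getChildNodes(Class<T> type) {
            List<T> found = new ArrayList<>();
            for (Node child : children) {
                if (type.isInstance(child)) {
                    found.add(type.cast(child));
                }
                found.addAll(child.getChildNodes(type));
            }
            return found;
        }
    }

    static class Document extends Node { }

    static class Page extends Node { }

    static class RichText extends Node {
        private final String text;

        RichText(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }
    }

    // Text of a single page, selected by its zero-based index.
    static List<String> textOfPage(Document doc, int pageIndex) {
        List<Page> pages = doc.getChildNodes(Page.class);
        if (pageIndex < 0 || pageIndex >= pages.size()) {
            throw new IndexOutOfBoundsException("No page at index " + pageIndex);
        }
        return textOf(pages.get(pageIndex));
    }

    // Text of every page, one inner list per page, in page order.
    static List<List<String>> textByPage(Document doc) {
        List<List<String>> result = new ArrayList<>();
        for (Page page : doc.getChildNodes(Page.class)) {
            result.add(textOf(page));
        }
        return result;
    }

    // Calling getChildNodes on the page limits the search to that page's subtree.
    private static List<String> textOf(Page page) {
        List<String> lines = new ArrayList<>();
        for (RichText richText : page.getChildNodes(RichText.class)) {
            lines.add(richText.getText());
        }
        return lines;
    }

    // Builds an in-memory document instead of loading a .one file.
    static Document sampleDocument() {
        Document doc = new Document();
        doc.add(new Page()
                .add(new RichText("Meeting notes"))
                .add(new RichText("Action items")));
        doc.add(new Page().add(new RichText("Shopping list")));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println("Page 1: " + textOfPage(doc, 1));
        System.out.println("All pages: " + textByPage(doc));
    }
}
```

Checking the index before calling `get` gives a clearer error message when the requested page does not exist.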
Extract Images from OneNote Document using Java
We can also extract images from a OneNote document by following the steps given below:
- Firstly, load a OneNote file using the Document class.
- After that, get a list of images using the getChildNodes method with Image.class as an argument.
- Finally, show the properties of each image and save it to the local disk.
The following code sample shows how to extract images from a OneNote file using Java.
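The last step, printing image properties and writing the bytes to disk, can be sketched with the standard library alone. The `Image` class below is a hypothetical stand-in whose accessor names (`getFileName`, `getWidth`, `getHeight`, `getBytes`) are assumptions; with the actual API, the list would come from calling `getChildNodes(Image.class)` on the loaded document. The index prefix on the output file name is a choice made here to avoid collisions between images that share a name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ExtractImagesSketch {

    // Hypothetical stand-in for an image node; accessor names are assumptions.
    static class Image {
        private final String fileName;
        private final int width;
        private final int height;
        private final byte[] bytes;

        Image(String fileName, int width, int height, byte[] bytes) {
            this.fileName = fileName;
            this.width = width;
            this.height = height;
            this.bytes = bytes;
        }

        String getFileName() { return fileName; }
        int getWidth() { return width; }
        int getHeight() { return height; }
        byte[] getBytes() { return bytes; }
    }

    // Prints the properties of each image and writes its raw bytes into outputDir.
    static List<Path> saveImages(List<Image> images, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> saved = new ArrayList<>();
        for (int i = 0; i < images.size(); i++) {
            Image image = images.get(i);
            System.out.printf("Image %d: %s (%dx%d, %d bytes)%n",
                    i, image.getFileName(), image.getWidth(), image.getHeight(),
                    image.getBytes().length);

            // Keep only the last path segment so a stored name cannot escape outputDir.
            Path name = Paths.get(image.getFileName()).getFileName();
            Path target = outputDir.resolve(i + "_" + name);
            Files.write(target, image.getBytes());
            saved.add(target);
        }
        return saved;
    }

    public static void main(String[] args) throws IOException {
        List<Image> images = new ArrayList<>();
        images.add(new Image("logo.png", 64, 32, new byte[] {1, 2, 3}));
        Path dir = Files.createTempDirectory("onenote-images");
        System.out.println("Saved: " + saveImages(images, dir));
    }
}
```

The bytes are written unchanged, so the saved file keeps whatever image format was stored in the document.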
Get a Free License
You can get a free temporary license to try the library without evaluation limitations.
Conclusion
In this article, we have learned how to extract text from a whole OneNote document or from a specific page of the document. We have also seen how to extract images from OneNote documents programmatically. You can learn more about the Aspose.Note for Java API in the documentation. If you have any questions, please feel free to contact us on the forum.