Extract Text or Images from OneNote Documents using Java

A OneNote document lets us collect and organize notes as text, drawings, screen clippings, and audio commentary. Sometimes we need to extract the text or images from such documents programmatically in a Java application so that the content can be reused elsewhere. In this article, we will learn how to extract text or images from OneNote documents using Java.

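The following topics are covered in this article:

  1. Java API to Extract Text or Images from OneNote
  2. Extract All the Text from OneNote Document using Java
  3. Get Text from Specific Pages of OneNote Document in Java
  4. Extract Images from OneNote Document using Java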

Java API to Extract Text or Images from OneNote

For extracting text and images from the OneNote document, we will be using the Aspose.Note for Java API. It allows creating, reading, and converting OneNote documents programmatically without using MS OneNote. Please either download the JAR of the API or add the following pom.xml configuration in a Maven-based Java application.

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>http://repository.aspose.com/repo/</url>
</repository>
<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-note</artifactId>
    <version>22.1</version>
    <classifier>jdk17</classifier>
</dependency>

Extract All the Text from OneNote Document using Java

We can easily extract all the text from the OneNote document by following the steps given below:

  1. Firstly, load a OneNote file using the Document class.
  2. After that, call the getChildNodes method with RichText.class as an argument to get all the text nodes.
  3. Finally, show the extracted text.

The following code sample shows how to extract all the text from a OneNote file using Java.
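As a minimal sketch of these steps, the class below loads a OneNote file, collects every RichText node, and prints the text. It assumes the aspose-note JAR is on the classpath; the class name, the extractText helper, and the Sample1.one path are illustrative placeholders rather than names from the library.

```java
import com.aspose.note.Document;
import com.aspose.note.RichText;

import java.util.List;

class ExtractAllText {

    // Hypothetical helper: joins the text of every RichText node, one node per line.
    static String extractText(Document doc) {
        List<RichText> textNodes = doc.getChildNodes(RichText.class);
        StringBuilder sb = new StringBuilder();
        for (RichText node : textNodes) {
            sb.append(node.getText()).append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // "Sample1.one" is a placeholder path; pass your own file as the first argument.
        Document doc = new Document(args.length > 0 ? args[0] : "Sample1.one");
        System.out.print(extractText(doc));
    }
}
```

Collecting the text into a single string first, instead of printing inside the loop, makes it easy to pass the result to other code, for example to write it to a file.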

Figure: Extract All the Text from OneNote Document using Java

Get Text from Specific Pages of OneNote Document in Java

We can extract text from specific pages of the OneNote document by following the steps given below:

  1. Firstly, load a OneNote file using the Document class.
  2. Next, call the getChildNodes method with Page.class as an argument to get the list of pages.
  3. Then, get a specific page by its index from the list of pages.
  4. After that, get a list of text items for the page using the getChildNodes method with RichText.class as an argument.
  5. Finally, show the extracted text.

The following code sample shows how to extract text from a specific page of a OneNote file using Java.
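The steps above could be sketched as follows. The class name, the extractPageText helper, its bounds check, and the Sample1.one path are assumptions made for illustration; only Document, Page, RichText, getChildNodes, and getText come from the API described in this article.

```java
import com.aspose.note.Document;
import com.aspose.note.Page;
import com.aspose.note.RichText;

import java.util.List;

class ExtractPageText {

    // Hypothetical helper: returns the text of the page at the given zero-based index.
    static String extractPageText(Document doc, int pageIndex) {
        List<Page> pages = doc.getChildNodes(Page.class);
        if (pageIndex < 0 || pageIndex >= pages.size()) {
            throw new IndexOutOfBoundsException(
                    "Page index " + pageIndex + " is out of range; the document has "
                            + pages.size() + " page(s)");
        }
        Page page = pages.get(pageIndex);

        // Only the RichText nodes under this page are collected.
        List<RichText> textNodes = page.getChildNodes(RichText.class);
        StringBuilder sb = new StringBuilder();
        for (RichText node : textNodes) {
            sb.append(node.getText()).append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // "Sample1.one" is a placeholder path; index 0 selects the first page.
        Document doc = new Document(args.length > 0 ? args[0] : "Sample1.one");
        System.out.print(extractPageText(doc, 0));
    }
}
```

The explicit bounds check is a design choice of this sketch: it reports how many pages the document actually has instead of surfacing a bare list-index error.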

Figure: Extract Text from a Specific Page of OneNote Document in Java

We may iterate over all the pages one by one and extract the text for each page as shown in the code sample given below:
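One possible way to do this is shown in the sketch below, which returns one string per page in document order. The class name, the textPerPage helper, and the Sample1.one path are hypothetical names chosen for this example.

```java
import com.aspose.note.Document;
import com.aspose.note.Page;
import com.aspose.note.RichText;

import java.util.ArrayList;
import java.util.List;

class ExtractTextPerPage {

    // Hypothetical helper: returns the combined text of each page, one list entry per page.
    static List<String> textPerPage(Document doc) {
        List<Page> pages = doc.getChildNodes(Page.class);
        List<String> result = new ArrayList<>();
        for (Page page : pages) {
            List<RichText> textNodes = page.getChildNodes(RichText.class);
            StringBuilder sb = new StringBuilder();
            for (RichText node : textNodes) {
                sb.append(node.getText()).append(System.lineSeparator());
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // "Sample1.one" is a placeholder path; pass your own file as the first argument.
        Document doc = new Document(args.length > 0 ? args[0] : "Sample1.one");
        List<String> pages = textPerPage(doc);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + (i + 1) + ":");
            System.out.println(pages.get(i));
        }
    }
}
```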

Figure: Get Text from all the Pages one by one in Java

Extract Images from OneNote Document using Java

We can also extract images from the OneNote document by following the steps given below:

  1. Firstly, load a OneNote file using the Document class.
  2. After that, get a list of images using the getChildNodes method with Image.class as an argument.
  3. Finally, show the properties of each image and save it to the local disk.

The following code sample shows how to extract images from a OneNote file using Java.
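A rough sketch of these steps is given below. It assumes the Image class exposes getFileName, getWidth, getHeight, and getBytes accessors; please check these against the API reference for your version. The class name, the saveImages helper, the fallback file name, and the input and output paths are hypothetical.

```java
import com.aspose.note.Document;
import com.aspose.note.Image;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class ExtractImages {

    // Hypothetical helper: saves every image in the document to outputDir
    // and returns the number of files written.
    static int saveImages(Document doc, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<Image> images = doc.getChildNodes(Image.class);
        for (int i = 0; i < images.size(); i++) {
            Image image = images.get(i);

            // Assumption: the stored file name may be missing, so fall back to a generated one.
            String name = image.getFileName();
            if (name == null || name.isEmpty()) {
                name = "image";
            }

            System.out.println("File name: " + name);
            System.out.println("Width: " + image.getWidth());
            System.out.println("Height: " + image.getHeight());

            // Prefix the index so that images sharing a file name do not overwrite each other.
            Path target = outputDir.resolve(i + "_" + name);
            Files.write(target, image.getBytes());
            System.out.println("Saved to: " + target);
        }
        return images.size();
    }

    public static void main(String[] args) throws Exception {
        // "Sample1.one" and "extracted-images" are placeholder paths.
        Document doc = new Document(args.length > 0 ? args[0] : "Sample1.one");
        int count = saveImages(doc, Paths.get("extracted-images"));
        System.out.println("Total images: " + count);
    }
}
```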

Figure: Extract Images from OneNote Document using Java

Get a Free License

You can get a free temporary license to try the library without evaluation limitations.

Conclusion

In this article, we have learned how to extract text from a whole OneNote document or from a specific page of the document. We have also seen how to extract images from OneNote documents programmatically. To learn more about the Aspose.Note for Java API, see its documentation. If you have any questions, please feel free to contact us on the forum.
