XML is a well-known markup language that, like HTML, describes content with tags. Unlike HTML, however, it is used across many domains, including data management, the web, office tools, and document formats. PDF documents are sometimes converted to XML to obtain a structured representation of their content, and that tag-based representation is easier to process programmatically than the PDF itself. In this article, you will learn how to convert a PDF file to XML programmatically in C#.
C# .NET API to Convert PDF to XML
For PDF to XML conversion, we will use Aspose.PDF for .NET. It is a popular API for creating and processing PDF files from within .NET applications. It also provides a high-fidelity converter for exporting PDF files to other formats. You can download the API’s binaries or install it using NuGet.
PM> Install-Package Aspose.PDF
Convert PDF to XML in C#
Aspose.PDF for .NET can convert PDF documents to the following XML formats:
- MobiXML
- PdfXML
Let’s see how to convert a PDF to each of the above-mentioned XML formats using Aspose.PDF for .NET.
PDF to MobiXML in C#
The following are the steps to convert a PDF to MobiXML format in C#.
- Load the PDF document using the Document class.
- Save the document as XML using the Document.Save(string, SaveFormat) method, passing SaveFormat.MobiXml as the second parameter.
The following code sample shows how to convert a PDF to XML in MobiXML format in C#.
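This is a minimal sketch of the two steps above. It assumes the Aspose.PDF NuGet package is installed; the file names input.pdf and output.xml are placeholders for your own paths.

```csharp
using Aspose.Pdf;

class PdfToMobiXml
{
    static void Main()
    {
        // Placeholder paths (assumptions): replace with your own files.
        const string inputPath = "input.pdf";
        const string outputPath = "output.xml";

        // Step 1: load the PDF document with the Document class.
        // Document implements IDisposable, so the using block
        // releases its resources once the conversion is done.
        using (Document pdfDocument = new Document(inputPath))
        {
            // Step 2: save it as XML, passing SaveFormat.MobiXml
            // as the second parameter of Document.Save.
            pdfDocument.Save(outputPath, SaveFormat.MobiXml);
        }
    }
}
```

Wrapping the Document in a using block is a design choice rather than a requirement; it simply ensures the loaded file is released promptly in longer-running applications.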
PDF to PdfXML in C#
To convert a PDF to PdfXML format, pass SaveFormat.PdfXml as the second parameter of the Document.Save(string, SaveFormat) method. The following code sample shows how to convert a PDF to PdfXML format in C#.
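The sketch below differs from the MobiXML steps only in the SaveFormat value passed to Document.Save. It assumes the Aspose.PDF NuGet package is installed; input.pdf and output.xml are placeholder file names.

```csharp
using Aspose.Pdf;

class PdfToPdfXml
{
    static void Main()
    {
        // Placeholder paths (assumptions): replace with your own files.
        const string inputPath = "input.pdf";
        const string outputPath = "output.xml";

        // Load the source PDF; the using block disposes it afterwards.
        using (Document pdfDocument = new Document(inputPath))
        {
            // Save the document as XML in PdfXML format.
            pdfDocument.Save(outputPath, SaveFormat.PdfXml);
        }
    }
}
```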
Get a Free License
You can get a free temporary license to use Aspose.PDF for .NET without evaluation limitations.
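Once you have a license file, it is typically applied once at application startup, before any documents are loaded or converted. The sketch below uses the Aspose.Pdf.License class and its SetLicense method; the file name Aspose.PDF.lic is a placeholder for wherever you store the license you received.

```csharp
using Aspose.Pdf;

class ApplyLicense
{
    static void Main()
    {
        // Placeholder path (assumption) to the license file you received.
        const string licensePath = "Aspose.PDF.lic";

        // Apply the license before creating or loading any Document,
        // so that later conversions run without evaluation limitations.
        License license = new License();
        license.SetLicense(licensePath);

        // ...load and convert PDF documents here...
    }
}
```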
Conclusion
In this article, you have learned how to convert a PDF document to XML in C#. Specifically, we covered how to convert a PDF to MobiXML and PdfXML formats programmatically. You can explore more of the .NET PDF API in the documentation. If you have any questions or queries, you can contact us via our forum.