Convert PDF Documents to HTML using C#

Over the last few releases, we have focused on improving our PDF to HTML conversion feature, and this release, Aspose.PDF for .NET 9.1.0, brings a few more improvements to it. When converting large PDF files to HTML format, you sometimes want to track the progress of the conversion, for example to show how many pages have been converted so far. To support this, we have introduced the CustomProgressHandler property in HtmlSaveOptions. For detailed information, please see the documentation topic on determining PDF to HTML conversion progress.
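As a rough illustration of how such a handler might be wired up, here is a minimal sketch. Only CustomProgressHandler and HtmlSaveOptions are named in this post; the nested type names (ConversionProgressEventHandler, ProgressEventHandlerInfo, ProgressEventType) and the EventType and Value members are assumptions recalled from the product documentation, and the file names are placeholders. Please check them against the API reference for your version.

```csharp
using System;
using Aspose.Pdf;

class ConversionProgressExample
{
    // Callback invoked by the converter as the conversion advances.
    // Assumption: the handler receives a ProgressEventHandlerInfo with EventType and Value.
    static void ShowProgressOnConsole(HtmlSaveOptions.ProgressEventHandlerInfo eventInfo)
    {
        // Report only the overall percentage; other event types are ignored in this sketch.
        if (eventInfo.EventType == HtmlSaveOptions.ProgressEventType.TotalProgress)
        {
            Console.WriteLine("Conversion progress: {0}%", eventInfo.Value);
        }
    }

    static void Main()
    {
        // Placeholder input path
        Document pdfDocument = new Document("input.pdf");

        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        // Attach the callback so that progress is reported during Save
        saveOptions.CustomProgressHandler =
            new HtmlSaveOptions.ConversionProgressEventHandler(ShowProgressOnConsole);

        pdfDocument.Save("output.html", saveOptions);
    }
}
```

In a desktop or web application, the same callback could update a progress bar instead of writing to the console.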

PDF to HTML – Place Text Position Information as Inline Style

With this release, the generated HTML now contains inline styles with the “position:absolute” attribute, similar to the following:

<div style="position:absolute; top:3.2953em; left:12em;">
<span class="pages_12345_06 pages_12345_07 pages_12345_08" style="word-spacing:-0.27em;">

PDF to HTML – Avoid Saving Images in SVG Format

We recently received a request to exclude SVG images entirely from the PDF to HTML conversion output. To make this possible, a new member has been added to the HtmlSaveOptions.RasterImagesSavingModes enumeration. The complete instructions can be found in the PDF to HTML – Avoid Saving Images in SVG Format article.
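The following sketch shows how a raster image saving mode might be selected. This post does not name the new enumeration member; AsEmbeddedPartsOfPngPageBackground and the RasterImagesSavingMode property are assumptions taken from the product documentation as we recall it, and the file names are placeholders.

```csharp
using Aspose.Pdf;

class AvoidSvgImagesExample
{
    static void Main()
    {
        // Placeholder input path
        Document pdfDocument = new Document("input.pdf");

        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        // Assumption: this member renders raster images as part of a PNG page background
        // instead of wrapping them in SVG.
        saveOptions.RasterImagesSavingMode =
            HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;

        pdfDocument.Save("output.html", saveOptions);
    }
}
```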

Control Image Quality when Adding Stamp

This release also lets you control image quality when adding an image stamp. For this purpose, we have added the Quality property to the ImageStamp class. It specifies the image quality as a percentage (valid values are 0 to 100). For further information, please read Adding Image Stamp to PDF file.
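A minimal sketch of setting the stamp quality might look like the following. The file names and the quality value of 10 are placeholders chosen for illustration.

```csharp
using Aspose.Pdf;

class StampQualityExample
{
    static void Main()
    {
        // Placeholder input path
        Document pdfDocument = new Document("input.pdf");

        // Create the stamp from an image file (placeholder name)
        ImageStamp imageStamp = new ImageStamp("stamp.jpg");
        // Image quality in percent, within the valid range of 0 to 100
        imageStamp.Quality = 10;

        // Page numbering in the Pages collection starts at 1
        pdfDocument.Pages[1].AddStamp(imageStamp);
        pdfDocument.Save("stamped.pdf");
    }
}
```

Lower values should generally produce a smaller output file at the cost of a less sharp stamp image.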

Insert Metadata with Prefix

When adding metadata to PDF files, you can now register a new metadata namespace together with its prefix, as in the following snippet:

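// Open the source PDF document
Document pdfDocument = new Document("input.pdf");
// Register the namespace URI under the plain "xmp" prefix (without an "xmlns:" part)
pdfDocument.Metadata.RegisterNamespaceUri("xmp", "http://ns.adobe.com/xap/1.0/");
// Set a metadata value using the registered prefix
pdfDocument.Metadata["xmp:ModifyDate"] = DateTime.Now;
// Save the updated document
pdfDocument.Save("updated.pdf");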

For further information, please take a look at Set XMP Metadata in PDF File.

Render Table on New Page

Rendering a table on a new page was previously available only through the Aspose.Pdf.Generator namespace. Starting with this release, we have also introduced a new property named IsInNewPage in the BaseParagraph class, which provides an option to render a table on a new page. For more information, please visit Render Table in New Page.
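As a small illustration, the sketch below builds a one-row table and asks for it to start on a new page. The table contents and file name are placeholders; only the IsInNewPage property comes from this release note.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

class TableOnNewPageExample
{
    static void Main()
    {
        // Start with an empty document and a single page
        Document pdfDocument = new Document();
        Page page = pdfDocument.Pages.Add();

        // Some content that precedes the table on the first page
        page.Paragraphs.Add(new TextFragment("Text before the table"));

        Table table = new Table();
        // Request that the table be rendered on a new page
        table.IsInNewPage = true;

        // Add one row with two cells (placeholder content)
        Row row = table.Rows.Add();
        row.Cells.Add("Cell 1");
        row.Cells.Add("Cell 2");

        page.Paragraphs.Add(table);
        pdfDocument.Save("table-on-new-page.pdf");
    }
}
```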

Improved Signing Feature

We have introduced a new class named DocMDPSignature and the DocMDPAccessPermissions enumeration, which let you digitally sign (certify) PDF files while restricting the changes permitted afterwards. We have also added the IsCertified property to the PdfFileSignature class. For more information about this topic, please take a look at Digitally Sign PDF Files.
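The sketch below shows one way these types might be used together. It rests on several assumptions that are not stated in this post: the Certify method and its parameter list, the PKCS7 constructor, the FillingInForms permission value, and the namespace of the signature types (which may differ between releases) are all recalled from the product documentation. The certificate path, password, and file names are placeholders.

```csharp
using Aspose.Pdf.Facades;
// Assumption: in some releases these types live in Aspose.Pdf.InteractiveFeatures.Forms instead.
using Aspose.Pdf.Forms;

class CertifyPdfExample
{
    static void Main()
    {
        using (PdfFileSignature pdfSign = new PdfFileSignature())
        {
            // Placeholder input path
            pdfSign.BindPdf("input.pdf");

            // Placeholder certificate file and password
            PKCS7 pkcs = new PKCS7("certificate.pfx", "password");

            // Certify the document while still allowing form filling afterwards
            DocMDPSignature docMdpSignature =
                new DocMDPSignature(pkcs, DocMDPAccessPermissions.FillingInForms);

            // Position and size of the visible signature on page 1
            System.Drawing.Rectangle rect = new System.Drawing.Rectangle(100, 100, 200, 100);

            pdfSign.Certify(1, "Signature reason", "Contact", "Location", true, rect, docMdpSignature);
            pdfSign.Save("certified.pdf");
        }
    }
}
```

After binding a certified file to PdfFileSignature, the IsCertified property mentioned above can then be checked to confirm that the document carries a certification signature.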

Along with the new features and enhancements listed above, we have made numerous improvements in PDF to HTML conversion, HTML to PDF conversion, image to PDF and PDF to image conversion, stamping PDF files, manipulating text inside PDF documents, PDF to XPS conversion, image and text extraction, and various other features.

Download and start exploring the exciting new features of Aspose.PDF for .NET 9.1.0.