Persits Software, Inc. Web Site
Information in this chapter is preliminary and subject to change.

  Chapter 13: HTML to PDF Conversion

13.1 ImportFromUrl Method
13.2 Authentication
13.3 Error Log
13.4 Page Breaks

13.1 ImportFromUrl Method

Starting with version 1.6, AspPDF is capable of converting HTML documents to PDF via PdfDocument's ImportFromUrl method. This method opens an HTML document from a given URL, splits it into pages and renders it onto an empty or existing PDF document. The document can then be further edited, if necessary, and saved to disk, memory or an HTTP stream as usual.

ImportFromUrl's support for HTML tags and constructs is not as extensive as that of major browsers, but it is considerably stronger than the limited HTML functionality of Canvas.DrawText available in older versions of AspPDF. ImportFromUrl recognizes tables, images, lists, cascading style sheets, etc.

Release note: Initially, HTML to PDF functionality was implemented via PdfManager's OpenUrl method but starting with Service Release 1.6.0.5, OpenUrl is replaced by the more versatile ImportFromUrl. The latter supports writing HTML onto existing documents and is capable of returning debug information. OpenUrl is deprecated and will not be supported in future releases. As of now, ImportFromUrl is still a work in progress. Use it at your own risk.

ImportFromUrl accepts four parameters, all but the first one optional: the input URL, a parameter list, and a username/password pair.

The URL parameter can be an HTTP or HTTPS address, such as http://www.server.com/path/file.html, or a local physical path such as c:\path\file.html. Note that if you want to open a dynamically generated document such as an .asp or .aspx file, you need to invoke it via HTTP even if this file is local to your own script.

The following simple code snippet creates a PDF document out of the Persits Software site persits.com:

VBScript
Set Pdf = Server.CreateObject("Persits.Pdf")
Set Doc = Pdf.CreateDocument
Doc.ImportFromUrl "http://www.persits.com"

Filename = Doc.Save( Server.MapPath("importfromurl.pdf"), False )

C#
IPdfManager objPdf = new PdfManager();
IPdfDocument objDoc = objPdf.CreateDocument( Missing.Value );
objDoc.ImportFromUrl( "http://www.persits.com", Missing.Value, Missing.Value, Missing.Value );

String strFilename = objDoc.Save( Server.MapPath("importfromurl.pdf"), false );


ImportFromUrl's second argument is a PdfParam object or parameter string that specifies additional parameters controlling the HTML to PDF conversion process. For example, to create a document in landscape orientation, set the Landscape parameter to true:

Doc.ImportFromUrl "http://www.persits.com", "landscape=true"

When new pages have to be added to the document during the conversion process, the default page size is U.S. Letter. This can be changed via the PageWidth and PageHeight parameters.

When rendering HTML content on a page, AspPDF leaves 0.75" margins around the content area. That can be changed via the LeftMargin, RightMargin, TopMargin and BottomMargin parameters.
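
As a rough illustration of the page size and margin parameters, the sketch below builds a parameter string for a U.S. Legal page with one-inch margins. It assumes that these parameters are expressed in points (1/72 of an inch), which would make the default 0.75" margin equal to 54, and that name/value pairs in a parameter string are separated by semicolons. The ImportParams class and its methods are hypothetical helpers, not part of AspPDF.

```csharp
using System;
using System.Globalization;

static class ImportParams
{
    // Assumption: sizes are given in points, 72 points per inch.
    public static double InchesToPoints(double inches)
    {
        return inches * 72.0;
    }

    // Builds a parameter string for a page of the given size (in inches)
    // with the same margin (in inches) on all four sides.
    public static string PageSetup(double widthIn, double heightIn, double marginIn)
    {
        double m = InchesToPoints(marginIn);
        return string.Format(CultureInfo.InvariantCulture,
            "PageWidth={0}; PageHeight={1}; LeftMargin={2}; RightMargin={2}; " +
            "TopMargin={2}; BottomMargin={2}",
            InchesToPoints(widthIn), InchesToPoints(heightIn), m);
    }
}

// Possible use with the objDoc object from the C# sample above:
//   objDoc.ImportFromUrl( "http://www.persits.com",
//     ImportParams.PageSetup( 8.5, 14, 1 ), Missing.Value, Missing.Value );
// PageSetup( 8.5, 14, 1 ) returns
//   "PageWidth=612; PageHeight=1008; LeftMargin=72; RightMargin=72; TopMargin=72; BottomMargin=72"
```

Formatting with the invariant culture keeps the decimal separator a period regardless of the server's regional settings.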

The full list of ImportFromUrl parameters can be found here.

13.2 Authentication

13.2.1 Basic Authentication

The 3rd and 4th arguments of the ImportFromUrl method are a username and password that can be used if the URL being opened is protected via Basic Authentication, as follows:

Doc.ImportFromUrl "http://www.server.com/script.asp", "landscape=true", "jsmith", "pwd"

13.2.2 .NET Forms Authentication

Under .NET, the Username and Password arguments can instead be used to pass an authentication cookie when both the script calling ImportFromUrl and the file being converted to PDF are protected by the same user account under .NET Forms Authentication. To pass a cookie to ImportFromUrl, pass the cookie name, prefixed with "Cookie:", via the Username argument and the cookie value via the Password argument. The following example illustrates this technique.

Suppose you need to implement a "Click here for a PDF version of this page" feature in a .NET-based web application. The application is protected with .NET Forms Authentication:

<authentication mode="Forms">
  <forms name="MyAuthForm" loginUrl="login.aspx" protection="All">
    <credentials passwordFormat = "SHA1">
      <user name="JSmith" password="13A23E365BFDBA30F788956BC2B8083ADB746CA3"/>
      
... other users
    </credentials>
  </forms>
</authentication>

The page that needs to be converted to PDF, say report.aspx, contains the button "Download PDF version of this report" that invokes another script, say convert.aspx, which calls AspPDF's ImportFromUrl. Both scripts reside in the same directory under the same protection.

If convert.aspx simply calls objDoc.ImportFromUrl( "http://localhost/dir/report.aspx", ... ), the page that ends up being converted will be login.aspx and not report.aspx, because AspPDF itself has not been authenticated against the user database and naturally will be forwarded to the login screen.

To solve this problem, we just need to pass the authentication cookie whose name is MyAuthForm (the same as the form name) to ImportFromUrl. The following code (placed in convert.aspx) does the job:

C#
<%@ Import Namespace="System.Web" %>
<%@ Import Namespace="System.Reflection" %>
<%@ Import Namespace="ASPPDFLib" %>

<script runat=server language="C#">

private void Page_Load(object sender, System.EventArgs e)
{
  IPdfManager objPDF;
  objPDF = new PdfManager();

  String strCookieName = "", strCookieValue = "";

  // Search for our authentication cookie
  for( int i = 0; i < Request.Cookies.Count; i++ )
  {
    if( Request.Cookies[i].Name == "MyAuthForm" )
    {
      strCookieName = Request.Cookies[i].Name;
      strCookieValue = Request.Cookies[i].Value;
      break;
    }
  }

  IPdfDocument objDoc = objPDF.CreateDocument(Missing.Value);
  objDoc.ImportFromUrl( "http://localhost/dir/report.aspx", Missing.Value,
    "Cookie:" + strCookieName, strCookieValue );

  objDoc.SaveHttp( "attachment;filename=report.pdf" );
}

</script>

Note that the cookie name is prepended with the prefix "Cookie:" before being passed to ImportFromUrl.
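
If the authentication cookie is not present in the request, the code above passes an empty cookie name and value to ImportFromUrl. The following sketch shows one way to guard against that case. CookieAuth and ToImportArgs are hypothetical names introduced for illustration and are not part of AspPDF; the commented usage relies on the standard FormsAuthentication.FormsCookieName property, which returns the forms name configured in web.config (MyAuthForm in this example).

```csharp
using System;

static class CookieAuth
{
    // Returns the values to pass as ImportFromUrl's Username and Password
    // arguments, or null when no usable authentication cookie was found.
    public static string[] ToImportArgs(string cookieName, string cookieValue)
    {
        if (string.IsNullOrEmpty(cookieName) || string.IsNullOrEmpty(cookieValue))
            return null;

        // The "Cookie:" prefix marks the Username argument as a cookie name.
        return new string[] { "Cookie:" + cookieName, cookieValue };
    }
}

// Possible use in convert.aspx (requires System.Web.Security):
//   HttpCookie c = Request.Cookies[FormsAuthentication.FormsCookieName];
//   string[] args = (c == null) ? null : CookieAuth.ToImportArgs( c.Name, c.Value );
//   if( args != null )
//     objDoc.ImportFromUrl( "http://localhost/dir/report.aspx", Missing.Value,
//       args[0], args[1] );
```

Returning null instead of an empty pair lets the calling page decide what to do when the cookie is missing, for example redirect to the login page rather than convert it.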

13.3 Error Log

ImportFromUrl throws an exception if the specified URL cannot be found or is invalid, and no HTML to PDF conversion takes place. However, if the main URL is valid but some of the dependent information (fonts, image URLs, CSS files, etc.) cannot be found, the conversion will go on uninterrupted, although the resultant PDF document may not look as expected.

To simplify debugging, ImportFromUrl can be used in a debug mode. If the parameter Debug=true is used, ImportFromUrl returns a log of non-fatal errors encountered during the conversion process. A log entry consists of the entry type, such as "Image", "CSS", etc., error message, and relevant data, such as the invalid URL, unknown font name, etc. Log entries are separated by two pairs of CR/LF characters.

The following code snippet invokes ImportFromUrl in the debug mode and displays the error log:

Log = Doc.ImportFromUrl( "http://www.server.com/script.asp", "debug=true" )
Response.Write Log

A typical log string may look as follows:

Image: Error opening URL. HTTP Status Code: 404
Data: http://www.persits.com/image.gif

Font: Font name cannot be found.
Data: Arrial
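
For processing the log programmatically, the string can be split on the double CR/LF separator described above. The sketch below is only an illustration: it assumes every entry follows the two-line layout of the sample log (a "Type: message" line followed by a "Data: value" line), and the LogEntry and ImportLog classes are hypothetical helpers, not part of AspPDF.

```csharp
using System;
using System.Collections.Generic;

class LogEntry
{
    public string Type = "";     // e.g. "Image", "Font"
    public string Message = "";  // e.g. "Font name cannot be found."
    public string Data = "";     // e.g. the invalid URL or unknown font name
}

static class ImportLog
{
    public static List<LogEntry> Parse(string log)
    {
        List<LogEntry> entries = new List<LogEntry>();
        if (string.IsNullOrEmpty(log)) return entries;

        // Log entries are separated by two pairs of CR/LF characters.
        string[] blocks = log.Split(new string[] { "\r\n\r\n" },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string block in blocks)
        {
            string[] lines = block.Split(new string[] { "\r\n" },
                StringSplitOptions.RemoveEmptyEntries);
            if (lines.Length == 0) continue;

            LogEntry entry = new LogEntry();

            // First line: everything before the first colon is the entry type.
            int colon = lines[0].IndexOf(':');
            if (colon < 0)
            {
                entry.Message = lines[0].Trim();
            }
            else
            {
                entry.Type = lines[0].Substring(0, colon).Trim();
                entry.Message = lines[0].Substring(colon + 1).Trim();
            }

            // Remaining lines: pick up the "Data:" value if present.
            for (int i = 1; i < lines.Length; i++)
            {
                if (lines[i].StartsWith("Data:", StringComparison.Ordinal))
                    entry.Data = lines[i].Substring(5).Trim();
            }

            entries.Add(entry);
        }
        return entries;
    }
}
```

Applied to the sample log above, Parse would yield two entries, one of type "Image" and one of type "Font", which a script could then group or filter by type.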

13.4 Page Breaks

HTML allows page breaks for printing purposes via the CSS properties page-break-before and page-break-after. The ImportFromUrl method recognizes these properties for the purpose of page breaking in a limited set of HTML tags. The value of these two properties must be set to "always"; other values have no effect. As with any CSS property, either inline syntax or a separate style sheet can be used. For example:

<BR style="page-break-before: always">

The property page-break-before: always can be applied to the following tags:

<BR>
<IMG>
<HR>
<TABLE>
<DIV>

The property page-break-after: always can be applied to the following tags:

<BR>
<IMG>
<HR>


  This site is owned and maintained by Persits Software, Inc. Copyright © 2003. All Rights Reserved.