Using the Open Source iTextSharp to fill PDF Form Fields
This article describes a quick and simple approach to programmatically
completing a PDF document using the iTextSharp DLL. The article also discusses
how one might use the iTextSharp DLL to discover and map the fields available
within an existing PDF when the programmer has only the PDF and does not have
Adobe Designer or even a list of the names of the fields present in the PDF.
Figure 1: Resulting PDF after Filling in Fields Programmatically
iTextSharp is a C# port of a Java library written to support the
creation and manipulation of PDF documents. The project is available for download through
SourceForge.net. With the iTextSharp DLL, it is possible to not only
populate fields in an existing PDF document, but also to dynamically create
PDFs. The examples here are limited to a description of the procedures
associated with the completion of a PDF. The download will contain examples of
PDF creation in both Visual Basic and C#.
The examples contained
herein are dependent upon the availability of the iTextSharp DLL. Use the link
provided previously in order to download the DLL locally to your development
machine. In order to demonstrate filling out a PDF using the iTextSharp DLL, I
downloaded a copy of the W-4 PDF form from the IRS website. The form contains
controls and may be filled out programmatically so it serves as a good example.
PDF
documents that do not contain controls, i.e. those meant to be printed and
filled in with a pencil, cannot be completed using this approach. Of course, if
you have access to Adobe tools (Adobe Professional, Adobe Designer), you can
always create your own PDFs with controls or can add controls to existing PDFs.
Further, although not demonstrated here, you can also use iTextSharp to create
a PDF document with embedded controls.
Getting Started
In
order to get started, fire up the Visual Studio 2005 IDE and open the attached
solution. The solution consists of a single Windows Forms project with a single
form. I have also included a PDF that will be used for demonstration purposes;
this form is the IRS W-4 form completed by US taxpayers. However, any PDF with
embedded controls (text boxes, check boxes, etc.) is fair game for this
approach. Note that a reference to the iTextSharp DLL has been included in the
project.
All
of the project code is contained within the single Windows Form. The form
itself contains only a docked textbox used to display all of the field names
from an existing PDF document. The completed PDF is generated and stored in the
local file system; the PDF is not opened for display by the application.
The
application uses the existing PDF as a template and from that template, it
creates and populates the new PDF. The template PDF itself is never populated
and it is used only to define the format and contents of the completed PDF.
Figure 2:
Solution Explorer
The Code: Main Form
As was previously mentioned, all of the code used in
the demonstration application is contained entirely in the project’s single
Windows Form. The following section describes the contents of the code file.
The file begins with the library imports needed to
support the code. Note that the iTextSharp libraries have been included in
the project. The namespace and class declaration are in the default
configuration.
The next section of code contains the button click event handler along with
the ListFieldNames() and FillForm() methods. During the button click event,
these two functions are called. They are used to display all of the fields
present in the template PDF and to create a new PDF populated with a set of
field values.
The next section of
code contained in the demo application defines a function used to collect the
names of all of the fields from the target PDF. The field names are displayed
in a text box contained in the application’s form.
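The field listing step described above might look something like the following. This is only a minimal sketch, not the demo project's actual code: the class name PdfFieldLister and the path parameter are assumptions made for illustration. It iterates over the field keys rather than the entries because the entry type differs between iTextSharp releases.

```csharp
using System.Text;
using iTextSharp.text.pdf;

// Hypothetical helper class; the name is an assumption for illustration.
public static class PdfFieldLister
{
    // Returns the name of every form field in the PDF, one name per line.
    public static string ListFieldNames(string pdfTemplatePath)
    {
        PdfReader reader = new PdfReader(pdfTemplatePath);
        try
        {
            StringBuilder sb = new StringBuilder();

            // Enumerate only the keys (the field names) so the loop does not
            // depend on the entry type used by a particular iTextSharp release.
            foreach (string fieldName in reader.AcroFields.Fields.Keys)
            {
                sb.AppendLine(fieldName);
            }

            return sb.ToString();
        }
        finally
        {
            reader.Close();
        }
    }
}
```

In a Windows Forms application, the returned string could simply be assigned to the Text property of the docked text box.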
Figure 3 shows the field names collected from the target PDF
using the ListFieldNames function call. In order to map these
names to specific fields in the PDF, one need only copy this list and pass
a distinct value to each field. For example, if the form
contains ten fields, setting each field's value (shown next) to a sequential
number will result in the numbers 1 to 10 being displayed, one per field. One
can then track each displayed value back to the field name using this list as
the basis for the map. Once the fields have been identified, the application
can be written to pass the correct values to the related fields.
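The numbering trick can also be automated rather than typed out field by field. The sketch below is one possible way to do it under stated assumptions: the class and method names are hypothetical, and the paths are supplied by the caller. It writes a sequential number into every field and returns a list pairing each number with its field name.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper class; the name is an assumption for illustration.
public static class PdfFieldMapper
{
    // Writes 1, 2, 3, ... into the fields of a copy of the template and
    // returns "number = field name" lines to use as the basis for the map.
    public static List<string> NumberAllFields(string templatePath, string outputPath)
    {
        PdfReader reader = new PdfReader(templatePath);
        PdfStamper stamper = new PdfStamper(
            reader, new FileStream(outputPath, FileMode.Create));
        AcroFields fields = stamper.AcroFields;

        // Copy the names first so the field collection is not being
        // enumerated while values are assigned.
        List<string> names = new List<string>();
        foreach (string name in fields.Fields.Keys)
        {
            names.Add(name);
        }

        List<string> map = new List<string>();
        for (int i = 0; i < names.Count; i++)
        {
            string marker = (i + 1).ToString();
            fields.SetField(names[i], marker);
            map.Add(marker + " = " + names[i]);
        }

        // Leave the form editable so the numbered fields can be inspected.
        stamper.FormFlattening = false;
        stamper.Close();
        reader.Close();
        return map;
    }
}
```

Opening the output file in a PDF viewer and comparing the visible numbers against the returned list identifies which name belongs to which box on the page. Check boxes will generally not display a number, so they still need to be handled separately.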
Checkbox controls may be a
little more challenging to figure out. I tried passing several values to the
checkbox controls before lining up a winner. In this example, I tried passing
0, 1, true, false, etc. to the field before figuring out that yes sets the
check.
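As an alternative to trial and error, it may be possible to ask the library which values a check box accepts. The sketch below assumes that the iTextSharp release in use exposes AcroFields.GetAppearanceStates; the class name and parameters are hypothetical, and this is not the approach used in the demo project.

```csharp
using iTextSharp.text.pdf;

// Hypothetical helper class; the name is an assumption for illustration.
public static class CheckboxStateLister
{
    // Returns the appearance state names defined for a field (for a check
    // box these are typically the "on" value plus "Off"). The result may be
    // null if the field does not exist.
    public static string[] GetStates(string pdfPath, string fieldName)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // Assumption: GetAppearanceStates is available in this release.
            return reader.AcroFields.GetAppearanceStates(fieldName);
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Whichever state name other than the "off" state is returned is the value to pass to SetField in order to set the check.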
Figure 3: The Available PDF Fields
The next section of code in the demo project is used to fill in
the mapped field values. The process is simple enough. The first thing that
happens is that the template file and new file locations are defined and
passed to string variables. Once the paths are defined, the code creates an
instance of the PDF reader, which is used to read the template file, and a PDF
stamper, which is used to fill in the form fields in the new file. Once the
template and target files are set up, the last thing to do is create an
instance of AcroFields, which is populated with all of the fields contained in
the target PDF. After the form fields have been captured, the rest of the code
is used to fill in each field using the SetField function.
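The sequence just described (paths, reader, stamper, AcroFields, SetField) can be sketched as follows. The class name, the field names, and the values are placeholders invented for illustration; they are not the actual W-4 field names, which have to be discovered with the mapping technique described earlier.

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper class; the name is an assumption for illustration.
public static class FormFiller
{
    public static void FillForm(string pdfTemplate, string newFile)
    {
        // The reader opens the template; the stamper writes the new file.
        PdfReader pdfReader = new PdfReader(pdfTemplate);
        PdfStamper pdfStamper = new PdfStamper(
            pdfReader, new FileStream(newFile, FileMode.Create));

        // The stamper exposes the form fields of the new document.
        AcroFields pdfFormFields = pdfStamper.AcroFields;

        // Placeholder field names and values (assumptions, not real W-4 names).
        pdfFormFields.SetField("first_name_field", "Jane");
        pdfFormFields.SetField("last_name_field", "Doe");

        // Closing the stamper writes the completed PDF; the template file
        // itself is left untouched.
        pdfStamper.Close();
        pdfReader.Close();
    }
}
```

SetField returns a Boolean, so a caller that wants to detect a misspelled field name can check the return value instead of ignoring it.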
In this example, the first
worksheet and the W-4 itself are populated with meaningful values whilst the
second worksheet is populated with sequential numbers that are then used to map
those fields to their location on the PDF. After the PDF has been filled out,
the application reads values from the PDF (the first and last names) in order
to generate a message indicating that the W-4 for this person was completed and
stored.
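Reading the values back might be sketched like this. The helper below is hypothetical, as are its parameter names; it assumes the caller passes in the AcroFields instance obtained from the stamper along with the names of the two fields holding the first and last names.

```csharp
using iTextSharp.text.pdf;

// Hypothetical helper class; the name is an assumption for illustration.
public static class CompletionMessage
{
    // Builds a confirmation message from two fields of the completed form.
    public static string Build(AcroFields fields,
                               string firstNameField,
                               string lastNameField)
    {
        // GetField may return null for an unknown field name, so fall back
        // to an empty string.
        string first = fields.GetField(firstNameField) ?? string.Empty;
        string last = fields.GetField(lastNameField) ?? string.Empty;

        return "W-4 completed for " + first + " " + last;
    }
}
```

The resulting string could then be shown with MessageBox.Show or written to a log.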
To finish up the PDF, it is necessary to determine whether or
not additional edits will be permitted to the PDF after it has been
programmatically completed. This task is accomplished by setting the
FormFlattening value to true or false. If
the value is set to false, the resulting PDF will be available for edits, but if the
value is set to true, the PDF will be locked against further edits. Once the form
has been completed, the PDF stamper is closed and the function terminates.
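This final step can be isolated into a small helper, shown here only as a sketch; the class, method, and parameter names are assumptions made for illustration.

```csharp
using iTextSharp.text.pdf;

// Hypothetical helper class; the name is an assumption for illustration.
public static class FormFinisher
{
    // Sets the flattening option and closes the stamper. The flag has to be
    // set before Close, because closing is what writes the output file.
    public static void Finish(PdfStamper stamper, bool lockAgainstEdits)
    {
        // true  = fields are flattened into the page and can no longer be edited
        // false = fields remain interactive in the resulting PDF
        stamper.FormFlattening = lockAgainstEdits;
        stamper.Close();
    }
}
```

Passing the choice in as a parameter lets the same fill routine produce either an editable draft or a locked final copy.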
That wraps up the discussion
of the form-based demo project.
Summary
This article described an approach to populating a PDF document
with values programmatically. This functionality was accomplished using the
iTextSharp DLL. Further, the article described an approach for mapping the
fields contained in a PDF and may be useful if one is dealing with a PDF
authored elsewhere and if the programmer does not have access to Adobe
Professional or Adobe Designer.
The iTextSharp library is a
powerful DLL that supports authoring PDFs, as well as using them in the manner
described in this document. However, when authoring a PDF, it seems that it
would be far easier to produce a nice document using the visual environment
made available through the use of Adobe tools. Having said that, if one is
dynamically creating PDFs with variable content, the iTextSharp library does
provide the tools necessary to support such an effort. With the library, one
can create and populate a PDF on the fly.
Download POC