My Blog

Fill/Retrieve data from PDF Form Fields using VB.Net From an Excel File

15
  • Tired of filling a PDF report every day? OR
  • Tired of collecting information from a pdf everyday?

Here are ways to ease your pain

We will separate the article in 3 sections:

  • Filling PDF Form Fields
  • Extracting the PDF Form Field names
  • Retrieving data from PDF Form Fields

Filling PDF Form Fields


Imagine if you need to fill a Daily Report and mail it to your boss over the email every day? In the beginning it might sound adventurous, but then you start hating the monotonous routine.

Here is one quick way to fill the Form Fields of PDF using VB.Net from data stored in Excel.

To begin with let’s design our form. Place 2 textboxes, 4 command buttons and one OpenFileDialog on a form. Once done the form will look similar to what is show below.

Now let’s say you have a PDF which looks like below. I created this PDF just to demonstrate on how to fill the PDF.

Once the data is filled, your PDF will look like this:

Next is preparing your Excel Database.

Open Excel and Type the Data as shown below. Once done, save it as “C:\PDF_FORM_DATA.XLSX“.

This is how your Excel file might look.

We are now all set to write the code in VB.net.

Double click on the “Browse” Button which we will be use to select the PDF file and paste the following code:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    With OpenFileDialog1
        .DefaultExt = ".PDF"
        .DereferenceLinks = True
        .Filter = "PDF files (*.PDF)|*.PDF"
        .Multiselect = False
        .Title = "Select a PDF file to open"
        .ValidateNames = True

        If .ShowDialog = Windows.Forms.DialogResult.OK Then
            Try
                TextBox1.Text = .FileName
            Catch fileException As Exception
                Throw fileException
            End Try
        End If
    End With
End Sub

Next Double click on the “Browse” Button which we will be use to select the Excel file and paste this code:

Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    With OpenFileDialog1
        .DefaultExt = ".XLSX"
        .DereferenceLinks = True
        .Filter = "Excel files (*.XLSX)|*.XLSX"
        .Multiselect = False
        .Title = "Select an Excel file to open"
        .ValidateNames = True

        If .ShowDialog = Windows.Forms.DialogResult.OK Then
            Try
                TextBox2.Text = .FileName
            Catch fileException As Exception
                Throw fileException
            End Try
        End If
    End With
End Sub

Now Double click on the “Cancel” Button and paste this code:

Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
    Me.Close()
End Sub

Finally Double click on the “Update” Button and paste this code:

Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
    Dim pdfTemplate As String = TextBox1.Text
    '~~> Change the Output FileName here
    Dim PDFUpdatedFile As String = "C:PDF_FORM_DATA (UPDATED).pdf"

    Dim readerPDF As New PdfReader(pdfTemplate)
    Dim stamperPDF As New PdfStamper(readerPDF, _
    New FileStream(PDFUpdatedFile, FileMode.Create))

    Dim pdfFormFields As AcroFields = stamperPDF.AcroFields

    '~~> Define your Excel Objects
    Dim xlApp As New Excel.Application
    Dim xlWorkBook As Excel.Workbook
    Dim xlWorkSheet As Excel.Worksheet

    '~~> Show/Hide Excel
    xlApp.Visible = True

    '~~> Opens an exisiting Workbook
    xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

    '~~> Set the relevant sheet that we want to work with
    xlWorkSheet = xlWorkBook.Sheets("Sheet1")

    '~~> Update pdf FormFields
    pdfFormFields.SetField(xlWorkSheet.Range("A2").Value.ToString, xlWorkSheet.Range("B2").Value.ToString)
    pdfFormFields.SetField(xlWorkSheet.Range("A3").Value.ToString, xlWorkSheet.Range("B3").Value.ToString)
    pdfFormFields.SetField(xlWorkSheet.Range("A4").Value.ToString, xlWorkSheet.Range("B4").Value.ToString)

    '~~> To remove editting options from the output Form, set it to FALSE
    '~~> To leave then editting options open in the output Form, set it to TRUE
    stamperPDF.FormFlattening = True

    '~~> close the pdf
    stamperPDF.Close()

    '~~> Close the Excel file without saving
    xlWorkBook.Close(False)
    '~~> Quit the Excel Application
    xlApp.Quit()

    '~~> Clean Up
    releaseObject(xlApp)
    releaseObject(xlWorkBook)
End Sub

Add the code below at the very top:

Imports iTextSharp
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.xml
Imports System.IO
Imports Excel = Microsoft.Office.Interop.Excel

…and add one Sub at the bottom for clean up.

'~~> Release the objects
Private Sub releaseObject(ByVal obj As Object)
    Try
        System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
        obj = Nothing
    Catch ex As Exception
        obj = Nothing
    Finally
        GC.Collect()
    End Try
End Sub

The last thing that we need to do is setting References for Excel and itextsharp.dll

itextsharp.dll is freely available on the web and it is free to use for Non Commercial Applications. Once you have downloaded the file, save it at a location of your choice. Once downloaded, click on the menu “Project” → “Add Reference“. Navigate to the “Browse” tab and select the DLL and click on “OK“.

Next set a reference to the Excel Object Library. See this link for more information.

Your final code will look like this. Now run the code and select the relevant files. Once done click on update. The new file with the updated Data will be saved in “C:\” as PDF_FORM_DATA (UPDATED).PDF.

You can change the file name and path of the updated file in the code itself.

FINAL CODE:
Imports iTextSharp
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.xml
Imports System.IO
Imports Excel = Microsoft.Office.Interop.Excel

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        With OpenFileDialog1
            .DefaultExt = ".PDF"
            .DereferenceLinks = True
            .Filter = _
             "PDF files (*.PDF)|*.PDF"
            .Multiselect = False
            .Title = "Select a PDF file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox1.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        With OpenFileDialog1
            .DefaultExt = ".XLSX"
            .DereferenceLinks = True
            .Filter = _
             "Excel files (*.XLSX)|*.XLSX"
            .Multiselect = False
            .Title = "Select an Excel file to open"
            .ValidateNames = True

            If .ShowDialog = Windows.Forms.DialogResult.OK Then
                Try
                    TextBox2.Text = .FileName
                Catch fileException As Exception
                    Throw fileException
                End Try
            End If
        End With
    End Sub

    Private Sub Button4_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button4.Click
        Me.Close()
    End Sub

    Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
        Dim pdfTemplate As String = TextBox1.Text
        '~~> Change the Output FileName here
        Dim PDFUpdatedFile As String = "C:PDF_FORM_DATA (UPDATED).pdf"

        Dim readerPDF As New PdfReader(pdfTemplate)
        Dim stamperPDF As New PdfStamper(readerPDF, _
        New FileStream(PDFUpdatedFile, FileMode.Create))

        Dim pdfFormFields As AcroFields = stamperPDF.AcroFields

        '~~> Define your Excel Objects
        Dim xlApp As New Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet

        '~~> Show/Hide Excel
        xlApp.Visible = True

        '~~> Opens an existing Workbook
        xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

        '~~> Set the relevant sheet that we want to work with
        xlWorkSheet = xlWorkBook.Sheets("Sheet1")

        '~~> Update pdf FormFields
        pdfFormFields.SetField(xlWorkSheet.Range("A2").Value.ToString, xlWorkSheet.Range("B2").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A3").Value.ToString, xlWorkSheet.Range("B3").Value.ToString)
        pdfFormFields.SetField(xlWorkSheet.Range("A4").Value.ToString, xlWorkSheet.Range("B4").Value.ToString)

        '~~> To remove editing options from the output Form, set it to FALSE
        '~~> To leave then editing options open in the output Form, set it to TRUE
        stamperPDF.FormFlattening = True

        '~~> close the pdf
        stamperPDF.Close()

        '~~> Close the Excel file without saving
        xlWorkBook.Close(False)
        '~~> Quit the Excel Application
        xlApp.Quit()

        '~~> Clean Up
        releaseObject(xlApp)
        releaseObject(xlWorkBook)
    End Sub

    '~~> Release the objects
    Private Sub releaseObject(ByVal obj As Object)
        Try
            System.Runtime.InteropServices.Marshal.ReleaseComObject(obj)
            obj = Nothing
        Catch ex As Exception
            obj = Nothing
        Finally
            GC.Collect()
        End Try
    End Sub
End Class

Extracting PDF Form Field names


Now Let’s assume that you don’t know what the form fields are. In such a case you can extract the field names from the PDF using the below code.

Insert one more textbox in the above form and also add one more command button. Set the multi-line property of the textbox to “TRUE“.

Your form now should look like this:

Double click on the “Get Form Fields” Button and paste this code:

Private Sub Button5_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button5.Click
    Dim pdfTemplate As String = TextBox1.Text
    Dim readerPDF As New PdfReader(pdfTemplate)
    Dim PDFfld As Object

    For Each PDFfld In readerPDF.AcroFields.Fields
        If TextBox3.Text = "" Then
            TextBox3.Text = PDFfld.key.ToString()
        Else
            TextBox3.Text = TextBox3.Text & Environment.NewLine & PDFfld.key.ToString()
        End If
    Next

    TextBox3.SelectionStart = 0
End Sub

Now when you select the pdf file and click on the “Get Form Fields” button, you will notice the textbox populates with the Form Field names.


Retrieving data from PDF Form Fields


Now let’s add one more button to the above form and call it “Get Form Data“. Your form should look like this:

Double click on the “Get Form Data” Button and paste this code. I will be using the same Excel file that we created above to output the data. I would output the data in Col C.

Private Sub Button6_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button6.Click
    Dim pdfTemplate As String = TextBox1.Text
    Dim readerPDF As New PdfReader(pdfTemplate)
    Dim pdfFormFields As AcroFields = readerPDF.AcroFields

    '~~> Define your Excel Objects
    Dim xlApp As New Excel.Application
    Dim xlWorkBook As Excel.Workbook
    Dim xlWorkSheet As Excel.Worksheet
    Dim LastRow As Integer, i As Integer

    '~~> Show/Hide Excel
    xlApp.Visible = True

    '~~> Opens an exisiting Workbook
    xlWorkBook = xlApp.Workbooks.Open(TextBox2.Text)

    '~~> Set the relevant sheet that we want to work with
    xlWorkSheet = xlWorkBook.Sheets("Sheet1")

    '~~> Get the last row in Col A which has Form Fields
    LastRow = xlWorkSheet.Range("A" & xlApp.Rows.Count).End(Excel.XlDirection.xlUp).Row

    '~~> Loop through Col A and use the Form Fields to extract text
    For i = 2 To LastRow
        xlWorkSheet.Range("C" & i).Value = pdfFormFields.GetField(xlWorkSheet.Range("A" & i).Value)
    Next

    '~~> Close the Excel file without saving
    xlWorkBook.Close(True)
    '~~> Quit the Excel Application
    xlApp.Quit()

    '~~> Clean Up
    releaseObject(xlApp)
    releaseObject(xlWorkBook)
End Sub

This is how the Excel file looks after you run the above code:

  1. Eddy Jawed
    Eddy Jawed02-27-2012

    Hi Siddhart,

    Thanks for this helpful tutorial. Just wondering if you got the dll and the sample pdf form to download too?

    • Siddharth Rout
      Siddharth Rout02-27-2012

      Lemme check. I think I should have it. Will mail it to you later today if I find it :)

  2. Rathi
    Rathi03-01-2012

    hi Siddharth.. This is very Helpful.. Had a query.. The form fields given in excel are related to form fields of PDF Labels or propetry names of control in PDF..?? Please help!!!

    • Siddharth Rout
      Siddharth Rout03-02-2012

      The values in Col A in Excel are the Property Names of the Labels in PDF

  3. Gloria Antonakos
    Gloria Antonakos03-04-2012

    Please message me with some pointers on how you made this website look like this, Id appreciate it.

    • Siddharth Rout
      Siddharth Rout03-04-2012

      Gloria, This is a wordpress template which I used and it is free. This template was designed by zwwooooo (http://zww.me/) and is called zBench. See right at the bottom of this page for a link.

  4. Rathi
    Rathi03-08-2012

    Property Names of Label in PDF in the sense For Example if i have – 1. Name (as Label) and Text box besides it to fill, here should i consider only Name as property name of Label in my excel sheet or 1.Name?

  5. Rathi
    Rathi03-08-2012

    After the conversion, i always get the error when opening the pdf that something like “This is document is changed as compared to original document”. Is this due to any security in the form ?

  6. George Soros
    George Soros09-24-2012

    I can´t add itextsharp.dll on Excel 2007. Does this work only on Excel 2010? Thanks!

    • Siddharth Rout
      Siddharth Rout09-24-2012

      George, This is a VB.Net Post :)

      • George Soros
        George Soros09-24-2012

        That´s why! Lol! Do you know how could I do the same using Excel´s VBA?
        Thanks!

        • Siddharth Rout
          Siddharth Rout09-24-2012

          Unfortunately you cannot use it with VBA :( See this link http://support.itextpdf.com/node/52

          • George Soros
            George Soros09-24-2012

            Siddharth, thanks a lot for you prompt answer!
            I will follow your posts and implement a vb.net solution to do this.
            Just one question: after making it work locally, how hard will it be to make it work with other computers and users?
            Thanks again!

            • Siddharth Rout
              Siddharth Rout09-24-2012

              You can use an installer to distribute your application which will install the dll in the users pc.

              • George Soros
                George Soros09-24-2012

                Perfect! Thanks a lot!