UiPath Data Scrape PDFs into Excel Example | Duration: PT9M2S · Language: EN

Compact UiPath guide for scraping tables from PDFs and exporting to Excel with practical steps and quick tips.

Yes, you can make stubborn PDF tables behave and land neatly in Excel without crying into the keyboard. This guide walks through a simple UiPath workflow that extracts table data from PDFs and writes a tidy DataTable into an Excel workbook. Expect Read PDF Text or Read PDF With OCR for raw content, the Data Scraping wizard for structured rows, and Write Range to lock it into a file.

What you need before you begin

  • UiPath Studio with a new process ready
  • The UiPath.PDF.Activities and UiPath.Excel.Activities packages, plus an OCR package if you plan to handle scans
  • A PDF viewer that UiPath can interact with and a sample PDF that actually looks like a table

Quick overview of the workflow

At a high level the flow is:

  • Open the PDF or load it into a viewer
  • Use Read PDF Text when text is selectable or Read PDF With OCR for scanned pages
  • Run the Data Scraping wizard and capture the table into a DataTable
  • Clean and convert columns in the DataTable
  • Use Excel Application Scope and Write Range to save the sheet

Step 1 Create the project and add the packages

Create a new process and install the PDF and Excel activities. If your PDFs are scans, add an OCR provider too. Clear naming helps when the project grows beyond a hopeful demo, so name things like ReadPdfActivity and ExcelWriteActivity to avoid future confusion.

Step 2 Choose how to read the PDF

If the PDF has selectable text, use Read PDF Text for speed and accuracy. If it is a scanned image, use Read PDF With OCR and pick an engine that matches the document language. OCR is slower and can misread characters, so reach for it only when the PDF has no usable text layer.
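One way to make that choice at runtime is to run Read PDF Text first and look at what comes back. The sketch below is only a heuristic and not a UiPath feature: the module name, the NeedsOcr function, and the 20 character threshold are all assumptions made for illustration. It treats a PDF as scanned when the text layer yields almost nothing.

```vb
Imports System
Imports System.Linq

Module ReaderChoiceSketch

    ' Hypothetical heuristic: if the text layer is empty or nearly empty,
    ' assume the PDF is a scan and fall back to OCR.
    ' minChars is an arbitrary threshold chosen for this example.
    Function NeedsOcr(pdfText As String, Optional minChars As Integer = 20) As Boolean
        If String.IsNullOrWhiteSpace(pdfText) Then Return True
        Dim visibleChars As Integer = pdfText.Count(Function(ch) Not Char.IsWhiteSpace(ch))
        Return visibleChars < minChars
    End Function

    Sub Main()
        ' Stand-ins for the String output of Read PDF Text.
        Console.WriteLine(NeedsOcr(""))                                               ' True
        Console.WriteLine(NeedsOcr("Invoice 1001  Widget  12  19.99  Total 239.88"))  ' False
    End Sub

End Module
```

Inside a workflow the same check would fit in an If activity condition, with the Then branch running Read PDF With OCR.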

Step 3 Use the Data Scraping wizard to capture the table

Open the PDF in a viewer that UiPath can interact with and launch the Data Scraping wizard. Click the first visible cell and then the next one so the wizard can detect the repeating pattern. The wizard will try to capture all similar rows and return a DataTable. Save the result to a variable named extractedTable or another memorable name so the workflow reads like documentation.

Step 4 Clean the DataTable

Tables from PDFs are messy in a sophisticated way. Use simple cleaning steps to fix whitespace and convert types. Common operations include:

  • Trim string columns with row("ColumnName") = row("ColumnName").ToString.Trim
  • Convert numeric text with Convert.ToInt32 or Int32.Parse after trimming
  • Use a For Each Row activity over extractedTable to apply conversions, or use LINQ or DataTable.Select if you prefer fewer activities

For stubborn formatting quirks, a short Invoke Code activity or a few Assign activities will get things into shape without drama.
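The cleaning steps above can be sketched as plain VB.NET. This is a minimal example, assuming the scraped columns arrive typed as String. The module, the helper names (TrimStringCells, DropBlankRows, WithIntegerColumn), and the sample columns "Item" and "Qty" are hypothetical. One detail worth knowing: a DataColumn cannot change its type once the table holds rows, so the sketch clones the schema, retypes the column on the empty copy, and then copies the rows across.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Globalization
Imports System.Linq

Module CleanTableSketch

    ' Trim every String cell and turn non-breaking spaces (U+00A0) into normal spaces.
    Sub TrimStringCells(table As DataTable)
        For Each row As DataRow In table.Rows
            For Each col As DataColumn In table.Columns
                If col.DataType Is GetType(String) AndAlso Not row.IsNull(col) Then
                    row(col) = row(col).ToString().Replace(ChrW(160), " "c).Trim()
                End If
            Next
        Next
    End Sub

    ' LINQ variant of a filter: keep only rows whose key column is not blank.
    Function DropBlankRows(table As DataTable, keyColumn As String) As DataTable
        Dim kept As List(Of DataRow) = table.AsEnumerable().Where(Function(r) Not String.IsNullOrWhiteSpace(r(keyColumn).ToString())).ToList()
        ' CopyToDataTable throws on an empty sequence, so fall back to an empty clone.
        Return If(kept.Count > 0, kept.CopyToDataTable(), table.Clone())
    End Function

    ' Return a copy of the table in which the named column is typed as Integer.
    Function WithIntegerColumn(source As DataTable, columnName As String) As DataTable
        Dim result As DataTable = source.Clone()                ' schema only, no rows
        result.Columns(columnName).DataType = GetType(Integer)  ' allowed while the copy is empty
        For Each row As DataRow In source.Rows
            Dim newRow As DataRow = result.NewRow()
            For Each col As DataColumn In source.Columns
                If col.ColumnName = columnName Then
                    Dim cellText As String = row(col).ToString()
                    ' Blank cells stay DBNull; anything else must parse or this throws.
                    If Not String.IsNullOrWhiteSpace(cellText) Then
                        newRow(columnName) = Integer.Parse(cellText, NumberStyles.Integer Or NumberStyles.AllowThousands, CultureInfo.InvariantCulture)
                    End If
                Else
                    newRow(col.ColumnName) = row(col)
                End If
            Next
            result.Rows.Add(newRow)
        Next
        Return result
    End Function

    Sub Main()
        ' Sample data standing in for the wizard output; real column names will differ.
        Dim extractedTable As New DataTable()
        extractedTable.Columns.Add("Item", GetType(String))
        extractedTable.Columns.Add("Qty", GetType(String))
        extractedTable.Rows.Add("  Widget ", " 12" & ChrW(160))
        extractedTable.Rows.Add("", "")
        extractedTable.Rows.Add("Gadget", "1,500")

        TrimStringCells(extractedTable)
        Dim cleaned As DataTable = WithIntegerColumn(DropBlankRows(extractedTable, "Item"), "Qty")

        For Each row As DataRow In cleaned.Rows
            Console.WriteLine("{0}: {1} ({2})", row("Item"), row("Qty"), row("Qty").GetType().Name)
        Next
    End Sub

End Module
```

In Studio, the body of each helper could go into an Invoke Code activity with extractedTable passed as an In/Out argument. Writing real integers into the table is what lets Excel treat the column as numeric rather than text.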

Step 5 Write the DataTable to Excel

Add an Excel Application Scope and then a Write Range activity. Set AddHeaders to true so your column names survive the trip. Overwrite an existing sheet during testing so you do not accumulate mystery files named output 1, output 2, and so on.

Run and verify

Run the workflow and open the workbook. Check that numeric columns are numeric and that dates and amounts parsed correctly. If rows are missing, check whether the Data Scraping wizard matched the visual pattern, or whether you need OCR or a different viewer.

Troubleshooting tips and common pitfalls

  • If the Data Scraping wizard misses rows, try enlarging the visible area or scrolling the viewer so the wizard can detect the pattern across multiple pages
  • If OCR gives garbage, try a different OCR engine or improve image quality with preprocessing before OCR
  • If conversions fail, log the raw values and inspect them for stray characters like non-breaking spaces or currency symbols
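For the last tip, stray characters are easier to spot when the log shows code points instead of glyphs. The following is a rough sketch with hypothetical helper names (DescribeRaw, TryCleanAmount). It assumes amounts use "." as the decimal separator, so adjust the filter if your PDFs use a comma.

```vb
Imports System
Imports System.Globalization
Imports System.Linq
Imports System.Text

Module ConversionDebugSketch

    ' Show every character as a code point so invisible ones (such as U+00A0) show up in a log.
    Function DescribeRaw(raw As String) As String
        Return String.Join(" ", raw.Select(Function(ch) "U+" & AscW(ch).ToString("X4")))
    End Function

    ' Keep digits, the decimal point, and a minus sign, then parse with the invariant culture.
    ' Assumption: "." is the decimal separator; thousands separators and symbols are dropped.
    Function TryCleanAmount(raw As String, ByRef amount As Decimal) As Boolean
        Dim kept As New StringBuilder()
        For Each ch As Char In raw
            If Char.IsDigit(ch) OrElse ch = "."c OrElse ch = "-"c Then kept.Append(ch)
        Next
        Return Decimal.TryParse(kept.ToString(), NumberStyles.Number, CultureInfo.InvariantCulture, amount)
    End Function

    Sub Main()
        ' A made-up cell value with a currency symbol and a trailing non-breaking space.
        Dim raw As String = "$1,234.50" & ChrW(160)
        Dim amount As Decimal
        If TryCleanAmount(raw, amount) Then
            Console.WriteLine(amount.ToString(CultureInfo.InvariantCulture))
        Else
            Console.WriteLine("Could not parse: " & DescribeRaw(raw))
        End If
    End Sub

End Module
```

In a workflow, the DescribeRaw output would typically go to a Log Message activity so the offending value can be inspected after the run.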

Final thoughts

This is a practical pattern for UiPath RPA when you need PDF to Excel automation. You will alternate between mild triumph and gentle cursing, but the result is worth it. Keep your DataTable named clearly, keep your conversions simple, and write to Excel with confidence. If the PDF fights back, there are always more robust parsing approaches, but this workflow covers most sane cases.

