UiPath Extract PDF Text & Save to Excel Example
Duration: 17 min 20 s · Language: EN

Extract text from PDFs with UiPath and export structured data to Excel in a few practical steps, with tips and error handling.

Why this workflow matters and why you will do it anyway

If you have a pile of PDFs and a keyboard that hates manual copy and paste, this guide is for you. We will use UiPath to read PDF content and push structured data into Excel. This saves time and makes your boss stop asking if you enjoy repetitive work. Keywords you will see along the way include UiPath, PDF extraction, Read PDF Text, Read PDF With OCR, DataTable, Regular Expressions, and Excel Application Scope.

Setup and pick the right activities

Start a fresh UiPath project and open Manage Packages. Install UiPath.PDF.Activities and UiPath.Excel.Activities so that Read PDF Text and Excel Application Scope show up in your activities panel. That is the backbone of this automation.

Read PDF Text when the PDF is searchable

Use Read PDF Text for native PDFs that contain selectable text. It is fast and reliable when the text is already present. Set the file path and inspect the output string variable before you get fancy with parsing.

Read PDF With OCR when the page is a scanned mess

Use Read PDF With OCR for scanned images or when the text behaves like a mystery. Pick an OCR engine that fits your language and performance needs, and test with a few sample pages. Be prepared to tweak scale and retries if your scans look like someone used a photocopier during a thunderstorm.
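One way to decide between the two readers at run time is to try the text layer first and fall back to OCR when almost nothing comes back. The sketch below shows that decision in Python purely as an illustration (UiPath itself uses activities and VB.NET or C# expressions). The function names, the stand-in readers, and the 20-character threshold are all assumptions, not part of UiPath.

```python
from typing import Callable, Tuple


def read_pdf(path: str,
             read_text: Callable[[str], str],
             read_ocr: Callable[[str], str],
             min_chars: int = 20) -> Tuple[str, str]:
    """Return (text, method). Try the native text layer, then fall back to OCR.

    read_text and read_ocr are hypothetical stand-ins for the
    Read PDF Text and Read PDF With OCR activities.
    """
    text = read_text(path)
    # A scanned PDF usually yields an empty or near-empty string here.
    if len(text.strip()) >= min_chars:
        return text, "text"
    return read_ocr(path), "ocr"


# Fake readers so the sketch runs without any PDF library.
def fake_text_reader(path: str) -> str:
    if "scan" in path:
        return ""
    return "Invoice INV-00123 dated 01/02/2024 total $150.00"


def fake_ocr_reader(path: str) -> str:
    return "Invoice INV-00999 (recognized by OCR)"


native = read_pdf("invoice.pdf", fake_text_reader, fake_ocr_reader)
scanned = read_pdf("scan_001.pdf", fake_text_reader, fake_ocr_reader)
```

In a real workflow the same idea would likely be an If activity that checks the length of the Read PDF Text output before invoking the OCR branch.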

Parsing strategies and extracting fields

Parsing depends on how consistent your documents are. For neat, fixed-layout files a simple Split and Trim approach will do wonders. For invoices and semi-structured reports, regular expressions will be your friend. Test patterns interactively to avoid broken rows.

Common extraction patterns

  • Split lines by Environment.NewLine or by page markers for multi-page files
  • Use Trim and Replace to remove extra whitespace and stray characters
  • Apply Regular Expressions to capture dates, amounts, and identifiers
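The Split and Trim approach can be sketched as follows. This is Python used only to illustrate the logic; in UiPath you would write the equivalent with String.Split, Trim, and Replace in an Assign activity. The "Label: value" layout and the function names are assumptions made for the example.

```python
import re
from typing import Dict, List


def clean_lines(raw_text: str) -> List[str]:
    """Split on line breaks, collapse repeated whitespace, drop empty lines."""
    cleaned = []
    for line in raw_text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            cleaned.append(line)
    return cleaned


def parse_key_values(lines: List[str]) -> Dict[str, str]:
    """Turn 'Label: value' lines into a dictionary (assumed layout)."""
    fields = {}
    for line in lines:
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


sample = "Invoice No:   INV-00123 \r\n\r\n  Date:\t01/02/2024\r\nTotal:  $150.00  "
lines = clean_lines(sample)
fields = parse_key_values(lines)
```

This only works when every document uses the same labels in the same form, which is exactly the "neat" case described above.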

Sample regex patterns that often help

date pattern \d{2}/\d{2}/\d{4}
amount pattern \$\d+(?:\.\d{2})?
invoice id pattern [A-Z0-9\-]{5,}
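Here is a small sketch that applies the three patterns above to a line of text. It uses Python's re module as a stand-in for the .NET regex engine that UiPath uses; these particular patterns behave the same way in both for plain ASCII input. The sample text, field names, and helper functions are made up for illustration.

```python
import re
from typing import Dict, Optional, Pattern

# The three patterns listed above, unchanged.
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")
AMOUNT_RE = re.compile(r"\$\d+(?:\.\d{2})?")
INVOICE_ID_RE = re.compile(r"[A-Z0-9\-]{5,}")


def first_match(pattern: Pattern[str], text: str) -> Optional[str]:
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Pull the three fields out of one block of text (hypothetical field names)."""
    return {
        "InvoiceId": first_match(INVOICE_ID_RE, text),
        "Date": first_match(DATE_RE, text),
        "Amount": first_match(AMOUNT_RE, text),
    }


sample = "Invoice INV-00123 dated 01/02/2024 total $150.00"
fields = extract_fields(sample)

# The amount pattern has no thousands separator, so it stops at the comma.
truncated = first_match(AMOUNT_RE, "Total due $1,250.00")
```

Two caveats worth testing against your own documents: the amount pattern captures only "$1" from "$1,250.00", and the invoice id pattern will happily match any run of five or more capitals, digits, or hyphens, so an all-caps heading can win over the real id. Anchoring the pattern on a nearby label is one way to tighten it.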

Build a DataTable and populate rows

Create a DataTable using Build Data Table or an Assign activity set to New System.Data.DataTable. Add columns that match the Excel layout you want. During your file loop, use Add Data Row to append each parsed record. Keep the parsing and the row creation separate to make debugging less painful.
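To show what "keep parsing and row creation separate" can look like, here is a minimal Python sketch that mimics a DataTable with a fixed column list. The column names and helper functions are hypothetical; in UiPath the real work is done by Build Data Table and Add Data Row.

```python
from typing import Dict, List, Optional

# Hypothetical column layout; match this to your target worksheet.
COLUMNS = ["FileName", "InvoiceId", "Date", "Amount"]


def build_data_table(columns: List[str]) -> dict:
    """Create an empty table with a fixed column order."""
    return {"columns": list(columns), "rows": []}


def add_data_row(table: dict, record: Dict[str, Optional[str]]) -> None:
    """Append one row, in column order. Missing or None fields become ''."""
    row = []
    for column in table["columns"]:
        value = record.get(column)
        row.append("" if value is None else str(value).strip())
    table["rows"].append(row)


# Records as they might come out of the parsing step (made-up data).
parsed_records = [
    {"FileName": "a.pdf", "InvoiceId": "INV-00123",
     "Date": "01/02/2024", "Amount": " $150.00 "},
    {"FileName": "b.pdf", "InvoiceId": "INV-00124",
     "Date": "03/02/2024", "Amount": None},
]

table = build_data_table(COLUMNS)
for record in parsed_records:
    add_data_row(table, record)
```

Because the row builder only sees a finished record, a bad row points you straight at the parser rather than at the table code.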

Save to Excel with Excel Application Scope

Wrap the Excel operations in Excel Application Scope, then use Write Range to dump the DataTable into a worksheet. If you do not care about Excel formatting, write a CSV with Write CSV and be grateful for the speed. Remember that Excel files can get locked if multiple robots or users touch the same file at once.
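For the CSV route, the output is nothing more than a header line followed by one line per row, with quoting for fields that contain commas. The sketch below shows that shape using Python's standard csv module; it illustrates the file format rather than the Write CSV activity itself, and the function name is an assumption.

```python
import csv
import io
from typing import List


def table_to_csv(columns: List[str], rows: List[List[str]]) -> str:
    """Serialize a header and rows to CSV text, quoting fields only when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = table_to_csv(
    ["InvoiceId", "Vendor", "Amount"],
    [["INV-00123", "Acme, Inc.", "$150.00"]],
)
```

Note how the vendor name gets wrapped in quotes because it contains a comma. Trimmed, validated fields make this step much less likely to produce rows that Excel splits in the wrong place.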

Error handling and testing

Wrap Read PDF Text, Read PDF With OCR, and Excel writes in Try Catch blocks and log any exceptions. Build a small test set that includes clean examples, bad scans, and edge cases that break your parsing. Test incrementally and inspect intermediate variables to see where things go sideways.
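The Try Catch idea translates to a per-file loop that logs a failure and moves on instead of killing the whole run. Here is a minimal Python sketch of that pattern. The logger name, the function names, and the fake extractor are assumptions for illustration; in UiPath you would use a Try Catch activity with a Log Message in the Catch block.

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("pdf_to_excel")  # hypothetical logger name


def process_files(paths: List[str],
                  extract: Callable[[str], Dict[str, str]]
                  ) -> Tuple[List[Dict[str, str]], List[Tuple[str, str]]]:
    """Run extract on every file. Collect results and failures separately."""
    results = []
    failures = []
    for path in paths:
        try:
            results.append(extract(path))
        except Exception as exc:
            # Log and continue so one bad PDF does not stop the batch.
            logger.error("Failed to process %s: %s", path, exc)
            failures.append((path, str(exc)))
    return results, failures


# Fake extractor that fails on one file, standing in for the PDF read and parse.
def fake_extract(path: str) -> Dict[str, str]:
    if path.endswith("broken.pdf"):
        raise ValueError("no text found")
    return {"FileName": path}


results, failures = process_files(["a.pdf", "broken.pdf", "c.pdf"], fake_extract)
```

Keeping the failures list around gives you something concrete to review after the run, which is handy when building the test set of bad scans and edge cases.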

Troubleshooting tips and performance notes

  • If OCR misses text, try a different engine or preprocess images to increase contrast
  • Use a regex tester when crafting patterns to avoid accidental greedy matches
  • Trim and validate fields before adding them to the DataTable to prevent malformed Excel rows
  • For bulk processing, consider batching writes or using CSV to avoid many small Excel opens
  • Log file names and row counts so you can prove the robot did not invent data
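The "trim and validate" tip can be sketched as a small check that runs on each parsed record before it becomes a row. This is an illustrative Python version under assumed field names (InvoiceId, Date, Amount) and assumed formats that mirror the sample patterns earlier. Adjust both to whatever your documents actually contain.

```python
import re
from typing import Dict, List, Optional


def validate_record(record: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems. An empty list means the record looks usable."""
    problems = []

    invoice_id = (record.get("InvoiceId") or "").strip()
    if not invoice_id:
        problems.append("missing InvoiceId")

    date = (record.get("Date") or "").strip()
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", date):
        problems.append("bad Date")

    amount = (record.get("Amount") or "").strip()
    if not re.fullmatch(r"\$?\d+(?:\.\d{2})?", amount):
        problems.append("bad Amount")

    return problems


good = validate_record(
    {"InvoiceId": "INV-00123", "Date": "01/02/2024", "Amount": " $150.00 "})
bad = validate_record(
    {"InvoiceId": "", "Date": "1/2/24", "Amount": None})
```

Records that come back with problems can be logged with their file name and skipped, which also keeps the row counts in your log honest.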

That is the practical roadmap. Install the packages, choose the right reader, parse reliably, build a DataTable, and write to Excel. Add error handling and tests, and you will have an RPA process that extracts PDF text and exports tidy data to Excel without eating your weekend. Congratulations, you have automated at least one tedious task today.
