If you have a pile of PDFs and a keyboard that hates manual copy and paste, this guide is for you. We will use UiPath to read PDF content and push structured data into Excel. This saves time and makes your boss stop asking if you enjoy repetitive work. Keywords you will see along the way include UiPath, PDF extraction, Read PDF Text, Read PDF With OCR, DataTable, Regular Expressions, and Excel Application Scope.
Start a fresh UiPath project and open Manage Packages. Install UiPath.PDF.Activities and UiPath.Excel.Activities so that Read PDF Text and Excel Application Scope show up in your activities panel. That is the backbone of this automation.
Use Read PDF Text for native PDFs that contain selectable text. It is fast and reliable when the text is already present. Set the file path and inspect the output string variable before you get fancy with parsing.
Use Read PDF With OCR for scanned images or when the text behaves like a mystery. Pick an OCR engine that fits your language and performance needs, and test with a few sample pages. Be prepared to tweak scale and retries if your scans look like someone used a photocopier during a thunderstorm.
Parsing depends on how consistent your documents are. For neat, fixed-layout files a simple Split and Trim approach will do wonders. For invoices and semi-structured reports, regular expressions will be your friend. Test patterns interactively against real sample text to avoid broken rows.
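UiPath workflows are assembled from activities rather than written as a script, so the following is only a Python sketch of the Split and Trim idea, not UiPath code. The sample text, the colon-separated "Label: value" layout, and the function name parse_fixed_layout are all assumptions made for illustration.

```python
# Hypothetical text standing in for the string a PDF reader would return.
sample_text = """
Invoice Number:   INV-00123
Invoice Date:  03/15/2024
Total Due:   $1250.00
"""


def parse_fixed_layout(text):
    """Split each line on its first colon and trim both halves."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and lines that carry no label
        label, value = line.split(":", 1)
        fields[label.strip()] = value.strip()
    return fields


record = parse_fixed_layout(sample_text)
print(record)
```

Splitting on the first colon only keeps values that contain a colon themselves, such as a time, in one piece.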
Sample regex patterns that often help:
date pattern: \d{2}/\d{2}/\d{4}
amount pattern: \$\d+(?:\.\d{2})?
invoice id pattern: [A-Z0-9\-]{5,}
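To try the three patterns above outside UiPath, a quick Python check is one option. This is a sketch under assumptions: the sample sentence and the helper first_match are invented for illustration, and Python's re module stands in for the .NET regex engine that UiPath uses. These particular patterns use only syntax common to both.

```python
import re

# The three patterns from the list above, unchanged.
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")
AMOUNT_RE = re.compile(r"\$\d+(?:\.\d{2})?")
INVOICE_ID_RE = re.compile(r"[A-Z0-9\-]{5,}")

# Hypothetical line of extracted PDF text.
sample_text = "Invoice INV-00123 dated 03/15/2024, total due $1250.00"


def first_match(pattern, text):
    """Return the first match as a string, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


parsed = {
    "invoice_id": first_match(INVOICE_ID_RE, sample_text),
    "date": first_match(DATE_RE, sample_text),
    "amount": first_match(AMOUNT_RE, sample_text),
}
print(parsed)

# Two edge cases worth knowing about before trusting these patterns:
truncated_amount = first_match(AMOUNT_RE, "$1,250.00")  # stops at the comma
false_invoice_id = first_match(INVOICE_ID_RE, "INVOICE TOTAL")  # matches a plain word
```

The last two lines show why testing matters: the amount pattern does not handle thousands separators, and the invoice id pattern accepts any run of five or more capitals, digits, or hyphens, including an all-caps heading. Tighten both to fit your real documents.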
Create a DataTable using Build Data Table or an Assign activity with a new System.Data.DataTable. Add columns that match the Excel layout you want. During your file loop use Add Data Row to append each parsed record. Keep the parsing and the row creation separate to make debugging less painful.
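The advice to keep parsing and row creation separate can be sketched in Python, with a list of lists standing in for the DataTable. Everything here is assumed for illustration: the column names, the pipe-delimited input format, and the functions parse_record and add_data_row (the latter named after the UiPath activity, but it is not that activity).

```python
# Hypothetical column layout matching the target worksheet.
COLUMNS = ["InvoiceId", "Date", "Amount"]


def parse_record(text):
    """Parsing step: turn 'id | date | amount' text into a dict keyed by column."""
    parts = [part.strip() for part in text.split("|")]
    return dict(zip(COLUMNS, parts))


def add_data_row(table, record):
    """Row-creation step: append values in column order, blank when missing."""
    table.append([record.get(column, "") for column in COLUMNS])


table = []
raw_records = [
    "INV-00123 | 03/15/2024 | $1250.00",
    "INV-00124 | 03/16/2024 | $89.50",
]
for raw in raw_records:
    record = parse_record(raw)  # inspect this value when a row looks wrong
    add_data_row(table, record)

print(table)
```

Because the two steps are separate, a bad row can be traced to either the parsed record or the append, rather than to one tangled expression.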
Wrap the Excel operations in Excel Application Scope, then use Write Range to dump the DataTable into a worksheet. If you do not care about Excel formatting, write a CSV with Write CSV and be grateful for the speed. Remember that Excel files can get locked if multiple robots or users touch the same file at once.
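For a feel of what the CSV route produces, here is a minimal Python sketch using the standard csv module. It is an analogue of the Write CSV step, not the activity itself. The column names, the rows, the file name invoices_sketch.csv, and the helper write_csv are assumptions.

```python
import csv
import os
import tempfile

# Hypothetical table contents: a header plus two parsed records.
COLUMNS = ["InvoiceId", "Date", "Amount"]
rows = [
    ["INV-00123", "03/15/2024", "$1250.00"],
    ["INV-00124", "03/16/2024", "$89.50"],
]


def write_csv(path, columns, data_rows):
    """Write the header row followed by every data row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(data_rows)


out_path = os.path.join(tempfile.gettempdir(), "invoices_sketch.csv")
write_csv(out_path, COLUMNS, rows)
print("wrote", out_path)
```

A CSV carries no formatting, formulas, or multiple sheets, which is exactly why it is quick to write and easy to diff when checking results.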
Wrap Read PDF Text, Read PDF With OCR, and Excel writes in Try Catch blocks and log any exceptions. Build a small test set that includes clean examples, bad scans, and edge cases that break your parsing. Test incrementally and inspect intermediate variables to see where things go sideways.
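The Try Catch idea, one guarded attempt per file with failures logged rather than fatal, looks roughly like this in Python. It is a sketch under assumptions: read_pdf_text is a hypothetical stand-in that fails on file names containing "bad", and the file names and logger name are invented.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pdf_to_excel_sketch")


def read_pdf_text(path):
    """Hypothetical stand-in for a PDF reader; fails on names containing 'bad'."""
    if "bad" in path:
        raise ValueError(f"could not read {path}")
    return f"text of {path}"


def process_files(paths):
    """Read each file in its own try block so one failure does not stop the run."""
    results, failures = [], []
    for path in paths:
        try:
            results.append(read_pdf_text(path))
        except Exception as exc:
            log.error("Failed on %s: %s", path, exc)
            failures.append(path)  # keep a list for review or a retry pass
    return results, failures


results, failures = process_files(["clean.pdf", "bad_scan.pdf", "edge.pdf"])
print(len(results), "read,", len(failures), "failed")
```

Collecting the failed paths instead of only logging them gives you a ready-made list to rerun, for example through the OCR reader.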
That is the practical roadmap. Install the packages, choose the right reader, parse reliably, build a DataTable, and write to Excel. Add error handling and tests and you will have an RPA process that extracts PDF text and exports tidy data to Excel without eating your weekend. Congratulations, you have automated at least one tedious task today.