Screen Scrape Multiple PDF Files with UiPath Example | Duration: 7 min 57 s · Language: EN

Step-by-step guide to screen scraping multiple PDF files with UiPath, using OCR, selectors, and loops, then exporting to CSV for reliable data extraction.

If you have a pile of PDFs and the patience of a saint, you could open them one by one and copy and paste like a heroic but inefficient worker ant. Or you could let UiPath do the heavy lifting and pretend you always planned this level of automation. This guide shows how to screen scrape multiple PDF files with UiPath using OCR, selectors, and a loop-based workflow to extract clean data for downstream use.

Prepare your UiPath project and sample files

Create a new project and drop all your sample PDF files into a single folder. Add a DataTable variable to hold extracted rows. Name it something sensible so your future self does not rage quit. Include a few PDF variations so you can test the workflow against layout differences and make it less brittle.

Common variables to create

  • DataTable resultsTable for aggregated rows
  • String filePath to hold the current file path
  • Dictionary or individual variables for parsed fields

Pick the right read method

Not all PDFs are created equal. Pick the method that matches the content and avoid brute force OCR when native text is available.

When to use what

  • Use Read PDF Text for text-based PDFs. It reads the embedded text directly, so it is faster and produces fewer errors.
  • Use Get OCR Text or Read PDF With OCR for scanned or image-based PDFs where there is no selectable text.
  • Use screen scraping with Click and Type Into when you must mimic a human interaction or when selectors are needed to anchor elements on screen.
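
The native-text-first rule above can be sketched as a simple fallback check. This is plain Python, not a UiPath activity, and the function name `pick_read_method` and the 20-character threshold are illustrative assumptions: if the text layer yields enough visible characters, treat the PDF as text based, otherwise fall back to OCR.

```python
def pick_read_method(native_text: str, min_chars: int = 20) -> str:
    """Return "native" when the PDF's text layer looks usable, else "ocr".

    min_chars is an arbitrary illustrative threshold, not a UiPath setting.
    """
    # Count only non-whitespace characters so blank pages do not pass.
    visible = sum(1 for ch in native_text if not ch.isspace())
    return "native" if visible >= min_chars else "ocr"
```

In a UiPath workflow the same idea would likely be an If activity on the length of the Read PDF Text output, with the OCR read in the Else branch.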

Tips on OCR quality

Pick a language-specific OCR model to reduce errors. Tweak the OCR engine settings and consider a light image preprocessing step if the scans are noisy. OCR is great until it invents words that never existed.

Capture selectors and anchors like a pro

When labels remain stable, use Anchor Base to find each field relative to its label. Good selectors make your workflow resilient to layout changes. Bad selectors will make you curse the UI and the PDF creator.

Parse the raw text and map fields

Once you have raw text, use regex or simple string splitting to extract structured fields. For tables, detect consistent delimiters or use coordinate-based scraping if the table is fixed on the page. Normalize date formats, trim stray whitespace, and handle missing values up front.
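
As a rough illustration of the parse and normalize step, here is a minimal Python sketch. The field labels (Invoice Number, Date, Total), the regex patterns, and the accepted date formats are all assumptions made for the example; your PDFs will need their own patterns. In UiPath the equivalent logic would typically live in Matches or Assign activities.

```python
import re
from datetime import datetime

# Hypothetical labels; adjust the patterns to match your own documents.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Date\s*:?\s*([0-9]{1,4}[-/.][0-9]{1,2}[-/.][0-9]{1,4})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([0-9][0-9,]*\.?[0-9]*)", re.IGNORECASE),
}

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")  # assumed input formats


def normalize_date(raw: str) -> str:
    """Convert a date in one of DATE_FORMATS to ISO 8601, or return "" if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""


def parse_fields(raw_text: str) -> dict:
    """Extract each field with its regex; missing fields become empty strings."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_text)
        row[name] = match.group(1).strip() if match else ""
    row["invoice_date"] = normalize_date(row["invoice_date"])
    row["total"] = row["total"].replace(",", "")  # drop thousands separators
    return row
```

Returning empty strings for missing fields keeps every row the same shape, which makes the later validation and export steps simpler.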

Loop through files and build the DataTable

Use a For Each activity to iterate over the files in the input folder. Inside the loop, perform the read, parse, and normalize steps, then append each parsed row to the DataTable. Wrap risky steps in Try Catch and log the names of files that fail so human review can save the day.

  • Standardize field names and formats inside the loop for easy aggregation
  • Validate extracted rows and drop empty entries
  • Log warnings for files with low OCR confidence or parsing anomalies
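
The loop, Try Catch, and validation steps above can be sketched in Python as follows. This is not UiPath code: `process_folder` is a hypothetical helper, and the `read_text` and `parse` callables stand in for whichever read activity and parsing logic you chose. A list of dicts plays the role of the DataTable.

```python
import logging
from pathlib import Path
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("pdf_batch")


def process_folder(
    folder: str,
    read_text: Callable[[Path], str],
    parse: Callable[[str], Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str]]]:
    """Read and parse every PDF in folder; return (rows, failures)."""
    rows, failures = [], []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            row = parse(read_text(pdf_path))
        except Exception as exc:  # broad on purpose: one bad file must not stop the batch
            logger.warning("Failed on %s: %s", pdf_path.name, exc)
            failures.append((pdf_path.name, str(exc)))
            continue
        # Validation step: skip rows where every field came back empty.
        if not any(str(value).strip() for value in row.values()):
            failures.append((pdf_path.name, "no fields extracted"))
            continue
        row["source_file"] = pdf_path.name  # keep provenance for later review
        rows.append(row)
    return rows, failures
```

Collecting failures as (file name, reason) pairs gives you the review log described above without interrupting the run.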

Save aggregated results

When the loop completes, use Write CSV or Write Range to export the DataTable to disk. Excel-friendly formats make downstream processing simpler. If you need a review step, write a temporary CSV with a subset of rows first so you can inspect a sample without opening the full data set.
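
For comparison, the export step with an optional review sample might look like this in Python, using only the standard `csv` module. The helper name `write_rows` and its `limit` parameter are assumptions for the sketch; in UiPath the Write CSV activity does the equivalent job.

```python
import csv
from pathlib import Path


def write_rows(rows, out_path, fieldnames, limit=None):
    """Write rows (dicts) to CSV; if limit is set, write only the first `limit` rows."""
    subset = rows if limit is None else rows[:limit]
    with Path(out_path).open("w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops keys that are not in fieldnames.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row in subset:
            # Missing keys are written as empty cells (DictWriter's default restval).
            writer.writerow(row)
    return len(subset)
```

Calling it once with a small `limit` produces the review sample, and once without a limit produces the full export.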

Troubleshooting and common pitfalls

  • Prefer native PDF text reading whenever possible because OCR can misread characters, invent text, and ruin your day
  • Watch out for inconsistent label wording across files that breaks simple selectors
  • If a table shifts by a few pixels use relative anchors or coordinate adjustments rather than brittle absolute positions
  • Keep a log of failed files and the error reason so you do not have to hunt for the same problems again later

Recap. Set up a project folder, choose the correct read method, capture robust selectors, apply OCR only when needed, loop over files, normalize and validate extracted fields, then export your clean data for downstream automation. Follow these steps and you will have more time for things that do not involve copy and paste and existential PDFs.
