Demo
2 min

Query and extract structured data from documents

See how Merlin IDP extracts structured data from unstructured files using queries, classification, and confidence scoring.

Why it matters

Manual data entry and document parsing slow teams down. With Merlin Intelligent Document Processing (IDP), you can extract key information from PDFs, images, and other files using natural language queries, with a confidence score on each result. That means you can automate what’s reliable and flag what needs human review. No custom models. No scraping scripts.

What you’ll see

  • How Merlin IDP tokenizes uploaded documents for structured extraction

  • How queries run against document contents and return scored responses

  • How Merlin Analysis classifies document type based on file content

  • How results are written into a spreadsheet for downstream use

Transcript

Native AI capabilities within the Tray platform include Merlin Intelligent Document Processing.

We want to understand some unstructured data. Take this invoice, and we want to turn it into structured data, like this spreadsheet.

Let's upload an invoice file, just like the one we saw, to a Tray form and run some queries against it using Tray's Intelligent Document Processing capability, called Merlin IDP.

So we'll start with the Tray form. I've already got my invoice file that I just showed loaded here into the form. And I've written some queries here related to the contents of a typical invoice.

If I hit submit, we can see that the form says we've submitted successfully. And I can come back to this spreadsheet and see the population of those different queries.

The initial form serves as the kickoff for the automation in this example. But this could be a file uploaded to a specific folder or an email attachment coming in that kicks us off. And we're using a spreadsheet for this demonstration, but the data could be entered directly into a system of record.

Merlin IDP uses machine learning and provides a consistent payload with confidence intervals, so you can automate what you trust and bring a human in the loop when you deem a review necessary.
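
The "automate what you trust" idea can be sketched as a simple threshold check. This is a minimal sketch, assuming each result carries a query, an answer, and a confidence score between 0 and 1. The field names, the QueryResult type, and the 0.9 threshold are illustrative assumptions, not Merlin IDP's documented payload.

```python
from dataclasses import dataclass


# Hypothetical result shape; the real Merlin IDP payload may differ.
@dataclass
class QueryResult:
    query: str
    answer: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def route_results(results, threshold=0.9):
    """Split results into (automated, needs_review) by confidence."""
    automated, needs_review = [], []
    for result in results:
        if result.confidence >= threshold:
            automated.append(result)
        else:
            needs_review.append(result)
    return automated, needs_review


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        QueryResult("What is the invoice number?", "INV-001", 0.98),
        QueryResult("What is the total amount?", "1,250.00", 0.62),
    ]
    automated, needs_review = route_results(sample)
    print("Automated:", [r.query for r in automated])
    print("Needs review:", [r.query for r in needs_review])
```

The threshold is the one knob here: raising it sends more results to a human, lowering it automates more.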

So now we've seen these queries run in real time. They come in with the results directly from the document. They come in with those specific confidence intervals, as we mentioned, and you can see the classification is being done here as well. Let's take a look at the workflow that's actually powering this.

So, upon file submission, we can see that the logs ran here at the bottom, and the file came in. Then we use the file helper connector to create a version of the file that we can process with Merlin IDP.

We used Merlin to grab the markdown content, creating some textual data from unstructured data that we could use. We used Merlin Analysis to classify the document based on the markdown contents.

That's here. And we got the result: invoice. We split out the queries from the original field, the text field of the form.
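
Splitting the queries out of the form's text field could look something like this. It is a minimal sketch that assumes the queries were typed one per line. The demo doesn't show the actual delimiter, and split_queries is a hypothetical helper, not a Tray connector operation.

```python
def split_queries(raw_text):
    """Return one query per non-empty line, trimmed of whitespace.

    Assumes one query per line; the real form may use another delimiter.
    """
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Made-up form input, for illustration only.
    form_text = """What is the invoice number?
    What is the total amount due?

    Who is the vendor?
    """
    for query in split_queries(form_text):
        print(query)
```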

And then we ran those queries directly against the file with Merlin IDP again. And that's where we got the actual query results back. Then we pass it over to another automation, a sub-automation called a callable workflow. And that is what actually wrote into the Google Sheet.
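
The last step, turning query results into spreadsheet rows, can be sketched like this. It uses Python's csv module and an in-memory buffer as a stand-in for the Google Sheet. The column names and the dictionary keys are assumptions for illustration, not the layout used in the demo.

```python
import csv
import io

# Assumed column layout; the demo's actual sheet columns are not specified.
HEADER = ["document_type", "query", "answer", "confidence"]


def results_to_rows(document_type, results):
    """Build one row per query result, tagged with the document type."""
    return [
        [document_type, r["query"], r["answer"], r["confidence"]]
        for r in results
    ]


def write_rows(rows, out):
    """Write a header line followed by the rows as CSV to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    results = [
        {"query": "What is the invoice number?", "answer": "INV-001", "confidence": 0.98},
        {"query": "Who is the vendor?", "answer": "Acme Co", "confidence": 0.91},
    ]
    buffer = io.StringIO()
    write_rows(results_to_rows("invoice", results), buffer)
    print(buffer.getvalue())
```

Keeping the row-building separate from the writing mirrors the demo's split between the main workflow and the callable workflow that does the write.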

Merlin IDP is helpful for data extraction against multiple file types and is preconfigured in our retrieval-augmented generation accelerators.

Let's explore what's possible, together.
