Merlin Extract (Beta)

The Merlin Extract connector extracts specific information from documents in PDF, TIFF, PNG, and JPEG formats. It uses natural language processing, so you can enter queries in plain language to retrieve the data you need.

During the beta phase, the native AI capabilities of this connector are charged as a single task.

This connector is pending deprecation. All of its functionality has been migrated to the Merlin IDP connector.

Operations
Extract text (image/pdf)
The workflow below demonstrates the Merlin Extract connector's Extract text operation.

The operation extracts text from a single image (JPEG or PNG) or a multi-page file (PDF or TIFF).

For demonstration purposes, we use a sample invoice file in PDF format.

The Merlin Extract connector can process PDFs of up to 20 pages. If you need to process larger PDFs, ask in the community or contact your account representative for assistance.

The operation accepts the following parameters:

  1. File name: The name of the file to be processed.

  2. File url: The web address where the file is located.

  3. File mime_type: Specifies the Multipurpose Internet Mail Extensions (MIME) type of the file to help the AI determine how to handle it. Supported MIME types include:

    • JPEG: image/jpeg

    • PNG: image/png

    • PDF: application/pdf

    • TIFF: image/tif

  4. File expire: Indicates when the file or the URL access will expire.

  5. Queries: The natural-language questions that specify which information to extract from the document. Each query targets one piece of data, so the operation returns only what you ask for rather than the full text.

In this example, we use the following queries to extract the required information:

  1. What is the Price for Proposal Design?

  2. What is the invoice date?

  3. What is the subtotal of an invoice?
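
The parameters and queries above could be assembled and checked before they are passed to the operation. The following is a minimal Python sketch, not the connector's actual API: the function name `build_extract_input`, the key names in the returned dictionary, the file URL, the expiry timestamp format, and the rule that at least one query is required are all assumptions made for illustration. Only the list of MIME types and the three queries are taken from this page.

```python
from urllib.parse import urlparse

# MIME types listed as supported on this page (copied as written there).
SUPPORTED_MIME_TYPES = {
    "image/jpeg",
    "image/png",
    "application/pdf",
    "image/tif",
}


def build_extract_input(name, url, mime_type, expires, queries):
    """Assemble a hypothetical input object for the Extract text operation."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type: {mime_type}")

    # Require an absolute http(s) URL so the file can be fetched.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"File URL must be an absolute http(s) URL: {url}")

    # Drop blank queries; assume (not documented) that one is required.
    cleaned = [q.strip() for q in queries if q.strip()]
    if not cleaned:
        raise ValueError("At least one query is required")

    # Key names below are illustrative, not the connector's real schema.
    return {
        "file": {
            "name": name,
            "url": url,
            "mime_type": mime_type,
            "expires": expires,
        },
        "queries": cleaned,
    }


example_input = build_extract_input(
    name="sample-invoice.pdf",
    url="https://example.com/files/sample-invoice.pdf",  # placeholder URL
    mime_type="application/pdf",
    expires="2030-01-01T00:00:00Z",  # placeholder; format is an assumption
    queries=[
        "What is the Price for Proposal Design?",
        "What is the invoice date?",
        "What is the subtotal of an invoice?",
    ],
)
```

Validating the MIME type up front means an unsupported file is rejected before the operation runs.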

The extracted data can be processed further based on your requirements. In this example, we are storing the extracted data in a Google Sheet.
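
As one way to prepare the extracted data for a spreadsheet, the sketch below turns a set of answers into a single row whose columns follow the order of the queries. The shape of the operation's output is not documented on this page, so the code assumes a plain mapping from query text to answer. The helper name `answers_to_row` is hypothetical and the answer values are placeholders, not real output.

```python
def answers_to_row(queries, answers):
    """Return one spreadsheet row: answers in query order, blank if missing."""
    row = []
    for query in queries:
        value = answers.get(query)
        row.append("" if value is None else str(value))
    return row


queries = [
    "What is the Price for Proposal Design?",
    "What is the invoice date?",
    "What is the subtotal of an invoice?",
]

# Placeholder answers in an assumed query-to-answer mapping.
answers = {
    queries[0]: "<price>",
    queries[1]: "<date>",
}

header = list(queries)                  # first row of the sheet
row = answers_to_row(queries, answers)  # one row per processed document
```

Keeping the columns in query order means every processed document produces a row that lines up with the same header, and a query with no answer leaves its cell blank.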