Operations (sample payloads)

Main operations

Extract data

Extract text from a single image (JPG/PNG) or a multi-page PDF/TIFF document.

Sample Input

{
  "file": {
    "name": "invoice.pdf",
    "url": "https://example.com/files/invoice.pdf",
    "mime_type": "application/pdf",
    "expires": 1623456789
  },
  "features": {
    "layout": true,
    "forms": true,
    "tables": true,
    "signatures": true
  },
  "queries": [
    "What is the total amount?",
    "Who is the invoice issued to?"
  ],
  "include_bounding_boxes": true
}
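The request body above can be assembled programmatically. The following is a minimal sketch, not an official client: the helper name `build_extract_payload` and its `ttl_seconds` and `now` parameters are hypothetical, and it assumes that `expires` is a Unix timestamp in seconds, which the sample value suggests but the page does not state.

```python
import json
import time


def build_extract_payload(name, url, mime_type, queries=(), ttl_seconds=3600,
                          now=None):
    """Assemble a request body shaped like the sample input (hypothetical helper)."""
    # Assumption: "expires" is a Unix timestamp in seconds.
    now = int(time.time()) if now is None else int(now)
    return {
        "file": {
            "name": name,
            "url": url,
            "mime_type": mime_type,
            "expires": now + ttl_seconds,
        },
        # All four feature flags from the sample are switched on here.
        "features": {
            "layout": True,
            "forms": True,
            "tables": True,
            "signatures": True,
        },
        "queries": list(queries),
        "include_bounding_boxes": True,
    }


payload = build_extract_payload(
    "invoice.pdf",
    "https://example.com/files/invoice.pdf",
    "application/pdf",
    queries=["What is the total amount?", "Who is the invoice issued to?"],
    now=1623453189,  # fixed clock so the result is reproducible
)
print(json.dumps(payload, indent=2))
```

Passing a fixed `now` keeps the example deterministic; in real use it would be left out so the current time is used.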

Sample Output

{
  "totalPages": 2,
  "pages": [
    {
      "number": 1,
      "lines": [
        {
          "confidence": 0.98,
          "items": [
            {
              "confidence": 0.99,
              "text": "INVOICE",
              "bounding_box": {
                "top": 50,
                "left": 100,
                "width": 200,
                "height": 40
              }
            }
          ]
        }
      ],
      "layout": {
        "title": {
          "confidence": 0.99,
          "text": "INVOICE",
          "bounding_box": {
            "top": 50,
            "left": 100,
            "width": 200,
            "height": 40
          }
        },
        "header": {
          "confidence": 0.95,
          "text": "ABC Company",
          "bounding_box": {
            "top": 10,
            "left": 10,
            "width": 150,
            "height": 30
          }
        },
        "footer": {
          "confidence": 0.9,
          "text": "Page 1 of 2",
          "bounding_box": {
            "top": 750,
            "left": 450,
            "width": 100,
            "height": 20
          }
        },
        "page_number": {
          "confidence": 0.99,
          "text": "1",
          "bounding_box": {
            "top": 750,
            "left": 500,
            "width": 20,
            "height": 20
          }
        },
        "section_headers": [
          {
            "confidence": 0.97,
            "text": "Bill To:",
            "bounding_box": {
              "top": 150,
              "left": 50,
              "width": 100,
              "height": 30
            }
          }
        ],
        "lists": [
          {
            "confidence": 0.95,
            "items": [
              {
                "confidence": 0.96,
                "text": "Item 1",
                "bounding_box": {
                  "top": 300,
                  "left": 50,
                  "width": 100,
                  "height": 20
                }
              },
              {
                "confidence": 0.97,
                "text": "Item 2",
                "bounding_box": {
                  "top": 330,
                  "left": 50,
                  "width": 100,
                  "height": 20
                }
              }
            ],
            "bounding_box": {
              "top": 300,
              "left": 50,
              "width": 100,
              "height": 60
            }
          }
        ]
      }
    }
  ],
  "form_items": [
    {
      "pageNumber": 1,
      "confidence": 0.98,
      "key": {
        "confidence": 0.99,
        "text": "Invoice Number:",
        "bounding_box": {
          "top": 100,
          "left": 50,
          "width": 150,
          "height": 30
        }
      },
      "value": {
        "confidence": 0.99,
        "text": "INV-001",
        "bounding_box": {
          "top": 100,
          "left": 200,
          "width": 100,
          "height": 30
        }
      }
    }
  ],
  "tables": [
    {
      "pageNumber": 1,
      "confidence": 0.97,
      "header": [
        {
          "confidence": 0.98,
          "text": "Description",
          "rowIndex": 0,
          "columnIndex": 0,
          "rowSpan": 1,
          "columnSpan": 1,
          "bounding_box": {
            "top": 400,
            "left": 50,
            "width": 200,
            "height": 30
          }
        },
        {
          "confidence": 0.98,
          "text": "Amount",
          "rowIndex": 0,
          "columnIndex": 1,
          "rowSpan": 1,
          "columnSpan": 1,
          "bounding_box": {
            "top": 400,
            "left": 250,
            "width": 100,
            "height": 30
          }
        }
      ],
      "rows": [
        {
          "cells": [
            {
              "confidence": 0.99,
              "text": "Product A",
              "rowIndex": 1,
              "columnIndex": 0,
              "rowSpan": 1,
              "columnSpan": 1,
              "bounding_box": {
                "top": 430,
                "left": 50,
                "width": 200,
                "height": 30
              }
            },
            {
              "confidence": 0.99,
              "text": "$100.00",
              "rowIndex": 1,
              "columnIndex": 1,
              "rowSpan": 1,
              "columnSpan": 1,
              "bounding_box": {
                "top": 430,
                "left": 250,
                "width": 100,
                "height": 30
              }
            }
          ]
        }
      ],
      "bounding_box": {
        "top": 400,
        "left": 50,
        "width": 300,
        "height": 60
      }
    }
  ],
  "queries": [
    {
      "pageNumber": 1,
      "query": "What is the total amount?",
      "confidence": 0.95,
      "text": "The total amount is $100.00",
      "bounding_box": {
        "Width": 0.3,
        "Height": 0.05,
        "Left": 0.6,
        "Top": 0.8
      }
    },
    {
      "pageNumber": 1,
      "query": "Who is the invoice issued to?",
      "confidence": 0.93,
      "text": "The invoice is issued to XYZ Corporation",
      "bounding_box": {
        "Width": 0.4,
        "Height": 0.05,
        "Left": 0.1,
        "Top": 0.2
      }
    }
  ],
  "signatures": [
    {
      "pageNumber": 2,
      "confidence": 0.9,
      "bounding_box": {
        "top": 700,
        "left": 400,
        "width": 150,
        "height": 50
      }
    }
  ]
}
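To show how a response like the sample output might be consumed, here is a small sketch that flattens `form_items` into a key/value mapping and turns a table into a list of rows. The helper names `form_items_to_dict` and `table_to_rows` are hypothetical, and the embedded `response` is a trimmed copy of the sample with bounding boxes and some fields left out. Ordering cells by `columnIndex` is an assumption about how the indices are meant to be used.

```python
def form_items_to_dict(response):
    """Map each form key's text to its value's text (hypothetical helper)."""
    return {
        item["key"]["text"]: item["value"]["text"]
        for item in response.get("form_items", [])
    }


def table_to_rows(table):
    """Return the header texts followed by each row's cell texts."""
    def by_column(cell):
        # Assumption: columnIndex gives the left-to-right cell order.
        return cell["columnIndex"]

    header = [cell["text"] for cell in sorted(table["header"], key=by_column)]
    body = [
        [cell["text"] for cell in sorted(row["cells"], key=by_column)]
        for row in table["rows"]
    ]
    return [header] + body


# Trimmed copy of the sample output (bounding boxes omitted for brevity).
response = {
    "form_items": [
        {
            "pageNumber": 1,
            "confidence": 0.98,
            "key": {"confidence": 0.99, "text": "Invoice Number:"},
            "value": {"confidence": 0.99, "text": "INV-001"},
        },
    ],
    "tables": [
        {
            "pageNumber": 1,
            "confidence": 0.97,
            "header": [
                {"text": "Description", "rowIndex": 0, "columnIndex": 0},
                {"text": "Amount", "rowIndex": 0, "columnIndex": 1},
            ],
            "rows": [
                {
                    "cells": [
                        {"text": "Product A", "rowIndex": 1, "columnIndex": 0},
                        {"text": "$100.00", "rowIndex": 1, "columnIndex": 1},
                    ]
                },
            ],
        },
    ],
}

fields = form_items_to_dict(response)
rows = table_to_rows(response["tables"][0])
print(fields)
print(rows)
```

If a form contains the same key text more than once, later entries overwrite earlier ones in this sketch; a list of pairs would preserve duplicates.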

DDL operations

Extract HTML (DDL)

Note that DDL operations can only be called directly through the Connectors API, or from CustomJS in the Embedded solution editor (for example, for DDL-dependent data mapping).

Extract text structure as HTML from a single image (JPG/PNG) or a multi-page PDF/TIFF document.

Sample Input

{
  "file": {
    "name": "sample_document.pdf",
    "url": "https://example.com/files/sample_document.pdf",
    "mime_type": "application/pdf",
    "expires": 1623456789
  },
  "skipElements": {
    "layout_header": false,
    "layout_footer": true,
    "layout_page_number": true,
    "layout_title": false,
    "layout_section_header": false,
    "layout_table": false,
    "layout_figure": false
  }
}
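The `skipElements` object lists every layout element with an explicit boolean. A sketch of building it from a set of names follows. The helper `build_skip_elements` is hypothetical, and it assumes that `true` means the element is left out of the extracted result, which the field name suggests but this page does not spell out.

```python
# Element names exactly as they appear in the sample input.
LAYOUT_ELEMENTS = (
    "layout_header",
    "layout_footer",
    "layout_page_number",
    "layout_title",
    "layout_section_header",
    "layout_table",
    "layout_figure",
)


def build_skip_elements(skip):
    """Return a skipElements object with True for each name in `skip`."""
    unknown = set(skip) - set(LAYOUT_ELEMENTS)
    if unknown:
        # Fail early on typos instead of sending an unexpected key.
        raise ValueError(f"unknown layout elements: {sorted(unknown)}")
    # Assumption: True means "skip this element in the output".
    return {name: name in skip for name in LAYOUT_ELEMENTS}


skip_elements = build_skip_elements({"layout_footer", "layout_page_number"})
print(skip_elements)
```

Listing every element explicitly, rather than only the skipped ones, mirrors the shape of the sample input.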

Sample Output

{
  "html": "<html><body><h1>Sample Document Title</h1><p>This is the first paragraph of the sample document.</p><h2>Section 1</h2><p>Here's some content for section 1.</p><table><tr><th>Column 1</th><th>Column 2</th></tr><tr><td>Data 1</td><td>Data 2</td></tr></table><h2>Section 2</h2><p>Here's some content for section 2.</p><img src='sample_image.jpg' alt='Sample image'><p>This is the last paragraph of the sample document.</p></body></html>"
}
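Because the result is a single HTML string, downstream code typically has to parse it before mapping data. As one possible approach, the sketch below collects the headings with Python's standard `html.parser` module. The class name `HeadingCollector` is hypothetical, and the `result` value is a shortened version of the sample output.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class HeadingCollector(HTMLParser):
    """Collect (tag, text) pairs for every heading in the document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # (tag, text parts) while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = (tag, [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            text = "".join(self._current[1]).strip()
            self.headings.append((tag, text))
            self._current = None


# Shortened version of the sample output.
result = {
    "html": "<html><body><h1>Sample Document Title</h1>"
            "<p>This is the first paragraph of the sample document.</p>"
            "<h2>Section 1</h2><p>Here's some content for section 1.</p>"
            "<h2>Section 2</h2></body></html>"
}

collector = HeadingCollector()
collector.feed(result["html"])
collector.close()
print(collector.headings)
```

The same pattern extends to tables by tracking `tr`, `th`, and `td` tags instead of headings.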

Extract markdown (DDL)

Note that DDL operations can only be called directly through the Connectors API, or from CustomJS in the Embedded solution editor (for example, for DDL-dependent data mapping).

Extract text structure as Markdown from a single image (JPG/PNG) or a multi-page PDF/TIFF document.

Sample Input

{
  "file": {
    "name": "sample_document.pdf",
    "url": "https://example.com/files/sample_document.pdf",
    "mime_type": "application/pdf",
    "expires": 1623456789
  },
  "skipElements": {
    "layout_header": false,
    "layout_footer": true,
    "layout_page_number": true,
    "layout_title": false,
    "layout_section_header": false,
    "layout_table": false,
    "layout_figure": false
  }
}

Sample Output

{
  "markdown": "# Sample Document Title\n\n## Introduction\n\nThis is a sample document to demonstrate the extraction of text structure as Markdown.\n\n### Section 1\n\nHere's some content for the first section.\n\n- Bullet point 1\n- Bullet point 2\n- Bullet point 3\n\n### Section 2\n\nHere's a table:\n\n| Column 1 | Column 2 | Column 3 |\n|----------|----------|----------|\n| Data 1 | Data 2 | Data 3 |\n| Data 4 | Data 5 | Data 6 |\n\n## Conclusion\n\nThis concludes our sample document."
}
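The Markdown string keeps the document structure in its `#` headings, so a quick outline can be recovered with a regular expression. This is a rough sketch rather than a full Markdown parser: the function name `markdown_outline` is hypothetical, the input is a shortened version of the sample output, and lines starting with `#` inside fenced code blocks would be miscounted as headings.

```python
import re

# One to six '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_outline(markdown):
    """Return (level, title) pairs for each ATX-style heading line."""
    outline = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2).strip()))
    return outline


# Shortened version of the sample output.
result = {
    "markdown": "# Sample Document Title\n\n## Introduction\n\n"
                "This is a sample document.\n\n### Section 1\n\n"
                "- Bullet point 1\n- Bullet point 2\n\n"
                "| Column 1 | Column 2 |\n|----------|----------|\n"
                "| Data 1 | Data 2 |\n\n## Conclusion\n\n"
                "This concludes our sample document."
}

outline = markdown_outline(result["markdown"])
print(outline)
```

For anything beyond headings, such as tables or nested lists, a dedicated Markdown library would be a safer choice than regular expressions.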