The Aquaforest PDF connector provides a group of actions for Office 365 and Power Automate that perform PDF operations such as splitting, text extraction, barcode extraction and OCR.
Status: Preview |
Tier: Premium |
Version: 1.0 |
Name |
Summary |
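Get-barcode-value ([Optional]GetBarcode getBarcode) |
Get barcode value |
Get-pdf-information ([Optional]GetPDFInfoRequest getPDFInfoRequest) |
Get PDF properties |
Get-data-from-pdf ([Optional]GetDataFromPDFRequest getDataFromPDFRequest) |
Get data from PDF |
Get-text-value ([Optional]GetText getText) |
Get text from PDF |
Ocr-file-to-pdf ([Optional]ocr_data ocr_data) |
OCR PDF or images |
Split-by-barcode ([Optional]SplitBarcode splitBarcode) |
Split PDF by barcode |
Split-by-text ([Optional]SplitText splitText) |
Split PDF by text match |
Extract-by-barcode ([Optional]ExtractBarcode extractBarcode) |
Extract PDF pages by barcode |
Extract-by-text ([Optional]ExtractText extractText) |
Extract PDF pages by text |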
Split-by-page-range ([Optional]SplitPdfByPageDefinition splitPdfByPage) |
Split PDF by page |
GetDataSchema (string operation, [Optional]string expectedKeys) |
Gets the item schema of the selected list |
Summary: Get barcode value
Description: Gets barcode values from a PDF file. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.
Syntax:
AquaforestPDF.Get-barcode-value ([Optional]GetBarcode getBarcode)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
getBarcode |
|
|
False |
Returns:
Type:ApiRenameByBarcodePost200ApplicationJsonResponse
Summary: Get PDF properties
Description: Gets the information about a PDF file
Syntax:
AquaforestPDF.Get-pdf-information ([Optional]GetPDFInfoRequest getPDFInfoRequest)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
getPDFInfoRequest |
|
|
False |
Returns:
Type:GetPDFInfoResponse
Summary: Get data from PDF
Description: This action will extract important data from PDF files in the form of Key/Value pairs.
Syntax:
AquaforestPDF.Get-data-from-pdf ([Optional]GetDataFromPDFRequest getDataFromPDFRequest)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
getDataFromPDFRequest |
|
|
False |
Returns:
Type:GetPDFDataDynamicResponseSchema
Summary: Get text from PDF
Description: Gets text from PDF files based on text location and regular expressions. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.
Syntax:
AquaforestPDF.Get-text-value ([Optional]GetText getText)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
getText |
|
|
False |
Returns:
Type:ApiGetTextValueJsonResponse
Summary: OCR PDF or images
Description: Generates a searchable PDF from an image-only PDF or scanned images. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.
Syntax:
AquaforestPDF.Ocr-file-to-pdf ([Optional]ocr_data ocr_data)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
ocr_data |
|
Parameters for OCR operation |
False |
Returns:
Type:ocr_response
Description: Response data for OCR operation
Summary: Split PDF by barcode
Description: Splits PDF files based on barcode matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for documentation. In addition, the Aquaforest Zonal Extraction Tool is available at [https://www.aquaforest.com/en/zone/get-pdf-zone.html].
Syntax:
AquaforestPDF.Split-by-barcode ([Optional]SplitBarcode splitBarcode)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
splitBarcode |
|
|
False |
Returns:
Type:ApiSplitPost200ApplicationJsonResponse
Summary: Split PDF by text match
Description: Splits PDF files based on text matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for documentation. In addition, the Aquaforest Zonal Extraction Tool is available at [https://www.aquaforest.com/en/zone/get-pdf-zone.html].
Syntax:
AquaforestPDF.Split-by-text ([Optional]SplitText splitText)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
splitText |
|
|
False |
Returns:
Type:ApiSplitPost200ApplicationJsonResponse
Summary: Extract PDF pages by barcode
Description: Extracts pages from PDF files based on barcode matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.
Syntax:
AquaforestPDF.Extract-by-barcode ([Optional]ExtractBarcode extractBarcode)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
extractBarcode |
|
|
False |
Returns:
Type:ApiExtractPost200ApplicationJsonResponse
Summary: Extract PDF pages by text
Description: Extracts pages from PDF files based on text matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.
Syntax:
AquaforestPDF.Extract-by-text ([Optional]ExtractText extractText)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
extractText |
|
|
False |
Returns:
Type:ApiExtractPost200ApplicationJsonResponse
Summary: Split PDF by page
Description: Splits PDF files based on split options defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for documentation.
Syntax:
AquaforestPDF.Split-by-page-range ([Optional]SplitPdfByPageDefinition splitPdfByPage)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
splitPdfByPage |
|
|
False |
Returns:
Type:ApiSplitPost200ApplicationJsonResponse
Summary: Gets the item schema of the selected list
Description: Gets the schema of the selected list
Syntax:
AquaforestPDF.GetDataSchema (string operation, [Optional]string expectedKeys)
Parameters:
Name |
Type |
Summary |
Required |
Related Action |
operation |
string
|
|
True |
|
expectedKeys |
string
|
|
False |
Returns:
Summary:
Description:
Properties:
Name |
Type |
Summary |
ErrorMessage |
string Error |
If the value of Is Successful is false, we will return an Error Message |
IsSuccessful |
boolean Is Successful |
This will return true if at least one page was extracted |
LicenceInfo |
string License Info |
Information about your API subscription key |
SplittedFile |
array of (SplittedFileItem) Extract Output Files |
Array of Extracted Files |
Summary:
Description:
Properties:
Name |
Type |
Summary |
SplitFileContent |
string(byte) (File Content) |
A base 64 string representing the File Content |
SplitFileName |
string (File Name) |
A string containing the generated File Name |
pageNumber |
string (Page Number) |
The page range containing the page number where the extraction occurred |
Summary:
Description:
Properties:
Name |
Type |
Summary |
ErrorMessage |
string Error Message |
If the value of Is Successful is false, we will return an Error Message |
IsSuccessful |
boolean Is Successful |
If the Text was matched successfully |
LicenceInfo |
string License Info |
Information about your API subscription key |
TextResult |
string Text Result |
A string generated from applying the extracted text to the Text Result Template provided. Note: if the page count is greater than one, the results for all pages are concatenated using the Page Separator. |
TextResults |
array of (TextResultsItem) Results |
An array containing a list of pages and the extracted text values |
Summary:
Description:
Properties:
Name |
Type |
Summary |
pageNumber |
string (Page Number) |
The page where the text was found |
valueExtracted |
string (Page Text) |
A string generated from applying the extracted text to the Text Result Template provided. |
zoneValues |
array of (string) (Zone Values) |
An array containing the text extracted from each zone. |
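As a rough illustration of how a caller might consume the text response described above, the Python sketch below works on an already-parsed JSON body. The field names are taken from the tables above; the helper name `page_values`, the sample data and the choice to raise `ValueError` on failure are assumptions made for illustration and are not part of the connector.

```python
from typing import Any, Dict, List, Tuple


def page_values(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (pageNumber, valueExtracted) pairs from a parsed response.

    Raises ValueError carrying the reported ErrorMessage when
    IsSuccessful is false.
    """
    if not response.get("IsSuccessful", False):
        raise ValueError(response.get("ErrorMessage") or "Text extraction failed")
    results = response.get("TextResults") or []
    return [(item["pageNumber"], item["valueExtracted"]) for item in results]


# Hypothetical sample shaped like the response tables above.
sample = {
    "IsSuccessful": True,
    "ErrorMessage": "",
    "TextResult": "INV-001|INV-002",
    "TextResults": [
        {"pageNumber": "1", "valueExtracted": "INV-001", "zoneValues": ["INV-001"]},
        {"pageNumber": "2", "valueExtracted": "INV-002", "zoneValues": ["INV-002"]},
    ],
}
```

Checking IsSuccessful before reading the results keeps the error path explicit, since ErrorMessage is only meaningful when IsSuccessful is false.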
Summary:
Description:
Properties:
Name |
Type |
Summary |
BarcodeResult |
string Barcode |
A string generated from applying the extracted text to the Barcode Result Template provided. Note: if the page count is greater than one, the results for all pages are concatenated using the Page Separator. |
BarcodeResults |
array of (BarcodeResultsItem) Results |
An array containing a list of pages and the extracted barcode values |
ErrorMessage |
string Error Message |
If the value of Is Successful is false, we will return an Error Message |
IsSuccessful |
boolean Is Successful |
If a barcode was detected |
LicenceInfo |
string License Info |
Information about your API subscription key |
Summary:
Description:
Properties:
Name |
Type |
Summary |
pageNumber |
string (Page Number) |
The page where the barcode was found |
valueExtracted |
string (Page Barcode) |
A string generated from applying the extracted barcode value to the barcode Result Template provided. |
zoneValues |
array of (string) (Zone Values) |
An array containing the barcode extracted from each zone. |
Summary:
Description:
Properties:
Name |
Type |
Summary |
ErrorMessage |
string Error Message |
If the value of Is Successful is false, we will return an Error Message |
IsSuccessful |
boolean Is Successful |
This will return true if at least one split page was matched. |
LicenceInfo |
string License Info |
Information about your API subscription key |
SplittedFile |
array of (SplittedFileItem) Split Output Files |
Array containing each of the split files together with details like the generated file name and page number. |
Summary:
Description:
Properties:
Name |
Type |
Summary |
SplitFileContent |
string(byte) (File Content) |
A base 64 string representing the File Content |
SplitFileName |
string (File Name) |
A string containing the generated File Name |
pageNumber |
string (Page Range) |
The page range containing the page numbers of the split operation |
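Because SplitFileContent is described as a base 64 string, a caller normally has to decode it before saving or forwarding the split file. The sketch below shows one way to do that in Python on an already-parsed response; the function name `decode_split_files` and the error handling are assumptions for illustration only.

```python
import base64
from typing import Any, Dict


def decode_split_files(response: Dict[str, Any]) -> Dict[str, bytes]:
    """Map each SplitFileName to the decoded bytes of SplitFileContent."""
    if not response.get("IsSuccessful", False):
        raise ValueError(response.get("ErrorMessage") or "Split failed")
    files: Dict[str, bytes] = {}
    for item in response.get("SplittedFile") or []:
        # SplitFileContent is documented as a base 64 string.
        files[item["SplitFileName"]] = base64.b64decode(item["SplitFileContent"])
    return files
```

The returned dictionary can then be written to disk or passed to a later step; if two items share a file name, the later one overwrites the earlier one in this sketch.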
Summary:
Description:
Properties:
Name |
Type |
Summary |
fileContent |
string(byte) File Content |
The content of the source file |
fileNameTemplate |
string File Name Template |
Template for the output file if barcode is found |
noTextFileName |
string No File Template |
Template for the output file if no barcode is found |
sourceFileName |
string File Name |
The name of the source file |
zones |
array of (ZonesItem) Barcode |
List of variables that can be used to extract barcode information from PDF files |
Summary:
Description:
Properties:
Name |
Type |
Summary |
barcodeFormats |
array of (string) Type |
Specify the types of Barcode you want to identify |
location |
string Location |
Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html] |
regex |
string Pattern |
If a regular expression is provided here, we will match any extracted barcode to it and return the match. |
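The Pattern property can be pictured with a small local sketch: a regular expression is applied to the extracted value and only the matching part is kept. The Python below is an illustration of that idea, not the service's implementation; the helper name `apply_pattern`, the behaviour when nothing matches (returning `None`) and the use of Python's `re` dialect are all assumptions, and the connector's own regex dialect may differ.

```python
import re
from typing import Optional


def apply_pattern(extracted: str, pattern: Optional[str]) -> Optional[str]:
    """Return the first regex match in `extracted`.

    When no pattern is given the raw value is returned unchanged;
    when the pattern does not match, None is returned (an assumption).
    """
    if not pattern:
        return extracted
    match = re.search(pattern, extracted)
    return match.group(0) if match else None
```

For instance, a pattern of `\d{5}` applied to the hypothetical barcode value `PO-12345-UK` would keep only the five-digit run.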
Summary:
Description:
Properties:
Name |
Type |
Summary |
fileContent |
string(byte) File Content |
The content of the source file |
fileNameTemplate |
string File Name Template |
Template for the output file if the text matches are found |
noTextFileName |
string No File Template |
Template for the output file if no text match is found |
sourceFileName |
string File Name |
The name of the source file |
zones |
array of (ZonesItem) Text |
List of variables that can be used to extract text information from PDF files |
Summary:
Description:
Properties:
Name |
Type |
Summary |
expression |
array of (string) Value |
Provide one or more values to be used with the Select (position) property. The first text value that matches the selected rule is returned. |
location |
string Location |
Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html] |
position |
string Select |
Use this to further refine the text you extract by selecting the option that matches your requirements. Values: [text in zone, word after value, word before value, all text in line after value, all text in line before value, all text in zone after value, all text in zone before value] |
regex |
string Pattern |
If a regular expression is provided here, we will match any extracted text to it and return the match. |
Summary:
Description:
Properties:
Name |
Type |
Summary |
barcodeResultTemplate |
string Barcode Result Template |
Template for the output text result if a barcode is found |
fileContent |
string(byte) File Content |
The content of the source file |
noBarcodeTemplate |
string No Barcode Template |
Template for the output text result if no barcode is found |
pageSeparator |
string Page Separator |
Provide a page separator so that you can know where the page breaks are. |
pagerange |
string Pages |
Provide the page range you want to extract text from. This can be a single page number (1), multiple page numbers separated by commas (1,2,3), a page range (1-4) or a mixture of these (1,2,4-7). |
sourceFileName |
string File Name |
The name of the source file |
zones |
array of (ZonesItem) Barcode |
List of variables that can be used to extract barcode information from PDF files |
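The Pages syntax described above (single pages, comma-separated pages, ranges, or a mixture) can be validated locally before a request is sent. The following Python sketch expands such a string into page numbers; the function name `parse_page_range`, the rejection of reversed ranges and the assumption that page numbers start at 1 are illustrative choices, not documented connector behaviour.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a page range string such as '1,2,4-7' into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as '4-7' covers both end points.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: pages are numbered from 1.
    if any(page < 1 for page in pages):
        raise ValueError("Page numbers start at 1")
    return sorted(pages)
```

Running a check like this first gives a clear local error for a malformed range instead of relying on the ErrorMessage returned by the action.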
Summary:
Description:
Properties:
Name |
Type |
Summary |
barcodeFormats |
array of (string) Type |
Specify the types of Barcode you want to identify |
location |
string Location |
Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html] |
pagenumber |
integer Page (Deprecated) |
This property is deprecated; we advise you to use the Pages property instead. The Pages property applies to all zones and allows you to select the pages you want to process. |
regex |
string Pattern |
If a regular expression is provided here, we will match any extracted text to it and return the match. |
Summary:
Description:
Properties:
Name |
Type |
Summary |
advancedFlags |
array of (AdvancedFlagsItem) Advanced Settings |
This is used to pass advanced parameters to this action. Do not use this property unless directed by our support team. |
confidenceScore |
number Confidence Score |
Set a higher confidence score to filter out values with lower confidence. You can set any value between 0 and 1. We recommend starting from 0.5 |
dateAsISO |
string Date Conversion |
Select the format in which date values are returned. Values: [Do not convert date values, ISO conversion (DMY input assumed), ISO conversion (MDY input assumed)] |
expectedKeys |
string Expected Keys |
Provide one key name per line to make values available to later actions without parsing JSON. |
fileContent |
string(byte) File Content |
The content of the source file |
pageLimit |
integer Page Limit |
Maximum number of pages to be processed |
pageRange |
string Page Range |
A string representation of the page numbers you want to process, e.g. 1,3-4 |
stripCurrencySymbol |
boolean Strip Currency Symbol |
Set this to true if you want the symbols and strings to be removed before we return currency values |
synonym |
boolean Match Synonym |
Set this to true if you want us to return all the keys that are synonyms to the expected key. |
synonymDictionary |
string Synonym Dictionary |
You can provide a JSON array of “entry” objects, where each object contains a list of synonyms in an array. For instance, if you want “Invoice No” and “Invoice Number” (case-insensitive) to be interpreted as the same key, use the following JSON: [{'entry': [ 'Invoice No', 'invoice number' ]}] |
trimSymbols |
boolean Trim Symbols |
Set this to true if you want us to remove all leading and trailing symbols from the keys found before we match them to an expected key. |
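To make the Expected Keys and Synonym Dictionary formats above more concrete, the sketch below assembles a request body in Python. The property names come from the table; the function name `build_get_data_request`, the selection of properties and the sample values are assumptions for illustration. Note that `json.dumps` emits double-quoted JSON, whereas the example in the table uses single quotes; whether the service accepts both forms is not confirmed here.

```python
import base64
import json
from typing import Any, Dict, List


def build_get_data_request(
    pdf_bytes: bytes,
    expected_keys: List[str],
    synonym_groups: List[List[str]],
    confidence_score: float = 0.5,
) -> Dict[str, Any]:
    """Assemble a request body for the 'Get data from PDF' action."""
    if not 0 <= confidence_score <= 1:
        raise ValueError("confidenceScore must be between 0 and 1")
    return {
        # File content is sent as a base 64 string.
        "fileContent": base64.b64encode(pdf_bytes).decode("ascii"),
        # One key name per line, as the Expected Keys summary asks.
        "expectedKeys": "\n".join(expected_keys),
        "confidenceScore": confidence_score,
        # A JSON array of "entry" objects, each holding a list of synonyms.
        "synonymDictionary": json.dumps(
            [{"entry": group} for group in synonym_groups]
        ),
    }
```

A call such as `build_get_data_request(data, ["Invoice No", "Total"], [["Invoice No", "invoice number"]])` would treat the two invoice spellings as one key group, mirroring the example in the Synonym Dictionary summary.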
Summary:
Description:
Properties:
Name |
Type |
Summary |
settingName |
string Name |
Enter the name of the setting here |
settingValue |
string Value |
Enter the value of the setting here. |
Summary:
Description:
Properties:
Name |
Type |
Summary |
Summary:
Description:
Properties:
Name |
Type |
Summary |
Summary:
Description:
Properties:
Name |
Type |
Summary |
fileContent |
string(byte) File Content |
The content of the source file |
pageLimit |
integer Page Limit |
Maximum number of pages to be processed. This is only used when checking whether pages contain hidden text or whether the PDF is searchable. |
Summary:
Description:
Properties:
Name |
Type |
Summary |
AllowAssembly |
boolean Allow Assembly |
Allow rotation, insertion or deletion of pages. |
AllowDegradedPrinting |
boolean Allow Degraded Printing |
Allow low-quality printing. |
AllowExtractContents |
boolean Allow Extract Contents |
Allow extraction of text and graphics. |
AllowExtractForAccessibility |
boolean Allow Extract For Accessibility |
Allow extraction of text and graphics in support of accessibility. |
AllowFillInForm |
boolean Allow Fill In Form |
Allow filling of form fields. |
AllowModifyAnnotations |
boolean Allow Modify Annotations |
Allow modification of annotations. |
AllowModifyContents |
boolean Allow Modify Contents |
Allow modification of contents. |
AllowPrinting |
boolean Allow Printing |
Allow high-quality printing. |
Author |
string Author |
Who created the document. |
CreationDate |
string Creation Date |
This is the date and time the PDF was created. |
Creator |
string Creator |
The originating application or library. |
ErrorMessage |
string Error message |
If the value of Is Successful is false, we will return an Error Message |
FileSize |
number File Size (bytes) |
The size of the file in bytes |
HasHiddenText |
boolean Has Hidden Text |
This will return true if the PDF file has an OCR layer. |
IsEncrypted |
boolean Is Encrypted |
This will return true if the document is encrypted. |
IsSearchable |
boolean Is Searchable |
This will return true if the PDF file is searchable. |
IsSuccessful |
boolean Is Successful |
Returns true if the action was successful. |
Keywords |
string Keywords |
Keywords can be comma separated. |
LicenceInfo |
string License Info |
JSON summary of your subscription quota. |
ModifiedDate |
string Modified Date |
This property represents the date and time the PDF was last modified |
NumberofPages |
integer Number of Pages |
The number of pages in the PDF file. |
PDFversion |
number PDF Version |
The version of the PDF specification the document was built against. |
Producer |
string Producer |
The product that created the PDF. In the early days of PDF people would use a Creator application like Microsoft Word to write a document, print it to a PostScript file and then the Producer would be Acrobat Distiller, the application that converted the PostScript file to a PDF. Nowadays Creator and Producer are often the same or one field is left blank. |
Subject |
string Subject |
What the document is about. |
Title |
string Title |
The title of the document. |
Trapped |
string Trapped |
This property is a Boolean value that indicates whether the document has been trapped. Trapping is a pre-press process which introduces color areas into color separations in order to obscure potential register errors. |
XmpMetadata |
string XMP Metadata |
The Extensible Metadata Platform (XMP) is an ISO standard, originally created by Adobe Systems Inc., for the creation, processing and interchange of standardized and custom metadata for digital documents and data sets. |
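One possible use of the properties above is to decide whether a file should be sent to the OCR action at all. The Python sketch below encodes one such policy on an already-parsed response: skip OCR when the file is already searchable or already has a hidden text (OCR) layer. The helper name `needs_ocr` and the policy itself are assumptions for illustration; a real flow may need different rules.

```python
from typing import Any, Dict


def needs_ocr(info: Dict[str, Any]) -> bool:
    """Return True when the PDF appears to lack searchable text.

    Raises ValueError with the reported ErrorMessage when the
    'Get PDF properties' call was not successful.
    """
    if not info.get("IsSuccessful", False):
        raise ValueError(info.get("ErrorMessage") or "Get PDF properties failed")
    already_has_text = info.get("IsSearchable", False) or info.get("HasHiddenText", False)
    return not already_has_text
```

Keep in mind that the request's Page Limit bounds how many pages are inspected for hidden text and searchability, so the result reflects only the pages that were checked.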
Summary:
Description:
Properties:
Name |
Type |
Summary |
fileContent |
string(byte) File Content |
The content of the source file |
noTextTemplate |
string No Text Match Template |
Template for the text to be returned if a match is not found |
pageSeparator |
string Page Separator |
Provide a page separator so that you can know where the page breaks are. |
pagerange |
string Pages |
Provide the page range you want to extract text from. This can be a single page number (1), multiple page numbers separated by commas (1,2,3), a page range (1-4) or a mixture of these (1,2,4-7). |
sourceFileName |
string File Name |
The name of the source file |
textResultTemplate |
string Text Result Template |
Template for the text to be returned if a match is found |
zones |
array of (ZonesItem) Text |
List of variables that can be used to extract text information from PDF files |
Summary:
Description:
Properties:
Name |
Type |
Summary |
expression |
array of (string) Value |
Provide one or more values to be used with the Select (position) property. The first text value that matches the selected rule is returned. |
location |
string Location |
Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html] |
pagenumber |
integer Page (Deprecated) |
This property is deprecated; we advise you to use the Pages property instead. The Pages property applies to all zones and allows you to select the pages you want to process. |
position |
string Select |
Use this to further refine the text you extract by selecting the option that matches your requirements. Values: [text in zone, word after value, word before value, all text in line after value, all text in line before value, all text in zone after value, all text in zone before value] |
regex |
string Pattern |
If a regular expression is provided here, we will match any extracted text to it and return the match. |
Summary:
Description: Parameters for OCR operation
Properties:
Name |
Type |
Summary |
aquaforestImageTimeout |
integer(int32) AquaforestImageTimeout |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
author |
string Author |
Set a custom Author in the output PDF document properties. |
autorotate |
boolean Auto-rotate |
Auto-rotate the image; this ensures all text is oriented normally. |
binarize |
integer(int32) Binarize |
This value should generally only be used under guidance from technical support. It can control the way that color images are processed and force binarization with a particular threshold. A value of 200 has been shown to generally give good results in testing, but this should be confirmed with "typical" customer documents. By setting this to -1 an alternative method is used which will attempt to separate the text from any background images or colors. This can give improved OCR results for certain documents such as newspaper and magazine pages. |
blackPixelLimit |
number(float) Black pixel limit |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
blankPageThreshold |
integer(int32) Blank page threshold |
Use this to set the minimum number of "On Pixels" that must be present in the image for a page not to be considered blank. A value of -1 will turn off blank page detection. |
boxSize |
integer(int32) Box size |
This option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (in pixels and defined by this property). This option is currently only applied for bi-tonal images. |
convertToTiff |
boolean ConvertToTiff |
Each page in the PDF document is rasterized to a TIFF image. |
createProcess |
boolean CreateProcess |
Set this to true if you want to launch the process through P/Invoke. |
creationDate |
string Creation Date |
Set a custom creation date in the output PDF document properties. The date string must be in the format 'yyyy-MM-dd HH:mm:ss'. |
deskew |
boolean Deskew |
Deskew (straighten) the image. |
despeckle |
integer(int32) Despeckle |
This removes all disconnected elements within the image that have height or width in pixels less than the specified figure. The maximum value is 9 and the default value is 0. |
dictionaryLookup |
integer(int32) DictionaryLookup |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
dotmatrix |
boolean Dotmatrix |
Set this to true to improve recognition of dot-matrix fonts. Default value is false. If set to true for non dot-matrix fonts then the recognition can be poor. |
enableDebugOutput |
boolean Enable debug output |
Enables debug output. |
enableMrc |
boolean Compress PDF (MRC) |
This enables Mixed Raster Compression which can dramatically reduce the output size of PDFs comprising color scans. Note that this option is only suitable when the source is not a PDF or using ConvertToTiff. |
enablePDFAOutput |
boolean PDF/A Output |
Whether or not to output as PDF/A. |
errorMode |
integer(int32) Error mode |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
fileContent |
string(byte) Source file content |
Content of the file to OCR |
fileNameWithExtension |
string Source file name with extension |
The source file name with extension or just the extension (with a leading period '.') |
flipDetect |
integer(int32) Flip detect |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
grayscaleQuality |
integer(int32) Grayscale quality |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
heuristics |
integer(int32) Heuristics |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
jbig2EncFlags |
string(string) Jbig2EncFlags |
These are the flags that will be passed to the application used to generate JBIG2 versions of images used in PDF generation (assuming this compression is enabled). This option should generally only be used under guidance from technical support. |
language |
string(enum) Language |
Selecting one of the options below sets the language to be used for the OCR processing. The default language is English. Values: [English, German, French, Russian, Swedish, Spanish, Italian, Russian_English, Ukrainian, Serbian, Croatian, Polish, Danish, Portuguese, Dutch, Czech, Roman, Hungar, Bulgar, Slovenian, Latvian, Lithuanian, Estonian, Turkish] |
libTiffSavePageAsBmp |
boolean LibTiffSavePageAsBmp |
Sometimes if there is an image which is 1bpp and has LZW compression, the pre-processing can cause the colour of the image to be inverted (black to white and white to black). Set this to true to avoid this. |
maxDeskew |
number(float) Maximum deskew |
Maximum angle by which a page will be deskewed. This option should generally only be used under guidance from technical support (support@aquaforest.com). |
minDeskewConfidence |
number(float) Minimum deskew confidence |
This option should generally only be used under guidance from technical support (support@aquaforest.com). |
modifiedDate |
string Modified Date |
Set a custom modified date in the output PDF document properties. The date string must be in the format 'yyyy-MM-dd HH:mm:ss'. |
morph |
string(string) Morph |
Morphological options that will be applied to the binarized image before OCR. If set to empty none is applied. Common options include those listed below but for more options please contact support@aquaforest.com. |
mrcBackgroundFactor |
integer(int32) MrcBackgroundFactor |
Sampling size for the background portion of the image. The higher the number, the larger the size of the image blocks used for averaging which will result in a reduction in size but also quality. Default value is 3 |
mrcForegroundFactor |
integer(int32) MrcForegroundFactor |
Sampling size for the foreground portion of the image. The higher the number, the larger the size of the image blocks used for averaging which will result in a reduction in size but also quality. Default value is 3 |
mrcQuality |
integer(int32) MrcQuality |
JPEG quality setting (percentage value 1 - 100) for use in saving the background and foreground images. Default value is 75 |
mrcTimeout |
integer(int32) MrcTimeout |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
noPictures |
boolean NoPictures |
By default, if an area of the document is identified as a graphic area then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as "graphic" or "picture" areas but that actually do contain useful text. Setting NoPictures to True will cause it to ignore areas identified as pictures whilst setting it to False will force OCR of areas identified as pictures. |
ocrProcessSetupTimeout |
integer(int32) OcrProcessSetupTimeout |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
ocrTimeout |
integer(int32) OcrTimeout |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
password |
string Password |
The password to open the source PDF file |
pdfToImageBpp |
string(enum) PdfToImageBpp |
The Bits Per Pixel to use for the rasterized PDF page when using engine 1. This only applies for documents that are processed using ConvertToTiff. The default value for this property is taken from the PDF page. Values: [Bpp_1, Bpp_24] |
pdfToImageCompression |
string(enum) PdfToImageCompression |
The compression to set to the images extracted or rasterized from each page of the source PDF file. These images are then OCRed to create the searchable PDF. The default value for this property is taken from each page in the source PDF file. Values: [CCITT4, LZW] |
pdfToImageDpi |
string(enum) PdfToImageDpi |
The DPI to set to the images rasterized from each page of the source PDF file. These images are then OCRed to create the searchable PDF. The default value for this property is taken from each page in the source PDF file. Values: [DPI_72, DPI_100, DPI_150, DPI_200, DPI_300, DPI_400, DPI_500, DPI_600] |
pdfToImageForceVectorCheck |
boolean PdfToImageForceVectorCheck |
This setting is useful when dealing with documents that contain vector objects (e.g. CAD drawings). By default, pages that contain only vector objects are rasterized. Pages that do not have any images but contain vector objects as well as electronic text are skipped from rasterization. However, sometimes there can be a page that contains vector objects (CAD drawings) but its title may be in electronic text. To force rasterizing pages like these, set this property to true. |
pdfToImageIncludeText |
boolean PdfToImageIncludeText |
When set to False this will prevent the conversion of real text (i.e. electronically generated as opposed to text that is part of a scanned image) from being rendered in the page images extracted from the PDF. This is because the text is already searchable and so generally does not require OCR. The value can be set to True however if the OCR is required on this real text. |
pdfToImageMaxRes |
integer(int32) PdfToImageMaxRes |
The maximum resolution of the rasterized images. If the resolution retrieved from the PDF page is bigger than this value, it will be set to this value. The default value for this property is 600. |
pdfToImageMinRes |
integer(int32) PdfToImageMinRes |
The minimum resolution of the rasterized images. If the resolution retrieved from the PDF page is lower than this value, it will be set to this value. The default value for this property is 200. |
pdfaVersion |
string(enum) PDF/A Version |
The PDF/A version. Values: [PDF_A1b, PDF_A2b, PDF_A3b] |
pipeClientConnectionTimeout |
integer(int32) PipeClientConnectionTimeout |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
removeBlankPage |
boolean RemoveBlankPage |
Remove blank pages when BlankPageThreshold is greater than -1 and ConvertToTiff is true. |
removeLines |
boolean RemoveLines |
Remove lines from images for better recognition. |
restartEngineEvery |
integer(int32) RestartEngineEvery |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
retainBookmarks |
boolean Retain bookmarks |
Retains any bookmarks from the source file in the output when using ConvertToTiff. |
retainCreationDate |
boolean Retain creation date |
Retains the creation date of the source file in the output PDF document properties. |
retainMetadata |
boolean Retain metadata |
Retains any metadata from the source file in the output when using ConvertToTiff. |
retainModifiedDate |
boolean Retain modified date |
Retains the modified date of the source file in the output PDF document properties. |
retainViewerPreferences |
boolean Retain viewer preferences |
Retains any PDF Viewer Preferences, Page Mode and Page Layout from source file in the output when using ConvertToTiff. |
savePredespeckle |
boolean SavePredespeckle |
This will use the original image (i.e. before applying pre-processing) in the output PDF. |
tables |
boolean Tables |
When set to true, this option tries to OCR text within table cells. |
textLayerFilterHeight |
integer(int32) TextLayerFilterHeight |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerFilterHeightInverted |
integer(int32) TextLayerFilterHeightInverted |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerFilterPercentage |
number(float) TextLayerFilterPercentage |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerFilterPercentageInverted |
number(float) TextLayerFilterPercentageInverted |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerFilterRatio |
number(float) TextLayerFilterRatio |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerFilterRatioInverted |
number(float) TextLayerFilterRatioInverted |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerFilterWidth |
integer(int32) TextLayerFilterWidth |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerFilterWidthInverted |
integer(int32) TextLayerFilterWidthInverted |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
textLayerMaxBoxes |
integer(int32) TextLayerMaxBoxes |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
tidyUpMode |
integer(int32) Tidy-up mode |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
validatePDFA |
boolean Validate PDF/A |
Whether to validate the PDF/A document after conversion. |
wordMatchThreshold |
number(float) Word match threshold |
Contact technical support (support@aquaforest.com) for guidance on using this property. |
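The pdfToImageMinRes and pdfToImageMaxRes descriptions above amount to clamping the resolution read from each PDF page into a fixed range. The following is a minimal sketch of that behaviour, using the documented defaults of 200 and 600. The function name `clamp_resolution` is hypothetical and is not part of the connector.

```python
def clamp_resolution(page_res: int, min_res: int = 200, max_res: int = 600) -> int:
    """Clamp the resolution read from a PDF page into [min_res, max_res].

    Illustrative only: mirrors the documented descriptions of
    pdfToImageMinRes (default 200) and pdfToImageMaxRes (default 600).
    """
    if min_res > max_res:
        raise ValueError("min_res must not exceed max_res")
    # Values below the minimum are raised to it; values above the maximum are lowered to it.
    return max(min_res, min(page_res, max_res))
```

With the defaults, a page reporting a resolution of 72 would be rasterized at 200, and a page reporting 1200 would be rasterized at 600.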
Summary:
Description: Response data for OCR operation
Properties:
Name |
Type |
Summary |
ErrorMessage |
string Error message |
Contains the error message when Is Successful is false. |
IsSuccessful |
boolean Is Successful |
Returns true if the OCR was successful. |
LicenceInfo |
string License Info |
Information about your API subscription key |
LogFileContent |
string(byte) Log file content |
The log contents of the operation |
OutputFileContent |
string(byte) Processed file content |
File generated by the Aquaforest PDF converter. |
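The response properties above can be consumed as in the following sketch. It assumes the response has already been parsed into a dictionary keyed by the property names listed above, and that the `string(byte)` fields are base64-encoded, as the OpenAPI `byte` format specifies. The function name `decode_ocr_output` is hypothetical.

```python
import base64


def decode_ocr_output(response: dict) -> bytes:
    """Return the processed file as raw bytes, or raise if the OCR failed.

    Assumes `response` is the parsed JSON body of the OCR action and that
    OutputFileContent is base64 text (OpenAPI format "byte").
    """
    if not response.get("IsSuccessful"):
        # ErrorMessage is only expected to be populated on failure.
        raise RuntimeError(response.get("ErrorMessage") or "OCR operation failed")
    return base64.b64decode(response["OutputFileContent"])
```

The same decoding would apply to LogFileContent, which is also typed as `string(byte)`.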
Summary:
Description:
Properties:
Name |
Type |
Summary |
fileContent |
string(byte) File Content |
The content of the source file |
fileNameTemplate |
string File Name Template |
Template for the output file if barcode is found |
noMatch |
string Pages with no Match |
Depending on the split option you choose, some pages will have no barcode value. Choose what to do with these pages. Values: [Do not copy to output, Copy to output, Copy to output and rename] |
noTextFileName |
string No Barcode Match Template |
Template for the output file if no barcode is found |
sourceFileName |
string File Name |
The name of the source file |
splitOption |
string Output File Options |
Choose the location of the page with the barcode in the output files from the split operation. Values: [Barcode on first page, Barcode on last page, Remove barcode page] |
zones |
array of (ZonesItem) Barcode |
List of variables that can be used to extract barcode information from PDF files |
Summary:
Description:
Properties:
Name |
Type |
Summary |
barcodeFormats |
array of (string) Type |
Specify the types of Barcode you want to identify |
location |
string Location |
Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html] |
regex |
string Pattern |
If a regular expression is provided here, we will match any extracted barcode to it and return the match. |
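The effect of the regex property can be pictured with the sketch below. It is an assumption-laden illustration rather than the connector's implementation: it uses Python's `re` module (the service may use a different regular expression dialect), it assumes the pattern may match anywhere in the barcode value, and it assumes the raw value is returned when no pattern is supplied. The function name `match_barcode` is hypothetical.

```python
import re
from typing import Optional


def match_barcode(value: str, pattern: Optional[str]) -> Optional[str]:
    """Return the part of an extracted barcode value that matches `pattern`.

    Assumptions (not confirmed by the connector documentation):
    - an empty or missing pattern returns the value unchanged;
    - the first match anywhere in the value is returned;
    - no match yields None.
    """
    if not pattern:
        return value
    found = re.search(pattern, value)
    return found.group(0) if found else None
```

For example, under these assumptions the pattern `\d{5}` applied to the barcode value `INV-2024-00123` would return `00123`.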
Summary:
Description:
Properties:
Name |
Type |
Summary |
fileContent |
string(byte) File Content |
The content of the source file. |
fileNameTemplate |
string Output File Name |
Target file template which can include %UNIQUEn (unique number starting at 1, zero padded to n digits) and %FILENAME (original filename without the extension). |
sourceFileName |
string File Name |
The name of the source file. |
splitOption |
string Split Type |
Choose the split operation to use for each file. Values: [Split into Single Pages, Split by Page Range, Split by Repeating Range, Split by Top Level Bookmark] |
pageRange |
string Page Range |
Set of page ranges separated by commas that defines which pages from the original should be extracted. |
repeatEvery |
integer Repeat Every (Pages) |
Re-apply the page range to each successive set of pages within the document. For example, if 2-4 is specified for the page range and 4 is specified as the repeating range, then the range is re-applied every 4 pages. |
retainBookmarks |
boolean Retain bookmarks |
Generated files will include bookmarks from the original file. |
retainMetadata |
boolean Retain metadata |
Generated files will include metadata (such as Author and Title) from the original file. |
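The fileNameTemplate placeholders and the repeatEvery behaviour described above can be sketched as follows. Both helpers are hypothetical illustrations, not connector code. In particular, truncating the final range at the last page of the document is an assumption, since the description above does not say how a partial final range is handled.

```python
import os
import re
from typing import List, Tuple


def expand_template(template: str, source_name: str, index: int) -> str:
    """Expand %UNIQUEn and %FILENAME in an output file name template.

    %UNIQUEn becomes `index` zero padded to n digits; %FILENAME becomes
    the source file name without its extension.
    """
    stem = os.path.splitext(source_name)[0]
    expanded = re.sub(
        r"%UNIQUE(\d+)",
        lambda m: str(index).zfill(int(m.group(1))),
        template,
    )
    return expanded.replace("%FILENAME", stem)


def repeating_ranges(first: int, last: int, repeat_every: int,
                     page_count: int) -> List[Tuple[int, int]]:
    """List the (start, end) page ranges produced by re-applying first-last
    every `repeat_every` pages. Assumes a range running past the end of the
    document is truncated at the last page."""
    if repeat_every <= 0:
        raise ValueError("repeat_every must be positive")
    ranges = []
    offset = 0
    while first + offset <= page_count:
        ranges.append((first + offset, min(last + offset, page_count)))
        offset += repeat_every
    return ranges
```

Under these assumptions, the template `%FILENAME_%UNIQUE3.pdf` applied to `report.pdf` gives `report_001.pdf` for the first output file, and a page range of 2-4 repeated every 4 pages over a 10-page document selects pages 2-4, 6-8 and 10.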
Summary:
Description:
Properties:
Name |
Type |
Summary |
fileContent |
string(byte) File Content |
The content of the source file |
fileNameTemplate |
string File Name Template |
Template for the output file if the text matches are found |
noMatch |
string Pages with no Match |
Depending on the split option you choose, some pages will have no text value extracted. Choose what to do with these pages. Values: [Do not copy to output, Copy to output, Copy to output and rename] |
noTextFileName |
string No File Template |
Template for the output file if no text match is found |
sourceFileName |
string File Name |
The name of the source file |
splitOption |
string Output File Options |
Choose the location of the page that matches the text in the output files from the split operation. Values: [Page that matches text on first page, Page that matches text on last page, Remove the page that matches text] |
zones |
array of (ZonesItem) Text |
List of variables that can be used to extract text information from PDF files |
Summary:
Description:
Properties:
Name |
Type |
Summary |
expression |
array of (string) Value |
Provide one or more values to be used with the Select property. The first text value that matches the selected rule is returned. |
location |
string Location |
Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html] |
position |
string Select |
Use this to further refine the text you extract by selecting the option that matches your requirements. Values: [text in zone, word after value, word before value, all text in line after value, all text in line before value, all text in zone after value, all text in zone before value] |
regex |
string Pattern |
If a regular expression is provided here, we will match any extracted text to it and return the match. |
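To make the position options more concrete, the sketch below applies four of them to a single line of text. It is only an interpretation of the option names: the connector's actual tokenization, case sensitivity and multi-line handling are not documented here, and the function name `select_text` is hypothetical.

```python
from typing import Optional


def select_text(line: str, value: str, position: str) -> Optional[str]:
    """Pick text relative to the first occurrence of `value` in `line`.

    Illustrative interpretation of four `position` options; words are
    assumed to be separated by whitespace and matching is case sensitive.
    """
    idx = line.find(value)
    if idx < 0:
        return None  # value not present, nothing to extract
    before = line[:idx]
    after = line[idx + len(value):]
    if position == "word after value":
        words = after.split()
        return words[0] if words else None
    if position == "word before value":
        words = before.split()
        return words[-1] if words else None
    if position == "all text in line after value":
        return after.strip()
    if position == "all text in line before value":
        return before.strip()
    raise ValueError(f"unsupported position in this sketch: {position}")
```

For the line `Invoice No: 12345 Date: 2024-01-31` and the value `No:`, this sketch would select `12345` for "word after value" and `Invoice` for "word before value".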