Connectors Reference

Aquaforest PDF

The Aquaforest PDF connector provides a group of actions that perform PDF operations such as splitting, text extraction, barcode extraction, and OCR for Office 365 and Power Automate.

 

Status: Preview

Tier: Premium

Version: 1.0

 

Actions:

Name

Summary

Get-barcode-value ([Optional]GetBarcode getBarcode)

Get barcode value

Get-pdf-information ([Optional]GetPDFInfoRequest getPDFInfoRequest)

Get PDF properties

Get-data-from-pdf ([Optional]GetDataFromPDFRequest getDataFromPDFRequest)

Get data from PDF

Get-text-value ([Optional]GetText getText)

Get text from PDF

Ocr-file-to-pdf ([Optional]ocr_data ocr_data)

OCR PDF or images

Split-by-barcode ([Optional]SplitBarcode splitBarcode)

Split PDF by barcode

Split-by-text ([Optional]SplitText splitText)

Split PDF by text match

Extract-by-barcode ([Optional]ExtractBarcode extractBarcode)

Extract PDF pages by barcode

Extract-by-text ([Optional]ExtractText extractText)

Extract PDF pages by text

Split-by-page-range ([Optional]SplitPdfByPageDefinition splitPdfByPage)

Split PDF by page

GetDataSchema (string operation, [Optional]string expectedKeys)

Gets the item schema of the selected list

 

Triggers:

Name

Summary

 

Objects:

Name

Summary

ApiExtractPost200ApplicationJsonResponse

 

ApiGetTextValueJsonResponse

 

ApiRenameByBarcodePost200ApplicationJsonResponse

 

ApiSplitPost200ApplicationJsonResponse

 

ExtractBarcode

 

ExtractText

 

GetBarcode

 

GetDataFromPDFRequest

 

GetDataSchemaResponse

 

GetPDFDataDynamicResponseSchema

 

GetPDFInfoRequest

 

GetPDFInfoResponse

 

GetText

 

ocr_data

 

ocr_response

 

SplitBarcode

 

SplitPdfByPageDefinition

 

SplitText

 

 

Actions:

Get-barcode-value

Summary: Get barcode value

Description: Get Barcode From PDF. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.

 

Syntax:

AquaforestPDF.Get-barcode-value ([Optional]GetBarcode getBarcode)

 

Parameters:

Name

Type

Summary

Required

Related Action

getBarcode

GetBarcode

 

 

False

 

Returns:

          Type:ApiRenameByBarcodePost200ApplicationJsonResponse

 

Get-pdf-information

Summary: Get PDF properties

Description: Gets the information about a PDF file

 

Syntax:

AquaforestPDF.Get-pdf-information ([Optional]GetPDFInfoRequest getPDFInfoRequest)

 

Parameters:

Name

Type

Summary

Required

Related Action

getPDFInfoRequest

GetPDFInfoRequest

 

 

False

 

Returns:

          Type:GetPDFInfoResponse

 

Get-data-from-pdf

Summary: Get data from PDF

Description: This action will extract important data from PDF files in the form of Key/Value pairs.

 

Syntax:

AquaforestPDF.Get-data-from-pdf ([Optional]GetDataFromPDFRequest getDataFromPDFRequest)

 

Parameters:

Name

Type

Summary

Required

Related Action

getDataFromPDFRequest

GetDataFromPDFRequest

 

 

False

 

Returns:

          Type:GetPDFDataDynamicResponseSchema

 

Get-text-value

Summary: Get text from PDF

Description: Get Text From PDF files based on the text location and regular expressions. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.

 

Syntax:

AquaforestPDF.Get-text-value ([Optional]GetText getText)

 

Parameters:

Name

Type

Summary

Required

Related Action

getText

GetText

 

 

False

 

Returns:

          Type:ApiGetTextValueJsonResponse

 

Ocr-file-to-pdf

Summary: OCR PDF or images

Description: Generate searchable PDF from an image PDF or scanned images. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.

 

Syntax:

AquaforestPDF.Ocr-file-to-pdf ([Optional]ocr_data ocr_data)

 

Parameters:

Name

Type

Summary

Required

Related Action

ocr_data

ocr_data

 

Parameters for OCR operation

False

 

Returns:

          Type:ocr_response

          Description: Response data for OCR operation
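
As a rough illustration of how the ocr_data parameters might be assembled, the Python sketch below builds a request body. The property names come from the ocr_data object in this reference. The helper name build_ocr_data, the chosen defaults, and the assumption that fileContent is sent as a base64 string are illustrative and not confirmed by this reference.

```python
import base64
from datetime import datetime


def build_ocr_data(file_bytes, file_name, language="English",
                   despeckle=0, creation_date=None):
    """Assemble a dict shaped like the ocr_data object (hypothetical helper)."""
    # The reference states that despeckle has a maximum of 9 and a default of 0.
    if not 0 <= despeckle <= 9:
        raise ValueError("despeckle must be between 0 and 9")
    body = {
        # Assumption: string(byte) content is transported as base64 text.
        "fileContent": base64.b64encode(file_bytes).decode("ascii"),
        "fileNameWithExtension": file_name,
        "language": language,
        "despeckle": despeckle,
        "deskew": True,
        "autorotate": True,
    }
    if creation_date is not None:
        # The reference requires the format 'yyyy-MM-dd HH:mm:ss'.
        body["creationDate"] = creation_date.strftime("%Y-%m-%d %H:%M:%S")
    return body
```

Calling build_ocr_data(data, "scan.tif", creation_date=datetime.now()) would yield a dictionary that can be serialized with json.dumps. In a Power Automate flow these fields are normally filled in through the action's designer instead.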

 

Split-by-barcode

Summary: Split PDF by barcode

Description: Splits PDF files based on barcode matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for documentation. In addition, the Aquaforest Zonal Extraction Tool is available at [https://www.aquaforest.com/en/zone/get-pdf-zone.html].

 

Syntax:

AquaforestPDF.Split-by-barcode ([Optional]SplitBarcode splitBarcode)

 

Parameters:

Name

Type

Summary

Required

Related Action

splitBarcode

SplitBarcode

 

 

False

 

Returns:

          Type:ApiSplitPost200ApplicationJsonResponse

 

Split-by-text

Summary: Split PDF by text match

Description: Splits PDF files based on text matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for documentation. In addition, the Aquaforest Zonal Extraction Tool is available at [https://www.aquaforest.com/en/zone/get-pdf-zone.html].

 

Syntax:

AquaforestPDF.Split-by-text ([Optional]SplitText splitText)

 

Parameters:

Name

Type

Summary

Required

Related Action

splitText

SplitText

 

 

False

 

Returns:

          Type:ApiSplitPost200ApplicationJsonResponse

 

Extract-by-barcode

Summary: Extract PDF pages by barcode

Description: Extracts pages from PDF files based on barcode matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.

 

Syntax:

AquaforestPDF.Extract-by-barcode ([Optional]ExtractBarcode extractBarcode)

 

Parameters:

Name

Type

Summary

Required

Related Action

extractBarcode

ExtractBarcode

 

 

False

 

Returns:

          Type:ApiExtractPost200ApplicationJsonResponse

 

Extract-by-text

Summary: Extract PDF pages by text

Description: Extracts pages from PDF files based on text matches defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for more information.

 

Syntax:

AquaforestPDF.Extract-by-text ([Optional]ExtractText extractText)

 

Parameters:

Name

Type

Summary

Required

Related Action

extractText

ExtractText

 

 

False

 

Returns:

          Type:ApiExtractPost200ApplicationJsonResponse

 

Split-by-page-range

Summary: Split PDF by page

Description: Splits PDF files based on split options defined by the user. Visit [https://www.aquaforest.com/en/aquaforest-flow-doc.asp] for documentation.

 

Syntax:

AquaforestPDF.Split-by-page-range ([Optional]SplitPdfByPageDefinition splitPdfByPage)

 

Parameters:

Name

Type

Summary

Required

Related Action

splitPdfByPage

SplitPdfByPageDefinition

 

 

False

 

Returns:

          Type:ApiSplitPost200ApplicationJsonResponse

 

GetDataSchema

Summary: Gets the item schema of the selected list

Description: Gets the schema of the selected list

 

Syntax:

AquaforestPDF.GetDataSchema (string operation, [Optional]string expectedKeys)

 

Parameters:

Name

Type

Summary

Required

Related Action

operation

string

 

 

True

expectedKeys

string

 

 

False

 

Returns:

          Type:GetDataSchemaResponse

 


 

ApiExtractPost200ApplicationJsonResponse

Summary:

Description:

 

          Properties:

Name

Type

Summary

ErrorMessage

string

Error

If the value of Is Successful is false, we will return an Error Message

IsSuccessful

boolean

Is Successful

This will return true if at least one page was extracted

LicenceInfo

string

License Info

Information about your API subscription key

SplittedFile

array of (SplittedFileItem)

Extract Output Files

Array of Extracted Files

 

SplittedFileItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

SplitFileContent

string(byte)

(File Content)

A base 64 string representing the File Content

SplitFileName

string

(File Name)

A string containing the generated File Name

pageNumber

string

(Page Number)

The page range containing the page number where the extraction occurred
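
The properties above suggest a simple client-side pattern: check IsSuccessful, then decode each SplitFileContent value from base64. The following Python sketch shows one way to do that. It assumes the response has already been parsed into a dictionary, and the function name decode_split_files is hypothetical.

```python
import base64


def decode_split_files(response):
    """Return (file name, raw bytes) pairs from an extract or split response."""
    if not response.get("IsSuccessful"):
        # ErrorMessage is populated when IsSuccessful is false.
        raise RuntimeError(response.get("ErrorMessage") or "operation failed")
    return [
        (item["SplitFileName"], base64.b64decode(item["SplitFileContent"]))
        for item in response.get("SplittedFile", [])
    ]
```

Each returned pair can then be written to storage under the generated file name.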

 


 

ApiGetTextValueJsonResponse

Summary:

Description:

 

          Properties:

Name

Type

Summary

ErrorMessage

string

Error Message

If the value of Is Successful is false, we will return an Error Message

IsSuccessful

boolean

Is Successful

If the Text was matched successfully

LicenceInfo

string

License Info

Information about your API subscription key

TextResult

string

Text Result

A string generated by applying the extracted text to the Text Result Template provided. Note that if the page count is greater than one, the results for all pages are concatenated using the Page Separator.

TextResults

array of (TextResultsItem)

Results

An array containing a list of pages and the extracted text values

 

TextResultsItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

pageNumber

string

(Page Number)

The page where the text was found

valueExtracted

string

(Page Text)

A string generated from applying the extracted text to the Text Result Template provided.

zoneValues

array of (string)

(Zone Values)

An array containing the text extracted from each zone.
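
To make the relationship between TextResults and the Page Separator concrete, here is a small Python sketch that joins the per-page valueExtracted strings on the client side. It only approximates what the service is described as doing for TextResult. The function name join_page_text and the default separator are assumptions.

```python
def join_page_text(response, page_separator="\n"):
    """Join per-page valueExtracted strings, or raise with the reported error."""
    if not response.get("IsSuccessful"):
        raise RuntimeError(response.get("ErrorMessage") or "no text matched")
    pages = response.get("TextResults", [])
    # One entry per page; each carries pageNumber, valueExtracted, zoneValues.
    return page_separator.join(page["valueExtracted"] for page in pages)
```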

 


 

ApiRenameByBarcodePost200ApplicationJsonResponse

Summary:

Description:

 

          Properties:

Name

Type

Summary

BarcodeResult

string

Barcode

A string generated by applying the extracted barcode value to the Barcode Result Template provided. Note that if the page count is greater than one, the results for all pages are concatenated using the Page Separator.

BarcodeResults

array of (BarcodeResultsItem)

Results

An array containing a list of pages and the extracted barcode values

ErrorMessage

string

Error Message

If the value of Is Successful is false, we will return an Error Message

IsSuccessful

boolean

Is Successful

If a barcode was detected

LicenceInfo

string

License Info

Information about your API subscription key

 

BarcodeResultsItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

pageNumber

string

(Page Number)

The page where the barcode was found

valueExtracted

string

(Page Barcode)

A string generated from applying the extracted barcode value to the barcode Result Template provided.

zoneValues

array of (string)

(Zone Values)

An array containing the barcode extracted from each zone.

 


 

ApiSplitPost200ApplicationJsonResponse

Summary:

Description:

 

          Properties:

Name

Type

Summary

ErrorMessage

string

Error Message

If the value of Is Successful is false, we will return an Error Message

IsSuccessful

boolean

Is Successful

This will return true if at least one split page was matched.

LicenceInfo

string

License Info

Information about your API subscription key

SplittedFile

array of (SplittedFileItem)

Split Output Files

Array containing each of the split files together with details like the generated file name and page number.

 

SplittedFileItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

SplitFileContent

string(byte)

(File Content)

A base 64 string representing the File Content

SplitFileName

string

(File Name)

A string containing the generated File Name

pageNumber

string

(Page Range)

The page range containing the page numbers of the split operation

 


 

ExtractBarcode

Summary:

Description:

 

          Properties:

Name

Type

Summary

fileContent

string(byte)

File Content

The content of the source file

fileNameTemplate

string

File Name Template

Template for the output file if barcode is found

noTextFileName

string

No File Template

Template for the output file if no barcode is found

sourceFileName

string

File Name

The name of the source file

zones

array of (ZonesItem)

Barcode

List of variables that can be used to extract barcode information from PDF files

 

ZonesItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

barcodeFormats

array of (string)

Type

Specify the types of Barcode you want to identify

location

string

Location

Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html]

regex

string

Pattern

If a regular expression is provided here, we will match any extracted barcode to it and return the match.
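
The effect of the regex property can be pictured with the Python sketch below, which returns the first match found among a list of extracted barcode values. This is an illustration only: the regular expression dialect used by the service and whether it requires a full or partial match are not stated in this reference, and the function name match_barcode is hypothetical.

```python
import re


def match_barcode(values, pattern):
    """Return the first regex match found among extracted barcode values."""
    compiled = re.compile(pattern)
    for value in values:
        found = compiled.search(value)  # assumption: partial matches count
        if found:
            return found.group(0)
    return None
```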

 


 

ExtractText

Summary:

Description:

 

          Properties:

Name

Type

Summary

fileContent

string(byte)

File Content

The content of the source file

fileNameTemplate

string

File Name Template

Template for the output file if the text matches are found

noTextFileName

string

No File Template

Template for the output file if no text match is found

sourceFileName

string

File Name

The name of the source file

zones

array of (ZonesItem)

Text

List of variables that can be used to extract text information from PDF files

 

ZonesItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

expression

array of (string)

Value

Provide one or more values to be used with the Select (position) property. The first text value that matches the selected rule is returned.

location

string

Location

Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html]

position

string

Select

Use this to further refine the text you extract by selecting the option that matches your requirements.  Values: [text in zone, word after value, word before value, all text in line after value, all text in line before value, all text in zone after value, all text in zone before value]

regex

string

Pattern

If a regular expression is provided here, we will match any extracted text to it and return the match.
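
The position options are easier to picture with an example. The Python sketch below shows one plausible reading of four of them, applied to a single line of text. The connector's actual matching rules (case sensitivity, tokenization, zone handling) are not described in this reference, so treat the function apply_position as a hypothetical approximation.

```python
def apply_position(line, value, position):
    """Approximate four of the position options on a single line of text."""
    before, found, after = line.partition(value)
    if not found:
        return None  # the value does not occur in this line
    if position == "word after value":
        words = after.split()
        return words[0] if words else None
    if position == "word before value":
        words = before.split()
        return words[-1] if words else None
    if position == "all text in line after value":
        return after.strip()
    if position == "all text in line before value":
        return before.strip()
    raise ValueError(f"unsupported position: {position}")
```

For instance, with the line "Invoice No: 12345 Date 2024-01-02" and the value "Invoice No:", the option "word after value" would select "12345" under this reading.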

 


 

GetBarcode

Summary:

Description:

 

          Properties:

Name

Type

Summary

barcodeResultTemplate

string

Barcode Result Template

Template for the output text result if a barcode is found

fileContent

string(byte)

File Content

The content of the source file

noBarcodeTemplate

string

No Barcode Template

Template for the output text result if no barcode is found

pageSeparator

string

Page Separator

Provide a page separator so that you can know where the page breaks are.

pagerange

string

Pages

Provide the page range you want to extract barcodes from. This can be a single page number (1), multiple page numbers separated by commas (1,2,3), a page range (1-4), or a mixture of these (1,2,4-7).

sourceFileName

string

File Name

The name of the source file

zones

array of (ZonesItem)

Barcode

List of variables that can be used to extract barcode information from PDF files

 

ZonesItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

barcodeFormats

array of (string)

Type

Specify the types of Barcode you want to identify

location

string

Location

Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html]

pagenumber

integer

Page (Deprecated)

This property is deprecated; we advise you to use the Pages property instead. The Pages property applies to all zones and allows you to select the pages you want to process.

regex

string

Pattern

If a regular expression is provided here, we will match any extracted text to it and return the match.
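
The Pages (pagerange) syntax described above, such as 1,2,4-7, can be validated before a request is sent. The Python sketch below expands such a string into a list of page numbers. It is a client-side helper written for illustration; the name parse_page_range and the decision to reject descending ranges are assumptions, and the service may interpret edge cases differently.

```python
def parse_page_range(spec):
    """Expand a string such as '1,2,4-7' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```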

 


 

GetDataFromPDFRequest

Summary:

Description:

 

          Properties:

Name

Type

Summary

advancedFlags

array of (AdvancedFlagsItem)

Advanced Settings

This is used to pass advanced parameters to this action. Do not use this property unless directed by our support team.

confidenceScore

number

Confidence Score

Set a higher confidence score to filter out values with lower confidence. You can set any value between 0 and 1. We recommend starting from 0.5.

dateAsISO

string

Date Conversion

Select the format in which date values are returned.  Values: [Do not convert date values, ISO conversion (DMY input assumed), ISO conversion (MDY input assumed)]

expectedKeys

string

Expected Keys

Provide one key name per line to make values available to later actions without parsing JSON.

fileContent

string(byte)

File Content

The content of the source file

pageLimit

integer

Page Limit

Maximum number of pages to be processed

pageRange

string

Page Range

A string representation of the page numbers you want to process, for example 1,3-4.

stripCurrencySymbol

boolean

Strip Currency Symbol

Set this to true if you want the symbols and strings to be removed before we return currency values

synonym

boolean

Match Synonym

Set this to true if you want us to return all the keys that are synonyms to the expected key.

synonymDictionary

string

Synonym Dictionary

You can provide a JSON array of “entry” objects, where each object contains a list of synonyms in an array. For instance, if you want “Invoice No” and “Invoice Number” (case-insensitive) to be interpreted as the same key, use the following JSON: [{'entry': [ 'Invoice No', 'invoice number' ]}]

trimSymbols

boolean

Trim Symbols

Set this to true if you want us to remove all leading and trailing symbols from the keys found before we match them to an expected key.

 

AdvancedFlagsItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

settingName

string

Name

Enter the name of the setting here

settingValue

string

Value

Enter the value of the setting here.
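
As an illustration of how the GetDataFromPDFRequest properties fit together, the Python sketch below builds a request body with one expected key per line and a synonym dictionary. The helper name build_get_data_request is hypothetical. Note that json.dumps emits standard double-quoted JSON, whereas the synonymDictionary example in this reference uses single quotes; whether the service accepts both forms is not stated here.

```python
import base64
import json


def build_get_data_request(file_bytes, expected_keys,
                           synonym_groups=None, confidence_score=0.5):
    """Assemble a dict shaped like GetDataFromPDFRequest (hypothetical helper)."""
    # The reference allows any value between 0 and 1 and suggests 0.5.
    if not 0 <= confidence_score <= 1:
        raise ValueError("confidenceScore must be between 0 and 1")
    body = {
        # Assumption: string(byte) content is transported as base64 text.
        "fileContent": base64.b64encode(file_bytes).decode("ascii"),
        # One key name per line, as described for expectedKeys.
        "expectedKeys": "\n".join(expected_keys),
        "confidenceScore": confidence_score,
    }
    if synonym_groups:
        # A JSON array of "entry" objects, each holding a list of synonyms.
        body["synonymDictionary"] = json.dumps(
            [{"entry": list(group)} for group in synonym_groups]
        )
    return body
```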

 


 

GetDataSchemaResponse

Summary:

Description:

 

          Properties:

Name

Type

Summary


 

GetPDFDataDynamicResponseSchema

Summary:

Description:

 

          Properties:

Name

Type

Summary


 

GetPDFInfoRequest

Summary:

Description:

 

          Properties:

Name

Type

Summary

fileContent

string(byte)

File Content

The content of the source file

pageLimit

integer

Page Limit

Maximum number of pages to be processed. This is only used when checking whether pages contain hidden text or whether the PDF is searchable.


 

GetPDFInfoResponse

Summary:

Description:

 

          Properties:

Name

Type

Summary

AllowAssembly

boolean

Allow Assembly

Allow rotation, insertion or deletion of pages.

AllowDegradedPrinting

boolean

Allow Degraded Printing

Allow low-quality printing.

AllowExtractContents

boolean

Allow Extract Contents

Allow extraction of text and graphics.

AllowExtractForAccessibility

boolean

Allow Extract For Accessibility

Allow extraction of text and graphics in support of accessibility.

AllowFillInForm

boolean

Allow Fill In Form

Allow filling of form fields.

AllowModifyAnnotations

boolean

Allow Modify Annotations

Allow modification of annotations.

AllowModifyContents

boolean

Allow Modify Contents

Allow modification of contents.

AllowPrinting

boolean

Allow Printing

Allow high-quality printing.

Author

string

Author

Who created the document.

CreationDate

string

Creation Date

This is the date and time the PDF was created.

Creator

string

Creator

The originating application or library.

ErrorMessage

string

Error message

If the value of Is Successful is false, we will return an Error Message

FileSize

number

File Size (bytes)

The size of the file in bytes

HasHiddenText

boolean

Has Hidden Text

This will return true if the PDF file has an OCR layer.

IsEncrypted

boolean

Is Encrypted

This will return true if the document is encrypted.

IsSearchable

boolean

Is Searchable

This will return true if the PDF file is searchable.

IsSuccessful

boolean

Is Successful

Returns true if the action was successful.

Keywords

string

Keywords

Keywords can be comma separated.

LicenceInfo

string

License Info

JSON summary of your subscription quota.

ModifiedDate

string

Modified Date

This property represents the date and time the PDF was last modified

NumberofPages

integer

Number of Pages

The number of pages in the PDF file.

PDFversion

number

PDF Version

The version of the PDF specification the document was built against.

Producer

string

Producer

The product that created the PDF. In the early days of PDF people would use a Creator application like Microsoft Word to write a document, print it to a PostScript file and then the Producer would be Acrobat Distiller, the application that converted the PostScript file to a PDF. Nowadays Creator and Producer are often the same or one field is left blank.

Subject

string

Subject

What the document is about.

Title

string

Title

The title of the document.

Trapped

string

Trapped

This property is a Boolean value that indicates whether the document has been trapped. Trapping is a pre-press process which introduces color areas into color separations in order to obscure potential register errors.

XmpMetadata

string

XMP Metadata

The Extensible Metadata Platform (XMP) is an ISO standard, originally created by Adobe Systems Inc., for the creation, processing and interchange of standardized and custom metadata for digital documents and data sets.
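
One possible use of this response is deciding whether a file should be sent to the Ocr-file-to-pdf action. The Python sketch below encodes one such policy: run OCR only when the PDF is neither searchable nor carrying an OCR layer. The policy and the function name needs_ocr are assumptions made for illustration, not connector behavior.

```python
def needs_ocr(info):
    """Decide whether a PDF should be sent to OCR, given a GetPDFInfoResponse dict."""
    if not info.get("IsSuccessful"):
        raise RuntimeError(info.get("ErrorMessage") or "Get-pdf-information failed")
    # Skip OCR when the file is already searchable or has hidden (OCR) text.
    return not (info.get("IsSearchable") or info.get("HasHiddenText"))
```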


 

GetText

Summary:

Description:

 

          Properties:

Name

Type

Summary

fileContent

string(byte)

File Content

The content of the source file

noTextTemplate

string

No Text Match Template

Template for the text to be returned if a match is not found

pageSeparator

string

Page Separator

Provide a page separator so that you can know where the page breaks are.

pagerange

string

Pages

Provide the page range you want to extract text from. This can be a single page number (1), multiple page numbers separated by commas (1,2,3), a page range (1-4), or a mixture of these (1,2,4-7).

sourceFileName

string

File Name

The name of the source file

textResultTemplate

string

Text Result Template

Template for the text to be returned if a match is found

zones

array of (ZonesItem)

Text

List of variables that can be used to extract text information from PDF files

 

ZonesItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

expression

array of (string)

Value

Provide one or more values to be used with the Select (position) property. The first text value that matches the selected rule is returned.

location

string

Location

Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html]

pagenumber

integer

Page (Deprecated)

This property is deprecated; we advise you to use the Pages property instead. The Pages property applies to all zones and allows you to select the pages you want to process.

position

string

Select

Use this to further refine the text you extract by selecting the option that matches your requirements.  Values: [text in zone, word after value, word before value, all text in line after value, all text in line before value, all text in zone after value, all text in zone before value]

regex

string

Pattern

If a regular expression is provided here, we will match any extracted text to it and return the match.
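
Putting the GetText and ZonesItem properties together, the Python sketch below assembles a request body and validates the position value against the list given above. The helper names are hypothetical, the location string must come from the Zonal tool (its format is not described in this reference), and the base64 encoding of fileContent is an assumption.

```python
import base64

# The position values listed for the Select property.
POSITIONS = {
    "text in zone",
    "word after value",
    "word before value",
    "all text in line after value",
    "all text in line before value",
    "all text in zone after value",
    "all text in zone before value",
}


def build_text_zone(location, position="text in zone", expression=(), regex=""):
    """Build one ZonesItem dict, rejecting unknown position values."""
    if position not in POSITIONS:
        raise ValueError(f"unknown position: {position}")
    return {
        "location": location,  # coordinates obtained from the Zonal tool
        "position": position,
        "expression": list(expression),
        "regex": regex,
    }


def build_get_text(file_bytes, source_file_name, zones,
                   pagerange="1", page_separator="\n"):
    """Assemble a dict shaped like the GetText object (hypothetical helper)."""
    return {
        "fileContent": base64.b64encode(file_bytes).decode("ascii"),
        "sourceFileName": source_file_name,
        "pagerange": pagerange,
        "pageSeparator": page_separator,
        "zones": list(zones),
    }
```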

 


 

ocr_data

Summary:

Description: Parameters for OCR operation

 

          Properties:

Name

Type

Summary

aquaforestImageTimeout

integer(int32)

AquaforestImageTimeout

Contact technical support (support@aquaforest.com) for guidance on using this property.

author

string

Author

Set a custom Author in the output PDF document properties.

autorotate

boolean

Auto-rotate

Auto-rotate the image so that all text is oriented normally.

binarize

integer(int32)

Binarize

This value should generally only be used under guidance from technical support. It can control the way that color images are processed and force binarization with a particular threshold. A value of 200 has been shown to generally give good results in testing, but this should be confirmed with "typical" customer documents. By setting this to -1 an alternative method is used which will attempt to separate the text from any background images or colors. This can give improved OCR results for certain documents such as newspaper and magazine pages.

blackPixelLimit

number(float)

Black pixel limit

Contact technical support (support@aquaforest.com) for guidance on using this property.

blankPageThreshold

integer(int32)

Blank page threshold

Use this to set the minimum number of "On Pixels" that must be present in the image for a page not to be considered blank. A value of -1 will turn off blank page detection.

boxSize

integer(int32)

Box size

This option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (in pixels and defined by this property). This option is currently only applied for bi-tonal images.

convertToTiff

boolean

ConvertToTiff

Each page in the PDF document is rasterized to a TIFF image.

createProcess

boolean

CreateProcess

Set this to true if you want to launch the process through P/Invoke.

creationDate

string

Creation Date

Set a custom creation date in the output PDF document properties. The date string must be in the format 'yyyy-MM-dd HH:mm:ss'.

deskew

boolean

Deskew

Deskew (straighten) the image.

despeckle

integer(int32)

Despeckle

This removes all disconnected elements within the image that have height or width in pixels less than the specified figure. The maximum value is 9 and the default value is 0.

dictionaryLookup

integer(int32)

DictionaryLookup

Contact technical support (support@aquaforest.com) for guidance on using this property.

dotmatrix

boolean

Dotmatrix

Set this to true to improve recognition of dot-matrix fonts. Default value is false. If set to true for non dot-matrix fonts then the recognition can be poor.

enableDebugOutput

boolean

Enable debug output

Enables debug output.

enableMrc

boolean

Compress PDF (MRC)

This enables Mixed Raster Compression, which can dramatically reduce the output size of PDFs comprising color scans. Note that this option is only suitable when the source is not a PDF or when ConvertToTiff is used.

enablePDFAOutput

boolean

PDF/A Output

Whether or not to output as PDF/A.

errorMode

integer(int32)

Error mode

Contact technical support (support@aquaforest.com) for guidance on using this property.

fileContent

string(byte)

Source file content

Content of the file to OCR

fileNameWithExtension

string

Source file name with extension

The source file name with extension or just the extension (with a leading period '.')

flipDetect

integer(int32)

Flip detect

Contact technical support (support@aquaforest.com) for guidance on using this property.

grayscaleQuality

integer(int32)

Grayscale quality

Contact technical support (support@aquaforest.com) for guidance on using this property.

heuristics

integer(int32)

Heuristics

Contact technical support (support@aquaforest.com) for guidance on using this property.

jbig2EncFlags

string(string)

Jbig2EncFlags

These are the flags that will be passed to the application used to generate JBIG2 versions of images used in PDF generation (assuming this compression is enabled). This option should generally only be used under guidance from technical support.

language

string(enum)

Language

Selecting one of the options below sets the language to be used for the OCR processing. The default language is English.  Values: [English, German, French, Russian, Swedish, Spanish, Italian, Russian_English, Ukrainian, Serbian, Croatian, Polish, Danish, Portuguese, Dutch, Czech, Roman, Hungar, Bulgar, Slovenian, Latvian, Lithuanian, Estonian, Turkish]

libTiffSavePageAsBmp

boolean

LibTiffSavePageAsBmp

Sometimes if there is an image which is 1bpp and has LZW compression, the pre-processing can cause the colour of the image to be inverted (black to white and white to black). Set this to true to avoid this.

maxDeskew

number(float)

Maximum deskew

Maximum angle by which a page will be deskewed. This option should generally only be used under guidance from technical support (support@aquaforest.com).

minDeskewConfidence

number(float)

Minimum deskew confidence

This option should generally only be used under guidance from technical support (support@aquaforest.com).

modifiedDate

string

Modified Date

Set a custom modified date in the output PDF document properties. The date string must be in the format 'yyyy-MM-dd HH:mm:ss'.

morph

string(string)

Morph

Morphological options that will be applied to the binarized image before OCR. If set to empty none is applied. Common options include those listed below but for more options please contact support@aquaforest.com.

mrcBackgroundFactor

integer(int32)

MrcBackgroundFactor

Sampling size for the background portion of the image. The higher the number, the larger the size of the image blocks used for averaging which will result in a reduction in size but also quality. Default value is 3

mrcForegroundFactor

integer(int32)

MrcForegroundFactor

Sampling size for the foreground portion of the image. The higher the number, the larger the size of the image blocks used for averaging which will result in a reduction in size but also quality. Default value is 3

mrcQuality

integer(int32)

MrcQuality

JPEG quality setting (percentage value 1 - 100) for use in saving the background and foreground images. Default value is 75

mrcTimeout

integer(int32)

MrcTimeout

Contact technical support (support@aquaforest.com) for guidance on using this property.

noPictures

boolean

NoPictures

By default, if an area of the document is identified as a graphic area then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as "graphic" or "picture" areas but that actually do contain useful text. Setting NoPictures to True will cause it to ignore areas identified as pictures whilst setting it to False will force OCR of areas identified as pictures.

ocrProcessSetupTimeout

integer(int32)

OcrProcessSetupTimeout

Contact technical support (support@aquaforest.com) for guidance on using this property.

ocrTimeout

integer(int32)

OcrTimeout

Contact technical support (support@aquaforest.com) for guidance on using this property.

password

string

Password

The password to open the source PDF file

pdfToImageBpp

string(enum)

PdfToImageBpp

The Bits Per Pixel to use for the rasterized PDF page when using engine 1. This only applies for documents that are processed using ConvertToTiff. The default value for this property is taken from the PDF page.  Values: [Bpp_1, Bpp_24]

pdfToImageCompression

string(enum)

PdfToImageCompression

The compression to set to the images extracted or rasterized from each page of the source PDF file. These images are then OCRed to create the searchable PDF. The default value for this property is taken from each page in the source PDF file.  Values: [CCITT4, LZW]

pdfToImageDpi

string(enum)

PdfToImageDpi

The DPI to set to the images rasterized from each page of the source PDF file. These images are then OCRed to create the searchable PDF. The default value for this property is taken from each page in the source PDF file.  Values: [DPI_72, DPI_100, DPI_150, DPI_200, DPI_300, DPI_400, DPI_500, DPI_600]

pdfToImageForceVectorCheck

boolean

PdfToImageForceVectorCheck

This setting is useful when dealing with documents that contains vector objects (e.g. CAD drawings). By default, pages that contain only vector objects are rasterized. Pages that do not have any images but contain vector objects as well as electronic text are skipped from rasterization. However, sometimes there can be a page that contains vector objects (CAD drawings) but its title may be in electronic text. To force rasterizing pages like these, set this property to true.

pdfToImageIncludeText

boolean

PdfToImageIncludeText

When set to False this will prevent the conversion of real text (i.e. electronically generated as opposed to text that is part of a scanned image) from being rendered in the page images extracted from the PDF. This is because the text is already searchable and so generally does not require OCR. The value can be set to True however if the OCR is required on this real text.

pdfToImageMaxRes

integer(int32)

PdfToImageMaxRes

The maximum resolution of the rasterized images. If the resolution retrieved from the PDF page is bigger than this value, it will be set to this value. The default value for this property is 600.

pdfToImageMinRes

integer(int32)

PdfToImageMinRes

The minimum resolution of the rasterized images. If the resolution retrieved from the PDF page is lower than this value, it will be set to this value. The default value for this property is 200.

pdfaVersion

string(enum)

PDF/A Version

The PDF/A version.  Values: [PDF_A1b, PDF_A2b, PDF_A3b]

pipeClientConnectionTimeout

integer(int32)

PipeClientConnectionTimeout

Contact technical support (support@aquaforest.com) for guidance on using this property.

removeBlankPage

boolean

RemoveBlankPage

Remove blank pages when BlankPageThreshold is greater than -1 and ConvertToTiff is true.

removeLines

boolean

RemoveLines

Remove lines from images for better recognition.

restartEngineEvery

integer(int32)

RestartEngineEvery

Contact technical support (support@aquaforest.com) for guidance on using this property.

retainBookmarks

boolean

Retain bookmarks

Retains any bookmarks from the source file in the output when using ConvertToTiff.

retainCreationDate

boolean

Retain creation date

Retains the creation date of the source file in the output PDF document properties.

retainMetadata

boolean

Retain metadata

Retains any metadata from the source file in the output when using ConvertToTiff.

retainModifiedDate

boolean

Retain modified date

Retains the modified date of the source file in the output PDF document properties.

retainViewerPreferences

boolean

Retain viewer preferences

Retains any PDF Viewer Preferences, Page Mode and Page Layout from the source file in the output when using ConvertToTiff.

savePredespeckle

boolean

SavePredespeckle

This will use the original image (i.e. before applying pre-processing) in the output PDF.

tables

boolean

Tables

When set to true, this option attempts to OCR text within table cells.

textLayerFilterHeight

integer(int32)

TextLayerFilterHeight

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerFilterHeightInverted

integer(int32)

TextLayerFilterHeightInverted

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerFilterPercentage

number(float)

TextLayerFilterPercentage

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerFilterPercentageInverted

number(float)

TextLayerFilterPercentageInverted

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerFilterRatio

number(float)

TextLayerFilterRatio

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerFilterRatioInverted

number(float)

TextLayerFilterRatioInverted

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerFilterWidth

integer(int32)

TextLayerFilterWidth

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerFilterWidthInverted

integer(int32)

TextLayerFilterWidthInverted

Contact technical support (support@aquaforest.com) for guidance on using this property.

textLayerMaxBoxes

integer(int32)

TextLayerMaxBoxes

Contact technical support (support@aquaforest.com) for guidance on using this property.

tidyUpMode

integer(int32)

Tidy-up mode

Contact technical support (support@aquaforest.com) for guidance on using this property.

validatePDFA

boolean

Validate PDF/A

Whether to validate the PDF/A document after conversion.

wordMatchThreshold

number(float)

Word match threshold

Contact technical support (support@aquaforest.com) for guidance on using this property.
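
As an illustration of how pdfToImageMinRes and pdfToImageMaxRes interact, the following minimal Python sketch clamps a page's native resolution into the documented range. The function name effective_resolution is hypothetical and is not part of the connector; the defaults of 200 and 600 are the ones stated for those two properties. It only mirrors the behaviour described in the property text and makes no claim about how the service implements it.

```python
def effective_resolution(page_dpi: int, min_res: int = 200, max_res: int = 600) -> int:
    """Clamp a page's native resolution into [min_res, max_res].

    Hypothetical helper: mirrors the documented rule that a resolution
    below pdfToImageMinRes is raised to it and a resolution above
    pdfToImageMaxRes is lowered to it.
    """
    if min_res > max_res:
        raise ValueError("min_res must not exceed max_res")
    return max(min_res, min(page_dpi, max_res))


# A 72 DPI page is raised to the minimum, a 1200 DPI page is capped,
# and a 300 DPI page is left unchanged.
print(effective_resolution(72))    # 200
print(effective_resolution(1200))  # 600
print(effective_resolution(300))   # 300
```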


 

ocr_response

Summary:

Description: Response data for OCR operation

 

          Properties:

Name

Type

Summary

ErrorMessage

string

Error message

If IsSuccessful is false, this property contains the error message.

IsSuccessful

boolean

Is Successful

Returns true if the OCR was successful.

LicenceInfo

string

License Info

Information about your API subscription key

LogFileContent

string(byte)

Log file content

The log contents of the operation

OutputFileContent

string(byte)

Processed file content

File generated by the Aquaforest PDF converter.
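
A minimal sketch of handling an ocr_response, assuming the response has already been parsed into a Python dictionary and that string(byte) fields carry base64-encoded data (the usual OpenAPI meaning of that type). The helper name extract_ocr_output is hypothetical; inside a Power Automate flow the designer would normally handle this decoding for you.

```python
import base64


def extract_ocr_output(response: dict) -> bytes:
    """Return the processed file bytes, or raise if the OCR failed.

    Assumption: `response` is the parsed ocr_response object and
    OutputFileContent is a base64-encoded string.
    """
    if not response.get("IsSuccessful", False):
        # ErrorMessage is only expected to be populated on failure.
        raise RuntimeError(response.get("ErrorMessage") or "OCR failed with no error message")
    return base64.b64decode(response["OutputFileContent"])
```

Checking IsSuccessful before touching OutputFileContent avoids decoding an empty or missing field when the operation has failed.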


 

SplitBarcode

Summary:

Description:

 

          Properties:

Name

Type

Summary

fileContent

string(byte)

File Content

The content of the source file

fileNameTemplate

string

File Name Template

Template for the output file if a barcode is found

noMatch

string

Pages with no Match

Depending on the split option you choose above, some pages will have no barcode value. Choose what to do with these pages.  Values: [Do not copy to output, Copy to output, Copy to output and rename]

noTextFileName

string

No Barcode Match Template

Template for the output file if no barcode is found

sourceFileName

string

File Name

The name of the source file

splitOption

string

Output File Options

Choose the location of the page with the barcode in the output files from the split operation.  Values: [Barcode on first page, Barcode on last page, Remove barcode page]

zones

array of (ZonesItem)

Barcode

List of variables that can be used to extract barcode information from PDF files

 

ZonesItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

barcodeFormats

array of (string)

Type

Specify the types of Barcode you want to identify

location

string

Location

Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html]

regex

string

Pattern

If a regular expression is provided here, we will match any extracted barcode to it and return the match.
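
The SplitBarcode properties above can be assembled into a request body along the following lines. This is a sketch under stated assumptions: the builder function is hypothetical, fileContent is assumed to be base64-encoded (string(byte)), and the zone's barcodeFormats and location values are left as caller-supplied placeholders because their accepted values are not listed in this section. Only the noMatch and splitOption values are taken from the documented lists.

```python
import base64
from typing import Any, Dict, List

# Values copied from the noMatch and splitOption descriptions.
NO_MATCH_VALUES = ("Do not copy to output", "Copy to output", "Copy to output and rename")
SPLIT_OPTIONS = ("Barcode on first page", "Barcode on last page", "Remove barcode page")


def build_split_barcode_request(
    pdf_bytes: bytes,
    source_file_name: str,
    file_name_template: str,
    zones: List[Dict[str, Any]],
    split_option: str = "Barcode on first page",
    no_match: str = "Do not copy to output",
    no_text_file_name: str = "",
) -> Dict[str, Any]:
    """Hypothetical helper that builds a SplitBarcode body as a dict."""
    if split_option not in SPLIT_OPTIONS:
        raise ValueError(f"splitOption must be one of {SPLIT_OPTIONS}")
    if no_match not in NO_MATCH_VALUES:
        raise ValueError(f"noMatch must be one of {NO_MATCH_VALUES}")
    return {
        "fileContent": base64.b64encode(pdf_bytes).decode("ascii"),
        "sourceFileName": source_file_name,
        "fileNameTemplate": file_name_template,
        "noMatch": no_match,
        "noTextFileName": no_text_file_name,
        "splitOption": split_option,
        # Each zone is a ZonesItem: barcodeFormats, location, regex.
        "zones": zones,
    }
```

Validating the two enumerated fields up front surfaces a misspelt option locally rather than as a failed action run.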

 


 

SplitPdfByPageDefinition

Summary:

Description:

 

          Properties:

Name

Type

Summary

fileContent

string(byte)

File Content

The content of the source file.

fileNameTemplate

string

Output File Name

Target file template which can include %UNIQUEn (unique number starting at 1, zero padded to n digits) and %FILENAME (original filename without the extension).

sourceFileName

string

File Name

The name of the source file.

splitOption

string

Split Type

Choose the split operation to use for each file.  Values: [Split into Single Pages, Split by Page Range, Split by Repeating Range, Split by Top Level Bookmark]

pageRange

string

Page Range

Set of page ranges separated by commas that defines which pages from the original should be extracted.

repeatEvery

integer

Repeat Every (Pages)

Apply the page range to each consecutive set of this many pages within the document. For example, if 2-4 is specified for Page Range and 4 is specified for Repeat Every, the range is re-applied every 4 pages.

retainBookmarks

boolean

Retain bookmarks

Generated files will include bookmarks from the original file.

retainMetadata

boolean

Retain metadata

Generated files will include metadata (such as Author and Title) from the original file.
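
The fileNameTemplate tokens and the Repeat Every behaviour described above can be sketched in Python as follows. Both helpers are hypothetical illustrations, not the connector's implementation. The sketch assumes that n in %UNIQUEn is written as digits directly after the token (for example %UNIQUE3), that page numbers are 1-based, and that a final, shorter block still yields whichever pages of the range it contains; none of these details are confirmed by this reference.

```python
import os
import re
from typing import List

_TOKEN = re.compile(r"%UNIQUE(\d+)|%FILENAME")


def expand_template(template: str, source_file_name: str, index: int) -> str:
    """Expand %UNIQUEn and %FILENAME for the index-th output file."""
    stem = os.path.splitext(source_file_name)[0]

    def repl(match: "re.Match[str]") -> str:
        if match.group(1) is not None:           # %UNIQUEn
            return str(index).zfill(int(match.group(1)))
        return stem                              # %FILENAME

    return _TOKEN.sub(repl, template)


def parse_page_range(page_range: str) -> List[int]:
    """Turn '2-4,7' into [2, 3, 4, 7]."""
    pages: List[int] = []
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


def repeating_ranges(page_range: str, repeat_every: int, page_count: int) -> List[List[int]]:
    """Re-apply page_range to each block of repeat_every pages."""
    if repeat_every < 1:
        raise ValueError("repeat_every must be at least 1")
    relative = parse_page_range(page_range)
    groups: List[List[int]] = []
    for offset in range(0, page_count, repeat_every):
        block_end = min(offset + repeat_every, page_count)
        group = [offset + p for p in relative
                 if p <= repeat_every and offset + p <= block_end]
        if group:
            groups.append(group)
    return groups


print(expand_template("%FILENAME_%UNIQUE3.pdf", "report.pdf", 1))  # report_001.pdf
print(repeating_ranges("2-4", 4, 10))  # [[2, 3, 4], [6, 7, 8], [10]]
```

With a page range of 2-4 repeated every 4 pages over a 10-page document, the sketch selects pages 2-4 from the first block, 6-8 from the second, and only page 10 from the final two-page block.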


 

SplitText

Summary:

Description:

 

          Properties:

Name

Type

Summary

fileContent

string(byte)

File Content

The content of the source file

fileNameTemplate

string

File Name Template

Template for the output file if the text matches are found

noMatch

string

Pages with no Match

Depending on the split option you choose above, some pages will have no text value extracted. Choose what to do with these pages.  Values: [Do not copy to output, Copy to output, Copy to output and rename]

noTextFileName

string

No File Template

Template for the output file if no text match is found

sourceFileName

string

File Name

The name of the source file

splitOption

string

Output File Options

Choose the location of the page that matches the text in the output files from the split operation.  Values: [Page that matches text on first page, Page that matches text on last page, Remove the page that matches text]

zones

array of (ZonesItem)

Text

List of variables that can be used to extract text information from PDF files

 

ZonesItem

Summary:

Description:

 

          Properties:

Name

Type

Summary

expression

array of (string)

Value

Provide one or more values here to be used with the Select (position) property. The first text value that matches the selected rule is returned.

location

string

Location

Area of the page - use the Zonal tool to obtain coordinates: [https://www.aquaforest.com/en/zone/get-pdf-zone.html]

position

string

Select

Use this to further refine the extracted text; select the option that matches your requirements.  Values: [text in zone, word after value, word before value, all text in line after value, all text in line before value, all text in zone after value, all text in zone before value]

regex

string

Pattern

If a regular expression is provided here, we will match any extracted text to it and return the match.
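
To make the interaction of expression, position and regex more concrete, the following Python sketch applies a few of the position options to a single line of text and then filters the result with a pattern. It is an illustration only: the function names are hypothetical, words are assumed to be whitespace-separated, the pattern is assumed to be searched for anywhere in the text, and Python's regular expression dialect may differ from the one the service uses. The two "all text in zone" options are omitted because they depend on multi-line zone content.

```python
import re
from typing import Optional


def select_text(line: str, value: str, position: str) -> Optional[str]:
    """Return the part of `line` chosen by `position` relative to `value`."""
    if position == "text in zone":
        return line
    idx = line.find(value)
    if idx < 0:
        return None  # the value does not occur, so nothing is selected
    before = line[:idx].strip()
    after = line[idx + len(value):].strip()
    if position == "all text in line after value":
        return after
    if position == "all text in line before value":
        return before
    if position == "word after value":
        words = after.split()
        return words[0] if words else None
    if position == "word before value":
        words = before.split()
        return words[-1] if words else None
    raise ValueError(f"unsupported position in this sketch: {position!r}")


def apply_pattern(text: Optional[str], pattern: Optional[str]) -> Optional[str]:
    """Return the first regex match in `text`, or `text` itself if no pattern is given."""
    if text is None or not pattern:
        return text
    match = re.search(pattern, text)
    return match.group(0) if match else None


line = "Invoice No: INV-00042 Date: 2024-01-31"
word = select_text(line, "Invoice No:", "word after value")
print(word)                          # INV-00042
print(apply_pattern(word, r"\d+"))   # 00042
```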