Connectors Reference

Form Recognizer

Extracts information from forms and images into structured data based on a model created by a set of representative training forms.

 

Status: Preview

Tier: Standard

Version: v1.0-preview

 

Actions:

Name | Summary
GetCustomModels () | Get Models
DeleteCustomModel (string modelId) | Delete Model
GetCustomModel (string modelId) | Get Model
AnalyzeWithCustomModel (string modelId, [Optional]string Keys, string Document, [Optional]string Content-type) | Analyze Form
GetExtractedKeys (string modelId) | Get Keys
TrainCustomModel ([Optional]TrainRequest trainRequest) | Train Model

 

Triggers:

Name | Summary

 

Objects:

Name | Summary
AnalyzeResult |
ExtractedKeyValuePair |
ExtractedPage |
ExtractedTable |
ExtractedTableColumn |
ExtractedToken |
FormDocumentReport |
FormOperationError |
KeysResult |
ModelResult |
ModelsResult |
TrainRequest |
TrainResult |

 

Actions:

GetCustomModels

Summary: Get Models

Description: Get information about all trained models

 

Syntax:

FormRecognizer.GetCustomModels ()

 

Returns:

          Type: ModelsResult

          Description: Result of query operation to fetch multiple models.

 

DeleteCustomModel

Summary: Delete Model

Description: Delete model artifacts.

 

Syntax:

FormRecognizer.DeleteCustomModel (string modelId)

 

Parameters:

Name | Type | Summary | Required | Related Action
modelId | string | (Model ID) The identifier of the model to delete. | True |

Returns:

 

GetCustomModel

Summary: Get Model

Description: Get information about a model.

 

Syntax:

FormRecognizer.GetCustomModel (string modelId)

 

Parameters:

Name | Type | Summary | Required | Related Action
modelId | string | (Model ID) The model identifier, which is also the identifier used to analyze documents. | True |

Returns:

          Type: ModelResult

          Description: Result of a model status query operation.

 

AnalyzeWithCustomModel

Summary: Analyze Form

Description: The document to analyze must be of a supported content type - 'application/pdf', 'image/jpeg' or 'image/png'. The response contains not just the extracted information of the analyzed form but also information about content that was not extracted along with a reason.

 

Syntax:

FormRecognizer.AnalyzeWithCustomModel (string modelId, [Optional]string Keys, string Document, [Optional]string Content-type)

 

Parameters:

Name | Type | Summary | Required | Related Action
modelId | string | (Model ID) The identifier of the model used to analyze the document. | True |
Keys | string | (Keys to extract) An optional list of known keys to extract the values for. | False |
Document | string(binary) | | True |
Content-type | string | (Content type) Content type of the document to analyze. | False |

Returns:

          Type: AnalyzeResult

          Description: Analyze API call result.
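
The Content-type parameter is optional, but the document itself must be one of the three supported types. As a rough sketch, a caller could derive the value from the file name before invoking the action. The extension-to-type mapping and the helper name `content_type_for` are assumptions made for illustration; they are not part of the connector.

```python
from pathlib import Path

# The three content types the Analyze Form description lists as supported.
# The file extensions mapped to them here are an assumption.
SUPPORTED_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def content_type_for(filename: str) -> str:
    """Return the content type for a file name, or raise if unsupported."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"unsupported document type: {filename!r}") from None


print(content_type_for("invoice.pdf"))  # application/pdf
```

Rejecting unsupported files up front avoids sending a document that the action cannot analyze.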

 

GetExtractedKeys

Summary: Get Keys

Description: Use the API to retrieve the keys that were extracted by the specified model.

 

Syntax:

FormRecognizer.GetExtractedKeys (string modelId)

 

Parameters:

Name | Type | Summary | Required | Related Action
modelId | string | (Model ID) Model identifier. | True |

Returns:

          Type: KeysResult

          Description: Result of an operation to get the keys extracted by a model.

 

TrainCustomModel

Summary: Train Model

Description: The train request must include a source parameter that is either an externally accessible Azure Storage blob container URI (preferably a Shared Access Signature URI) or a valid path to a data folder in a locally mounted drive. When a local path is specified, it must follow the Linux/Unix path format and be an absolute path rooted at the input mount configuration setting value. For example, if the '{Mounts:Input}' configuration setting value is '/input', then a valid source path would be '/input/contosodataset'. All training data is expected to be located under the source. Models are trained using documents of the following content types: 'application/pdf', 'image/jpeg' and 'image/png'. Other content is ignored when training a model.

 

Syntax:

FormRecognizer.TrainCustomModel ([Optional]TrainRequest trainRequest)

 

Parameters:

Name | Type | Summary | Required | Related Action
trainRequest | TrainRequest | Contract to initiate a train request. | False |

Returns:

          Type: TrainResult

          Description: Response of the Train API call.
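
The rule for local source paths (absolute, Linux/Unix format, rooted at the input mount) can be checked before a train request is built. The following is a minimal sketch only: the function name `is_valid_local_source` is hypothetical, the default mount `/input` is taken from the example in the description, and treating the mount root itself as valid is an assumption.

```python
import posixpath


def is_valid_local_source(source: str, input_mount: str = "/input") -> bool:
    """Check that `source` is an absolute POSIX path under `input_mount`."""
    if not posixpath.isabs(source):
        return False
    # Collapse "." and ".." segments so "/input/../etc" is not accepted.
    normalized = posixpath.normpath(source)
    mount = posixpath.normpath(input_mount)
    # Assumption: the mount root itself counts as a valid source.
    return normalized == mount or normalized.startswith(mount.rstrip("/") + "/")


print(is_valid_local_source("/input/contosodataset"))  # True
print(is_valid_local_source("input/contosodataset"))   # False
```

Blob container URIs are not covered by this check; it applies only to the local-path form of the source.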

 


 

AnalyzeResult

Summary:

Description: Analyze API call result.

 

          Properties:

Name | Type | Summary
errors | array of (FormOperationError) | List of errors reported during the analyze operation.
pages | array of (ExtractedPage) | Page level information extracted in the analyzed document.
status | string | Status of the analyze operation. Values: [success, partialSuccess, failure]


 

ExtractedKeyValuePair

Summary:

Description: Representation of a key-value pair as a list of key and value tokens.

 

          Properties:

Name | Type | Summary
key | array of (ExtractedToken) | List of tokens for the extracted key in a key-value pair.
value | array of (ExtractedToken) | List of tokens for the extracted value in a key-value pair.


 

ExtractedPage

Summary:

Description: Extraction information for a single page in a document.

 

          Properties:

Name | Type | Summary
clusterId | integer(int32) | Cluster identifier.
height | integer(int32) | Height of the page (in pixels).
keyValuePairs | array of (ExtractedKeyValuePair) | List of Key-Value pairs extracted from the page.
number | integer(int32) | Page number.
tables | array of (ExtractedTable) | List of Tables and their information extracted from the page.
width | integer(int32) | Width of the page (in pixels).
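
Because keys and values arrive as lists of tokens, a consumer usually has to join them back into strings. The sketch below walks an AnalyzeResult, assuming it has been deserialized into plain Python dictionaries with the property names listed in this reference. The sample payload is made up, and joining tokens with a single space is an assumption.

```python
def tokens_to_text(tokens):
    """Join the `text` of each ExtractedToken with single spaces."""
    return " ".join(token["text"] for token in tokens)


def key_value_pairs(analyze_result):
    """Yield (page number, key text, value text) for every extracted pair."""
    for page in analyze_result.get("pages", []):
        for pair in page.get("keyValuePairs", []):
            yield (
                page["number"],
                tokens_to_text(pair["key"]),
                tokens_to_text(pair["value"]),
            )


# Hypothetical payload, shaped after the property tables in this reference.
sample = {
    "status": "success",
    "errors": [],
    "pages": [
        {
            "number": 1,
            "clusterId": 0,
            "height": 792,
            "width": 612,
            "tables": [],
            "keyValuePairs": [
                {
                    "key": [{"text": "Invoice"}, {"text": "Number:"}],
                    "value": [{"text": "12345"}],
                }
            ],
        }
    ],
}

print(list(key_value_pairs(sample)))  # [(1, 'Invoice Number:', '12345')]
```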


 

ExtractedTable

Summary:

Description: Extraction information about a table contained in a page.

 

          Properties:

Name | Type | Summary
columns | array of (ExtractedTableColumn) | List of columns contained in the table.
id | string | Table identifier.


 

ExtractedTableColumn

Summary:

Description: Extraction information of a column in a table.

 

          Properties:

Name | Type | Summary
entries | array of (array of (ExtractedToken)) | Extracted text for each cell of a column. Each cell in the column can have a list of one or more tokens.
header | array of (ExtractedToken) | List of extracted tokens for the column header.
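
Tables are returned column by column, so reading them row by row requires a transpose. Here is a small sketch of that step, assuming an ExtractedTable deserialized into plain dictionaries. The helper names and sample data are hypothetical, and padding short columns with empty strings is an assumption.

```python
from itertools import zip_longest


def cell_text(tokens):
    """Join the tokens of one cell (or header) into a single string."""
    return " ".join(token["text"] for token in tokens)


def table_to_rows(table):
    """Return (headers, rows) for a column-oriented ExtractedTable."""
    headers = [cell_text(col["header"]) for col in table["columns"]]
    columns = [
        [cell_text(cell) for cell in col["entries"]] for col in table["columns"]
    ]
    # Transpose columns into rows; shorter columns are padded with "".
    rows = [list(row) for row in zip_longest(*columns, fillvalue="")]
    return headers, rows


# Hypothetical table with two columns and two rows.
table = {
    "id": "table_0",
    "columns": [
        {
            "header": [{"text": "Item"}],
            "entries": [[{"text": "Widget"}], [{"text": "Gadget"}, {"text": "Pro"}]],
        },
        {
            "header": [{"text": "Qty"}],
            "entries": [[{"text": "2"}], [{"text": "5"}]],
        },
    ],
}

print(table_to_rows(table))
# (['Item', 'Qty'], [['Widget', '2'], ['Gadget Pro', '5']])
```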


 

ExtractedToken

Summary:

Description: Canonical representation of a single piece of extracted text.

 

          Properties:

Name | Type | Summary
boundingBox | array of (number(double)) | Bounding box of the extracted text. Represents the location of the extracted text as four pairs of Cartesian coordinates, ordered top-left, top-right, bottom-right and bottom-left, with the origin at the bottom-left of the page.
confidence | number(double) | A measure of accuracy of the extracted text.
text | string | String value of the extracted text.
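
To work with a boundingBox, it helps to split the flat number array into named corners. The sketch below assumes the array holds eight numbers forming four (x, y) pairs in the order the description gives (top-left, top-right, bottom-right, bottom-left). The x-before-y ordering and the helper names are assumptions, not something this reference states.

```python
CORNER_NAMES = ("top_left", "top_right", "bottom_right", "bottom_left")


def corners(bounding_box):
    """Map a flat list of eight numbers to named (x, y) corner points."""
    if len(bounding_box) != 8:
        raise ValueError("expected eight numbers (four x, y pairs)")
    # Assumption: values alternate x, y for each corner.
    points = zip(bounding_box[0::2], bounding_box[1::2])
    return dict(zip(CORNER_NAMES, points))


def box_size(bounding_box):
    """Return (width, height) of the axis-aligned extent of the box."""
    points = corners(bounding_box).values()
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return max(xs) - min(xs), max(ys) - min(ys)


# With the origin at the bottom-left of the page, the top edge has the
# larger y value.
box = [10.0, 50.0, 110.0, 50.0, 110.0, 30.0, 10.0, 30.0]
print(corners(box)["top_left"])  # (10.0, 50.0)
print(box_size(box))             # (100.0, 20.0)
```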


 

FormDocumentReport

Summary:

Description:

 

          Properties:

Name | Type | Summary
documentName | string | Reference to the data that the report is for.
errors | array of (string) | List of errors per page.
pages | integer(int32) | Total number of pages trained on.
status | string | Status of the training operation. Values: [success, partialSuccess, failure]


 

FormOperationError

Summary:

Description: Error reported during an operation.

 

          Properties:

Name | Type | Summary
errorMessage | string | Message reported during the train operation.


 

KeysResult

Summary:

Description: Result of an operation to get the keys extracted by a model.

 

          Properties:

Name | Type | Summary
clusters | Clusters | Object mapping ClusterIds to Key lists.

 

Clusters

Summary:

Description: Object mapping ClusterIds to Key lists.

 

          Properties:

Name | Type | Summary
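
Since Clusters has no fixed properties, it is easiest to treat it as a dictionary. The sketch below looks up the keys a model extracted for the cluster a given page belongs to. It assumes the result has been deserialized from JSON, so the cluster identifiers appear as string keys; that detail, the helper names and the sample data are all assumptions.

```python
def keys_for_cluster(keys_result, cluster_id):
    """Return the key list for one cluster, or [] if the cluster is unknown."""
    clusters = keys_result.get("clusters", {})
    # Assumption: JSON object keys are the cluster IDs rendered as strings.
    return clusters.get(str(cluster_id), [])


def all_keys(keys_result):
    """Return every distinct key across all clusters, sorted."""
    clusters = keys_result.get("clusters", {})
    return sorted({key for keys in clusters.values() for key in keys})


# Hypothetical KeysResult with two clusters.
keys_result = {
    "clusters": {
        "0": ["Invoice Number:", "Date:"],
        "1": ["Date:", "Total:"],
    }
}

print(keys_for_cluster(keys_result, 1))  # ['Date:', 'Total:']
print(all_keys(keys_result))             # ['Date:', 'Invoice Number:', 'Total:']
```

The `cluster_id` argument would typically come from the clusterId property of an ExtractedPage.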

 


 

ModelResult

Summary:

Description: Result of a model status query operation.

 

          Properties:

Name | Type | Summary
createdDateTime | string(date-time) | Get or set the created date time of the model.
lastUpdatedDateTime | string(date-time) | Get or set the model last updated datetime.
modelId | string(uuid) | Get or set model identifier.
status | string | Get or set the status of model. Values: [created, ready, invalid]


 

ModelsResult

Summary:

Description: Result of query operation to fetch multiple models.

 

          Properties:

Name | Type | Summary
models | array of (ModelResult) | Collection of models.
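
A common use of ModelsResult is to pick out the models that can actually be used for analysis. The sketch below filters for the `ready` status and orders the result by lastUpdatedDateTime, newest first. It assumes the result is a plain dictionary and that the timestamps are ISO 8601 strings; the function name and sample identifiers are hypothetical.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def ready_model_ids(models_result):
    """Return IDs of models with status 'ready', most recently updated first."""
    ready = [
        model
        for model in models_result.get("models", [])
        if model.get("status") == "ready"
    ]
    ready.sort(
        key=lambda model: parse_timestamp(model["lastUpdatedDateTime"]),
        reverse=True,
    )
    return [model["modelId"] for model in ready]


# Hypothetical ModelsResult; the identifiers are placeholders.
models_result = {
    "models": [
        {"modelId": "aaaa", "status": "ready",
         "lastUpdatedDateTime": "2019-05-01T10:00:00Z"},
        {"modelId": "bbbb", "status": "invalid",
         "lastUpdatedDateTime": "2019-05-03T10:00:00Z"},
        {"modelId": "cccc", "status": "ready",
         "lastUpdatedDateTime": "2019-05-02T10:00:00Z"},
    ]
}

print(ready_model_ids(models_result))  # ['cccc', 'aaaa']
```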


 

TrainRequest

Summary:

Description: Contract to initiate a train request.

 

          Properties:

Name | Type | Summary
source | string | Get or set source path.


 

TrainResult

Summary:

Description: Response of the Train API call.

 

          Properties:

Name | Type | Summary
errors | array of (FormOperationError) | Errors returned during the training operation.
modelId | string(uuid) | Identifier of the model.
trainingDocuments | array of (FormDocumentReport) | List of documents used to train the model and the train operation error reported by each.
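
After training, the per-document reports can be condensed into a short summary to see whether any documents need attention. This is a sketch under the assumption that TrainResult has been deserialized into plain dictionaries; the function name, the output layout and the sample data are hypothetical.

```python
from collections import Counter


def summarize_training(train_result):
    """Condense a TrainResult into counts plus the documents that did not fully succeed."""
    documents = train_result.get("trainingDocuments", [])
    return {
        "modelId": train_result.get("modelId"),
        "documents": len(documents),
        "pages": sum(doc.get("pages", 0) for doc in documents),
        "by_status": dict(Counter(doc.get("status") for doc in documents)),
        "needs_attention": [
            doc["documentName"]
            for doc in documents
            if doc.get("status") != "success"
        ],
    }


# Hypothetical TrainResult; the model identifier is a placeholder.
train_result = {
    "modelId": "aaaa",
    "errors": [],
    "trainingDocuments": [
        {"documentName": "form1.pdf", "pages": 2, "errors": [], "status": "success"},
        {"documentName": "form2.png", "pages": 1,
         "errors": ["page 1 unreadable"], "status": "failure"},
    ],
}

print(summarize_training(train_result))
```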