Form Recognizer

Extracts information from forms and images into structured data, using a model created from a set of representative training forms.

Status: Preview

Tier: Standard

Version: v1.0-preview

Actions:

Name	Summary
GetCustomModels ()	Get Models
DeleteCustomModel (string modelId)	Delete Model
GetCustomModel (string modelId)	Get Model
AnalyzeWithCustomModel (string modelId, [Optional]string Keys, string Document, [Optional]string Content-type)	Analyze Form
GetExtractedKeys (string modelId)	Get Keys
TrainCustomModel ([Optional]TrainRequest trainRequest)	Train Model

Objects:

Name	Summary
AnalyzeResult
ExtractedKeyValuePair
ExtractedPage
ExtractedTable
ExtractedTableColumn
ExtractedToken
FormDocumentReport
FormOperationError
KeysResult
ModelResult
ModelsResult
TrainRequest
TrainResult

Actions:

GetCustomModels

Summary: Get Models

Description: Get information about all trained models

Syntax:

FormRecognizer.GetCustomModels ()

Returns:

Type:ModelsResult

Description: Result of query operation to fetch multiple models.

DeleteCustomModel

Summary: Delete Model

Description: Delete model artifacts.

Syntax:

FormRecognizer.DeleteCustomModel (string modelId)

Parameters:

Name	Type	Summary	Required	Related Action
modelId	string (Model ID)	The identifier of the model to delete.	True

Returns:

GetCustomModel

Summary: Get Model

Description: Get information about a model.

Syntax:

FormRecognizer.GetCustomModel (string modelId)

Parameters:

Name	Type	Summary	Required	Related Action
modelId	string (Model ID)	The model identifier used to analyze your documents.	True

Returns:

Type:ModelResult

Description: Result of a model status query operation.

AnalyzeWithCustomModel

Summary: Analyze Form

Description: The document to analyze must be of a supported content type: 'application/pdf', 'image/jpeg' or 'image/png'. The response contains not only the information extracted from the analyzed form but also information about content that was not extracted, along with a reason.

Syntax:

FormRecognizer.AnalyzeWithCustomModel (string modelId, [Optional]string Keys, string Document, [Optional]string Content-type)

Parameters:

Name	Type	Summary	Required	Related Action
modelId	string (Model ID)	This is your Model Identifier that is used to analyze the document.	True
Keys	string (Keys to extract)	An optional list of known keys to extract the values for.	False
Document	string(binary)		True
Content-type	string (Content type)	Content type of the document to analyze.	False

Returns:

Type:AnalyzeResult

Description: Analyze API call result.
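
The content type rule above can be checked before a document is sent. The following is a minimal sketch, not part of the connector: the helper name and the file-extension mapping are assumptions made for illustration, and only the three content types come from the description.

```python
import os

# Content types the description lists as supported for analysis.
# The extension-to-type mapping is an assumption for this sketch.
SUPPORTED_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def content_type_for(filename: str) -> str:
    """Return the Content-type value for a file name, or raise if unsupported."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        return SUPPORTED_CONTENT_TYPES[extension]
    except KeyError:
        raise ValueError(f"unsupported document type: {filename!r}") from None
```

The returned string would be passed as the optional Content-type parameter; rejecting other files early avoids a round trip for a document the service is not expected to accept.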

GetExtractedKeys

Summary: Get Keys

Description: Use the API to retrieve the keys that were extracted by the specified model.

Syntax:

FormRecognizer.GetExtractedKeys (string modelId)

Parameters:

Name	Type	Summary	Required	Related Action
modelId	string (Model ID)	Model identifier.	True

Returns:

Type:KeysResult

Description: Result of an operation to get the keys extracted by a model.

TrainCustomModel

Summary: Train Model

Description: The train request must include a source parameter that is either an externally accessible Azure Storage blob container URI (preferably a Shared Access Signature URI) or a valid path to a data folder in a locally mounted drive. When local paths are specified, they must follow the Linux/Unix path format and be absolute paths rooted at the input mount configuration setting value. For example, if the '{Mounts:Input}' configuration setting value is '/input', then a valid source path would be '/input/contosodataset'. All training data is expected to be located under the source. Models are trained using documents of the following content types: 'application/pdf', 'image/jpeg' and 'image/png'. Other content is ignored when training a model.

Syntax:

FormRecognizer.TrainCustomModel ([Optional]TrainRequest trainRequest)

Parameters:

Name	Type	Summary	Required	Related Action
trainRequest	TrainRequest	Contract to initiate a train request.	False

Returns:

Type:TrainResult

Description: Response of the Train API call.
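
The local path rule for the source parameter can be sketched as a small pre-flight check. This is an illustrative sketch only: the function names are hypothetical, the default mount value '/input' is taken from the example in the description, and the request is assumed to serialize as an object with a single source property, as the TrainRequest object suggests.

```python
import posixpath


def local_source_is_valid(source: str, input_mount: str = "/input") -> bool:
    """Check that a local source is an absolute POSIX path under the input mount."""
    if not posixpath.isabs(source):
        return False
    # Normalize so that segments such as ".." cannot escape the mount.
    normalized = posixpath.normpath(source)
    mount = posixpath.normpath(input_mount)
    return normalized == mount or normalized.startswith(mount.rstrip("/") + "/")


def build_train_request(source: str) -> dict:
    """Build a TrainRequest-shaped payload (assumed shape: one 'source' field)."""
    return {"source": source}
```

A blob container URI would bypass the path check entirely, so a caller would apply local_source_is_valid only when the source is a local path.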

AnalyzeResult

Summary:

Description: Analyze API call result.

Properties:

Name	Type	Summary
errors	array of (FormOperationError)	List of errors reported during the analyze operation.
pages	array of (ExtractedPage)	Page level information extracted in the analyzed document.
status	string	Status of the analyze operation. Values: [success, partialSuccess, failure]
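
As a rough illustration of how the status and errors properties might be handled together, the sketch below assumes the result has already been parsed into a Python dict whose keys match the property names above. The function name is hypothetical.

```python
def analyze_error_messages(analyze_result: dict) -> list:
    """Return reported error messages; raise if the whole operation failed."""
    messages = [error["errorMessage"] for error in analyze_result.get("errors", [])]
    if analyze_result.get("status") == "failure":
        raise RuntimeError("; ".join(messages) or "analyze operation failed")
    # For 'success' and 'partialSuccess' the caller still gets any messages.
    return messages
```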

ExtractedKeyValuePair

Summary:

Description: Representation of a key-value pair as a list of key and value tokens.

Properties:

Name	Type	Summary
key	array of (ExtractedToken)	List of tokens for the extracted key in a key-value pair.
value	array of (ExtractedToken)	List of tokens for the extracted value in a key-value pair.
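
Because keys and values arrive as token lists, a caller usually has to join the token text before using a pair. A minimal sketch, assuming the AnalyzeResult has been parsed into a dict with the property names documented here and that joining tokens with a single space is acceptable (both are assumptions, as are the function names):

```python
def tokens_to_text(tokens: list) -> str:
    """Join the text of a list of ExtractedToken-shaped dicts."""
    return " ".join(token["text"] for token in tokens)


def key_value_pairs(analyze_result: dict) -> list:
    """Collect (key, value) text pairs from every page of an analyze result."""
    pairs = []
    for page in analyze_result.get("pages", []):
        for pair in page.get("keyValuePairs", []):
            pairs.append((tokens_to_text(pair["key"]), tokens_to_text(pair["value"])))
    return pairs
```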

ExtractedPage

Summary:

Description: Extraction information of a single page in a document.

Properties:

Name	Type	Summary
clusterId	integer(int32)	Cluster identifier.
height	integer(int32)	Height of the page (in pixels).
keyValuePairs	array of (ExtractedKeyValuePair)	List of Key-Value pairs extracted from the page.
number	integer(int32)	Page number.
tables	array of (ExtractedTable)	List of Tables and their information extracted from the page.
width	integer(int32)	Width of the page (in pixels).

ExtractedTable

Summary:

Description: Extraction information about a table contained in a page.

Properties:

Name	Type	Summary
columns	array of (ExtractedTableColumn)	List of columns contained in the table.
	string	Table identifier.

ExtractedTableColumn

Summary:

Description: Extraction information of a column in a table.

Properties:

Name	Type	Summary
entries	array of (array of (ExtractedToken))	Extracted text for each cell of a column. Each cell in the column can have a list of one or more tokens.
header	array of (ExtractedToken)	List of extracted tokens for the column header.
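
Tables are reported column by column, so turning one into rows means transposing the entries. The sketch below assumes a table parsed into a dict with a columns list shaped as documented above; the function names and the choice to pad short columns with empty strings are illustrative assumptions.

```python
from itertools import zip_longest


def cell_text(tokens: list) -> str:
    """Join the text of the tokens that make up one header or cell."""
    return " ".join(token["text"] for token in tokens)


def table_rows(table: dict) -> tuple:
    """Return (headers, rows) for an ExtractedTable-shaped dict."""
    columns = table.get("columns", [])
    headers = [cell_text(column["header"]) for column in columns]
    cells = [[cell_text(cell) for cell in column["entries"]] for column in columns]
    # Transpose columns into rows; shorter columns are padded with "".
    rows = [list(row) for row in zip_longest(*cells, fillvalue="")]
    return headers, rows
```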

ExtractedToken

Summary:

Description: Canonical representation of a single piece of extracted text.

Properties:

Name	Type	Summary
boundingBox	array of (number(double))	Bounding box of the extracted text. Represents the location of the extracted text as a list of Cartesian coordinate pairs. The pairs are the corners of the box in the order top-left, top-right, bottom-right and bottom-left, with the origin at the bottom-left of the page.
confidence	number(double)	A measure of accuracy of the extracted text.
text	string	String value of the extracted text.
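
To make the corner ordering concrete, the following sketch unpacks a boundingBox value. It assumes the array is flat and holds eight numbers (x then y for each of the four corners); that layout is an assumption not stated in this reference, and the function names are hypothetical.

```python
CORNER_NAMES = ("top_left", "top_right", "bottom_right", "bottom_left")


def corners(bounding_box: list) -> dict:
    """Map corner names to (x, y) pairs, assuming a flat list of 8 numbers."""
    if len(bounding_box) != 8:
        raise ValueError("expected 8 numbers: x, y for each of 4 corners")
    points = zip(bounding_box[0::2], bounding_box[1::2])
    return dict(zip(CORNER_NAMES, points))


def box_size(bounding_box: list) -> tuple:
    """Return (width, height) of the axis-aligned extent of the box."""
    xs = bounding_box[0::2]
    ys = bounding_box[1::2]
    return max(xs) - min(xs), max(ys) - min(ys)
```

With the origin at the bottom-left of the page, the top corners carry the larger y values, which is why the sample box used to exercise this sketch starts at y = 50 and ends at y = 20.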

FormDocumentReport

Summary:

Description:

Properties:

Name	Type	Summary
documentName	string	Reference to the data that the report is for.
errors	array of (string)	List of errors per page.
pages	integer(int32)	Total number of pages trained on.
status	string	Status of the training operation. Values: [success, partialSuccess, failure]

FormOperationError

Summary:

Description: Error reported during an operation.

Properties:

Name	Type	Summary
errorMessage	string	Message reported during the train operation.

KeysResult

Summary:

Description: Result of an operation to get the keys extracted by a model.

Properties:

Name	Type	Summary
clusters	Clusters	Object mapping ClusterIds to Key lists.

Clusters

Summary:

Description: Object mapping ClusterIds to Key lists.

Properties:

Name	Type	Summary

ModelResult

Summary:

Description: Result of a model status query operation.

Properties:

Name	Type	Summary
createdDateTime	string(date-time)	Get or set the created date time of the model.
lastUpdatedDateTime	string(date-time)	Get or set the model last updated datetime.
modelId	string(uuid)	Get or set model identifier.
status	string	Get or set the status of model. Values: [created, ready, invalid]

ModelsResult

Summary:

Description: Result of query operation to fetch multiple models.

Properties:

Name	Type	Summary
models	array of (ModelResult)	Collection of models.
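
A common use of this object is to pick out the models that can be used for analysis. The sketch below assumes the result has been parsed into a dict and treats only the 'ready' status value as usable; that interpretation of the status values, and the function name, are assumptions.

```python
def ready_model_ids(models_result: dict) -> list:
    """Return the modelId of every model whose status is 'ready'."""
    return [
        model["modelId"]
        for model in models_result.get("models", [])
        if model.get("status") == "ready"
    ]
```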

TrainRequest

Summary:

Description: Contract to initiate a train request.

Properties:

Name	Type	Summary
source	string	Get or set source path.

TrainResult

Summary:

Description: Response of the Train API call.

Properties:

Name	Type	Summary
errors	array of (FormOperationError)	Errors returned during the training operation.
modelId	string(uuid)	Identifier of the model.
trainingDocuments	array of (FormDocumentReport)	List of documents used to train the model and the train operation error reported by each.
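
After training, the per-document reports show which inputs did not train cleanly. A minimal sketch, assuming the TrainResult has been parsed into a dict with the property names documented above (the function name is hypothetical):

```python
def problem_documents(train_result: dict) -> dict:
    """Map documentName to its error list for documents not fully successful."""
    return {
        report["documentName"]: report.get("errors", [])
        for report in train_result.get("trainingDocuments", [])
        if report.get("status") != "success"
    }
```

An empty result means every training document reported the success status; any operation-level problems would still appear separately in the errors property.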