Extracts information from forms and images into structured data, based on a model created from a set of representative training forms.
Status: Preview | Tier: Standard | Version: v1.0-preview
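Name | Summary
Get Models | Get information about all trained models.
Delete Model | Delete model artifacts.
Get Model | Get information about a model.
Analyze Form | Analyze a document with a trained model.
Get Keys | Retrieve the keys that were extracted by the specified model.
Train Model | Train a model from the documents under a source.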
Summary: Get Models
Description: Get information about all trained models
Syntax:
FormRecognizer.GetCustomModels ()
Returns:
Type: ModelsResult
Description: Result of query operation to fetch multiple models.
Summary: Delete Model
Description: Delete model artifacts.
Syntax:
FormRecognizer.DeleteCustomModel (string modelId)
Parameters:
Name | Type | Summary | Required | Related Action
modelId | string (Model ID) | The identifier of the model to delete. | True |
Returns:
Summary: Get Model
Description: Get information about a model.
Syntax:
FormRecognizer.GetCustomModel (string modelId)
Parameters:
Name | Type | Summary | Required | Related Action
modelId | string (Model ID) | The model identifier, which is also the one used to analyze your documents. | True |
Returns:
Type: ModelResult
Description: Result of a model status query operation.
Summary: Analyze Form
Description: The document to analyze must be of a supported content type: 'application/pdf', 'image/jpeg', or 'image/png'. The response contains the information extracted from the analyzed form, as well as information about content that was not extracted, along with a reason.
Syntax:
FormRecognizer.AnalyzeWithCustomModel (string modelId, [Optional]string Keys, string Document, [Optional]string Content-type)
Parameters:
Name | Type | Summary | Required | Related Action
modelId | string (Model ID) | The identifier of the model used to analyze the document. | True |
Keys | string (Keys to extract) | An optional list of known keys to extract the values for. | False |
Document | string(binary) | | True |
Content-type | string (Content type) | Content type of the document to analyze. | False |
Returns:
Type: AnalyzeResult
Description: Analyze API call result.
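Because Analyze Form accepts only three content types, it can help to resolve the Content-type value before calling the action. The following is a minimal sketch, assuming the type is inferred from the file extension; the helper name `content_type_for` and the extension mapping are illustrative and not part of the connector.

```python
from pathlib import Path

# Content types listed in the Analyze Form description.
# The extension-to-type mapping is an assumption made for this sketch.
SUPPORTED_CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def content_type_for(filename: str) -> str:
    """Return the content type for a file name, or raise if unsupported."""
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type: {filename!r}") from None
```

Rejecting unsupported files up front avoids sending a document that the action would not be able to analyze.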
Summary: Get Keys
Description: Use the API to retrieve the keys that were extracted by the specified model.
Syntax:
FormRecognizer.GetExtractedKeys (string modelId)
Parameters:
Name | Type | Summary | Required | Related Action
modelId | string (Model ID) | Model identifier. | True |
Returns:
Type: KeysResult
Description: Result of an operation to get the keys extracted by a model.
Summary: Train Model
Description: The train request must include a source parameter that is either an externally accessible Azure Storage blob container URI (preferably a Shared Access Signature URI) or a valid path to a data folder on a locally mounted drive. Local paths must follow the Linux/Unix path format and be absolute paths rooted at the input mount configuration setting value. For example, if the '{Mounts:Input}' configuration setting value is '/input', then a valid source path would be '/input/contosodataset'. All training data is expected to be under the source. Models are trained on documents of the following content types: 'application/pdf', 'image/jpeg', and 'image/png'. Other content is ignored when training a model.
Syntax:
FormRecognizer.TrainCustomModel ([Optional]TrainRequest trainRequest)
Parameters:
Name | Type | Summary | Required | Related Action
trainRequest | TrainRequest | Contract to initiate a train request. | False |
Returns:
Type: TrainResult
Description: Response of the Train API call.
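The two accepted forms of the source parameter can be checked before a train request is sent. This is a rough sketch under stated assumptions: the function name `is_valid_source` is hypothetical, any http(s) URI with a host is treated as a candidate blob container URI (reachability and SAS validity are not checked), and the input mount defaults to '/input' as in the example from the description.

```python
from posixpath import isabs, normpath
from urllib.parse import urlparse


def is_valid_source(source: str, input_mount: str = "/input") -> bool:
    """Check a train source against the two forms the description allows."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Candidate blob container URI; only its shape is checked here.
        return bool(parsed.netloc)
    # Otherwise expect an absolute Linux/Unix path under the input mount.
    if not isabs(source):
        return False
    path = normpath(source)  # collapses "." and ".." segments
    mount = normpath(input_mount)
    return path == mount or path.startswith(mount.rstrip("/") + "/")
```

Normalizing the path first means a value such as '/input/../etc' is rejected, since it resolves outside the mount.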
Summary: AnalyzeResult
Description: Analyze API call result.
Properties:
Name | Type | Summary
errors | array of (FormOperationError) | List of errors reported during the analyze operation.
pages | array of (ExtractedPage) | Page level information extracted in the analyzed document.
status | string | Status of the analyze operation. Values: [success, partialSuccess, failure]
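Since the status can be success, partialSuccess, or failure, a caller will usually want to branch on it before reading pages. The sketch below assumes the result has been deserialized into a Python dict with the property names listed above; the function name `pages_and_warnings` is illustrative.

```python
def pages_and_warnings(result: dict) -> tuple:
    """Return (pages, error messages); raise if the analyze operation failed."""
    status = result.get("status")
    messages = [e.get("errorMessage", "") for e in result.get("errors") or []]
    if status == "failure":
        raise RuntimeError("Analyze failed: " + "; ".join(messages))
    if status not in ("success", "partialSuccess"):
        raise ValueError(f"Unexpected analyze status: {status!r}")
    # On partialSuccess the pages are still usable, but messages is non-empty
    # whenever the operation reported errors.
    return (result.get("pages") or [], messages)
```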
Summary: ExtractedKeyValuePair
Description: Representation of a key-value pair as a list of key and value tokens.
Properties:
Name | Type | Summary
key | array of (ExtractedToken) | List of tokens for the extracted key in a key-value pair.
value | array of (ExtractedToken) | List of tokens for the extracted value in a key-value pair.
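Because keys and values arrive as token lists rather than plain strings, a small helper can collapse them into text. This sketch assumes a page deserialized into a dict, and assumes tokens should be joined with a single space, which the source does not specify; both function names are hypothetical.

```python
def tokens_to_text(tokens: list) -> str:
    """Join the text of each token with a single space (an assumption)."""
    return " ".join(token["text"] for token in tokens)


def key_value_pairs(page: dict) -> list:
    """Return (key, value) string tuples for one extracted page."""
    return [
        (tokens_to_text(pair["key"]), tokens_to_text(pair["value"]))
        for pair in page.get("keyValuePairs", [])
    ]
```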
Summary: ExtractedPage
Description: Extraction information for a single page in a document.
Properties:
Name | Type | Summary
clusterId | integer(int32) | Cluster identifier.
height | integer(int32) | Height of the page (in pixels).
keyValuePairs | array of (ExtractedKeyValuePair) | List of Key-Value pairs extracted from the page.
number | integer(int32) | Page number.
tables | array of (ExtractedTable) | List of Tables and their information extracted from the page.
width | integer(int32) | Width of the page (in pixels).
Summary: ExtractedTable
Description: Extraction information about a table contained in a page.
Properties:
Name | Type | Summary
columns | array of (ExtractedTableColumn) | List of columns contained in the table.
id | string | Table identifier.
Summary: ExtractedTableColumn
Description: Extraction information of a column in a table.
Properties:
Name | Type | Summary
entries | array of (array of (ExtractedToken)) | Extracted text for each cell of a column. Each cell in the column can have a list of one or more tokens.
header | array of (ExtractedToken) | List of extracted tokens for the column header.
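Tables are returned column by column, so turning one into rows requires a transpose. The following sketch assumes a table deserialized into a dict, joins tokens with a space, and pads short columns with empty strings; those choices and the name `table_to_rows` are assumptions for illustration.

```python
from itertools import zip_longest


def _join(tokens: list) -> str:
    """Join token texts with a single space (an assumption)."""
    return " ".join(token["text"] for token in tokens)


def table_to_rows(table: dict) -> list:
    """Return the header row followed by data rows for a column-oriented table."""
    columns = table.get("columns", [])
    header = [_join(col.get("header", [])) for col in columns]
    cells = [[_join(cell) for cell in col.get("entries", [])] for col in columns]
    # zip_longest transposes columns to rows and pads ragged columns with "".
    rows = [list(row) for row in zip_longest(*cells, fillvalue="")]
    return [header] + rows
```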
Summary: ExtractedToken
Description: Canonical representation of single extracted text.
Properties:
Name | Type | Summary
boundingBox | array of (number(double)) | Bounding box of the extracted text. Represents the location of the extracted text as pairs of Cartesian coordinates, one pair per corner. The pairs are ordered top-left, top-right, bottom-right, and bottom-left, with the origin at the bottom-left of the page.
confidence | number(double) | A measure of accuracy of the extracted text.
text | string | String value of the extracted text.
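The bounding box description can be made concrete with a small conversion. This sketch assumes the array is a flat list of eight numbers (x, y for each of the four corners, in the order given above) and that the coordinates share units with the page height; neither detail is confirmed by the source, and both function names are hypothetical.

```python
def corners(bounding_box: list) -> list:
    """Split a flat [x1, y1, ..., x4, y4] list into four (x, y) tuples."""
    if len(bounding_box) != 8:
        raise ValueError("expected 8 numbers: four (x, y) corner pairs")
    values = iter(bounding_box)
    # Zipping an iterator with itself pairs up consecutive items.
    return list(zip(values, values))


def to_top_left_origin(bounding_box: list, page_height: float) -> list:
    """Flip y so the origin is at the top-left instead of the bottom-left."""
    return [(x, page_height - y) for x, y in corners(bounding_box)]
```

Flipping the y axis is useful when drawing boxes with imaging tools that place the origin at the top-left of the page.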
Summary: FormDocumentReport
Description:
Properties:
Name | Type | Summary
documentName | string | Reference to the data that the report is for.
errors | array of (string) | List of errors per page.
pages | integer(int32) | Total number of pages trained on.
status | string | Status of the training operation. Values: [success, partialSuccess, failure]
Summary: FormOperationError
Description: Error reported during an operation.
Properties:
Name | Type | Summary
errorMessage | string | Message reported during the train operation.
Summary: KeysResult
Description: Result of an operation to get the keys extracted by a model.
Properties:
Name | Type | Summary
clusters | | Object mapping ClusterIds to Key lists.
Summary:
Description: Object mapping ClusterIds to Key lists.
Properties:
Name | Type | Summary
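The clusters object has no listed properties, so its exact shape is not documented here. As a sketch only, assuming it deserializes to a dict whose keys are cluster IDs and whose values are lists of key strings, the mapping can be inverted to see which clusters each key appears in; the function name `clusters_by_key` is hypothetical.

```python
def clusters_by_key(keys_result: dict) -> dict:
    """Invert {clusterId: [keys]} into {key: [clusterIds]}."""
    inverted = {}
    for cluster_id, keys in (keys_result.get("clusters") or {}).items():
        for key in keys:
            # Record each cluster in which this key was extracted.
            inverted.setdefault(key, []).append(cluster_id)
    return inverted
```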
Summary: ModelResult
Description: Result of a model status query operation.
Properties:
Name | Type | Summary
createdDateTime | string(date-time) | Date and time the model was created.
lastUpdatedDateTime | string(date-time) | Date and time the model was last updated.
modelId | string(uuid) | Model identifier.
status | string | Status of the model. Values: [created, ready, invalid]
Summary: ModelsResult
Description: Result of query operation to fetch multiple models.
Properties:
Name | Type | Summary
models | array of (ModelResult) | Collection of models.
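A common use of the models collection is picking out models whose status is ready. The sketch below assumes the result is deserialized into a dict and that the date-time strings share one ISO 8601 format and time zone, so that plain string comparison orders them chronologically; the function name and sample identifiers are illustrative.

```python
def ready_model_ids(models_result: dict) -> list:
    """Return IDs of models with status 'ready', most recently updated first."""
    ready = [
        model for model in models_result.get("models", [])
        if model.get("status") == "ready"
    ]
    # String sort is chronological only under the same-format assumption above.
    ready.sort(key=lambda model: model.get("lastUpdatedDateTime", ""), reverse=True)
    return [model["modelId"] for model in ready]
```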
Summary: TrainRequest
Description: Contract to initiate a train request.
Properties:
Name | Type | Summary
source | string | Source path of the training data.
Summary: TrainResult
Description: Response of the Train API call.
Properties:
Name | Type | Summary
errors | array of (FormOperationError) | Errors returned during the training operation.
modelId | string(uuid) | Identifier of the model.
trainingDocuments | array of (FormDocumentReport) | List of documents used to train the model and the train operation error reported by each.
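The per-document reports in a train response can be condensed into a short summary. This is a sketch under the assumption that the response is deserialized into a dict with the property names above; the function name and the output keys `pages` and `needs_review` are made up for illustration.

```python
def summarize_training(train_result: dict) -> dict:
    """Total the trained pages and list documents that did not fully succeed."""
    documents = train_result.get("trainingDocuments") or []
    return {
        "modelId": train_result.get("modelId"),
        "pages": sum(doc.get("pages", 0) for doc in documents),
        # Includes both 'partialSuccess' and 'failure' documents.
        "needs_review": [
            doc["documentName"] for doc in documents
            if doc.get("status") != "success"
        ],
    }
```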