Commit 969195d

add prebuilt document to readme examples + print styles (#20996)
1 parent d23b8c9 commit 969195d

3 files changed

Lines changed: 86 additions & 12 deletions

sdk/formrecognizer/azure-ai-formrecognizer/README.md

Lines changed: 78 additions & 0 deletions
@@ -191,6 +191,7 @@ The following section provides several code snippets covering some of the most c
 
 * [Extract layout](#extract-layout "Extract Layout")
 * [Using Prebuilt Models](#using-prebuilt-models "Using Prebuilt Models")
+* [Using Prebuilt Document](#using-prebuilt-document "Using Prebuilt Document")
 * [Build a Model](#build-a-model "Build a model")
 * [Analyze Documents Using a Custom Model](#analyze-documents-using-a-custom-model "Analyze Documents Using a Custom Model")
 * [Manage Your Models](#manage-your-models "Manage Your Models")
@@ -310,6 +311,83 @@ You are not limited to receipts! There are a few prebuilt models to choose from,
 - Analyze invoices using the `prebuilt-invoice` model (fields recognized by the service can be found [here][service_recognize_invoice]).
 - Analyze identity documents using the `prebuilt-idDocuments` model (fields recognized by the service can be found [here][service_recognize_identity_documents]).
 
+### Using Prebuilt Document
+Analyze entities, key-value pairs, tables, styles, and selection marks from documents using the general prebuilt document model provided by the Form Recognizer service.
+Select the Prebuilt Document model by passing `model="prebuilt-document"` into the `begin_analyze_document` method:
+
+```python
+from azure.ai.formrecognizer import DocumentAnalysisClient
+from azure.core.credentials import AzureKeyCredential
+
+endpoint = "https://<my-custom-subdomain>.cognitiveservices.azure.com/"
+credential = AzureKeyCredential("<api_key>")
+
+document_analysis_client = DocumentAnalysisClient(endpoint, credential)
+
+with open("<path to your document>", "rb") as fd:
+    document = fd.read()
+
+poller = document_analysis_client.begin_analyze_document("prebuilt-document", document)
+result = poller.result()
+
+print("----Entities found in document----")
+for entity in result.entities:
+    print("Entity '{}' has category '{}' with sub-category '{}'".format(
+        entity.content, entity.category, entity.sub_category
+    ))
+    print("...with confidence {}\n".format(entity.confidence))
+
+print("----Key-value pairs found in document----")
+for kv_pair in result.key_value_pairs:
+    if kv_pair.key:
+        print(
+            "Key '{}' found within '{}' bounding regions".format(
+                kv_pair.key.content,
+                kv_pair.key.bounding_regions,
+            )
+        )
+    if kv_pair.value:
+        print(
+            "Value '{}' found within '{}' bounding regions\n".format(
+                kv_pair.value.content,
+                kv_pair.value.bounding_regions,
+            )
+        )
+
+print("----Tables found in document----")
+for table_idx, table in enumerate(result.tables):
+    print(
+        "Table # {} has {} rows and {} columns".format(
+            table_idx, table.row_count, table.column_count
+        )
+    )
+    for region in table.bounding_regions:
+        print(
+            "Table # {} location on page: {} is {}".format(
+                table_idx,
+                region.page_number,
+                region.bounding_box,
+            )
+        )
+
+print("----Styles found in document----")
+for style in result.styles:
+    if style.is_handwritten:
+        print("Document contains handwritten content: ")
+        print(",".join([result.content[span.offset:span.offset + span.length] for span in style.spans]))
+
+print("----Selection marks found in document----")
+for page in result.pages:
+    for selection_mark in page.selection_marks:
+        print(
+            "...Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
+                selection_mark.state,
+                selection_mark.bounding_box,
+                selection_mark.confidence,
+            )
+        )
+```
+
 ### Build a model
 Build a custom model on your own document type. The resulting model can be used to analyze values from the types of documents it was trained on.
 Provide a container SAS URL to your Azure Storage Blob container where you're storing the training documents.
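
The style loop added to the README in this hunk recovers handwritten text by slicing `result.content` with each span's `offset` and `length`. Below is a minimal sketch of that slicing. `Span`, `Style`, `Result`, and `handwritten_text` are hypothetical stand-ins, not SDK types; only the attribute names `content`, `styles`, `spans`, `offset`, `length`, and `is_handwritten` come from the diff. The sample data is invented so the snippet runs without a Form Recognizer resource.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for the SDK result objects; only the attribute
# names mirror what the diff uses.
@dataclass
class Span:
    offset: int
    length: int


@dataclass
class Style:
    is_handwritten: bool
    spans: List[Span] = field(default_factory=list)


@dataclass
class Result:
    content: str
    styles: List[Style] = field(default_factory=list)


def handwritten_text(result: Result) -> List[str]:
    """Return the content slice for every span of every handwritten style."""
    pieces = []
    for style in result.styles:
        if style.is_handwritten:
            for span in style.spans:
                pieces.append(result.content[span.offset:span.offset + span.length])
    return pieces


# Invented sample data for illustration only.
sample = Result(
    content="Invoice 42 approved by J. Doe",
    styles=[
        Style(is_handwritten=True, spans=[Span(0, 7), Span(23, 6)]),
        Style(is_handwritten=False, spans=[Span(8, 2)]),
    ],
)

print(",".join(handwritten_text(sample)))  # prints "Invoice,J. Doe"
```

A style can carry several spans, which is why the README joins the slices with commas rather than printing a single substring.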

sdk/formrecognizer/azure-ai-formrecognizer/samples/v3.2-beta/async_samples/sample_analyze_prebuilt_document_async.py

Lines changed: 4 additions & 6 deletions
@@ -63,12 +63,10 @@ async def analyze_document():
             )
         result = await poller.result()
 
-    for idx, style in enumerate(result.styles):
-        print(
-            "Document contains {} content".format(
-                "handwritten" if style.is_handwritten else "no handwritten"
-            )
-        )
+    for style in result.styles:
+        if style.is_handwritten:
+            print("Document contains handwritten content: ")
+            print(",".join([result.content[span.offset:span.offset + span.length] for span in style.spans]))
 
     for idx, page in enumerate(result.pages):
         print("----Analyzing document from page #{}----".format(idx + 1))

sdk/formrecognizer/azure-ai-formrecognizer/samples/v3.2-beta/sample_analyze_prebuilt_document.py

Lines changed: 4 additions & 6 deletions
@@ -59,12 +59,10 @@ def analyze_document():
         )
     result = poller.result()
 
-    for idx, style in enumerate(result.styles):
-        print(
-            "Document contains {} content".format(
-                "handwritten" if style.is_handwritten else "no handwritten"
-            )
-        )
+    for style in result.styles:
+        if style.is_handwritten:
+            print("Document contains handwritten content: ")
+            print(",".join([result.content[span.offset:span.offset + span.length] for span in style.spans]))
 
     for page in result.pages:
         print("----Analyzing document from page #{}----".format(page.page_number))
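
Both sample hunks replace a loop that printed one summary line per style with a loop that reports only handwritten styles, together with their text. Below is a rough sketch of the behavioral difference. `old_report` and `new_report` are hypothetical helper names, and the `SimpleNamespace` objects are stand-ins for the analysis result, with invented data.

```python
from types import SimpleNamespace as NS


def old_report(result):
    """Loop removed by the commit: one summary line per style."""
    return [
        "Document contains {} content".format(
            "handwritten" if style.is_handwritten else "no handwritten"
        )
        for style in result.styles
    ]


def new_report(result):
    """Loop added by the commit: only handwritten styles, plus their text."""
    lines = []
    for style in result.styles:
        if style.is_handwritten:
            lines.append("Document contains handwritten content: ")
            lines.append(",".join(
                result.content[span.offset:span.offset + span.length]
                for span in style.spans
            ))
    return lines


# Invented stand-in for an analysis result; not produced by the service.
result = NS(
    content="Total: 10 USD signed Ann",
    styles=[
        NS(is_handwritten=False, spans=[NS(offset=0, length=13)]),
        NS(is_handwritten=True, spans=[NS(offset=21, length=3)]),
    ],
)

for line in old_report(result):
    print(line)
for line in new_report(result):
    print(line)
```

With this data the old loop emits a line for each of the two styles without saying which text is handwritten, while the new loop stays silent for the non-handwritten style and prints `Ann` for the handwritten one.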
