In some cases chardet.detect fails to detect the correct template encoding. This leads to wrong InvoiceTemplate objects and wrong parsing results in the end.
Example template:
issuer: Basic Test
keywords:
- Basic Test
fields:
date:
parser: regex
regex: Issue date:\s*(\d{4}-\d{2}-\d{2})
type: date
invoice_number:
parser: regex
regex: Invoice number:\s*([\d/]+)
amount:I have implemented a new `option` to specify the actual template encoding and extended the loader mechanism
parser: regex
regex: Total:\s*(\d+\.\d\d)
type: float
single_specialchar:
parser: static
value: ä
If I add another umlaut the detection works and identifies as utf-8. As a quick workaround one could add another umlaut to an unused static value.
In some cases
chardet.detectfails to detect the correct template encoding. This leads to wrongInvoiceTemplateobjects and wrong parsing results in the end.Example template:
If I add another umlaut the detection works and identifies as
utf-8. As a quick workaround one could add another umlaut to an unused static value.