Automatic encoding detection for template files fails for single german umlaut #445

@timorieber

In some cases chardet.detect fails to detect the correct template encoding. This leads to wrong InvoiceTemplate objects and wrong parsing results in the end.

Example template:

issuer: Basic Test
keywords:
  - Basic Test
fields:
  date:
    parser: regex
    regex: Issue date:\s*(\d{4}-\d{2}-\d{2})
    type: date
  invoice_number:
    parser: regex
    regex: Invoice number:\s*([\d/]+)
  amount:
    parser: regex
    regex: Total:\s*(\d+\.\d\d)
    type: float
  single_specialchar:
    parser: static
    value: ä

If I add another umlaut, the detection works and identifies the encoding as utf-8. As a quick workaround, one could add another umlaut to an unused static value.

I have implemented a new `option` to specify the actual template encoding and extended the loader mechanism.
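To illustrate the idea of preferring an explicit encoding over guessing, here is a minimal sketch. It is not the project's actual loader: the function name `decode_template` and its `encoding` parameter are hypothetical, and trying strict UTF-8 before falling back to `chardet.detect` is an assumption about one reasonable approach.

```python
from typing import Optional


def decode_template(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode template bytes (hypothetical helper, not the project's API)."""
    # 1. An explicitly configured encoding always wins over any guess.
    if encoding is not None:
        return raw.decode(encoding)

    # 2. Strict UTF-8 either succeeds or raises, so a single umlaut
    #    is enough to decode a UTF-8 template correctly.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # 3. Only now fall back to statistical detection, if chardet is installed.
    try:
        import chardet
    except ImportError:
        # latin-1 maps every byte to a character, so this never raises.
        return raw.decode("latin-1")

    guess = chardet.detect(raw)["encoding"] or "latin-1"
    return raw.decode(guess, errors="replace")


# Usage: a template fragment with a single umlaut, stored as UTF-8.
sample = "value: ä".encode("utf-8")
print(decode_template(sample))
print(decode_template("value: ä".encode("latin-1"), encoding="latin-1"))
```

With an approach like this, the detector is only consulted for files that are not valid UTF-8, so short templates with a single special character would not depend on its heuristics.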
