Skip to content

[REF] mail: use new python email API#35929

Closed
Julien00859 wants to merge 3 commits intoodoo:masterfrom
odoo-dev:master-pep594-juc
Closed

[REF] mail: use new python email API#35929
Julien00859 wants to merge 3 commits intoodoo:masterfrom
odoo-dev:master-pep594-juc

Conversation

@Julien00859
Copy link
Copy Markdown
Member

@Julien00859 Julien00859 commented Aug 21, 2019

PEP 594 is depreciating some modules from the python standard library. Among them email.message.Message old email API and related modules, imp old import logic and the not commonly used cgitb exception pretty printer.

The legacy email.message.Message API has been replaced by email.message.EmailMessage API, the later supports automatic header value conversion out of the box which simplify how emails are generated.

The email.utils.formataddr function is commonly used accros the code base to generate John Doe <johndoe@example.com> like strings. As the function has been depreciated too, it has been re-implemented but only to support utf-8 and base64 (other mail supported encoding are not used within the codebase).

The imp module is used twice, first time to locate and load addons, the second to import handmade modules. The first has been replaced by importlib.machinery.PathFinder, the second by types.ModuleName.

The cgitb is only used in order to ease the debugging in internal tools related to the documentation, as their are still in use, the module has been packaged as part of our codebase.

@Julien00859 Julien00859 requested a review from xmo-odoo August 21, 2019 15:52
@C3POdoo C3POdoo added the RD research & development, internal work label Aug 21, 2019
@Julien00859 Julien00859 force-pushed the master-pep594-juc branch 7 times, most recently from 54e128b to d5ef6da Compare August 27, 2019 11:37
@robodoo robodoo added the CI 🤖 Robodoo has seen passing statuses label Aug 27, 2019
@Julien00859 Julien00859 requested a review from odony September 4, 2019 16:31
Copy link
Copy Markdown
Collaborator

@xmo-odoo xmo-odoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • should be a PR for each replacement / removal
  • PEP seems to be stalled (still draft, no notices in the 3.8b4 docs), don't see the point of replacement for replacement's sake (cgitb being imported entire) (especially since after looking at it it's pretty much just used for a debugging / testing utility in an extension of the docs processing system)
  • imp has been deprecated since 3.4 so cool, however at this point it might be worth / time to remove support for imports through openerp.addons or bare module imports (does the latter even work anymore?), the import hooks and move to something like adding the addon paths to odoo.addons.__path__ directly (not sure whether that's sufficient but a cursory read seems to indicate it ought be). If that works, I guess odoo.modules.module.ad_path would be an alias to odoo.addons.path?
  • if we're going to remove calls to deprecated stuff, we should actually do that e.g. we're still using inspect.getargspec, base64.encodestring, base64.decodestring, … which have been deprecated since 3.0/3.1

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems redundant (and risky), shouldn't ad_finders replace ad_paths, and ad_paths either removed or a proxy of sort?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can retrieve the path with ad_finder.path but there is actually quite some modules that are using the ad_paths variable. I can be aggressive and change every for path in ad_paths by for path in (finder.path for finder in ad_finders)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that ad_paths could be a pseudo-list proxying to ad_finder. Though see main comment after more thinking mayhaps we can just strip out the whole thing e.g. warn on non-odoo.addons imports in 13.0, remove in master and have odoo.addons act as a more "proper" namespace without import hooks.

Copy link
Copy Markdown
Contributor

@odony odony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @xmo-odoo, there should be one PR for each "replacement" and the rationale needs to be discussed case-by-case:

  • email modernization: it's cool because we can drop lines of (ugly) code, so it's a good idea to do it. Should be the only purpose of this PR.
  • cgitb: don't touch it, who cares, it's a debugging thing that we can always drop later if it really disappears.
  • imp: needs further discussion as a separate PR. It's a risky change, and we need to decide what we want to support officially anyway.

@robodoo robodoo removed the CI 🤖 Robodoo has seen passing statuses label Sep 5, 2019
@Julien00859 Julien00859 changed the title [REF] *: Remove python dead batteries [REF] mail: use new python email API Sep 5, 2019
@Julien00859 Julien00859 force-pushed the master-pep594-juc branch 3 times, most recently from 696e1ee to c8b554c Compare September 6, 2019 09:24
@robodoo robodoo added the CI 🤖 Robodoo has seen passing statuses label Sep 6, 2019
@Julien00859
Copy link
Copy Markdown
Member Author

@odony the requested changes are implemented and tested

[PEP 594][1] is going to depreciate legacy `email.message.Message` API in
PY3.8 in favor `email.message.EmailMessage`.

All `email.message_from_x` constructor functions take an optional
`policy` argument. The policy can either be `policy.compat32` to
generate an `email.message.Message` instance, either be `policy.SMTP`
to generate an `email.message.EmailMessage` instance.

The [current][2] default policy is `policy.compat32` but will change in
a future version of python. It is recommended to always specify the
policy.

[1]: https://www.python.org/dev/peps/pep-0594/#email-legacy-api
[2] https://docs.python.org/3.7/library/email.parser.html#email.parser.BytesParser
[PEP 594] is going to depreciate the legacy `email.message.Message` API
and its related modules. Among them the `email.utils.formataddr`
function, this function takes a `(name, email)` pair and returns a
string value suitable for From, To and Cc headers.

The stdlib function is capable of handling several character encoding
and two binary-to-ascii encoding. Odoo only uses `utf-8` which is
compatible with the base64 b2a encoding. The re-implementation of that
function has been simplified to only support base64-ed utf-8 and ascii.

[PEP 594]: https://www.python.org/dev/peps/pep-0594/\#email-lagacy-api
@robodoo robodoo removed the CI 🤖 Robodoo has seen passing statuses label Sep 17, 2019
@Julien00859 Julien00859 force-pushed the master-pep594-juc branch 2 times, most recently from 5b8b1cb to 9caca1f Compare September 17, 2019 11:44
[PEP 594] is going to depreciate the legacy `email.message.Message` API
and its related modules: `email.(charset|header|mime|utils)`.

The new `email.message.EmailMessage` API exposes a much easier interface
to create multiparted emails [1], is capable of doing all the necessary
headers value conversion (RFCs [2045], [2047], [2049]) [2] and get/set
different flavor of payload (text/bytes) in a straightforward way [3].

All headers are structured in a way to support python native types, i.e.
the `Date` header supports `datetime.datetime` objects and automatically
performs the required formatting. The same goes for multi-valued headers
like the `To` header, one can directly set a python list of values, it
will be automatically be formatted according to the RFCs.

The dedicated encoding and decoding functions are no more needed thus
has been removed has part of the refactor. FTR, [RFC2231] is an update
of [RFC2047] and based on tests we've just conducted, both GMail and
Thunderbird now support RFC2231 encoding just fine in all headers:
attachments names, From, etc.

[1]: http://docs.python.org/3/library/email.message.html
[2]: http://docs.python.org/3/library/email.headerregistry.html
[3]: http://docs.python.org/3/library/email.contentmanager.html
[2045]: https://tools.ietf.org/html/rfc2045
[2047]: https://tools.ietf.org/html/rfc2047
[2049]: https://tools.ietf.org/html/rfc2049
[2231]: https://tools.ietf.org/html/rfc2231
@robodoo robodoo removed the CI 🤖 Robodoo has seen passing statuses label Sep 19, 2019
@rco-odoo
Copy link
Copy Markdown
Member

@robodoo rebase-ff r+

@robodoo
Copy link
Copy Markdown
Contributor

robodoo commented Sep 19, 2019

Merge method set to rebase and fast-forward

robodoo pushed a commit that referenced this pull request Sep 19, 2019
[PEP 594] is going to depreciate the legacy `email.message.Message` API
and its related modules: `email.(charset|header|mime|utils)`.

The new `email.message.EmailMessage` API exposes a much easier interface
to create multiparted emails [1], is capable of doing all the necessary
headers value conversion (RFCs [2045], [2047], [2049]) [2] and get/set
different flavor of payload (text/bytes) in a straightforward way [3].

All headers are structured in a way to support python native types, i.e.
the `Date` header supports `datetime.datetime` objects and automatically
performs the required formatting. The same goes for multi-valued headers
like the `To` header, one can directly set a python list of values, it
will be automatically be formatted according to the RFCs.

The dedicated encoding and decoding functions are no more needed thus
has been removed has part of the refactor. FTR, [RFC2231] is an update
of [RFC2047] and based on tests we've just conducted, both GMail and
Thunderbird now support RFC2231 encoding just fine in all headers:
attachments names, From, etc.

[1]: http://docs.python.org/3/library/email.message.html
[2]: http://docs.python.org/3/library/email.headerregistry.html
[3]: http://docs.python.org/3/library/email.contentmanager.html
[2045]: https://tools.ietf.org/html/rfc2045
[2047]: https://tools.ietf.org/html/rfc2047
[2049]: https://tools.ietf.org/html/rfc2049
[2231]: https://tools.ietf.org/html/rfc2231

closes #35929

Signed-off-by: Raphael Collet (rco) <rco@openerp.com>
@robodoo robodoo added CI 🤖 Robodoo has seen passing statuses merging 👷 and removed merging 👷 labels Sep 19, 2019
@robodoo
Copy link
Copy Markdown
Contributor

robodoo commented Sep 19, 2019

Merged at 18299d7, thanks!

@robodoo robodoo closed this Sep 19, 2019
@Julien00859 Julien00859 deleted the master-pep594-juc branch September 20, 2019 08:12
robodoo pushed a commit that referenced this pull request Nov 14, 2019
…#35929

[FIX] tools: correctly format emails when having an encoding issue
[IMP] tools: restore P2 formataddr default behavior
[FIX] ir.mail.server: squash redundant CRs (bpo-34424)

closes #39516

Task: 2003936
Signed-off-by: Olivier Dony (odo) <odo@openerp.com>
odony added a commit to odoo-dev/odoo that referenced this pull request Aug 7, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since odoo#35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1
robodoo pushed a commit that referenced this pull request Aug 9, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since #35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

closes #55609

Signed-off-by: Olivier Dony (odo) <odo@openerp.com>
odony added a commit to odoo-dev/odoo that referenced this pull request Aug 9, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since odoo#35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

X-original-commit: 6726e9a
robodoo pushed a commit that referenced this pull request Aug 9, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since #35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

closes #55655

X-original-commit: 6726e9a
Signed-off-by: Olivier Dony (odo) <odo@openerp.com>
fw-bot pushed a commit to odoo-dev/odoo that referenced this pull request Aug 9, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since odoo#35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

X-original-commit: 02b7877
robodoo pushed a commit that referenced this pull request Aug 10, 2020
Python 3 before 3.8 has a bug that causes the email.policy classes to
incorrectly fold and RFC2047-encode "identification fields" in email
messages. This mainly applies to Message-Id, References, and In-Reply-To
fields.

We are impacted by this bug since #35929 where we switched to
using the "modern" email.message API.

RFC2047 section 5 clearly states that those headers/fields are not to be
encoded, and that would violate RFC5322.

Further, such a folded Message-Id is considered non-RFC-conformant by
popular MTAs (GMail, Outlook), which will then generate *another*
Message-Id field, causing the original threading information to be lost.
Replies to such a modified message will reference the new, unknown
Message-Id, and won't be attached to the original thread.

The solution we adopt here is to monkey-patch the SMTP policies to
special-case those identification fields and deactivate the automatic
folding, until the bug is properly and fully fixed in the standard lib.

Some considerations taken into account for this patch:

- `email.policy.SMTP` is being monkey-patched globally to make sure we
  fix all possible places where Messages are being encoded/folded
- the fix is **not** made version-specific, considering that even in Python
  3.8 the official bugfix only applies to Message-Id, but still fails to
  protect other identification fields, like *References* and
  *In-Reply-To*. The author specifically noted that shortcoming [2].
  The fix wouldn't break anything on Python 3.8 anyway.
- the `noFoldPolicy` trick for preventing folding is done with no max
  line length at all. RFC5322, section 2.1.1 states [3] that the maximum
  length is 998 due to legacy implementations, but there is no provision
  to wrap identification fields that are longer than that. Wrapping at
  998 chars would corrupt the header anyway. We'll just count on the
  fact that we don't usually need 1k+ chars in those headers.

The invalid folding/encoding in action on Python 3.6 (in Python 3.8 only
the second header gets folded):

```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: =?utf-8?q?=3C929227342217024=2E1596730490=2E324691772460938-exam?=
 =?utf-8?q?ple-30661-some=2Ereference=40test-123=2Eexample=2Ecom=3E?=
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=

```

and the expected result after the fix:
```py
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: <929227342217024.1596730490.324691772460938-example-30661-some.reference@test-123.example.com>
In-Reply-To: <92922734221723.1596730568.324691772460444-another-30661-parent.reference@test-123.example.com>

```

[1] bpo-35805: https://bugs.python.org/issue35805
[2] python/cpython#13397 (comment)
[3] https://tools.ietf.org/html/rfc5322#section-2.1.1

closes #55656

X-original-commit: 02b7877
Signed-off-by: Olivier Dony (odo) <odo@openerp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI 🤖 Robodoo has seen passing statuses RD research & development, internal work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants