Commit 26cb465

bpo-29755: Fixed the lgettext() family of functions in the gettext module. (#2266)
They now always return bytes. Updated the gettext documentation.
1 parent 8457706 commit 26cb465

4 files changed, +229 -107 lines changed

Doc/library/gettext.rst

Lines changed: 80 additions & 81 deletions
@@ -48,9 +48,10 @@ class-based API instead.

 .. function:: bind_textdomain_codeset(domain, codeset=None)

-   Bind the *domain* to *codeset*, changing the encoding of strings returned by the
-   :func:`gettext` family of functions. If *codeset* is omitted, then the current
-   binding is returned.
+   Bind the *domain* to *codeset*, changing the encoding of byte strings
+   returned by the :func:`lgettext`, :func:`ldgettext`, :func:`lngettext`
+   and :func:`ldngettext` functions.
+   If *codeset* is omitted, then the current binding is returned.


 .. function:: textdomain(domain=None)
@@ -67,28 +68,14 @@ class-based API instead.
    :func:`_` in the local namespace (see examples below).


-.. function:: lgettext(message)
-
-   Equivalent to :func:`gettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
-   :func:`bind_textdomain_codeset`.
-
-
 .. function:: dgettext(domain, message)

-   Like :func:`gettext`, but look the message up in the specified *domain*.
-
-
-.. function:: ldgettext(domain, message)
-
-   Equivalent to :func:`dgettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
-   :func:`bind_textdomain_codeset`.
+   Like :func:`.gettext`, but look the message up in the specified *domain*.


 .. function:: ngettext(singular, plural, n)

-   Like :func:`gettext`, but consider plural forms. If a translation is found,
+   Like :func:`.gettext`, but consider plural forms. If a translation is found,
    apply the plural formula to *n*, and return the resulting message (some
    languages have more than two plural forms). If no translation is found, return
    *singular* if *n* is 1; return *plural* otherwise.
@@ -101,24 +88,33 @@ class-based API instead.
    formulas for a variety of languages.


-.. function:: lngettext(singular, plural, n)
-
-   Equivalent to :func:`ngettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
-   :func:`bind_textdomain_codeset`.
-
-
 .. function:: dngettext(domain, singular, plural, n)

    Like :func:`ngettext`, but look the message up in the specified *domain*.


+.. function:: lgettext(message)
+.. function:: ldgettext(domain, message)
+.. function:: lngettext(singular, plural, n)
 .. function:: ldngettext(domain, singular, plural, n)

-   Equivalent to :func:`dngettext`, but the translation is returned in the
-   preferred system encoding, if no other encoding was explicitly set with
+   Equivalent to the corresponding functions without the ``l`` prefix
+   (:func:`.gettext`, :func:`dgettext`, :func:`ngettext` and :func:`dngettext`),
+   but the translation is returned as a byte string encoded in the preferred
+   system encoding if no other encoding was explicitly set with
    :func:`bind_textdomain_codeset`.

+   .. warning::
+
+      These functions should be avoided in Python 3, because they return
+      encoded bytes. It's much better to use alternatives which return
+      Unicode strings instead, since most Python applications will want to
+      manipulate human readable text as strings instead of bytes. Further,
+      it's possible that you may get unexpected Unicode-related exceptions
+      if there are encoding problems with the translated strings. It is
+      possible that the ``l*()`` functions will be deprecated in future Python
+      versions due to their inherent problems and limitations.
+

 Note that GNU :program:`gettext` also defines a :func:`dcgettext` method, but
 this was deemed not useful and so it is currently unimplemented.
@@ -179,8 +175,9 @@ class can also install themselves in the built-in namespace as the function
    names are cached. The actual class instantiated is either *class_* if
    provided, otherwise :class:`GNUTranslations`. The class's constructor must
    take a single :term:`file object` argument. If provided, *codeset* will change
-   the charset used to encode translated strings in the :meth:`lgettext` and
-   :meth:`lngettext` methods.
+   the charset used to encode translated strings in the
+   :meth:`~NullTranslations.lgettext` and :meth:`~NullTranslations.lngettext`
+   methods.

    If multiple files are found, later files are used as fallbacks for earlier ones.
    To allow setting the fallback, :func:`copy.copy` is used to clone each
@@ -250,26 +247,29 @@ are the methods of :class:`NullTranslations`:

    .. method:: gettext(message)

-      If a fallback has been set, forward :meth:`gettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
-
-
-   .. method:: lgettext(message)
-
-      If a fallback has been set, forward :meth:`lgettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
+      If a fallback has been set, forward :meth:`.gettext` to the fallback.
+      Otherwise, return *message*. Overridden in derived classes.


    .. method:: ngettext(singular, plural, n)

       If a fallback has been set, forward :meth:`ngettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
+      Otherwise, return *singular* if *n* is 1; return *plural* otherwise.
+      Overridden in derived classes.


+   .. method:: lgettext(message)
    .. method:: lngettext(singular, plural, n)

-      If a fallback has been set, forward :meth:`lngettext` to the fallback.
-      Otherwise, return the translated message. Overridden in derived classes.
+      Equivalent to :meth:`.gettext` and :meth:`ngettext`, but the translation
+      is returned as a byte string encoded in the preferred system encoding
+      if no encoding was explicitly set with :meth:`set_output_charset`.
+      Overridden in derived classes.
+
+      .. warning::
+
+         These methods should be avoided in Python 3. See the warning for the
+         :func:`lgettext` function.


    .. method:: info()
@@ -279,32 +279,28 @@ are the methods of :class:`NullTranslations`:

    .. method:: charset()

-      Return the "protected" :attr:`_charset` variable, which is the encoding of
-      the message catalog file.
+      Return the encoding of the message catalog file.


    .. method:: output_charset()

-      Return the "protected" :attr:`_output_charset` variable, which defines the
-      encoding used to return translated messages in :meth:`lgettext` and
-      :meth:`lngettext`.
+      Return the encoding used to return translated messages in :meth:`.lgettext`
+      and :meth:`.lngettext`.


    .. method:: set_output_charset(charset)

-      Change the "protected" :attr:`_output_charset` variable, which defines the
-      encoding used to return translated messages.
+      Change the encoding used to return translated messages.


    .. method:: install(names=None)

-      This method installs :meth:`self.gettext` into the built-in namespace,
+      This method installs :meth:`.gettext` into the built-in namespace,
       binding it to ``_``.

       If the *names* parameter is given, it must be a sequence containing the
       names of functions you want to install in the builtins namespace in
-      addition to :func:`_`. Supported names are ``'gettext'`` (bound to
-      :meth:`self.gettext`), ``'ngettext'`` (bound to :meth:`self.ngettext`),
+      addition to :func:`_`. Supported names are ``'gettext'``, ``'ngettext'``,
       ``'lgettext'`` and ``'lngettext'``.

       Note that this is only one way, albeit the most convenient way, to make
@@ -349,49 +345,52 @@ If the :file:`.mo` file's magic number is invalid, the major version number is
 unexpected, or if other problems occur while reading the file, instantiating a
 :class:`GNUTranslations` class can raise :exc:`OSError`.

-The following methods are overridden from the base class implementation:
-
+.. class:: GNUTranslations

-.. method:: GNUTranslations.gettext(message)
+   The following methods are overridden from the base class implementation:

-   Look up the *message* id in the catalog and return the corresponding message
-   string, as a Unicode string. If there is no entry in the catalog for the
-   *message* id, and a fallback has been set, the look up is forwarded to the
-   fallback's :meth:`gettext` method. Otherwise, the *message* id is returned.
+   .. method:: gettext(message)

+      Look up the *message* id in the catalog and return the corresponding message
+      string, as a Unicode string. If there is no entry in the catalog for the
+      *message* id, and a fallback has been set, the look up is forwarded to the
+      fallback's :meth:`~NullTranslations.gettext` method. Otherwise, the
+      *message* id is returned.

-.. method:: GNUTranslations.lgettext(message)

-   Equivalent to :meth:`gettext`, but the translation is returned as a
-   bytestring encoded in the selected output charset, or in the preferred system
-   encoding if no encoding was explicitly set with :meth:`set_output_charset`.
+   .. method:: ngettext(singular, plural, n)

+      Do a plural-forms lookup of a message id. *singular* is used as the message id
+      for purposes of lookup in the catalog, while *n* is used to determine which
+      plural form to use. The returned message string is a Unicode string.

-.. method:: GNUTranslations.ngettext(singular, plural, n)
+      If the message id is not found in the catalog, and a fallback is specified,
+      the request is forwarded to the fallback's :meth:`~NullTranslations.ngettext`
+      method. Otherwise, when *n* is 1 *singular* is returned, and *plural* is
+      returned in all other cases.

-   Do a plural-forms lookup of a message id. *singular* is used as the message id
-   for purposes of lookup in the catalog, while *n* is used to determine which
-   plural form to use. The returned message string is a Unicode string.
+      Here is an example::

-   If the message id is not found in the catalog, and a fallback is specified, the
-   request is forwarded to the fallback's :meth:`ngettext` method. Otherwise, when
-   *n* is 1 *singular* is returned, and *plural* is returned in all other cases.
+         n = len(os.listdir('.'))
+         cat = GNUTranslations(somefile)
+         message = cat.ngettext(
+             'There is %(num)d file in this directory',
+             'There are %(num)d files in this directory',
+             n) % {'num': n}

-   Here is an example::

-      n = len(os.listdir('.'))
-      cat = GNUTranslations(somefile)
-      message = cat.ngettext(
-          'There is %(num)d file in this directory',
-          'There are %(num)d files in this directory',
-          n) % {'num': n}
+   .. method:: lgettext(message)
+   .. method:: lngettext(singular, plural, n)

+      Equivalent to :meth:`.gettext` and :meth:`.ngettext`, but the translation
+      is returned as a byte string encoded in the preferred system encoding
+      if no encoding was explicitly set with
+      :meth:`~NullTranslations.set_output_charset`.

-.. method:: GNUTranslations.lngettext(singular, plural, n)
+      .. warning::

-   Equivalent to :meth:`gettext`, but the translation is returned as a
-   bytestring encoded in the selected output charset, or in the preferred system
-   encoding if no encoding was explicitly set with :meth:`set_output_charset`.
+         These methods should be avoided in Python 3. See the warning for the
+         :func:`lgettext` function.


 Solaris message catalog support
Solaris message catalog support
@@ -509,7 +508,7 @@ module::

    import gettext
    t = gettext.translation('spam', '/usr/share/locale')
-   _ = t.lgettext
+   _ = t.gettext


 Localizing your application
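
The documentation changes in this file steer readers away from the bytes-returning ``l*()`` functions and toward the ones that return str. A minimal sketch of that pattern follows. It assumes no message catalog is available, so `NullTranslations` simply hands back the message id, and it uses UTF-8 purely as an illustrative output encoding.

```python
import gettext

# NullTranslations has no catalog, so every lookup falls back to the message id.
t = gettext.NullTranslations()
_ = t.gettext

# gettext() and ngettext() return str, which is what most code wants.
text = _("spam")
plural = t.ngettext("%(num)d file", "%(num)d files", 3) % {"num": 3}

# Encode explicitly, and only at the boundary where bytes are really required.
data = text.encode("utf-8")
```

Encoding at the boundary keeps the choice of charset visible in the calling code instead of depending on the locale's preferred encoding.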

Lib/gettext.py

Lines changed: 23 additions & 17 deletions
@@ -279,7 +279,9 @@ def gettext(self, message):
     def lgettext(self, message):
         if self._fallback:
             return self._fallback.lgettext(message)
-        return message
+        if self._output_charset:
+            return message.encode(self._output_charset)
+        return message.encode(locale.getpreferredencoding())

     def ngettext(self, msgid1, msgid2, n):
         if self._fallback:
@@ -293,9 +295,12 @@ def lngettext(self, msgid1, msgid2, n):
         if self._fallback:
             return self._fallback.lngettext(msgid1, msgid2, n)
         if n == 1:
-            return msgid1
+            tmsg = msgid1
         else:
-            return msgid2
+            tmsg = msgid2
+        if self._output_charset:
+            return tmsg.encode(self._output_charset)
+        return tmsg.encode(locale.getpreferredencoding())

     def info(self):
         return self._info
@@ -377,7 +382,7 @@ def _parse(self, fp):
             if mlen == 0:
                 # Catalog description
                 lastk = None
-                for b_item in tmsg.split('\n'.encode("ascii")):
+                for b_item in tmsg.split(b'\n'):
                     item = b_item.decode().strip()
                     if not item:
                         continue
@@ -425,24 +430,24 @@ def lgettext(self, message):
         if tmsg is missing:
             if self._fallback:
                 return self._fallback.lgettext(message)
-            return message
+            tmsg = message
         if self._output_charset:
             return tmsg.encode(self._output_charset)
         return tmsg.encode(locale.getpreferredencoding())

     def lngettext(self, msgid1, msgid2, n):
         try:
             tmsg = self._catalog[(msgid1, self.plural(n))]
-            if self._output_charset:
-                return tmsg.encode(self._output_charset)
-            return tmsg.encode(locale.getpreferredencoding())
         except KeyError:
             if self._fallback:
                 return self._fallback.lngettext(msgid1, msgid2, n)
             if n == 1:
-                return msgid1
+                tmsg = msgid1
             else:
-                return msgid2
+                tmsg = msgid2
+        if self._output_charset:
+            return tmsg.encode(self._output_charset)
+        return tmsg.encode(locale.getpreferredencoding())

     def gettext(self, message):
         missing = object()
@@ -582,11 +587,11 @@ def dgettext(domain, message):
     return t.gettext(message)

 def ldgettext(domain, message):
+    codeset = _localecodesets.get(domain)
     try:
-        t = translation(domain, _localedirs.get(domain, None),
-                        codeset=_localecodesets.get(domain))
+        t = translation(domain, _localedirs.get(domain, None), codeset=codeset)
     except OSError:
-        return message
+        return message.encode(codeset or locale.getpreferredencoding())
     return t.lgettext(message)

 def dngettext(domain, msgid1, msgid2, n):
@@ -601,14 +606,15 @@ def dngettext(domain, msgid1, msgid2, n):
     return t.ngettext(msgid1, msgid2, n)

 def ldngettext(domain, msgid1, msgid2, n):
+    codeset = _localecodesets.get(domain)
     try:
-        t = translation(domain, _localedirs.get(domain, None),
-                        codeset=_localecodesets.get(domain))
+        t = translation(domain, _localedirs.get(domain, None), codeset=codeset)
     except OSError:
         if n == 1:
-            return msgid1
+            tmsg = msgid1
         else:
-            return msgid2
+            tmsg = msgid2
+        return tmsg.encode(codeset or locale.getpreferredencoding())
     return t.lngettext(msgid1, msgid2, n)

 def gettext(message):
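
The fallback branches patched in this file share one pattern: pick the untranslated message, then encode it with the bound codeset or, failing that, the locale's preferred encoding. The standalone sketch below mirrors that logic so it can be read and tested in isolation. `encode_untranslated` is a hypothetical helper written for illustration, not a function in `Lib/gettext.py`.

```python
import locale


def encode_untranslated(msgid1, msgid2=None, n=1, codeset=None):
    """Hypothetical helper mirroring the patched no-translation path.

    Chooses the singular or plural message id the same way the fallback
    branches do, then always returns bytes.
    """
    # With no plural form supplied, or when n is 1, use the singular id.
    if msgid2 is None or n == 1:
        tmsg = msgid1
    else:
        tmsg = msgid2
    # Prefer the explicitly bound codeset; otherwise use the locale default.
    return tmsg.encode(codeset or locale.getpreferredencoding())


singular = encode_untranslated("file", "files", 1, codeset="ascii")
plural = encode_untranslated("file", "files", 2, codeset="utf-8")
default = encode_untranslated("spam")
```

Before this commit the same paths returned the message id unchanged as str, so callers could receive either type depending on whether a translation was found.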
