From: jonas . hahnfeld
Subject: Re: output-distance: write HTML as UTF-8 (issue 563810043 by address@hidden)
Date: Sat, 04 Apr 2020 00:16:40 -0700
On 2020/04/03 22:15:06, hanwenn wrote:
> On 2020/04/03 22:00:02, dak wrote:
> > Is this likely related to the problems in `make check` that James currently
> > experiences?
>
> Yes.
>
> Unfortunately, the default encoding depends on the environment
>
> "
> In text mode, if encoding is not specified the encoding used is platform
> dependent: locale.getpreferredencoding(False) is called to get the
>
> "
>
> This means that, depending on locale settings, you may get ascii or utf-8
> encoding.
>
> I didn't get a problem at first, but if I set encoding='ascii' in the
> open_write_file definition, I also get encoding errors.
It's even weirder than that: Python changed its default in version
3.7. See also one of my commit messages from January:
commit e0c78a4c710c51e1ea87d2b144c0ae713923a2af
Author: Jonas Hahnfeld <address@hidden>
Date: Wed Jan 15 16:39:56 2020 +0100

    Issue 5663/1: Use codecs.open() to decode as utf-8

    This is in preparation for Python 3.5 where the default encoding
    depends on the value of the LANG environment variable. As far as
    I can tell, this was changed later on and at least Python 3.7 and
    version 3.8 always default to 'utf-8' on Linux. As I'm proposing to
    make Python 3.5 the required minimum, we can't rely on this and need
    to force 'utf-8' when reading files that could contain Unicode.
So James is likely using Python 3.5 or 3.6, which is why some of us (with
other versions of Python) are not seeing the issue.
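To check which case a given machine falls into, something like the
following might help. It is only a sketch: the function names are made up
for illustration and are not part of output-distance. It prints the value
that, per the documentation quoted above, open() consults when no encoding
is given, and shows why non-ASCII text fails once that value is 'ascii'.

```python
import locale
import os
import sys


def default_text_encoding():
    """Return what locale.getpreferredencoding(False) reports.

    This is the call named in the Python documentation quoted above.
    """
    return locale.getpreferredencoding(False)


def fits_in_ascii(text):
    """Report whether text can be encoded with encoding='ascii'."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    print("LANG =", os.environ.get("LANG"))
    print("default text encoding:", default_text_encoding())
    # Hypothetical sample string; any non-ASCII character behaves the same.
    print("non-ASCII sample fits in ascii:", fits_in_ascii("caf\u00e9"))
```

Running it once normally and once with the locale environment variables set
to C should show whether the default differs on that Python version.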
As such: LGTM! Please note that codecs.open() is no longer needed in
Python 3; it was only needed for compatibility with Python 2.4. We
should probably replace all occurrences with plain open(), as this patch
does.
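For illustration, the replacement could look roughly like this. This is a
sketch, not the code from the patch: open_write_file here is a stand-in
with an assumed signature, and the file name and helper round_trip are
hypothetical.

```python
import os
import tempfile


def open_write_file(path):
    """Open path for writing text, always encoded as UTF-8.

    Assumed signature; the real helper in output-distance may differ.
    """
    # Passing encoding explicitly makes the result independent of LANG
    # and of the Python version's default.
    return open(path, "w", encoding="utf-8")


def round_trip(text):
    """Write text to a temporary file and read it back as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.html")  # hypothetical file name
        with open_write_file(path) as f:
            f.write(text)
        # Old style: codecs.open(path, "r", "utf-8")
        # Python 3 style: the built-in open() with an explicit encoding.
        with open(path, "r", encoding="utf-8") as f:
            return f.read()


if __name__ == "__main__":
    sample = "Gr\u00fc\u00dfe \u266a"
    print(round_trip(sample) == sample)
```

The only point is that encoding='utf-8' is spelled out at every call, so
the behaviour no longer depends on the environment.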
https://codereview.appspot.com/563810043/