Re: Python 3 and pygments-parser
From: Shigio YAMAGUCHI
Subject: Re: Python 3 and pygments-parser
Date: Mon, 3 Jun 2024 15:21:13 +0900
Hi Marcus,
Are you talking about a bug or a feature addition?
If it is a bug, could you please explain the specific steps to reproduce it?
If it is a new feature, could you please explain the specification?
Thank you in advance.
Regards,
Shigio
On Thu, May 30, 2024 at 2:33 AM Marcus Harnisch
<marcus.harnisch@verilab.com> wrote:
>
> Hi Shigio
>
> I am thinking about tackling this feature in a reasonably useful and robust
> way. I am not concerned about Python 2.x, but wouldn't want to break
> compatibility either. As it stands, ‘latin1’ encoding is used for
> implementing something like “binary but with newlines”.
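To illustrate the 'latin1' trick, here is a minimal sketch using only the standard library and made-up sample data. Latin-1 maps every byte value 0x00-0xFF to the code point with the same number. Decoding is therefore lossless and b'\n' still becomes '\n', but UTF-8 input turns into mojibake instead of real characters.

```python
# Latin-1 maps each byte 0x00-0xFF to the code point with the same value,
# so any byte string survives a decode/encode round trip unchanged.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin1")
assert len(text) == len(all_bytes)
assert text.encode("latin1") == all_bytes

# b"\n" still decodes to "\n", so line splitting keeps working, but the two
# UTF-8 bytes of "é" become two unrelated characters ("Ã©").
raw = "caf\u00e9\n".encode("utf-8")   # b'caf\xc3\xa9\n'
decoded = raw.decode("latin1")
assert decoded.split("\n") == ["caf\u00c3\u00a9", ""]
```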
>
> The current implementation of pygments_parser.py is incomplete with respect
> to I/O encoding and will probably break when it encounters characters outside
> the ASCII range.
> Input in an encoding that is not ASCII-compatible is probably not going to
> work at all.
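Here is a minimal sketch of the second failure mode. It uses UTF-16 as an example of an encoding that is not ASCII-compatible; the encoding and the sample string are illustrative choices only. Even plain ASCII text gains interleaved NUL bytes, so byte-oriented ("latin1") processing no longer sees the keywords.

```python
# UTF-16 (little-endian, no BOM) stores each ASCII character as two bytes,
# the second one being NUL.
raw = "int x;\n".encode("utf-16-le")

# Treating the bytes as latin1 "binary with newlines" keeps the NULs,
# so a plain substring search for the keyword fails.
decoded = raw.decode("latin1")
assert "\x00" in decoded
assert "int" not in decoded
```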
> Many OS-facing functions, such as ‘os.getenv’, as well as the low-level parts
> of ‘subprocess.Popen()’, use ‘sys.getfilesystemencoding()’ to determine the
> desired encoding. Most current Unix-like systems are configured with
> UTF-8-based locales, and even Python on Windows defaults to UTF-8 for
> OS-facing encoding (since 2016, Python 3.6+, PEP 529).
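Here is a short sketch of the OS-facing conversion described above. os.fsencode() and os.fsdecode() apply sys.getfilesystemencoding() together with sys.getfilesystemencodeerrors() (Python 3.6+). The printed values depend on the platform and locale, so no output is shown.

```python
import os
import sys

# Encoding and error handler Python uses for file names and other OS-facing
# strings ("utf-8" on most current systems, but not guaranteed).
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()

# os.fsdecode()/os.fsencode() apply exactly this codec and error handler,
# so a byte string obtained from the OS round-trips through str.
name = os.fsdecode(b"caf\xc3\xa9")
assert os.fsencode(name) == b"caf\xc3\xa9"

print(fs_encoding, fs_errors)
```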
> Any non-ASCII content in gtags.conf is most likely going to break
> pygments_parser.py in one way or another. I'd propose relying on
> ‘sys.getfilesystemencoding()’ for reading it as well.
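Here is one possible reading of this proposal, as a sketch. read_config_text() is a hypothetical helper name and is not part of GLOBAL. The 'surrogateescape' error handler is an assumption, chosen so that bytes the OS encoding cannot decode still round-trip.

```python
import os
import sys
import tempfile

def read_config_text(path):
    """Hypothetical helper: read a gtags.conf-style file as text, decoding it
    with the OS-facing encoding and keeping undecodable bytes as surrogates."""
    with open(path, "r", encoding=sys.getfilesystemencoding(),
              errors="surrogateescape") as f:
        return f.read()

# Usage: a config line with a non-ASCII (UTF-8) path in it.
raw = b":ctagscom=/opt/caf\xc3\xa9/bin/uctags:\n"
with tempfile.TemporaryDirectory() as tmp:
    conf = os.path.join(tmp, "gtags.conf")
    with open(conf, "wb") as f:
        f.write(raw)
    text = read_config_text(conf)

# Re-encoding with the same codec and handler restores the original bytes.
assert text.encode(sys.getfilesystemencoding(), "surrogateescape") == raw
```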
> Source code must be passed to Pygments' lexers as a string. Programming
> languages that allow non-ASCII source code normally use UTF-8 (e.g.
> Python), which I'd recommend for ‘read_file()’, possibly with an appropriate
> error handler. Depending on how a lexer handles strings, exotic
> encodings might even be less broken than before if bytes are preserved via
> ‘surrogateescape’ or ‘backslashreplace’.
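Here is a sketch of the two error handlers mentioned above, applied to Latin-1 bytes that are not valid UTF-8. decode_source() is a hypothetical stand-in for read_file() and is not the actual pygments_parser.py code. 'surrogateescape' keeps the original bytes recoverable. 'backslashreplace' (supported for decoding since Python 3.5) turns them into readable ASCII escape text instead.

```python
def decode_source(data, errors="surrogateescape"):
    """Hypothetical stand-in for read_file(): decode source bytes as UTF-8
    so they can be handed to a lexer as str."""
    return data.decode("utf-8", errors=errors)

# Latin-1 encoded source: the byte 0xE9 ("é" in Latin-1) is not valid UTF-8.
data = b"x = '\xe9'\n"

# surrogateescape: the bad byte becomes the lone surrogate U+DCE9,
# and encoding with the same handler gives back the exact input bytes.
lossless = decode_source(data)
assert lossless == "x = '\udce9'\n"
assert lossless.encode("utf-8", "surrogateescape") == data

# backslashreplace: the bad byte becomes the four characters \xe9.
readable = decode_source(data, "backslashreplace")
assert readable == "x = '\\xe9'\n"
```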
>
> IMHO, relying on the respective system default encoding in most places and
> on an explicit UTF-8 in read_file() will improve compatibility and, as a side
> effect, help unify the code paths for Python 2 and 3.
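On the Python 2/3 point, io.open() accepts the same encoding and errors arguments on Python 2.6+ and Python 3 and returns text on both. A single code path can therefore serve both versions. Below is a minimal sketch in which read_source_text() is a hypothetical name. It uses 'replace' because 'surrogateescape' is not available in Python 2. The usage part relies on tempfile.TemporaryDirectory, which is Python 3 only.

```python
import io
import os
import tempfile

def read_source_text(path):
    """Hypothetical helper: read source as UTF-8 text on Python 2.6+ and 3.

    io.open() returns unicode text on both versions; undecodable bytes are
    replaced with U+FFFD because 'surrogateescape' does not exist in Python 2.
    """
    with io.open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()

# Usage (Python 3): valid UTF-8 is decoded, the stray 0xFF byte becomes U+FFFD.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.c")
    with io.open(src, "wb") as f:
        f.write(b"/* caf\xc3\xa9 */\n\xff\n")
    text = read_source_text(src)

assert text == u"/* caf\u00e9 */\n\ufffd\n"
```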
>
> Best regards,
> Marcus
>
> On Thu, May 16, 2024 at 12:42 AM Marcus Harnisch
> <marcus.harnisch@verilab.com> wrote:
>>
>> Hi Shigio
>>
>> Glad to hear that it didn't work :-) Thank you for adding this to the known
>> bugs list.
>>
>> Best regards,
>> Marcus
>>
>> On Tue, May 14, 2024 at 8:16 AM Shigio YAMAGUCHI <shigio@gnu.org> wrote:
>>>
>>> Hi Marcus,
>>> I confirmed that the problem is reproduced.
>>> I have made a new entry to the 'Known bugs' list.
>>> Thank you for the report.
>>>
>>> [https://www.gnu.org/software/global/bugs.html]
>>> o Pygments plug-in parser with python3 does not work if 'ctagscom' is not
>>> set.
>>> If it is not set, the default path obtained by the configure script should
>>> be used.
>>>
>>> $ cat > gtags.conf
>>> default:\
>>> :ctagscom=:\
>>> :langmap=C\:.c.h:\
>>> :gtags_parser=C\:/usr/local/lib/gtags/pygments-parser.la:
>>> $ gtags
>>> $ global -x '.*'
>>> $ _ # no tags
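Here is a sketch of the fix that the bug entry suggests. DEFAULT_CTAGSCOM and resolve_ctagscom() are hypothetical names. The real default comes from the configure script, and the path used here is only a placeholder.

```python
import os

# Hypothetical placeholder for the path recorded by the configure script
# (UNIVERSAL_CTAGS); the real value comes from the build, not from here.
DEFAULT_CTAGSCOM = "/usr/local/bin/ctags"

def resolve_ctagscom(value):
    """Return the ctags command to run, falling back to the configured
    default when gtags.conf leaves ctagscom empty. Accepts bytes or str."""
    if isinstance(value, bytes):
        value = os.fsdecode(value)  # normalise to str before testing
    return value if value else DEFAULT_CTAGSCOM

assert resolve_ctagscom(b"") == DEFAULT_CTAGSCOM
assert resolve_ctagscom("") == DEFAULT_CTAGSCOM
assert resolve_ctagscom(b"/opt/local/bin/uctags") == "/opt/local/bin/uctags"
```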
>>>
>>> Regards,
>>> Shigio
>>>
>>> On Mon, May 13, 2024 at 5:04 PM Marcus Harnisch
>>> <marcus.harnisch@verilab.com> wrote:
>>> >
>>> > Hi Shigio
>>> >
>>> > On Sat, May 11, 2024 at 5:35 AM Shigio YAMAGUCHI <shigio@gnu.org> wrote:
>>> >>
>>> >> $ cat gtags.conf
>>> >> default:\
>>> >> :ctagscom=/opt/local/bin/uctags:\
>>> >> :langmap=C\:.c.h:\
>>> >> :gtags_parser=C\:/usr/local/lib/gtags/pygments-parser.la:
>>> >
>>> >
>>> > The important difference, which exposes the bug, is your explicit
>>> > configuration of ctagscom. Leave it undefined and rely on whatever
>>> > UNIVERSAL_CTAGS has been configured to. Only if ctagscom is empty will you
>>> > see a comparison between b'' (an empty bytes object) and '' (an empty string).
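The mismatch can be reproduced in a few lines of Python 3. In Python 2, b'' == '' is True, which is why the problem only shows up with Python 3. The ctagscom variable here is illustrative and is not taken from pygments_parser.py.

```python
import os

ctagscom = b""                      # value obtained as bytes, not decoded
assert (ctagscom == "") is False    # bytes never compare equal to str in Py3
assert not ctagscom                 # a truthiness test works for both types
assert os.fsdecode(ctagscom) == ""  # decoding first makes the comparison sound
```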
>>> >
>>> > Best regards,
>>> > Marcus
>>>
>>> --
>>> Shigio YAMAGUCHI <shigio@gnu.org>
>>> PGP fingerprint:
>>> 26F6 31B4 3D62 4A92 7E6F 1C33 969C 3BE3 89DD A6EB