Re: test results differents between the perl and XS parsers
From: Gavin Smith
Subject: Re: test results differents between the perl and XS parsers
Date: Tue, 17 Nov 2020 17:56:09 +0000
User-agent: Mutt/1.9.4 (2018-02-28)
On Tue, Nov 17, 2020 at 12:50:55PM +0100, Patrice Dumas wrote:
> On Tue, Nov 17, 2020 at 07:27:45AM +0000, Gavin Smith wrote:
> > On Tue, Nov 17, 2020 at 12:18:31AM +0100, Patrice Dumas wrote:
> > > +++ t/results/indices/encoding_index_latin1.pl.new 2020-11-17 00:00:54.879434507 +0100
> > > @@ -158,7 +158,7 @@
> > > 'contents' => [
> > > {
> > > 'parent' => {},
> > > - 'text' => "\x{e9} \x{e9}"
> > > + 'text' => 'é é'
> > > }
> > > ],
> > > 'extra' => {
> > >
> > > and same for encoding_index_latin1_enable_encoding,
> > > encoding_index_utf8 and other similar tests.
> > >
> > > It seems like it is the only case of accented commands in parsed text.
> > > Any idea on what's going on?
> >
> > The two strings appear to be the same string. The question is, why are
> > they output differently? I don't know, and I will look into it when I
> > have time. Things to look at include whether the string is stored
> > internally as UTF-8 or Latin-1, and locale settings when the string is
> > output.
>
> I had a look at the test, and in the tests the string input is
> "\x{e9} \x{e9}", irrespective of the encoding. It is possible that this
> test is artificial and only works correctly with the Perl parser. When
> I have time, I'll look at using an input file for the test (if one
> doesn't already exist) instead, to have something closer to actual
> processing.
Yes, if it is testing Latin-1 output, I think this needs to be output to
a file: if it is just a Perl string, it doesn't really have an encoding.
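
To illustrate the point about a Perl string not having an encoding, here is
a minimal sketch. It is not taken from the test suite; the variable names
are made up for the example, and it assumes only the core Encode module.
It builds the test string twice, once as Perl stores it by default and once
upgraded to the internal UTF-8 representation, and shows that the two
compare equal. Bytes, and therefore an encoding, only appear when the
string is explicitly encoded for output.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The character string from the test: U+00E9, a space, U+00E9.
my $s = "\x{e9} \x{e9}";

# A copy of the same string, upgraded to Perl's internal UTF-8 storage.
my $t = $s;
utf8::upgrade($t);

# Both are the same string to Perl, whatever the internal storage is.
my $same   = ($s eq $t) ? 1 : 0;
my $flag_s = utf8::is_utf8($s) ? 1 : 0;   # internal UTF-8 flag off
my $flag_t = utf8::is_utf8($t) ? 1 : 0;   # internal UTF-8 flag on

# An encoding only comes into play when the string is turned into bytes.
my $latin1 = encode('iso-8859-1', $s);
my $utf8   = encode('UTF-8', $s);

printf "same=%d flags=%d/%d latin1=%d bytes utf8=%d bytes\n",
    $same, $flag_s, $flag_t, length($latin1), length($utf8);
# prints: same=1 flags=0/1 latin1=3 bytes utf8=5 bytes
```

So a test that only compares in-memory strings cannot tell Latin-1 from
UTF-8; the difference shows up in the bytes written to a file.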