bug-wget

Re: [Bug-wget] wget 1.12 IRI test failures on Mac OS X


From: Micah Cowan
Subject: Re: [Bug-wget] wget 1.12 IRI test failures on Mac OS X
Date: Thu, 24 Sep 2009 10:57:52 -0700
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ryan Schmidt wrote:
> Hello,
> 
> I was advised last year that the wget test suite should not be used, and
> to keep an eye on the file tests/README to see if that changed:
> 
> http://marc.info/?l=wget&m=121205151029359&w=2
> 
> That file no longer exists in the wget 1.12 distribution and upon trying
> "make check" on Mac OS X 10.6.1 I see most of the tests pass. Is the
> test suite something that should be used now? I didn't see it mentioned
> in the NEWS or ChangeLog files.

Yes; the tests are intended to be reliable on Unix-like systems that
provide the necessary prerequisites.

> The tests that are failing for me are all the IRI ones. The IDN tests
> and all the others are passing fine. I have libidn 1.15 installed.
> Attached is the log of the failed tests.
- - Test-ftp-iri.px:

Despite the fact that Wget reports having saved a file with byte values
"fran\303\247ais.txt", the test apparently finds one with byte values
"franc\314\247ais.txt". It would seem to have been "normalized",
replacing the single Unicode character "lowercase c with cedilla" with
"c" followed by the combining-character "cedilla".
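To make the two byte sequences concrete, here is a small sketch in Python. Python is chosen only for illustration; the real tests are Perl .px scripts. It prints the UTF-8 bytes of the precomposed (NFC) and decomposed (NFD) forms of the name in the backslash-octal notation used above. The helper octal_escape is hypothetical, written just for this example.

```python
import unicodedata


def octal_escape(raw: bytes) -> str:
    """Hypothetical helper: keep printable ASCII (except backslash) as-is and
    render every other byte as a backslash followed by three octal digits."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F and b != 0x5C else "\\%03o" % b
        for b in raw
    )


name = "fran\u00e7ais.txt"  # U+00E7 LATIN SMALL LETTER C WITH CEDILLA
nfc = unicodedata.normalize("NFC", name).encode("utf-8")  # one code point
nfd = unicodedata.normalize("NFD", name).encode("utf-8")  # "c" + U+0327

print(octal_escape(nfc))  # fran\303\247ais.txt
print(octal_escape(nfd))  # franc\314\247ais.txt
```

The only difference is that NFD splits the letter into a plain "c" plus U+0327 COMBINING CEDILLA, whose UTF-8 encoding is \314\247.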

Annoyingly, gnome-terminal erroneously displays this as francąis,
instead of français (though gvim displays it correctly). It took a
little trouble to verify it was the right sequence.

So the trouble seems to be that Wget's "fran\303\247ais.txt" gets
normalized to "franc\314\247ais.txt" before being stored in the
filesystem. Perhaps this is something Mac OS itself does?

- - Test-ftp-iri-fallback.px, Test-ftp-iri-recursive.px, and several others.

The two named above have the wrong test name set inside them. run-px
prints the test name anyway, so I should probably remove the "name"
settings within the tests themselves (I never liked them in the first
place: redundant information begs for mistakes like this).

A large number of tests fail because Wget saves a file whose name
contains the raw bytes of a latin1 encoding, and the test then finds a
file with those bytes URL-encoded instead. I don't believe Wget is doing
this encoding. It would if it had "restrict_file_names" set to "ascii",
but in that case Wget would announce it was saving 'fran%E7ais.txt', not
'fran\347ais.txt'. This looks like another automatic transcoding by the
operating system.

...There's also an interesting message about an uninitialized string on
FTPServer.pm:251. I can reproduce that one, so I'll look into it.


It looks like the troubles you're experiencing are due to the fact that
the Wget tests assume the filesystem will accept any arbitrary sequence
of bytes as a file name and store it exactly as given. This is apparently
not the case for Mac OS, and I should probably have known better than to
assume it would hold universally. Ryan, can you please confirm for me
that this is indeed what's happening?
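One way to check this outside the test suite is a small probe that creates a file with a given raw byte name and compares what the directory listing returns. This is a hypothetical Python sketch (fs_preserves_name is not part of Wget or its tests). If the diagnosis above is right, both samples would be expected to report False on a Mac OS volume, while a typical Linux filesystem such as ext4 should report True.

```python
import os
import tempfile


def fs_preserves_name(raw_name: bytes) -> bool:
    """Hypothetical probe: create a file called raw_name in a scratch
    directory and report whether listing that directory gives back exactly
    the same bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)  # bytes path, so listdir returns bytes names
        try:
            with open(os.path.join(tmp_b, raw_name), "wb"):
                pass
        except OSError:
            return False  # the filesystem refused the name outright
        return os.listdir(tmp_b) == [raw_name]


if __name__ == "__main__":
    samples = {
        "UTF-8, precomposed (NFC)": b"fran\xc3\xa7ais.txt",
        "latin1": b"fran\xe7ais.txt",
    }
    for label, raw in samples.items():
        print(f"{label}: preserved={fs_preserves_name(raw)}")
```

A rejected name (OSError) and a silently rewritten name both count as "not preserved", since either one breaks a test that expects the exact bytes back.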

I'll have to rework the tests, so that they don't make these
assumptions, at least on systems that can't handle them. But the good
news is that this appears to be a problem with the tests, and not with
Wget; IRIs should work fine, provided we don't use the --local-encoding
option to lie about what encoding is acceptable locally.
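One possible direction for reworking the checks, sketched in Python rather than the Perl the test suite actually uses: instead of requiring an exact byte match, accept a file name read back from disk if it equals the expected one after NFC normalization, or after undoing %XX escapes. The function names_match is hypothetical, and whether this is strict enough for the real tests is an open question.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def names_match(expected: bytes, found: bytes) -> bool:
    """Hypothetical check: compare an expected file name with one read back
    from the filesystem, tolerating Unicode normalization and %XX escaping
    of raw bytes."""
    if expected == found:
        return True
    # Case 1: both names are valid UTF-8 and differ only in normalization form.
    try:
        expected_text = unicodedata.normalize("NFC", expected.decode("utf-8"))
        found_text = unicodedata.normalize("NFC", found.decode("utf-8"))
        if expected_text == found_text:
            return True
    except UnicodeDecodeError:
        pass  # at least one name is not UTF-8 (e.g. raw latin1 bytes)
    # Case 2: the filesystem percent-escaped bytes it would not store as-is.
    return unquote_to_bytes(found) == expected
```

For example, names_match(b"fran\xc3\xa7ais.txt", b"franc\xcc\xa7ais.txt") and names_match(b"fran\xe7ais.txt", b"fran%E7ais.txt") both return True. The relaxed comparison could hide a genuine Wget bug that happens to produce an escaped name, so it would probably make sense to enable it only on systems known to rewrite file names.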

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkq7syAACgkQ7M8hyUobTrHWGQCdG4dN3quV+KChxDE61KMmxKz1
FrAAn3spxTZXkf9P4waWETYkIluZRevI
=fhas
-----END PGP SIGNATURE-----




