bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’


From: Ludovic Courtès
Subject: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 03 Jun 2019 15:01:45 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Hi Timothy,

Timothy Sample <address@hidden> skribis:

> Here’s a patch for Guile that uses explicit lists of characters in the
> ‘(web uri)’ module instead of character ranges.  It includes two tests
> that are pretty verbose, but seem to do the trick.
>
> I have a bit more background on the problem, mostly coming from a Glibc
> bug report: <https://sourceware.org/bugzilla/show_bug.cgi?id=23393>.
>
> It turns out that it is well-known upstream, and avoiding character
> ranges is the recommended approach for know.  Some other GNU tools have
> adopted what is being called the “Rational Range Interpretation”
> <https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html>.
> AIUI, this means they use the underlying encoding numbers for ranges (I
> checked the source, but I’m only mostly sure I read it right).  It looks
> like the Glibc folks are unsure how to proceed on this (but are maybe
> slightly leaning towards the “rational” approach).

Great that you gleaned good references on this topic!

> It’s all a pretty big mess, really.  I was hoping there would be some
> obvious thing that would fix the problem more generally.  Short of
> pulling in the Gnulib regex code or writing something in Scheme, it
> looks like Guile is stuck where it is now.

Yeah.  The alternative would be to not use regexps in this context, I
guess.

> I’m unsure if the changes are considered “trivial” from a copyright
> perspective.  It’s pretty close, but I think programmers tend to
> underestimate here.  I’ve started the FSF copyright assignment process
> either way, since is likely not my last Guile patch.  :)

If the process is already underway, I think it’s fine to commit this
patch (I would rather wait if it were longer and/or if we didn’t know
each other already).

> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
> From: Timothy Sample <address@hidden>
> Date: Sun, 2 Jun 2019 14:41:20 -0400
> Subject: [PATCH] Make URI handling locale independent.
>
> Fixes <https://bugs.gnu.org/35785>.
>
> * module/web/uri.scm (digits, hex-digits, letters): New variables.
> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
> list each character instead of using character ranges.
> * test-suite/tests/web-uri.test: Add corresponding tests.

[...]

> +  (pass-if "http://www.example.com (sv_SE)"
> +    (dynamic-wind
> +      (lambda () #t)
> +      (lambda ()
> +        (with-locale "sv_SE.utf8"
> +          (reload-module (resolve-module '(web uri)))
> +          (uri=? (string->uri "http://www.example.com";)
> +                 #:scheme 'http #:host "www.example.com" #:path "")))

Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
the other test)?

For the sv_SE test, what about taking a host name with a ‘w’, since
that’s the use case that allowed us to uncover this bug?

Apart from that it LGTM, thank you!

Ludo’.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]