Re: [Bug-wget] libpsl design [was: Re: Overly permissive hostname matchi
From: Ángel González
Subject: Re: [Bug-wget] libpsl design [was: Re: Overly permissive hostname matching]
Date: Fri, 21 Mar 2014 21:54:29 +0100
User-agent: Thunderbird
On 21/03/14 21:13, Daniel Kahn Gillmor wrote:
i've just pushed some cleanup suggestions here:
https://github.com/rockdaboot/libpsl/pull/1
i see you've pulled them already, thanks!
i've got three more conceptual issues which warrant discussion, rather
than a patch, though. If there's a better place to have this discussion
than this mailing list, i'm happy to move to it, please let me know where.
psl_is_tld() semantics
----------------------
the way i see it, we know what it means for psl_is_tld() to return
"true" -- but "false" could mean either:
(A) "this zone is subordinate to a TLD" (as example.com is to com)
or
(B) "this zone is superior to a TLD" (as uk is to co.uk). Note that
"uk" is not a public suffix.
Hmm, actually uk is a public suffix: since it doesn't match anything
explicitly in the list, it is caught by the implicit last-resort
rule '*'.
Also, what would you do with a domain such as his.name?
It is both inferior to a public suffix (.name) and superior to one
(forgot.his.name).
I think it should have a different return code, though.
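
To make the "different return code" idea concrete, here is a minimal
sketch that reports the three relations as independent flags instead of
a single boolean. Everything in it is an assumption for illustration:
the flag names, the function names and the four-entry toy rule table
are hypothetical and are not part of libpsl. It ignores wildcard and
exception rules and expects lowercase ASCII names without a trailing dot.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flags; none of these names exist in libpsl. */
enum {
    SKETCH_IS_SUFFIX    = 1 << 0, /* the name itself is a public suffix  */
    SKETCH_BELOW_SUFFIX = 1 << 1, /* some parent of the name is a suffix */
    SKETCH_ABOVE_SUFFIX = 1 << 2  /* some listed suffix lies below it    */
};

/* Toy rule table, standing in for the real list (assumption). */
static const char *const toy_rules[] = {
    "com", "name", "co.uk", "forgot.his.name"
};
#define TOY_RULE_COUNT (sizeof toy_rules / sizeof toy_rules[0])

/* A name is a suffix if it is listed, or if it is a single label
 * (the implicit '*' rule). */
static int sketch_is_suffix(const char *name)
{
    size_t i;

    if (strchr(name, '.') == NULL)
        return 1;
    for (i = 0; i < TOY_RULE_COUNT; i++)
        if (strcmp(name, toy_rules[i]) == 0)
            return 1;
    return 0;
}

/* True if rule is "<something>.<name>". */
static int sketch_rule_is_below(const char *rule, const char *name)
{
    size_t rl = strlen(rule), nl = strlen(name);

    return rl > nl + 1 && rule[rl - nl - 1] == '.'
        && strcmp(rule + rl - nl, name) == 0;
}

int sketch_classify(const char *name)
{
    int flags = 0;
    const char *dot;
    size_t i;

    if (sketch_is_suffix(name))
        flags |= SKETCH_IS_SUFFIX;

    /* Check every proper parent: "a.b.c" -> "b.c", "c". */
    for (dot = strchr(name, '.'); dot != NULL; dot = strchr(dot + 1, '.'))
        if (sketch_is_suffix(dot + 1)) {
            flags |= SKETCH_BELOW_SUFFIX;
            break;
        }

    for (i = 0; i < TOY_RULE_COUNT; i++)
        if (sketch_rule_is_below(toy_rules[i], name)) {
            flags |= SKETCH_ABOVE_SUFFIX;
            break;
        }

    return flags;
}
```

With this toy table, example.com only gets the "below" flag, uk gets
"is suffix" plus "above", and his.name gets both "below" and "above",
so a caller can tell the cases apart.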
IDNA
----
I hate to bring this up, because it's a nightmare and i have no good
answers, but what does this library expect to do about non-ASCII domain
names? effective_tld_names.dat contains the limits in unicode, encoded
as UTF-8, e.g.:
// xn--mgba3a4f16a.ir (<iran>.ir, Persian YEH)
ایران.ir
should we assume that the input from the user is in a similar form? do
we care about locale issues? what about unicode canonicalization? what
if the incoming data is in punycode (the xn--* ascii form) already?
the GNU folks have done the ugly ugly work for us if we're willing to
link to lgpl'ed libraries:
https://www.gnu.org/software/libidn/
I would expect the input in punycode, and optionally in UTF-8. This means
the original list needs a preprocessing step.
If we are handed an i18n domain, punycode it with libidn if we are
linked against it; otherwise return an error.
An application doing this check will presumably already need to deal with
i18n domain names, so I suppose it can already get the punycode for
things like querying the DNS. And if it can't punycode the name, it
doesn't matter so much that this doesn't work for it ;)
It is disgusting to do a UTF-8 -> punycode -> UTF-8 round trip just to
extract the base domain, though.
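
The "punycode it with libidn if linked, else return an error" rule could
look roughly like the sketch below. The function name and the
WITH_LIBIDN macro are hypothetical; idna_to_ascii_8z() is the libidn
call for converting a UTF-8 string, and it is only compiled in when the
macro is defined. Names that are already pure ASCII (including xn--
forms) are passed through unchanged.

```c
#include <stdlib.h>
#include <string.h>
#ifdef WITH_LIBIDN          /* hypothetical build-time switch */
#include <idna.h>
#endif

/* True if the string contains any byte outside the ASCII range. */
static int sketch_has_non_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s >= 0x80)
            return 1;
    return 0;
}

/* Returns a malloc'ed ASCII form of domain, or NULL on error.
 * The caller frees the result. */
char *sketch_to_ascii_domain(const char *domain)
{
    if (!sketch_has_non_ascii(domain)) {
        /* Already ASCII, possibly punycode: just copy it. */
        size_t n = strlen(domain) + 1;
        char *copy = malloc(n);

        if (copy != NULL)
            memcpy(copy, domain, n);
        return copy;
    }
#ifdef WITH_LIBIDN
    {
        char *ascii = NULL;

        /* Input is assumed to be UTF-8, not locale-encoded. */
        if (idna_to_ascii_8z(domain, &ascii, 0) == IDNA_SUCCESS)
            return ascii;
        return NULL;
    }
#else
    /* No IDNA support compiled in: refuse i18n input. */
    return NULL;
#endif
}
```

Built without libidn, this accepts xn--mgba3a4f16a.ir as is and rejects
the UTF-8 spelling of the same name, which matches the behaviour
described above.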
malformed inputs
----------------
What should the library do with malformed inputs? i'm thinking about
super-long strings, strings starting with more than one dot, or with
multiple dots adjacent to each other, strings that don't match whatever
encoding we're expecting users to send, etc.
--dkg
Return an error.
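
As a rough illustration of "return an error", a pre-check along these
lines would catch the cases listed above before any lookup happens. The
function name is hypothetical. The 253-character and 63-character
limits are the usual DNS limits for a textual name and a single label.
As an assumption, this sketch expects ASCII (punycode) input, rejects
any leading dot, and tolerates one trailing dot.

```c
#include <stddef.h>

enum { SKETCH_MAX_NAME = 253, SKETCH_MAX_LABEL = 63 };

/* Returns 0 if the name looks well-formed, -1 otherwise. */
int sketch_check_hostname(const char *name)
{
    const unsigned char *p;
    size_t total = 0, label = 0;

    if (name == NULL || *name == '\0')
        return -1;

    for (p = (const unsigned char *)name; *p != '\0'; p++) {
        if (++total > SKETCH_MAX_NAME)
            return -1;              /* super-long string */
        if (*p >= 0x80)
            return -1;              /* not the encoding we expect */
        if (*p == '.') {
            if (label == 0)
                return -1;          /* leading dot or adjacent dots */
            label = 0;
        } else if (++label > SKETCH_MAX_LABEL) {
            return -1;              /* over-long label */
        }
    }
    return 0;
}
```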