Re: two (bugs? misfeatures?) in libidn
From: Simon Josefsson
Subject: Re: two (bugs? misfeatures?) in libidn
Date: Thu, 16 Aug 2012 22:04:03 +0200
User-agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/23.3 (gnu/linux)

Jon Nelson <address@hidden> writes:
> On Thu, Aug 2, 2012 at 3:21 PM, Simon Josefsson <address@hidden> wrote:
>> Jon Nelson <address@hidden> writes:
>>
>>> I've encountered two bugs or misfeatures in libidn:
>>
>> Hi! Thanks for your report.
>>
>>> 1. given an idna-encoded input, it is possible to generate invalid
>>> UTF-8 output (as defined by RFC3629). The UTF-8 is invalid because
>>> codepoints above 0x10FFFF are used.
>>>
>>> See http://tools.ietf.org/html/rfc3629
>>
>> Can you be more concrete, what inputs does this happen for and what
>> output would you expect? An example would help illustrate the problem.
>
> Example: echo xn--1234xxxxxxxxxx | idn -u --debug

Thank you. Interestingly, the punycode code from RFC 3492 happily
decodes the string to Unicode code points > U+10FFFF. I can't see
anything in RFC 3492 (punycode) or RFC 3490 (IDNA ToUnicode) that
requires checking for code points > U+10FFFF, or that says where such a
check would be done. Arguably, the final conversion from UCS-4 to UTF-8
should trigger an error in libidn, but by then the damage is already
done: ToUnicode has returned a sequence of illegal code points. So it
seems ToUnicode should perform this check somewhere, but from reading
RFC 3492 and RFC 3490 I can't find a suitable place for it. Thoughts?
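
To make the check under discussion concrete, here is a minimal sketch of
a range test that could run on the decoded code points before they are
converted to UTF-8. The names is_unicode_scalar and
first_invalid_code_point are hypothetical and not part of libidn's API.
Rejecting the surrogate range is an extra assumption taken from RFC 3629;
the report above only concerns values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not libidn functions. */

/* Highest code point that RFC 3629 allows UTF-8 to encode. */
#define UNICODE_MAX 0x10FFFFu

/* Return 1 if CP may be encoded as UTF-8 per RFC 3629, 0 otherwise. */
int
is_unicode_scalar (uint32_t cp)
{
  if (cp > UNICODE_MAX)
    return 0;                   /* beyond the Unicode code space */
  if (cp >= 0xD800u && cp <= 0xDFFFu)
    return 0;                   /* surrogates (assumed check, RFC 3629) */
  return 1;
}

/* Return the index of the first invalid code point in IN, or LEN if
   every code point is acceptable. */
size_t
first_invalid_code_point (const uint32_t *in, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (!is_unicode_scalar (in[i]))
      break;
  return i;
}
```

A caller could treat any return value smaller than len as a decoding
error. Whether such a test belongs in the punycode decoder, in ToUnicode,
or in the UCS-4 to UTF-8 conversion is exactly the open question.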
/Simon