libcdio-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Libcdio-devel] Vulnerable use of strcpy in iso9660_fs.c


From: Pete Batard
Subject: Re: [Libcdio-devel] Vulnerable use of strcpy in iso9660_fs.c
Date: Tue, 9 Apr 2024 00:44:00 +0200
User-agent: Mozilla Thunderbird

Hi Thomas,

Thanks for the input. At least in Rufus (and most likely in the proposal I submit) I'm not going to bother trying to "guess" what the maximum possible size of a trick UTF-8 string might be, because, if the Unicode Comittee decides to add a bunch of new codepages where a UTF-16 sequence can generate more than what we currently expect the maximum corresponding UTF-8 sequence size to be (be it 1.5 times or more), our attempt at future-proofing the conversion may go down the drain. Or maybe there's a mathematical proof that a UTF-8 glyph byte encoding can never be larger than 1.5 the UTF-16 glyph byte encoding, but, considering the the Unicode Comittee already created a superset of an existing encoding representation (UCS-2 -> UTF-16), I'm not sure I like the idea of trying to be too smart about or expecting specs not to change the deal.

So I'm going to stick to i_fname for length, with the expectation that we're unlikely to see realistic truncations outside of images designed to trigger one, and to my usual approach of "only when someone produces a real-life ISO that breaks our assertion should we need to bother about improving on the most straightforward solution".

On the other hand, I do agree that producing a warning if we detect an overflow is something we want.

The result of this is that my proposal to libcdio should match the patch I just applied to Rufus in [2] very closely. That is, unless someone wants to submit something else before I do (which won't be for at least another week).

Regards,

/Pete

[1] https://design215.com/toolbox/utf8-4byte-characters.php
[2] https://github.com/pbatard/rufus/commit/8a8e41875171f115c7b97608a8b7625787b48b47

On 2024.04.06 17:39, Thomas Schmitt via Libcdio-devel wrote:
Hi,

Pete Batard wrote:
   strncpy(cpy_result, p_psz_out, i_inlen);

Known as nitpicker i want to to point out that this would avoid a memory
corruption in case of overflow but would also truncate the name,
potentially to an incomplete UTF-8 byte sequence at the end.

I add the technical part of my private mail to Rocky of yesterday with
assessment and proposal which is in part combinable with Pete's.

------------------------------------------------------------------
My assessment for now:

The  strcpy() gesture in function _iso9660_recname_to_cstring() is part
of the conversion from UCS-2 to UTF-8.
I don't see the need for strncpy(), because the result of
cdio_charset_to_utf8() seems to be 0-terminated. (At least it does not
reply a length which would be needed if no terminating 0 is appended.)

But the memory safety depends on the caller having allocated cpy_result
with a sufficient size.
And there i see a potential problem with some languages.

The allocation of cpy_result happens in _iso9660_dir_to_statbuf(),
line 858 ff.:

   if (!dir_len) return NULL;

   i_fname = from_711(p_iso9660_dir->filename.len);

   /* .. string in statbuf is one longer than in p_iso9660_dir's listing '\1' */
   stat_len = sizeof(iso9660_stat_t) + i_fname + 2;

   /* Reuse multiextent p_stat if not NULL */
   if (!p_stat) {
     p_stat = calloc(1, stat_len);
     first_extent = true;

The size of p_stat->filename is then (i_fname + 2), i.e.  what is needed
for the UCS-2 representation of the name plus trailing 16-bit 0.
p_stat->filename becomes parameter cpy_result:

Line 956:
         if (!_iso9660_recname_to_cstring(&p_iso9660_dir->filename.str[1],
                                          i_inlen, NULL, p_stat->filename,
                                          u_joliet_level))
Line 965:
     if (!_iso9660_recname_to_cstring(&p_iso9660_dir->filename.str[1],
                                      i_inlen, NULL, p_stat->filename,
                                      u_joliet_level))

Normally text shrinks during the conversion from UCS-2 to UTF-8. But
there are UCS-2/UTF-16 characters which need more than 2 bytes. E.g.:
   https://www.compart.com/en/unicode/U+0800
   Samaritan Letter Alaf
   UTF-8 Encoding:      0xE0 0xA0 0x80
   UTF-16 Encoding:     0x0800

So if the UCS-2 name contains more than one of these UTF-8 3-byters the
surplus space for the 16-bit 0 can be used up and the cpy_result can
overflow, regardless of strcpy() versus strncpy().
(Each 1-byte UTF-8 character in the conversion result adds another byte
  of room for 3-byters. But i doubt that a string with samaritan letters
  will contain many letters from 7-bit US-ASCII.)

It looks like UTF-8 4-byters are not possible in UCS-2.
If we encounter UTF-16 instead, the Joliet characters which yield 4-byte
UTF-8 bytes are 4-byters themselves. (Conversion might yield poor results
if really UCS-2 is expected. But memory should be safe.)

--------------------------------------------------------------------------
Proposal in addition to Pete's:

In case of HAVE_JOLIET and u_joliet_level allocate in
    _iso9660_dir_to_statbuf()
3/2 of the UCS-2 name length (i_fname of a good UCS-2 string is divisibe
by 2):

   stat_len = sizeof(iso9660_stat_t) + 3 * i_fname / 2 + 2;

Rather an alternative to Pete's proposal:

Further i propose to add a parameter
   size_t cpy_result_size
to
   _iso9660_recname_to_cstring()
and to feed it with (3 * i_fname / 2 + 2)  from _iso9660_dir_to_statbuf().
In case of overflow, a message about programming error should be emitted.


Have a nice day :)

Thomas






reply via email to

[Prev in Thread] Current Thread [Next in Thread]