bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#30116: [PATCH] `substitute' crashes when file contains NUL character


From: Maxim Cournoyer
Subject: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Mon, 22 Jan 2018 23:27:04 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

address@hidden (Ludovic Courtès) writes:

> Mark H Weaver <address@hidden> skribis:
>
>> Maxim Cournoyer <address@hidden> writes:
>>
>>> address@hidden (Ludovic Courtès) writes:
>>>
>>>> Maxim Cournoyer <address@hidden> skribis:
>>>>
>>>>> I've encountered the following crash when trying to use substitute on a
>>>>> file which contains NUL characters:
>>>>
>>>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>>>> which does not handle NULs.
>>>>
>>>> We should consider switching to the pure-Scheme SRFI-115:
>>>>
>>>>   https://srfi.schemers.org/srfi-115/srfi-115.html
>>>
>>> This looks good, and I started looking into porting `substitute' to it,
>>> but quickly noticed it doesn't seem to be implemented in Guile yet?
>
> ISTR that the reference implementation works fine on Guile.
>
>> Indeed.  SRFI-115 for Guile is on my TODO list, although it might be
>> better to wait until after we switch to using UTF-8 encoding internally
>> for strings, since that will drastically affect the implementation of
>> any efficient regexp matcher on Scheme strings.
>
> Indeed, though I suppose it doesn’t matter much for the cases where
> ‘substitute*’ is used?
>
>> Anyway, 'substitute*' is to be used only on text files, and NUL bytes
>> are not a valid textual character.  So, I think that this case is
>> outside of what 'substitute*' is meant to do, and therefore not a bug in
>> 'substitute*', although of course a more graceful error would surely be
>> preferable.
>
> Yes, that’s also a good point.
>
> So yeah, I think it may be good “eventually” to switch to SRFI-115, but
> that’s not urgent.

Sorry for taking some time to answer; I was puzzled by the fact that my
repro didn't work when ran from the REPL. It seems the problem only
occurs when run inside Guix's build environment, maybe a side effect
which depends on the locale used?

In the `patch-el-files' phase of the emacs-build-system, we find the
following snippet:

    (with-directory-excursion el-dir
      ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
      ;; with the "ISO-8859-1" locale.
      (unless (false-if-exception (substitute-cmd))
        (with-fluids ((%default-port-encoding "ISO-8859-1"))
          (substitute-cmd))))

In case an exception is returned while processing the file, it is
retried being opened with the "ISO-8859-1" encoding. Or, this resolves
to a call to `open-file', which documentation says:

‘b’
          Use binary mode, ensuring that each byte in the file will be
          read as one Scheme character.

          To provide this property, the file will be opened with the
          8-bit character encoding "ISO-8859-1", ignoring the default
          port encoding.  *Note Ports::, for more information on port
          encodings.

So, by opening an file whose encoding is unknown as a ISO-8859-1 file,
we are doing the same as if we had passed the 'binary option. Could this
explain why we end up with NUL characters where we were expecting text?

To validate this hypothesis, I've added the following test message to
the patch-el-files phase:

      (unless (false-if-exception (substitute-cmd))
        (format (current-error-port) ">>> IS THIS IT? <<<")
        (with-fluids ((%default-port-encoding "ISO-8859-1"))
          (substitute-cmd))))

And re-ran the emacs-realgud build (minus the patch working around this
issue), and this is what I got:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
>>> IS THIS IT? <<<Backtrace:
          15 (primitive-load "/gnu/store/xdmirz24yxpxzqh8xj83z9h0axm…")
In ice-9/eval.scm:
   191:35 14 (_ _)
In srfi/srfi-1.scm:
   863:16 13 (every1 #<procedure 9a9540 at /gnu/store/mz8vs1cxv1z7y…> …)
In 
/gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/gnu-build-system.scm:
   684:27 12 (_ _)
In 
/gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/emacs-build-system.scm:
   117:10 11 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9 10 (for-each #<procedure substitute-one-file (file-name)> _)
In ice-9/boot-9.scm:
    849:4  9 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  8 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In 
/gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   609:26  7 (_ _)
   635:26  6 (_ #<input: ./realgud/common/bp-image-data.el 17> #<inp…>)
In srfi/srfi-1.scm:
   466:18  5 (fold #<procedure 7ffff53d1520 at /gnu/store/mz8vs1cxv…> …)
In 
/gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   638:37  4 (_ _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\…")
In ice-9/regex.scm:
   189:12  3 (list-matches _ _ _)
   177:19  2 (fold-matches _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿ…" …)
In unknown file:
           1 (regexp-exec #<regexp 81e400> "\"II*\x00(\x03\x00\x00ÿ…" …)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: string contains #\nul character: 
"\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ\x14\x00\x00\x01\x03\x00\x01\x00\x00\x00\n"
builder for 
`/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed 
with exit code 1
@ build-failed 
/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv - 1 builder 
for `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' 
failed with exit code 1
guix build: error: build failed: build of
`/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv'
failed
--8<---------------cut here---------------end--------------->8---

So it is indeed triggered by switching to the "ISO-8859-1" encoding
(although I still cannot reproduce this from the REPL?).

If I remove the exception guard like this:

--8<---------------cut here---------------start------------->8---
     (with-directory-excursion el-dir
       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
       ;; with the "ISO-8859-1" locale.
-      (unless (false-if-exception (substitute-cmd))
-        (with-fluids ((%default-port-encoding "ISO-8859-1"))
-          (substitute-cmd))))
+      (substitute-cmd))
     #t))
--8<---------------cut here---------------end--------------->8---

the exception thrown on the first substitute* call is this:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
Backtrace:
          12 (primitive-load "/gnu/store/dvyyqxfr08fsr18k2f43gakh23d…")
In ice-9/eval.scm:
   191:35 11 (_ _)
In srfi/srfi-1.scm:
   863:16 10 (every1 #<procedure 9a96a0 at /gnu/store/xn6p33hhfyz6l…> …)
In 
/gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/gnu-build-system.scm:
   684:27  9 (_ _)
In 
/gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/emacs-build-system.scm:
   104:27  8 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9  7 (for-each #<procedure substitute-one-file (file-name)> _)
In ice-9/boot-9.scm:
    849:4  6 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  5 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In 
/gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/utils.scm:
   609:26  4 (_ _)
   645:22  3 (_ #<input: ./realgud/common/bp-image-data.el 15> #<inp…>)
In ice-9/rdelim.scm:
   195:24  2 (read-line _ _)
In unknown file:
           1 (%read-line #<input: ./realgud/common/bp-image-data.el …>)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: Throw to key `decoding-error' with args
`("peek-char" "input decoding error" 84 #<input:
./realgud/common/bp-image-data.el 15>)'.
--8<---------------cut here---------------end--------------->8---

Should we keep my workaround for now? It seems there are valid cases to
have the file opened as "ISO-8859-1", but this can mean introducing
binary symbols such as NUL in the data (thus regexp crashes). When we
finally move to srfi-115, we should remove this workaround.

WDYT?

Here's an updated patch with Ludovic's suggestion:

Attachment: 0001-utils-Prevent-substitute-from-crashing-on-files-cont.patch
Description: Text Data

Maxim

reply via email to

[Prev in Thread] Current Thread [Next in Thread]