[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#30116: [PATCH] `substitute' crashes when file contains NUL character
From: |
Mark H Weaver |
Subject: |
bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)) |
Date: |
Thu, 14 Jun 2018 03:02:47 -0400 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
Hi Maxim,
Thanks for working on this. I found a problem with this patch,
and I also have a suggestion. Please see below.
Maxim Cournoyer <address@hidden> writes:
> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer <address@hidden>
> Date: Sun, 14 Jan 2018 20:31:33 -0500
> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
> NUL chars.
>
> Fixes issue #30116.
>
> * guix/build/utils.scm (substitute): Add condition to skip lines containing
> the NUL character.
> ---
> guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
> 1 file changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
> index 7391307c8..975f4e70a 100644
> --- a/guix/build/utils.scm
> +++ b/guix/build/utils.scm
> @@ -3,6 +3,7 @@
> ;;; Copyright © 2013 Andreas Enge <address@hidden>
> ;;; Copyright © 2013 Nikita Karetnikov <address@hidden>
> ;;; Copyright © 2015 Mark H Weaver <address@hidden>
> +;;; Copyright © 2018 Maxim Cournoyer <address@hidden>
> ;;;
> ;;; This file is part of GNU Guix.
> ;;;
> @@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line
> that will be written as
> a substitution of the original line. Be careful about using '$' to match the
> end of a line; by itself it won't match the terminating newline of a line."
> (let ((rx+proc (map (match-lambda
> - (((? regexp? pattern) . proc)
> - (cons pattern proc))
> - ((pattern . proc)
> - (cons (make-regexp pattern regexp/extended)
> - proc)))
> + (((? regexp? pattern) . proc)
> + (cons pattern proc))
> + ((pattern . proc)
> + (cons (make-regexp pattern regexp/extended)
> + proc)))
> pattern+procs)))
> (with-atomic-file-replacement file
> (lambda (in out)
> (let loop ((line (read-line in 'concat)))
> - (if (eof-object? line)
> - #t
> - (let ((line (fold (lambda (r+p line)
> - (match r+p
> - ((regexp . proc)
> - (match (list-matches regexp line)
> - ((and m+ (_ _ ...))
> - (proc line m+))
> - (_ line)))))
> - line
> - rx+proc)))
> - (display line out)
> - (loop (read-line in 'concat)))))))))
> + (cond
> + ((eof-object? line)
> + #t)
> + ((string-contains line (make-string 1 #\nul))
> + ;; The regexp functions of the GNU C library (which Guile uses)
> + ;; cannot deal with NUL characters, so skip to the next line.
> + (format #t "skipping line with NUL characters: ~s\n" line)
> + (loop (read-line in 'concat)))
This code will unconditionally *delete* all lines that contain NULs.
This will happen because the lines with NULs are not being written to
the output file, which will replace the original file when this loop
reaches EOF. So, any lines that are not copied to the output will be
lost.
To preserve the lines with NULs, you should call (display line out)
before calling 'loop'.
Also, please use (string-index line #\nul) to check for NULs instead of
'string-contains'. It should be more efficient.
> + (else
> + (let ((line (fold (lambda (r+p line)
> + (match r+p
> + ((regexp . proc)
> + (match (list-matches regexp line)
> + ((and m+ (_ _ ...))
> + (proc line m+))
> + (_ line)))))
> + line
> + rx+proc)))
> + (display line out)
> + (loop (read-line in 'concat))))))))))
With the changes suggested above, I would have no objection to pushing
this to core-updates. However, it occurs to me that we could handle the
NUL case in a better way:
Since the C regex functions that we use cannot handle NUL bytes, we
could use a different code point to represent NUL during those
operations. We could choose a code point from one of the Unicode
Private Use Areas <https://en.wikipedia.org/wiki/Private_Use_Areas> that
does not occur in the string.
Let NUL* be the code point which will represent NUL bytes. First
replace all NULs with NUL*s, then perform the substitutions, and finally
replace all ALT*s with NULs before writing to the output.
As an important optimization, we should avoid performing these extra
operations unless (string-index line #\nul) finds a NUL.
We could then perform these extra substitutions with simple, inefficient
code. Maybe something like this (untested):
--8<---------------cut here---------------start------------->8---
(with-atomic-file-replacement file
(lambda (in out)
(let loop ((line (read-line in 'concat)))
(if (eof-object? line)
#t
(let* ((nul* (or (and (string-index line #\nul)
(unused-private-use-code-point line))
#\nul))
(line* (replace-char #\nul nul* line))
(line1* (fold (lambda (r+p line)
(match r+p
((regexp . proc)
(match (list-matches regexp line)
((and m+ (_ _ ...))
(proc line m+))
(_ line)))))
line*
rx+proc))
(line1 (replace-char nul* #\nul line1*)))
(display line1 out)
(loop (read-line in 'concat)))))))))
--8<---------------cut here---------------end--------------->8---
Where the following additional private procedures would be added to
(guix build utils) above the definition for 'substitute':
--8<---------------cut here---------------start------------->8---
(define (unused-private-use-code-point s)
"Find a code point within a Unicode Private Use Area that is not
present in S, and return the corresponding character object. If one
cannot be found, return false."
(define (scan lo hi)
(and (<= lo hi)
(let ((c (integer->char lo)))
(if (string-index s c)
(scan (+ lo 1) hi)
c))))
(or (scan #xE000 #xF8FF)
(scan #xF0000 #xFFFFD)
(scan #x100000 #x10FFFD)))
(define (replace-char c1 c2 s)
"Return a string which is equal to S except with all instances of C1
replaced by C2. If C1 and C2 are equal, return S."
(if (char=? c1 c2)
s
(string-map (lambda (c)
(if (char=? c c1)
c2
c))
s)))
--8<---------------cut here---------------end--------------->8---
What do you think?
Mark