From: Eric Blake
Subject: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
Date: Thu, 20 Apr 2017 11:46:15 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.0
On 04/20/2017 11:36 AM, Michael Klement wrote:
> Thanks for the detailed feedback, Eric.
>
> The POSIX spec. is, unfortunately, vague on this topic:
>
> The definition of a line (which you quote) is complemented with the
> definition of an incomplete line
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195>:
>
>> A sequence of one or more non- <newline> characters at the end of the file.
>
>
> So while the standard is aware of this possibility and gives it a name
> suggesting it is a kind of line with something missing, there is precious
> little behavior prescribed with respect to such incomplete lines.
>
You're welcome to submit a bug report asking POSIX to word more clearly
its intention that a file with an incomplete line is NOT a text file
(http://austingroupbugs.net/main_page.php), but everyone in the Austin
Group (myself included) has already agreed that the intention is there
(even if the wording could be improved): omitting a trailing newline
causes sed to enter the realm of undefined behavior, and this is
BECAUSE there are existing sed implementations that behave differently
when a trailing newline is omitted. Some do nothing at all with an
incomplete line (such a sed behaves as though the file were truncated at
the last newline).

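
The quoted definition of an incomplete line, and the truncating behavior just described, can be illustrated with a minimal Python sketch. It assumes the input is available as raw bytes and treats only b"\n" as a line terminator; the function names are hypothetical, and this is not code from any sed implementation.

```python
def has_incomplete_line(data: bytes) -> bool:
    """True if data ends in one or more non-newline bytes, i.e. the
    'incomplete line' described in the POSIX glossary."""
    return data != b"" and not data.endswith(b"\n")


def truncate_at_last_newline(data: bytes) -> bytes:
    """Keep only the complete lines: act as though the file ended at its
    last newline, as some sed implementations effectively do."""
    # rfind returns -1 when there is no newline, so the slice becomes data[:0] == b"".
    return data[: data.rfind(b"\n") + 1]


if __name__ == "__main__":
    sample = b"first\nsecond"                 # no trailing newline
    print(has_incomplete_line(sample))        # True
    print(truncate_at_last_newline(sample))   # b'first\n'
```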
> So we have:
>
> sed's "input files shall be text files."
> a text file contains "characters organized into zero or more lines"
>
> Beyond the "zero or more lines", the only restrictions placed on what
> constitutes a text file
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403>
> are:
> "The lines do not contain NUL characters and none can exceed {LINE_MAX}
> bytes in length, including the <newline> character."
>
> If you interpret the word "lines" in the phrase "zero or more lines" to mean
> complete lines only (which is reasonable), then indeed any file that ends in
> an incomplete line is not a text file.
>
> I really wish the spec. were more explicit about incomplete lines.
As I said, you're welcome to file a bug report with suggested wording
improvements.
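
Under the strict reading discussed above (complete lines only), the text file definition can be checked mechanically. The following Python sketch is an illustration under stated assumptions: the helper name is hypothetical, it defaults the line limit to 2048 bytes ({_POSIX2_LINE_MAX}, the smallest value POSIX permits for {LINE_MAX}), and it does not check whether the bytes form valid characters in the current locale.

```python
# {_POSIX2_LINE_MAX}: the smallest value POSIX allows for {LINE_MAX}.
# A real system may have a larger limit; pass it in via line_max.
POSIX2_LINE_MAX = 2048


def is_posix_text(data: bytes, line_max: int = POSIX2_LINE_MAX) -> bool:
    """Check data against a strict reading of the POSIX 'text file' definition.

    Simplification (assumption): locale character validity is not checked.
    """
    if b"\x00" in data:
        return False                    # lines must not contain NUL
    if data and not data.endswith(b"\n"):
        return False                    # ends in an incomplete line
    # Every complete line, counting its newline, must fit in line_max bytes.
    lines = data.split(b"\n")[:-1]      # last piece is b"" after the final newline
    return all(len(line) + 1 <= line_max for line in lines)


if __name__ == "__main__":
    print(is_posix_text(b""))              # True: zero lines
    print(is_posix_text(b"one\ntwo\n"))    # True
    print(is_posix_text(b"one\ntwo"))      # False: incomplete line
    print(is_posix_text(b"a\x00b\n"))      # False: NUL byte
```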
>
>> If anything, the only
>> change I would make is have 'sed --posix' error out on non-text input,
>> to call attention to the user's attempt to feed non-posix-compliant data
>> to sed.
>
>
> That is definitely an option, but perhaps intuitive understanding and
> historical practice / other implementations could be considered instead:
>
> Intuitively, a file containing text with an incomplete line is obviously
> still a text file
Not per the POSIX definition of a text file.
It is still a file, but no longer a text file.
It wouldn't be the first time intuition has been wrong.
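
The suggestion that 'sed --posix' could error out on non-text input can be contrasted with passing an incomplete last line through unchanged (GNU sed's default, which is what this bug report is about). The Python sketch below models only those two policies; `map_lines`, `NonTextInputError` and the `strict` flag are hypothetical names, not sed's code or an existing sed option.

```python
from typing import Callable


class NonTextInputError(ValueError):
    """Hypothetical error a strict mode could raise for non-text input."""


def map_lines(data: bytes, fn: Callable[[bytes], bytes], strict: bool = False) -> bytes:
    """Apply fn to each line's content (without its newline), sed-style.

    strict=False: an incomplete last line is processed and written back
    without a newline, as GNU sed does.
    strict=True: such input is rejected instead of being processed.
    """
    incomplete = data != b"" and not data.endswith(b"\n")
    if incomplete and strict:
        raise NonTextInputError("input ends in an incomplete line")
    pieces = data.split(b"\n")
    if not incomplete:
        pieces = pieces[:-1]            # drop the b"" after the final newline
    out = [fn(piece) + b"\n" for piece in pieces]
    if incomplete:
        out[-1] = out[-1][:-1]          # do not invent the missing newline
    return b"".join(out)


if __name__ == "__main__":
    print(map_lines(b"a\nb", bytes.upper))    # b'A\nB'
    try:
        map_lines(b"a\nb", bytes.upper, strict=True)
    except NonTextInputError as err:
        print("rejected:", err)
```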
> wc is an interesting case, which doesn't count an incomplete line as a line
> (the spec
> <http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html> is
> actually unambiguous there and mandates counting the newlines),
Indeed, wc is a good example of how the POSIX writers went out of their
way to specify behavior that MUST be consistent when a program is given
a non-text file. It also shows the flip side, the escape clause: for all
other programs that require text file input (including sed), the
behavior is intentionally unspecified if the trailing newline is not
present.
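
Because POSIX has wc -l report the number of newline characters, an incomplete final line contributes nothing to the count. A minimal Python sketch of that rule, next to the "intuitive" count that also includes a trailing incomplete line (both helper names are hypothetical):

```python
def wc_l(data: bytes) -> int:
    """Line count as POSIX specifies for wc -l: the number of newlines."""
    return data.count(b"\n")


def count_including_incomplete(data: bytes) -> int:
    """The 'intuitive' count: complete lines plus a trailing incomplete one."""
    return data.count(b"\n") + (1 if data and not data.endswith(b"\n") else 0)


if __name__ == "__main__":
    print(wc_l(b"one\ntwo\n"))                       # 2
    print(wc_l(b"one\ntwo"))                         # 1: "two" is not counted
    print(count_including_incomplete(b"one\ntwo"))   # 2
```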
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org