[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: nl: -d option with a single character
From: |
KOBAYASHI Takashi |
Subject: |
Re: nl: -d option with a single character |
Date: |
Mon, 14 Dec 2020 14:37:09 -0800 |
Sorry for my rush job, thanks for your refactoring!
I agree with not breaking compat with POSIX.
However, with this change, the behavior of the -d blank character is
different than before.
I think the previous behavior was important to ensure that all input is
treated as a single page.
This case is not specifically mentioned in POSIX, so I propose it is kept
as the GNU extension.
The fix patch is modified.
Takashi
2020年12月14日(月) 8:53 Pádraig Brady <P@draigbrady.com>:
> On 13/12/2020 23:23, KOBAYASHI Takashi wrote:
> > Hello,
> >
> > I found a bug of -d in nl when a single character is specified. The
> patches
> > are also attached. I think there are two ways to fix this.
> >
> > The current behavior:
> > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
> > 1 a
> > 2 @:@:
> > 3 b
> > 4 @@
> > 5 c
> >
> > POSIX expectations(Attach: nl-d-posix.patch):
> > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
> > 1 a
> >
> > 1 b
> > 2 @@
> > 3 c
> >
> > POSIX(1003.1-2017)
> >> -d delim
> >> Specify the delimiter characters that indicate the start of a logical
> > page section. These can be changed from the default characters "\:" to
> two
> > user-specified characters. If only one character is entered, the second
> > character shall remain the default character ':'.
> >
> > Also, Texinfo is written in the same behavior as POSIX.
> >> ‘-d CD’
> >> ‘--section-delimiter=CD’
> >> Set the section delimiter characters to CD; default is ‘\:’. If
> >> only C is given, the second remains ‘:’. (Remember to protect ‘\’
> >> or other metacharacters from shell expansion with quotes or extra
> >> backslashes.)
> >
> >
> > I don't like this complex POSIX specification because it doesn't allow us
> > to specify a single character delimiter.
> > Does anyone know why POSIX was specified the way it was? I would like to
> > know the background.
> > I believe that the section delimiter should be generated from the string
> > given to the option, regardless of the length of the characters. And by
> the
> > string given two characters, it is possible to make the second character
> > ":".
> >
> > The following is more clear(Attach: nl-d-incompatible.patch):
> > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
> > 1 a
> > 2 @:@:
> > 3 b
> >
> > 1 c
> > $ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@:'
> > 1 a
> >
> > 1 b
> > 2 @@
> > 3 c
>
> Nice find!
>
> I agree that supporting matching a single char would be preferable
> to the POSIX mandated behavior of assuming a second ':' character.
> However no-one has specifically asked for the single char support,
> and I don't think it's worth breaking compat with POSIX and other
> implementations. So we should not apply your second "incompatible"
> patch I think.
>
> BTW, we also have an undocumented behavior that strings longer
> than two characters can be provided.
> We should probably document that GNU extension.
>
> As for the implementation, this code is already a bit "copy and pastey",
> so I'd prefer not to copy the logic again.
>
> I've attached two patches to address this.
> The first changes nothing, but refactors things to simplify the fix.
> The second (in your name) applies the specific fix and adds a test.
>
> I'll apply these later if you're Ok with it.
>
> cheers,
> Pádraig
>
nl-delim-fix2.patch
Description: Source code patch