bug#25750: [sed] Matching square brackets
From: 林自均
Subject: bug#25750: [sed] Matching square brackets
Date: Tue, 28 Mar 2017 14:52:35 +0000

Hi Bob,
Thank you for the detailed explanation. That was so helpful.
Best,
John Lin
>
> Bob Proulx <address@hidden> wrote on Thu, Feb 16, 2017 at 5:17 PM:
>
> 林自均 wrote:
> > I want to remove the square brackets in a string:
> >
> > $ echo '[1,2,3]' | sed 's/\[//g' | sed 's/\]//g'
> > 1,2,3
> >
> > And it works.
>
> Yes. But the above isn't strictly correct regular expression usage.
> Let's discuss it piece by piece.
>
> echo '[1,2,3]' |
>
> Okay. Good test pattern.
>
> sed 's/\[//g' |
>
> Okay. Since the [ would start a character class and you want it to
> match itself it needs to be escaped.
>
> sed 's/\]//g'
>
> This is not strictly correct. You have escaped the ] as \], but
> that is not needed. The ] does nothing special in that context. It
> ends a character class started by a [, but outside of one it is
> simply a normal character. The escaped \] happens to fall back to
> being just a ] character. But it is a bad habit to get into, because
> escaping other characters, such as \+, gives them a special meaning
> (in GNU sed, \+ acts like the ERE + operator). Your expression
> should be the following instead.
>
> sed 's/]//g'
>
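
A minimal sketch of both points, assuming a POSIX shell. The last command additionally assumes GNU sed, where \+ is a GNU extension meaning "one or more" in basic regular expressions; other sed implementations may differ. printf is used instead of echo only so that the test strings are printed the same way in every shell.

```shell
# An unescaped ] outside a bracket expression is an ordinary character.
printf '%s\n' '[1,2,3]' | sed 's/]//g'      # prints: [1,2,3

# In a basic regular expression a bare + is ordinary as well...
printf '%s\n' 'a+b' | sed 's/+/ plus /'     # prints: a plus b

# ...but GNU sed treats the escaped form \+ as "one or more" (GNU extension).
printf '%s\n' 'aaab' | sed 's/a\+/X/'       # GNU sed prints: Xb
```
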
> Those two could be combined into one sed command.
>
> echo '[1,2,3]' | sed -e 's/\[//g' -e 's/]//g'
> 1,2,3
>
> Or by a combined string split by the ';' separator.
>
> echo '[1,2,3]' | sed 's/\[//g;s/]//g'
> 1,2,3
>
> I tend to prefer the latter. But either is fine.
>
> > However, when I want to do it in a single sed, it does not work:
> >
> > $ echo '[1,2,3]' | sed 's/[\[\]]//g'
> > [1,2,3]
>
> That is incorrect usage. Do not escape characters inside of [...]
> character classes. The above is behaving correctly: the class is
> [\[\] (a backslash or a [), and the final ] is a literal ] that must
> follow it, so nothing in the input matches.
>
> You are starting a character class to match any of the enclosed
> characters. That is good. But then it is broken by escaping the
> characters inside the character class. Do not escape them. Inside of
> a character class the backslash is not an escape character. It is
> just another member of the class, because the class turns off most
> special characters. Therefore trying to escape them is wrong. That
> is the problem.
>
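
To see what the escaped version really matches, here is a small sketch assuming a POSIX shell and POSIX bracket-expression rules (the variable name "pattern" is only for illustration). The expression reads as a class holding a backslash and a [, followed by a literal ], so it only matches the two-character sequences "[]" and "\]".

```shell
# [\[\]] parses as the class [\[\] (a backslash or a [) followed by a literal ].
pattern='s/[\[\]]//g'

printf '%s\n' '[1,2,3]' | sed "$pattern"    # no match, prints: [1,2,3]
printf '%s\n' 'x[]y'    | sed "$pattern"    # "[]" matches, prints: xy
printf '%s\n' 'x\]y'    | sed "$pattern"    # "\]" matches, prints: xy
```
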
> Please review the documentation on regular expressions here:
>
>
> https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html#Character-Classes-and-Bracket-Expressions
>
> Most meta-characters lose their special meaning inside bracket
> expressions:
>
> ']' ends the bracket expression if it’s not the first list
> item. So, if you want to make the ‘]’ character a list item,
> you must put it first.
>
> Therefore you must start the character class, then immediately put in
> the ] to match itself literally. It does not end the character class
> since an empty class wouldn't make sense.
>
> [ -- start of the character class
> ] -- match a literal ]
> [ -- match a literal [
> ] -- end the class
>
> Here is the working example:
>
> echo '[1,2,3]' | sed 's/[][]//g'
> 1,2,3
>
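
The working expression can be wrapped for reuse. This is only a sketch, assuming a POSIX shell; strip_brackets is a hypothetical helper name, not something sed provides. The second call shows that backslashes in the input are left alone, since no backslash is part of the class.

```shell
# strip_brackets is a hypothetical helper name, not part of sed.
strip_brackets() {
  # [ opens the class, ] placed first is literal, [ is literal, ] closes it.
  sed 's/[][]//g'
}

printf '%s\n' '[1,2,3]'       | strip_brackets   # prints: 1,2,3
printf '%s\n' '[1,2,3]\1\2\3' | strip_brackets   # prints: 1,2,3\1\2\3
```
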
> > I can manage to make it work by a weird regexp:
> >
> > $ echo '[1,2,3]' | sed 's/[]\[]//g'
> > 1,2,3
>
> That is also incorrect usage. You have added an additional \ into the
> class. You thought you were escaping the [ but since it is already
> inside of a bracket character class expression the \ was simply a
> normal character and matched itself.
>
> echo '[1,2,3]\1\2\3'
> [1,2,3]\1\2\3
> echo '[1,2,3]\1\2\3' | sed 's/[]\[]//g'
> 1,2,3123
> echo '[1,2,3]\1\2\3' | sed 's/[][]//g'
> 1,2,3\1\2\3
>
> As you can see, including the \ in the expression also removed the \
> characters, because \ was then part of the character class.
>
> > Is that a bug? If it is, I would like to spend some time to fix it.
>
> It is not a bug. It is incorrect usage. I will close the ticket.
> But please let us know if this makes sense to you. Feel free to
> continue the discussion.
>
> Bob
>
>