bug#18777: [PATCH] dfa: improvement for checking of multibyte character

bug-grep

From:	Norihiro Tanaka
Subject:	bug#18777: [PATCH] dfa: improvement for checking of multibyte character boundary
Date:	Tue, 16 Dec 2014 21:42:32 +0900

On Mon, 15 Dec 2014 09:43:54 -0800
Paul Eggert <address@hidden> wrote:

> Can't we improve this when using_utf8 () is true?  In that case, every
> ASCII character is always single byte.  Also, the bytes 0xc0, 0xc1,
> and 0xf5 through 0xff can be added to the table: they are not
> single-byte characters but they are always encoding errors so they will
> be a character boundary as far as skip_remains_mb is concerned.  This
> suggests that the table 'always_single_byte' should be renamed to
> something like 'always_character_boundary'.
> 
> >     wint_t wc = WEOF;
> > +  if (always_single_byte[*p])
> > +    return p;

Thanks for the review and the suggestion.  If using_utf8 () is true, we can
set always_character_boundary to true for every byte except 0x80-0xbf,
the UTF-8 continuation bytes.
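
To make the UTF-8 case concrete, here is a minimal sketch of how such a
table could be filled.  It is not the code from the attached patch: the
function name init_boundary_table_utf8 is hypothetical, and only the
table name always_character_boundary comes from the discussion above.

```c
#include <limits.h>
#include <stdbool.h>

/* One entry per byte value.  True means a pointer at this byte is
   always at a character boundary in UTF-8.  */
static bool always_character_boundary[UCHAR_MAX + 1];

/* Hypothetical initializer, assuming the locale is UTF-8.
   ASCII bytes and lead bytes begin a character; 0xc0, 0xc1 and
   0xf5-0xff are always encoding errors, so they also count as
   boundaries.  Only the continuation bytes 0x80-0xbf can sit in the
   middle of a character.  */
static void
init_boundary_table_utf8 (void)
{
  for (int b = 0; b <= UCHAR_MAX; b++)
    always_character_boundary[b] = !(0x80 <= b && b <= 0xbf);
}
```

With this table, 192 of the 256 byte values allow the boundary check to
return at once in a UTF-8 locale.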

> This won't assign anything to *WCP, contrary to the documented API
> for skip_remains_mb.  This is OK (as callers don't care) but the API
> documentation should be changed to reflect the actual behavior.

Oh!  If WCP is needed, we must go through the buffer step by step,
because the wide character before P has to be stored in *WCP.  I fixed
that and updated the API documentation.
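
The idea can be sketched as follows.  This is a simplified, standalone
illustration under stated assumptions, not the attached patch: the names
skip_remains_sketch, utf8_step and is_boundary_byte are hypothetical,
the decoder handles only UTF-8 and does not reject overlong or surrogate
forms, and wint_t is assumed to be at least 32 bits wide.  The shortcut
is taken only when the caller passes a null WCP; otherwise the buffer is
walked from MBP so that the last decoded character can be reported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for the table lookup: in UTF-8 only the
   continuation bytes 0x80-0xbf can be inside a character.  */
static bool
is_boundary_byte (unsigned char b)
{
  return !(0x80 <= b && b <= 0xbf);
}

/* Hypothetical, simplified decoder.  Decode one UTF-8 sequence at S,
   where N > 0 bytes are available, store it in *WC and return its
   length.  A malformed or truncated sequence is consumed as a single
   byte with *WC set to WEOF.  */
static size_t
utf8_step (wint_t *wc, unsigned char const *s, size_t n)
{
  unsigned char b = s[0];
  size_t len = (b < 0x80 ? 1
                : 0xc2 <= b && b <= 0xdf ? 2
                : 0xe0 <= b && b <= 0xef ? 3
                : 0xf0 <= b && b <= 0xf4 ? 4
                : 0);
  if (len == 0 || n < len)
    {
      *wc = WEOF;
      return 1;
    }
  /* Keep the payload bits of the lead byte, then append 6 bits per
     continuation byte.  */
  unsigned long v = len == 1 ? b : b & (0xff >> (len + 1));
  for (size_t i = 1; i < len; i++)
    {
      if ((s[i] & 0xc0) != 0x80)
        {
          *wc = WEOF;
          return 1;
        }
      v = v << 6 | (s[i] & 0x3f);
    }
  *wc = (wint_t) v;
  return len;
}

/* Return the first character boundary at or after P, scanning from
   MBP, which must itself be a boundary.  If WCP is non-null, store the
   last character decoded before the returned position, or WEOF if none
   was decoded.  */
static unsigned char const *
skip_remains_sketch (unsigned char const *p, unsigned char const *mbp,
                     unsigned char const *end, wint_t *wcp)
{
  /* Shortcut only when the previous character is not requested.  */
  if (wcp == NULL && p < end && is_boundary_byte (*p))
    return p;

  wint_t wc = WEOF;
  while (mbp < p)
    mbp += utf8_step (&wc, mbp, (size_t) (end - mbp));
  if (wcp != NULL)
    *wcp = wc;
  return mbp;
}
```

For the five bytes 'a' 0xe3 0x81 0x82 'b', a call with P at the 0x81
byte walks from the start and returns the position of 'b', with U+3042
stored through WCP; a call with P at 'b' and a null WCP returns P
without decoding anything.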

0001-dfa-improvement-for-checking-of-multibyte-character-.patch
Description: Text document
