bug#17328: dfa.c will fail if used on more than one DFA


From: behoffski
Subject: bug#17328: dfa.c will fail if used on more than one DFA
Date: Thu, 24 Apr 2014 08:17:12 +0930
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 04/24/14 05:19, Paul Eggert wrote:
> Thanks for reporting that.  That static variable was the result of a recent
> optimization.  I guess we'll need to optimize in a different way.  I noticed
> another static variable that dfaexec also uses, namely 'mbs'; this needs to be
> moved into the struct dfa, for the benefit of any applications that really need
> stateful encodings.  A bonus is that I expect this'll make the dfa code run a
> bit faster. I installed the attached patch, which I hope addresses the issues
> you raised along with the mbs issue.
>
> The code still needs more work in this area.  There shouldn't be any static
> variables at all, even when parsing, though this would change the API.  And the
> code isn't consistent about referring to dfa->mb_cur_max versus MB_CUR_MAX; not
> sure why that is.

The modules created by my "untangle" script are working towards this goal:
Almost all of the code in fsalex, fsaparse, fsatoken and fsamusts has *no*
static variables (I found a couple of exceptions, e.g. fsalex has an
initialise-once-only static variable in function using_simple_locale,
but this is easily fixed).  The context is explicit in the API:  It is an
opaque pointer to outsiders, but expands to a struct pointer internally.
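
As a rough illustration of the opaque-context idea above, here is a minimal
sketch in C.  All of the demo_* names are hypothetical and are not taken from
fsalex; the locale probe is a stand-in that always answers "simple".  The
sketch also shows one possible way an initialise-once static flag could be
moved into the per-instance context, so that two lexers never share it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Public view (what a header might expose): an incomplete type, so
   outsiders can hold a pointer but cannot see the fields.  */
typedef struct demo_lex_ctxt_struct demo_lex_ctxt_t;

/* Private view (inside the module): the full definition.  */
struct demo_lex_ctxt_struct
{
  bool simple_locale_known;     /* Has the answer been computed yet?  */
  bool simple_locale;           /* Cached answer, valid once known.  */
  int probe_count;              /* Demo only: number of probes made.  */
};

demo_lex_ctxt_t *
demo_lex_new (void)
{
  /* calloc zeroes the struct, so the cache starts out empty.  */
  return calloc (1, sizeof (demo_lex_ctxt_t));
}

void
demo_lex_free (demo_lex_ctxt_t *lex)
{
  free (lex);
}

/* Hypothetical stand-in for the real locale inspection.  */
static bool
demo_probe_locale (demo_lex_ctxt_t *lex)
{
  lex->probe_count++;
  return true;
}

/* Per-instance replacement for an initialise-once static variable:
   the cached answer lives in the context, not in the function.  */
bool
demo_lex_using_simple_locale (demo_lex_ctxt_t *lex)
{
  assert (lex != NULL);
  if (!lex->simple_locale_known)
    {
      lex->simple_locale = demo_probe_locale (lex);
      lex->simple_locale_known = true;
    }
  return lex->simple_locale;
}

/* Demo-only accessor, so callers need not see the struct layout.  */
int
demo_lex_probe_count (const demo_lex_ctxt_t *lex)
{
  return lex->probe_count;
}
```

With this layout, each context created by demo_lex_new () carries its own
cache, so querying one lexer leaves every other lexer untouched.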

Charclass has one hidden static pointer (a list of pools of classes), but
is explicitly set up to allow multiple lexers/parsers etc to exist in
parallel.  This list is not protected by mutexes, and so is not
thread-safe, but otherwise is capable of handling multiple clients in
parallel (including reusing charclasses across clients).

Fsalex goes even further, and tries to set itself up to be correct even
if the locale is changed later:  It explicitly mines (part of) the
current locale for information when fsalex_syntax () is called, and uses
that thereafter.  Clients dealing with token streams from fsalex, such
as fsaparse, need to depend on the locale snapshot of the lexer in order
to make correct decisions:  An explicit API that I omitted to mention
earlier, "proto-lexparse.h", lays out the protocol for information
exchange between any lexer and any parser in a module-agnostic fashion.
The interface is incomplete, but currently includes the following
opcodes and a generic function to exchange information:

typedef enum proto_lexparse_opcode_enum
{
  PROTO_LEXPARSE_OP_GET_LOCALE,
  PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_ENV,
  PROTO_LEXPARSE_OP_GET_REPMN_MIN,
  PROTO_LEXPARSE_OP_GET_REPMN_MAX,
  PROTO_LEXPARSE_OP_GET_WIDE_CHAR,
  PROTO_LEXPARSE_OP_GET_DOTCLASS,
} proto_lexparse_opcode_t;

typedef int proto_lexparse_exchange_fn_t (void *lexer_context,
                                          proto_lexparse_opcode_t opcode,
                                          void *parameter);

My long-term hope is that the need for this "exchange" function
will dwindle once the token stream delivered by the lexer is reworked
to deliver information more directly than it does at present, but this
is a non-trivial job, and can wait until later.
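
To make the exchange protocol a little more concrete, here is a hedged sketch
of how a lexer might implement the function and how a parser might call it.
The enum and function typedef are copied from above; everything named demo_*
is hypothetical.  In particular, the sketch assumes that integer-valued
queries are answered through the return value, that -1 signals an opcode the
lexer does not handle, and that "parameter" is unused for these opcodes; the
real proto-lexparse.h may define all of these differently.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the interface shown above.  */
typedef enum proto_lexparse_opcode_enum
{
  PROTO_LEXPARSE_OP_GET_LOCALE,
  PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_ENV,
  PROTO_LEXPARSE_OP_GET_REPMN_MIN,
  PROTO_LEXPARSE_OP_GET_REPMN_MAX,
  PROTO_LEXPARSE_OP_GET_WIDE_CHAR,
  PROTO_LEXPARSE_OP_GET_DOTCLASS,
} proto_lexparse_opcode_t;

typedef int proto_lexparse_exchange_fn_t (void *lexer_context,
                                          proto_lexparse_opcode_t opcode,
                                          void *parameter);

/* Hypothetical lexer state, snapshotted when the syntax is set.  */
struct demo_lexer
{
  int is_multibyte;
  int repmn_min;
  int repmn_max;
};

/* Declaring through the typedef lets the compiler check that the
   definition below matches the protocol's signature.  */
proto_lexparse_exchange_fn_t demo_lexer_exchange;

int
demo_lexer_exchange (void *lexer_context,
                     proto_lexparse_opcode_t opcode,
                     void *parameter)
{
  struct demo_lexer *lex = lexer_context;

  (void) parameter;             /* Assumed unused for these opcodes.  */
  assert (lex != NULL);
  switch (opcode)
    {
    case PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_ENV:
      return lex->is_multibyte;
    case PROTO_LEXPARSE_OP_GET_REPMN_MIN:
      return lex->repmn_min;
    case PROTO_LEXPARSE_OP_GET_REPMN_MAX:
      return lex->repmn_max;
    default:
      return -1;                /* Assumed "not handled" result.  */
    }
}

/* Hypothetical parser side: it knows only the function pointer and
   an untyped context, never the lexer's struct layout.  */
struct demo_parser
{
  proto_lexparse_exchange_fn_t *lex_exchange;
  void *lex_context;
};

int
demo_parser_is_multibyte (const struct demo_parser *parser)
{
  return parser->lex_exchange (parser->lex_context,
                               PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_ENV,
                               NULL);
}
```

The point of the indirection is that the parser consults the lexer's snapshot
rather than the live locale, so both sides agree even if the locale is
changed after the lexer was set up.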

At present, the untangle code focuses on the tokens/class/lex/parse/musts
code, and does not try to break out the remaining high-level dfa
code.  Having a "dfa->" context reference for the remaining code
should be easy, as this code is already set up for multiple instances in
the original.

--------

I will rebase the untangle script sometime in the next week; I'm
waiting for things to settle down a bit before undertaking this work,
as rebasing the script, while automated to a fair extent, is still a
considerable effort.  As the gap between my reworking of the code and
the current dfa.c code grows, the effort needed to analyse and possibly
reinterpret modified code in a different form to fit into the
"untangle"d layout grows.

Finally, of course, I've been lying low, trying to let people sort
out the next release without having the untangle code as an immediate
distraction.  I've broken my silence here, as the comments regarding
both performance and the desire to eliminate static variables align
very closely with the effort I've already expended writing "untangle".

cheers,

behoffski (Brenton Hoff)
Programmer, Grouse Software





