From: sur-behoffski
Subject: bug#23269: Multi-threaded operation, mbrtowc, and "untangle" script [was Re: bug#23269...]
Date: Thu, 21 Apr 2016 09:58:37 +0930
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.0

On 04/21/16 02:10, Paul Eggert wrote:
> On 04/20/2016 01:49 AM, address@hidden wrote:
>> This will likely mean interface additions in dfa.h and some minor
>> code changes in dfa.c.
>
> One thing that bugged me about dfa.c (when I was looking at this yesterday) is
> that it maintains some state in static variables, which means it can't be used
> in multiple threads using different locales. That's not an issue with grep or
> gawk now, but might be for other apps and might conceivably be a problem even
> in grep, which has a multithreaded patch pending and might conceivably want to
> use per-file encodings. So perhaps, while we're thinking about exposing the
> uni-to-multibyte cache anyway, we might want to look into fixing these other
> interface issues as well.
>
> PS. I'm dropping address@hidden from the CC: list, as that email address hasn't
> worked for many years....



G'day,

(Sobs quietly to self:)  One of the explicit design goals that I had
behind writing the "untangle" Lua script was to reduce or eliminate
static variables:  If I recall correctly (it's been 18 months since I
looked at this), I split earlier parts of dfa.c into:
     * charclass;
     * lexer; and
     * parser;

with the remaining dfa.c code (especially the search algorithm)
untouched as being in the "too hard" (for a first pass) basket.

Each of these had an explicit instance/context pointer, e.g. "class",
"lexer" or "parser", as appropriate, eliminating any static variables.
I believe the only exception to this, for a long time, was the handover
of {m,n} counts by static variables -- I ended up inventing a clumsy
"fence" interface so that the parser could explicitly fetch these
values from the opaque lexer context.
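
A minimal sketch of the instance/context-pointer style described above
may help.  Every name here (lexer_ctx, lexer_set_interval,
lexer_get_interval and so on) is hypothetical and chosen for
illustration; none of it is taken from dfa.c or from the output of the
untangle script.  It only shows how {m,n} counts could live in a
per-instance context and be fetched through an accessor instead of
being handed over in static variables:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the dfa.c or untangle API. */

/* In a real split, only this typedef would appear in the header and the
   struct body would stay private to the lexer's .c file. */
typedef struct lexer_ctx lexer_ctx;

struct lexer_ctx {
    const char *pattern;  /* pattern text owned by the caller */
    int minrep;           /* m from the most recent {m,n} */
    int maxrep;           /* n from the most recent {m,n} */
};

/* Create an independent lexer instance; returns NULL on allocation failure. */
lexer_ctx *lexer_new(const char *pattern)
{
    lexer_ctx *lx = calloc(1, sizeof *lx);
    if (lx != NULL)
        lx->pattern = pattern;
    return lx;
}

/* Called by the lexer when it has scanned an interval expression. */
void lexer_set_interval(lexer_ctx *lx, int m, int n)
{
    assert(lx != NULL);
    lx->minrep = m;
    lx->maxrep = n;
}

/* Accessor through which the parser fetches the counts from the
   otherwise opaque lexer context. */
void lexer_get_interval(const lexer_ctx *lx, int *m, int *n)
{
    assert(lx != NULL && m != NULL && n != NULL);
    *m = lx->minrep;
    *n = lx->maxrep;
}

void lexer_free(lexer_ctx *lx)
{
    free(lx);
}
```

Because each instance carries its own counts, two lexers created with
lexer_new() can be driven independently without one overwriting the
other's {m,n} values.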

I kept updating the script after releases, but stopped when asked to,
as people felt that the regular releases of the script were lowering
the signal/noise ratio of the list.  Since that time, a few minor,
obvious changes that I wrote in the untangle script have appeared in
patches by others.  A number of static variables have also become
per-instance variables during this time, where the code was being
touched for other reasons and the instance change was easy to include.

(At the same time, there has been considerable activity in dfa.c
itself, so updating "untangle" would be a significant undertaking.)

While I was writing the script, I was thinking about having different
instances running in parallel, and I recall looking at mbrtowc in this
light.  There is a potential problem if multiple locales are desired:
Some locale-specific processing is done when the modules are first
initialised (e.g. setting up some tables), and mbrtowc itself is not
thread-safe, as it assumes a "current" locale.

So, I'm not sure if a thread-safe (i.e. locale-safe) version of mbrtowc
exists; if not, this needs to be addressed before a split-locale,
multi-threaded version is feasible.  (LC_CTYPE race conditions?)
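
One possible direction, offered only as a sketch:  POSIX.1-2008
provides newlocale() and uselocale(), which set a locale for the
calling thread only, and mbrtowc accepts a caller-supplied mbstate_t so
that its internal static conversion state is not used.  The helper name
decode_first_in_locale below is hypothetical, and the code assumes a
platform where these functions are declared in <locale.h> (as in
glibc).  Whether this is enough for dfa.c's table setup is not
something the sketch tries to answer:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode the first multibyte character of s under
   the named locale without changing the process-wide locale.  Returns
   the result of mbrtowc, or (size_t)-1 if the locale cannot be created. */
size_t decode_first_in_locale(const char *locale_name, const char *s,
                              wchar_t *out)
{
    assert(locale_name != NULL && s != NULL && out != NULL);

    /* Build a locale object that carries only the LC_CTYPE category. */
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return (size_t)-1;

    /* Install it for the calling thread only; other threads and the
       global locale are left alone. */
    locale_t old = uselocale(loc);

    /* Caller-owned conversion state, so mbrtowc does not fall back on
       its internal static mbstate_t. */
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t r = mbrtowc(out, s, strlen(s), &st);

    /* Restore the previous thread locale before freeing ours. */
    uselocale(old);
    freelocale(loc);
    return r;
}
```

Each thread would call the helper with its own locale name; some
systems (macOS, for example) declare these functions in <xlocale.h>
instead.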

cheers,

sur-behoffski (Brenton Hoff)
Programmer, Grouse Software





