[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re-working yyitems
From: |
Valentin Tolmer |
Subject: |
Re-working yyitems |
Date: |
Tue, 7 May 2019 11:23:44 +0900 |
Hi,
I'm still in the progress of refactoring glr.cc to be more c++-friendly and
eventually introduce variants. Right now, I'm trying to eliminate the last
use of YYMALLOC: yyitems.
yyitems is an (optionally) expandable array of yyGLRStackItem, which
contains all the items in all the glr stacks. It's currently handled as an
arena-style array where allocating a new item is just bumping up the
yynextFree pointer. When it needs to grow, reallocation is a bit tricky
since we have to update all the cross-referencing fields of the elements.
Same for compression (IIUC, it's compressed when we get back to a single
stack after removing the others). The main advantage is that
allocating/deleting items on the stack is near-instantaneous, but it makes
the code complex.
If we want to avoid re-writing a vector-like container, the grow operation
should be a simple copy. However, the elements have internal pointers to
each other. The possibilities are thus:
- Keep the current implementation: fast, but ugly and complex
- Switch to indexes instead of pointers for internal references, and switch
the container to vector<item>: extra care must be taken when deleting
elements to update the indexes, but otherwise growing, pushing and popping
should be painless
- Switch to a vector<unique_ptr<item>>: This way the pointers are stable.
We're counting on malloc to handle the arena for us, and
growing/pushing/popping/deleting is painless. However, we lose access to
the index of an element (we can't work it out from a pointer), which is
only used in debug. If the index itself is not important and can be
replaced by a UID, it would be easy enough to add one to the item. This is
the approach I propose.
Detailed summary of the operations:
Currently:
yyGLRStackItem* yyitems; -> raw array of items.
item.yystate/yyoption -> state or option (union field)
item.yystate.yypred/item.yyoption.yystate -> raw pointer to another state.
- when growing:
- if state:
- reloc yystate.yypred
- reloc yystate.yysemantics.yyfirstVal
- if option:
- reloc yyoption.yystate
- reloc yyoption.yynext
- reloc yysplitPoint
- reloc all the yystacks[].yystate
- when compressing:
- update the yypred
- null the split point and last deleted
- after split point:
- update yystate
- update yystacks[0].yystate
- when creating an element:
- essentially no-op
- bump nextFree
- callers should call reserve_glrstack to make sure there is headroom.
- when accessing an element:
- faster (?), element is inlined in the vector.
- when dumping:
- easy access to the element's index (&item - yyitems)
Proposed:
std::vector<std::unique_ptr<yyGLRStackItem>> yyitems; -> array of
unique_ptr of
items. Overall, much simpler code, removing several fields, simplifying the
growing/compressing functions, guaranteed memory safety, ...
- when growing:
- small vector, cheaper to move
- simple push_back
- when compressing:
- std::remove_if or equivalent
- when creating an element:
- cost of allocation
- when accessing an element:
- cost of dereference (but anyway the elements were not often accessed
sequentially (?))
- when dumping:
- no access to the element's index
- Is it needed?
- Do they need to be dense? Or can we assign a UID to each element?
Need to
store the UID, though.
--
Valentin Tolmer
- Re-working yyitems,
Valentin Tolmer <=