From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v4 04/10] block/pflash_cfi02: Implement intereleaved flash devices
Date: Tue, 25 Jun 2019 10:32:52 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Stephen Checkoway <address@hidden> writes:

>> On Jun 24, 2019, at 12:05, Philippe Mathieu-Daudé <address@hidden> wrote:
>> 
>>> On 6/22/19 2:25 PM, Philippe Mathieu-Daudé wrote:
>>> Hi Stephen,
>>> 
>>> This series hasn't fallen through the cracks; however, it is taking me
>>> longer than expected to review it.
>>> 
>>>> On 4/26/19 6:26 PM, Stephen Checkoway wrote:
>>>> It's common for multiple narrow flash chips to be hooked up in parallel
>>>> to support wider buses. For example, four 8-bit wide flash chips (x8)
>>>> may be combined in parallel to produce a 32-bit wide device. Similarly,
>>>> two 16-bit wide chips (x16) may be combined.
>>>> 
>>>> This commit introduces `device-width` and `max-device-width` properties,
>>>> similar to pflash_cfi01, with the following meanings:
>>>> - `width`: The width of the logical QEMU device (same as before);
>>>> - `device-width`: The width of an individual flash chip, defaulting to
>>>>  `width`; and
>>>> - `max-device-width`: The maximum width of an individual flash chip,
>>>>  defaulting to `device-width`.
>>>> 
>>>> Nothing needs to change to support reading such interleaved devices but
>>>> commands (e.g., erase and programming) must be sent to all devices at
>>>> the same time or else the various chips will be in different states.
>>> 
>>> After some thought on this, I'd rather we model how hardware manages
>>> interleaved devices: do it at the bus level, and instantiate N devices
>>> in an interleaved config.
>>> I believe that would drastically reduce this device's complexity, and we
>>> would match the real internal state machine.
>>> Also, this could be reused by other parallel devices used in such a config.
>>> 
>>>> For example, a 4-byte wide logical device can be composed of four x8/x16
>>>> devices in x8 mode. That is, each device supports both x8 and x16, and
>>>> they're being used in byte, rather than word, mode. This
>>>> configuration would have `width=4`, `device-width=1`, and
>>>> `max-device-width=2`.
>>> 
>>> 
>>> I'm thinking of this draft:
>>> 
>>> FlashDevice # x8
>>>  MemoryRegionOps
>>>    .valid.max_access_size = 1
>>> 
>>> FlashDevice # x16
>>>  MemoryRegionOps
>>>    .valid.min_access_size = 2
>>>    .valid.max_access_size = 2
>>> 
>>> FlashDevice # x8/x16
>>>  MemoryRegionOps
>>>    .valid.min_access_size = 1
>>>    .valid.max_access_size = 2
>>> 
>>> We might use .impl.min_access_size = 2 and consider that all NOR flash
>>> uses 16-bit words internally.
>>>    .impl.max_access_size = 2 is implicit.
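
Spelled out as a MemoryRegionOps initializer, the x8/x16 case would look
roughly like this (just a sketch; the pflash_x8x16_read/write handlers
are placeholders, not existing functions):

static const MemoryRegionOps pflash_x8x16_ops = {
    /* Placeholder handlers, named for illustration only */
    .read = pflash_x8x16_read,
    .write = pflash_x8x16_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
    .valid = {
        .min_access_size = 1,   /* allow x8 (byte) accesses */
        .max_access_size = 2,   /* allow x16 (word) accesses */
    },
    .impl = {
        .min_access_size = 2,   /* model the 16-bit internal word */
        .max_access_size = 2,
    },
};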
>>> 
>>> So for your example we'd instantiate one:
>>> 
>>> InterleaverDevice
>>>  Property
>>>    .bus_width = 4 # 4-byte wide logical device, `width=4`
>>>    .device_width = 1 # `device-width=1`
>>>  MemoryRegionOps
>>>    .valid.max_access_size = .bus_width # 4, set at realize()
>>>    .impl.max_access_size = .device_width # 1, set at realize()
>>> 
>>> Then instantiate 4 pflash devices, and link them to the interleaver
>>> using object_property_set_link().
>>> 
>>> typedef struct {
>>>    SysBusDevice parent_obj;
>>>    MemoryRegion iomem;
>>>    char *name;
>>>    /*
>>>     * On a 64-bit wide bus we can have at most
>>>     * 8 devices in 8-bit access mode.
>>>     */
>>>    MemoryRegion device[8];
>>>    unsigned device_count;
>>>    unsigned device_index_mask;
>>>    /* Properties */
>>>    unsigned bus_width;
>>>    unsigned device_width;
>>> } InterleaverDeviceState;
>>> 
>>> static Property interleaver_properties[] = {
>>>    DEFINE_PROP_LINK("device[0]", InterleaverDeviceState,
>>>                     device[0],
>>>                     TYPE_MEMORY_REGION, MemoryRegion *),
>>>    ...
>>>    DEFINE_PROP_LINK("device[7]", InterleaverDeviceState,
>>>                     device[7],
>>>                     TYPE_MEMORY_REGION, MemoryRegion *),
>>>    DEFINE_PROP_END_OF_LIST(),
>>> };
>>> 
>>> Then, before calling InterleaverDevice.realize():
>>> 
>>> In the board realize():
>>> 
>>> 
>>>    for (i = 0; i < interleaved_devices; i++) {
>>>        pflash[i] = create_pflash(...);
>>>        ...
>>>    }
>>> 
>>>    ild = ... create InterleaverDevice ...
>>>    for (i = 0; i < interleaved_devices; i++) {
>>>        char *propname = g_strdup_printf("device[%u]", i);
>>> 
>>>        object_property_set_link(OBJECT(ild),
>>>                                 OBJECT(pflash[i]),
>>>                                 propname, &err);
>>>        g_free(propname);
>>>        ...
>>>    }
>>> 
>>> Finally,
>>> 
>>> static void interleaved_realize(DeviceState *dev, Error **errp)

I guess you mean interleaver_realize().

>>> {
>>>    InterleaverDeviceState *s = INTERLEAVER_DEVICE(dev);
>>> 
>>>    s->device_count = s->bus_width / s->device_width;
>>>    s->device_index_mask = s->device_count - 1;
>>>    ...
>>> }
>>> 
>>> static void interleaved_write(void *opaque, hwaddr offset,
>>>                              uint64_t value, unsigned size)

Likewise.

>>> {
>>>    InterleaverDeviceState *s = INTERLEAVER_DEVICE(opaque);
>>>    MemoryRegion *mr;
>>> 
>>>    /*
>>>     * Since we set .impl.max_access_size = device_width,
>>>     * access_with_adjusted_size() always calls this with
>>>     * size = device_width.
>>>     */
>>>    /* Select the Nth interleaved device... */
>>>    mr = s->device[(offset / size) & s->device_index_mask];
>>>    /* ...and adjust the address (offset) it sees. */
>>>    offset /= s->device_count;
>>>    memory_region_dispatch_write(mr, offset, value, size,
>>>                                 MEMTXATTRS_UNSPECIFIED);
>>> }
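
For completeness, the read path would presumably be the mirror image of
this (again just a sketch, using the interleaver_* naming, with the same
device selection and offset adjustment as the write path above):

static uint64_t interleaver_read(void *opaque, hwaddr offset, unsigned size)
{
    InterleaverDeviceState *s = INTERLEAVER_DEVICE(opaque);
    MemoryRegion *mr;
    uint64_t value = 0;

    /* Select the interleaved device and adjust the offset it sees */
    mr = s->device[(offset / size) & s->device_index_mask];
    offset /= s->device_count;
    memory_region_dispatch_read(mr, offset, &value, size,
                                MEMTXATTRS_UNSPECIFIED);
    return value;
}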

What makes this idea interesting is the separation of concerns: we
capture the "interleave memory" aspect in its own device, which we can
then use with any kind of memory device (i.e. any device that provides
the interface the interleaver expects).  The memory devices remain
oblivious of the interleave aspect.

If we needed interleave for just one memory device model, baking it into
that device model would likely be simpler.  I think that's how we ended
up baking it into the cfi.pflash* devices.

>>> 
>>> I'll try a PoC.
>> 
>> So I have a PoC, but then realized I cannot use the same flash dump...
>> 
>> I need to deinterleave it. This is easily fixed with a few lines of
>> Python, then if I want to store/share the dump (aka 'backend storage') I
>> have to re-interleave it.
>> 
>> I wonder if it would be possible/easy to add an 'interleave' option to
>> blockdev to be able to have 2 pflash devices sharing the same backend...
>> Is it worthwhile? Kevin/Markus/Max, any thoughts on this?

I'm not sure I understand completely, so let me restate the problem and
your solution idea.

"Flash memory is narrow, and needs to be interleaved to a more
convenient width" is an implementation detail.  For the most part, you
want to hide this detail, and view the combination of interleaver logic
+ narrow memory as a unit.  In particular, when connecting to a block
backend for persistence, you want to connect this whole unit, without
having to know anything about its internal interleaving.

You obviously have to connect the block backend to the interleaver.
But what do you connect to the memory devices then?

One idea is to have an interleaver block filter node.  Each memory
device gets connected to the block backend via a suitably configured
interleaver block filter node, which provides access to its own stripes.
Together, they cover the whole block backend.
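
To make "its own stripes" concrete (purely illustrative; no such filter
exists today): for n_devices memory devices of stripe_width bytes each
sharing one interleaved backend image, stripe i of such a filter could
map an offset in its own view to the backend like this:

static int64_t stripe_to_backend_offset(int64_t stripe_offset,
                                        unsigned stripe_index,
                                        unsigned n_devices,
                                        unsigned stripe_width)
{
    /* Which stripe_width-sized chunk of this stripe, and where in it */
    int64_t chunk = stripe_offset / stripe_width;
    int64_t within = stripe_offset % stripe_width;

    /* Chunks from the n devices alternate in the interleaved image */
    return (chunk * n_devices + stripe_index) * stripe_width + within;
}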

Is this reasonably close to what you mean?

Here's another possible idea: factor persistence out of the memory
devices as well.

Our persistent memory devices are funny beasts: they pretend to be block
devices just to gain convenient means for implementing persistence.

Their access pattern is quite different from real block devices: they
read the complete image at initialization time, then only ever write.

Unless the device's write unit happens to be a multiple of the block
backend's block size, there's write amplification: we write the whole
blocks that contain the written range.  For instance, persisting a
4-byte program operation means rewriting the full 512-byte block around
it.  Due to the way the block layer works, this can even result in a
read-modify-write cycle (I think).

Now consider the following composite device:

           sysbus
             |
    +--------|--------+
    |        |        |
    |    persister ------ block backend
    |        |        |
    |   interleaver   |
    |    /  ...  \    |
    | mem   ...   mem |
    +-----------------+

If we ignore the internal composition, we have a device similar to the
cfi.pflash*: it's a TYPE_SYS_BUS_DEVICE with a BlockBackend property.

Internally, the persister takes care of (1) initializing the contents,
and (2) persisting writes to the block backend.  The interleaver takes
care of routing reads and writes to the memory devices, adjusting width
as necessary.
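
A minimal sketch of the persister's write side, assuming it sits in
front of the interleaver's MemoryRegion and has a BlockBackend property
(PersisterState, PERSISTER(), and the field names are made up; error
handling and endianness are glossed over):

static void persister_write(void *opaque, hwaddr offset,
                            uint64_t value, unsigned size)
{
    PersisterState *s = PERSISTER(opaque);

    /* Route the guest access down to the interleaver / mem devices */
    memory_region_dispatch_write(s->interleaver_mr, offset, value, size,
                                 MEMTXATTRS_UNSPECIFIED);

    /* Persist the written bytes to the block backend */
    if (s->blk) {
        blk_pwrite(s->blk, offset, &value, size, 0);
    }
}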

Glossed over here: the guest interface.  I figure the interleaver and
the mem devices cooperate to support wide access.  I haven't thought
through the details.

Of course, device decomposition is not the only way to separate
concerns!  Perhaps factoring out persistence and interleaving into a
library for use by the device models that need it would be simpler.
Can't say without trying it.

> Hi Phil,
>
> Sorry for the delay, I’ve been traveling.
>
> I considered something like this approach and I think it could work. 
> Ultimately, I opted not to go that route for a few reasons:
> - duplicated CFI tables and other state waste (a small amount of) memory when 
> the flash chips are the same (the usual case in my limited experience)

Is the state de-duplication 100% transparent to the guest?

> - it adds an extra layer of read/write calls plus recombining from/splitting 
> into the component parts

I suspect the layer exists in a monolithic device model as well.  Going
to a composite device model then turns direct calls into indirect ones.
This is obviously not free.

I figure the persister could stay out of the read path.

> - duplicated timers firing to walk the programming/erasing state machine 
> forward for each chip
> - the firmware or data stored in the chips is likely already interleaved, 
> necessitating either splitting it up before use or adding functionality to a 
> lower layer to split it (as you’ve suggested here)
>
> None of the above seem like a big deal separately or together but I didn’t 
> find the advantages of this approach to be sufficiently compelling to justify 
> it. Namely, it allows using a heterogeneous set of flash chips to implement a 
> logical device.

As long as persistence and interleaving apply to flash memory only, we
can probably afford some emulation inefficiency.  Software engineering
concerns are likely more important.  Regardless, I agree with you that
the effort to separate things needs to be justified.

> Nevertheless, if that’s the route you think is best, I have no objections.

[...]


