poke-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] Add io space MMAP for memory mapping devices and files


From: Andreas Klinger
Subject: Re: [PATCH v2] Add io space MMAP for memory mapping devices and files
Date: Mon, 8 Jan 2024 19:33:14 +0100

Hi Jose,

"Jose E. Marchesi" <jemarch@gnu.org> schrieb am Mo, 08. Jan 18:06:
> 
> > Hi Jose,
> >
> > "Jose E. Marchesi" <jemarch@gnu.org> schrieb am Mo, 08. Jan 15:08:
> >> 
> >> > Hi Jose,
> >> >
> >> > "Jose E. Marchesi" <jemarch@gnu.org> schrieb am Mo, 08. Jan 00:03:
> >> >> 
> >> >> > Hi again,
> >> >> >
> >> >> > Andreas Klinger <ak@it-klinger.de> schrieb am Sa, 06. Jan 20:42:
> >> >> >> Hi Jose,
> >> >> >> 
> >> >> >> "Jose E. Marchesi" <jemarch@gnu.org> schrieb am Sa, 06. Jan 20:14:
> >> >> >> > > I think that the approach of patch v2 could be optimized by a 
> >> >> >> > > .set alignment
> >> >> >> > > variable for those guys who are aware of such hardware 
> >> >> >> > > constraints on their
> >> >> >> > > architecture and want to optimize this. But I would set the 
> >> >> >> > > default to a value
> >> >> >> > > which works for all, which is the size of the address bus.
> >> >> >> > 
> >> >> >> > I think it is perfectly ok to have IOD specific configuration
> >> >> >> > parameters, sure.
> >> >> >> > 
> >> >> >> > Let's make it a Poke variable that the user can customize, plus a
> >> >> >> > setting in the registry so the .set dot-command works on it.  I 
> >> >> >> > suppose
> >> >> >> > it shall have the name mmap in it, like pk_mmap_alignment.  WDYT?
> >> >> >> 
> >> >> >> great. It makes sense to name it mmap as it only applies to mmap io 
> >> >> >> spaces.
> >> >> > [...]
> >> >> >
> >> >> > After thinking about the problem again I came to a different solution 
> >> >> > which is
> >> >> > much simpler, doesn't need an additional .set variable and avoids the 
> >> >> > signal
> >> >> > handler as well as it is copying with the address bus size if 
> >> >> > possible, thus
> >> >> > avoiding performance bottlenecks. The read function is as follows:
> >> >> >
> >> >> > static int
> >> >> > ios_dev_mmap_pread (void *iod, void *buf, size_t count, ios_dev_off 
> >> >> > offset)
> >> >> > {
> >> >> >   struct ios_dev_mmap *dev_map = iod;
> >> >> >   int align = sizeof(void*);
> >> >> >   char *m = buf;
> >> >> >
> >> >> >   if (offset > dev_map->size || count > dev_map->size - offset)
> >> >> >     return IOD_EOF;
> >> >> >
> >> >> >   /* copy unaligned bytes */
> >> >> >   while (count && offset % align)
> >> >> >     {
> >> >> >       memcpy (m, dev_map->addr + offset, 1);
> >> >> >       count--;
> >> >> >       offset++;
> >> >> >       m++;
> >> >> >     }
> >> >> >
> >> >> >   /* copy with the address bus size */
> >> >> >   while (count >= align)
> >> >> >     {
> >> >> >       memcpy (m, dev_map->addr + offset, align);
> >> >> 
> >> >> This is still copying a byte at a time, isn't it?
> >> >
> >> > There is no general answer to this question. It depends on what the C 
> >> > library
> >> > and the compiler are making out of it. Thus it's also dependent on the
> >> > architecture and the value of align.
> >> 
> >> Actually, memcpy is guaranteed to work regardless of any alignment
> >> requirement restrictions imposed by the architecture targetted by the
> >> compiler, so the general answer is that you can always assume that
> >> memcpy is equivalent to copying bytes by bytes.
> >> 
> >> The compiler will only optimize a memcpy (foo, bar, 8) into a long word
> >> move if it knows that foo and bar are properly aligned according to the
> >> rules of the target.
> >
> > You're right.
> >
> > But I'm still wondering why I get the SIGBUS (on zynqmp) when doing a 
> > memcpy of
> > 2 bytes from an odd address which is mmaped, like:
> >
> > memcpy(some-variable, odd-address, 2);
> 
> I have no idea.  I guess that's up to the driver providing the mmapped
> contents?  Is the driver documenting the rules on how to access the
> contents?

I wrote the driver and I'm doing a usual memory mapping. 

I wrote a function which demonstrates the difference if memcpy is called with a
variable a (for alignment) or with a fixed value:

#include <string.h>

int mycopy(void)
{
        int a = 4;
        unsigned int x;
        unsigned char y[4];

        memcpy(&x, &y[0], 4);

        memcpy(&x, &y[0], a);

        return 0;
}

In the objdump we can see that when called with the fixed value of 4 bytes to be
copied there are just 2 instructions and the function memcpy isn't called at
all. So the compiler resolves the memcpy without the function call.

I think this could really be optimized in the code snipped I provided earlier to
avoid byte by byte copying.

Here is the objdump:

mycopy.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <mycopy>:
#include <string.h>

int mycopy(void)
{
   0:   e92d4800        push    {fp, lr}
   4:   e28db004        add     fp, sp, #4
   8:   e24dd010        sub     sp, sp, #16
   c:   e59f2070        ldr     r2, [pc, #112]  ; 84 <mycopy+0x84>
  10:   e08f2002        add     r2, pc, r2
  14:   e59f306c        ldr     r3, [pc, #108]  ; 88 <mycopy+0x88>
  18:   e7923003        ldr     r3, [r2, r3]
  1c:   e5933000        ldr     r3, [r3]
  20:   e50b3008        str     r3, [fp, #-8]
  24:   e3a03000        mov     r3, #0
        int a = 4;
  28:   e3a03004        mov     r3, #4
  2c:   e50b3010        str     r3, [fp, #-16]
  30:   e51b300c        ldr     r3, [fp, #-12]
        unsigned int x;
        unsigned char y[4];

        memcpy(&x, &y[0], 4);
  34:   e50b3014        str     r3, [fp, #-20]  ; 0xffffffec

        memcpy(&x, &y[0], a);
  38:   e51b2010        ldr     r2, [fp, #-16]
  3c:   e24b100c        sub     r1, fp, #12
  40:   e24b3014        sub     r3, fp, #20
  44:   e1a00003        mov     r0, r3
  48:   ebfffffe        bl      0 <memcpy>

        return 0;
  4c:   e3a03000        mov     r3, #0
  [...]

> >> > But yes I need to confess that when looking into the glibc for example 
> >> > this code
> >> > snippet could be optimized to allow larger amounts of data to be passed 
> >> > to the
> >> > memcpy function so that it's up to the C library and compiler to 
> >> > optimize it the
> >> > best.
> >> >
> >> >
> >> >> 
> >> >> >       count -= align;
> >> >> >       offset += align;
> >> >> >       m += align;
> >> >> >     }
> >> >> >
> >> >> >   /* copy remaining unaligned bytes */
> >> >> >   while (count)
> >> >> >     {
> >> >> >       memcpy (m, dev_map->addr + offset, 1);
> >> >> >       count--;
> >> >> >       offset++;
> >> >> >       m++;
> >> >> >     }
> >> >> >
> >> >> >   return IOD_OK;
> >> >> > }
> >> >> >
> >> >> > The write function is quite similar implemented in the same manner.
> >> >> >
> >> >> > WDYT?
> >> >> >
> >> >> > Andreas
> >> >
> >> > Best regards,
> >> >
> >> > Andreas

-- 
Andreas Klinger
Grabenreith 27
84508 Burgkirchen
+49 8623 919966
ak@it-klinger.de
www.it-klinger.de
www.grabenreith.de

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]