Re: [Qemu-devel] Evaluating Disk IO and Snapshots


From: Juergen Pfennig
Subject: Re: [Qemu-devel] Evaluating Disk IO and Snapshots
Date: Fri, 20 Jan 2006 23:53:34 +0100
User-agent: KMail/1.7.2

Hi Andre
you suggested ...

  While you are at it, have you considered using the LZO libraries
  instead of zlib for compression/decompression speed? Sure, it won't
  compress as much as zlib, but speed improvements should be noticeable.

... sorry. This is a misunderstanding. 

(1) I will not modify qcow and friends. Beware!
(2) The thing works only for the -snapshot file.
(3) The snapshot file uses no compression.
(4) Non-Linux/BSD hosts would fall back to qcow.
(5) Yes, a Windows implementation would be possible.

Here are more details:

The storage for temp data will not rely on sparse files. It will use
two memory mapped temp files, one for index info and one for real
data. I have implemented a simple version of it and am testing it
currently. Speed improvements (IO time) are significant (about 20%).
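
To make the two-file layout concrete, here is a minimal sketch of how it
could look on a POSIX host. It is not the code I am testing. SnapStore,
snap_open, snap_write, snap_read, MAX_SECTORS and the "slot + 1" index
encoding are all made-up names and choices for illustration. For brevity
both files are sized up front with ftruncate, which a real implementation
might well avoid.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512
#define MAX_SECTORS 1024          /* toy disk size, an assumption */

/* Hypothetical snapshot store: one mapping for the index, one for data. */
typedef struct SnapStore {
    uint32_t *index;      /* index[sector] = slot + 1, 0 = not in snapshot */
    uint8_t  *data;       /* data slots, SECTOR_SIZE bytes each */
    uint32_t  next_slot;  /* number of slots handed out so far */
} SnapStore;

/* Create an unlinked temp file of the given size and map it read/write. */
static void *map_temp(size_t size)
{
    char path[] = "/tmp/snapXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                      /* file vanishes when unmapped */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping keeps the file alive */
    return p == MAP_FAILED ? NULL : p;
}

static int snap_open(SnapStore *s)
{
    s->index = map_temp(MAX_SECTORS * sizeof(uint32_t)); /* zero-filled */
    s->data  = map_temp((size_t)MAX_SECTORS * SECTOR_SIZE);
    s->next_slot = 0;
    return (s->index && s->data) ? 0 : -1;
}

/* Store one sector; a data slot is allocated on the first write. */
static void snap_write(SnapStore *s, uint32_t sector, const void *buf)
{
    if (s->index[sector] == 0)
        s->index[sector] = ++s->next_slot;
    memcpy(s->data + (size_t)(s->index[sector] - 1) * SECTOR_SIZE,
           buf, SECTOR_SIZE);
}

/* Return a pointer into the data mapping, or NULL if the sector was
   never written and must be read from the base image instead. */
static const void *snap_read(const SnapStore *s, uint32_t sector)
{
    uint32_t slot = s->index[sector];
    return slot ? s->data + (size_t)(slot - 1) * SECTOR_SIZE : NULL;
}
```

A caller would try snap_read first and fall through to the base image
when it returns NULL. Sectors written in sequence end up in adjacent
slots of the data file in this sketch.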

The zero-memory copy thing ...

There will be a new function for use by ne2000.c, ide.c and friends:

    ptr = bdrv_memory( ...disk..., sector, [read|write|cancel|commit])

In many situations the function can return a pointer into a
mem-mapped region (the Windows swap file would be a good example).
This helps to avoid copying data around in user-space or between
user-space and kernel. The cancel/commit can be implemented via
aliasing. The code also helps to combine disk sectors back to pages
without extra cost (Windows usually writes 4k blocks or larger).
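
The sector/page relation is plain arithmetic. A small sketch, assuming
512-byte sectors and 4k pages; the helper names are invented here and are
not part of any proposed interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512u
#define PAGE_SIZE_4K     4096u
#define SECTORS_PER_PAGE (PAGE_SIZE_4K / SECTOR_SIZE)   /* 8 */

/* Index of the 4k page that contains the given sector. */
static uint64_t sector_to_page(uint64_t sector)
{
    return sector / SECTORS_PER_PAGE;
}

/* Byte offset of the sector inside its page. */
static size_t sector_offset_in_page(uint64_t sector)
{
    return (size_t)(sector % SECTORS_PER_PAGE) * SECTOR_SIZE;
}

/* True if the run [sector, sector + count) covers only whole, aligned
   pages. Such a write replaces every byte of each page it touches, so
   the old page contents are not needed. */
static int covers_whole_pages(uint64_t sector, uint64_t count)
{
    return count > 0
        && sector % SECTORS_PER_PAGE == 0
        && count  % SECTORS_PER_PAGE == 0;
}
```

For example, sector 17 lives in page 2 at byte offset 512, and an
8-sector write starting at sector 8 covers exactly one page.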

THE PROBLEM: avoiding read before write. I will have a look at the
kernel sources.

Whereas I expect only a 1% win from the zero-copy stuff, my tests for
another little thing promise a 4% improvement (measured in CPU
cycles), or 12.5 ns per IO byte. This is how it works:

OLD CODE (vl.c):
  void *ioport_opaque[MAX_IOPORTS];
  IOPortWriteFunc *ioport_write_table[3][MAX_IOPORTS];
  IOPortReadFunc *ioport_read_table[3][MAX_IOPORTS];

  void cpu_outl(CPUState *env, int addr, int val)
  {   ioport_write_table[2][addr](ioport_opaque[addr], addr, val);
  }

OLD CODE (ide.c and even worse in ne2000.c):
  void writeFunction(void *opaque, unsigned int addr, unsigned int data)
  { IDEState *s = ((IDEState *)opaque)->curr;
     char *p;
     p = s->data_ptr;
     *(unsigned int *)p = data;
     p += 4;
     s->data_ptr = p;
     if (p >= s->data_end) s->end_function();
  }

As you can see, repeated port IO produces a lot of overhead: 115 ns per
32-bit word (P4 2.4 GHz CPU).

NEW CODE (vl.c):
  typedef struct PIOInfo {
    /* ... more fields ... */
    IOPortWriteFunc* write;
    void*            opaque;
    char*            data_ptr;
    char*            data_end;
  } PIOInfo;

  PIOInfo*    pio_info_table[MAX_IOPORTS];

  void cpu_outx(CPUState *env, int addr, int val)
  {
    PIOInfo *i = pio_info_table[addr];
    if(i->data_ptr >= i->data_end) // simple call
       i->write(i->opaque, addr, val);
    else {                         // copy call
        *(int *)(i->data_ptr) = val;
        i->data_ptr += 4;
    }
  }

The new code moves the data copying (from ide.c and ne2000.c) into
vl.c. This saves 60 ns per 32-bit word. Some memory is saved and
cache locality is increased. Async IO implementation gets easier.
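
To show how a driver might use the cpu_outx fast path above, here is a
self-contained toy version. It is only a sketch. ToyDev, toy_write,
toy_init and toy_arm are invented for the example, the CPUState argument
is dropped, and memcpy replaces the pointer cast to stay clear of
alignment issues. Note that in this sketch the device callback only runs
on the first access after the buffer window is used up; how end of
transfer would really be signalled is left open.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_IOPORTS 65536

typedef void IOPortWriteFunc(void *opaque, uint32_t addr, uint32_t data);

typedef struct PIOInfo {
    IOPortWriteFunc *write;     /* slow path: device callback */
    void            *opaque;
    char            *data_ptr;  /* next free byte in the open window */
    char            *data_end;  /* end of window; == data_ptr if closed */
} PIOInfo;

static PIOInfo *pio_info_table[MAX_IOPORTS];

static void cpu_outx(int addr, int32_t val)
{
    PIOInfo *i = pio_info_table[addr];
    if (i->data_ptr >= i->data_end) {   /* simple call */
        i->write(i->opaque, (uint32_t)addr, (uint32_t)val);
    } else {                            /* copy call */
        memcpy(i->data_ptr, &val, 4);
        i->data_ptr += 4;
    }
}

/* Hypothetical device with an 8-byte transfer buffer. */
typedef struct ToyDev {
    PIOInfo  pio;
    char     buf[8];
    int      slow_calls;   /* how often the callback ran */
    uint32_t last;         /* last value seen by the callback */
} ToyDev;

static void toy_write(void *opaque, uint32_t addr, uint32_t data)
{
    ToyDev *d = opaque;
    (void)addr;
    d->slow_calls++;
    d->last = data;
}

/* Register the device on a port with the buffer window closed. */
static void toy_init(ToyDev *d, int port)
{
    d->pio.write    = toy_write;
    d->pio.opaque   = d;
    d->pio.data_ptr = d->buf;
    d->pio.data_end = d->buf;
    d->slow_calls   = 0;
    d->last         = 0;
    pio_info_table[port] = &d->pio;
}

/* Open the window: the next two 32-bit writes bypass the callback. */
static void toy_arm(ToyDev *d)
{
    d->pio.data_ptr = d->buf;
    d->pio.data_end = d->buf + sizeof d->buf;
}
```

After toy_arm, two writes to the port land directly in buf and the third
one reaches toy_write again, so the driver sees one call per transfer
instead of one call per word.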

THE PROBLEMS:

(1) For a simple call there is a 7ns penalty compared to the
    current solution.
(2) Until now the ide.c and ne2000.c drivers have been modelled
    very closely on the hardware. The C code looks a bit like a
    circuit diagram (1:1 relation). My proposal adds some
    abstraction. The ide.c driver would give up the "drive
    cache" memory and the ne2000.c driver would first fetch
    the (raw) data and then process it.

Disappointed?

Yes, it's a bit ugly. A lot of code is needed for modest speed
enhancements. But on the other hand, many small things taken together
can add up to big progress (Paul's code generator, DMA, async IO...).

I have attached my timing test. Compile it with -O3 (-O4 makes no
sense unless you split the code into different files).

Yours Jürgen


Attachment: test.c
Description: Text Data

