Re: Suggestion for gdb-ui.el


From: Albert Veli
Subject: Re: Suggestion for gdb-ui.el
Date: Wed, 01 Jun 2005 10:02:37 +0200

Hi Nick!

>  > * Suggestion: Update lisp/progmodes/gdb-ui.el so the *registers* buffer
>  > somehow remembers the line number and scrolls back to the same line
>  > after updating the registers.
> 
> The code was meant to do this but didn't because of an issue with point
> and window point.  If you update from the repository this should work now.

Yes it works now. Great!

> 
>  > It would also be nice if the changed registers would be highlighted
>  > somehow (I noticed it's on the TODO list in gdb-ui.el).
> 
> This shouldn't be too hard to do with the GDB/MI commands.  I have other
> changes that also use GDB/MI but I am waiting till after the release before
> installing them in case such changes introduce bugs.  How useful is this
> feature? I've not done low-level debugging or watched the contents of the
> registers.  What debugging situations require this?  Are you debugging a
> disassembly view in GDB from C code, or assembler (either generated by the
> compiler or hand written) directly.

If you write hand-written inline assembly then it is very useful to see
the registers while debugging. While doing calculations in assembly you
often try to put as many variables as possible into registers and try
not to load/store more than necessary to memory (because memory is
slow).

It can be helpful to see which registers have changed if you, for
instance, write an inline asm function and want to see if it changes
any registers (step through it with next/nexti and watch the registers).
If you assume registers don't change but in reality they do, it
can introduce hard-to-find(TM) bugs :)

Without making a short story too long...I'm working on a program that
collects and analyses measurement data from an instrument. Right now I'm
investigating if it would be worth the effort to write SSE optimized
routines for the analysis algorithm.

It would work like this:

1. Check if the cpuid instruction is available. If not, then
   SSE is not available either.

2. Call cpuid and check if the SSE flag is set.

3. if (SSE is available)
      set function pointers to SSE optimized routines
   else
      set function pointers to C routines.
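
The three steps above can be sketched roughly like this. It assumes GCC
or Clang on x86 and uses their <cpuid.h> helper header, which is not part
of the original test program. The names mul_fn, have_sse and select_mul
are made up for illustration, and the plain C routine is just one possible
fallback. On non-x86 targets the sketch simply reports "no SSE".

```c
#include <assert.h>
#include <stddef.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* assumption: GCC/Clang helper header */
#endif

/* CPUID leaf 1 reports SSE support in bit 25 of EDX. */
#define SSE_FEATURE_BIT (1u << 25)

/* Hypothetical pointer type matching the routines in this mail. */
typedef void (*mul_fn)(float *mt, float *v, float *res);

/* Plain C fallback: res = m * v, with m passed in transposed (mt). */
void mul_transposed_4x4_4x1_c(float *mt, float *v, float *res)
{
   int i, j;

   for (i = 0; i < 4; i++) {
      float sum = 0.0f;
      for (j = 0; j < 4; j++)
         sum += mt[4 * j + i] * v[j];
      res[i] = sum;
   }
}

/* Steps 1 and 2: __get_cpuid returns 0 when cpuid (or leaf 1) is
 * unavailable; otherwise test the SSE flag in EDX. */
int have_sse(void)
{
#if defined(__i386__) || defined(__x86_64__)
   unsigned int eax, ebx, ecx, edx;

   if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      return 0;
   return (edx & SSE_FEATURE_BIT) != 0;
#else
   return 0;
#endif
}

/* Step 3: pick the SSE routine if one was supplied and the CPU has
 * SSE, otherwise fall back to the C routine. */
mul_fn select_mul(mul_fn sse_impl, mul_fn c_impl)
{
   assert(c_impl != NULL);
   return (sse_impl != NULL && have_sse()) ? sse_impl : c_impl;
}
```

At startup you would call something like
select_mul(mul_transposed_4x4_4x1_sse, mul_transposed_4x4_4x1_c) once,
keep the returned pointer, and call through it afterwards.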

An SSE optimized routine (GCC inline asm) could look something like this:


---8<---- ssetest.c ---------

#include <stdio.h>

/* Multiply 4x4 matrix (m) by a 4x1 vector (v)
 * Result is a 4x1 vector (res)
 *
 *     m00 m01 m02 m03     v0
 * m = m10 m11 m12 m13 v = v1
 *     m20 m21 m22 m23     v2
 *     m30 m31 m32 m33     v3
 *
 *       m00*v0 + m01*v1 + m02*v2 + m03*v3
 * res = m10*v0 + m11*v1 + m12*v2 + m13*v3
 *       m20*v0 + m21*v1 + m22*v2 + m23*v3
 *       m30*v0 + m31*v1 + m32*v2 + m33*v3
 *
 * It is more efficient to send in m transposed into the inline asm code
 * because of the nature of the SSE instructions.
 *
 *      m00 m10 m20 m30
 * mt = m01 m11 m21 m31
 *      m02 m12 m22 m32
 *      m03 m13 m23 m33
 */
void mul_transposed_4x4_4x1_sse(float *mt, float *v, float *res)
{
   asm(
      /* Load mt into xmm0-xmm3 */
      "movups     (%0), %%xmm0\n\t" /* xmm0 = mt[0],  mt[1],  mt[2],  mt[3]  */
      "movups 0x10(%0), %%xmm1\n\t" /* xmm1 = mt[4],  mt[5],  mt[6],  mt[7]  */
      "movups 0x20(%0), %%xmm2\n\t" /* xmm2 = mt[8],  mt[9],  mt[10], mt[11] */
      "movups 0x30(%0), %%xmm3\n\t" /* xmm3 = mt[12], mt[13], mt[14], mt[15] */

      /* Load v into xmm4 */
      "movups (%1), %%xmm4\n\t"     /* xmm4 = v[0], v[1], v[2], v[3] */

      /* Now multiply mt by v */

      /* xmm5 = v[3], v[3], v[3], v[3] */
      "movaps %%xmm4, %%xmm5\n\t"
      "shufps $0xff, %%xmm5, %%xmm5\n\t"
      /* xmm3 = mt[12]*v[3], mt[13]*v[3], mt[14]*v[3], mt[15]*v[3] */
      "mulps %%xmm5, %%xmm3\n\t"

      /* xmm5 = v[2], v[2], v[2], v[2] */
      "movaps %%xmm4, %%xmm5\n\t"
      "shufps $0xaa, %%xmm5, %%xmm5\n\t"
      /* xmm2 = mt[8]*v[2], mt[9]*v[2], mt[10]*v[2], mt[11]*v[2] */
      "mulps %%xmm5, %%xmm2\n\t"

      /* xmm5 = v[1], v[1], v[1], v[1] */
      "movaps %%xmm4, %%xmm5\n\t"
      "shufps $0x55, %%xmm5, %%xmm5\n\t"
      /* xmm1 = mt[4]*v[1], mt[5]*v[1], mt[6]*v[1], mt[7]*v[1] */
      "mulps %%xmm5, %%xmm1\n\t"

      /* xmm4 = v[0], v[0], v[0], v[0] */
      "shufps $0x00, %%xmm4, %%xmm4\n\t"
      /* xmm0 = mt[0]*v[0], mt[1]*v[0], mt[2]*v[0], mt[3]*v[0] */
      "mulps %%xmm4, %%xmm0\n\t"

      /* Add it up in xmm0 */

      /* xmm0 = mt[0]*v[0] + mt[4]*v[1], mt[1]*v[0] + mt[5]*v[1],
       *        mt[2]*v[0] + mt[6]*v[1], mt[3]*v[0] + mt[7]*v[1]
       */
      "addps %%xmm1, %%xmm0\n\t"

      /* xmm2 = mt[8]*v[2]  + mt[12]*v[3], mt[9]*v[2]  + mt[13]*v[3],
       *        mt[10]*v[2] + mt[14]*v[3], mt[11]*v[2] + mt[15]*v[3]
       */
      "addps %%xmm3, %%xmm2\n\t"

      /* xmm0 = mt[0]*v[0] + mt[4]*v[1] + mt[8]*v[2]  + mt[12]*v[3],
       *        mt[1]*v[0] + mt[5]*v[1] + mt[9]*v[2]  + mt[13]*v[3],
       *        mt[2]*v[0] + mt[6]*v[1] + mt[10]*v[2] + mt[14]*v[3],
       *        mt[3]*v[0] + mt[7]*v[1] + mt[11]*v[2] + mt[15]*v[3]
       */
      "addps %%xmm2, %%xmm0\n\t"

      /* Move result to res */
      "movups %%xmm0, (%2)\n\t"

      :                           /* No output */
      : "r"(mt), "r"(v), "r"(res) /* Input: mt=%0, v=%1, res=%2 */

        /* clobbered: xmm0-xmm5 and memory (in *res) */
      : "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "memory");
}


int main(void)
{
   float res[4];

   float mt[16] = {
      0.0f, 4.0f,  8.0f, 12.0f,
      1.0f, 5.0f,  9.0f, 13.0f,
      2.0f, 6.0f, 10.0f, 14.0f,
      3.0f, 7.0f, 11.0f, 15.0f
   };

   float v[4] = { 0.0f, 1.0f, 2.0f, 3.0f };

   /* Add code to check for SSE here */

   mul_transposed_4x4_4x1_sse(mt, v, res);

   /* Print out res vec. Should be
    *  0*0 +  1*1 +  2*2 +  3*3 = 14
    *  4*0 +  5*1 +  6*2 +  7*3 = 38
    *  8*0 +  9*1 + 10*2 + 11*3 = 62
    * 12*0 + 13*1 + 14*2 + 15*3 = 86
    */
   printf("%.1f, %.1f, %.1f, %.1f\n",
          res[0], res[1], res[2], res[3]);

   return 0;
}

--------- ssetest.c --->8----
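
The routine above expects the matrix already transposed. In case m is
stored in ordinary row-major order, a small helper like the following
could build mt from it. This is only a sketch; the name transpose_4x4 is
made up for illustration and is not part of ssetest.c.

```c
#include <assert.h>

/* Build mt (transposed, 16 floats) from a row-major 4x4 matrix m.
 * m and mt must not be the same array. */
void transpose_4x4(const float *m, float *mt)
{
   int row, col;

   assert(m != mt);
   for (row = 0; row < 4; row++)
      for (col = 0; col < 4; col++)
         mt[4 * col + row] = m[4 * row + col];
}
```

With m = { 0, 1, 2, ..., 15 } this gives the same mt array that main()
in ssetest.c uses. If the matrix stays the same for many vectors you
would transpose it once and reuse mt.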


Note that the code above doesn't call cpuid. You must have SSE to
try it out in gdb-ui (processor >= Intel Pentium3 || processor >= AMD
Athlon XP).

On GNU/Linux you can check the output of
'cat /proc/cpuinfo | grep ^flags'
to see if sse is there.

For details about cpuid, check out the "Basic Architecture" manual
chapter 14, available here:

 http://developer.intel.com/design/pentium4/manuals/index_new.htm


Regards, Albert




