[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gnucap-devel] floating point optimization
From: |
al davis |
Subject: |
Re: [Gnucap-devel] floating point optimization |
Date: |
Fri, 8 Dec 2006 03:13:32 -0500 |
User-agent: |
KMail/1.9.5 |
On Thursday 07 December 2006 04:44, al davis wrote:
> Two circuit files, one with 147000 nodes, other with 590000
> nodes. The larger circuit swapped unaccepably on the small
> machine so I tested only the smaller circuit there. These
> were used to compare speed.
>
> AMD, large, AMD small, intel-small
> std 39 sec 9.5 sec 11.2 sec
> -ffast-math: 39 sec 9.5 sec 11.2 sec
> -ffloat-store: 50 sec 12 sec 13 sec
>
> The "small" circuit takes 30 minutes to run on ng-spice, on
> the AMD, with equivalent results. Note that the time is 9.5
> SECONDS on gnucap, 30 MINUTES in ng-spice. The algorithms
> are different.
Adding more data ...
The "large" circuit takes almost 8 hours on ng-spice.
This is very significant ..
The "large" circuit is 4 times as big as the "small" circuit,
repeating the "small" circuit 4 times.
With gnucap, it takes 4 times as long to run, or "linear time".
Run time is linearly proportional to the size of the circuit.
A simple loop runs in linear time.
With ng-spice (and spice 3f5 from which it was derived) it takes
about 16 times as long to run. 16 is 4 squared,
indicating "quadratic time". Run time is proportional to the
square of the circuit size. A classic quadratic time code
block would be a pair of nested loops.
I think the bottleneck in Spice is matrix ordering. I think it
uses the Markowitz algorithm, which is typically quadratic
time. It takes a long time to set up, then runs reasonably
fast after it is set up. For small circuits, ordering time is
insignificant. It is masked my matrix solution time for linear
circuits, or model evaluation time for nonlinear circuits. A
quadratic algorithm will eventually dominate if the size gets
big enough.
Gnucap uses a modified block-depth-first-search ordering. It
only orders within blocks, taming the quadratic effect. So, it
turns out at worst to be the sum of (size_of_block)^2. The
biggest block in this test is 37 nodes. When a subcircuit is
duplicated, it is only ordered once. This test circuit is all
repetition, so the size the ordering algorithm sees is 37
nodes, once. That's even better than linear.
On the other hand ... Gnucap takes about twice as much memory
as ng-spice. The matrix is stored twice, components carry
extra data to support mixed-mode, nodes carry extra data.
There is really more space overhead than 2x, but it benefits
some from sharing in a hierarchy. Taming this memory usage is
high on the to-do list. I want to be able to say that it will
handle a million nodes. It is not there yet, unless you have
about 4 gig of real memory.
Classic LU decomposition for a dense matrix takes cubic time.
Sparse matrix techniques improve this, based on the assumption
that as the matrix grows the number of connections remains
roughly constant. If this is so, you can typically get linear
time. Expansion by minors takes factorial time, which explains
why nobody competent does it that way.