Re: SMP problems (was Re: GNU Classpath 0.12, ..., 1.0)
From: Robert Lougher
Subject: Re: SMP problems (was Re: GNU Classpath 0.12, ..., 1.0)
Date: Mon, 15 Nov 2004 02:59:12 +0000
Hi,
Further to my previous email, here's a patch to JamVM 1.2.0 to include
memory barriers on Intel. Mark, as you've seen the problem with
Eclipse 3, could you give it a test? If there's anybody else who's
seen problems I'd be grateful if you could also give it a go.
Thanks,
Rob.
P.S. I believe the compare_and_swap implementation on Intel was
already correct, as any locked instruction acts as a memory barrier.
However, there are a couple of other places where ordering is
important, and I've added memory barriers there, in particular around
bytecode rewriting in the interpreter. I suspect this is the most
likely cause of the problem, as Mark said that making static methods
synchronised slows the crashing down (i.e. only 1 thread can be in
the method at a time). The memory barrier itself is a locked no-op;
the sfence, lfence and mfence instructions exist on the P4 but are
not available on all processors.
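
For readers without the attachment, here is a minimal sketch of what a locked no-op barrier and its use around bytecode rewriting might look like. It is not the contents of the patch: the MBARRIER macro, the Instruction layout, the opcode values and both function names are assumptions made up for illustration. The x86 variants use GCC-style inline assembly to add zero to the word at the top of the stack under a lock prefix; other targets fall back to the GCC/Clang builtin __sync_synchronize().

```c
#include <assert.h>
#include <stdint.h>

/* Full memory barrier (hypothetical macro name).  On x86 a locked
   read-modify-write that adds 0 to the top of the stack changes no data
   but is ordered against all earlier and later loads and stores.  The
   "memory" clobber also stops the compiler moving accesses across it. */
#if defined(__x86_64__)
#define MBARRIER() __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc")
#elif defined(__i386__)
#define MBARRIER() __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc")
#else
#define MBARRIER() __sync_synchronize()   /* GCC/Clang builtin fallback */
#endif

/* Hypothetical opcodes: a slow form that resolves its operand on first
   execution, and a quick form that trusts the cached operand. */
enum { OP_SLOW = 1, OP_QUICK = 2 };

typedef struct {
    volatile uint8_t   opcode;
    volatile uintptr_t operand;
} Instruction;

/* Writer: store the resolved operand first, then publish the quick
   opcode.  The barrier keeps the two stores in that order. */
static void rewrite_instruction(Instruction *ins, uintptr_t resolved)
{
    assert(ins->opcode == OP_SLOW);
    ins->operand = resolved;
    MBARRIER();
    ins->opcode = OP_QUICK;
}

/* Reader: only after seeing the quick opcode is the operand read.
   Returns 1 and fills *operand_out if the instruction was rewritten,
   0 if it is still in its slow form. */
static int read_instruction(Instruction *ins, uintptr_t *operand_out)
{
    if (ins->opcode != OP_QUICK)
        return 0;
    MBARRIER();
    *operand_out = ins->operand;
    return 1;
}
```

The point of the ordering is that a second thread must never observe the quick opcode paired with a stale operand. A thread that calls read_instruction() before the rewrite simply gets 0 and takes the slow path.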
On Sat, 13 Nov 2004 12:07:37 -0500, Chris Pickett
<address@hidden> wrote:
> Robert Lougher wrote:
> > Hi all,
> >
> > On Sat, 13 Nov 2004 11:58:53 +0100, Mark Wielaard <address@hidden> wrote:
> >
> >>The Eclipse 3 (but not 2) startup problem seems to only happen on SMP
> >>machines (it disappears when I don't use an SMP kernel, this is on an Intel
> >>hyperthreading system) with jamvm [*]. It works fine with gcj/gij (it
> >>doesn't work anymore with kaffe though since they don't implement
> >>java.lang.ClassLoader.setSigners which we now call).
> >
> >
> > I'm not terribly surprised -- I've never tested JamVM on a real or
> > virtual SMP machine before. When writing the thin-locking
> > implementation I didn't include any SMP memory barriers, so it's
> > something I've been expecting to hear! I'll look at including them
> > for the next release. Mark, would you be willing to do the testing?
> >
> >
> >>Cheers,
> >>
> >>Mark
> >>
> >>[*] Hint for Robert. When inspecting with -verbose I can see that some
> >>classes are [loaded] multiple times. I can slow down crashing a bit by
> >>making various VMClass static methods synchronized, but that is not a
> >>full solution. I think this is a bug in the runtime that needs to guard
> >>against defining the same class from multiple threads and not completely
> >>fixable in our core libraries setup.
> >>
> >
> >
> > I don't think this is the cause. This can happen even on a
> > uni-processor machine. Two threads can see a class hasn't been loaded
> > and start to define it. However, the updating of the loaded class
> > hash table is locked. One thread will win the race and update the
> > table, the other will find it already there, and discard the one it's
> > just loaded. This keeps locking to a minimum, and should lead to
> > overall faster behaviour. The bug is in where the -verbose message
> > is printed -- it should only be printed by the thread that wins the race.
>
> For what it's worth, we've had SMP problems in SableVM for a while now
> also. They too seem related to thread startup and thread death. It
> never occurred to me that this might be a Classpath problem since until
> now I thought we were the only ones, but then again it could just be
> that both JamVM and SableVM have equally bad internal locking :(. I
> tried putting in memory barriers as prescribed by the JSR133 cookbook
> [1], but it didn't make any difference. In fact, I tried putting a
> StoreLoad barrier in between every single bytecode instruction, and it
> still didn't help. I haven't tested Eclipse, but will try to (or some
> other SableVM person with a working Eclipse installation could try).
>
> [1] http://gee.cs.oswego.edu/dl/jmm/cookbook.html
>
> SableVM also doesn't have any handling of Java volatiles, which do
> indeed exist in the Classpath threading code. However, one would think
> that with a barrier in between every single bytecode that this wouldn't
> matter and that something else must be wrong. We did manage to squash a
> couple of threading bugs when somebody tried to build on NetBSD (I
> think...), and got compile-time pthread initialization warnings.
>
> Again, my experience says that this isn't strictly limited to SMP
> machines, but that on uniprocessors (UPs) the time between context
> switches is so long that it's much harder to catch these heisenbugs.
>
> I think it would be interesting to hear from VM developers who _don't_
> have problems on SMP machines but had them in the past and somehow
> managed to eliminate them.
>
> Chris
>
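
The class-definition race described in the quoted message (several threads may build the same class, but only the hash table update is locked, so one wins and the others discard their copy) can be sketched roughly as follows. This is an illustration under stated assumptions, not JamVM's code: the Class struct, the fixed-size linear table standing in for a hash table, and the names define_class, find_locked and table_lock are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64   /* hypothetical fixed-size table, for the sketch only */

typedef struct Class {
    char name[32];
} Class;

static Class *table[TABLE_SIZE];
static int table_count;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Linear lookup; the caller must hold table_lock. */
static Class *find_locked(const char *name)
{
    for (int i = 0; i < table_count; i++)
        if (strcmp(table[i]->name, name) == 0)
            return table[i];
    return NULL;
}

/* Define a class by name.  *won is set to 1 only for the caller whose
   copy was installed in the table, which is the natural place to print
   a -verbose "[loaded]" message.  Returns NULL on allocation failure or
   if the table is full. */
static Class *define_class(const char *name, int *won)
{
    assert(name != NULL && won != NULL);

    /* The expensive part runs without the lock, so several threads may
       build the same class concurrently. */
    Class *fresh = calloc(1, sizeof *fresh);
    if (fresh == NULL)
        return NULL;
    strncpy(fresh->name, name, sizeof fresh->name - 1);

    /* Only the table update is serialised. */
    pthread_mutex_lock(&table_lock);
    Class *result = find_locked(name);
    *won = 0;
    if (result == NULL && table_count < TABLE_SIZE) {
        table[table_count++] = fresh;
        result = fresh;
        *won = 1;
    }
    pthread_mutex_unlock(&table_lock);

    if (!*won)
        free(fresh);    /* lost the race (or table full): discard our copy */
    return result;
}
```

Calling define_class() twice with the same name returns the same pointer both times, with *won set only on the first call. The trade-off is some duplicated work when two threads collide, in exchange for holding the lock only for the table update.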
Attachment: mb-patch (binary data)