Thanks, Michael. You answered my question, and at just the right level for me.
Not feeling that adventurous, I think I'll wait for Snow Leopard rather than experiment with Linux!
By the way, Crafty has always been open source, and has been a great help to many would-be chess programmers. If you'd like to take a look at the code, you can get the latest version 23.0 at
Thanks again,
Louis
On Aug 7, 2009, at 12:55 AM, Michael Petch wrote:
Howdy Louis,
I think that MAX_NUMTHREADS was an artificial limit set by the hardware of the day. Christian can likely tell you why it is 16 specifically, but I am assuming that it was a somewhat arbitrary (and reasonable) value based on the number of cores available on most systems.
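To illustrate what a cap like this does in practice, here is a minimal sketch in Python. It is not GnuBG's actual C code: only the name MAX_NUMTHREADS and the value 16 come from the discussion above. The function name `effective_thread_count` and the extra clamp to the CPU count are assumptions made for the example.

```python
import os

# Cap taken from the discussion above; in GnuBG it is a compile-time constant.
MAX_NUMTHREADS = 16


def effective_thread_count(requested, cpu_count=None):
    """Clamp a requested thread count (hypothetical helper, not GnuBG code).

    The result is at least 1 and never exceeds MAX_NUMTHREADS or the
    number of CPUs. Clamping to the CPU count is an assumption here.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, MAX_NUMTHREADS, cpu_count))


print(effective_thread_count(8, cpu_count=16))   # 8: under both limits
print(effective_thread_count(64, cpu_count=32))  # 16: held back by the cap
```

With a cap like this, a machine with more than 16 cores would still only ever get 16 worker threads, no matter what the user asks for.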
On to your OS/X issue. I did a bit of research, and my original suggestion of waiting for Snow Leopard may actually be all that is required.
Nehalem processors diverge from the previous generation of Intel processors because they are no longer based on SMP (Symmetric MultiProcessor) designs. In an SMP system, generally all processors have access to main memory (RAM) via a single data bus. The problem, of course, is that the more cores you have, the more contention there is for the memory reads/writes that have to occur on that one bus.
Intel decided that SMP designs likely will not scale properly in the future when dealing with large core counts (32, 64, 128 cores, etc.), so they moved their Nehalem design to NUMA-type systems instead of SMP. NUMA stands for Non-Uniform Memory Access. In this type of design, cores may not necessarily be able to share memory with other processors without some help. I'm not going to get into the gory details, but the bus system Intel is pushing is the QPI (QuickPath Interconnect) bus. This literally replaces the good old FSB (Front Side Bus).
NUMA architectures do allow for the concept of "Remote" and "Local" data. Shared data may not be directly available to a processor; it can still be retrieved remotely, but that access will be slower. Operating system kernels need NUMA support in order for shared data access across different buses to work properly.
So you're asking, why tell me all this? Well, the answer is simple. Apple, in their infinite wisdom, started using the new QPI/NUMA hardware without actually fully implementing NUMA in their current kernel! This hasn't been well documented by Apple, but it was discovered when companies started running Xserve on the new QPI/Nehalem systems.
Without proper NUMA support, processors can't arbitrarily share memory with all other processors, which seems to be the case here with GnuBG. GnuBG launches in a single process and then asks OS/X to create threads (with shared memory requirements). It appears that, by default on OS/X Leopard, each processor is treated as a separate entity without sharing. The exception is that each core appears as 2 virtual cores. Virtual cores are on the same processor, and thus on the same bus, so memory can be shared across them.
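For anyone unfamiliar with the "single process, many threads, shared memory" model, here is a small Python sketch of the idea. It is illustrative only and is not GnuBG's code (GnuBG is written in C). The names `run_shared_workers` and `worker` are made up for the example.

```python
import threading


def run_shared_workers(num_threads, items_per_thread):
    """Start several threads in one process that all write to one list."""
    results = []             # a single object, visible to every thread
    lock = threading.Lock()  # serialises access to the shared list

    def worker(thread_id):
        for i in range(items_per_thread):
            with lock:
                results.append((thread_id, i))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()             # wait for every worker to finish
    return results


shared = run_shared_workers(num_threads=4, items_per_thread=100)
print(len(shared))  # 400: every thread wrote into the same list
```

The point is that all four workers touch the same `results` object. On real hardware, the OS has to make that memory reachable from whichever processor each thread runs on, and that is where NUMA support comes in.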
It seems that when GnuBG launches, all the threads are created on one processor (originally chosen by OS/X) and are accessible by 2 virtual cores (using Hyper-Threading). It seems Apple did this so they could put out new equipment before the next OS (Snow Leopard) was released.
So what does Snow Leopard have that Leopard doesn't? NUMA support.
My guess is that if you got your hands on Snow Leopard, you may find that what you are seeing changes. Apparently this very problem exists for people using CS4 (Adobe's Creative Suite 4).
Linux supports NUMA; if you are feeling adventurous, you might try installing Linux on your Apple hardware and see what happens.
Your chess program may work because of the way it splits up tasks (it may even use a combination of POSIX threads and separate process spaces). I haven't seen the source code, so it's very hard to say.
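To show what I mean by combining threads with separate process spaces, here is a rough Python sketch. Since I haven't seen Crafty's source, this is not how Crafty works; it is just one way such a split could look. The names `CHILD_CODE`, `sum_in_child_process` and `split_sum` are invented for the example.

```python
import subprocess
import sys
import threading

# Code run by each child process: sum the integers in [lo, hi) and print it.
CHILD_CODE = (
    "import sys; lo, hi = int(sys.argv[1]), int(sys.argv[2]); "
    "print(sum(range(lo, hi)))"
)


def sum_in_child_process(lo, hi):
    """Compute a partial sum in a separate process with its own memory.

    Nothing is shared with the parent; only the printed result comes back.
    """
    done = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(lo), str(hi)],
        capture_output=True, text=True, check=True,
    )
    return int(done.stdout)


def split_sum(n, num_workers):
    """Sum range(n) using one child process per chunk, each driven by a thread."""
    step = -(-n // num_workers)  # ceiling division
    partials = [0] * num_workers

    def drive(index):
        lo = min(index * step, n)
        hi = min(lo + step, n)
        partials[index] = sum_in_child_process(lo, hi)

    threads = [threading.Thread(target=drive, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


print(split_sum(1000, 4))  # 499500
```

Because each child has its own address space, the work does not depend on the OS sharing memory between processors. A program built this way could plausibly spread across all cores even where a purely threaded one does not.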
Michael Petch