Here are my results; the user CPU column was giving me strange numbers, so I'm just going to list total time spent (wall clock time). I have three different loops, so here are the base timings (running the foreach loop in sequential mode, without doMC loaded) for each:
    4.33   7.72   16.14

My CPU has 4 cores, but the OS sees it as 8 virtual cores. Here are my results for 1, 2, 4 and 8 threads:
    Threads   Loop 1        Loop 2        Loop 3
    1         4.36          7.90          16.20
    2         2.50 [2.18]   4.30 [3.95]    8.80 [8.10]   (8-15% slower than linear)
    4         1.46 [1.09]   2.50 [1.98]    5.06 [4.05]   (25-35% slower)
    8         1.32 [0.55]   2.30 [0.99]    4.30 [2.02]   (110-140% slower)
All times are in seconds, and these loops represent most of the time spent in my script, so while the results are a long way from linear, they still represent a useful speed-up. The numbers in square brackets show the times I would have seen with perfectly linear scaling.
By the way, my foreach loops had 200 to 250 iterations each. The results above tell me that the more work each foreach iteration does, the better the efficiency. This is fairly coarse-grained parallelization, which suggests to me that there is lots of room for improvement in the doMC code.
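For reference, here is a minimal sketch of the kind of benchmark I ran. The loop body and iteration count are placeholders, not my actual code; assume doMC and foreach are installed:

```r
library(foreach)
library(doMC)

registerDoMC(cores = 4)  # I varied this: 1, 2, 4, 8

# Placeholder for one of my loops: each iteration does a
# fixed chunk of CPU-bound work.
system.time({
  res <- foreach(i = 1:200, .combine = c) %dopar% {
    sum(sqrt(1:50000))  # stand-in workload
  }
})
```

Running the same `foreach` with `%do%` instead of `%dopar%` (or without doMC loaded) gives the sequential baseline.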
UPDATE: Watching top with cores set to 4, I see 4 new instances of R appear each time the script hits the foreach loop, and they disappear afterwards. I.e. it appears to be using processes, not threads, and creating them on demand! No wonder my results show so much overhead!
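If per-loop process creation is the bottleneck, one alternative worth trying is the doParallel backend with an explicitly created cluster, which starts the worker processes once and reuses them across foreach calls. I have not re-run my benchmarks this way; this is just a sketch, with the same placeholder workload as above:

```r
library(doParallel)

# Start the worker processes once, up front, instead of
# spawning them at every foreach call.
cl <- makeCluster(4)
registerDoParallel(cl)

res <- foreach(i = 1:200, .combine = c) %dopar% {
  sum(sqrt(1:50000))  # stand-in workload
}

stopCluster(cl)  # shut the workers down when finished
```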