Saturday, August 29, 2009

Blocks Break-Even Point

Something a little different today -- a few experiments using Snow Leopard's GCD.
Perhaps someone will come along and prove my methodology wrong -- I'm just trying to get a simple 'gut' feel for how well GCD performs and where the 'break even' point is.


I wrote a simple Cocoa app that does a trivial thing -- it starts a timer, then pushes a request to a background thread whose only job is to call the foreground thread to stop the timer.  Below is the code snippet of interest.

// Stops the timer and logs the elapsed time in seconds.
-(BOOL) finish {
    stopTime=[NSDate timeIntervalSinceReferenceDate];
    NSLog(@"time was %f",stopTime-startTime);
    return YES;
}


/********* using GCD *************/


-(IBAction) gcd:(id) sender {
    startTime=[NSDate timeIntervalSinceReferenceDate];
    dispatch_queue_t bgQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    // Hop to a background queue, then straight back to the main queue.
    dispatch_async(bgQueue, ^{
        dispatch_async(dispatch_get_main_queue(), ^{
            [self finish];
        });
    });
}


/******** Using Perform Selector ***********/


// Runs on the background thread; bounces straight back to the main thread.
-(void) _bg {
    [self performSelectorOnMainThread:@selector(finish) withObject:nil waitUntilDone:NO];
}


-(IBAction) bg:(id) sender {
    startTime=[NSDate timeIntervalSinceReferenceDate];
    [self performSelectorInBackground:@selector(_bg) withObject:nil];
}


/******** Direct Call *************/
-(IBAction) no:(id) sender {
    startTime=[NSDate timeIntervalSinceReferenceDate];
    [self finish];
}


/********** Using Threads ***********/
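// Runs on the first detached thread. Note that finish is then run on a
// second detached thread, not on the main thread.
-(void) _thread {
    NSAutoreleasePool *pool=[NSAutoreleasePool new];
    [NSThread detachNewThreadSelector:@selector(finish) toTarget:self withObject:nil];
    [pool release];
}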


-(IBAction) thread:(id) sender {
    startTime=[NSDate timeIntervalSinceReferenceDate];
    [NSThread detachNewThreadSelector:@selector(_thread) toTarget:self withObject:nil];
}



I actually wrote this 4 ways -- one using the 'blocks' feature, one using 'performSelector', one using NSThread and one that calls 'finish' directly.  All tests were run on an 8-core Mac Pro (16 virtual cores) at 2.66GHz (the current Mac Pro as of this writing).

I also took three measurements (the code above doesn't show all of them):

  1. The amount of time to push the work to the background and return
  2. The amount of time to execute a trivial instruction in the background
  3. The amount of time to push to the background and then instantly push a trivial instruction back on the main thread
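
For anyone who wants to try the first two measurements outside of Cocoa, here is a minimal sketch in plain C, using POSIX threads as a stand-in for NSThread. This is not the code behind the numbers in this post; now_us, handoff_t, background and measure_handoff are hypothetical names made up for the illustration. The third measurement is left out because the hop back to the main thread needs a run loop or main queue, which plain pthreads doesn't have.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Monotonic clock reading in microseconds. */
static double now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e6 + (double)ts.tv_nsec / 1e3;
}

/* Hypothetical record of when the background thread actually ran. */
typedef struct {
    double ran_us;
} handoff_t;

/* Background work: one trivial instruction (take a timestamp). */
static void *background(void *arg) {
    handoff_t *h = arg;
    h->ran_us = now_us();
    return NULL;
}

/* Measurement 1 -> *push_us: time for the hand-off call to return.
   Measurement 2 -> *run_us: time until the background thread runs.
   Returns 0 on success, -1 if the thread could not be created. */
int measure_handoff(double *push_us, double *run_us) {
    assert(push_us != NULL && run_us != NULL);
    handoff_t h = { 0.0 };
    pthread_t thread;

    double start = now_us();
    if (pthread_create(&thread, NULL, background, &h) != 0) {
        return -1;
    }
    double returned = now_us();

    /* Joining guarantees h.ran_us has been written before it is read. */
    pthread_join(thread, NULL);

    *push_us = returned - start;
    *run_us = h.ran_us - start;
    return 0;
}
```

A single call is noisy, so in practice you would call measure_handoff many times and look at the typical value rather than any one reading.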


The results?




                             Push to background   Execute in         Push to background, execute in background,
                             & return [us]        background [us]    push results to main thread [us]
GCD                          13                   20                 143
performSelector              57                   97                 194
Threads (NSThread)           58                   103                198
No threading/backgrounding   NA                   NA                 <1


Bottom line?  GCD offers a decent performance improvement over the conventional approaches. But even with its improved performance, its overhead is equal to 'hundreds' of method calls (on my test system I can call > 200 methods in 13us), so if your goal is to offload work you would need to already be facing 100us-length tasks (e.g. IO, networking, long computation) before it's clearly a win. Nothing really new in that conclusion, however, as that's always been true with multiple threads.
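
The method-call comparison is simple proportional arithmetic, and it can be sketched in a few lines of C. The helper name break_even_calls is made up for the illustration; the only inputs are the figures quoted in this post (more than 200 method calls in 13us, and the round-trip overheads from the results).

```c
#include <assert.h>

/* Number of method calls that fit into overhead_us microseconds, given
   that `calls` calls were observed to take window_us microseconds.
   Assumes call cost scales linearly, which is only a rough model. */
double break_even_calls(double overhead_us, double calls, double window_us) {
    assert(window_us > 0.0);
    return overhead_us * calls / window_us;
}
```

With 200 calls per 13us, break_even_calls(143, 200, 13) gives 2200 method calls for the GCD round trip, and the 194us and 198us round trips for performSelector and NSThread come out at roughly 3000. Because the measured rate was '> 200' calls in 13us, these are lower bounds.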

I suspect that Cocoa now uses GCD under the hood for a lot of its work, so it's not surprising that the various methods all come in at about the same time -- I'd be curious to see this run on Leopard (but not curious enough to re-boot into Leopard!)

Finally, the question could be asked:  how much work would it take before the work equaled the overhead of backgrounding?   I wrote a simple loop to just spin until the timer reached 150us.


-(IBAction) loop:(id) sender {
    int i = 0;  // iteration counter
    startTime=[NSDate timeIntervalSinceReferenceDate];
    // Spin until 150us have elapsed, counting the iterations.
    while ([NSDate timeIntervalSinceReferenceDate]-startTime < 150e-6) i++;
    NSLog(@"number of iterations was %d",i);
}


It turns out to be about 2400 iterations for 150us.


A final note -- I'm testing the case of pushing work off on a background thread to improve the user experience -- clearly there are examples of multithreaded applications where you want to maximize processor utilization or set up a complicated processing chain that this analysis doesn't apply to.

My 'walk away' is that you need thousands of method calls' worth of work before you reach the break-even point for using any threading or background tasks.

2 comments:

Anonymous said...

The main thread is somewhat of a special case (it requires the use of Mach signaling since it maintains compatibility with the traditional CF/NSRunLoop).

It would be interesting to do the same comparison between two arbitrary (non-main) threads.

John said...

Agreed.

That was also pointed out in Mike's write-up and inspired this post (he mentions the same thing in the comments).