XSLT Archive

不及格的程序员-八神


NSArray enumeration performance examined

One day, I was thinking about NSArray enumeration (also called iteration): since Mac OS X 10.6 and iOS 4, there's the wonderful new world of blocks, and with it came enumerateObjectsUsingBlock:. I assumed it must be slower than fast enumeration (for (object in array) { ... }) due to overhead, but I didn't know for sure and thus decided to actually measure the performance.

Which kinds of enumeration exist?

Basically, we have four kinds of enumeration available (see also Mike Ash's Friday Q&A 2010-04-09: Comparison of Objective-C Enumeration Techniques).

  1. objectAtIndex: enumeration

    Using a for loop which increases an integer and querying the object using [myArray objectAtIndex:index] is the most basic form of enumeration.

     

    NSUInteger count = [myArray count];
    for (NSUInteger index = 0; index < count ; index++) {
        [self doSomethingWith:[myArray objectAtIndex:index]];
    }

     

  2. NSEnumerator

    This is a form of external iteration: [myArray objectEnumerator] returns an enumerator object. This object has a method nextObject that we can call in a loop until it returns nil.

     

    NSEnumerator *enumerator = [myArray objectEnumerator];
    id object;
    while (object = [enumerator nextObject]) {
        [self doSomethingWith:object];
    }

     

  3. Fast enumeration (NSFastEnumeration)

    The idea behind fast enumeration is to use fast C array access to optimize iteration. Not only is it supposed to be faster than traditional NSEnumerator, but Objective-C 2.0 also provides a very concise syntax.

     

    id object;
    for (object in myArray) {
        [self doSomethingWith:object];
    }

     

  4. Block enumeration

    Available since the introduction of blocks, this lets you iterate over an array with a block. Its syntax isn't as nice as fast enumeration, but there is one very interesting feature: concurrent enumeration. If enumeration order is not important and the jobs can be done in parallel without locking, this can provide a considerable speedup on a multi-core system. More about that in the concurrent enumeration section.

    [myArray enumerateObjectsUsingBlock:^(id object, NSUInteger index, BOOL *stop) {
        [self doSomethingWith:object];
    }];
    // Concurrent variant: the block may run on several threads at once.
    [myArray enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id object, NSUInteger index, BOOL *stop) {
        [self doSomethingWith:object];
    }];

     

Linear enumeration

First, let's discuss linear enumeration: one item after the other.

Graphs

NSArray enumeration performance, Mac OS X, logarithmic scale
NSArray enumeration performance, Mac OS X, linear scale
NSArray enumeration performance, iOS, logarithmic scale
NSArray enumeration performance, iOS, linear scale

Conclusions

Somewhat surprisingly, NSEnumerator is even slower than using objectAtIndex:. This is true for both Mac OS X and iOS. I suspect this is due to the enumerator checking whether the array was modified with each iteration. Somewhat unsurprisingly, fast enumeration lives up to its name and is the fastest solution.

For small arrays, block enumeration is a bit slower than objectAtIndex:, but with more elements in the array the performance becomes almost as fast as fast enumeration.

The difference between fast enumeration and NSEnumerator is already quite big with a million entries: on the iPhone 4S, the former took about 0.037 seconds while the latter took about 0.140 seconds. That's already a factor of 3.7.

An oddity

The very first time an NSArray is allocated in a program, and the very first time an enumerator is requested with objectEnumerator, the call takes an unusually long time to complete. For example, allocating an array with one element had a median of 415 nanoseconds on my 2007 MacBook Pro 17". But the very first time an array was allocated, it took more than 500,000 nanoseconds, sometimes even up to 1,000,000 nanoseconds! The same goes for querying the enumerator: instead of the median of 673 nanoseconds, it also took more than 500,000 nanoseconds.

I can only guess at the reason, but I suspect lazy loading is to blame. You probably won't notice this in a real application because by the time your code is running, Cocoa or Cocoa Touch has probably already created an array.

Concurrent enumeration

Block enumeration offers the option to enumerate the objects concurrently, if possible. This means the work load is potentially spread over several CPU cores. Not every operation to be done while enumerating can be parallelized, so concurrent enumeration only makes sense if no locking is required within the block: either because each operation really is absolutely independent or because atomic operations can be used (like OSAtomicAdd32 and friends).
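As a minimal sketch of the "atomic operations" case, the following counts even numbers during a concurrent enumeration. The helper name countEvenNumbers is hypothetical, and it uses the C11 atomics from <stdatomic.h> rather than OSAtomicAdd32; that substitution is an assumption made for the example, not what the measurements in this article used.

```objc
#import <Foundation/Foundation.h>
#include <stdatomic.h>

// Hypothetical helper: counts the even values in an array of NSNumbers
// using concurrent block enumeration and an atomic counter instead of a lock.
static NSUInteger countEvenNumbers(NSArray *numbers) {
    atomic_uint evenCount;
    atomic_init(&evenCount, 0);
    // The block captures this pointer by value. That is safe here because
    // the enumeration call does not return until every block has finished.
    atomic_uint *counter = &evenCount;

    [numbers enumerateObjectsWithOptions:NSEnumerationConcurrent
                              usingBlock:^(NSNumber *number, NSUInteger index, BOOL *stop) {
        if ([number unsignedIntegerValue] % 2 == 0) {
            atomic_fetch_add(counter, 1); // atomic increment, no locking
        }
    }];

    return atomic_load(counter);
}
```

Each block invocation touches shared state only through the atomic increment, so no further synchronization is needed for this particular job.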

So, how well does it perform compared to the other enumeration methods?

The graphs

NSArray enumeration performance with concurrency, Mac OS X, logarithmic scale
NSArray enumeration performance with concurrency, Mac OS X, linear scale
NSArray enumeration performance with concurrency, iOS, logarithmic scale
NSArray enumeration performance with concurrency, iOS, linear scale

Conclusions

For a small number of elements, concurrent enumeration is by far the slowest method. This probably has to do with the additional work that needs to be done preparing the array for concurrent access and starting the threads (I don't know whether GCD or "traditional" threading is used and it doesn't matter; it's an implementation detail we shouldn't need to care about).

However, once the array is large enough, concurrent enumeration suddenly becomes the fastest method, just as expected. On the iPhone 4S with a million entries, concurrent enumeration took about 0.024 seconds, compared with 0.036 seconds for fast enumeration. By contrast, NSEnumerator took 0.139 seconds for the same array! That's already a pretty big difference, a factor of 5.7.

On my office 2011 iMac 24" with a Core i7 quad-core CPU, the million entries were enumerated concurrently in 0.0016 seconds. Fast enumeration of the same array took 0.0044 seconds and NSEnumerator 0.0093 seconds. That's a factor of 5.8, which is pretty close to the iPhone 4S results. I was expecting a bigger difference here, though. On my 2007 MacBook Pro with a Core 2 Duo dual-core CPU, the factor was "just" 3.7.

The threshold at which concurrent enumeration became useful was somewhere between 10,000 and 50,000 elements in my tests. With fewer elements, just go with normal block iteration.

Allocation

I also wanted to know whether the performance is any different depending on how the array was created. I tested two different methods:

  1. Create a C array which references the object instances and create the array using initWithObjects:count:.
  2. Create an NSMutableArray and subsequently add objects using addObject:.

There's no difference when iterating, but there is a difference when allocating: the initWithObjects:count: method is faster. With a very large number of objects, this difference can become significant. Here's an example of how to create an array of NSNumbers this way (it uses the equivalent convenience constructor arrayWithObjects:count:):

 

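// Builds an NSArray of NSNumbers from a temporary C buffer.
// Written for manual reference counting; under ARC the buffer
// would need an explicit ownership qualifier.
NSArray *generateArrayMalloc(NSUInteger numEntries) {
    id *entries;
    NSArray *result;

    // Temporary C array holding one object pointer per entry.
    entries = malloc(sizeof(id) * numEntries);
    for (NSUInteger i = 0; i < numEntries; i++) {
        entries[i] = [NSNumber numberWithUnsignedInteger:i];
    }

    // The array retains every object, so the buffer can be freed afterwards.
    result = [NSArray arrayWithObjects:entries count:numEntries];

    free(entries);
    return result;
}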
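For comparison, the second method (an NSMutableArray filled with addObject:) might look like the sketch below. The function name generateArrayMutable is hypothetical and chosen only to mirror the function above; the actual test application may build its mutable array differently.

```objc
#import <Foundation/Foundation.h>

// Hypothetical counterpart to generateArrayMalloc: builds the same
// contents by appending to an NSMutableArray one object at a time.
NSArray *generateArrayMutable(NSUInteger numEntries) {
    NSMutableArray *result = [NSMutableArray array];
    for (NSUInteger i = 0; i < numEntries; i++) {
        // Each call may have to grow the array's internal storage.
        [result addObject:[NSNumber numberWithUnsignedInteger:i]];
    }
    return result;
}
```

Both functions produce arrays with identical contents, so any timing difference comes from how the array is built, not from what it holds.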

 

NSArray allocation performance, Mac OS X, logarithmic scale
NSArray allocation performance, Mac OS X, linear scale
NSArray allocation performance, iOS, logarithmic scale
NSArray allocation performance, iOS, linear scale

How did I measure?

You can download the test application to see how I've measured. Basically I'm measuring how long it takes to iterate an array without actually doing anything else, and repeat that 1000 times. For the graph, the median for each array size was taken. Compilation was done with optimization turned off (-O0). For iOS, testing was done with an iPhone 4S. For Mac OS X, I used my 2007 MacBook Pro 17" and my office 2011 iMac 24". The graphs for Mac OS X show the results of the iMac; the graphs for the MacBook Pro looked similar, only slower.
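The measuring approach described above (time an otherwise empty iteration, repeat, take the median) can be sketched roughly as follows. This is not the code from the test application: the function names are hypothetical, and the use of clock_gettime with CLOCK_MONOTONIC as the timer is an assumption made for the example.

```objc
#import <Foundation/Foundation.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

// qsort callback: orders uint64_t values ascending.
static int compareSamples(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

// Sorts the samples in place and returns their median (0 if there are none).
static uint64_t medianOfSamples(uint64_t *samples, size_t count) {
    if (count == 0) return 0;
    qsort(samples, count, sizeof(uint64_t), compareSamples);
    if (count % 2 == 1) return samples[count / 2];
    return (samples[count / 2 - 1] + samples[count / 2]) / 2;
}

// Current monotonic time in nanoseconds.
static uint64_t nowNanos(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

// Hypothetical harness: times an empty fast enumeration over `array`
// `runs` times and returns the median duration in nanoseconds.
static uint64_t medianFastEnumerationNanos(NSArray *array, size_t runs) {
    if (runs == 0) return 0;
    uint64_t *samples = malloc(sizeof(uint64_t) * runs);
    for (size_t run = 0; run < runs; run++) {
        uint64_t start = nowNanos();
        for (id object in array) {
            (void)object; // iterate only, no work per element
        }
        samples[run] = nowNanos() - start;
    }
    uint64_t median = medianOfSamples(samples, runs);
    free(samples);
    return median;
}
```

Taking the median rather than the mean keeps a single slow outlier, such as the first-call oddity described earlier, from distorting the result.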


 

 

What's an effective way to compare all the values of an NSArray that contains NSNumbers created from floats, in order to find the biggest one and the smallest one?

Any ideas how to do this nicely and quickly in Objective-C?

There is no such thing as an NSArray that contains floats. It would have to be an NSArray that contains NSNumbers. – matt Apr 10 '13 at 18:43
    
   
@matt Thanks! I've corrected the question name – Sergey Grischyov Apr 10 '13 at 22:11 

5 Answers

Accepted answer (score 122)

If execution speed (not programming speed) is important, then an explicit loop is the fastest. I made the following tests with an array of 1000000 random numbers:

Version 1: sort the array:

NSArray *sorted1 = [numbers sortedArrayUsingSelector:@selector(compare:)];
// 1.585 seconds

Version 2: Key-value coding, using "doubleValue":

NSNumber *max=[numbers valueForKeyPath:@"@max.doubleValue"];
NSNumber *min=[numbers valueForKeyPath:@"@min.doubleValue"];
// 0.778 seconds

Version 3: Key-value coding, using "self":

NSNumber *max=[numbers valueForKeyPath:@"@max.self"];
NSNumber *min=[numbers valueForKeyPath:@"@min.self"];
// 0.390 seconds

Version 4: Explicit loop:

float xmax = -MAXFLOAT;
float xmin = MAXFLOAT;
for (NSNumber *num in numbers) {
    float x = num.floatValue;
    if (x < xmin) xmin = x;
    if (x > xmax) xmax = x;
}
// 0.019 seconds

Version 5: Block enumeration:

__block float xmax = -MAXFLOAT;
__block float xmin = MAXFLOAT;
[numbers enumerateObjectsUsingBlock:^(NSNumber *num, NSUInteger idx, BOOL *stop) {
    float x = num.floatValue;
    if (x < xmin) xmin = x;
    if (x > xmax) xmax = x;
}];
// 0.024 seconds

The test program creates an array of 1000000 random numbers and then applies all five techniques to the same array. The timings above are the output of one run, but I made about 20 runs with very similar results in each run. I also changed the order in which the 5 methods are applied to exclude caching effects.

Update: I have now created a (hopefully) better test program. The full source code is here: https://gist.github.com/anonymous/5356982. The average times for an array of 1000000 random numbers are (in seconds, on a 3.1 GHz Core i5 iMac, release build):

Sorting      1.404
KVO1         1.087
KVO2         0.367
Fast enum    0.017
Block enum   0.021

Update 2: As one can see, fast enumeration is faster than block enumeration (which is also stated here: http://blog.bignerdranch.com/2337-incremental-arrayification/).

EDIT: The following is completely wrong, because I forgot to initialize the object used as the lock, as Hot Licks correctly noticed, so no synchronization is done at all. With lock = [[NSObject alloc] init]; the concurrent enumeration is so slow that I dare not show the result. Perhaps a faster synchronization mechanism might help ...

This changes dramatically if you add the NSEnumerationConcurrent option to the block enumeration:

__block float xmax = -MAXFLOAT;
__block float xmin = MAXFLOAT;
id lock;
[numbers enumerateObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(NSNumber *num, NSUInteger idx, BOOL *stop) {
    float x = num.floatValue;
    @synchronized(lock) {
        if (x < xmin) xmin = x;
        if (x > xmax) xmax = x;
    }
}];

The timing here is

Concurrent enum  0.009

so it is about twice as fast as fast enumeration. The result is probably not representative because it depends on the number of threads available. But interesting anyway! Note that I have used the "easiest-to-use" synchronization method, which might not be the fastest.
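One way to avoid the lock entirely, offered here only as an untimed sketch and not as part of the measurements above, is to let each worker reduce its own slice of the array and combine the partial results afterwards. The helper concurrentMinMax and its chunkCount parameter are hypothetical, and the sketch uses GCD's dispatch_apply instead of NSEnumerationConcurrent so that each chunk can keep private min and max values.

```objc
#import <Foundation/Foundation.h>
#include <dispatch/dispatch.h>
#include <float.h>
#include <stdlib.h>

// Hypothetical helper: finds the smallest and largest float in an array of
// NSNumbers. Each chunk keeps private results, so no lock is needed.
// chunkCount must be at least 1.
static void concurrentMinMax(NSArray *numbers, size_t chunkCount,
                             float *outMin, float *outMax) {
    NSUInteger count = [numbers count];
    NSUInteger chunkSize = (count + chunkCount - 1) / chunkCount; // round up
    float *chunkMin = malloc(sizeof(float) * chunkCount);
    float *chunkMax = malloc(sizeof(float) * chunkCount);

    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply(chunkCount, queue, ^(size_t chunk) {
        float localMin = FLT_MAX;
        float localMax = -FLT_MAX;
        NSUInteger start = chunk * chunkSize;
        NSUInteger end = MIN(start + chunkSize, count);
        for (NSUInteger i = start; i < end; i++) {
            NSNumber *num = [numbers objectAtIndex:i];
            float x = num.floatValue;
            if (x < localMin) localMin = x;
            if (x > localMax) localMax = x;
        }
        // Each chunk writes only to its own slot.
        chunkMin[chunk] = localMin;
        chunkMax[chunk] = localMax;
    });

    // dispatch_apply has returned, so all chunks are done: combine serially.
    float xmin = FLT_MAX;
    float xmax = -FLT_MAX;
    for (size_t c = 0; c < chunkCount; c++) {
        if (chunkMin[c] < xmin) xmin = chunkMin[c];
        if (chunkMax[c] > xmax) xmax = chunkMax[c];
    }
    free(chunkMin);
    free(chunkMax);
    *outMin = xmin;
    *outMax = xmax;
}
```

Whether this beats plain fast enumeration depends on the array size and the number of cores, so it would need to be measured in the same way as the other versions.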

often block enumeration is faster than fast enumeration. did u try that? how often did u execute your examples? – vikingosegundo Apr 10 '13 at 16:50
    
   
@vikingosegundo: I have updated the answer. Block enumeration is slower than fast enumeration here. – Martin R Apr 10 '13 at 17:00 
    
   
How often did u execute the codes? one or few runs wont give you a good result. – vikingosegundo Apr 10 '13 at 17:01
    
   
So explicit loop is fastest!!! Quite unbelievable – Anoop Vaidya Apr 10 '13 at 17:05 
    
   
@vikingosegundo: I will improve the test program to determine the average execution time for each method, and then report the result later. – Martin R Apr 10 '13 at 17:06 
Posted on 2013-01-31 09:34 by 不及格的程序员-八神