CPU: Pitfalls and Techniques
Software Engineer Marcel Weiher discusses how Objective-C can achieve both best-of-breed performance and high levels of expressiveness and productivity in the iOS and macOS platforms.
Having had a look at the parameters driving performance and techniques for identifying slow code, let’s now turn to actual techniques for making code run fast. We will look at efficient object representations and ways for those objects to communicate and access data. We will also examine streamlining computation. In all this, the objective will typically be to effectively combine the “Objective” and the “C” parts of Objective-C to achieve the desired balance between performance and encapsulation.
In general, the basic idea is for objects to have C on the inside and messages on the outside, and for the objects themselves to be fairly coarse-grained, mostly static entities. When following these principles, it is possible to start with a fully object-oriented implementation without worries, but with the knowledge that it will be possible to later optimize away any inefficiencies. It has been my experience that it is quite possible to achieve the performance of plain C, and sometimes even beyond.
However, there are pitfalls that not only make an Objective-C program slow (slower than so-called scripting languages), but even worse can be major obstacles to later optimization efforts. These pitfalls usually lie in library constructs that are easy to use but have hidden performance costs, costs that are not localized within a single object where they could be eliminated, but present in interfaces and therefore spread throughout the system and much harder to expunge.
The following will show different options for data representation, communication, and computation, along with their respective trade-offs in terms of coupling, cohesion, and performance.
Representation
One of the primary tasks of a program, especially an object-oriented program, is to represent data. Due to the hybrid nature of the language, an Objective-C programmer has many options available for this task.
Without any claims of completeness, structured data can be represented using a C struct, Objective-C object, or various forms of key-value stores, most prominently Foundation’s NSDictionary and CoreFoundation’s CFDictionary, which are both getting more and more use. Simple scalars can be represented as C float, double, or int and their multitude of variations, Foundation NSInteger and CoreGraphics CGFloat typedefs, and finally Foundation NSNumber and CoreFoundation CFNumber objects. Note that the naming conventions are a bit confusing here: The names NSInteger and NSNumber strongly suggest that these two types are related—for example, with NSInteger being a specific subclass of NSNumber—but in fact they are completely unrelated. NSInteger is a typedef that resolves to a 32-bit int on 32-bit architectures and to a 64-bit long on 64-bit architectures, whereas int is 32 bits in both cases. Similar with CGFloat, which turns into a 32-bit float on 32-bit architectures and a 64-bit double on 64-bit architectures. Example 3.1 shows a few of the possible number representations.
Example 3.1 Numbers as primitives and objects
#import <Foundation/Foundation.h> int main() { int a=1; float b=2.0; NSNumber *c=[NSNumber numberWithInt:3]; CFNumberRef d=CFNumberCreate(kCFAllocatorDefault, kCFNumberFloatType, (const void*)&b ); NSNumber *e=@(5); NSLog(@"a=%d b=%g c=%@ d=%@ e=%@",a,b,c,d,e); return 0; }
In order to come to a good solution, the programmer must weigh trade-offs between decoupling and encapsulation on one hand and performance on the other hand, ideally getting as much decoupling and encapsulation without compromising performance, or conversely maximizing performance while minimizing coupling.
Primitive Types
Possibly the easiest call to make is in the representation of simple scalar types like characters/bytes, integers, and floating point numbers: use the built-in C primitive types whenever possible, and avoid object wrappers whenever possible.
With the language supporting them natively, scalars are convenient to use and perform anywhere from 10 to more than 100 times better than their corresponding Foundation object NSNumber or its CoreFoundation equivalent CFNumber. Table 3.1 gives the details: the first three columns are times for different arithmetic operations on scalar types. The differences in timings for 32- and 64-bit addition and multiplication are probably measuring artifacts, though they were stable when preparing these measurements and it is important to report actual results as measured, not what we think the results should be.
Table 3.1 Primitive operations in 32- and 64-bit architectures
Operation |
add |
multiply |
divide |
-intVal |
NS(int) |
CF(float) |
NS(float) |
64-bit (ns) |
0.67 |
0.79 |
14 |
15 |
44 |
169 |
190 |
32-bit (ns) |
0.72 |
0.76 |
7.8 |
22 |
232 |
182 |
211 |
Division is slower than the other arithmetic operations because dividers in CPUs usually only handle a few bits at a time, rather than a full word, which also explains why 64-bit division is significantly slower than 32-bit division.
Compared to entities that can usually be stored in registers and manipulated in a single clock cycle (or less on superscalar designs), any object representation has excessive overhead, and Objective-C’s fairly heavyweight objects are doubly so. Foundation and CoreFoundation make this overhead even worse by providing only immutable number objects, meaning any manipulation must create new objects. Finally, scalars like numbers and characters tend to be at the leaves of any object graph and therefore are the most numerous entities in a program, with every object containing at least one but more likely many instances of them.
On the flip side, there is little variation or private data that would benefit from the encapsulation and polymorphism that are made possible by an object representation, and number objects are in many ways even less capable than primitive types, for example, by not providing any arithmetic capabilities. This could change in the future if Foundation or another framework provided a number and magnitudes hierarchy similar to that of Smalltalk or LISP, where small integers automatically morph into infinite precision integers, fractions, floating point, or even complex numbers as needed. Alas, Foundation provides none of these capabilities, though the introduction of tagged integers in the 64-bit runtime on OS X 10.7 along with the addition of number literals in 10.8 could be a sign of improvements in the future.
Of course, there are times when an object is required by some other interface, for example, when adding content to an NSArray or NSDictionary. In this case, you must either use NSNumber or an equivalent or provide alternatives to those interfaces—an option we will explore more later in the chapter.
One wrinkle of Table 3.1 is that although most times are similar between 32 and 64 bits, two numbers are different. The division result is about twice as slow on 64 bit, whereas the creation of integer NSNumber objects is six times faster. The division result is easily explained by the fact that the integer division hardware on the particular CPU used processes a fixed number of bits per cycle, and 64-bit operands simply have twice as many bits. The multiply and add circuits, on the other hand, operate on full 64-bit words at once.
The difference in allocation speeds for integer objects on the other hand has nothing to do with the CPU differences and everything with the fact that Apple introduced tagged integers in OS X, but only in the modern runtime, and only for the 64-bit version of that runtime. Tagged integers are a technique taken from old LISP and Smalltalk systems where the value of an integer object is encoded not in an allocated structure pointed to by the object, as usual, but rather in the object pointer itself. This saves the pointer indirection when accessing and especially the memory allocation when creating or destroying the data (integers in this case). This representation takes advantage of the fact that object pointers are at least word aligned, so the lower 2 or 3 bits of a valid object pointer are always 0 on 32-bit and 64-bit systems, respectively. Table 3.2 shows how the tagged pointer representation puts a “1” in the low bit to distinguish tagged pointers from regular pointers, another 7 bits for typing the value, and the remaining 24 or 56 bits to store a value.
Table 3.2 Tagged and regular pointers
Bits |
8-32/64 |
4-7 |
3 |
2 |
1 |
0 |
Regular pointer |
upper address bits |
0 |
||||
Tagged pointer |
value |
subtype |
tag id |
1 |
In fact, it is puzzling that the performance for integer NSNumber creation isn’t much better than it is, since all it takes is the bit-shift and arithmetic OR shown in the makeInt() function of Example 3.2, possibly with some tests depending on the source and target number type—operations that should be in the 1 to 2 ns total range.
Example 3.2 Summing manually created tagged NSNumber objects
#import <Foundation/Foundation.h> #define kCFTaggedObjectID_Integer ((3 << 1) + 1) #define kCFNumberSInt32Type 3 #define kCFTaggedIntTypeOffset 6 #define kCFTaggedOffset 2 #define kCFTaggedIntValueOffset (kCFTaggedIntTypeOffset+kCFTaggedOffset) #define MASK (kCFNumberSInt32Type<<kCFTaggedIntTypeOffset) #define kCFTaggedIntMask (kCFTaggedObjectID_Integer | MASK) static inline int getInt( NSNumber *o ) { long long n=(long long)o; if ( n & 1 ) { return n >> kCFTaggedIntValueOffset; } else { return [o intValue]; } } static inline NSNumber *makeInt( long long o ) { return (NSNumber*)((o << kCFTaggedIntValueOffset) | kCFTaggedIntMask); } int main( int argc , char *argv[] ) { NSNumber* sum = nil; for (int k=0;k<1000000; k++ ) { sum =makeInt(0); for (int i=1;i<=1000;i++) { sum =makeInt(getInt(sum)+i); } } NSLog(@"%@/%@ -> '%@'",sum,[sum class],[sum stringValue]); return 0; }
The reason of course is that Apple has so far hidden this change behind the existing messaging and function call application programming interfaces (APIs) going through CoreFoundation. We are also advised that the representation, including the actual tags, is private and subject to change. What we are leaving on the table is significant: The code in Example 3.2 runs in 1.4 s, compared to 11.4 s for the Foundation/CoreFoundation-based code from Chapter 1.
Hopefully this will change in the future, and the compiler will become aware of these optimizations and be able to generate tagged pointers for integer objects and some of the other tagged types that have been added in the meantime. But as of OS X 10.11 and Xcode 7.3, it hasn’t happened.
Strings
A data type that almost qualifies as a primitive in use is the string, even though it is actually variable in length and doesn’t fit in a processor register. In fact, Objective-C strings were the first and for a long time the only object that had compiler support for directly specifying literal objects.
There are actually several distinct major uses for strings:
Human readable text
Bulk storage of serialized data as raw bytes or characters
Tokens or keys for use in programming
While these cases were traditionally all handled uniformly in C using char* pointers, with some NUL terminated and others with a length parameter handled out of band, conflating the separate cases is no longer possible now that text goes beyond 7-bit ASCII.
Cocoa has the NSString class for dealing with human readable text. It handles the subtleties of the Unicode standard, delegating most of the details to iconv library. This sophistication comes at a cost: roughly one order of magnitude slower performance than raw C strings. Table 3.3 shows the cost of comparing 10- and 32-byte C-Strings with 10- and 32-character NSString objects.
Table 3.3 NSString and C-String operations
Operation |
1 ns |
!strcmp(10) |
strcmp(32) |
!nscmp(10) |
ns append |
nscmp(32) |
1 ns |
1 |
3.3 |
10 |
76 |
77 |
82 |
!strcmp(10) |
|
1 |
3 |
23 |
23 |
25 |
strcmp(32) |
|
|
1 |
7.5 |
7.6 |
8.2 |
!nscmp(10) |
|
|
|
1 |
1 |
1.1 |
ns append |
|
|
|
|
1 |
1.1 |
nscmp(32) |
|
|
|
|
|
1 |
Although NSStrings are expensive, this is an expense well spent when the subject matter really is human-readable text. Implementing correct Unicode handling is complex, error prone, and inherently expensive. In addition, the option of having multiple representations with a common interface is valuable, allowing string representations optimized for different usage scenarios to be used interchangeably. For example, literal NSStrings are represented by the NSConstantString class that stores 8-bit characters, whereas the standard NSCFString class (backed by CFString CoreFoundation objects) stores 16-bit unichars internally. Subclasses could also interchangeably provide more sophisticated implementations such as ropes, which store the string as a binary tree of smaller strings and can efficiently insert/delete text into large strings.
Starting with OS X 10.10, string objects on the 64-bit runtime also got the tagged pointer treatment that we previously saw for integers. This may seem odd, as strings are variable-length data structures, arrays of characters. However, 64 is quite a lot of bits, enough to store seven 8-bit characters and some additional identifying information such as the length. In fact, when I myself proposed tagged pointer strings back in 2007, I also had variants with eight 7-bit ASCII strings, or an even tighter packing that ignores most of the control and special characters to use only 6 bits and thus have room for 9 characters. I don’t know if any of those variants are implemented.
Example 3.3 illustrates the different NSString implementation types: a literal is an instance of ___NSCFConstantString, a CF variant of NSConstantString. Creating a mutable copy creates a new string object, whereas creating a copy of that mutable copy creates a tagged pointer string because the string is only 5 characters long. All of this is implementation dependent, but the differences are relevant when looking at NSDictionary lookup performance.
Example 3.3 Show differences between normal, constant, and tagged strings
#import <Foundation/Foundation.h> void printString( NSString *a ) { NSLog(@"string=%@ %p class: %@",a,a,[a class]); } int main() { NSString *cs=@"Const"; printString(cs); printString([cs mutableCopy]); printString([[cs mutableCopy] copy]); } cc -Wall -o taggedstring taggedstring.m -framework Foundation ./taggedstring string=Const 0x108fe2040 class: __NSCFConstantString string=Const 0x7fb359c0d630 class: __NSCFString string=Const 0x74736e6f4355 class: NSTaggedPointerString
While great for human readable text, NSString objects are somewhat heavyweight to be used for serialized data, which is handled more safely and efficiently by the NSData class. Unlike NSString, which requires an encoding to be known for text data and can therefore not be safely used on arbitrary incoming data (it will raise an exception if the data does not conform to the encoding), NSData can be used with arbitrary, potentially binary data read from the network or a disk. For performance, it is possible to get a pointer to the NSData’s contents via the -byte or -mutableBytes methods for processing using straight memory access, whereas NSString (rightfully) protects its internal data representation, with processing only possible by sending high-level messages or by copying the data out of the NSString as 16-bit unichar character data or encoded 8-bit bytes.
When parsing or generating serialized data formats, even textual ones, it is significantly more efficient to treat the serialized representation such as the JSON in Example 3.4 as raw bytes in an NSData, parse any structure delimiters, numbers, and other non-textual entities using C character processing, and create NSString objects exclusively for actual textual content, rather than reading the serialized representation into an NSString and using NSScanner or other high-level string processing routines.
Example 3.4 Textual content of JSON file is shown in bold
[ { "name": "AAPL", "price": 650.1, "change": 20.41 }, { "name": "MSFT", "price": 62.79, "change": -0.9 }, { "name": "GOOG", "price": 340.79, "change": -5.2 }, ]
Even the strings that appear in such a file tend to be structural rather than actual content, such as the dictionary keys in Example 3.4. These types of structural strings are also represented as NSString objects in Cocoa, just like human-readable text. While convenient due to the literal NSString syntax (@“This is a constant string”), this conflating of human-readable text and functional strings can at times be unfortunate in terms of performance. Fortunately, many types of keys that are more optimized exist—for example, basic C strings, message names, and instance variable names.