4.3 Basic Data Types
Foundation provides a number of data types, some as primitive C types, and some as object types. Some of these represent some kind of structured data, such as a string or a date, while others are collections of arbitrary types.
Any nontrivial Cocoa program is likely to make heavy use of some of these. All of them provide a rich set of methods for manipulating them, and so you should take care to check the documentation carefully before implementing new features for them.
4.3.1 Non-Object Types
OpenStep was originally designed to work on very slow computers by today's standards. One of the big improvements in performance over Smalltalk came from the judicious use of non-object types. The most obvious of these are the various primitive integer and floating point types. There are also a small number of structures, such as NSRange, which are used throughout the Foundation frameworks.
There are several reasons why these are not objects. The first is their size. Most of these structures are pairs of values; a range is a start and a length, for example. On a 32-bit system, adding four bytes for an isa pointer and four bytes for a reference count would double their size. The second is speed: because they are structures, they can be passed by value in registers, which makes calling methods (and functions) that use or return them faster. Finally, they are rarely aliased. When you set a range, a point, or a rectangle somewhere, you want to set a copy.
The most common structures used in Cocoa are
- NSRange, a pair of unsigned integers representing an offset (the location field) and a length in a sequence. These are most commonly used with NSStrings for defining substrings, but can be used with arrays and other similar data structures.
- NSPoint, which contains two floating-point values representing x and y coordinates.
- NSSize, which is structurally equivalent to NSPoint. The difference between NSSize and NSPoint is that the values for a size should never be negative. As a structure it is unable to enforce this constraint; however, assigning a negative value to either field may cause exceptions or subtle failures.
- NSRect, an aggregate of an NSPoint and an NSSize that allows a rectangle to be defined in 2D space.
Note that the last three of these are all most commonly used for drawing functions in AppKit, even though they are defined in Foundation.
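The structures listed above can be sketched in use as follows. This is a minimal illustration, assuming a Mac OS X SDK (where NSPoint, NSSize, and NSRect are declared by Foundation) and ARC; the helper names SubstringInRange and RectAt are hypothetical and exist only for this example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the substring covered by a range,
// or nil if the range does not fit inside the string.
static NSString *SubstringInRange(NSString *string, NSRange range)
{
    NSUInteger length = [string length];
    // Written this way so that location + length cannot overflow.
    if (range.location > length || range.length > length - range.location) {
        return nil;
    }
    return [string substringWithRange: range];
}

// Hypothetical helper: builds a rectangle from an origin and a size.
// The structure is returned by value, so the caller receives a copy.
static NSRect RectAt(CGFloat x, CGFloat y, CGFloat width, CGFloat height)
{
    return NSMakeRect(x, y, width, height);
}
```

Because these are plain structures, assigning one to another variable copies the fields. Changing the copy never affects the original.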
Foundation also includes a number of other primitive data types, including a large number of enumerated types. Common examples of these include NSComparisonResult, which defines NSOrderedAscending, NSOrderedSame, and NSOrderedDescending, and is used to define how two objects should be ordered. If you sort a collection of Cocoa objects, the order will be defined by calling a function or a method that returns one of these three values on pairs of objects in the collection.
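As a small sketch of how these values drive sorting, the following comparison function orders strings by length. The names CompareByLength and SortedByLength are hypothetical; the example assumes that every element of the array is an NSString.

```objc
#import <Foundation/Foundation.h>

// Hypothetical comparison function: orders two strings by their length.
// The context pointer is unused here.
static NSInteger CompareByLength(id a, id b, void *context)
{
    NSUInteger lengthA = [(NSString *)a length];
    NSUInteger lengthB = [(NSString *)b length];
    if (lengthA < lengthB) { return NSOrderedAscending; }
    if (lengthA > lengthB) { return NSOrderedDescending; }
    return NSOrderedSame;
}

// Returns a new array; the receiver is not modified.
static NSArray *SortedByLength(NSArray *strings)
{
    return [strings sortedArrayUsingFunction: CompareByLength
                                     context: NULL];
}
```

For the common case of natural ordering, -sortedArrayUsingSelector: with @selector(compare:) does the same job by sending -compare: to pairs of elements.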
4.3.2 Strings
One of the most commonly used classes in Foundation is NSString. Technically speaking, this means subclasses of NSString, since it is a class cluster and is never instantiated directly.
Each concrete subclass of NSString must override at least two of the methods defined by this class: -length and -characterAtIndex:. The first of these returns the length of the string, and the second returns the character at a specified index as a unichar, which is a 16-bit UTF-16 code unit. Note that the internal storage of the string may be in any format. The class cluster design allows 8-, 16-, and 32-bit strings to all be stored internally when a given string does not include any characters from outside the set that can be expressed with that width. The programmer can be largely oblivious to this and use these strings interchangeably: the NSString subclass will transparently handle any conversion required.
Although these are the only methods that need to be overridden, most of the methods in NSString will call -getCharacters:range:, which writes a substring into a buffer provided by the caller. Subclasses that implement this directly, rather than using the superclass implementation that repeatedly calls -characterAtIndex:, will be much faster.
Note that this method name begins with the get prefix. This is a common Cocoa idiom for methods that return a value into space provided by the caller. Contrast this with the -length method, which does not have the get prefix, and just returns the length.
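To make the two primitive methods concrete, here is a minimal sketch of a custom NSString subclass. The class RepeatedString is hypothetical: it represents one character repeated a fixed number of times and stores nothing else. It assumes ARC, and it also overrides -getCharacters:range: as the optional fast path described above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical subclass: one character repeated `count` times.
@interface RepeatedString : NSString
- (instancetype)initWithCharacter:(unichar)character count:(NSUInteger)count;
@end

@implementation RepeatedString
{
    unichar _character;
    NSUInteger _count;
}

- (instancetype)initWithCharacter:(unichar)character count:(NSUInteger)count
{
    if ((self = [super init])) {
        _character = character;
        _count = count;
    }
    return self;
}

// First primitive method.
- (NSUInteger)length
{
    return _count;
}

// Second primitive method.
- (unichar)characterAtIndex:(NSUInteger)index
{
    if (index >= _count) {
        [NSException raise: NSRangeException
                    format: @"Index %lu is out of bounds", (unsigned long)index];
    }
    return _character;
}

// Optional fast path: fills the caller's buffer in a single call.
- (void)getCharacters:(unichar *)buffer range:(NSRange)range
{
    if (range.location > _count || range.length > _count - range.location) {
        [NSException raise: NSRangeException format: @"Range is out of bounds"];
    }
    for (NSUInteger i = 0; i < range.length; i++) {
        buffer[i] = _character;
    }
}
@end
```

An instance created with [[RepeatedString alloc] initWithCharacter: 'a' count: 3] can then be passed anywhere an NSString is expected, because the inherited methods are built on these primitives.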
Although it is possible to create your own subclass of NSString, it is generally a better option to compose objects without subclassing. An example of this in the Foundation framework is NSAttributedString. This responds to -string messages to return the string for which it stores attributes, but cannot be used directly in place of a string. We will look at this class in a lot more detail in Chapter 8.
NSString has one public subclass (which is also a class cluster), for representing strings that can be modified: NSMutableString. This adds methods for modifying characters. Only seven new methods are added by this class, with six being defined in terms of the one primitive method: replaceCharactersInRange:withString:.
The NSString class has a huge number of methods, and 10.5 added a lot more. A lot of these are to do with path handling. One of the problems that OS X developers encountered a lot in the early days was the fact that MacOS and OPENSTEP had different ways of representing paths. MacOS used a multi-rooted file hierarchy, with one root for each disk, and with path components separated by colons. OPENSTEP used a UNIX-style file hierarchy, with a single root and path components separated by slashes. Mac OS X applications often had to deal with both.
Fortunately, this was a problem that NeXT had already encountered. OpenStep applications were able to run on Solaris, OPENSTEP, and Windows. Windows file paths were similar in structure to classic MacOS paths. NSString has a set of methods for adding and deleting path components and splitting paths apart in a way that is independent of the underlying filesystem representation. It is good practice to use these, rather than manually constructing paths.
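A brief sketch of the path methods in use follows. The helper SiblingPath is hypothetical; it replaces the last component of a path without ever touching a separator character directly.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the path of a file called `newName`
// in the same directory as `path`.
static NSString *SiblingPath(NSString *path, NSString *newName)
{
    NSString *directory = [path stringByDeletingLastPathComponent];
    return [directory stringByAppendingPathComponent: newName];
}
```

For example, SiblingPath(@"/tmp/report.txt", @"summary.txt") yields @"/tmp/summary.txt". The related -lastPathComponent, -pathExtension, and -pathComponents methods split a path apart in the same separator-independent way.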
Recent versions of OS X have begun to move away from using file paths entirely, with a lot of methods now using URLs in the file:// namespace instead of file paths. There are fewer methods on NSString for dealing with these; however, the NSURL class provides a lot more.
4.3.3 Boxed Numbers and Values
The advantage of using primitive types is speed. The disadvantage is that they don't integrate well with collections that expect objects. There are three classes that are provided for working around this. Each of them boxes a specific kind of primitive value.
The most general boxing class is NSValue, which can contain any primitive type. This is most commonly used for encapsulating the Foundation struct types, such as NSRange, and storing them in collections. This class has a subclass (actually, a class cluster), NSNumber, which is used to store single numerical values. Any value from a char to a long long can be stored in one of these, and it will correctly cast the result if any of the -somethingValue family of methods is called. For example, you can create an NSNumber from a primitive unsigned int like this:
NSNumber *aNumber = [NSNumber numberWithUnsignedInt: myInt];
It could then be stored in a collection, retrieved, passed to another method, and then turned into a 64-bit value like this:
long long wideValue = [aNumber longLongValue];
Be careful when doing this, however. If you do the reverse operation (create an NSNumber with a 64-bit value and then retrieve a 32-bit or smaller value), the result will be silently truncated.
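The following sketch shows both directions of conversion, along with boxing a structure in an NSValue so that it can live in a collection. All three helper names are hypothetical. The exact value produced by the narrowing conversion is deliberately not stated here; the point is only that no error is reported.

```objc
#import <Foundation/Foundation.h>

// Widening: an unsigned int always fits in a long long.
static long long WidenedValue(unsigned int narrow)
{
    NSNumber *boxed = [NSNumber numberWithUnsignedInt: narrow];
    return [boxed longLongValue];
}

// Narrowing: a value that does not fit in an int is silently altered.
static int NarrowedValue(long long wide)
{
    NSNumber *boxed = [NSNumber numberWithLongLong: wide];
    return [boxed intValue];
}

// Boxes an NSRange, stores it in an array, and unboxes it again.
static NSRange RoundTripRange(NSRange range)
{
    NSValue *boxed = [NSValue valueWithRange: range];
    NSArray *collection = [NSArray arrayWithObject: boxed];
    NSValue *stored = [collection objectAtIndex: 0];
    return [stored rangeValue];
}
```

If the range of a value is not known in advance, it is safer to retrieve it at the widest width and check the bounds before narrowing it yourself.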
Decimal Arithmetic
In addition to the standard binary types inherited from C, and their boxed equivalents, Foundation defines an NSDecimal structure and an NSDecimalNumber boxed equivalent. These can be used for performing decimal floating point arithmetic. Some decimal numbers, such as 0.1, cannot be represented as finite binary values. This is problematic for financial applications, where a fixed number of decimal digits of precision is required. The NSDecimal type can be used to accomplish this.
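The difference between binary and decimal arithmetic can be sketched with the classic 0.1 + 0.2 example. The helper names below are hypothetical. The decimal values are built from a mantissa and an exponent so that the example does not depend on locale-specific string parsing.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns n/10 as an exact decimal value.
static NSDecimalNumber *Tenths(unsigned long long n)
{
    return [NSDecimalNumber decimalNumberWithMantissa: n
                                             exponent: -1
                                           isNegative: NO];
}

// Decimal arithmetic: 0.1 + 0.2 compares equal to 0.3.
static BOOL DecimalSumIsExact(void)
{
    NSDecimalNumber *sum = [Tenths(1) decimalNumberByAdding: Tenths(2)];
    return [sum compare: Tenths(3)] == NSOrderedSame;
}

// Binary arithmetic: none of these values has a finite binary expansion,
// so each one is rounded and the rounded sum differs from the rounded 0.3.
static BOOL BinarySumIsExact(void)
{
    double sum = 0.1 + 0.2;
    return sum == 0.3;
}
```

With IEEE 754 doubles, the first function returns YES and the second returns NO.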
There is one remaining boxed value, which is often overlooked. NSNull is a singleton—only one instance of it ever exists—representing a boxed version of NULL.
Unlike many of the other classes in Foundation, there is no NSMutableNumber or NSMutableDecimalNumber. If you need to modify a boxed value, you need to first unbox it, then perform primitive operations on it, and then box it again. This makes sense, since operations on primitive values are typically a lot faster than message sends. In a language like Smalltalk or Lisp, the compiler would try to transparently turn the object into a primitive value and do this for you, but Objective-C compilers are not (yet) clever enough to do so.
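The unbox, modify, and rebox cycle might look like the following sketch. The function name Incremented is hypothetical; the example assumes the boxed value fits in an NSInteger.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a new NSNumber one greater than the input.
// The original object is never modified, because NSNumber is immutable.
static NSNumber *Incremented(NSNumber *number)
{
    NSInteger raw = [number integerValue];      // unbox
    raw += 1;                                   // primitive arithmetic
    return [NSNumber numberWithInteger: raw];   // box again
}

// NSNull stands in for "no value" in collections, which cannot hold nil.
static NSArray *ArrayWithGap(void)
{
    return [NSArray arrayWithObjects: @"first", [NSNull null], @"third", nil];
}
```

Since NSNull is a singleton, an element can be tested against it with a simple pointer comparison.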
4.3.4 Data
In C, arbitrary data is typically represented in the same way as strings: by char*s. In Cocoa, using string objects would not work, since they perform character set conversion. The NSData class exists to encapsulate raw data. You can think of it as a boxed version of void*, although it also stores a length, which helps to prevent pointer arithmetic bugs from overwriting random memory locations.
You can get a pointer to the object's data by sending it a -bytes message. It may seem that working through this pointer will be more efficient than using the object's accessor methods; however, this is not always the case. In some cases, the underlying representation may be a set of non-contiguous memory regions, or data in a file that has not been read into memory. When you call -bytes the object is required to ensure that all of the data is in a contiguous memory region, which may be an expensive operation. Subsequent operations on the data will, in the absence of swapping, be very fast.
You can use NSData and its mutable subclass, NSMutableData, for doing simple file I/O operations. Data objects can be initialized using the contents of a file, either using file reading operations or using mmap(). Using a memory-mapped NSData object is often a very convenient way of doing random access on a file. On 32-bit platforms you can exhaust your address space fairly quickly doing this, but on 64-bit systems you have a lot of spare address space for memory mapped files.
One big advantage of accessing files in this way is that it is very VM-friendly. If you read the contents of a file into memory and then the system is low on RAM, then it has to write out your copy to the swap file, even if you haven't modified it. If you use an NSData object created with +dataWithContentsOfMappedFile: or similar, then it will simply evict the pages from memory and read them back from the original file when needed.
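A minimal sketch of random access through a mapped data object follows. It uses +dataWithContentsOfFile:options:error: with the NSDataReadingMappedIfSafe option, which is the options-based route available in later SDKs than the one this section describes; that substitution is an assumption of this example, and the helper names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: maps a file into memory where it is safe to do so,
// falling back to an ordinary read otherwise.
static NSData *MappedData(NSString *path, NSError **error)
{
    return [NSData dataWithContentsOfFile: path
                                  options: NSDataReadingMappedIfSafe
                                    error: error];
}

// Hypothetical helper: copies one byte out of the data at a given offset.
// Returns NO instead of raising an exception if the offset is out of range.
static BOOL ReadByteAt(NSData *data, NSUInteger offset, uint8_t *outByte)
{
    if (offset >= [data length]) {
        return NO;
    }
    [data getBytes: outByte range: NSMakeRange(offset, 1)];
    return YES;
}
```

Only the pages that are actually touched need to be read from disk, so reading a few bytes from a large file in this way avoids loading the rest.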
Since NSData objects can be initialized from URLs, they provide a very simple means of accessing the system's URL loading services. OS X has code for loading data from a wide variety of URL types, including files, HTTP, and FTP.
4.3.5 Caches and Discardable Data
Memory conservation is an important problem for a lot of modern applications. In recent years, the price of memory has fallen considerably, and so it becomes increasingly tempting to use some of it to store results from calculations or data received over the network. This suddenly becomes a problem when you want to port your code to a device that has a small amount of memory, like the iPhone, or when everyone is doing it.
With OS X 10.6, Apple introduced the NSDiscardableContent protocol. This defines a transactional API for working with objects. Before you use an object that implements this protocol, you should send it a -beginContentAccess message. If this returns YES, then you can use the object as you would and then send an -endContentAccess message when you are finished. Other code may send the object a -discardContentIfPossible message, and if this message is received outside of a transaction, then the receiver will discard its contents.
This is easiest to understand with a concrete implementation, such as that provided by a new subclass of NSMutableData called NSPurgeableData. This behaves in exactly the same way as NSMutableData, but also implements the NSDiscardableContent protocol. When it receives a -discardContentIfPossible message, it will free the data that it encapsulates unless it is currently being accessed.
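The transaction pattern can be sketched as follows. The helper LengthIfAvailable is hypothetical; it brackets a single read with the two access messages and reports NSNotFound if the contents have already been discarded.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reads the length inside a content-access transaction.
static NSUInteger LengthIfAvailable(NSPurgeableData *data)
{
    if (![data beginContentAccess]) {
        // The contents were discarded earlier and must be regenerated.
        return NSNotFound;
    }
    NSUInteger length = [data length];
    // Balance the begin message so the contents become discardable again.
    [data endContentAccess];
    return length;
}
```

One detail worth checking against the class documentation: a newly created NSPurgeableData is described as starting with content access already begun, so the creator is expected to send one -endContentAccess message before the contents can be discarded.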
You may want to combine objects that implement the NSDiscardableContent protocol with existing code. The -autoContentAccessingProxy method, declared on NSObject, lets you do this safely. This returns a proxy object that calls -beginContentAccess on the receiver when it is created, and -endContentAccess when it is destroyed, passing all other messages on to the original object. This prevents the contents of the object from being freed as long as the proxy exists.
This is useful for storing cached data, for example, images rendered from other data in the application, that can be regenerated if required. The object remains valid, but its contents do not. This means that you can use it as a form of zeroing weak reference in non-garbage-collected environments. It is more flexible than a weak reference, however, because it provides fine-grained control over when it can be freed.
Most commonly, you will use objects that implement this protocol in conjunction with NSCache. This class is conceptually similar to a dictionary but is designed for storing discardable content. When you add an object to a cache, you use the -setObject:forKey:cost: method. The third argument defines the cost of keeping this object in the cache. When the total cost exceeds the limit set by -setTotalCostLimit:, the cache will attempt to discard the contents of some objects (and, optionally, the objects themselves) to reduce the cost.
Most commonly the cost is memory. When using NSPurgeableData instances, you would use the size of the data as the cost. You might also use caches to limit the number of objects holding some other scarce resource, such as file handles, or even some remote resources hosted on a server somewhere.
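A short sketch of a cache whose cost is measured in bytes follows. The helper names MakeDataCache and CacheData are hypothetical, and the choice of the data's length as its cost is the convention described above rather than anything the class enforces.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: creates a cache that aims to hold at most
// `byteLimit` bytes of data in total.
static NSCache *MakeDataCache(NSUInteger byteLimit)
{
    NSCache *cache = [[NSCache alloc] init];
    [cache setTotalCostLimit: byteLimit];
    return cache;
}

// Hypothetical helper: stores data using its length as the cost.
static void CacheData(NSCache *cache, NSData *data, NSString *key)
{
    [cache setObject: data forKey: key cost: [data length]];
}
```

A lookup with -objectForKey: may return nil at any time after the object has been evicted, so callers should always be prepared to regenerate the value.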
4.3.6 Dates and Time
Time on POSIX systems is stored in time_t values. In a traditional UNIX system, this was a 32-bit signed value counting seconds since the UNIX epoch (the start of 1970). This means that there will be a new version of the Y2K bug some time in 2038, when this value overflows. On OS X, the time_t is a long, meaning that it is 32 bit on 32-bit systems and 64 bit on 64-bit systems. If people are still using OS X in three hundred billion years, when this overflows, then they probably will have had enough time to port their software to some other system.
Since the representation of time_t is implementation-dependent, it was not a good fit for Cocoa. On some platforms it is an integer, on others a floating point value. Cocoa defines an NSTimeInterval type, which is a double. As a floating point value, the accuracy of an NSTimeInterval depends on the size of the value. A double has a 53-bit mantissa and an 11-bit exponent. If the least significant bit of the mantissa is a millisecond, then the value can store 9 x 10^12 seconds, or around 285,427 years. If you use a range of under around a hundred thousand years, it will store half milliseconds, and so on. For any value that could be stored in a 32-bit time_t, the value will be accurate to under a microsecond, which is usually more accurate than is needed. The time slicing quantum for most UNIX-like systems is around 10ms, meaning that you are very unlikely to get timer events more accurately than every few tens of milliseconds.
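The way precision falls as the magnitude grows can be measured directly. The sketch below uses the C library's nextafter() to find the gap between a time interval and the next representable double. The helper name ResolutionAt is hypothetical, and the example assumes IEEE 754 doubles.

```objc
#import <Foundation/Foundation.h>
#include <math.h>

// Hypothetical helper: returns the distance, in seconds, between `t`
// and the next larger value that an NSTimeInterval can represent.
static NSTimeInterval ResolutionAt(NSTimeInterval t)
{
    return nextafter(t, INFINITY) - t;
}
```

At 2147483647 seconds, the largest value of a signed 32-bit time_t, the gap is 2^-22 seconds, or roughly a quarter of a microsecond. At 9 x 10^12 seconds it has grown to 2^-9 seconds, or about two milliseconds.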
As with other primitive values, Foundation defines both the basic primitive type and a number of classes for interacting with them in a more friendly way. These gain a little more precision by using the start of 2001 (the year OS X was publicly released) as their reference date.
Date handling is much more complex than time handling. While an NSTimeInterval can represent a time four hundred years ago easily, getting the corresponding calendar date is much more complex. The Gregorian calendar was introduced in 1582, but Britain didn't switch over until 1752 and Russia didn't switch until 1918. The existence of leap years and leap seconds further complicates matters, meaning that an NSTimeInterval may represent different dates in different locales. And all of this is before you get into the matter of time zones.
The NSDate class is a fairly simple wrapper around a time interval from some reference date (2001 by default, although the UNIX epoch and the current timestamp are other options). The NSCalendarDate subclass provides a version in the Gregorian calendar, although its use is discouraged.
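The relationship between the two reference points can be sketched with a date that sits exactly on the 2001 reference date. The helper name SecondsBetweenEpochs is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the 2001 reference date and asks how far
// it is from the UNIX epoch.
static NSTimeInterval SecondsBetweenEpochs(void)
{
    NSDate *referenceDate = [NSDate dateWithTimeIntervalSinceReferenceDate: 0];
    return [referenceDate timeIntervalSince1970];
}
```

The result is 978307200 seconds, which Foundation also exposes as the NSTimeIntervalSince1970 constant. Converting a time_t to an NSDate is therefore a matter of calling +dateWithTimeIntervalSince1970: rather than doing the subtraction by hand.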
With 10.4, Apple introduced the NSCalendar class, which encapsulates a calendar. A calendar is a mechanism for mapping between time intervals and dates. Early calendars were simple means of mapping between fixed dates, such as the summer and winter solstices, and seasons. Modern calendars map between time intervals and more complex dates. Cocoa understands a number of different calendars, including the Gregorian, Buddhist, Chinese, and Islamic calendars.
If you create an NSCalendar with +autoupdatingCurrentCalendar, then the calendar will automatically update depending on the currently specified locale. This means you should avoid caching values returned from the calendar, since they may change at arbitrary points in the future.
An NSCalendar allows you to turn an NSDate into an NSDateComponents object. This object is roughly equivalent to the POSIX struct tm. It allows the year, month, day, day of the week, and so on to be extracted, based on the interpretation of an NSDate in a specified calendar.
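A sketch of this conversion follows. It assumes a recent SDK, where the calendar identifier and unit constants are spelled NSCalendarIdentifierGregorian and NSCalendarUnitYear and so on (older SDKs used different names for the same values). The helper name GregorianComponentsUTC is hypothetical, and the time zone is pinned to GMT so that the result does not depend on the machine's settings.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: interprets a date in the Gregorian calendar at GMT.
static NSDateComponents *GregorianComponentsUTC(NSDate *date)
{
    NSCalendar *calendar = [[NSCalendar alloc]
        initWithCalendarIdentifier: NSCalendarIdentifierGregorian];
    [calendar setTimeZone: [NSTimeZone timeZoneForSecondsFromGMT: 0]];
    NSCalendarUnit units = NSCalendarUnitYear | NSCalendarUnitMonth
                         | NSCalendarUnitDay | NSCalendarUnitWeekday;
    return [calendar components: units fromDate: date];
}
```

Passing the 2001 reference date gives a year of 2001, a month of 1, a day of 1, and a weekday of 2. Weekdays are numbered from Sunday as 1 in the Gregorian calendar, so 2 is a Monday.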
In general, you should always store dates in NSDate objects and only convert them to a given calendar when you want to display them in the user interface. This is one of the reasons why using NSCalendarDate is discouraged: as an NSDate subclass, it is very tempting to use it for long-term storage. The other is that it is limited to the Gregorian calendar, making it unsuitable for use in Japan, China, and much of the rest of the world outside the Americas and Europe.