Definitive XML Schema: Simple Types
8.1. Simple type varieties
There are three varieties of simple types: atomic types, list types, and union types.
- Atomic types have values that are indivisible, such as 10 or large.
- List types have values that are whitespace-separated lists of atomic values, such as <availableSizes>10 large 2</availableSizes>.
- Union types may have values that are either atomic values or list values. What differentiates them is that the set of valid values, or “value space,” for the type is the union of the value spaces of two or more other simple types. For example, to represent a dress size, you may define a union type that allows a value to be either an integer from 2 through 18, or one of the string values small, medium, or large.
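The three varieties can be sketched in a single schema built around the dress size example. This is a minimal illustration, not a definitive design: the type names (NumericSizeType, NamedSizeType, DressSizeType, AvailableSizesType) are hypothetical, and the schema is assumed to have no target namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic type: an integer from 2 through 18 -->
  <xs:simpleType name="NumericSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Atomic type: one of three named sizes -->
  <xs:simpleType name="NamedSizeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Union type: its value space is the union of the two member types -->
  <xs:simpleType name="DressSizeType">
    <xs:union memberTypes="NumericSizeType NamedSizeType"/>
  </xs:simpleType>

  <!-- List type: whitespace-separated dress sizes -->
  <xs:simpleType name="AvailableSizesType">
    <xs:list itemType="DressSizeType"/>
  </xs:simpleType>

  <xs:element name="availableSizes" type="AvailableSizesType"/>

</xs:schema>
```

Under these assumptions, <availableSizes>10 large 2</availableSizes> is valid, because each item in the list matches one of the two member types of the union.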
List and union types are covered in Chapter 10.
8.1.1. Design hint: How much should I break down my data values?
Data values should be broken down to the most atomic level possible. This allows them to be processed in a variety of ways for different uses, such as display, mathematical operations, and validation. It is much easier to concatenate two data values back together than it is to split them apart. In addition, more granular data is easier to validate.
It is a fairly common practice to put a data value and its units in the same element, for example <length>3cm</length>. However, the preferred approach is to have a separate data value, preferably an attribute, for the units, for example <length units="cm">3</length>.
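One way to declare the preferred form is a complex type with simple content: the element holds a plain decimal, and the units attribute is restricted to an enumerated list. The sketch below is illustrative only. The type names and the set of units other than cm are assumptions, not taken from the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical list of allowed units; only "cm" appears in the text -->
  <xs:simpleType name="LengthUnitType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="cm"/>
      <xs:enumeration value="mm"/>
      <xs:enumeration value="in"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Decimal content plus a required units attribute -->
  <xs:complexType name="LengthType">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="units" type="LengthUnitType" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="length" type="LengthType"/>

</xs:schema>
```

With this declaration, <length units="cm">3</length> is valid, and supporting a new unit only requires one more enumeration value.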
Using a single concatenated value is limiting because:
- It is extremely cumbersome to validate. You have to apply a complicated pattern that would need to change every time a unit type is added.
- You cannot perform comparisons, conversions, or mathematical operations on the data without splitting it apart.
- If you want to display the data item differently (for example, as “3 centimeters” or “3 cm” or just “3”), you have to split it apart. This complicates the stylesheets and applications that process instance documents.
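To make the validation problem concrete, here is a rough sketch of what a type for the concatenated form, such as 3cm, might look like. The type name and the unit abbreviations other than cm are assumptions made for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type for values such as "3cm" or "2.5in" -->
  <xs:simpleType name="ConcatenatedLengthType">
    <xs:restriction base="xs:token">
      <!-- The pattern must repeat the number syntax and list every unit;
           adding a unit means editing the pattern itself -->
      <xs:pattern value="[0-9]+(\.[0-9]+)?(cm|mm|in)"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="length" type="ConcatenatedLengthType"/>

</xs:schema>
```

Because the value is a string rather than a number, numeric bounds facets such as minInclusive cannot be applied to the numeric part. Any range check would have to be encoded in the pattern as well.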
It is possible to go too far, though. For example, you may break a date down as follows:
<orderDate>
  <year>2001</year>
  <month>06</month>
  <day>15</day>
</orderDate>
This is probably overkill unless you have a special need to process these items separately.
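For comparison, the same date can be kept as a single value using the built-in xs:date type, which validates the year, month, and day together. The following is only a sketch of that alternative, assuming an element named orderDate with no target namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A single value in the form YYYY-MM-DD, for example
       <orderDate>2001-06-15</orderDate> -->
  <xs:element name="orderDate" type="xs:date"/>

</xs:schema>
```

Most schema-aware tools can still extract the individual components from an xs:date value when they are needed.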