Simple Types of XML Schema
- Simple type varieties
- Simple type definitions
- Simple type restrictions
- Facets
- Preventing simple type derivation
Both element and attribute declarations can use simple types to describe the data content of the components. This chapter introduces simple types, and explains how to define your own atomic simple types for use in your schemas.
9.1 Simple type varieties
There are three varieties of simple type: atomic types, list types, and union types.
Atomic types have values that are indivisible, such as 10 and large.
List types have values that are whitespace-separated lists of atomic values, such as <availableSizes>10 large 2</availableSizes>.
Union types may have values that are either atomic values or list values. What differentiates them is that the set of valid values, or "value space," for the type is the union of the value spaces of two or more other simple types. For example, to represent a dress size, you may define a union type that allows a value to be either an integer from 2 through 18, or one of the string values small, medium, or large.
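As a rough illustration of the three varieties, the dress size example might be sketched as the schema below. The type names (NumericSizeType, NamedSizeType, DressSizeType, AvailableSizesType) are hypothetical and chosen only for this sketch, and the schema assumes no target namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Atomic type: a single integer from 2 through 18 -->
  <xs:simpleType name="NumericSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Atomic type: one of the strings small, medium, or large -->
  <xs:simpleType name="NamedSizeType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Union type: value space is the union of the two types above -->
  <xs:simpleType name="DressSizeType">
    <xs:union memberTypes="NumericSizeType NamedSizeType"/>
  </xs:simpleType>

  <!-- List type: whitespace-separated dress sizes -->
  <xs:simpleType name="AvailableSizesType">
    <xs:list itemType="DressSizeType"/>
  </xs:simpleType>

  <xs:element name="availableSizes" type="AvailableSizesType"/>

</xs:schema>
```

Under these assumptions, <availableSizes>10 large 2</availableSizes> would be valid, while <availableSizes>10 huge</availableSizes> would not, because huge is in neither member type's value space.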
List and union types are covered in Chapter 11, "Union and list types."
9.1.1 Design hint: How much should I break down my data values?
Data values should be broken down to the most atomic level possible. This allows them to be processed in a variety of ways for different uses, such as display, mathematical operations, and validation. It is much easier to concatenate two data values back together than it is to split them apart. In addition, more granular data is much easier to validate.
It is a fairly common practice to put a data value and its units in the same element, for example <length>3cm</length>. However, the preferred approach is to have a separate data value, preferably an attribute, for the units, for example <length units="cm">3</length>.
Using a single concatenated value is limiting because:
- It is extremely cumbersome to validate. You have to apply a complicated pattern that would need to change every time a unit type is added.
- You cannot perform comparisons, conversions, or mathematical operations on the data without splitting it apart.
- If you want to display the data item differently (for example, as "3 centimeters" or "3 cm" or just "3"), you have to split it apart. This complicates the stylesheets and applications that process the instance document.
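The contrast between the two approaches can be sketched in schema form. The type names and the particular set of units (cm, mm, in) below are assumptions made for illustration only. The second approach uses a complex type with simple content, since an element that carries an attribute cannot have a simple type directly.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Concatenated approach, for <length>3cm</length>:
       the number and the unit are tangled together in one pattern -->
  <xs:simpleType name="ConcatenatedLengthType">
    <xs:restriction base="xs:token">
      <xs:pattern value="[0-9]+(\.[0-9]+)?(cm|mm|in)"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Separated approach, for <length units="cm">3</length>:
       the units are a simple enumeration... -->
  <xs:simpleType name="LengthUnitsType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="cm"/>
      <xs:enumeration value="mm"/>
      <xs:enumeration value="in"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- ...and the content is a true decimal that can be compared
       and used in arithmetic without any splitting -->
  <xs:complexType name="LengthType">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="units" type="LengthUnitsType" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="length" type="LengthType"/>

</xs:schema>
```

In the separated version, adding a unit still means adding one enumeration value, but the numeric content stays typed as xs:decimal, so range facets and numeric comparisons apply to it directly.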
It is possible to go too far, though. For example, you may break a date down as follows:
<orderDate>
  <year>2001</year>
  <month>06</month>
  <day>15</day>
</orderDate>
This is probably overkill unless you have a special need to process these items separately.
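For comparison, a less granular alternative might declare the date as a single value of the built-in date type. The two fragments below are a sketch: the first is an element declaration that assumes the xs prefix is bound to the XML Schema namespace, and the second is a matching instance.

```xml
<!-- Schema fragment: the whole date is one atomic value -->
<xs:element name="orderDate" type="xs:date"/>

<!-- Instance fragment: year, month, and day in the xs:date lexical form -->
<orderDate>2001-06-15</orderDate>
```

Because xs:date is atomic, the processor validates the value as a whole (rejecting, for example, a month of 13), which the three separate elements would not do without additional constraints.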