- Introduction
- Terminology
- Concatenation
- Metacharacters
- Precedence
- Characters
- Character Class Expressions
- Constraining Simple Content
- Examples
- More Information
- About the Author
Constraining Simple Content
As this document demonstrates, writing regular expressions is not that difficult, although writing a good regular expression is not always as easy as it might first appear. Similarly, constraining simple content with a regular expression is nominally quite easy. There are some places in an XML Schema, however, where the results are not quite so obvious.
As mentioned at the beginning of this article, the pattern constraining facet restricts a simple type, as in the following definition:
<xsd:simpleType name="identifier">
  <xsd:restriction base="xsd:token">
    <xsd:pattern value="[A-Z][A-Za-z0-9_]*"/>
  </xsd:restriction>
</xsd:simpleType>
A restriction such as the one above takes care of most XML Schema needs. The restrictions start to get confusing when there are multiple patterns. A value of the next type can be either an identifier or a string of decimal digits:
<xsd:simpleType name="identifierOrDecimal">
  <xsd:restriction base="xsd:token">
    <xsd:pattern value="[A-Z][A-Za-z0-9_]*"/>
    <xsd:pattern value="\d+"/>
  </xsd:restriction>
</xsd:simpleType>
Note that validation applies a logical "or" of the patterns. That is, the XML instance is valid when the value of the simple type is a programming identifier (such as 'SomeName_99') or a string of decimal digits (such as '387').
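Because patterns in the same restriction are alternatives, the same constraint can also be sketched as a single pattern whose branches are separated by "|". The type name identifierOrDecimalAlt below is hypothetical and used only for illustration:

```xml
<!-- Hypothetical type name; intended to accept the same values as identifierOrDecimal -->
<xsd:simpleType name="identifierOrDecimalAlt">
  <xsd:restriction base="xsd:token">
    <!-- one pattern with two branches: an identifier OR a string of digits -->
    <xsd:pattern value="[A-Z][A-Za-z0-9_]*|\d+"/>
  </xsd:restriction>
</xsd:simpleType>
```

Listing the patterns separately, as in identifierOrDecimal, is often easier to read when each branch is long.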
When a simple type with a pattern restriction derives from another simple type with a pattern restriction, the result is a logical "and" of the patterns. The pattern in the next example, by itself, specifies two or more Latin alphabetic characters. However, since this is a restriction of identifier, a value corresponding to identifier3 must meet both the two-character minimum and the initial uppercase character requirement. For example, 'Ab' is valid, but 'ab' (no initial uppercase character) and 'A9' (not purely alphabetic) are not:
<xsd:simpleType name="identifier3">
  <xsd:restriction base="identifier">
    <xsd:pattern value="[a-zA-Z]{2,}"/>
  </xsd:restriction>
</xsd:simpleType>
Finally, the most difficult set of patterns to validate is one that combines the logical "or" of regular expressions (multiple patterns within a single restriction) with the logical "and" that results from one simple type restricting another.
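As a minimal sketch of such a combination, the following type restricts identifierOrDecimal and adds two patterns of its own. The type name lettersOrThreeDigits and the \d{3} pattern are hypothetical, chosen only to illustrate the rules described above:

```xml
<!-- Hypothetical type; the base identifierOrDecimal contributes:
     [A-Z][A-Za-z0-9_]*  OR  \d+ -->
<xsd:simpleType name="lettersOrThreeDigits">
  <xsd:restriction base="identifierOrDecimal">
    <!-- the two patterns in this step are ORed with each other,
         and the result is ANDed with the base type's patterns -->
    <xsd:pattern value="[a-zA-Z]{2,}"/>
    <xsd:pattern value="\d{3}"/>
  </xsd:restriction>
</xsd:simpleType>
```

Under these rules, a value must match ([A-Z][A-Za-z0-9_]* or \d+) and ([a-zA-Z]{2,} or \d{3}). So 'Ab' and '387' are valid. 'A9' and '38' satisfy the base type but neither of the new patterns, and 'ab' satisfies a new pattern but neither of the base patterns, so all three are invalid.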