SGML Roots
Both XML and HTML are derived from the Standard Generalized Markup Language (SGML), created 20 years ago for the purpose of processing documents by computer. SGML tags embedded within a document directed its processing and defined its format, font family, and so on.
At a certain point in its evolution, some of the formatting instructions were generalized and abstracted into a separate document, presumably because it was plainly inefficient to recode a document every time you wanted to change, for example, the paper size from Letter to A4 or the typeface from Times to Helvetica. This allowed the formatting instructions to be modified quickly and easily, independently of the document content.
The concept of separating the document formatting instructions from the content was implemented in a style sheet. Markup tags embedded within a document could be made into a kind of variable, loaded or initialized from an external source. The program that processed the document therefore had two inputs: the main document with the embedded tags, and the associated style sheet or content model. You could then change many of the physical characteristics of the output document simply by changing the associated style sheet. These style sheets evolved into document type definitions (DTDs) and finally into XML Schemas.
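The two-input arrangement described above can be illustrated with a minimal sketch. It is not SGML processing as such: the tag names, the dictionary-based "style sheets", and the render function are all hypothetical, chosen only to show that the output changes when the style sheet is swapped while the document stays untouched.

```python
import xml.etree.ElementTree as ET

# Input 1: the document, with embedded tags that name each part (hypothetical tags).
DOCUMENT = "<doc><title>SGML Roots</title><para>Tags direct the processing.</para></doc>"

# Input 2: a style sheet, here just a mapping from tag name to formatting values.
STYLE_TIMES = {
    "title": {"font": "Times", "size": 18},
    "para": {"font": "Times", "size": 11},
}
STYLE_HELVETICA = {
    "title": {"font": "Helvetica", "size": 18},
    "para": {"font": "Helvetica", "size": 11},
}


def render(document, style_sheet):
    """Combine the tagged document with a style sheet into formatted lines."""
    lines = []
    for element in ET.fromstring(document):
        style = style_sheet[element.tag]  # each tag acts like a variable looked up externally
        lines.append(f"[{style['font']} {style['size']}pt] {element.text}")
    return lines


print(render(DOCUMENT, STYLE_TIMES))
print(render(DOCUMENT, STYLE_HELVETICA))
```

Switching from Times to Helvetica requires no change to DOCUMENT; only the second argument differs.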
As document processing gradually became the province of PCs and workstations, the concepts behind SGML re-emerged, first in HTML and then in XML. XML was originally developed to overcome the limitations of HTML, in particular to better support dynamic content creation and management. HTML works well enough for static content, but as the Web evolves toward more of a software-enabled platform, in which data has associated meaning, content needs to be generated and digested dynamically, a capability that XML supports.
HTML has a fixed set of tags, whereas XML allows any number of tags to be defined. With XML, the problem of associated meaning for the tags became much greater because there is no finite set of predefined tags. XML tags are all made up for the particular application.
Using XML, you can define any number of elements that associate meaning with data; that is, elements created for the purpose of describing the data and what to do with it. For example:
<Company>
  <CompanyName region="US">Skateboots Manufacturing</CompanyName>
  <address>
    <line>200 High Street</line>
    <line>Springfield, MA 55555</line>
    <Country>USA</Country>
  </address>
  <phone>+1 781 555 5000</phone>
</Company>
As shown in the example, XML not only allows you to define elements that describe the data, but it also allows you to define structures that group related data. It's easy to imagine a search for elements that match certain criteria, such as <Country> and <phone>, for a given company, or a search for all <Company> elements that returns a list of those entities identifying themselves as companies on the Web. (XML names are case-sensitive, so such a search must match the element name exactly as it is written in the document.) Furthermore, as mentioned earlier, XML allows associated schemas to separately validate the data and describe other attributes and qualities of the data, something that is not possible with HTML.
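As a rough illustration of the kind of search described above, the following sketch uses Python's standard xml.etree.ElementTree module to pull the name, country, and phone number out of every <Company> element. The choice of library and the function name find_company_details are assumptions made for this example; the text does not prescribe any particular tool.

```python
import xml.etree.ElementTree as ET

# The company record from the example, embedded so the sketch is self-contained.
COMPANY_XML = """
<Company>
  <CompanyName region="US">Skateboots Manufacturing</CompanyName>
  <address>
    <line>200 High Street</line>
    <line>Springfield, MA 55555</line>
    <Country>USA</Country>
  </address>
  <phone>+1 781 555 5000</phone>
</Company>
"""


def find_company_details(xml_text):
    """Return name, country, and phone for every <Company> element found."""
    root = ET.fromstring(xml_text)
    details = []
    # iter() walks the whole tree and includes the root if its tag matches.
    for company in root.iter("Company"):
        details.append({
            "name": company.findtext("CompanyName", default="").strip(),
            "country": company.findtext("address/Country", default="").strip(),
            "phone": company.findtext("phone", default="").strip(),
        })
    return details


print(find_company_details(COMPANY_XML))
```

Because element names are matched exactly, searching for "company" in lowercase would find nothing in this document.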
This capability of XML has led to its prominence and widespread adoption as a data formatting and structuring language. As a result, the XML community itself is divided between those focused on document markup and those focused on data definition. With XML's capability to define any number of tags and to define data types and structures, any application or program data can be transformed, or mapped, into XML documents.
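Mapping program data into an XML document can be sketched as follows. The record layout (a Python dictionary with keys such as "address_lines") and the function name company_to_xml are hypothetical, and the element names simply reuse those from the earlier example; a real application would choose its own mapping.

```python
import xml.etree.ElementTree as ET


def company_to_xml(record):
    """Map a plain dictionary (hypothetical layout) onto the Company structure."""
    company = ET.Element("Company")
    name = ET.SubElement(company, "CompanyName", region=record["region"])
    name.text = record["name"]
    address = ET.SubElement(company, "address")
    for text in record["address_lines"]:
        ET.SubElement(address, "line").text = text
    ET.SubElement(address, "Country").text = record["country"]
    ET.SubElement(company, "phone").text = record["phone"]
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(company, encoding="unicode")


record = {
    "name": "Skateboots Manufacturing",
    "region": "US",
    "address_lines": ["200 High Street", "Springfield, MA 55555"],
    "country": "USA",
    "phone": "+1 781 555 5000",
}
print(company_to_xml(record))
```

The same data could just as easily come from a database row or an object; the point is only that the program's structure is carried over into nested, named elements.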