- Information Is Interesting Stuff
- Information and Structure Are Inseparable
- Formal Languages Are Easier to Compute Than Natural Languages
- Generic Markup Makes Natural Languages More Formal
- A Brief History of the Topic Maps Paradigm
- Data and Metadata: The Resource-Centric View
- Subjects and Data: The Subject-Centric View
- Understanding Sophisticated Markup Vocabularies
- The Topic Maps Attitude
- Summary
Data and Metadata: The Resource-Centric View
Metadata is not only "about data"; it is also always data itself. One person's data is another person's metadata. There is, in general, no difference between data and metadata; it's all a matter of perspective.
It is normal to think of metadata as being somehow "in orbit" around the data about which the metadata provides information. The existence of a metadata Web site that provides information about data Web sites affects global knowledge interchange in two ways:
- When users are at the metadata Web site, their attention can be directed at one or more data Web sites, and users can know the reasons why.
- When users are at the data Web site, they may derive more useful information if they also know about the availability of the metadata Web site and its reasons for expressing metadata about that data.
The idea that metadata can be externally and arbitrarily associated with data is a powerful one, but, by itself, this attractive and simple idea leads nowhere. When a single data Web site is associated with (that is, pointed at by) millions of metadata Web sites, the result can easily be "infoglut," a tidal wave of information so large that, as a practical matter, its overall utility is zero. There needs to be a way to use computers to determine the relevance of all this information to the user's specific situation and to show the relevant information while hiding the rest.
It is ironic that the recent huge improvement that information technology has brought to the accessibility of information (such as instant hyperlink traversal to any Web site, anywhere in the world) has itself made more and more information inaccessible due to the sheer quantity of it. The dream of global knowledge interchange recedes, even as it becomes real. Our power to filter out unwanted information must keep pace with the quantity of unwanted information. It's a race that we currently appear to be losing.
Although it may sound strange, it is imperative that we develop technical, economic, and business models that will allow businesses to make money by hiding information, that is, by providing information that can be used to hide other information. It's also imperative that these models absolutely support and cherish diversity. This is because particular information filtration problems may, as a purely practical matter, require hiding information that emanates from a variety of sources and that reflects a variety of worldviews. These diverse sources may not even know about each other, much less deliberately design their products in such a way as to make them "federable" (that is, usable in concert) with one another. This is what the topic maps paradigm is all about: making diverse metadata sources more or less automatically federable.
One of the things that a metadata Web site may usefully provide is information as to which other Web sites have information on specific topics. Such metadata Web sites are often (and misleadingly) called search engines. But search engines do not usually provide topically organized information. Yahoo! is one notable exception, but it works only for a small number of topics and only in ways that are consistent with Yahoo!'s singular and necessarily self-serving view of the wide world of information. Unlike Yahoo!'s topically oriented features, most search engines merely report which other Web sites provide information that contains certain strings of characters. A user interested in information on a particular topic must be clever enough and lucky enough to be able to sneak up on relevant information on the basis of strings that he or she hopes will be found in such information, and not found in too much other information. The user must guess the language of the desired Web sites' information well enough to imagine which strings are relevant.
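The difference between string matching and topical lookup can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the page URLs, the page texts, the topic identifiers, and the function names are invented, and no real search engine is being described.

```python
# Hypothetical mini-corpus: the URLs and texts are invented for illustration.
PAGES = {
    "http://example.org/a": "Mercury is the planet closest to the Sun.",
    "http://example.org/b": "Mercury poisoning damages the nervous system.",
    "http://example.org/c": "The innermost planet has an 88-day year.",
}

# Hypothetical topical index, as might be compiled by someone who knows
# what each page is actually about.
TOPIC_INDEX = {
    "planet-mercury": ["http://example.org/a", "http://example.org/c"],
    "element-mercury": ["http://example.org/b"],
}


def string_search(pages, needle):
    """Return the URLs of pages whose text contains the string (case-insensitive)."""
    needle = needle.lower()
    return sorted(url for url, text in pages.items() if needle in text.lower())


def topic_search(index, topic_id):
    """Return the URLs recorded as being about the given topic."""
    return sorted(index.get(topic_id, []))


# The string "mercury" finds one page about the wrong subject and misses one
# page about the right subject; the topical index has neither problem.
by_string = string_search(PAGES, "mercury")
by_topic = topic_search(TOPIC_INDEX, "planet-mercury")
```

Here the string search returns pages a and b, while the topical lookup returns pages a and c. The user who wanted the planet has to guess that "innermost planet" is another string worth trying; the topical index already knows.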
When a user attempts to find information, the user usually has a particular topic in mind about which he or she wishes to know more. The user is not interested in Web sites or specific information resources, except insofar as they offer information that is specifically relevant to that topic. The first order of business, then, really should be to allow the user and the computer to agree about exactly what topic the user wants to research. Once the computer has established the exact topic, the computer's task should be to hide all the information about the topic that, for one reason or another, the user should not be bothered with and to render only the remaining information. This kind of user interaction with the Web is supportable if topic maps are widely used because the topic maps paradigm explicitly permits and supports business models based on the development and exploitation of lists of topics that have names and occurrences in multiple languages for use in multiple contexts and that can themselves be found on the basis of their relationships with many other findable topics.
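As a rough illustration of a list of topics that have names and occurrences in multiple languages and that can be found through their relationships with other topics, consider the following minimal sketch. It is not an implementation of any topic maps standard: the `Topic` class, its field names, the helper functions, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A hypothetical, simplified topic record (not a standard data model)."""
    topic_id: str
    names: dict = field(default_factory=dict)        # scope (here, a language code) -> name
    occurrences: list = field(default_factory=list)  # (scope, URL) pairs
    related: set = field(default_factory=set)        # ids of related topics


def find_by_name(topics, name):
    """Find topics that carry the given name in any scope (case-insensitive)."""
    wanted = name.casefold()
    return [t for t in topics
            if any(n.casefold() == wanted for n in t.names.values())]


def occurrences_in_scope(topic, scope):
    """Return only the occurrences relevant to one context, hiding the rest."""
    return [url for s, url in topic.occurrences if s == scope]


def related_topics(topics, topic):
    """Follow a topic's relationships to other findable topics."""
    return [t for t in topics if t.topic_id in topic.related]


# Invented example data.
tomato = Topic(
    "tomato",
    names={"en": "tomato", "fr": "tomate", "la": "Solanum lycopersicum"},
    occurrences=[("en", "http://example.org/en/tomato"),
                 ("fr", "http://example.org/fr/tomate")],
    related={"nightshade-family"},
)
nightshade = Topic("nightshade-family",
                   names={"en": "nightshade family", "la": "Solanaceae"})
TOPICS = [tomato, nightshade]

# Agree on the topic first, then render only what suits the user's context.
chosen = find_by_name(TOPICS, "Tomate")[0]
french_only = occurrences_in_scope(chosen, "fr")
```

The user and the computer first settle on a topic by any of its names, and only then does the computer decide which occurrences to show. A user who does not know any name for the tomato could still reach it by starting from the nightshade family and following the relationship in the other direction.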
Still, there is an unbounded number of topics, there is an awful lot of information out there, and the sheer quantity is growing at a phenomenal rate. Many individual pieces of information can often be regarded as being relevant to many different topics simultaneously. Nobody will ever categorize everything, but many people will categorize some of it many times over, often in different and even conflicting ways.[13] The topic maps paradigm explicitly permits and supports business models that are based on the development and exploitation of categorizations of information resources. Every category can be represented as a topic. Similarly, every system of categorization can also be represented as a topic. In fact, there is nothing that can't be represented as a topic. The exploitation of preexisting categorizations is not only the key to hiding unwanted information; it's also the key to finding it in the first place, unless it happens to contain some string that you are lucky enough to guess and that doesn't also appear in more than a few other resources.
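The idea that categories and systems of categorization are themselves just topics can be sketched in a few lines. This is an illustration under stated assumptions only: the scheme names, category names, resource URLs, and function names are all invented, and the triple layout is merely one convenient way to write the idea down.

```python
# Hypothetical data. Each entry reads: within this categorization system
# (itself a topic), this category (also a topic) applies to this resource.
ASSIGNMENTS = [
    ("library-scheme", "cookery", "http://example.org/tomato-book"),
    ("shop-scheme", "gardening", "http://example.org/tomato-book"),
    ("shop-scheme", "bestsellers", "http://example.org/tomato-book"),
    ("library-scheme", "gardening", "http://example.org/rose-book"),
]


def resources_in(assignments, category, scheme=None):
    """Resources placed in a category, optionally trusting only one scheme."""
    return sorted({r for s, c, r in assignments
                   if c == category and (scheme is None or s == scheme)})


def categories_of(assignments, resource):
    """Every (scheme, category) pair that has been applied to a resource."""
    return sorted({(s, c) for s, c, r in assignments if r == resource})


# Different schemes disagree about the same book; both views are kept.
all_gardening = resources_in(ASSIGNMENTS, "gardening")
library_gardening = resources_in(ASSIGNMENTS, "gardening", scheme="library-scheme")
views_of_tomato_book = categories_of(ASSIGNMENTS, "http://example.org/tomato-book")
```

Because the scheme is recorded alongside each category, conflicting categorizations of the same resource can coexist, and a user can hide unwanted information simply by choosing whose categorization to rely on.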
Metametadata, Metametametadata . . .
One way to federate metadata is to create metadata about the metadata. Then, of course, we may need to federate that metametadata with other metametadata, using metametametadata. The absurdity of this approach is obvious: there is little opportunity for benefit to be realized from standardization in a model that requires infinitely recursive metalevels. There must be a better way. And there is: the topic maps paradigm moves in the other direction by recognizing the existence of a single, implicit, underlying layer. It's the same underlying universe that is known in philosophical circles as Platonic forms[14] (so named for Plato, the ancient Greek philosopher mentioned earlier).