- 1.1 An Increasing Number of Languages
- 1.2 Software Languages
- 1.3 The Changing Nature of Software Languages
- 1.4 The Complexity Crisis
- 1.5 What We Can Learn From ...
- 1.6 Summary
1.5 What We Can Learn From ...
Luckily, a large body of existing knowledge can help in the creation of software languages. In this book, I draw mainly from three sources: natural-language studies, traditional computer language theory, and graph grammars. Although it has less influence on this book, I also include a short description of the work of the visual-languages community, because that work resembles traditional computer language theory but is aimed at graphical, or visual, languages.
1.5.1 Natural-Language Studies
When someone who is not a computer scientist hears the word language, the first thing that comes to mind is a natural language, such as the person's mother tongue. Although software languages are artificially created and natural languages are not, we can still learn a lot from the study of natural languages. (See Background on Natural-Language Studies.) Of course, not every part applies. For example, software languages do not have a sound structure, so there is no need to study phonology.1 All other fields of study, however, are as relevant to software languages as they are to natural languages. It must be noted that these fields of study are, of course, interrelated. One cannot reasonably study one aspect of a language, such as morphology, without being at least aware of the other aspects, such as syntax.
It is interesting to see that, with the advent of mobile phones, the phenomenon of multiple syntaxes is also emerging in natural language. A second morphological and syntactic structuring of natural-language expressions has developed; for example, everybody understands the following two phrases as being the same: 4u and for you.
1.5.2 Traditional Language Theory
In the late 1950s, the fundamentals of current-day theory for textual software languages were laid down by such people as Chomsky [1965] and Greibach [1965, 1969]. In the 1970s and 1980s, these fundamentals were used by, among others, Aho and Ullman to develop the theory of compiler construction [Hopcroft and Ullman 1979, Aho et al. 1985]. In this research, grammars were used to specify textual languages.
The original motivation for the study of grammars was the description of natural languages. While linguists were studying certain types of grammars, computer scientists began to describe programming languages by using a special notation for grammars: Backus-Naur Form (BNF). This field has brought us a lot of knowledge about compiler technology. For more information, see Background on Grammars (p. 48), Background on Compiler Technology (p. 96), and Background on Backus-Naur Form (p. 116).
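To make the idea of a BNF grammar concrete, here is a toy grammar for arithmetic expressions together with a minimal recursive-descent recognizer. Both the grammar and the recognizer are illustrative sketches of my own, not taken from any of the cited works:

```python
# A toy BNF-style grammar (using { } for repetition, as in EBNF):
#   <expr>   ::= <term> { "+" <term> }
#   <term>   ::= <factor> { "*" <factor> }
#   <factor> ::= "(" <expr> ")" | digit
# A minimal recursive-descent recognizer: one function per nonterminal.

def parse(text: str) -> bool:
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expr():
        nonlocal pos
        if not term():
            return False
        while peek() == "+":        # { "+" <term> }
            pos += 1
            if not term():
                return False
        return True

    def term():
        nonlocal pos
        if not factor():
            return False
        while peek() == "*":        # { "*" <factor> }
            pos += 1
            if not factor():
                return False
        return True

    def factor():
        nonlocal pos
        if peek() == "(":           # "(" <expr> ")"
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        if peek() is not None and peek().isdigit():
            pos += 1
            return True
        return False

    return expr() and pos == len(text)
```

The one-to-one correspondence between grammar rules and parsing functions is exactly what compiler-construction theory systematizes: tools such as parser generators derive the recognizer from the BNF automatically.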
1.5.3 Graph Theory
Graphs have long been known as mathematical constructs consisting of objects—called nodes or vertices—and links—called edges or arcs—between them. Over the ages, mathematicians have built up a large body of theory about graphs: for instance, algorithms to traverse all nodes, connectivity theorems, and isomorphisms between graphs. All of this has been put to use in graph grammars.
Graph grammars specify languages, usually graphical (visual) languages, by a set of rules that describe how an existing graph can be changed and extended. Most graph grammars start with an empty graph, and the rules specify how the expressions in your language can be generated. For more information, see Background on Graphs and Trees (p. 50).
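The generative process described above can be sketched in a few lines. The following is a deliberately simplified illustration of my own (the class and rule names are invented, not a standard graph-grammar API): a derivation starts from the empty graph, an axiom rule introduces a first node, and a production rule extends any matching node with new structure:

```python
# Minimal sketch of a graph-grammar derivation (illustrative, hypothetical API).

class Graph:
    def __init__(self):
        self.nodes = {}      # node id -> label
        self.edges = set()   # (source id, edge label, target id)
        self._next = 0

    def add_node(self, label):
        nid = self._next
        self._next += 1
        self.nodes[nid] = label
        return nid

    def add_edge(self, src, label, dst):
        self.edges.add((src, label, dst))

# Production: wherever a node labeled "state" occurs, a fresh "state" node
# may be glued on via a "transition" edge. The left-hand side is a single
# node, so this rule is context-free in character.
def apply_extend_rule(g, node_id):
    assert g.nodes[node_id] == "state", "rule only matches 'state' nodes"
    new = g.add_node("state")
    g.add_edge(node_id, "transition", new)
    return new

# Derivation: empty graph -> axiom -> two rule applications,
# yielding a three-state chain.
g = Graph()
s0 = g.add_node("state")          # axiom: empty graph gains one "state" node
s1 = apply_extend_rule(g, s0)
s2 = apply_extend_rule(g, s1)
```

Each application of the rule corresponds to one derivation step; the set of all graphs reachable this way is the language the grammar defines.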
1.5.4 The Visual-Languages Community
Visual-language design is our last background area. Although the study of visual languages started later than that of textual languages, it too has been investigated for a few decades now. The area can be roughly divided into two parts: one in which grammars are based on graphs and one that does not use graph grammars.
In this book, I do not use visual language, the term commonly used in this community. Instead, I use graphical language, a phrase I find more appropriate because textual languages are also visual, while at the same time most nontextual languages are denoted by some sort of nodes with edges or connectors between them: in other words, denoted like a graph.
Non-Graph-Grammar Based
The non-graph-grammar-based research is concerned mostly with scanning and parsing visual-language expressions: in other words, diagrams. Several formalisms for denoting the rules associated with scanning and parsing have been proposed: Relational Grammars [Weitzman and Wittenburg 1993], Constrained Set Grammars [Marriott 1995], and (Extended) Positional Grammars [Costagliola and Polese 2000]. For a more extensive overview, see Marriott and Meyer [1997] or Costagliola et al. [2004].
Virtually all work in this area focuses on recognizing basic graphical symbols and grouping them into more meaningful elements, although some researchers stress that more attention needs to be paid to language concepts, which in this field are often called semantics. Often, these semantics are added in the form of attributes to the graphical symbols. This is very different from the metamodeling point of view: for instance, although UML is a visual language, its metamodel does not contain the notions of box, line, and arrow.
Common to all formalisms is the use of an alphabet of graphical symbols. These symbols hold information on how to materialize the symbol to our senses, such as rendering information, position, color, and border style. Besides the alphabet, various types of spatial relationships are commonly defined, based on position (left of, above), attaching points (the end point of a line touches a corner of a rhombus), or attaching areas determined by the symbol's bounding box or perimeter (surrounds, overlaps). Both the alphabet and the spatial relationships are used to state the grammar rules that define groupings of graphical symbols.
Although the work in this area does not use graph grammars, the notion of graphs is used. The spatial relationships can be represented in the form of a graph, a so-called spatial-relationship graph [Bardohl et al. 1999], in which the graphical symbols are the nodes, and an edge between two nodes represents a spatial relationship between the two symbols.
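Computing such a spatial-relationship graph from bounding boxes is straightforward. The sketch below is my own illustration of the idea (the function names and the particular relationship definitions are assumptions, not taken from Bardohl et al.): symbols are the nodes, and each edge records a relationship derived purely from geometry:

```python
# Sketch of building a spatial-relationship graph from bounding boxes.
# A bounding box is (x_min, y_min, x_max, y_max); names are illustrative.

def left_of(a, b):
    # a lies entirely to the left of b.
    return a[2] <= b[0]

def overlaps(a, b):
    # The two boxes share some interior area.
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def spatial_relationship_graph(symbols):
    """symbols: dict mapping symbol name -> bounding box.
    Returns edges of the form (node, relationship, node)."""
    edges = []
    names = list(symbols)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if left_of(symbols[p], symbols[q]):
                edges.append((p, "left_of", q))
            elif left_of(symbols[q], symbols[p]):
                edges.append((q, "left_of", p))
            if overlaps(symbols[p], symbols[q]):
                edges.append((p, "overlaps", q))
    return edges

# A tiny diagram: a box, an arrow leaving its right edge, and a circle.
diagram = {"box": (0, 0, 10, 10), "arrow": (10, 4, 20, 6), "circle": (22, 0, 30, 8)}
edges = spatial_relationship_graph(diagram)
```

A parser for a visual language can then run its grammar rules over this graph instead of over raw pixel coordinates.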
Graph-Grammar Based
The graph-grammar community has also paid attention to visual-language design, most likely because the graph formalism itself is a visual one. In this field, more attention is paid to language concepts. In fact, the graph-grammar handbook states that the graphical symbols needed to materialize the language concepts to the user are attached as attribute values to the graph nodes representing the language concepts, an approach that is certainly different from the non-graph-grammar-based field.
Another distinctive difference is that the non-graph-grammar-based approach uses grammar rules with only one nonterminal on the left-hand side: that is, grammar rules in a context-free format. Graph grammars, on the other hand, may contain rules that have a graph as the left-hand side: that is, grammar rules in a context-sensitive format.
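The contrast between the two rule shapes can be made concrete. In the sketch below, which is purely illustrative (the rule encoding and function names are my own assumptions, not any tool's API), a context-free rule matches a single labeled node, whereas a context-sensitive graph-grammar rule can fire only when a whole subgraph is present in the host graph:

```python
from itertools import permutations

# Context-free style: the left-hand side is one nonterminal node.
cf_rule = {"lhs": "Stmt", "rhs": ["Assign"]}

# Context-sensitive style: the left-hand side is a graph. Here two nodes
# and the edge between them must all be present before the rule can fire.
cs_rule = {"lhs_nodes": {"a": "Class", "b": "Class"},
           "lhs_edges": [("a", "inherits", "b")]}

def cf_applicable(node_label, rule):
    # A single node with the right label is enough.
    return node_label == rule["lhs"]

def cs_applicable(nodes, edges, rule):
    # nodes: id -> label; edges: set of (src, edge label, dst).
    # Try every assignment of rule variables to host-graph nodes and
    # check that both the node labels and the required edges match.
    vars_ = list(rule["lhs_nodes"])
    for assign in permutations(nodes, len(vars_)):
        env = dict(zip(vars_, assign))
        labels_ok = all(nodes[env[v]] == rule["lhs_nodes"][v] for v in vars_)
        edges_ok = all((env[s], lab, env[t]) in edges
                       for s, lab, t in rule["lhs_edges"])
        if labels_ok and edges_ok:
            return True
    return False
```

The extra matching work in `cs_applicable` is the price of context sensitivity: applicability depends not on one symbol but on the surrounding structure.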
A large number of tools are able to create development environments for visual languages. These include, among others, DiaGen [Minas and Viehstaedt 1995], GenGed [Bardohl 1999], AToM3 [de Lara and Vangheluwe 2002], and VL-Eli [Kastens and Schmidt 2002].