Coding Style Guidelines
Consistency is a prerequisite for maximizing maintainability and reusability. These general guidelines for coding style can form the basis of a set of standards that will help ensure that all developers in a project (or, better, in all projects across an organization) write code consistently.
- Use well-formed HTML.
- Pick good names and ID values.
- Indent consistently.
- Limit line length.
- Standardize character case.
- Use comments judiciously.
Use Well-formed HTML
Although Web browsers are generally forgiving and can ignore many mistakes, rendering most HTML as the document author intended, it is still a good idea to use well-formed HTML code, for a number of reasons.
Well-formed markup code is a concept that has gained importance with increased implementation of XML. While browsers did not, in general, enforce HTML language rules very closely, XML parsers do. Code is considered well formed when it is structured according to the rules for XML 1.0. These rules relate to character case, tags, nesting, and attribute values.
In general, when most browsers encounter an unrecognized or extraneous tag, they ignore it. However, different browsers might deliver results in different (and unpredictable) ways. In addition, future versions of browsers might adhere to standards more closely than do current versions. Finally, code that includes such elements can be harder to read and understand, making maintenance more difficult.
Lowercase names: Element and attribute names must be in all lowercase. In versions through 4.01, HTML is not case-sensitive. However, XML is case-sensitive, and it follows that the XHTML 1.0 recommendation, which defines its element and attribute names in lowercase, is also case-sensitive. So, to ensure that code keeps working and to maximize reusability, write element and attribute names in lowercase from the start.
Closing tags: All nonempty elements must have corresponding closing tags. Empty elements (those previously signified with a single tag, such as <hr> and <br>) must be followed immediately by a corresponding closing tag, or the tag must end with "/>". For example, <hr></hr> and <hr/> are both examples of well-formed code.
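The two rules above can be sketched side by side. The fragment below is illustrative only; the paragraph text is made up, and the space before "/>" is an optional habit that the XHTML 1.0 compatibility guidelines suggest for older browsers, not a well-formedness requirement.

```html
<!-- Not ready for XML parsers: uppercase names, unclosed elements -->
<P>First paragraph
<HR>
<P>Second paragraph<BR>continues on a new line

<!-- Well formed: lowercase names, every element closed -->
<p>First paragraph</p>
<hr />
<p>Second paragraph<br />continues on a new line</p>
```

Most browsers are likely to render both versions the same way, but only the second can be handed to an XML parser.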
Nested elements: All nested elements must be properly nested. For example:
<center><b>Some text</b></center>
Note that the <b> tag and its corresponding closing tag, </b>, are both nested inside the <center> and </center> tags.
If elements overlap, then they are not properly nested, as illustrated in the following code:
<center><b>Some text</center></b>
While many browsers have accepted overlapping elements and given the expected results, they have always been, strictly speaking, illegal in HTML, and future versions of browsers might not support them.
Attribute values: Attribute values, even numeric ones, should be quoted. For example:
<input name="txtName" type="text" size="1">
Code validation: Another step toward improving HTML code is to validate it against a formal published grammar and to declare this validation at the beginning of the HTML document. For example, the following line declares validation against the public HTML 3.2 Final grammar:
<!doctype html public "-//W3C//DTD HTML 3.2 Final//EN">
A list of formal published grammars is available from the W3C at http://validator.w3.org/sgml-lib/catalog. The W3C also has a public HTML validation service at http://validator.w3.org/.
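As a sketch of how the declaration fits into a complete page, the following minimal document declares the XHTML 1.0 Strict grammar instead of HTML 3.2. The choice of grammar, the title, and the paragraph text are assumptions made for illustration; any other published grammar could be declared in the same position.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Validation example (hypothetical page)</title>
  </head>
  <body>
    <!-- Lowercase names, quoted attribute values, every element closed -->
    <p>This page is written to validate against the declared grammar.</p>
  </body>
</html>
```

A page like this can be submitted to the W3C validation service to confirm that it matches the grammar it declares.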
Pick Good Names and ID Values
Use a consistent scheme for assigning the value of name and ID properties. They should be as short as reasonably possible, but without giving up descriptive power. Also, use mixed-case property values to help readability (see Listing 2). In this code snippet, the check box names express not only what the purpose of the element is, but also information about the element's type. The code also illustrates the use of mixed case to help readability.
Listing 2: Example of Good Element Names
<b>Member? </b><input type="Checkbox" name="cbIsMember"><br>
<b>Admin? </b><input type="Checkbox" name="cbIsAdministrator"><br>
<b>Owner? </b><input type="Checkbox" name="cbIsOwner"><br>
HTML primarily refers to elements by their name property, while DHTML and client-side scripts use the ID property. Although DHTML requires that IDs be unique in the document, in general, there is no reason not to use the same value for an element's name and ID properties. Using the same value for these properties can reduce confusion that might arise when mixing HTML and client-side scripting.
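A minimal sketch of this practice follows, assuming a small login form. The element names (txtLogin, btnClear, btnSubmit) and the login.asp target are hypothetical. The form submits the text box under its name, while the inline script reaches the same element through its ID.

```html
<form name="frmLogin" action="login.asp">
  <!-- name and id carry the same value, so HTML and script agree -->
  <b>Login: </b><input type="text" name="txtLogin" id="txtLogin"><br>
  <input type="button" name="btnClear" id="btnClear" value="Clear"
         onclick="document.getElementById('txtLogin').value = '';">
  <input type="submit" name="btnSubmit" id="btnSubmit" value="Login">
</form>
```

Because the two values match, a reader who sees txtLogin in the script does not have to hunt for a differently named form field.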
Indent Consistently
Use indentation consistently to enhance the readability of the code. When elements carry over more than one line of code, indent the contents of elements between the start tag and the end tag. This will make it easy to see where the element begins and ends. Also, use indentation to align code at attribute names (see Listing 3).
It is a good idea to use no more than two to four spaces for each level in indentation, so as not to use up all the available line length in indentation. If possible, set up the development tool to convert tabs to spaces so that the indentation will be the same when the source is viewed in different editors or as printed output.
Listing 3: Indent Code Consistently
<table width="80%">
  <tr>
    <td>
      <form name="frmLogin" action="login.asp">
        <b>Login: </b><input name="txtLogin" type="text" size="25"><br>
        <b>Password:</b><input name="txtPwd" type="password" size="25">
        <input type="Submit" value="Login">
      </form>
    </td>
  </tr>
  <tr>
    <td align="center" valign="top">
      <p>To log into the system, enter your user name and password
         in the text boxes. Then click the "Login" button.
      </p>
    </td>
  </tr>
</table>
Limit Line Length
Break up lines when they run too long. It is much easier to read and understand code when you can see the entire line at once. When lines of code are so long that the reader must scroll right and left to read them, it requires much more cognitive effort to understand what the code is doing. Alternatively, in some applications, long lines might wrap to the next line at the nearest word break. In either case, source code is much easier to read and understand if the developer takes explicit control of line length.
HTML is not sensitive to line breaks, so the developer can break lines at will between keywords for readability. For example, Listing 4 illustrates a code snippet in which two elements have word-wrapped to the next line because they were too long for the editor window.
Listing 4: HTML Source Code with Uncontrolled Line Breaks
<td valign="Center">
<input type="Text" Length="45" name="txtName" language=
"JavaScript" onclick="return NameValid();"><br>
<input type="Text" Length="35" name="txtAddress"
language="JavaScript" onclick="return AddrValid();">
</td>
Compare this with Listing 5, where the developer took explicit control of line length. Here the code is much easier to read because the developer used line breaks and indenting to visually organize the source code.
Listing 5: HTML Source Code with Explicit Line Breaks
<td valign="Center">
  <input type="Text" Length="45" name="txtName"
         language="JavaScript" onclick="return NameValid();"><br>
  <input type="Text" Length="35" name="txtAddress"
         language="JavaScript" onclick="return AddrValid();">
</td>
Keep the limitations of printed output in mind as well. Lines longer than 80 characters will often wrap in printed output without consideration for word breaks, making source code very difficult to read.
Standardize Character Case
Source code is easier to read if the developer has applied a consistent set of rules for the use of character case, for example, the use of lowercase exclusively for HTML tags. When scanning source code, the reader can unconsciously apply a visual filter, focusing attention on the HTML keywords.
The approach taken in code that appears in this article is to use all lowercase letters for HTML tags and the names of their attributes, while using mixed case and a modified form of Hungarian Notation for some attribute values (see the sidebar entitled "Hungarian Notation").
Hungarian Notation
Hungarian Notation is a convention for naming identifiers that adds a prefix to the name to provide information about the type and scope of the identifier. Dr. Charles Simonyi, a Microsoft Chief Architect at the time, introduced Hungarian Notation in the early 1980s. Long an internal Microsoft standard, variants of the convention have been widely adopted outside of Microsoft as well.
As an example of a simplified Hungarian Notation scheme, variables that contain a string could be prefixed with the character s, and a variable with global scope could be indicated with a g prefix. In this case, then, the variables sTemp and gsName in source code would be immediately identifiable as string variables with local and global scope, respectively.
In general, HTML is not a typed language, and Hungarian Notation plays a more important role in other types of Web development. However, in some cases it can add to readability. For example, the names or IDs of form elements are likely candidates for a modified form of Hungarian Notation. The prefix "btn" or "cmd" might be used for an input button. Text boxes might be prefixed with "txt," and check boxes might be prefixed with "chk" or "cb."
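The prefixes described above might be applied to a form as in the following sketch. The form itself, the element names (frmProfile, txtName, chkSubscribe, btnSave), and the profile.asp target are all hypothetical.

```html
<form name="frmProfile" action="profile.asp">
  <!-- txt = text box, chk = check box, btn = button -->
  <b>Name: </b><input type="text" name="txtName" size="25"><br>
  <b>Subscribe? </b><input type="checkbox" name="chkSubscribe"><br>
  <input type="submit" name="btnSave" value="Save">
</form>
```

Whichever prefixes a team settles on, the benefit comes from applying one set consistently, for example choosing either "chk" or "cb" for check boxes but not both.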
Use Comments Judiciously
Good comments can be invaluable for understanding and maintaining code. However, the unique nature of HTML introduces a trade-off between the value of thorough comments and the efficiency of the Web application.
The Web server reads in the HTML code and sends it as a stream of text over the network to the browser. Only after arriving at the client does the browser parse and interpret the HTML code, displaying the visible elements and ignoring the comments. The obvious implication is that the comments add nothing to the document as the browser displays it, yet they add to the processing overhead on both the server and client computers, and they increase the amount of data transferred. With almost 50 percent comments, Listing 6 illustrates what is probably excessively commented code.
Listing 6: Heavily Commented HTML Code
<!-- Form for input of security groups -->
<form name="Form1" action="http://www.mydomain.com/input.asp">
  <!-- Is user a member? -->
  <b>Member? </b><input type="Checkbox" name="CB1"><br>
  <!-- Is user an administrator? -->
  <b>Admin? </b><input type="Checkbox" name="CB2"><br>
  <!-- Is user an owner? -->
  <b>Owner? </b><input type="Checkbox" name="CB3"><br>
  <!-- Send form contents to input.asp -->
  <input type="Submit" Value="Submit">
  <!-- Clear the current form contents -->
  <input type="Reset" Value="Clear">
</form>
The trick is to find an appropriate level of commenting that balances these two issues. It is a good idea to comment the major logical flow and document sections to help readers quickly gain an overview of the code. Also comment dependencies and assumptions. Consistently following the other design and coding guidelines as suggested in this article, especially the ones related to naming and metadata, will help create self-documenting code.
Listing 7 illustrates how fewer comment lines and more descriptive element names can combine to provide effective documentation with a lot less overhead.
Listing 7: Lightly Commented HTML Code
<!-- Form for input of security groups -->
<form name="frmSecurityGroups" action="http://www.mydomain.com/input.asp">
  <b>Member? </b><input type="Checkbox" name="cbIsMember"><br>
  <b>Admin? </b><input type="Checkbox" name="cbIsAdministrator"><br>
  <b>Owner? </b><input type="Checkbox" name="cbIsOwner"><br>
</form>
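Listing 7 shows a section-level comment, but not the advice to comment dependencies and assumptions. One possible way to do that is sketched below. The form, the LoginValid() function, and the login.asp target are hypothetical and serve only to show where such a comment might sit.

```html
<!-- Login form.
     Depends on: login.asp, which is assumed to read txtLogin and txtPwd.
     Assumes: LoginValid() is defined by a script loaded earlier in the page. -->
<form name="frmLogin" action="login.asp" onsubmit="return LoginValid();">
  <b>Login: </b><input name="txtLogin" type="text" size="25"><br>
  <b>Password: </b><input name="txtPwd" type="password" size="25"><br>
  <input type="Submit" value="Login">
</form>
```

A single comment block of this kind records what the descriptive names cannot express on their own, while adding only a few lines to the data sent to the browser.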
Conclusion
This article concludes our introduction to HTML with a presentation of some valuable guidelines for working with HTML documents and code that will help maximize their maintainability and reusability. Of central importance is the need to understand HTML and its role in Web applications, to plan ahead for maintainable and reusable code, and to adopt a consistent policy on coding style.
The next article in this series kicks off our exploration of Cascading Style Sheets, a method for specifying and encapsulating display rules that can be used to modify the appearance and behavior of the Web page elements.