An Introduction to Web Publishing in HTML
- What HTML Is—And What It Isn't
- The Current Standard: XHTML 1.0
- What HTML Files Look Like
- Using Cascading Style Sheets
- Programs to Help You Write HTML
- Summary
- Workshop
After finishing up the discussions about the World Wide Web and getting organized, with a large amount of text to read and concepts to digest, you're probably wondering when you're actually going to get to write a Web page. That is, after all, why you bought the book. Wait no longer! Today, you get to create your very first (albeit brief) Web page, learn about HTML (the language for writing Web pages), and learn about the following:
- What HTML is and why you have to use it
- What you can and cannot do when you design HTML pages
- What HTML tags are and how to use them
- How you can use style sheets to control the look and feel of your pages
What HTML Is and What It Isn't
Take note of just one more thing before you dive into actually writing Web pages. You should know what HTML is, what it can do, and, most importantly, what it can't do.
HTML stands for Hypertext Markup Language. HTML is based on the Standard Generalized Markup Language (SGML), a much larger document-processing system. To write HTML pages, you won't need to know a whole lot about SGML. However, it does help to know that one of the main features of SGML is that it describes the general structure of the content inside documents, rather than its actual appearance on the page or onscreen. This concept might be a bit foreign to you if you're used to working with WYSIWYG (What You See Is What You Get) editors, so let's go over the information carefully.
HTML Describes the Structure of a Page
HTML, by virtue of its SGML heritage, is a language for describing the structure of a document, not its actual presentation. The idea here is that most documents have common elements (for example, titles, paragraphs, and lists). Before you start writing, therefore, you can identify and define the set of elements in that document and give them appropriate names (see Figure 3.1).
Figure 3.1 Document elements.
If you've worked with word processing programs that use style sheets (such as Microsoft Word) or paragraph catalogs (such as FrameMaker), you've done something similar; each section of text conforms to one of a set of styles that are predefined before you start working.
HTML defines a set of common styles for Web pages: headings, paragraphs, lists, and tables. It also defines character styles such as boldface and code examples. These styles are indicated inside HTML documents using tags. Each tag has a specific name and is set off from the content of the document using a notation that I'll get into a bit later.
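As a rough preview (the tag notation itself is explained later), the common elements just listed might be marked up something like the following. This is only a sketch: the content is invented for illustration, and the tags shown are the standard heading, paragraph, list, boldface, and code tags.

```html
<!-- Each element is identified by name only; nothing here says how it looks -->
<h1>Getting Started</h1>
<p>This paragraph contains <b>boldface text</b> and a short
code example: <code>print "hello"</code>.</p>
<ul>
  <li>First list item</li>
  <li>Second list item</li>
</ul>
```

Notice that the markup names each piece of the document but includes no fonts, sizes, or colors.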
HTML Does Not Describe Page Layout
When you're working with a word processor or page layout program, styles are not just named elements of a page; they also include formatting information such as the font size and style, indentation, underlining, and so on. So, when you write some text that's supposed to be a heading, you can apply the Heading style to it, and the program automatically formats that paragraph for you in the correct style.
HTML doesn't go this far. For the most part, HTML doesn't say anything about how a page looks when it's viewed. HTML tags just indicate that an element is a heading or a list; they say nothing about how that heading or list is to be formatted. So, as in the magazine example, in which a layout person formats your article, it's the layout person's job to decide how big the heading should be and what font it should be in. The only thing you have to worry about is marking which section is supposed to be a heading.
NOTE
Although HTML doesn't say much about how a page looks when it's viewed, cascading style sheets (CSS) enable you to apply advanced formatting to HTML tags. Many changes in HTML 4.0 favor the use of CSS. And XHTML, which is the current version of HTML, eliminates almost all tags that are associated with formatting in favor of CSS. I'll talk about both XHTML and CSS later today.
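To give a feel for that division of labor, here is a minimal sketch of a style sheet rule applied to a heading. The font, size, and color are arbitrary choices made up for this illustration, not recommendations.

```html
<style type="text/css">
  /* The formatting lives in the style sheet rule, not in the tag */
  h1 { font-family: Georgia, serif; font-size: 24pt; color: navy; }
</style>

<!-- The markup still says only "this is a heading" -->
<h1>Getting Started</h1>
```

The heading tag stays purely structural; changing the rule changes the appearance without touching the content.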
Web browsers, in addition to providing the networking functions to retrieve pages from the Web, double as HTML formatters. When you read an HTML page into a browser such as Netscape or Internet Explorer, the browser interprets, or parses, the HTML tags and formats the text and images on the screen. The browser has mappings between the names of page elements and actual styles on the screen; for example, headings might be in a larger font than the text on the rest of the page. The browser also wraps all the text so that it fits into the current width of the window.
Different browsers running on diverse platforms might have various style mappings for each page element. Some browsers might use different font styles than others. For example, a browser on a desktop computer might display italics as italics, whereas a handheld device or mobile phone might use reverse text or underlining on systems that don't have italic fonts. Or it might put a heading in all capital letters instead of a larger font.
What this means to you as a Web page designer is that the pages you create with HTML might look radically different from system to system and from browser to browser. The actual information and links inside those pages are still there, but the onscreen appearance changes. You can design a Web page so that it looks perfect on your computer system, but when someone else reads it on a different system, it might look entirely different (and it might very well be entirely unreadable).
NOTE
In practice, most HTML tags are rendered in a fairly standard manner, on desktop computers at least. When the earliest browsers were written, somebody decided that links would be underlined and blue, visited links would be purple, and emphasized text would appear in italics. They also made similar decisions about every other tag. Since then, pretty much every browser maker has followed that convention to a greater or lesser degree. These conventions blurred the line separating structure from presentation, but in truth it still exists, even if it's not obvious.
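As a small illustration of these conventions, consider the following fragment. The address is a placeholder, and the rendering described here is the typical desktop default rather than a guarantee.

```html
<!-- Nothing in this markup asks for blue, underlining, or italics -->
<p>Read the <a href="http://www.example.com/">full article</a> for
<em>much</em> more detail.</p>
```

Most desktop browsers would show the link underlined in blue and the emphasized word in italics, even though the markup requests neither; those choices come from the browser's own style mappings.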
Why It Works This Way
If you're used to writing and designing documents that will wind up printed on paper, this concept might seem almost perverse. No control over the layout of a page? The whole design can vary depending on where the page is viewed? This is awful! Why on earth would a system work like this?
Remember in Day 1, "The World of the World Wide Web," when I mentioned that one of the cool things about the Web is that it's cross-platform and that Web pages can be viewed on any computer system, on any size screen, with any graphics display? If the final goal of Web publishing is for your pages to be readable by anyone in the world, you can't count on your readers having the same computer systems, the same size screens, the same number of colors, or the same fonts that you have. The Web takes into account all these differences and enables all browsers and all computer systems to be on equal ground.
The Web, as a design medium, is not a new form of paper. The Web is an entirely different medium, with its own constraints and goals that are very different from working with paper. The most important rules of Web page design, as I'll keep harping on throughout this book, are the following:
- Do design your pages so that they work in most browsers.
- Do focus on clear, well-structured content that's easy to read and understand.
- Don't design your pages based on what they look like on your computer system and on your browser.
Throughout this book, I'll show you examples of HTML code and what they look like when displayed. In examples in which browsers display code very differently, I'll give you a comparison of how a snippet of code looks in two very different browsers. Through these examples, you'll get an idea for how different the same page can look from browser to browser.
NOTE
Although this rule of designing by structure and not by appearance is the way to produce good HTML, when you surf the Web, you might be surprised that the vast majority of Web sites seem to have been designed with appearance in mind, usually appearance in a particular browser such as Microsoft Internet Explorer. Don't be swayed by these designs. If you stick to the rules I suggest, in the end, your Web pages and Web sites will be even more successful simply because more people can easily read and use them.
HTML Is a Markup Language
HTML is a markup language. Writing in a markup language means that you start with the text of your page and add special tags around words and paragraphs. The tags indicate the different parts of the page and produce different effects in the browser. You'll learn more about tags and how they're used in the next section.
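Here is a minimal sketch of that process. The comment at the top holds the plain text you might start with (invented for this illustration), and the lines below it show the same text after tags have been added around the title, the paragraph, and two individual words.

```html
<!-- The plain text before markup:
     Shopping List
     Remember to buy milk and eggs today.
-->
<h1>Shopping List</h1>
<p>Remember to buy <em>milk</em> and <em>eggs</em> today.</p>
```

The words themselves are unchanged; the tags only label which part of the page each piece of text belongs to.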
HTML has a defined set of tags you can use. You can't make up your own tags to create new appearances or features. And just to make sure that things are really confusing, various browsers support different sets of tags. To further explain this, take a brief look at the history of HTML.
A Brief History of HTML Tags
The base set of HTML tags, the lowest common denominator, is referred to as HTML 2.0. HTML 2.0 is the old standard for HTML (a written specification for it is developed and maintained by the W3C) and the set of tags that all browsers must support. In the next few days, you'll primarily learn to use tags that were first introduced in HTML 2.0.
The HTML 3.2 specification was developed in early 1996. Several software vendors, including IBM, Microsoft, Netscape Communications Corporation, Novell, SoftQuad, Spyglass, and Sun Microsystems, joined with the W3C to develop this specification. Some of the primary additions to HTML 3.2 included features such as tables, applets, and text flow around images. HTML 3.2 also provided full backward-compatibility with the existing HTML 2.0 standard.
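For a sense of what two of those additions look like, here is a brief sketch of text flowing around an image and of a simple table. The image filename and the cell contents are hypothetical placeholders.

```html
<!-- The align attribute lets the following text flow around the image -->
<p><img src="photo.gif" alt="A sample photo" align="left" />
This text wraps along the right side of the image.</p>

<!-- A simple table with one heading row and one data row -->
<table border="1">
  <tr><th>Column 1</th><th>Column 2</th></tr>
  <tr><td>Cell A</td><td>Cell B</td></tr>
</table>
```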
NOTE
The enhancements introduced in HTML 3.2 are covered later in this book. You'll learn more about tables in Day 8, "Tables." Day 12, "Multimedia: Adding Sounds, Videos, and More," tells you how to use Java applets.
HTML 4.0, first introduced in 1997, incorporated many new features that gave you greater control than HTML 2.0 and 3.2 in how you designed your pages. Like HTML 2.0 and 3.2, the HTML 4.0 standard is maintained by the W3C.
Framesets (originally introduced in Netscape 2.0) and floating frames (originally introduced in Internet Explorer 3.0) became an official part of the HTML 4.0 specification. Framesets are discussed in more detail in Day 15, "Working with Frames and Linked Windows." HTML 4.0 also brought additional improvements to table formatting and rendering. By far, however, the most important change in HTML 4.0 was its increased integration with style sheets.
NOTE
If you're interested in how HTML development is working and just exactly what's going on at the W3C, check out the pages for HTML at the Consortium's site at http://www.w3.org/pub/WWW/MarkUp/.
In addition to the tags defined by the various levels of HTML, individual browser companies also implement browser-specific extensions to HTML. Netscape and Microsoft are particularly guilty of creating extensions, and they offer many new features unique to their browsers.
Confused yet? You're not alone. Even Web designers with years of experience and hundreds of pages under their belts have to struggle with the problem of which set of tags to choose. They must strike a balance between wide support for a design (using HTML 3.2- and 2.0-level tags) and more flexibility in layout but less consistency across browsers (using HTML 4.0 or specific browser extensions). Keeping track of all this information can be really confusing. Throughout this book, as I introduce each tag, I'll let you know which version of HTML the tag belongs to, how widely supported it is, and how to use it to best effect in a wide variety of browsers.