Cleaning Your Web Pages with HTML Tidy
Introduction
After many years and the efforts of countless evangelists, web standards are finally being taken seriously by people who build web pages, or any other kind of HTML document. But badly formed HTML (the kind that doesn't conform to the standards laid down by the World Wide Web Consortium) is still a problem. You've probably seen what I'm talking about all over the web: closing tags that are MIA, deprecated presentational tags like <font> and <center>, and other constructs that break in all but one or two browsers.
So how do you get around the problem of bad HTML? You could use one of the many applications or online services that validate HTML syntax. More often than not, though, these applications and services are good but not great. Most will check HTML but not correct it. If you have a lot of files, you must check each file and make corrections by hand. This takes a lot of time and effort.
Or you could turn to HTML Tidy.
HTML Tidy (hereafter just Tidy) is free software, weighing in at under 500KB, and it doesn't just check HTML files; it fixes the problems it finds, and does a whole lot more. Tidy is an anachronism in the world of the graphical user interface. It's a command-line application, meaning that you have to type a command, usually with a string of options, to get Tidy to run. It sounds like an old-fashioned way of doing things; in fact, it's anything but. The command-line interface gives Tidy a great deal of flexibility.
This tutorial teaches you the basics of working with Tidy. I can't cover all of the aspects of Tidy in this article, but I can give you enough information to set you on the road to mastering the software. You'll learn how to run the program, use Tidy's options at the command line, and use Tidy with configuration files to make your work more efficient. I'll even point you to some web editing software in which Tidy is integrated.
NOTE
This article looks only at using Tidy at the Windows or Linux command line. However, the syntax for other operating systems is the same.