The Fragility Factor
It may seem that the title of this book is an oxymoron. How can something as ad hoc and unstructured as Web scraping be coupled with something so formal and structured as a pattern? Ideally, the previous discussion of how mashups work under the hood will have made you more comfortable with the technology.
If you think reverse-engineering Web pages still doesn't sound like the type of rock-solid approach that a professional developer should be using, I don't blame you. One of the core tenets of software engineering is that applications should behave in a reliable and predictable manner. Web harvesting—although a great deal more reliable than screen scraping—is inherently unstable if you don't control the Web sites from which you extract data. Because you can't determine when a scrape-based solution might break, you should never employ this approach on a mission-critical system.
That said, if you have the chance to help your firm gain a competitive advantage or reduce costs, even if just for a limited time, you should explore the opportunity. There is nothing wrong with an application that has a short lifespan, so long as you don't create a situation where the cost of remediating or retiring the solution exceeds the achieved benefit. The speed with which mashups can be developed means occasional remediation isn't a time-consuming task. Plus, quick release cycles translate into more chances for exploratory development, which in turn can lead to the discovery of new uses or solutions.
The patterns in this book all adhere to this basic premise. You won't find examples of settling stock trades or sending online payments, even though mashups can facilitate those tasks. It's simply irresponsible to use the technology in this manner. Like any development effort, a mashup solution will require regular maintenance over its lifetime. Unlike with traditional applications, you may not be able to determine the time when this work will be required. Web Service APIs can change, RSS feeds can be restructured, or site redesigns may temporarily toss a monkey-wrench into your application's internal workings. Because of these possibilities, you should implement mashup-based solutions only where you can tolerate temporary downtime that may occur at unexpected intervals.
The fragility score is an ad hoc rating based on a number of factors:
- A mashup pattern that relies on a single Web site (e.g., Infinite Monkeys, Time Series, Feed Factory, Accessibility, API Enabler, Filter, Field Medic) is less fragile because there is only a single point of potential failure.
- A multisite-based pattern (e.g., Workflow, Super Search, Location Mapping, Content Migration) is more fragile with each additional site that it leverages.
- Mashups that employ Web harvesting are generally more fragile than those that use feeds (RSS, Atom). Feeds are, in turn, more fragile than Web Service APIs. APIs are the most stable integration point because they reflect a site's commitment to expose data and functionality.
- Mashups that mine data from "hobby" sources have a greater risk of failing. For example, obtaining local weather data from the U.S. government-funded National Oceanic and Atmospheric Administration's (NOAA) weather site (http://www.nws.noaa.gov/) is probably a safer bet than obtaining the information from your local high school or radio station. For-profit sites may exert legal pressure to halt mashups (see the Sticky Fingers anti-pattern).
- Mashups that use boutique data not widely available on the Internet are at high risk. What are your alternatives if the site suddenly vanishes one day?
Each pattern template described in this book contains a fragility score ranging from 1 glass (the least fragile) to 5 glasses (the most fragile). No pattern receives a score of zero, because even the most rigorously tested mashup-backed application always has some degree of brittleness.
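To make the factors above concrete, here is a minimal sketch of how they might be combined into a rating from 1 to 5. The book presents the score as an ad hoc judgment and gives no formula, so the weights, the `Source` class, and the `fragility_score` function are all illustrative assumptions rather than the method used to rate the patterns.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights: APIs are treated as the most stable integration
# point, feeds as less stable, and Web harvesting as the least stable.
INTEGRATION_RISK = {"api": 0, "feed": 1, "harvest": 2}


@dataclass
class Source:
    name: str
    integration: str        # "api", "feed", or "harvest"
    hobby: bool = False     # run by a hobbyist rather than an institution
    boutique: bool = False  # data not widely available elsewhere


def fragility_score(sources: List[Source]) -> int:
    """Return a rating from 1 (least fragile) to 5 (most fragile)."""
    if not sources:
        raise ValueError("a mashup needs at least one source")
    # Start at 1: no mashup earns a score of zero.
    score = 1
    # Each additional site is another potential point of failure.
    score += len(sources) - 1
    # Count the riskiest integration style in use.
    score += max(INTEGRATION_RISK[s.integration] for s in sources)
    # Hobby and boutique sources each add one point if present at all.
    score += int(any(s.hobby for s in sources))
    score += int(any(s.boutique for s in sources))
    # Clamp to the 1-to-5 "glasses" scale.
    return max(1, min(5, score))
```

Under these assumed weights, a single well-supported API scores 1, while a mashup that harvests several hobby sites quickly reaches the ceiling of 5. Treat the number as a prompt for discussion, not a measurement.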
The fragility score is ultimately intended to encourage thought about mashup stability. It's possible to have five sites in a multisite pattern that change less frequently than an individual Web page used in a single-site pattern. This is particularly true when vendor products and internally created systems are involved. The user interfaces of commercial and in-house applications aren't frequently redesigned. Public Web sites, in contrast, must constantly reinvent themselves in the battle to attract eyeballs.
If you create a mashup-based solution and don't acknowledge that it encapsulates some degree of uncertainty, you are just kidding yourself. Worse, you are deceiving your users, who will not be pleased when the system "mysteriously" fails one day.
In case you think only mashups have this Achilles' heel, keep in mind that any distributed system (which is what a mashup is) contains an inherent level of risk. Each additional component and the infrastructure that connects it represent another potential point of failure. So before you think, "Why the heck would I build something that might break?" consider how you have handled similar situations in the past. You can address many of these fragility issues by thinking about redundancy, monitoring, and notification up front.
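As one way to picture redundancy, monitoring, and notification working together, consider the sketch below. It tries a list of alternative sources in order, checks that each result still has the expected shape, and reports every problem through a caller-supplied callback. The names `fetch_with_fallback`, `is_valid`, and `notify` are hypothetical, and the fetchers are plain callables so the example doesn't depend on any particular site or library.

```python
from typing import Callable, Iterable, Optional, Tuple

Fetcher = Callable[[], dict]


def fetch_with_fallback(
    sources: Iterable[Tuple[str, Fetcher]],
    is_valid: Callable[[dict], bool],
    notify: Callable[[str], None],
) -> Optional[dict]:
    """Return data from the first source that responds with valid data."""
    for name, fetch in sources:
        try:
            data = fetch()
        except Exception as exc:
            # Notification: report the failure instead of hiding it.
            notify(f"{name} failed: {exc}")
            continue
        # Monitoring: a redesigned page or restructured feed often
        # "succeeds" but returns data in an unexpected shape.
        if is_valid(data):
            return data
        notify(f"{name} returned unexpected data; its structure may have changed")
    # Redundancy exhausted: let the caller decide how to degrade.
    notify("all sources failed")
    return None
```

A caller might pass a primary weather source followed by a backup, a validity check such as `lambda d: "temp_f" in d`, and a `notify` function that sends e-mail or writes to a log. The point is that a failure is noticed and announced when it happens, rather than discovered by a user.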