How to Improve Data Quality
"Virtually everything in business today is an undifferentiated commodity, except how a company manages its information. How you manage information determines whether you win or lose."
—Bill Gates
Everybody wants better data quality. Some organizations hope to improve data quality by moving data from legacy systems to enterprise resource planning (ERP) and customer relationship management (CRM) packages. Other organizations use data profiling or data cleansing tools to unearth dirty data and then cleanse it with an extract/transform/load (ETL) tool for data warehouse (DW) applications. All of these technology-oriented data quality improvement efforts are commendable, and they are definitely a step in the right direction. However, technology solutions alone cannot eradicate the root causes of poor-quality data, because poor data quality is not so much an IT problem as a business problem.
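To make the profiling-then-cleansing idea concrete, here is a minimal sketch in Python, assuming customer records land in a pandas DataFrame. The column names, formats, and cleansing rules are hypothetical illustrations; real profiling and ETL tools apply far richer rule sets:

```python
import pandas as pd

# Hypothetical customer extract; column names and values are
# illustrative assumptions, not output from any particular tool.
customers = pd.DataFrame({
    "cust_id":  [101, 102, 102, 104],
    "zip_code": ["60614", "6061", None, "60616"],
    "email":    ["a@example.com", "not-an-email", "c@example.com", None],
})

# Profiling: surface the kinds of defects a profiling tool reports.
print("duplicate keys :", customers["cust_id"].duplicated().sum())
print("null rates     :", customers.isna().mean().to_dict())
print("malformed ZIPs :",                       # nulls counted as malformed
      (~customers["zip_code"].str.fullmatch(r"\d{5}", na=False)).sum())

# Cleansing: the sort of rules an ETL tool would apply downstream.
cleansed = customers.drop_duplicates(subset="cust_id", keep="first").copy()
cleansed["email"] = cleansed["email"].where(
    cleansed["email"].str.contains("@", na=False)  # invalid emails become NaN
)
```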
Other enterprise-wide disciplines must be developed, taught, implemented, and enforced to improve data quality in a holistic, cross-organizational way. Because data quality improvement is a process and not an event, the following enterprise-wide disciplines should be phased in and improved upon over time:
- A stronger personal involvement by management
- High-level leadership for data quality
- New incentives
- New performance evaluation measures
- Data quality enforcement policies
- Data quality audits
- Additional training for data owners and data stewards about their responsibilities
- Data standardization rules (illustrated, along with an audit check, in the sketch after this list)
- Metadata and data inventory management techniques
- A common data-driven methodology
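To make the more technical items above concrete, the sketch below shows one way a data standardization rule and a policy-driven audit check might be expressed in code. The phone-number field, the formats, and the 2% defect threshold are assumptions chosen for illustration, not recommendations from any particular methodology:

```python
import re

# Hypothetical standardization rule: store U.S. phone numbers as 10 digits.
def standardize_phone(raw: str | None) -> str | None:
    """Strip punctuation, drop a leading country code, reject the rest."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the U.S. country code
    return digits if len(digits) == 10 else None

# Hypothetical audit check: fail a feed whose defect rate breaches policy.
def audit_phone_field(values: list[str], max_defect_rate: float = 0.02) -> bool:
    defects = sum(1 for v in values if standardize_phone(v) is None)
    rate = defects / len(values) if values else 0.0
    print(f"defect rate: {rate:.1%} (threshold {max_defect_rate:.0%})")
    return rate <= max_defect_rate

# Two of these four records fail the rule, so the audit flags the feed.
audit_phone_field(["(312) 555-0147", "1-312-555-0198", "555-0199", "n/a"])
```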
Current State of Data Quality
A common example of data quality problems occurs when you try to speak with a customer service representative (CSR) at a bank, credit card company, or telephone company. An automated voice response system prompts you to key in your account number before passing your call to a CSR. When a person finally answers the call, you are asked to repeat your account number because the system did not pass it along. Where did the keyed-in data go?
A more serious data quality problem surfaced in 2003, when a report revealed that the federal General Accounting Office (GAO) could not tell how many H-1B visa holders were working in the U.S. The GAO was missing key data, and its systems were not integrated. This posed a major challenge to the Department of Homeland Security, which was trying to track all visa holders in the U.S.
According to Gartner, Inc., Fortune 1000 enterprises may lose more money through operational inefficiency caused by data quality issues than they spend on data warehouse and CRM initiatives. In 2003, The Data Warehousing Institute (TDWI) estimated that data quality problems cost U.S. businesses $600 billion each year.
At an Information Quality Conference in 2002, a telecom company revealed that it had recovered over $100 million in "scrap and rework" costs, a bank claimed to have recovered $60 million, and a government agency recovered $28.8 million on an initial investment of $3.75 million, nearly an eightfold return. Clearly, organizations and government agencies are slowly realizing that data quality is not optional.
Many companies realize that they did not pay sufficient attention to data while developing systems over the last few decades. Delivery schedules have been shrinking while project scopes have been growing, and companies have been struggling to implement applications in a timeframe acceptable to their business community. Because a day has only 24 hours, something has to give, and what usually gives is quality, especially data quality.