Quick-and-Dirty Data Cleaning
You can do a pretty good job of cleaning data with just Microsoft Word or Microsoft Excel by using their search, sort, and filter features. For example, if you have an address list in Excel, sorting the list by street number places most duplicate addresses on adjacent rows, where they are easy to spot. A similar sort by last name will turn up many of the duplicate names.
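For readers who would rather script the same idea, here is a minimal Python sketch of the sort-then-scan approach. The sample rows and the function names address_key and adjacent_duplicates are made up for illustration, and the sketch assumes every street number is a plain integer.

```python
# Hypothetical sample rows: (last_name, street_number, street_name)
rows = [
    ("Smith", "12", "Oak St"),
    ("Jones", "40", "Elm Ave"),
    ("Smith", "12", "oak st"),
    ("Lee", "7", "Pine Rd"),
    ("Jones", "40", "Elm Ave"),
]

def address_key(row):
    """Sort key: numeric street number, then the lowercased street name."""
    _, number, street = row
    return (int(number), street.strip().lower())

def adjacent_duplicates(rows):
    """Sort by address, then return pairs of neighbouring rows with equal keys."""
    ordered = sorted(rows, key=address_key)
    return [
        (prev, cur)
        for prev, cur in zip(ordered, ordered[1:])
        if address_key(prev) == address_key(cur)
    ]

duplicates = adjacent_duplicates(rows)
for prev, cur in duplicates:
    print(prev, "<->", cur)
```

Like the Excel sort, this only catches duplicates that land next to each other, so an address entered as "Oak Street" in one row and "Oak St" in another would still slip through.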
Word’s find-and-replace feature is a fast way to eliminate duplicate spaces in documents. Replacing two spaces with a single space quickly fixes entries whose extra spaces are hard to find by manually scanning the list. Each pass only shortens a long run of spaces, so run the find-and-replace operation repeatedly, until Word finds nothing more to replace, to weed out cases in which more than two spaces have been inserted.
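The repeated find-and-replace can be imitated in a few lines of Python. This is only a sketch: the function names and the sample string are invented for illustration. The first function repeats the two-spaces-to-one replacement until no double space remains, and the second does the same job in a single pass with a regular expression.

```python
import re

def collapse_by_repeated_replace(text):
    """Replace two spaces with one until no double space is left.

    Returns the cleaned text and the number of passes it took.
    """
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")
        passes += 1
    return text, passes

def collapse_with_regex(text):
    """Collapse any run of two or more spaces into a single space in one pass."""
    return re.sub(r" {2,}", " ", text)

# Hypothetical entry with a run of two spaces and a run of five spaces
sample = "12  Oak     St"
cleaned, passes = collapse_by_repeated_replace(sample)
print(repr(cleaned), "after", passes, "passes")
print(repr(collapse_with_regex(sample)))
```

The loop needs more than one pass on the sample because each pass roughly halves a long run of spaces, which is the same reason the operation has to be repeated in Word.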
While you can do a lot of this work with Word, Excel is generally better for the purpose, and Microsoft Access is better yet, because Excel and Access allow you to write more elaborate filters. Word is most useful for whole-text search, and a little creativity with the words and phrases you search for makes it more effective. If you want to search multiple documents at once, you can combine them into a single long Word document for search purposes.
While the possibilities are limited only by your imagination, these are pretty primitive tools for cleaning data. If you have a lot of data, or if you need to do elaborate cleaning, consider a data cleaning program.