I have worked on many projects over the years, and the success or failure of those projects boils down to one thing: data. One of the most underestimated parts of a project is gathering and cleaning the data. Management is focused on the final reporting, sponsors are focused on the cost, IT is focused on the technology, and users are focused on the interface, but none of these things can work correctly without clean, quality data sources.

Why Is the Data So Important?

A question that should always be asked at the start of a project is, “Why is this system being built?” It might be used to input sales, or ship orders, or record service time, but what is its ultimate benefit to the business? Its benefit should be to provide metrics for business leaders to make decisions. Sure, it may have other important functionality, but one reason to automate a process is to get the data business leaders need to answer questions like: What areas or products have our highest profit margin? Who is selling the most products? Who are our best customers? Is customer service treating the customer right? In order to answer these questions, quality data must be captured. Incorrect or incomplete data will only produce inaccurate analysis and reporting. Inaccuracies in reporting can cause business leaders to make decisions that lower company profits or lead the business in the wrong direction. Even a small misstep can take a lot of time and manpower to correct.

Data may come from many sources, but can be categorized into three areas. Depending on the data integrity of the sources, there may be different levels of cleaning that need to be performed to achieve “grade A” data.

Common Data Sources

1. External to company – Records are sent from a vendor or partner company supplying details on orders or sales. Make sure that records contain all the fields needed, the data is clean, complete, correctly ordered and there are keys to join this data to other related datasets. It may be difficult to get corrections made to data at this source.

2. Internal to company – Records are sent from another department or division within the company. These may supply manufacturing costs, quantity produced/sold, sales or client information. Make sure that records contain all the fields needed, the data is clean, correctly ordered and there are keys to join this data to other related datasets. It should be easier to work with these source owners if corrections are needed.

3. Data entry – Records are keyed in by customer service or the sales department through applications or tools like Salesforce. This source is one of the most difficult to get clean, consistent data from. Application changes may be needed to ensure clean entry, and these can cause delays because changes may take months to complete.
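The checks described for each source above (all needed fields present, no blank values, keys that join to related datasets) can be automated before the data is loaded. Here is a minimal sketch in Python, assuming the records arrive as dictionaries. The field names (`order_id`, `customer_id`, `amount`) and the `validate_records` helper are hypothetical and only illustrate the idea; a real feed would define its own rules.

```python
from typing import Iterable

# Hypothetical required fields; a real feed would define its own list.
REQUIRED_FIELDS = ("order_id", "customer_id", "amount")


def is_blank(value) -> bool:
    """Treat missing values and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def validate_records(records: Iterable[dict], known_customer_ids: set) -> list:
    """Return (record_index, problem) tuples for records that fail basic checks."""
    problems = []
    for index, record in enumerate(records):
        # Completeness: every required field must be present and non-blank.
        for field in REQUIRED_FIELDS:
            if is_blank(record.get(field)):
                problems.append((index, f"missing {field}"))
        # Join key: the customer must exist in the related dataset.
        customer_id = record.get("customer_id")
        if not is_blank(customer_id) and customer_id not in known_customer_ids:
            problems.append((index, f"unknown customer_id {customer_id}"))
    return problems


# Made-up sample data for illustration only.
incoming = [
    {"order_id": "A1", "customer_id": "C1", "amount": "19.99"},
    {"order_id": "A2", "customer_id": "C9", "amount": "5.00"},
    {"order_id": "A3", "customer_id": "C1", "amount": ""},
]
issues = validate_records(incoming, known_customer_ids={"C1", "C2"})
for index, problem in issues:
    print(f"record {index}: {problem}")
```

Running a check like this on the first sample file from each source is one way to find the problems early, while there is still time for the source owner to make corrections.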

Getting the data sources lined up can be very time consuming if changes are required. Source systems may take months to accommodate a change request. Find the problems and start addressing them early.

One common solution to data problems is to push them off until later in the project because deliverables and timelines have to be met. “We’ll come back and fix that later,” or “It’s not that big of a problem. It will be fixed in phase 2,” are often costly oversights on management’s part. When that data issue is not fixed or the project is rushed into production early, these seemingly little issues can cause major problems. Below are a few examples that will hopefully drive my point home.

A Costly Problem

Applications that are used to enter data into a system should be thoroughly tested to make sure validation and record creation are correct. It’s difficult, time consuming, and expensive to fix data issues created months ago by an application bug.

A few years ago I went to a client to help resolve problems in a CRM billing system. While working on the issue, I uncovered a previously unknown bug: the flag that tracked the progression of a service ticket through the system was being set incorrectly, causing tickets to be lost in the system. After the application bug and records were fixed, it was discovered that $75,000 worth of service tickets had not been billed. Many of the tickets were no longer eligible to bill, but $57,000 was successfully recouped.

This problem could have been avoided with better testing, but also with a report tracking ticket creation, billing and settlement. Dirty data can be a problem affecting decision-making, but lost data can quickly affect the bottom line.
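A tracking report of this kind is essentially a reconciliation: every ticket that was created should eventually show up in billing. The sketch below illustrates the idea in Python. The ticket IDs and amounts are made up, and `unbilled_tickets` is a hypothetical helper, not code from the client's system.

```python
def unbilled_tickets(created: dict, billed_ids: set) -> dict:
    """Return the tickets that were created but never reached billing."""
    return {
        ticket_id: amount
        for ticket_id, amount in created.items()
        if ticket_id not in billed_ids
    }


# Made-up sample data: ticket id -> amount in whole dollars.
created_tickets = {"T1": 1200, "T2": 800, "T3": 450}
billed_ticket_ids = {"T1", "T3"}

missing = unbilled_tickets(created_tickets, billed_ticket_ids)
print(f"{len(missing)} unbilled ticket(s), total ${sum(missing.values())}")
```

Run on a schedule, a report like this surfaces lost tickets while they are still eligible to bill.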

Not Quite There Yet

Another time, I went to work on a project for a charitable organization that was running a $300 million fund drive over a number of years. They were moving the system from a mainframe to a client-server platform. I had to read through the old program code and create a new reporting system. While reviewing the old code (which had been in place for years), I found an issue in the way records were categorized and totaled. The bug was causing certain types of donations to be counted twice. The result of this little bug was that reporting was overstating donations by $3 million.
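A simple control total can catch this kind of double counting: the sum of the category totals should equal the sum of the individual donation records. The sketch below shows the check in Python with made-up donation IDs, amounts, and category names. It illustrates the idea only and does not reproduce the original mainframe logic.

```python
from collections import Counter


def double_counted(category_members: dict) -> list:
    """Return donation ids that are counted more than once across the category totals."""
    counts = Counter(
        donation_id
        for ids in category_members.values()
        for donation_id in ids
    )
    return sorted(donation_id for donation_id, n in counts.items() if n > 1)


# Made-up sample data: donation id -> amount in whole dollars.
donations = {"D1": 100, "D2": 250, "D3": 50}
# Hypothetical categorization in which D2 lands in two categories.
categories = {"pledges": ["D1", "D2"], "gifts": ["D2", "D3"]}

reported_total = sum(donations[d] for ids in categories.values() for d in ids)
actual_total = sum(donations.values())
duplicates = double_counted(categories)

print(f"reported ${reported_total}, actual ${actual_total}, "
      f"overstated by ${reported_total - actual_total}")
print(f"counted more than once: {duplicates}")
```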

A Taxing Situation

I joined the project team of a client who was writing their own payroll system so they would not have to lease costly software. The deadline was getting close and the staff was working hard to meet it. I worked on the reporting team. The new reporting was used to validate the system, and any problem was attributed to the reporting until it could be traced back to data entry or system processing. Once a problem was located, it was quickly fixed. The bigger problem was that management had already decided to start using the system and had begun running payrolls with it. Things hummed along well and the system was up and working … until tax time came around. When the client went to run W2s for its clients, they discovered that although the bugs had been fixed in the system, often the data had not. Thousands of W2s were incorrect, and the staff worked around the clock to fix data problems, rerun months of payroll, and reproduce thousands of W2s. I don’t know what the total cost was to the company, but it did cost a few jobs.

I hope you can see why data quality is so important and should not be left as a bottom task on the project plan. Quality source data is a critical factor for a successful system. Management is counting on the correct reporting and analysis to guide the direction of the company. Just remember the old adage: garbage in = garbage out.

