De-duplication of master data can take weeks or even months, even with the help of a master data cleansing company. Although accuracy and quality of the final results rank higher on the priority list than duration (and even cost), time still matters. Here are a few pieces of advice on which aspects to pay attention to in order to speed the process up and reach a positive outcome as fast as possible.
Duplicates? Why Bother About Them?
A customer was reached by a company’s targeted advertising campaign. He paid for a product and asked to have it delivered to his home.
A couple of weeks later, he contacted the company’s support service because he had issues with the product. Following company policy, the support service forwarded the customer’s feedback to the production unit so the product could be revised and improved.
So the customer’s name now sits in at least five databases: marketing, sales, logistics, support, and production. That is how duplicates occur.
Over time, without a centrally governed process for capturing and distributing data, this duplication becomes a problem. Even if a record is stored in the same way and the same format across the enterprise’s information systems, and is therefore relatively easy to detect and delete, it still consumes storage and staff data-entry effort.
In reality, though, a record usually exists in various forms across multiple databases, which leads to errors and a flawed analytical picture when a company tries to draw insights from its client databases.
In our case, each department may write down the customer’s name differently if no corporate rules for data entry exist. As a result, the marketing department might thank the wrong consumer for feedback that the production unit found highly valuable, simply because the production department identified him differently than the marketing department did.
Companies increasingly want their databases to be free of duplicates: the inflow of data into corporate systems has surged dramatically compared with just 20 years ago and continues to rise. Sooner or later, a company finds itself buried under an enormous number of incorrect and inconsistent records.
What Does Removal Of Duplicates Look Like?
Since master data management (MDM) emerged as an important management discipline, many standard tools for removing duplicate data across an enterprise have come to market.
While these tools are standard, meaning they can be used in various organizations, every case is unique and requires a strategy and tactics designed on an individual basis.
These strategies and tactics take into account an organization’s structure, business goals, the locations of its departments, its software, and other characteristics. So our first piece of advice on how to get rid of duplicates fast is a thorough audit, which will answer critical questions and help design the most efficient process.
At this stage, you should deeply engage all stakeholders, from data stewards to employees who regularly use the databases to executive-level managers. You need a communication plan that tells the stakeholders in detail which operations will be performed, when, and what results each group of users should expect. Absent or insufficient communication leads to failure, or at the very least delays.
Then the technical operations begin. A typical set includes comparing data, matching it, removing what is identified as duplicates, and then consolidating the rest.
Comparing and matching are performed with automated tools whose algorithms apply a set of identification criteria (in Synopps’s case, these include measurements converted into base units, item classes with synonyms taken into account, and part numbers with separator symbols stripped out).
For instance, Synopps’s software extracts the attributes of a record and compares them with the attributes of other records to find shared characteristics. If the characteristics are identical, we have a duplicate.
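As a rough illustration of this kind of attribute comparison, here is a minimal Python sketch. The field names, normalization rules, and tolerance are hypothetical examples, not Synopps’s actual criteria:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    part_number: str
    item_class: str
    length_mm: float  # measurement already converted to a base unit


def normalize_part_number(pn: str) -> str:
    """Strip separator symbols so 'AB-123/4' and 'ab1234' compare equal."""
    return "".join(ch for ch in pn if ch.isalnum()).upper()


def is_duplicate(a: Item, b: Item) -> bool:
    """Two records are duplicates when every compared attribute matches."""
    return (
        normalize_part_number(a.part_number) == normalize_part_number(b.part_number)
        and a.item_class == b.item_class
        and abs(a.length_mm - b.length_mm) < 1e-6  # tolerate float noise
    )


print(is_duplicate(Item("AB-123/4", "bolt", 25.0), Item("ab1234", "bolt", 25.0)))  # True
```

Real MDM tools apply many more criteria and weight them, but the principle is the same: normalize each attribute, then compare records attribute by attribute.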
Duplicates can be deleted both automatically and manually. In most cases the two methods are combined, as even sophisticated algorithms cannot guarantee absolute accuracy in recognizing identical data.
Unfortunately, not all duplicates are exact; there are also partial matches. For instance, an algorithm may not be able to tell whether Dr. Strange, James Strange, and Jim Strange are the same person, especially when some information, such as a home address, is missing or misspelled in some of the records.
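This is where fuzzy matching comes in. A simple sketch using Python’s standard-library `difflib` shows the idea; the similarity thresholds here are illustrative, and production systems use more robust metrics and rules:

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1] based on common subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


reference = "James Strange"
for candidate in ["James Strange", "Jim Strange", "Dr. Strange"]:
    score = name_similarity(reference, candidate)
    # Illustrative policy: merge confident matches automatically,
    # route borderline scores to a human reviewer.
    if score > 0.9:
        verdict = "auto-merge"
    elif score > 0.6:
        verdict = "manual review"
    else:
        verdict = "treat as distinct"
    print(f"{candidate!r}: {score:.2f} -> {verdict}")
```

Partial matches like these are exactly the cases that end up in the manual half of a combined automatic-plus-manual workflow.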
Consolidated records are data cleaned of duplicates, merged, and stored in one place. That place can serve as a single source of truth, the so-called “golden record,” which in future helps validate all new information flowing into the company, correcting it automatically or prompting staff who enter data manually.
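Consolidation can be sketched as a survivorship rule applied across a group of duplicates. In this hypothetical example the rule is simply “first non-empty value wins”; real projects usually rank sources by trustworthiness or recency instead:

```python
def consolidate(duplicates: list[dict]) -> dict:
    """Merge a group of duplicate records into one 'golden record'.

    Survivorship rule (illustrative): for each field, keep the first
    non-empty value encountered across the duplicates.
    """
    golden: dict = {}
    for record in duplicates:
        for field, value in record.items():
            if value and not golden.get(field):
                golden[field] = value
    return golden


duplicates = [
    {"name": "Jim Strange", "email": "", "address": "12 Main St"},
    {"name": "James Strange", "email": "j.strange@example.com", "address": ""},
]
print(consolidate(duplicates))
```

The merged record combines the address known only to one department with the email known only to another, which is precisely what makes the golden record more complete than any single source.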
Speeding up de-duplication also comes down to automation. Algorithms designed around the customer’s specific needs can sometimes make the whole process several times faster.
Creating new software, or tuning existing tools to fit the company’s de-duplication needs, as we do at Synopps, often removes the human factor from the process. That, in turn, not only shortens the whole project but also reduces the risk of human error.