When was the last time you went to import data into a new platform, only to find that the data was a complete mess? For most companies, the answer to this question is “the last time we imported data.”
You might find a third of the names aren’t properly capitalized, there are issues with addresses, some fields aren’t present, or there are styling issues throughout your dataset. Inconsistencies in data are a reality of data collection. They can’t be avoided, particularly when you're relying on humans to enter the data by hand at some point in the process. These reasons (and many others) are why so many companies turn to tools like Insycle to help them clean up their data and install improved data-quality policies.
Unfortunately, trying to clean that data by hand takes much longer than you initially thought it would — often full days or even weeks. In fact, it’s so time-consuming that many businesses find it easier to purge inconsistent records and move on rather than fix issues manually. But losing that data and the valuable insights it provides can be a huge loss.
The better option? Read these tips on how to keep your data consistent so that your HubSpot database is always in tip-top shape. In this article, we’ll outline the steps that you can take to ensure that your data is clean, free of inconsistencies, and ready for importing or sharing among your teams. This step-by-step guide will help you address some of the most common data consistency issues and design a data cleansing process that delivers results.
Why Should I Address Inconsistent Data?
You can’t really afford to ignore it — inconsistent data means problems for your business. Bad data costs companies around the world $3 trillion per year. Some studies have shown that bad data could potentially cost companies as much as 10–25% of their revenue.
We’re in a data-driven era, where datasets are shared between marketing, sales, and customer service teams to increase organizational effectiveness and facilitate alignment. When you have high levels of inconsistent data, it’s not just one team taking the hit. It’s everyone. The entire company. The fallout from that bad data extends to every corner of your business that the data touches.
While you can’t hope to avoid inconsistent data completely, you can improve your data quality management and cleansing processes to make your business more effective and make inconsistent data less of a headache throughout your organization. Through the use of tools and strategic techniques, you can help to alleviate the issues that bad data causes.
Phase #1: Fix Formatting and Case Issues
The most common data inconsistency issue that you’ll find in any dataset is formatting or case issues. This is especially true when data has been entered by hand. For instance, expecting leads to use the same formatting or correct case when submitting a form on your website is unreasonable. Just because a lead can’t be bothered to capitalize their name doesn’t necessarily mean they aren’t a great lead.
You don’t want to connect formatting requirements to the form and reduce the number of submissions you receive. In situations like this, your best course of action is to simply clean the data after it has been submitted.
When you collect data from a large number of sources, it’s difficult to facilitate uniformity without placing serious restrictions on how that data is entered or collected. eConsultancy studied the most common ways companies collect data, and most of them involve a human entering data.
However, formatting and case issues don’t just apply to customer-entered data. All datasets will contain these errors. For that reason, fixing these errors is a good starting point when cleansing a dataset. By fixing these issues early, you’ll fix a large percentage of the total data inconsistencies contained within the set, and removing them will make it easier to spot other issues.
Here are common formatting and case issues that you’re sure to run into in any dataset:
Proper case for first and last contact names
This is perhaps the most common error in all datasets. It may seem like a small inconsequential error, but it’s important to make sure these are fixed. Would you rather receive an email and be addressed as “Bob” or “bob" ? The latter comes across as unprofessional and is a clear indicator of automation, which will hurt conversion rates and your reputation.
Phone number formatting
There are many ways to format a phone number:
Also, are the phone numbers 10 digits or 11 digits? Do they have a “1-” in front of the number? Even then, phone numbers might be separated by office, mobile, and home phone numbers. The right formatting makes a huge difference, as it will ensure that the phone numbers are compatible with any systems that use them and will make things easier for your teams that routinely pull numbers to contact customers and prospects.
Mailing address formatting
Ensuring your address formatting is correct is critical. If you’re going to be mailing important information to prospects, customers, or employees, make sure that you don’t waste your budget and time mailing to an improperly formatted address. Cases like records that contain a partial or full two-part zip code are important to take into account.
Email address formatting
Being able to reach your customers through email is critical, but email formatting problems are quite common in most datasets. You may find that emails don’t have the proper “email@example.com” formatting. There may be spaces. Someone may have submitted an email address as “name -at- domain.com” or something similar. There may be typos or extra whitespace. If your emails aren’t formatted correctly, you’ll find a larger percentage of your emails bouncing or experiencing delivery issues, which hurts your company’s reputation among email providers.
Formatting and case issues are the low hanging fruit of data cleansing. These errors are typically easy to spot, easier to deal with (using tools like Insycle), and have a big impact on the effectiveness of the dataset.
Phase #2: Remove Whitespace and Unwanted Characters
Now let's tackle two separate issues. One is quite easy to spot with a quick glance at your data, and the other is harder to spot without macros or tools: removing whitespace and removing unwanted characters.
Removing whitespace and unwanted characters from your datasets is critical to appropriately search and filter the data. Unwanted characters can lead to compatibility issues or put your company in positions where you do damage to your reputation. Both are common problems but can have a seriously negative impact on your ability to use your data correctly.
An extra space in a data field is a very common issue. Maybe the user accidentally hit the space bar after entering their data. They could have hit the space bar before entering their data as well, placing the whitespace at the beginning of the entry.
Another common issue comes from a user hitting the space bar twice between words when they only meant to hit it once. Whitespace can cause formatting and usability issues in some situations but can be hard to spot without the help of a tool or macro.
Remove unwanted characters
You’ve seen it before — special characters that have no place in a dataset but are always present in at least a few fields. They might look something like this: Ã, ¢, â, ê. In most cases, these characters aren't manually entered but are caused by encoding issues that arise when you save, import, or export data.
In some cases, you’ll not only have to remove the characters but replace the field with the correct entry, which may require that you refer to a previous version of the dataset. If you’re removing these characters by hand, they should be relatively easy to spot.
Fixing these issues are too often low on the priority list but can have a big impact on the usability of the data within the set.
Phase #3: Consolidate and Standardize to Improve Filtering
Data standardization is critical for making the most of any dataset, and it’s not uncommon to see multiple fields attempt to describe the same thing in different terms. Two prospects that are listed within your dataset as “Chief Executive Officer” and “CEO” hold the same position but wouldn’t be featured in the same list if you filtered your data by job title — and that’s a problem.
Consolidating and standardizing similar fields makes your data more searchable and more useful to your teams. There are a few common data fields that experience this problem:
Consolidate and standardize job titles
Job titles are perhaps the most common field for standardization issues. There are a few reasons for this. First, there are a lot of acronyms to describe different job titles, as shown in the CEO/Chief Executive Officer example above. That example extends to many job titles.
Another common issue for standardization in job title data comes from the unique titles used to describe the same (or similar) position within the company. What's the difference between a Content Marketing Manager and a Content Manager? Or a Chief Marketing Officer and Marketing Director?
While, yes, there may be nuances between these positions across companies, there's often a lot of overlap. Consolidating and standardizing similar job titles will make things easier for your marketing, sales, and customer service teams that attempt to engage with prospects and customers.
Consolidate and standardize industry
Industry is another common data field where standardization issues arise. Competing companies might separately describe themselves as being in the “tech,” “software,” or “SaaS” industries.
Determining how you would like to categorize different companies in your dataset is critical for delivering relevant marketing and sales materials. Later, standardization is important for helping your customer service teams solve customer problems and meet their needs.
Associate companies to contacts
Over time, it’s common for HubSpot users to find that they have a lot of disconnected contacts and companies within their data. Regularly working to ensure the proper associations are in place will help your marketing, sales, and service teams better serve your customers. Failing to do so can make it difficult to find the right person to contact within the company and hamper your personalization efforts.
Consolidation and standardization issues don’t only apply to job titles and industries, but both provide the most common examples. In fact, consolidation is incredibly important in any field where a user (or any person) entered the information by hand. Standardization helps limit errors and miscues in your communication down the line and helps provide a more accurate representation of the companies contained within your data.
Phase #4: Remove Data to Lower Costs and Reduce Bloat
In this phase of the data cleansing process, we’ll move away from focusing on errors and issues within your dataset and focus on some strategies that can help you clean your data to lower costs and improve effectiveness.
Companies that store a lot of data and collect that data over long periods of time will find that a percentage of that data will age-out or become less useful. Prospect and customer databases double in size every 12–18 months on average. There's no reason to keep useless data sitting in your datasets, as it will only serve to bog you down.
Keeping your database clean and effective will help you lower costs on data storage and marketing campaigns and save your employees a great deal of time in the long run.
As you look for outdated or unhelpful entries to remove, there are a few key areas you should start with:
Remove contacts that bounce
You don’t want to continually try to reach out to someone who is never going to reply. Worse, high bounce rates hurt your standing with email providers and reduce the deliverability of your email marketing materials. The same can be said for disconnected or incorrect phone numbers — reaching out and never receiving a reply is a drain on your resources and budget.
Remove contacts from free email domains (Gmail/Hotmail)
You have to be careful with this one, as some of your subscribers will use free domain addresses instead of business domain email addresses legitimately when engaging with your materials. You don’t want to remove perfectly good leads from a list, but some percentage of contacts with free email domain addresses will be useless and a drain on your resources.
Many of the contacts with free email domains in your contact list are using “throwaway” emails. These are emails that they sign up for specifically to sign up for mailing lists so that they can avoid cluttering their important business email addresses. Contacts that use free email domains like Gmail and Hotmail in this way are unlikely to interact with your marketing emails, and you may be better served to take them off your list. Another significant portion of these emails will be completely fake, used by spammers to deliver messages en masse via website forms.
Use your analytics to determine which of the contacts in your database with addresses from free email providers are legit. They'll be the ones that open your emails and interact with your content.
Remove contacts that have unsubscribed
Under the CAN-SPAM act of 2003, it's illegal for companies to continue to send marketing materials to prospects that have unsubscribed from their mailing list. It’s easy to see how this might become a problem when companies have their data separated into several different platforms. A quick import of outdated data that includes unsubscribed individuals can quickly result in some pretty severe violations of the CAN-SPAM act. You should always ensure that the dataset you're using to send out your marketing communications uses your most updated subscriber information.
Remove contacts that haven’t engaged recently
If you have continually delivered new materials to a contact over a long period of time and they've failed to engage with those materials, there isn’t much point in keeping their data on hand and taking up space. You don’t want to continue sending emails to someone that subscribed to your mailing list seven years ago and hasn't engaged with a new email in the last four years. Your emails are likely hitting their “Spam” folder anyway.
While the timing of when it's appropriate to remove disengaged contacts is a policy that should be decided within your company, having a cutoff for disengagement is essential to ensuring your databases don’t become bloated with outdated information or uninterested contacts.
Remove redundant legacy fields
Sometimes multiple fields exist for the same purpose, for historical reasons, or for no reason. For example, someone created a form and added a new field to capture persona or company size, and a field like this already exists. Now you have values in multiple fields. Wouldn’t it be nice to consolidate them all to the right field and eliminate the redundant fields? Removing these fields can help reduce database bloat and filtering errors.
Phase #5: De-duplicate Entries
Duplicate data is a serious problem for any company that collects a large amount of data. Duplicate data occurs when an exact copy for a record within your dataset is created as a separate entry within the same database.
Whenever you import data into HubSpot or transfer data between platforms, there's a high probability that at least some data duplication will occur. Experts have found that duplication rates between 10%–30% are not uncommon for companies without data quality initiatives in place.
Duplicate data poses a number of serious concerns. First, the costs and lost productivity of dealing with duplicate data adds up over time. When your teams are updating two different entries for the same record, you quickly lose sight of which entry is the correct one to reference, making it difficult to maintain a single customer view.
Imagine a sales rep going to contact a prospect, only to find two different entries for the same prospect, each containing contradictory data. Now that sales rep has to spend their time sorting things out. This same scenario happens over and over again at companies with data duplication issues and can be a huge time-sink.
Taking the time to identify and consolidate duplicate records within your database will save you a lot of time and headaches in the long-term. Insycle can be used to identify and merge duplicate companies and contacts using a variety of fields in your dataset.
How Insycle Can Help You Cleanse HubSpot Data
Modern companies are moving data around on a daily basis. They transfer data internally—between their sales, marketing, customer service, finance, HR, and operations teams. They move their data between SaaS solutions and software suites they use to power their business, many of which won’t offer a simple way to manage data within the platform. They add and update records on a daily basis. This leads to many companies spending a great deal of time identifying issues like the ones covered in this article and correcting them by hand.
Insycle is a painkiller for these headaches. It helps HubSpot users improve their data cleansing operations by providing them the tools they need to identify and remedy common data issues efficiently.
Insycle isn’t a one-trick pony, either. It’s designed to integrate deeply into your data management operations and facilitate improvements throughout every department that uses your data within your company.
Here’s how companies are using Insycle today:
- One-off, periodic, and continuous data cleanse: Your databases are constantly being updated. New data is added or removed all the time, and with each new addition, issues can arise. Insycle helps you ensure that every new import is clean.
- Day-to-day operations: Insycle is the Swiss Army Knife of data cleansing. Whether you're importing data from events, bulk updating records, comparing a CSV list to Hubspot data, or filtering contacts for campaigns — Insycle not only helps you clean existing entries, but also helps you ensure that your records remain clean through daily usage.
- Improved data filtering and manipulations: Ditch VLOOKUP for good. Insycle provides an improvement and alternative to Excel VLOOKUP and and other functions that we commonly use to filter and manipulate data. Insycle is intuitive and simple to use, ensuring that your teams are better able to keep your data clean more reliably and with less Excel time.
Ready to get started? Learn more here about how Insycle can help HubSpot users cleanse data and improve their data operations on the whole.
Originally published Dec 18, 2018 9:00:00 AM, updated January 03 2019