What Caused HubSpot's Outage and How We'll Prevent It in the Future

JD Sherman

Updated: September 12, 2018

Published: September 07, 2018

On Wednesday, Dharmesh stood on stage in front of a crowd of 24,000 people at INBOUND and introduced The Customer Code, one of the core tenets being: “Own your mistakes.” This week, we put that principle to use.

Early Thursday morning EDT we experienced an outage that affected some of our Marketing Hub Enterprise customers. Our engineering team resolved the underlying issue fairly quickly, then spent the remainder of the day resolving the effects of the outage. They met this morning to conduct an analysis of what happened and what we can learn from it.

We are sorry, and I want to provide more detail about what caused this issue and how we are going to prevent it in the future.

The Root Cause

HubSpot just rolled out a substantial number of new features to our Marketing Hub Enterprise customers. With this rollout, we also want to make it possible for customers at the starter and professional levels to try the full enterprise feature set on a self-serve basis.

On Wednesday, our engineering team began the required infrastructure work that would eventually support the improved enterprise trial experience. The plan was to launch the new trial experience before the end of September.

While making the first of these changes, we inaccurately tagged all existing Marketing Hub Enterprise portals as trial portals. This in itself did not cause a problem, as those portals still had their enterprise features and access.

However, many of those portals had originally started as trial portals, and when they were created they were given a trial expiration date by the trial system. That expiration date was never removed when the portal was upgraded to a Marketing Hub Enterprise portal. Again, this was not a problem in itself and is something that has existed in our system for some time.

But on Thursday morning, a daily process designed to turn off expired trials came across these Marketing Hub Enterprise portals, which were now tagged as trials with expiration dates in the past. The process determined that they were expired trials and downgraded them to free portals, removing all enterprise-level functionality from those portals.
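To make the interaction concrete, here is a minimal sketch of the failure mode described above. It is not HubSpot's actual code: the `Portal` fields, the job functions, and the `has_paid_subscription` guard are all hypothetical names chosen for illustration. It shows how a trial flag plus a stale expiration date is enough for a naive expiry job to downgrade a paying portal, and how a check on the portal's current paid state would skip it.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Portal:
    # All field names are hypothetical, for illustration only.
    portal_id: int
    tier: str                      # "enterprise" or "free"
    is_trial: bool                 # the tag that was applied by mistake
    trial_expires: Optional[date]  # may be a stale date from an old trial
    has_paid_subscription: bool


def looks_like_expired_trial(portal: Portal, today: date) -> bool:
    """True when the portal is tagged as a trial and its date has passed."""
    return (
        portal.is_trial
        and portal.trial_expires is not None
        and portal.trial_expires < today
    )


def naive_expiry_job(portals: List[Portal], today: date) -> List[int]:
    """Downgrade every portal that looks like an expired trial."""
    downgraded = []
    for portal in portals:
        if looks_like_expired_trial(portal, today):
            portal.tier = "free"
            downgraded.append(portal.portal_id)
    return downgraded


def guarded_expiry_job(portals: List[Portal], today: date) -> List[int]:
    """Same job, but skip any portal that currently has a paid subscription."""
    downgraded = []
    for portal in portals:
        if portal.has_paid_subscription:
            continue  # assumed guard: paying portals are never auto-downgraded
        if looks_like_expired_trial(portal, today):
            portal.tier = "free"
            downgraded.append(portal.portal_id)
    return downgraded


def sample_portals() -> List[Portal]:
    """Three made-up portals: a mis-tagged paid one, an expired trial, a live trial."""
    return [
        Portal(1, "enterprise", True, date(2017, 3, 1), True),
        Portal(2, "enterprise", True, date(2018, 9, 1), False),
        Portal(3, "enterprise", True, date(2018, 10, 1), False),
    ]
```

With a run date of September 6, 2018, the naive job downgrades portals 1 and 2, while the guarded job downgrades only portal 2, the genuinely expired trial.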

The affected portals immediately lost their automation, CMS hosting, ads, email, and other functionality that is part of the enterprise tier. The most serious result of this was that customers’ websites, blogs, landing pages, and forms hosted on HubSpot immediately stopped working. Forms that customers had embedded in external pages were not affected: they kept capturing leads and did not experience any downtime.

Recovery

After internal monitoring systems alerted us to outages within our content system, our engineering team began working to identify and resolve the problem. The issue was quickly identified, and our top priority was to restore these portals to the full functionality they had before this error. This process was completed by approximately 7:00 AM EDT Thursday morning.

Phase two was fixing the consequences of the downgrade process. Since it is not common for a downgraded portal to upgrade without human intervention, our system was not prepared to seamlessly and automatically “reupgrade” many of these “new” enterprise portals. As a result, our affected customers experienced lost site settings and pages, and found themselves with disconnected domains and workflows that weren’t executing.

This clean-up work was the most challenging for our team, and it’s where our customers felt the most pain as they waited for key pieces of their marketing stack (like websites and landing pages) to be restored. And as they waited, many of our customers’ customers were seeing an error message that made it appear as if the fault was with our customers, not us (more on how we’re solving for this below).

For the rest of the morning and into the afternoon, we were working to fix all of these issues and restore the sites back into the same state that they were in before the downgrade.

So, why did this take so long? Don’t we have backups, and shouldn’t this have been a simple matter of rolling back by an hour or so?

Yes, we do have backups, and we did use some of them.

If we had simply reverted to backups, however, HubSpot customers that were unaffected by the issue would have lost all of the work completed in the four hours between the changes going live and our discovering the issue, and we had no way to calculate the potential impact this would have on their businesses. So we decided not to rely on backups alone.

Instead, we needed to take a much more surgical approach. We needed to figure out how to revert specific customers’ data back while preserving all other customers’ work. And we had to do this across many different areas of the product while our customers were also working hard to get things up and running for their customers.

The recovery process involved our engineering team building and deploying a series of targeted scripts while also working closely with our customer support team to identify and resolve issues alongside affected customers.

Future Prevention

First, I want to be very clear that the website downtime some of our customers experienced is unacceptable. This doesn’t just impact our customers. It impacts their customers and prospects as well.

Something that we emphasize at HubSpot is that we’re going to make mistakes, and that’s natural — but when we make mistakes, we make sure we learn from them to avoid repeating similar issues in the future. Our engineering team is doing everything possible to ensure that this will not happen again.

There are several big changes we’re making to help prevent this in the future. First, in the short term, we’ve removed all the old trial expiration dates from our system so that paid portals can never again be considered expired. In addition, we’re sunsetting our old system for managing trials and moving to a more modern system that manages trial status in its own database. As part of this modernization, we’re going to simplify how we handle deactivation and make sure we only take simple, reversible actions until a much longer grace period has elapsed. We’re also going to improve our systems to detect anomalous changes to our product configuration code and halt any modification that makes too many changes at one time.
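One way to picture the last safeguard in the paragraph above is a change limit that refuses to apply an update touching too large a share of portals at once. The sketch below is an assumption about how such a check could look, not a description of HubSpot's systems; `apply_with_change_limit`, `TooManyChangesError`, and the 5% threshold are hypothetical.

```python
from typing import Dict


class TooManyChangesError(Exception):
    """Raised when a single update would alter too many records."""


def apply_with_change_limit(
    current: Dict[int, str],
    proposed: Dict[int, str],
    max_fraction: float = 0.05,  # hypothetical threshold: 5% of all records
) -> Dict[int, str]:
    """Return a new mapping with the proposed values applied, or halt.

    `current` maps a portal id to its configuration value. The update is
    rejected when the share of records it would change exceeds `max_fraction`.
    """
    changed = [pid for pid, value in proposed.items() if current.get(pid) != value]
    total = max(len(current), 1)  # avoid dividing by zero on an empty store
    if len(changed) / total > max_fraction:
        raise TooManyChangesError(
            f"update would change {len(changed)} of {total} records; halting"
        )
    updated = dict(current)  # leave the original mapping untouched
    updated.update(proposed)
    return updated
```

Re-tagging one portal out of a hundred passes the check, while an update that re-tags all hundred raises `TooManyChangesError` and leaves the existing data as it was, so a person can review the change before it goes any further.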

Second, we’re going to invest in tools that can quickly revert data for a subset of customers, and we’ll make sure that systems that update data automatically check the current state of that data before making any changes. We’re also going to improve how we handle reactivation so that if a portal loses access due to a legitimate cancellation, an expired trial, or a bug, we’ll be able to restore that portal to its full state much more gracefully and quickly.
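The idea of reverting data for only a subset of customers can be sketched as a selective restore from a snapshot. This is a simplified illustration under stated assumptions: the in-memory dictionaries, the `restore_subset` function, and the portal ids are hypothetical, and a real recovery would span many separate data stores.

```python
import copy
from typing import Any, Dict, Set


def restore_subset(
    live: Dict[int, Dict[str, Any]],
    snapshot: Dict[int, Dict[str, Any]],
    affected_ids: Set[int],
) -> Dict[int, Dict[str, Any]]:
    """Return a copy of `live` where only the affected portals are reverted.

    Portals outside `affected_ids` keep whatever work was done after the
    snapshot was taken. Affected portals missing from the snapshot are left
    as they are, because there is nothing to restore them from.
    """
    restored = copy.deepcopy(live)
    for portal_id in affected_ids:
        if portal_id in snapshot:
            restored[portal_id] = copy.deepcopy(snapshot[portal_id])
    return restored


# Made-up data: portal 1 was downgraded by mistake, portal 2 was unaffected
# and published a new page after the snapshot was taken.
snapshot = {
    1: {"tier": "enterprise", "pages": ["home", "pricing"]},
    2: {"tier": "professional", "pages": ["home"]},
}
live = {
    1: {"tier": "free", "pages": []},
    2: {"tier": "professional", "pages": ["home", "new-launch"]},
}
result = restore_subset(live, snapshot, affected_ids={1})
```

Here portal 1 returns to its snapshot state, while portal 2 keeps the `new-launch` page it created after the snapshot, which is the trade-off a full rollback could not offer.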

Third, and this is a small but important change, we’re going to fix how we communicate errors to our customers’ customers so that no error message ever makes our mistake appear to be our customers’ mistake.

Again, we’re sorry.

If you’re experiencing any additional issues related to this outage, please call HubSpot Support at 1-888-HUBSPOT x3.
