Intervals Service Interruption Update

Michael Payne | May 29th, 2019

Yesterday we experienced an unexpected service interruption. In the spirit of openness and transparency, we want to communicate exactly what happened.

What happened?

At approximately 6:00 PM (PDT) on May 28, 2019, our network probes and monitoring systems reported a problem with network availability. Our system administration team immediately started an investigation and isolated the problem to a configuration issue with the load balancer technology we utilize for high availability. While we were working to resolve the issue, connectivity was unavailable for customers. At approximately 12:50 AM (PDT) on May 29, a solution was implemented. Consistent uptime is our number one priority with Intervals. You can monitor the uptime report on our status page at any time. We sincerely apologize for any inconvenience this may have caused.

How was the problem resolved?

The source of the problem was emergency maintenance performed by our hosting company, IBM. This maintenance inadvertently caused the load balancers we use to malfunction. We run redundant load balancers, but the issue affected both the primary and the secondary. While IBM investigated and tried to troubleshoot the existing load balancers, we decided to commission new ones and bring them into production. Deploying new load balancers took additional time and required updating DNS, but we believe it was the best long-term solution.

What did we learn?

During this interruption we were focused on two things: fixing the problem and responding to customer support inquiries as quickly as possible. We notified all administrator-level users of the problem, updated Twitter during the incident, and followed up with administrators with a summary email similar to this blog post. We tend to err on the side of caution with notifications since Intervals is quasi white-labeled, but based on the feedback we received via email, in hindsight we should have notified additional user levels.

We are stable and working as expected, but because we had to update DNS, some customers may see residual problems until the change has fully propagated. The solution involved updating DNS to route traffic to different servers. DNS updates can be tricky because DNS records are cached by local DNS servers around the globe, and it can take some time for all of them to see the update. If you are continuing to experience problems, flushing your DNS cache or rebooting your computer may help.

To flush the cache on Windows, here are some instructions:
https://support.4it.com.au/article/flush-dns-windows-10/

And here are instructions for OS X:
https://help.dreamhost.com/hc/en-us/articles/214981288-Flushing-your-DNS-cache-in-Mac-OS-X-and-Linux
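
After flushing, one way to see which addresses your computer currently resolves for a hostname is to ask the system resolver directly. The snippet below is a minimal sketch in Python, not an official Intervals tool: the function name resolve_addresses is our own, and port 443 is only a placeholder that getaddrinfo needs in order to build its results. It normally goes through the same operating system resolver (and cache) that your browser uses.

```python
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses the local resolver reports."""
    # getaddrinfo asks the operating system's resolver, so the answer
    # reflects whatever your machine (and its upstream DNS) has cached.
    # Port 443 is an assumption for illustration; it does not affect the lookup.
    results = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address as a string.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Call resolve_addresses with the hostname you use to reach Intervals before and after flushing. If the list of addresses changes, your computer has picked up the new DNS records; if it does not, the old records may still be cached upstream and should expire on their own.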

What’s next?

We will thoroughly analyze the situation and our redundancy policies and make any necessary adjustments to prevent this type of problem in the future. If you have any questions or concerns, please contact our support team. We’d be more than happy to provide any further information you might need. We appreciate the patience and understanding extended to us during this interruption in service.
