Following the maintenance to migrate all services to our new private cloud environment, we discovered that some of our phone number records had not synchronized to the new environment. This prevented inbound calls from being delivered to the affected users. We immediately began resyncing those numbers to restore their service.
The repair was complicated by the way phone number records are stored on our system. A global phone number database tells the system which account or subaccount should receive calls for a particular number, and an account-level database tells the system where to route those calls once they’ve reached that account. The first sync issues we discovered involved numbers that were present in the global database but missing at the account level, so that’s where we began the repair process.
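To illustrate why a missing record at either level stops an inbound call, here is a minimal sketch in Python. The dictionaries, phone numbers, account names, and the `route_call` function are hypothetical stand-ins chosen for illustration; they are not the actual databases or code.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the two databases (illustration only).
# Global database: phone number -> account that should receive the call.
global_db = {"+15550100": "acct-a", "+15550101": "acct-b"}

# Account-level databases: account -> {phone number -> destination in that account}.
account_db = {
    "acct-a": {"+15550100": "ext-201"},
    "acct-b": {},  # "+15550101" exists globally but is missing at the account level
}


def route_call(number: str) -> Optional[str]:
    """Return the destination for an inbound call, or None if it cannot be delivered."""
    account = global_db.get(number)
    if account is None:
        # No global record: the system does not know which account owns the number.
        return None
    # The account is known, but delivery still needs the account-level record.
    return account_db.get(account, {}).get(number)
```

In this sketch, `route_call("+15550100")` resolves to a destination, while `route_call("+15550101")` returns `None` because the second lookup fails even though the global record exists.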
Once we resynced those numbers, we discovered that some scattered numbers were present in individual account databases but had not synced to the global database, which had the same effect from an end user’s perspective: inbound calls were not delivered. Rather than pushing those records down from the global database to the account level, we had to push them from the account level back up to the global database. This account-by-account process was slower, which is part of the reason some sites had service restored earlier than others. (Other sites were not affected at all, or were affected only by the global-to-account sync issue that was fixed earlier.)
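The two repair directions can be pictured as a set comparison between the two levels. The following is a minimal sketch in Python under assumed data shapes; `find_sync_gaps` and the sample records are hypothetical and do not represent the tooling actually used in the repair.

```python
def find_sync_gaps(global_db, account_db):
    """Compare the two levels and report records missing on each side.

    global_db:  {phone number: account}                     (assumed shape)
    account_db: {account: {phone number: destination}}      (assumed shape)
    """
    # Normalize both levels into comparable (account, number) pairs.
    global_pairs = {(acct, num) for num, acct in global_db.items()}
    account_pairs = {
        (acct, num) for acct, numbers in account_db.items() for num in numbers
    }
    # In the global database only: push down from global to the account level.
    push_down = sorted(global_pairs - account_pairs)
    # In an account database only: push up from the account level to global.
    push_up = sorted(account_pairs - global_pairs)
    return push_down, push_up


# Hypothetical sample data with one gap in each direction.
sample_global = {"+15550100": "acct-a", "+15550101": "acct-b"}
sample_accounts = {
    "acct-a": {"+15550100": "ext-201"},
    "acct-b": {"+15550102": "ext-305"},
}
push_down, push_up = find_sync_gaps(sample_global, sample_accounts)
```

Here `push_down` lists the record that exists only globally, and `push_up` lists the record that exists only in an account database. Finding the second kind requires reading every account's database, which is consistent with the account-by-account pass being the slower one.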
The migration to the new private cloud infrastructure carried some significant risks, and it did cause disruptions, for which we are truly sorry. This particular outage’s root cause was unique to this migration process, and as a result it will not occur again.