Microsoft MFA Meltdown
Microsoft reveals the causes of the multi-factor authentication (MFA) service issue that affected a number of its customers worldwide last week.
Microsoft’s Azure team has gone public with three independent root causes it uncovered while investigating the worldwide MFA outage of November 19th, which hit a number of its customers. Microsoft also found gaps in its monitoring that left Azure, Office 365, Dynamics and other Microsoft users unable to authenticate for much of that day.
Root cause one showed up as a latency issue in the MFA front-end’s communication with its cache services.
Root cause two was a race condition in processing responses from the MFA back-end server. Both causes were introduced by a code update whose rollout began in some datacenters on Tuesday 13th November and was completed in all datacenters by Friday 16th November.
Microsoft officials said that root cause three was triggered by root cause two and left the MFA back-end unable to process any further requests from the front-end, even though Microsoft’s monitoring showed it working normally. Customers in Europe, the Middle East and Africa (EMEA) and Asia-Pacific (APAC) were hit first; as the day went on, the Western Europe and American datacenters were also affected.
The issues persisted even after engineers applied a hotfix that allowed the front-end servers to bypass the cache. Officials acknowledged that telemetry and monitoring weren’t working as expected.
Microsoft identified several next steps to improve the MFA service: a review of its update-deployment procedures (target completion: December 2018); a review of its monitoring services (target completion: December 2018); a review of its containment process, to help avoid propagating an issue to other datacenters (target completion: January 2019); and an update to the communications process for the Service Health Dashboard and monitoring tools (target completion: December 2018).
Microsoft has apologized to all affected customers, but there has been no mention of any plans for financial compensation.