On Tuesday morning, Cloudflare experienced a major outage that disrupted services for Elon Musk’s X and numerous websites, apps, and video games. The company has now acknowledged that the failure stemmed from a programming mistake on its part.
According to reports, many Americans woke up to error messages and dysfunctional websites. The cause of this widespread issue was quickly linked back to Cloudflare, a service used by various companies to safeguard their servers from the broader Internet.
In a recent blog post, Cloudflare clarified that the outage was due to an internal programming fault and insisted that it was “not caused, directly or indirectly, by any type of cyber-attack or malicious activity.”
The outage resulted from an internal change to database permissions related to Cloudflare’s bot management system. The change inadvertently caused the database to produce a “feature configuration” file that was twice its expected size. As this oversized file spread through Cloudflare’s global network, it exceeded size limits coded into the software, leading to the complete shutdown of the bot management module and subsequent failures of Cloudflare’s main traffic proxies, which manage customer traffic.
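To illustrate why a hard-coded size limit can turn one oversized file into a full module failure, here is a minimal Python sketch. It is not Cloudflare’s code: the names `BotModule`, `parse_features`, `reload_strict`, and `reload_safe` are hypothetical, and the limit of 200 entries and the file size of 120 entries are made-up numbers chosen only so that a doubled file crosses the limit.

```python
MAX_FEATURES = 200  # hypothetical hard limit; the real value is not given in this article


class FeatureFileTooLarge(Exception):
    """Raised when a feature file has more entries than the limit allows."""


def parse_features(lines, max_features=MAX_FEATURES):
    # Keep non-empty lines; each one stands for a single feature entry.
    features = [line.strip() for line in lines if line.strip()]
    if len(features) > max_features:
        raise FeatureFileTooLarge(
            f"{len(features)} features exceeds limit of {max_features}"
        )
    return features


class BotModule:
    """Toy stand-in for a module that loads a feature configuration file."""

    def __init__(self, initial_lines):
        self.features = parse_features(initial_lines)

    def reload_strict(self, lines):
        # The error propagates, so one bad file takes the caller down with it.
        self.features = parse_features(lines)

    def reload_safe(self, lines):
        # Reject the bad file and keep serving with the last known-good config.
        try:
            self.features = parse_features(lines)
            return True
        except FeatureFileTooLarge:
            return False


good_file = [f"feature_{i}" for i in range(120)]
doubled_file = good_file + good_file  # twice the expected size

module = BotModule(good_file)
accepted = module.reload_safe(doubled_file)
print(accepted, len(module.features))
```

In this sketch, `reload_strict` mirrors the failure mode described above, where exceeding the limit stops the module outright, while `reload_safe` shows one possible alternative: refusing the new file and continuing with the previous one.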
The failure was also inconsistent, which complicated diagnosis. The permissions change had only been applied to part of the database cluster, and the file was regenerated every five minutes. Whether a given run produced a normal or an oversized file depended on whether the query landed on an updated or a not-yet-updated part of the cluster, so the network flipped between working and failing. This inconsistency initially led Cloudflare engineers to suspect a distributed denial of service (DDoS) attack, a theory that was ruled out after a more thorough investigation.
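The intermittent pattern can be sketched with a small simulation. Everything here is an illustrative assumption rather than a description of Cloudflare’s systems: the node names, the schedule of which node serves each five-minute run, the limit of 200 entries, and the idea that an updated node simply returns every entry twice.

```python
def generate_file(base_features, node_updated):
    # Assumption for this sketch: an updated node returns each entry twice,
    # doubling the file; a non-updated node returns the normal file.
    return base_features * 2 if node_updated else list(base_features)


def simulate(cycles, updated_nodes, base_features, limit):
    """Return (minute, node, status) for each scheduled generation run."""
    results = []
    for minute, node in cycles:
        size = len(generate_file(base_features, node in updated_nodes))
        status = "bad" if size > limit else "good"
        results.append((minute, node, status))
    return results


base = [f"feature_{i}" for i in range(120)]  # hypothetical normal file
updated = {"node_b", "node_c"}               # partially rolled-out change
schedule = [(0, "node_a"), (5, "node_b"), (10, "node_a"), (15, "node_c")]

timeline = simulate(schedule, updated, base, limit=200)
for minute, node, status in timeline:
    print(f"t+{minute:02d}min {node}: {status}")
```

With this schedule the output alternates between good and bad files, which is the kind of on-and-off behavior that can look like an external attack rather than an internal configuration problem.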
While many customers were directly affected, the impact also extended to third-party services that integrate with Cloudflare, such as customer login systems that rely on its Turnstile CAPTCHA.
Cloudflare’s engineers resolved the outage by around 10 a.m. ET by stopping the generation of the oversized files and manually deploying a stable version of the file across the network. Although the company announced that services were restored, many websites continued to face issues for several hours after the fix. The general consensus is that the outage lasted about six hours, with one industry expert estimating the cost to Cloudflare’s customers at $15 billion per hour.
In light of these events, Cloudflare’s CEO, Matthew Prince, offered an apology, stating that this outage was intolerable, especially considering Cloudflare’s critical role in the internet landscape. The company is now conducting a comprehensive internal review to improve its processes, enhance system resilience against future configuration issues, optimize debugging and visibility, and establish more precise feature kill switches.
Additional details are available in Cloudflare’s blog post on the outage.





