Amazon Holds Urgent Engineering Meeting Following AI-Related System Failures

Amazon Faces Service Disruptions Due to AI Code Problems

Amazon’s e-commerce division called an urgent meeting of a large group of engineers to examine a series of recent outages, some of which have been linked to AI coding tools.

The Financial Times reported that Amazon’s online retail platform has suffered repeated service failures in recent months. In response, the company’s executives arranged a detailed review with engineering teams to investigate what internal documents have referred to as a “wide-explosive” incident related to AI-generated code changes.

A document from a press briefing indicated that the company had recognized a trend of incidents with several contributing factors, notably AI-generated changes to its systems. It singled out new applications of generative AI, for which best practices and safeguards are not yet fully in place, as a particular concern.

In an internal email to his team, Dave Treadwell, an Amazon senior vice president and former Microsoft engineering head, addressed the trend. He acknowledged, “As you all know, our site and the related infrastructure have not been performing well recently.”

The gathering was an expanded session of the company’s usual weekly operations review, called This Week in Stores Tech (TWiST). Treadwell said the meeting would investigate the underlying causes of the recent issues and lay out immediate steps to prevent further disruptions. He encouraged everyone to attend, although attendance is typically optional.

One of the most prominent recent outages occurred earlier this month, when Amazon’s main website and shopping app were down for nearly six hours. The disruption prevented customers from completing purchases or using basic features such as checking product prices and account details. Amazon attributed the failure to the deployment of faulty software code.

While the briefing materials did not specify which incidents would be examined in the meeting, they did outline plans for stricter oversight of AI-assisted code changes. Going forward, junior and mid-level engineers will need approval from senior engineers before deploying such changes.

Amazon Web Services (AWS) has faced AI-related difficulties as well. The cloud division has had at least two separate incidents involving AI coding tools that the company is rolling out across its teams.

A particularly notable AWS incident happened in mid-December, when an engineer used the Kiro AI coding tool and the tool deleted and recreated the entire environment. This caused a 13-hour outage of the cost calculator used by AWS customers. Amazon described the problem as limited in scope, affecting only one service in mainland China, and said that a second AI-related incident did not affect customer-facing services.

These technical problems come amid significant job cuts at Amazon. Several engineers told the Financial Times that as the workforce shrinks, they are seeing an increase in Sev2 incidents, which require a rapid response to prevent product outages. Earlier this year, the company laid off 16,000 employees, on top of job cuts in previous years.

Amazon has pushed back against claims linking the layoffs to the rise in system failures. The company said that availability reviews are a standard operational practice aimed at continuous improvement. “TWiST is a regular weekly meeting with key retail technology leaders and teams to assess operational performance across stores,” the company explained.
