Amazon Web Services experienced a 13-hour interruption to one system used by its customers in mid-December after engineers allowed its Kiro AI coding tool to make certain changes, according to four people familiar with the matter.
The people said the agentic tool, which can take autonomous actions on behalf of users, determined that the best course of action was to “delete and recreate the environment”.
Amazon posted an internal postmortem about the “outage” of the AWS system, which lets customers explore the costs of its services.
Multiple Amazon employees told the FT that this was the second occasion in recent months in which one of the group’s AI tools had been at the centre of a service disruption.
Rafe Rosner-Uddin
It is not surprising that outages like these happen when operations are handed over to unreliable systems without proper controls. Having invested astronomical amounts in AI technologies, management is all too eager to expand their use and to demonstrate a tangible return on investment beyond the cost reductions from constant layoffs. The rapid and uncontrolled push for AI in critical software infrastructure has also been blamed for multiple bugs in Windows updates over the past year; Microsoft has even officially acknowledged some of the issues and committed to working on fixes.