OpenAI Updates ChatGPT Atlas To Counter Prompt Injection

OpenAI deployed a security update to ChatGPT Atlas after identifying a new class of prompt injection attacks against its browser agent, the company said this week.

The update included an adversarially trained model and additional safeguards designed to prevent malicious instructions embedded in web content from redirecting agent behavior.

ChatGPT Atlas’s agent mode allows the system to browse the web, click links, and type into pages on your behalf. That capability expands the attack surface.

Prompt injection attacks work by hiding instructions inside emails, documents, or webpages that the agent processes, potentially causing it to take unintended actions such as sending emails or sharing sensitive data.

The newly identified attacks were discovered through OpenAI’s internal automated red teaming system.

The system uses reinforcement learning to train an AI attacker that repeatedly probes the browser agent for weaknesses.

According to the company, this approach uncovered multi-step exploits that could steer the agent into completing harmful workflows over many actions, rather than triggering a single incorrect response.

OpenAI said the findings were fed directly into a rapid response loop.

Engineers retrained the agent against the discovered attacks and adjusted system-level defenses, then deployed the updated model to all ChatGPT Atlas users.

The company stated that the same process can be applied to analyze real-world attack attempts and expedite mitigation.

OpenAI described prompt injection as an ongoing security challenge rather than a problem with a final fix.

The company stated that it plans to continue expanding automated red teaming, adversarial training, and monitoring to reduce the likelihood and impact of future attacks as browser-based agents become more widely adopted.

Our Key Takeaways:

  • OpenAI released a security update to ChatGPT Atlas after internal testing revealed new prompt injection attacks against its browser agent.

  • The attacks exploited the agent’s ability to process untrusted web content and perform actions similar to a logged-in user.

  • OpenAI said it will continue using automated red teaming and rapid model updates to address emerging agent security risks.

You may also want to check out some of our other recent updates.

Wanna know what’s trending online every day? Subscribe to Vavoza Insider to access the latest business and marketing insights, news, and trends daily with unmatched speed and conciseness! 🗞️

Subscribe to Vavoza Insider, our daily newsletter. Your information is 100% secure. 🔒

Subscribe to Vavoza Insider, our daily newsletter.
Your information is 100% secure. 🔒

Share With Your Audience

Read More From Vavoza...

Wanna know what’s
trending online?

Subscribe to access the latest business and marketing insights, news, and trends daily!