What is OpenAI’s AI agent Operator?

Artificial intelligence is evolving at an unprecedented pace, and we have already entered the phase that many call the "third wave" of AI evolution. This wave marks the emergence of something much more complex than the generative AI that once amazed us: agentic AI. Agentic AI is a new level of AI sophistication in which systems known as AI agents are designed not only to analyze data, respond to prompts, and generate content but also to think, decide, and act autonomously. In our previous blog post, we discussed in detail what AI agents are and what their evolution means across industries.

Introduction of OpenAI's Operator

In September 2024, Sam Altman, the co-founder and CEO of OpenAI, outlined a five-stage progression of AI capabilities:

Stage 1: Chatbots, i.e., AI with conversational language.
Stage 2: Reasoners, i.e., chatbots that can reason.
Stage 3: AI agents, systems that can take actions.
Stage 4: Innovators, AI that can make scientific discoveries.
Stage 5: Entire organizations consisting of AI agents that can take on a wide range of tasks, from automating customer service to managing complex workflows.

The jump from chatbots to reasoners, with models like o1 from OpenAI or R1 from DeepSeek, marked a significant leap: these models show step-by-step reasoning and have the potential to revolutionize fields like education, scientific research, and complex problem-solving. While the public was still speculating about the timeline for the transition to AI agents, OpenAI presented its first general-purpose AI agent, Operator, at the end of January 2025. Operator is capable of performing complex tasks on the Internet in real time. The agent is already available to users with a Pro subscription ($200/month) in the US, and OpenAI promises to roll it out to Plus users later. So we have already reached stage 3, with AI agents acting on our behalf, and the pace of the ongoing AI revolution is fascinating.

As the official OpenAI website reports, Operator can automate everyday tasks: booking travel or events, ordering groceries or delivery, making online purchases or restaurant reservations, and more. Users can customize Operator with instructions that apply to all sites or only to specific sites, for example, to set a preference for their favorite airline and specific amenities, or to select only fully refundable hotels or hotels that offer free breakfast.

Task categories, including shopping, delivery, dining, and travel, are presented within the Operator interface. When users activate Operator, a window appears on the right. It shows the dedicated web browser the agent uses to perform a task, along with explanations of the specific actions the agent is performing. Because Operator uses its own dedicated browser, users keep control of their own screen while Operator is running and can perform other tasks in parallel.

Operator allows users to create saved tasks for workflows they want to repeat regularly (e.g., summarizing the news or ordering from a restaurant). A task can be saved directly from a conversation by clicking Save Task or from the Settings menu. Saved tasks appear on the home page, making it easy to launch them with a single click.

How does it work?

Operator is currently powered by the new Computer-Using Agent (CUA) model, which combines GPT-4o's vision capabilities with advanced reasoning trained through reinforcement learning. This lets it control a computer independently, i.e., perform complex tasks and interact with a browser much as humans do. Operator "sees" interfaces through screenshots, which are added to the model's context as a visual snapshot of the computer's current state. From them, it identifies graphical user interface (GUI) elements, such as buttons, menus, and text fields, and reasons about what action to take next. Because it works directly with the visual interface, it can act without using OS- or web-specific APIs. If it encounters challenges or makes mistakes, Operator can leverage its reasoning capabilities to self-correct. If a task proves too complex, or the agent gets stuck or needs assistance, it hands control over to the user. OpenAI also says that Operator can perform multiple tasks at once.
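To make the screenshot-reason-act cycle described above more concrete, here is a toy sketch in Python. It is not OpenAI's implementation: FakeBrowser, Action, choose_action, and run_agent are hypothetical names invented for this illustration, the "screenshot" is a small dictionary rather than real pixels, and the hard-coded policy merely stands in for the CUA model's reasoning.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str        # "type", "click", or "done"
    target: str = ""
    text: str = ""


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for the agent's dedicated browser."""
    query: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; a dict keeps the sketch runnable.
        return {"search_box": self.query, "results_visible": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "type" and action.target == "search_box":
            self.query = action.text
        elif action.kind == "click" and action.target == "search_button":
            self.submitted = True


def choose_action(goal: str, history: list) -> Action:
    """Toy policy standing in for the model's reasoning step."""
    latest = history[-1]
    if latest["results_visible"]:
        return Action("done")
    if latest["search_box"] != goal:
        return Action("type", "search_box", goal)
    return Action("click", "search_button")


def run_agent(goal: str, browser: FakeBrowser, max_steps: int = 10) -> str:
    history = []  # screenshots accumulate in the model's context
    for _ in range(max_steps):
        history.append(browser.screenshot())   # 1. observe the screen
        action = choose_action(goal, history)  # 2. reason about the next step
        if action.kind == "done":
            return "done"
        browser.apply(action)                  # 3. act, then loop again
    return "handoff"  # out of steps: give control back to the user


if __name__ == "__main__":
    print(run_agent("pizza delivery", FakeBrowser()))
```

The step budget (max_steps) is an assumption of this sketch; it simply illustrates the idea that an agent that cannot finish on its own hands control back to the user instead of looping forever.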

The CUA model is trained to ask for user confirmation before finalizing tasks with external side effects, such as submitting an order or sending an email. This way, users can double-check the model's work before any action becomes permanent. Moreover, before sensitive actions, such as entering passwords or payment details, Operator transfers control to the user. In this takeover mode, Operator does not take browser screenshots, which enhances privacy for passwords and other data the user enters. Once users finish the step that Operator prompted them to complete, they can return control to Operator, and it resumes its automated workflow from where it left off. Operator also blocks malicious requests and prohibited content.
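The two safeguards described above (confirmation before side effects, takeover for sensitive input) can be pictured as a small decision function. The sketch below is purely illustrative: the Gate enum, the action and field names, and both helper functions are hypothetical and do not reflect how OpenAI actually classifies actions.

```python
from enum import Enum


class Gate(Enum):
    PROCEED = "agent continues on its own"
    CONFIRM = "ask the user to confirm first"
    TAKEOVER = "hand control to the user"


# Hypothetical categories, chosen only to mirror the examples in the text.
SIDE_EFFECT_ACTIONS = {"submit_order", "send_email"}
SENSITIVE_FIELDS = {"password", "payment_details"}


def gate_for(action: str, field: str = "") -> Gate:
    """Decide how much user involvement a step needs."""
    if field in SENSITIVE_FIELDS:
        return Gate.TAKEOVER   # user types the secret themselves
    if action in SIDE_EFFECT_ACTIONS:
        return Gate.CONFIRM    # user double-checks before it becomes permanent
    return Gate.PROCEED


def screenshots_allowed(gate: Gate) -> bool:
    """In takeover mode the agent stops capturing the screen."""
    return gate is not Gate.TAKEOVER


if __name__ == "__main__":
    for step in [("click", ""), ("submit_order", ""), ("type", "password")]:
        gate = gate_for(*step)
        print(step, "->", gate.name, "| screenshots:", screenshots_allowed(gate))
```

Checking the sensitive-field case first is a deliberate choice in this sketch: a step that touches a password should lead to takeover even if it would otherwise only need confirmation.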

How does OpenAI's Operator compare to similar solutions?

OpenAI was not the first to explore AI agents. Its competitors Anthropic and Google had released similar systems earlier. In October 2024, Anthropic, the Amazon-backed AI startup founded by ex-OpenAI research executives and known for the Claude chatbot, announced significant updates to its AI models: an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku. Anthropic also launched a public beta of a "computer use" feature that enables Claude to interact with a computer the way humans do, i.e., by navigating the screen, moving the cursor, clicking buttons, browsing the internet in real time, and typing text. This experimental capability is available through the API, and companies like Asana, Canva, DoorDash, and Replit are exploring its potential to automate complex, multi-step tasks.

Similarly, in December 2024, Google unveiled its AI agent, Project Mariner, a research prototype by Google DeepMind that combines strong multimodal understanding with reasoning to automate tasks in the browser. It uses the Gemini 2.0 model to interpret complex instructions, interact with websites, and automate tasks like data entry. It also offers real-time navigation and provides transparency into its decision-making process. Project Mariner is currently being tested by a group of selected users, and a waiting list is open for sign-ups.

Operator, now available to ChatGPT Pro subscribers in the US, is powered by CUA, which demonstrates leading performance across several benchmarks designed to evaluate AI agents' proficiency in interacting with digital interfaces:

OSWorld Benchmark: CUA achieved a 38.1% success rate in performing complex tasks involving operating system navigation and file management. It surpasses previous AI models but remains below human performance at 72.4%.
WebArena Benchmark: CUA recorded a 58.1% success rate in navigating simulated offline websites, such as e-commerce platforms and content management systems, to complete real-world tasks.
WebVoyager Benchmark: CUA achieved an 87.0% success rate on this benchmark, which involves interacting with live websites like Amazon and Google Maps.

These results indicate that CUA sets a new standard on both computer-use and browser-use benchmarks. However, a gap to human performance remains, particularly in complex computer-use scenarios such as those in OSWorld.
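For a quick sense of scale, the gap to human performance can be computed from the figures quoted above. The snippet below uses only numbers from this article; the human baseline is given here for OSWorld alone, so the other two benchmarks are deliberately left without a gap. The function name gap_to_human is just an illustrative choice.

```python
# Success rates (%) as quoted in this article.
cua_scores = {"OSWorld": 38.1, "WebArena": 58.1, "WebVoyager": 87.0}
human_scores = {"OSWorld": 72.4}  # only baseline mentioned in the article


def gap_to_human(benchmark: str):
    """Return the human-minus-CUA gap in percentage points, if known."""
    if benchmark not in human_scores:
        return None
    return round(human_scores[benchmark] - cua_scores[benchmark], 1)


if __name__ == "__main__":
    for name in cua_scores:
        print(name, gap_to_human(name))
```

On OSWorld this comes to 34.3 percentage points, meaning CUA completes roughly half as many of these tasks as the human baseline.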

The Security Challenges Behind Operator's Delayed Release

OpenAI released its AI agent later than rivals like Google and Anthropic, reportedly because the company spent a long time working on the agent's security. The main problem was so-called prompt injection attacks, in which a malicious site plants instructions in the content the agent reads and uses them to hijack the agent, for example to steal the user's data. Users may not notice such an attack because they cannot control what the agent absorbs from websites or their computers. Given ChatGPT's widespread use, it was crucial that OpenAI address these vulnerabilities and feel confident that Operator was secure enough to release as a research preview.

So OpenAI took the time to design Operator with strong security features. Operator uses monitoring tools to spot suspicious behavior, pauses actions if needed, and has its safeguards updated continuously to keep users safe. To guard against prompt injection attacks, the agent does not get access to your computer at all: it runs in a virtual machine launched right inside the chat, and all actions take place there.

Collaborations

The agent already works with popular services like Tripadvisor, Booking.com, DoorDash, eBay, Instacart, StubHub, OpenTable, Priceline, Uber, Hipcamp, and several other companies that act as early contributors to the agent's development. Through these collaborations, OpenAI also aims to ensure that Operator respects these businesses' terms of service agreements.

Limitations

In this research preview stage, Operator has a number of limitations. It is subject to rate limits (such as limits on the number of concurrent tasks and minutes of usage) and requires user supervision for particularly sensitive actions (e.g., conducting financial transactions). At this release stage, Operator also refuses to perform tasks like sending emails (even though CUA is capable of this) and deleting calendar events. In addition, Operator cannot yet reliably handle many complex or specialized tasks, such as managing calendars, creating slideshows, or interacting with highly customized or non-standard web interfaces. It may also get “stuck” if it runs into a password field or CAPTCHA check, in which case it asks the user to take over. As Operator evolves, its capabilities will expand, but for now, these limitations help ensure accuracy, reliability, and safety in its operation.

Future Concerns

Consumer brand marketing may shift as AI tools like OpenAI's Operator make decisions on their own and choose where to search for services, bypassing traditional ads and platforms. With AI agents directing traffic, brands may need to explore partnerships with AI companies and focus on optimizing their interfaces for AI accessibility. The way consumers interact with brands could change significantly, and adapting to these new dynamics will be essential for marketers.

Final thought

Operator is not just another powerful release from OpenAI; it marks a transition to a new stage of AI development. AI agents like Operator have the potential to reshape how we interact with the digital world, from personal task automation to enterprise-level efficiency. Of course, OpenAI’s Operator is still not at the human level and is currently in a research preview stage, but it is already a significant achievement, paving the way for more advanced AI-driven workflows. As such systems evolve, they are likely to affect industries like travel, e-commerce, healthcare, finance, and customer service, so businesses and consumers alike will need to adapt.

Charlie Lambropoulos

02/17/2025
