Online Tech Guru
Meet The AI Agent With Multiple Personalities

By News Room
Last updated: 16 April 2025 17:52

In the coming years, agents are widely expected to take over more and more chores on behalf of humans, including using computers and smartphones. For now, though, they’re too error-prone to be of much use.

A new agent called S2, created by the startup Simular AI, combines frontier models with models specialized for using computers. The agent achieves state-of-the-art performance on tasks like using apps and manipulating files, and it suggests that turning to different models in different situations may help agents advance.

“Computer-using agents are different from large language models and different from coding,” says Ang Li, cofounder and CEO of Simular. “It’s a different type of problem.”

In Simular’s approach, a powerful general-purpose AI model, like OpenAI’s GPT-4o or Anthropic’s Claude 3.7, is used to reason about how best to complete the task at hand, while smaller open-source models step in for tasks like interpreting web pages.

Li, who was a researcher at Google DeepMind before founding Simular in 2023, explains that large language models excel at planning but aren’t as good at recognizing the elements of a graphical user interface.

S2 is designed to learn from experience: an external memory module records the agent’s actions and user feedback, then uses those records to improve future actions.
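The division of labor described above can be illustrated with a toy sketch. This is not Simular’s code: every name here (`ExperienceMemory`, `run_step`, `toy_planner`, `toy_grounder`) is hypothetical, and the two “models” are plain Python functions standing in for a general-purpose planner and a specialized interface-reading model. It only shows the pattern of routing one step through a planner, a grounder and a feedback memory.

```python
from dataclasses import dataclass, field


@dataclass
class ExperienceMemory:
    """Hypothetical external memory: stores (task, intent, feedback) records."""
    records: list = field(default_factory=list)

    def record(self, task, intent, feedback):
        self.records.append((task, intent, feedback))

    def hints_for(self, task):
        # Intents that earned positive feedback on this same task before.
        return [i for t, i, fb in self.records if t == task and fb == "good"]


def run_step(task, screen, planner, grounder, memory):
    """One agent step: the planner picks an intent, the grounder locates it."""
    hints = memory.hints_for(task)
    intent = planner(task, hints)      # generalist model: decide what to do
    target = grounder(screen, intent)  # specialist model: find it on screen
    return intent, target


def toy_planner(task, hints):
    # Stand-in for a large model: reuse a remembered intent, else a default.
    return hints[0] if hints else "click Search"


def toy_grounder(screen, intent):
    # Stand-in for a GUI model: map the named element to screen coordinates.
    label = intent.split(" ", 1)[1]
    return screen.get(label)


memory = ExperienceMemory()
screen = {"Search": (120, 40), "Book": (300, 200)}
first = run_step("book flight", screen, toy_planner, toy_grounder, memory)
memory.record("book flight", "click Book", "good")  # simulated user feedback
second = run_step("book flight", screen, toy_planner, toy_grounder, memory)
```

In this sketch the second step differs from the first only because the memory now holds positive feedback for the task, which is the kind of improvement from experience the article describes.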

On particularly complex tasks, S2 performs better than any other model on OSWorld, a benchmark that measures an agent’s ability to use a computer operating system.

For example, S2 can complete 34.5 percent of tasks that involve 50 steps, beating OpenAI’s Operator, which can complete 32 percent. Similarly, S2 scores 50 percent on AndroidWorld, a benchmark for smartphone-using agents, while the next best agent scores 46 percent.

Victor Zhong, a computer scientist at the University of Waterloo in Canada and one of the creators of OSWorld, believes that future big AI models may incorporate training data that helps them understand the visual world and make sense of graphical user interfaces.

“This will help agents navigate GUIs with much higher precision,” Zhong says. “I think in the meantime, before such fundamental breakthroughs, state-of-the-art systems will resemble Simular in that they combine multiple models to patch the limitations of single models.”

To prepare for this column, I used Simular to book flights and scour Amazon for deals, and it seemed better than some of the open-source agents I tried last year, including AutoGen and vimGPT.

But even the smartest AI agents are, it seems, still troubled by edge cases and occasionally exhibit odd behavior. In one instance, when I asked S2 to help find contact information for the researchers behind OSWorld, the agent got stuck in a loop hopping between the project page and the login for OSWorld’s Discord.

OSWorld’s benchmarks show why agents remain more hype than reality for now. While humans can complete 72 percent of OSWorld tasks, agents are foiled 38 percent of the time on complex tasks. That said, when the benchmark was introduced in April 2024, the best agent could complete only 12 percent of the tasks.
