In 2024, “Agent” has become the hottest keyword in the AI world.
From OpenAI’s simple GPTs to Anthropic’s autonomous computer use capabilities, and recently AI startup /dev/agents securing a $500 million valuation with their Agent operating system, companies are all seeking to identify AI’s next concrete application.
In China, Zhipu AI presented their own solution last month – AutoGLM.
Taking out your phone, opening an app, tapping search, and entering keywords usually takes four or five steps. With an AI Agent, the same sequence collapses into a single command.
Today, Zhipu AI launched a series of multi-terminal Agent products in Beijing.
Users only need to input commands, and GLM can understand instructions, plan tasks, recognize interface elements like windows, graphics, and text, and perform automated operations – as if entering an era where AI takes control of devices.
At the Agent OpenDay event, Zhipu AI CEO Zhang Peng used AutoGLM live on stage to create a WeChat group and send WeChat red packets to hundreds of attendees, both on-site and online, through digital codes. If you managed to grab a red packet, feel free to share your joy in the comments.
- AutoGLM: Mobile platform (currently open to Android), capable of executing complex operations over 50 steps, suitable for price comparison, navigation, social media engagement, etc.
- GLM-PC: Desktop platform (currently open to Mac systems), designed as a productivity tool to free up office workers’ hands, with remote control of the computer from a phone.
- AutoGLM-Web: Web platform, supporting autonomous navigation of dozens of websites including Baidu Search, Zhihu, Github, etc.
The clearest illustration is the shift from Chat to Act: AI is moving from “thinking for us” to “doing for us,” and in the process Agents are redefining smart devices.
Experience links:
- AutoGLM: https://agent.aminer.cn/
- GLM-PC: https://cogagent.aminer.cn/home
- AutoGLM-Web: https://new-front.chatglm.cn/webagent/landing/index.html?channel=ads_news_openday
While Other AIs Chat, These AIs Help Me Work Smarter
AI Helps Me Work Smarter? AutoGLM Lets Me Browse Social Media and Buy Coffee While Relaxing
In our previous article, we experienced how Zhipu’s AutoGLM can take control of our phones.
From automatically sending WeChat messages to browsing Taobao, things we used to do manually are now handled entirely by AutoGLM. This AI worker was also upgraded today, with significantly improved capabilities.
We got early access to try these latest AI tools.
[For example, Yuval Noah Harari, author of “Sapiens,” recently published his latest work, “Homo Deus,” so why not let AI browse Pinduoduo and Taobao for me to see which platform offers better value?]
I only need to speak, and AI does all the legwork. Look carefully, though, and you will notice a minor slip where book titles got mixed up.
Don’t worry if you’re in a noisy environment where speaking isn’t convenient.
AutoGLM also comes with a “silent mode” where you can type commands, and before executing tasks, AutoGLM gives users a 3-second “regret window” to cancel or adjust the execution.
There’s good news for fan community members – the newly upgraded AutoGLM can even handle super topic check-ins.
Taking Li Xingliang’s super topic check-in as an example, you only need to input your command to the AutoGLM floating window, and AI will handle the entire process. You only need to “show up” when sensitive information is involved, saying goodbye to the anxiety of missing check-ins with just one click.
Plus, these daily tasks can be set as shortcuts for one-click execution.
Don’t underestimate this feature. For hard-pressed office workers, a scheduled afternoon coffee order is a true lifesaver. There is no need to repeat the setup every day; just save the coffee-ordering command once.
Choose random mode and AI makes every decision for you, like opening a coffee mystery box. When it comes to sending, ordering, and payment operations, however, AutoGLM actively returns control to you.
Cross-application collaboration is a major highlight of this upgrade.
Apple’s AI has shown us the importance of system-level AI breaking down the walls between applications, and AutoGLM now achieves a similar effect. For example, I can have AI search Xiaohongshu for a garlic choy sum recipe and then share it to WeChat Moments.
The newly added AI navigation feature is also practical. Want to go to Canton Tower? Just tell AutoGLM, and AI will arrange everything clearly for you.
Unfortunately, AutoGLM currently supports only Android.
However, Zhipu is now opening up user beta spots for AutoGLM and will keep refining its features and user experience, with the aim of launching it soon as a fully open product for general consumers.
Starting today, AutoGLM-Web, a plugin for Zhipu Qingyan, also brings AutoGLM functionality to the web.
According to reports, AutoGLM-Web supports autonomous navigation of dozens of websites including Baidu Search, Weibo, Zhihu, Github, etc.
In the official demo, AutoGLM-Web automatically completed “searching for Mango TV on Baidu, opening ‘Goodbye My Love’, playing the latest episode, and sending bullet comments” – all without user intervention.
From Mobile to Desktop, Letting AI Be My Office Assistant
Compared to AutoGLM, GLM-PC provides more workplace-oriented functionality on the desktop.
GLM-PC is currently designed for Mac computers with M-series chips, with M1 and M3 series devices being most recommended. Enter your desired operation in the dialog box, and GLM-PC will evaluate tools and decide on an operation plan.
Of course, when encountering sensitive operations, GLM-PC will automatically pause, waiting for user operation or confirmation.
Want to know what’s trending on Bilibili? GLM-PC quickly helps you find the first “must-watch” entry, saving you aimless browsing time.
Need to schedule a meeting with Zhang San? Sending a WeChat message is handled by AI. Even with page overlays, it can precisely locate WeChat’s search box.
It can help you book Tencent Meetings and send meeting invitations to participants. It’s recommended to save this set of “operation recipes” to improve work efficiency through process optimization.
As an editor, my favorite feature is having it help me organize overseas AI news. After giving the command, AI opens the browser, enters the URL, and delivers a clear news summary.
Also, if you’re new to Mac after switching from Windows, you might find yourself fumbling with system changes.
Now GLM-PC is your “lifesaver”: whether you are adjusting display modes or other settings, leave all your requests to it. Let AI handle the tedious tasks and keep the fun for yourself.
GLM-PC has another feature that could be called a “game-changer.”
First, enable “suspension mode” in GLM-PC settings, then log in to “https://cogagent.aminer.cn/m” on your phone with a verification code, and your phone can even remotely control your computer.
Specifically, you can remotely send command messages to GLM-PC, letting it perform computer operations. GLM-PC returns a screenshot after each step, and for sensitive operations, it waits for user confirmation before proceeding.
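The remote-control flow described above (run a step, return a screenshot, pause for confirmation on sensitive operations) can be sketched as a simple loop. This is only an illustration under stated assumptions, not GLM-PC's actual implementation: `Step`, `RunLog`, `run_plan`, and the `SENSITIVE_KINDS` set are hypothetical names, and the screenshot is represented by a plain string.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical set of action kinds treated as sensitive. The article only
# names sending, ordering, and payment as examples of such operations.
SENSITIVE_KINDS = {"send", "order", "pay"}


@dataclass
class Step:
    kind: str          # e.g. "open", "type", "send"
    description: str   # human-readable summary of the step


@dataclass
class RunLog:
    executed: List[str] = field(default_factory=list)
    screenshots: List[str] = field(default_factory=list)
    stopped_at: Optional[str] = None  # set when the user declines a step


def run_plan(
    steps: List[Step],
    confirm: Callable[[Step], bool],
    execute: Callable[[Step], None],
    screenshot: Callable[[Step], str],
) -> RunLog:
    """Run steps in order, pausing for confirmation on sensitive ones."""
    log = RunLog()
    for step in steps:
        # Sensitive steps wait for the user; a refusal ends the run.
        if step.kind in SENSITIVE_KINDS and not confirm(step):
            log.stopped_at = step.description
            break
        execute(step)
        log.executed.append(step.description)
        # One screenshot is reported back after every executed step.
        log.screenshots.append(screenshot(step))
    return log


if __name__ == "__main__":
    plan = [
        Step("open", "Open WeChat"),
        Step("type", "Search for Zhang San"),
        Step("send", "Send the file"),
    ]
    result = run_plan(
        plan,
        confirm=lambda s: False,               # user declines the send
        execute=lambda s: None,                # stand-in for real UI control
        screenshot=lambda s: f"shot:{s.description}",
    )
    print(result.executed, result.stopped_at)
```

Keeping confirmation as a callback means the same loop could serve both a desktop prompt and a confirmation sent from the phone, which is the split the article describes.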
In the live demonstration, Zhang Peng also sent files through WeChat on his computer by issuing commands to CogAgent from the GLM-PC webpage on his phone.
In fact, when AI begins to truly “work” rather than just “chat,” it marks AI applications entering a “down-to-earth” practical stage. You could say that when AI really starts solving daily tasks, it transforms from a toy into a real productivity tool.
This might be what AI technology should really look like.
Phone Use Moment
Over the past two months, the mobile phone industry has launched a dense wave of new products, and one trend stands out: although AI phones have not yet won widespread consumer recognition, system-level AI has become a headline feature of manufacturers’ operating systems. That is a sign Agents are reaching real-world adoption.
Whether it’s vivo’s Blue Heart Little V and the “Phone GPT” demonstrated at the launch event that can order meals with AI, Huawei HarmonyOS’s Xiao Yi and intent framework, or Honor’s YOYO intelligent agent, they share the same essence as Zhipu’s Agent released today:
Letting AI mimic the human Plan-Do-Check-Act cycle to operate devices like humans do.
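To make the Plan-Do-Check-Act idea concrete, here is a minimal sketch of such a loop over a toy device state. It is an illustration only: the function names (`pdca`, `plan_next`, `apply_action`, `goal_reached`) and the dictionary-based "device" are assumptions made for this example, not how any of the products above are built.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Action = Tuple[str, str]  # (setting name, desired value)


def plan_next(goal: State, state: State) -> List[Action]:
    """Plan: pick the first setting that does not yet match the goal."""
    for key, value in goal.items():
        if state.get(key) != value:
            return [(key, value)]
    return []


def apply_action(state: State, action: Action) -> State:
    """Do: apply one action to a copy of the toy device state."""
    key, value = action
    new_state = dict(state)
    new_state[key] = value
    return new_state


def goal_reached(goal: State, state: State) -> bool:
    """Check: compare the observed state with the goal."""
    return all(state.get(k) == v for k, v in goal.items())


def pdca(
    goal: State,
    plan: Callable[[State, State], List[Action]],
    do: Callable[[State, Action], State],
    check: Callable[[State, State], bool],
    max_rounds: int = 3,
) -> Tuple[bool, List[Action]]:
    state: State = {}
    history: List[Action] = []
    for _ in range(max_rounds):
        for action in plan(goal, state):   # Plan
            state = do(state, action)      # Do
            history.append(action)
        if check(goal, state):             # Check
            return True, history
        # Act: the next round re-plans from the updated state.
    return False, history


if __name__ == "__main__":
    goal = {"app": "maps", "destination": "Canton Tower"}
    print(pdca(goal, plan_next, apply_action, goal_reached))
```

The round limit is a deliberate choice in this sketch: an agent that cannot verify success after a few cycles stops and reports failure instead of looping indefinitely.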
As Zhipu AI CEO Zhang Peng mentioned at today’s launch event, current Agent capabilities are more like adding an intelligent scheduling layer between users and applications, connecting all applications and even all devices.
This can be seen as an embryonic form of a general-purpose LLM-OS (Large Language Model Operating System). Zhipu describes this kind of Agent interaction as building GLM-OS, which it expects to reshape human-computer interaction.
OpenAI founding member and AI researcher Andrej Karpathy has also discussed the LLM OS on multiple occasions. In his view, large models are, in a sense, a new type of computer and operating system: they can connect software, hardware, and peripherals that carry information in every modality, and they execute tasks through function calls.
In traditional operating systems, you need to build a bunch of peripherals around the CPU, such as mouse and keyboard, disk storage, and cache space.
In LLM OS, the large model itself is the central processor. I/O peripherals are no longer just mouse and keyboard, as LLM can accommodate more modal data input and output. Meanwhile, the external tools called by large models will upgrade from traditional software to intelligent agent tools.
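The "model as central processor, tools as peripherals" picture can be sketched as a small function-calling dispatcher. Everything here is an assumption made for illustration: `fake_model` is a stand-in that returns a hard-coded JSON call instead of querying a real model, and the tool names and JSON shape are hypothetical rather than any vendor's actual API.

```python
import json
from typing import Callable, Dict

# Registry of callable tools, playing the role of "peripherals".
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def search_web(query: str) -> str:
    return f"results for {query!r}"


@tool
def send_message(recipient: str, text: str) -> str:
    return f"sent to {recipient}: {text}"


def fake_model(user_request: str) -> str:
    """Stand-in for a large model: emits one function call as JSON."""
    if "search" in user_request.lower():
        call = {"name": "search_web", "arguments": {"query": user_request}}
    else:
        call = {
            "name": "send_message",
            "arguments": {"recipient": "Zhang San", "text": user_request},
        }
    return json.dumps(call)


def dispatch(call_json: str) -> str:
    """Decode the model's function call and run the matching tool."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    print(dispatch(fake_model("Search AI news")))
    print(dispatch(fake_model("Meeting at 3pm")))
```

In this framing, adding a capability means registering another tool; the dispatcher and the model-facing call format stay the same.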
Cross-application operation is a crucial component, meaning Agents can achieve more complex autonomous continuous operations and possibly move towards true commercial implementation.
At the beginning of the year, we predicted that large models would become the new operating system for smartphones, with Natural User Interface (NUI) gradually replacing the existing Graphical User Interface (GUI).
Whether the services offered by different internet companies can be integrated may be the biggest obstacle to this type of interaction. Even so, both smartphones and applications will eventually prove to be transitional products in the history of human development.
Current Agent interaction is still in its early stages. With Scaling Laws hitting bottlenecks across the industry, how can Agents become true productivity tools and take on a larger share of work decisions?
Zhipu AutoGLM technical leader Liu Xiao told APPSO that pre-training will definitely continue, but there will be a new logic for algorithm and data training.
Zhipu AI CEO Zhang Peng also told APPSO that the team is relatively optimistic about the space for Scaling Laws and hopes to explore more possibilities under new paradigms and ecosystems.
This year, many manufacturers have borrowed the level system used for autonomous driving to describe how intelligent AI devices are, and OpenAI has likewise divided AI into five levels, from L1 to L5.
Different from OpenAI, Zhipu defines the five stages of large model development as: L1 Language Ability, L2 Logic Ability (multimodal capability), L3 Tool Use Ability, L4 Self-Learning Ability, L5 Scientific Law Exploration.
Zhang Peng believes that large models have initially acquired some of humans’ abilities to interact with the physical world. “Agent will greatly enhance L3 tool use ability while initiating exploration of L4 self-learning ability.”
From Phone Use, Computer Use, Car Use to All Device Use, large models’ thinking ability and Agent interaction are gradually influencing how we use smart devices.
Having AI send WeChat messages and likes may seem of limited practical significance now. But AlphaGo’s mastery of Go did not significantly change society by itself either, and Google DeepMind went on to build AlphaFold, which can predict the structure of almost all known proteins and supports the research and treatment of numerous diseases.
The paradigm shift behind such systems is the lever that changes how people live. By moving large models from Chat to Act, Agents are gradually revealing what an AI terminal actually looks like, beyond a mere change in naming.