OpenAI’s “Christmas package” has just arrived.
In what turned out to be OpenAI’s shortest launch event ever, lasting only 15 minutes, the presentation featured an impressive lineup including CEO Sam Altman, chain-of-thought pioneer Jason Wei, and Hyung Won Chung.
After the livestream, Altman provided a first-hand summary:
We just launched two new features:
o1, the world’s most intelligent model. Smarter, faster, and more capable than o1-preview, with features like multimodal functionality. Now available on ChatGPT, with API version coming soon.
ChatGPT Pro. $200 per month. Unlimited access and enhanced o1 mode. More benefits coming!
December isn’t just Santa’s time – it’s also AI’s final celebration of the year.
Full Version o1 Released: Effortless Image Analysis, but One Detail Raises Concerns
In essence, OpenAI today launched the full version of o1 and the ChatGPT Pro subscription plan.
The full version of o1 is smarter, faster, and more feature-rich (including multimodal capabilities) than o1-preview. It’s available today for ChatGPT Plus and Team users, with Enterprise and Edu users gaining access next week.
According to the official website, ChatGPT Plus and Team users can send 50 messages per week using OpenAI o1, and 50 messages per day using OpenAI o1-mini.
The full version reasons more concisely, responds faster than o1-preview, and handles complex real-world problems better, making 34% fewer major errors.
OpenAI plans to add support for web browsing, file uploads, and more in the coming months. Meanwhile, the preview model o1-preview has been officially removed from the model selection menu.
However, the full version shows lower performance than the preview version in some benchmarks, such as MLE-Bench – OpenAI’s own benchmark tool for evaluating AI Agents in machine learning engineering tasks.
Users have also discovered some noteworthy details in the updated o1 System Card.
For instance, the card mentions that when the full version o1 perceives threats (like being shut down or replaced), it might take self-preservation measures, such as attempting to disable supervision mechanisms or covertly transmitting its “parameters” (“knowledge” or “memory”) and trying to use these to replace or influence new models.
The o1 System Card can be found at: https://cdn.openai.com/o1-system-card-20241205.pdf
How powerful is the full o1? OpenAI demonstrated its practical capabilities.
In one notable example, it analyzed a hand-drawn sketch of a space data center, calculating the surface area of cooling devices in just 10 seconds while providing detailed explanations about solar energy interactions in deep space environments.
When asked to detail the reign periods and significant contributions of 2nd-century Roman emperors, the full version o1 completed the analysis in just 14 seconds, compared to 33 seconds for the preview version.
APPSO also conducted immediate hands-on testing of the full version o1.
In the “How many r’s are in Strawberry” test, the full version o1 successfully provided the correct answer, which is noteworthy.
The question “Which is greater, 9.11 or 9.8?” didn’t stump the full version o1 either, and its overall “thought process” was logical.
Since the full version o1 supports multimodal functionality, we also uploaded the opening photo of OpenAI’s livestream event to test its recognition capabilities. The full version o1 provided comprehensive analysis of everything from the composition of people to scene layout, background decorations, and atmosphere.
X user @altryne continued to push o1’s limits by posing a question about ice melting.
The full version o1 provided an answer in just 4 seconds. In contrast, o1-preview “thought” for 29 seconds before failing to provide an answer.
Most Expensive AI Subscription Yet: Is $200 Worth It?
Another major update is the ChatGPT Pro subscription plan, priced at $200 per month (approximately 1,452 yuan).
The ChatGPT Pro subscription offers unlimited access to o1, o1-mini, GPT-4o, and Advanced Voice, plus a Pro-exclusive version of o1 known as o1 pro mode.
According to reports, o1 pro mode primarily increases the model’s “reasoning” time before responding, allowing it to generate more reliable answers through extended thinking time. OpenAI technical team member Jason Wei stated during the livestream:
We anticipate that ChatGPT Pro’s target audience will be advanced users who are already fully utilizing and challenging ChatGPT’s capabilities in areas such as mathematics, programming, and writing.
In external expert testing, o1 pro mode provided more accurate and comprehensive answers in fields such as data science, programming, and case law analysis.
Compared to o1 and o1-preview, o1 pro mode performed better in ML benchmarks for mathematics, science, and programming, particularly showing reduced error rates in simpler programming competition problems.
For the AIME 2024 mathematics competition, o1-preview scored 50, while the full version o1 reached 78, and the most powerful o1 pro achieved 86. Similarly, o1 pro led in programming competitions like Codeforces and doctoral-level scientific reasoning problems like GPQA Diamond.
To highlight o1 pro mode’s main advantage (improved reliability), OpenAI’s research team used more stringent evaluation criteria. A problem was only considered solved if the model answered correctly in all four attempts.
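To make the difference between the two scoring rules concrete, here is a minimal Python sketch. It is only an illustration of the "correct in all four attempts" idea described above, not OpenAI's evaluation code; the function names and the attempt data are hypothetical and invented for this example.

```python
from typing import Dict, List


def per_attempt_accuracy(results: Dict[str, List[bool]]) -> float:
    """Share of all individual attempts that were correct."""
    total = sum(len(attempts) for attempts in results.values())
    correct = sum(sum(attempts) for attempts in results.values())
    return correct / total


def strict_reliability(results: Dict[str, List[bool]], k: int = 4) -> float:
    """Share of problems answered correctly on every one of the first k attempts."""
    solved = sum(
        1
        for attempts in results.values()
        if len(attempts) >= k and all(attempts[:k])
    )
    return solved / len(results)


# Hypothetical data: four problems, four attempts each (True = correct answer).
attempts = {
    "problem_1": [True, True, True, True],
    "problem_2": [True, False, True, True],
    "problem_3": [True, True, True, True],
    "problem_4": [False, False, True, False],
}

print(per_attempt_accuracy(attempts))  # 0.75
print(strict_reliability(attempts))    # 0.5
```

With this made-up data, the model is right on 12 of 16 attempts, yet only 2 of the 4 problems count as solved under the stricter rule, which is why the stricter criterion is a better measure of reliability than average accuracy.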
Because these responses take longer to generate, ChatGPT displays a progress bar and sends an in-app notification if the user switches to another conversation in the meantime.
During the livestream, OpenAI also demonstrated o1 pro’s practical capabilities.
A protein problem that previously stumped o1-preview was solved by o1 pro mode in 53 seconds. It not only gave the correct answer but also offered a detailed explanation through the Canvas interface.
Separately, Altman recently revealed in an interview that ChatGPT has surpassed 300 million weekly active users, with daily message volume reaching 1 billion.
OpenAI’s goal for the coming year is to reach 1 billion users. The 11 livestream events still to come could be crucial opportunities to attract new ones.
The best product is always at the next launch event, so let’s stay tuned and see what’s coming.