Following ChatGPT’s full integration with Apple’s ecosystem yesterday, OpenAI has unveiled another major update.
Today, ChatGPT introduced video calling and screen sharing capabilities, along with a special holiday voice feature called “Santa Mode”.
This means ChatGPT is no longer just articulate; it can now “see the world.” When you’re stuck on something, you can simply start a “video call” with ChatGPT, and it may be better equipped to help you solve the problem.
These features will roll out to all Team users and most Plus and Pro subscribers over the next week. Paid users in the EU will need to wait a bit longer.
ChatGPT, which supports more than 50 languages, can now understand visual scenes in real time, help solve problems, and even act as an AI tutor to help you pick up new skills.
In another brief, roughly 20-minute presentation, OpenAI’s Chief Product Officer Kevin Weil, along with Jackie Shannon, Michelle Qin, and Rowan Zellers, demonstrated what this newly “sighted” ChatGPT can do.
For instance, if you’ve just gotten a pour-over coffee setup but don’t know where to start, you might want to video call ChatGPT.
It can guide you through each step based on the equipment in front of you, from placing the filter paper and adding the ground coffee to pouring the hot water: a complete hands-on tutorial.
Stuck? Just ask Teacher GPT. This AI instructor not only answers every question but also drops in the occasional word of encouragement, offering plenty of emotional support along the way.
Beyond real-time video guidance, ChatGPT also supports screen sharing. Users can simply click the advanced voice mode icon in the bottom right corner and select screen sharing from the dropdown menu to receive targeted assistance.
When it “saw” someone in a Santa Claus costume joking about whether he could land a job as a mall Santa, ChatGPT responded with aptly worded, emotionally intelligent encouragement:
“Hey Kevin, your Santa costume really brings the holiday spirit. Keep practicing your ‘Ho Ho Ho,’ and you might soon become the mall’s Santa!”
In fact, OpenAI President Greg Brockman recently took an anatomy knowledge quiz with Anderson Cooper using the vision-enabled ChatGPT.
When Cooper drew body parts on the blackboard, ChatGPT could instantly “understand” what he had drawn.
“The position is very accurate, with the brain right there in the head. As for the shape, it’s a good start, but the brain is more oval-shaped.” ChatGPT could even sing the triangle area formula with a British accent.
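(For the record, the formula it was singing is presumably the familiar one: a triangle’s area is half its base times its height, A = ½ · b · h.)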
However, ChatGPT later showed noticeable flaws when tackling a geometry problem, failing to spot a simple labeling error, which suggests its grasp of plane geometry still has room to improve.
To celebrate the Christmas season, OpenAI has also launched a special “Santa” voice preset. Users can chat with Santa ChatGPT by clicking the snowflake icon on the main screen.
For example, you can ask Santa to tell a story.
Santa’s signature “Ho ho ho~” is quite catchy and really sets the holiday mood.
During the live event, the host asked this “Santa” several questions, from its favorite Christmas traditions to its favorite reindeer.
Interestingly, when Kevin Weil put on a Santa beard and asked for grooming tips, ChatGPT responded in Santa’s voice:
“Friend, that’s the most magnificent beard I’ve ever seen.”
This voice feature launches today, and to make sure every user can fully enjoy the holiday Easter egg, OpenAI will reset users’ advanced voice usage quota. Even after using up the quota, users can keep chatting with “Santa” through the standard voice mode.
Perhaps because of the drawn-out series of launches, user comments have focused mainly on the Santa voice feature; a typical example comes from X user @khoomeik.
Google DeepMind research scientist Jonas Adler went further and directly challenged OpenAI:
“OpenAI’s ability to respond quickly to our product releases, seemingly always timed to the same moment, is remarkable. That said, I’m not impressed by Santa mode as a response to Gemini 2.0; it doesn’t seem to carry the same weight or seriousness.”
Notably, just yesterday, Google beat OpenAI to the punch by launching an AI product with visual understanding, one that can comprehend and analyze users’ real-world surroundings, and it drew widespread praise from users.
Today, OpenAI followed suit and gave ChatGPT “eyes,” marking its evolution from a largely single-modality system to visual-language multimodal understanding.
In other words, ChatGPT is no longer limited to the text instructions and information users type in; it can now understand a user’s surrounding context through vision, whether that’s the page on a computer screen, the feed from a phone camera, or even real-time images from other external devices.
Half a century ago, scientists at Xerox’s PARC lab dreamed of a computer that could understand human behavior. Today, AI development is carrying that dream beyond the display screen and into reality.
From paper to keyboard, from binary to natural language, humans have been simplifying their communication with machines. ChatGPT’s visual capabilities show us the ultimate answer: enabling machines to “see” the world as humans do.
Considering that Sam Altman and former Apple Chief Design Officer Jony Ive are developing smart AI hardware devices, I’m all the more excited to see this capability appear on their new hardware.
The moment AI opens its eyes, it finally enters the world as humans see it.