ChatGPT maker OpenAI has released its next major model: a generative AI system code-named Strawberry, officially dubbed OpenAI o1.
To be more precise, it's a family of models. Two are available starting Thursday in ChatGPT and through OpenAI's API: o1-preview and o1-mini, a smaller, more efficient model aimed at code generation.
You'll need a ChatGPT Plus or Team subscription to see o1 in the ChatGPT client; Enterprise and Edu users will get access next week.
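For developers, calling the new models through the API looks much like any other chat request. Here's a minimal sketch using OpenAI's official Python library, assuming your account has API access to the models; the prompt is our own hypothetical example:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# At launch, the o1 models accept only user/assistant messages
# (no system prompt) and use fixed sampling settings.
response = client.chat.completions.create(
    model="o1-preview",  # or "o1-mini" for the smaller, code-focused model
    messages=[
        {"role": "user", "content": "How many prime numbers are there below 100?"}
    ],
)
print(response.choices[0].message.content)
```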
It's worth noting that the o1 chatbot experience is fairly barebones at the moment. Unlike GPT-4o, o1's predecessor, o1 can't browse the web or analyze files yet. The model does have image-analysis features, but they've been disabled pending further testing. And o1 is rate-limited: weekly limits are currently 30 messages for o1-preview and 50 for o1-mini.
The other drawback is that o1 is expensive. Very expensive. In the API, o1-preview costs $15 per million input tokens and $60 per million output tokens. That's 3x the cost of GPT-4o for input and 4x the cost for output. ("Tokens" are chunks of raw data; 1 million tokens is equivalent to roughly 750,000 words.)
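For a sense of scale, here's a quick back-of-the-envelope sketch in Python using the launch prices quoted above; the token counts in the example are hypothetical:

```python
# Rough cost estimate for an o1-preview API call at launch pricing:
# $15 per 1M input tokens, $60 per 1M output tokens (quoted above).
INPUT_USD_PER_M = 15.00
OUTPUT_USD_PER_M = 60.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in US dollars."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Hypothetical request: a 2,000-token prompt and a 5,000-token response.
print(f"${estimate_cost(2_000, 5_000):.2f}")  # -> $0.33
```

One wrinkle worth knowing: per OpenAI's documentation, o1's hidden reasoning tokens are billed as output tokens, so a short visible answer can still carry a long (and priced) chain of thought behind it.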
OpenAI says it plans to bring o1-mini access to all free ChatGPT users, but it hasn't set a release date yet. We'll hold the company to it.
Chain of Reasoning
OpenAI o1 avoids some of the reasoning pitfalls that typically trip up generative AI models, because it can effectively fact-check itself by spending more time considering every aspect of a query. What makes o1 "feel" qualitatively different from other generative AI models is its ability to "think" before responding to queries, according to OpenAI.
Given extra time to "think," o1 can reason through a task holistically, planning ahead and performing a series of actions over an extended period that help the model arrive at an answer. This makes o1 well suited to tasks that require synthesizing the results of multiple subtasks, like spotting privileged emails in an attorney's inbox or brainstorming a product marketing strategy.
In a series of posts on X on Thursday, Noam Brown, a researcher at OpenAI, said that "o1 is trained with reinforcement learning." This teaches the system "to 'think' before responding via a private chain of thought" through rewards when o1 gets answers right and penalties when it gets them wrong, he added.
Brown added that OpenAI used a new optimization algorithm and a training dataset containing "reasoning data" and scientific literature specifically tailored to reasoning tasks. "The longer [o1] thinks, the better it does," he said.
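To make that training signal concrete, here's a toy sketch of what an outcome-based reinforcement reward can look like. This is our own illustration, not OpenAI's actual training code; the function name and the exact-match grading rule are hypothetical:

```python
# Toy illustration of an outcome-based RL reward (our own sketch, not
# OpenAI's code). The model produces a private chain of thought plus a
# final answer; only the final answer's correctness drives the reward.

def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Reward correct final answers, penalize incorrect ones."""
    if final_answer.strip().lower() == reference_answer.strip().lower():
        return 1.0   # reward: the chain of thought led somewhere correct
    return -1.0      # penalty: the reasoning went astray

# A policy-gradient trainer would then reinforce whichever private
# reasoning steps preceded high-reward answers.
print(outcome_reward("42", "42"))   # 1.0
print(outcome_reward("41", "42"))   # -1.0
```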
TechCrunch wasn't given the opportunity to test o1 before its launch; we'll get our hands on it as soon as we can. But according to someone who did have access, Pablo Arredondo, a VP at Thomson Reuters, o1 is better than OpenAI's previous models (e.g., GPT-4o) at things like analyzing legal briefs and identifying solutions to problems in LSAT logic games.
"We saw it tackling more substantive, multi-faceted analysis," Arredondo told TechCrunch. "Our automated testing also showed gains against a wide range of simple tasks."
On a qualifying exam for the International Mathematical Olympiad (IMO), a math competition for high school students, o1 correctly solved 83% of the problems, while GPT-4o solved only 13%, according to OpenAI. (That's less impressive when you consider that Google DeepMind's recent AI achieved a silver medal in an equivalent of the actual IMO contest.) OpenAI also says that o1 reached the 89th percentile of participants, better than DeepMind's flagship system AlphaCode 2, for what it's worth, in the online programming contest rounds known as Codeforces.
In general, OpenAI says o1 should excel at tasks involving data analysis, science, and coding. (GitHub, which has tested o1 with its AI coding assistant GitHub Copilot, reports that the model is adept at optimizing algorithms and app code.) And, according to OpenAI's benchmarking, o1 improves on GPT-4o in its multilingual abilities, especially in languages like Arabic and Korean.
Ethan Mollick, a professor of management at Wharton, wrote up his impressions of o1 after using it for about a month in a post on his blog. On a challenging crossword puzzle, o1 did well, he said, getting all the answers correct (despite hallucinating a new clue).
OpenAI o1 Isn't Perfect
That said, o1 has its drawbacks.
It can be slower than other models, depending on the query. Arredondo says o1 can take over 10 seconds to answer some questions; it shows its progress by displaying a label for the subtask it's currently performing.
Given the unpredictable nature of generative AI models, o1 likely has other flaws and limitations. Brown admitted that o1 trips up on games of tic-tac-toe from time to time, for example. And in a technical paper, OpenAI said that it has heard anecdotal feedback from users that o1 tends to hallucinate (i.e., confidently make things up) more than GPT-4o, and less often admits when it doesn't know the answer to a question.
"Errors and hallucinations still happen [with o1]," Mollick writes on his blog. "It still isn't flawless."
No doubt we'll learn more about o1's various issues over time, once we get a chance to put it through its paces ourselves.
Fierce Competition
We'd be remiss not to mention that OpenAI isn't the only AI vendor investigating these sorts of reasoning techniques to improve model accuracy.
Google DeepMind researchers recently published a study finding that by giving models more compute time and guidance to fulfill requests as they're made, the performance of those models can be significantly improved without any additional fine-tuning.
In a move that illustrates the fierceness of the competition, OpenAI said it decided against showing o1's raw "chains of thought" in ChatGPT partly for reasons of "competitive advantage." (Instead, the company opted to show "model-generated summaries" of the chains.)
OpenAI may be first out of the gate with o1. But assuming rivals soon follow with comparable models, the company's real test will be making o1 widely available, and at a lower price.
From there, we'll see how quickly OpenAI can deliver improved versions of o1. The company has said it aims to experiment with o1 models that reason for hours, days, or even weeks to further boost their reasoning abilities.