OpenAI's Strawberry o1 Surpasses PhD-Level Intelligence

2024-09-13 09:10

OpenAI's new AI model Strawberry o1 surpasses PhD level (Image source: Unblock Media)

- OpenAI Strawberry o1: Innovation in AI Reasoning Capabilities
- Outstanding Performance in Solving Complex Problems

OpenAI recently unveiled the OpenAI Strawberry o1 model, once again pushing the limits of AI development. This large language model uses reinforcement learning to significantly strengthen its reasoning abilities, addressing limitations of GPT-4o. Its defining feature is a "chain of thought": before answering, the model works through a complex reasoning task step by step. This approach has produced strong results in programming contests and on math challenges.

OpenAI o1 has posted outstanding results across a range of benchmarks. On the AIME, a qualifying exam for the USA Mathematical Olympiad, it performed at the level of the top 500 students in the United States. While GPT-4o solved 12% of the problems, OpenAI o1 solved 74% with a single sample per problem and reached 93% when many samples were generated per problem and their answers aggregated. It also outperformed human experts with PhDs on the GPQA Diamond benchmark, which tests scientific reasoning in physics, biology, and chemistry.

Thanks to its chain of thought, the model works through a complex problem step by step in search of the best solution. Along the way it can correct its own mistakes and try different strategies, ultimately arriving at more accurate answers. This contrasts with traditional large language models (LLMs), which generate a response immediately.

Its programming performance is also exceptional. In simulated coding competitions, OpenAI o1 achieved an Elo rating of 1807, outperforming 93% of human competitors, a large improvement over GPT-4o's rating of 808. Notably, OpenAI o1 delivered more than twice the performance of existing models on computer science-related tasks.
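Two of the figures above rest on simple, well-known calculations, illustrated in the Python sketch below under stated assumptions. `majority_vote` is a hypothetical helper showing plain majority voting, one common way to form a consensus across several sampled answers; OpenAI has not published exactly how o1 aggregates its samples. `elo_expected_score` is the standard Elo expected-score formula; the rating system used in the coding competitions may differ in its details.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among sampled solutions.

    Hypothetical helper: plain majority voting is assumed here as one
    common way to reach a "consensus"; it is not OpenAI's published method.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Counter tallies identical answers; most_common(1) yields [(answer, count)].
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (0 to 1) of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Five hypothetical sampled answers to one AIME-style problem.
    samples = ["204", "204", "113", "204", "97"]
    print(majority_vote(samples))  # prints 204

    # Ratings quoted in the article: o1 at 1807, GPT-4o at 808.
    print(round(elo_expected_score(1807, 808), 3))  # prints 0.997
```

Under the standard Elo model, the roughly 1,000-point gap between 1807 and 808 means the higher-rated player is expected to score about 99.7% in head-to-head play against the lower-rated one.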
OpenAI o1 also includes various improvements in AI safety; OpenAI reports that the model's ability to reason about its safety rules in context makes it more resistant to jailbreak attempts. In a separate evaluation, OpenAI had human evaluators compare responses from OpenAI o1-preview and GPT-4o to a range of prompts. The evaluators tended to prefer o1-preview's responses on reasoning-heavy tasks such as data analysis, coding, and math.

By using reinforcement learning to improve how the model reasons and approaches problems, OpenAI o1 marks an important advance in AI reasoning and performance. The model is expected to play a significant role in fields that demand rigorous reasoning, such as science and engineering. In conclusion, OpenAI o1 is poised to become a crucial tool for complex, long-horizon technical tasks, and it is expected to contribute significantly to developing AI systems that align with human values and principles.