"Towards AGI"

OpenAI’s o3 Benchmark Controversy Sparks Comparisons to Theranos

Fred
January 21, 2025

A thought leadership platform to help the world navigate towards Artificial General Intelligence (AGI). We are committed to navigating the path towards AGI by building a community of innovators, thinkers, and AI enthusiasts.

TheGen.AI News

OpenAI’s o3 Benchmark Controversy Sparks Comparisons to Theranos

The controversy surrounding OpenAI’s o3 benchmark results is drawing comparisons to the Theranos scandal, with allegations of misleading practices. The company claimed record-breaking performance on EpochAI’s FrontierMath benchmark, while reportedly having significant access to the test data and funding its creation.

Tamay Besiroglu, EpochAI’s associate director, admitted the organization was contractually restricted from disclosing OpenAI’s involvement.

“We made a mistake in not being more transparent about OpenAI’s involvement,” Besiroglu stated. He disclosed that EpochAI was barred from revealing OpenAI’s funding role and data access until after the o3 model's launch. Besiroglu acknowledged OpenAI’s access to a substantial portion of the FrontierMath dataset but clarified that an “unseen-by-OpenAI hold-out set” was used to verify the model’s performance.

Carina Hong, a Stanford PhD candidate, shared on X (formerly Twitter) that six contributing mathematicians were unaware of OpenAI’s exclusive access and expressed doubts about participating had they known.

In December, OpenAI announced that its o3 model achieved an unprecedented 25% accuracy on the FrontierMath benchmark, a significant leap from the previous high score of just 2%. This benchmark challenges AI models with exceptionally difficult mathematical problems.

However, skepticism arose when a footnote in the updated FrontierMath research paper acknowledged OpenAI’s role in supporting the benchmark. While Besiroglu claimed EpochAI’s benchmarks reduce data contamination by using novel problems, questions persist about the integrity of the process.

OpenAI also claimed the o3 model achieved 90% accuracy on the ARC-AGI benchmark, surpassing human performance. Yet François Chollet, creator of the benchmark, cautioned against reading the result as evidence of AGI, stating, “I don’t believe this is AGI: there are still simple ARC-AGI-1 tasks that o3 cannot solve.”

Mikhail Samin, executive director of the AI Governance and Safety Institute, criticized OpenAI’s track record, citing “a history of misleading behavior,” including alleged deception of its board and secret non-disparagement agreements with former employees.

Gary Marcus, a noted AI expert, has also been vocal about his doubts regarding OpenAI’s claims. He pointed out that no independent evaluations have been conducted to test the robustness of o3 across diverse problems.

Despite the controversy, OpenAI CEO Sam Altman has expressed enthusiasm for the upcoming release of o3-mini, set to launch in the coming weeks. As scrutiny intensifies, the fallout from the o3 benchmark debate could shape public and industry perceptions of OpenAI’s transparency and credibility.

Zynap Secures €5.7M to Revolutionize Cybersecurity with Gen AI

Barcelona-based cybersecurity startup Zynap, which leverages Generative AI to tackle cybercrime, has raised €5.7 million in funding. The round was co-led by Kibo Ventures (investors in Onum and Notpla) and Kfund (backers of Skillvue and MiLaboratories), with additional support from key business angels.

The investment will fuel the development of Zynap’s cutting-edge technology and drive its expansion initiatives.

As the cybersecurity landscape evolves with increasing digital threats and the growing complexity of cloud computing, IoT devices, and remote work, organizations are grappling with advanced challenges such as ransomware, social engineering, and state-sponsored attacks. This dynamic has driven rapid growth in the cybersecurity sector.

Zynap addresses these issues by employing Generative AI to proactively simulate cyberattack tactics, enabling organizations to defend against threats before they materialize.

Founded in 2024 by Daniel Solís, former CEO and founder of Blueliv, Zynap leverages his extensive experience in the cybersecurity space. The company also benefits from the guidance of advisors such as Pedro Castillo, founder of Onum and Devo, Spain’s first cybersecurity unicorn.

Zynap is redefining threat intelligence by offering actionable, hard-to-access insights tailored for enterprises and Managed Security Service Providers (MSSPs). Its innovative solutions help organizations stay ahead of evolving threats, providing measurable results and empowering informed decision-making.

By integrating Gen AI with advanced technologies, Zynap claims to deliver precise, proactive cybersecurity solutions, allowing businesses to mitigate risks before cybercriminals can strike. The company is currently developing a trial version of its product, which is expected to launch soon.

“Our mission is ambitious yet straightforward: to outsmart cybercrime. Tackling this challenge requires advanced technology, reliable data, and innovative approaches—not generic solutions,” said Daniel Solís, CEO and founder of Zynap.

Investors have expressed strong confidence in Zynap’s potential. Juan Santamaría, partner at Kibo Ventures, stated, “We’ve known the team for over a decade and are inspired to see their strengthened expertise coalesce around such a crucial product for the cybersecurity industry. We’re proud to support them again.”

Jaime Novoa, partner at Kfund, added, “Zynap’s extraordinary team, clear vision, and ability to address real challenges for enterprises and MSSPs make it a standout investment. We’re excited to be part of their journey.”

Infosys Expands AI Portfolio with 100+ Gen AI Agents Under Development

Infosys has developed four specialized small language models with 2.5 billion parameters each, focusing on banking, IT operations, cybersecurity, and general enterprise applications. These models incorporate proprietary datasets, as shared by Salil Parekh, CEO and MD of Infosys.

The company is actively creating over 100 new generative AI agents to deploy for client operations. Infosys is collaborating with its generative AI partner ecosystem to co-develop tailored solutions, many of which leverage partner platforms.

During the earnings call for the third quarter of fiscal year 2025, held on January 16, 2025, Parekh highlighted several examples of Infosys' AI initiatives:

A generative AI-powered research agent was created for a large technology company, enabling product support teams to generate detailed solutions within seconds.
For a professional services client, Infosys developed three intelligent audit agents to automate complex audit tasks.

These developments are part of Infosys' broader strategy under Infosys Topaz, its generative AI-powered services and solutions suite, which aims to strengthen enterprise AI capabilities.

Salil Parekh emphasized the growing demand for generative AI, noting its increasing adoption among clients. "While cost optimization remains a priority, we are observing significant investment in emerging growth areas such as AI, cloud adoption, cybersecurity, and data analytics," the company stated in its earnings release.

Addressing the nature of AI-driven work, Parekh explained that client AI programs are diverse, spanning multiple use cases. Unlike traditional technology implementations, AI spending is broad-based and driven by a mix of growth and cost-saving objectives. "As AI becomes more mainstream, we will better understand how clients classify its usage," he noted, adding that for now, AI adoption continues to expand across industries.

L’Oréal Taps Generative AI for Product Innovation and Sustainability Goals

L’Oréal is leveraging generative artificial intelligence to support its global team of 4,000 researchers in developing new products and improving existing ones. This initiative is part of a collaboration with IBM, which L’Oréal’s chief transformation and digital officer for research and innovation, Matthieu Cassier, described as a step forward that builds on the company’s extensive beauty science expertise and data structuring.

Through this partnership, L’Oréal will access AI models designed to accelerate discoveries in chemistry. These models will be trained on the company’s proprietary formulas to enhance customer satisfaction by creating more personalized products. Over the next few years, L’Oréal plans to use AI to revolutionize its formula discovery process, exploring renewable ingredients and optimizing production scalability.

This collaboration aligns with L’Oréal’s sustainability goals, including reducing waste and sourcing most of its materials from recycled or renewable sources by 2030.

“This partnership will significantly enhance the speed and scale of our innovation and reformulation processes, ensuring products meet higher standards of inclusivity, sustainability, and personalization,” said Stéphane Ortiz, head of innovation métiers and product development at L’Oréal research and innovation.

Personalization remains a key focus for L’Oréal. The company recently announced a partnership with Korean startup NanoEnTek to introduce Cell BioPrint, a device capable of providing a customized skin analysis in just five minutes. The device will offer insights into how specific ingredients address individual skincare needs and suggest proactive solutions for issues like dark spots and enlarged pores. It will debut in Asia later this year.

“With the Cell BioPrint device, we empower individuals to gain deeper insights into their skin through biomarkers and take proactive steps to enhance their skin’s beauty and longevity,” said Barbara Lavernos, L’Oréal Group deputy CEO in charge of research, innovation, and technology.

TheOpensource.AI News

Chinese Startup Unveils Open-Source AI Model Rivaling OpenAI o1 at a Fraction of the Cost

Chinese AI startup DeepSeek has introduced its latest innovation, DeepSeek-R1, an open-source AI model that matches the performance of OpenAI’s o1 model in tasks like math, programming, and reasoning—while being 90-95% more cost-efficient, according to VentureBeat.

DeepSeek-R1 was launched alongside the publication of its model weights on the Hugging Face platform under an MIT license. This development marks a major milestone in open-source AI, offering a competitive alternative in the global race toward Artificial General Intelligence (AGI).

Based on DeepSeek's earlier V3 model, DeepSeek-R1 has delivered impressive results:

79.8% in the AIME 2024 math competition (compared to 79.2% by o1).
97.3% on the MATH-500 test (slightly higher than o1’s 96.4%).
A 2,029 score on Codeforces, surpassing 96.3% of human programmers.

While OpenAI o1 demonstrated slightly better performance in general knowledge (91.8% vs. DeepSeek-R1's 90.8% on the MMLU test), DeepSeek-R1 excelled in reasoning and programming tasks, showcasing a significant leap for China’s AI sector.

DeepSeek-R1 significantly reduces operational costs:

Incoming tokens: $0.55 per million (compared to $15 with OpenAI o1).
Outgoing tokens: $2.19 per million (compared to $60 with OpenAI o1).
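
To make the price gap concrete, here is a minimal sketch that applies the per-million-token prices quoted above to a workload. The function names and the example token counts are hypothetical, chosen only for illustration. At these list prices alone the saving works out to roughly 96%, slightly above the 90-95% range attributed to VentureBeat; real-world savings depend on the workload and on any pricing details not covered here.

```python
# Per-million-token list prices quoted in the article, in USD.
PRICES = {
    "DeepSeek-R1": {"input": 0.55, "output": 2.19},
    "OpenAI o1": {"input": 15.00, "output": 60.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def savings_pct(input_tokens: int, output_tokens: int) -> float:
    """Percentage saved by running the same token counts on DeepSeek-R1 instead of o1."""
    r1 = request_cost("DeepSeek-R1", input_tokens, output_tokens)
    o1 = request_cost("OpenAI o1", input_tokens, output_tokens)
    return 100 * (1 - r1 / o1)


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
if __name__ == "__main__":
    for model in PRICES:
        print(model, f"${request_cost(model, 2_000_000, 500_000):.3f}")
    print(f"Saving: {savings_pct(2_000_000, 500_000):.1f}%")
```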

This affordability, paired with comparable performance, positions DeepSeek-R1 as a compelling choice for developers and businesses seeking powerful AI solutions.

The foundation of DeepSeek-R1 lies in its precursor, DeepSeek-R1-Zero, which was trained exclusively through reinforcement learning (RL), without an initial supervised fine-tuning stage. This approach enabled the model to develop complex reasoning skills on its own, reaching 71.0% pass@1 accuracy on the AIME 2024 benchmark and 86.7% with majority voting, the latter on par with OpenAI o1-0912.

Early challenges such as language mixing and poor readability were resolved through a multi-stage process:

Fine-tuning the DeepSeek-V3 base model on a small set of cold-start data.
Reinforcement learning focused on reasoning tasks.
Retraining with supervised data across multiple domains, including factual Q&A and self-cognition.

The result is DeepSeek-R1, which combines advanced reasoning, improved readability, and functionality.

The development of DeepSeek-R1 emphasizes transparency, with its model weights and training methodology openly shared. This fosters collaboration in AI research and highlights the growing competitiveness of open-source AI models.

DeepSeek also made the model available through the DeepThink mode of its chat platform, which is similar to ChatGPT, and provides the model weights, a code repository, and API access.

Moreover, DeepSeek’s efforts to create smaller yet highly effective models were demonstrated with DeepSeek-R1-Distill-Qwen-1.5B, a distilled 1.5-billion-parameter model that reportedly outperformed much larger models like GPT-4o and Claude 3.5 Sonnet on math benchmarks. DeepSeek-R1 represents a significant advancement in open-source AI, providing a competitive and cost-effective alternative in the AI landscape.

Chinese AI Startup MiniMax Launches Open-Source Models to Compete with Global Leaders

Shanghai-based artificial intelligence startup MiniMax has unveiled a new lineup of open-source models, intensifying competition among Chinese technology firms aiming to deliver cost-effective AI solutions comparable to leading U.S. offerings.

On Tuesday, MiniMax introduced its MiniMax-01 large language model (LLM) family, which includes:

MiniMax-Text-01: A general-purpose foundational model.
MiniMax-VL-01: A multimodal model with visual capabilities.

LLMs are the technology behind text-generating AI tools such as ChatGPT. MiniMax benchmarked its models against industry-leading AI models in tasks such as math problem solving, domain knowledge, instruction-following, and minimizing factual errors. The results, shared via the company's official WeChat account, indicate its models perform on par with top global competitors.

The launch of MiniMax’s advanced AI models comes shortly after Hangzhou-based rival DeepSeek garnered international attention with its open-source V3 model in December. China's rapidly evolving and competitive AI market has prompted both startups and tech giants to release cutting-edge models at an accelerating pace.

On the same day, Hong Kong-listed SenseTime unveiled a new “unified large model” that combines text and image processing with reasoning capabilities. SuperCLUE, a benchmarking platform for Chinese AI models, ranked SenseTime’s new multimodal model as a top performer.

According to MiniMax, its models also match the performance of closed-source systems such as Google’s Gemini, Anthropic’s Claude, and OpenAI’s ChatGPT, which traditionally dominate industry leaderboards like Chatbot Arena, run by UC Berkeley researchers.

Despite technological progress, Chinese AI startups face difficulties in monetization. Major players like ByteDance, the owner of Doubao (China’s most popular chatbot app in December), can afford to offer their AI tools for free, leveraging extensive resources to reach millions of users. Startups, by contrast, must balance innovation and revenue generation to remain sustainable.

MiniMax’s Talkie, a companion app akin to Character.ai, has been a primary source of revenue for the company. However, it was delisted from Apple’s App Store in the U.S. late last year due to unspecified technical issues, though it remains available on Google Play for Android users.

MiniMax’s advancements highlight the growing competitiveness of China’s AI sector, with open-source models increasingly challenging the dominance of closed-source solutions in the global market.

TheClosedsource.AI News

OpenAI’s ChatGPT API Vulnerability Enables Potential DDoS Attacks, Researcher Claims

A cybersecurity researcher has uncovered a vulnerability in OpenAI’s ChatGPT API that could be exploited to launch distributed denial of service (DDoS) attacks on websites. The flaw reportedly allows the chatbot to send thousands of network requests to a website using the ChatGPT crawler. The issue, rated with high severity, remains unresolved, with no updates from OpenAI on when it might be addressed.

Germany-based security researcher Benjamin Flesch outlined the vulnerability in a GitHub post earlier this month, including proof-of-concept code that demonstrates how 50 parallel HTTP requests can be sent to a test website.

The flaw lies in how the API handles HTTP POST requests to the endpoint https://chatgpt.com/backend-api/attributions. POST is the HTTP method typically used to send data to a server and create new resources. According to Flesch, OpenAI’s API fails to check whether a hyperlink to the same resource appears multiple times in the list of URLs submitted as a parameter.

Additionally, hyperlinks to the same website can be written in various formats, enabling the API’s crawler to make multiple parallel requests. Flesch also pointed out that OpenAI does not enforce a limit on the number of hyperlinks that can be included in a single request, making it possible for malicious actors to overwhelm a website’s server by sending thousands of requests.
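
The two gaps Flesch describes (no de-duplication of differently written links to the same site, and no cap on list length) suggest what a server-side check could look like. The following is a minimal defensive sketch, not OpenAI's code: the function names, the cap of 10, and the policy of keeping one URL per host are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

MAX_URLS = 10  # hypothetical cap on URLs accepted in a single request


def normalize_host(url: str) -> str:
    """Reduce a URL to a canonical host so differently written links compare equal."""
    # Treat scheme-less input such as "example.com/page" as a network location.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").rstrip(".")  # hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return host


def sanitize_urls(urls: list[str], max_urls: int = MAX_URLS) -> list[str]:
    """Reject oversized lists and keep at most one URL per normalized host."""
    if len(urls) > max_urls:
        raise ValueError(f"too many URLs: {len(urls)} > {max_urls}")
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        host = normalize_host(url)
        if not host or host in seen:
            continue  # skip unparseable entries and repeat hosts
        seen.add(host)
        kept.append(url)
    return kept


if __name__ == "__main__":
    submitted = [
        "https://example.com/",
        "http://EXAMPLE.com/a",
        "https://www.example.com./b",
        "example.com?x=1",
        "https://other.test/",
    ]
    print(sanitize_urls(submitted))
    # → ['https://example.com/', 'https://other.test/']
```

Keeping one URL per host is deliberately strict. A real service might instead allow a few distinct paths per host and rate-limit outbound fetches, but the principle is the same: normalize before comparing, and bound the fan-out of any single request.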

Flesch rated the vulnerability with a high 8.6 CVSS score, citing its network-based nature, low execution complexity, lack of required user privileges, and significant potential impact on website availability.
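
As a worked check on that number, the sketch below computes a CVSS v3.1 base score from the published formula. The article only confirms a network attack vector, low complexity, no privileges required, and a high availability impact; the remaining values used here (no user interaction, changed scope, no confidentiality or integrity impact) are assumptions, picked because together they reproduce 8.6. Flesch's actual vector may differ.

```python
import math

# CVSS v3.1 metric weights for the values used in this example.
AV_NETWORK = 0.85   # Attack Vector: Network
AC_LOW = 0.77       # Attack Complexity: Low
PR_NONE = 0.85      # Privileges Required: None
UI_NONE = 0.85      # User Interaction: None (assumed, not stated in the article)
IMPACT_NONE = 0.0   # Confidentiality / Integrity: None (assumed)
IMPACT_HIGH = 0.56  # Availability: High


def roundup(x: float) -> float:
    """Round up to one decimal place, following the CVSS v3.1 definition."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, c, i, a, scope_changed: bool) -> float:
    """CVSS v3.1 base score from numeric metric weights."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    metrics = (AV_NETWORK, AC_LOW, PR_NONE, UI_NONE,
               IMPACT_NONE, IMPACT_NONE, IMPACT_HIGH)
    print(base_score(*metrics, scope_changed=True))   # → 8.6
    print(base_score(*metrics, scope_changed=False))  # → 7.5
```

Under these assumptions, the score reaches 8.6 only when scope is marked as changed, which fits the nature of the issue: the vulnerable component is OpenAI's crawler, while the impact falls on a third-party website.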

The researcher claims to have contacted OpenAI and its partner Microsoft (as the ChatGPT API is hosted on Microsoft servers) through multiple channels since discovering the flaw in January. He reported the issue to:

OpenAI’s security team.
OpenAI employees through direct reports.
OpenAI’s data privacy officer.
Microsoft’s security and Azure network operations teams.

Despite these efforts, Flesch claims the vulnerability remains unresolved, and OpenAI has yet to acknowledge its existence. Meanwhile, OpenAI is set to release its o3-mini reasoning model in the coming weeks, and the company is reportedly preparing for the launch of advanced AI agents. This unresolved security issue may raise concerns about the robustness of OpenAI’s infrastructure.
