- "Towards AGI"
- Posts
- New Open-Source AI Model Self-Corrects to Prevent Hallucinations
New Open-Source AI Model Self-Corrects to Prevent Hallucinations
Reflection 70B, built on Meta's open-source Llama model, is intended for use in HyperWrite's primary product.
A thought-leadership platform to help the world navigate towards Artificial General Intelligence (AGI). We are committed to navigating that path by building a community of innovators, thinkers, and AI enthusiasts.
Whether you're passionate about machine learning, neural networks, or the ethics surrounding GenAI, our platform offers cutting-edge insights, resources, and collaborations on everything AI.
What to expect from Towards AGI:
- Know Your Inference (KYI): Ensuring your AI-generated insights are accurate, ethical, and fit for purpose
- Open vs Closed AI: Expert analysis to help you navigate the open-source vs closed-source AI debate
- GenAI Maturity Assessment: Evaluate and audit your AI capabilities
- Expert Insights & Articles: Stay informed with deep dives into the latest AI advancements
But that's not all!
We are training specialised AI Analyst Agents that CxOs can interact with to seek insights and answers to their most pressing questions. No more waiting weeks for a Gartner analyst appointment: you'll be just a prompt away from the insights you need to make critical business decisions. Watch this space!
Visit us at https://www.towardsagi.ai to be part of the future of AI. Let’s build the next wave of AI innovations together!
TheGen.AI News
New Open-Source AI Model Self-Corrects to Prevent Hallucinations

When prominent AI companies like Anthropic or OpenAI announce upgraded models, they often generate significant attention, given AI's widespread influence on individual users and workplaces alike. A new model from New York startup HyperWrite, however, is drawing attention for a different reason. Reflection 70B incorporates an open-source error-trapping system designed to reduce the "hallucination" issues that have plagued chatbots like ChatGPT and Google Gemini, including one instance where Gemini incorrectly advised people to put glue on pizza.
Reflection 70B, built on Meta's open-source Llama model, is intended for use in HyperWrite's primary product, a writing assistant that adapts to user needs and helps with creative tasks. What sets Reflection 70B apart is its "reflection-tuning" system, which the company's CEO and co-founder, Matt Shumer, calls a breakthrough in preventing AI hallucinations. Shumer explained on social media that unlike other generative AI models that often fail to recognize their errors, Reflection 70B can identify its mistakes and correct them before providing an answer. This "reflection" system allows the AI to review its own output, spot errors, and learn from them in real-time.
The concept of self-improving AI isn't entirely new. In April, Meta's Mark Zuckerberg suggested that their Llama model should be able to train itself by experimenting with different approaches to a problem, identifying the most accurate output, and using that to enhance its training in a feedback loop. Reflection 70B builds on this idea but takes a more direct approach to correcting hallucinations and misinformation. It focuses on fixing mistakes within its outputs rather than just improving future training data. As an example, Shumer shared a conversation where Reflection 70B initially misidentified the number of "Rs" in the word "strawberry." After reviewing its own output, the AI corrected itself, acknowledging there are actually three "Rs."
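To make the mechanism concrete, here is a minimal sketch of what a reflect-then-revise loop can look like when built at the prompt level. The `generate` helper and the prompts are hypothetical placeholders; Reflection 70B bakes this behavior into the model itself through reflection-tuning rather than external prompting.

```python
# Minimal sketch of a reflect-then-revise loop at the application level.
# `generate` is a hypothetical stand-in for any LLM completion call; this
# is NOT HyperWrite's implementation -- Reflection 70B learns this
# behavior during training (reflection-tuning) rather than via prompts.

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM completion endpoint."""
    raise NotImplementedError

def answer_with_reflection(question: str) -> str:
    # First pass: draft an answer.
    draft = generate(f"Answer the question: {question}")
    # Reflection pass: ask the model to critique its own draft.
    critique = generate(
        f"Question: {question}\nDraft answer: {draft}\n"
        "List any factual or logical errors in the draft. "
        "If there are none, reply with exactly 'OK'."
    )
    if critique.strip() == "OK":
        return draft
    # Revision pass: rewrite the draft using the critique.
    return generate(
        f"Question: {question}\nDraft answer: {draft}\n"
        f"Identified errors: {critique}\nWrite a corrected answer."
    )
```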
The issue of AI accuracy and reliability is becoming increasingly important as more people turn to AI for information and opinions on critical topics. Ensuring that future AI systems align with human interests is a growing concern, with the EU, U.S., and UK recently signing agreements to promote AI safety. However, creating effective regulations is a challenge, as lawmakers must grapple with the complex math and logic that underpin AI models like ChatGPT and Reflection 70B. For example, upcoming AI laws in California would require disclosures for AI models trained using more than 100 septillion (10^26) floating-point operations, a staggering amount of computing power. Hopefully, lawmakers can handle this complexity better than ChatGPT handles counting the "Rs" in strawberry.
Why DataGemma Is a Game-Changer for Generative AI with RAG Integration

The rising technique of retrieval-augmented generation (RAG) is moving beyond enterprise use and gaining wider attention. Google recently introduced DataGemma, a fusion of its open-source large language model family, Gemma, with the public data resource Data Commons. DataGemma uses RAG methods to retrieve data before answering queries; in the full RAG approach, it cites its data sources, delivering more detailed and informative responses. Google's closed-source model, Gemini 1.5, supports DataGemma's RAG variant with a long context window for better recall: the window handles up to 128,000 tokens, and in some configurations up to a million. This large capacity allows more data from Data Commons to be processed and retained when responding to queries.
Google explains that DataGemma retrieves relevant data from Data Commons before generating a response, reducing hallucinations and improving accuracy. While RAG is commonly used to link LLMs with internal corporate data, Google’s use of Data Commons represents the first large-scale RAG implementation in cloud-based generative AI. Data Commons, an open-source platform that collects data from public institutions like the United Nations, serves as the fact-checking backbone for Google’s "retrieval-interleaved generation" (RIG), which integrates real-world stats into AI-generated answers.
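In outline, the retrieval step works like any RAG pipeline: fetch grounding facts first, then condition generation on them. Below is a minimal sketch under that assumption; `query_data_commons` and `generate` are hypothetical placeholders, not the actual Data Commons API or Google's DataGemma pipeline.

```python
# Sketch of retrieval-before-generation against a public statistics store.
# `query_data_commons` and `generate` are hypothetical placeholders; the
# real Data Commons API and Google's DataGemma pipeline differ in detail.

def query_data_commons(question: str) -> list[str]:
    """Placeholder: return statistics relevant to the question."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def answer_with_rag(question: str) -> str:
    # 1. Retrieve grounding statistics before generating anything.
    facts = query_data_commons(question)
    context = "\n".join(f"- {fact}" for fact in facts)
    # 2. Condition the answer on the retrieved facts and require
    #    citations, so the model grounds its response in real data.
    return generate(
        "Using only the statistics below, answer the question and "
        "cite each statistic you use.\n\n"
        f"Statistics:\n{context}\n\nQuestion: {question}"
    )
```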
The research behind DataGemma is still ongoing, and more details can be found in a formal research paper by Google researcher Prashanth Radhakrishnan and his team. Google notes that further testing and development are required before DataGemma is publicly integrated into both its open-source Gemma models and the closed-source Gemini model.
According to Google, the use of RIG and RAG has already led to better output quality, reducing hallucinations in tasks such as research, decision-making, and general information queries.
In a related move, OpenAI recently introduced its "Strawberry" project, which includes two models utilizing the "chain of thought" technique. This method guides the AI to explicitly outline the reasoning behind its predictions.
DataGemma exemplifies how major AI companies like Google are expanding their offerings with advanced technologies that go beyond traditional LLMs.
How GenAI Powers Andretti Global's Data-Driven Racing Strategy

Sunday’s IndyCar championship finale in Nashville marked the close of the 2024 racing season and the third year of collaboration between Zapata AI, a leader in machine learning solutions, and the renowned racing team Andretti Global.
While the link between AI and IndyCar racing might not be immediately apparent, it becomes clearer when considering that each car in an IndyCar event is equipped with about 140 sensors, generating around one terabyte of data per race. To put this in perspective, one terabyte is equivalent to 500 hours of HD video, 17,000 hours of music, or 6.5 million electronic documents.
During a Zoom interview, Zapata AI's co-founder and CEO, Christopher Savoie, explained that his company's advanced quantum modeling supports industries like finance, the military, drug development, and manufacturing, but racing stands out. "Racing is where the rubber hits the road — literally. It’s the fastest data flow and the most sensor data you’ll find, driven by IoT," Savoie said.
He emphasized the high-stakes nature of racing, noting, "The decisions made in racing are often life or death for the drivers, which makes it an especially interesting space for AI models to operate."
Savoie described Zapata AI as a tech innovator that provides large enterprises and government agencies with industrial-grade AI solutions through its Orquestra platform, built for massive-scale generative AI applications that use both text- and data-based learning models.
Customer Service Leaders Face Challenges Scaling GenAI Pilots

A McKinsey & Company survey of customer service organizations highlights that leaders overwhelmingly recognize the importance of mastering digital transformation. More than half of respondents indicated that within the next three years, over 40% of inbound interactions will occur through digital channels. As the industry prepares for this surge in digital engagement, it is widely accepted that artificial intelligence will play a pivotal role in shaping the future of customer service.
Respondents reported that they are already utilizing AI in various applications, including chatbots, automated email responses, training support for call center agents, back-office analytics, and decision-making processes. However, over the past year, the introduction of advanced Generative AI tools, particularly large language models (LLMs) capable of interpreting and responding to unstructured text or speech, has opened new doors for AI in customer care. As a result, 80% of customer care leaders confirmed that they are either already investing in Generative AI or plan to do so soon, and the same percentage expect to increase their investment in GenAI over time, given its wide range of potential applications.
Despite this progress, many companies are still in the early stages of their GenAI adoption. A report from Capgemini Research Institute found that while 60% of organizations have implemented GenAI pilots or proofs of concept (PoCs) using enterprise data, 75% view scaling these initiatives as a major challenge. McKinsey’s survey also revealed that 24% of leaders are concerned about scaling their pilots into full-scale production.
Additionally, 22% of respondents pointed to a lack of AI maturity within their organizations, specifically a shortage of AI skills. Another 15% expressed uncertainty about how to design human-centered experiences for employees using AI tools.
McKinsey cautioned that many customer service organizations lack the critical skills needed to deliver top-tier service and transition to a digitally driven, AI-enabled environment. This skill gap is partly attributed to HR departments focusing more on addressing high employee turnover following the COVID-19 pandemic and less on training and upskilling. However, with staff attrition slowing down, two-thirds of leaders are now prioritizing employee development, with 21% using AI tools to enhance training and support for customer care teams.
The report concludes by emphasizing that Generative AI is setting new standards for performance, productivity, and personalization in customer care. It urges companies to rethink their care ecosystems, form a clear vision of evolving customer expectations, and embrace advanced AI technologies. The future of customer care is approaching, and leaders must respond with bold strategies and a commitment to swift transformation.
TheOpensource.AI News
Why Defining Open Source AI Could Unlock Endless Innovation

A document released several weeks ago is set to have a major impact on the future of the internet. The Open Source Initiative (OSI) has unveiled a near-final definition of open source AI, designed to empower the broader AI developer community to drive innovation, much like the early days of the internet. Open source software forms the backbone of the internet infrastructure and most applications in use today, thanks to pioneers who ensured that such software would always be free to use and modify. This open approach fueled widespread adoption and innovation that underpins our digital lives.
The timing couldn’t be better, as a wave of AI models from big tech companies is being labeled as "open source," yet they often fall short of the original spirit of the open source movement. While it may seem like a minor issue, the use of vague or misleading terms around open source AI could undermine trillions of dollars in future innovation, leaving AI development dominated by a few large companies.
The stakes are high. A Harvard University study estimates that open source software has generated around $8 trillion in economic value, all based on the guarantees in the 1998 open source definition: that any software called open source will always be free to use, modify, and share. This assurance allows businesses, governments, and others to confidently build on open source software without fear of future restrictions or fees.
The same benefits could be realized with AI, but only if developers can freely access and modify all parts of an AI system. The phrase “all elements of an AI system” is crucial. Unlike traditional software, AI systems include not only code but also the models and training data that underpin them. The OSI’s new definition emphasizes that the code and models must be open, and the data must be transparent and reproducible. For true innovation, AI labs — including the major tech companies — must adopt this definition before labeling their work as "open source." Without this, developers may avoid using these models, stalling the open source AI movement.
Many large tech firms, such as Meta with its Llama model, have released AI models marketed as open source, allowing developers to build applications on them without the huge costs of developing from scratch. These models have powered valuable applications, from drug discovery to education. However, they are not fully open, which raises concerns about the sustainability of applications built on them. For example, Meta could change its licensing terms, limiting access to its models and potentially disrupting services built on them.
In a recent opinion piece, Mark Zuckerberg and Spotify CEO Daniel Ek defined open source AI as "models whose weights are publicly released with a permissive license," citing Llama as an example. However, this narrow definition could allow companies to stop releasing parts of their models if it no longer benefits them. This uncertainty threatens the long-term viability of applications built on such models, and the broader open source AI ecosystem.
Mozilla and Columbia University recently brought together experts to discuss what openness should mean in the AI era. They flagged the dangers of vague definitions of open source AI and warned against "open-ish" licenses, like Llama’s, which restrict free use to products with fewer than 700 million monthly users. Building a startup on software that might be locked down once it becomes successful is a real risk with these kinds of licenses.
The OSI’s draft definition aims to address these issues by setting clear standards for what constitutes open source AI, helping developers know which models they can trust. Examples of fully open AI models include EleutherAI’s GPT-NeoX-20B, released under the Apache 2.0 license, and the Allen Institute’s OLMo model, which provides full access to the code, data, and weights used in its development. Unlike Meta’s Llama, these models allow researchers to fully examine and adapt them to their needs.
Nonprofit organizations like EleutherAI and AI2 also provide confidence that these models will remain accessible and up-to-date, ensuring the longevity of applications built on them. This principle of sustained support has made open source projects like Linux and Apache staples of server infrastructure worldwide. Developers trust that these foundations will keep their software running in the public interest.
TheClosedsource.AI News
Why OpenAI's New Model o1 Is a Game-Changer for AI Development

Last weekend, I got married at a summer camp where we organized a series of Survivor-inspired games for our guests. One of the planned challenges was a memory game where teams had to memorize part of a poem and recreate it using wooden tiles. When my now-wife and I were planning in August, I thought OpenAI’s GPT-4o, the leading model at the time, would be perfect for generating a wedding-themed poem with specific letter constraints. However, GPT-4o consistently failed to meet the requirements, claiming the poem fit the constraints when it didn’t. After struggling with the model, we abandoned the poem idea and instead had our guests memorize colored tile shapes, which was a huge success alongside games like dodgeball, egg toss, and capture the flag.
Last week, OpenAI released a new model called o1, previously known as “Strawberry” or Q*, which significantly outperforms GPT-4o for such tasks. Unlike its predecessors, which excel in language tasks like writing and editing, OpenAI o1 is focused on multistep reasoning—ideal for complex subjects like advanced math, coding, and STEM-related questions. According to OpenAI, o1 uses a “chain of thought” approach, allowing it to learn from its mistakes, break down complex steps, and try new approaches when needed.
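From a developer's perspective, calling the model looks like any other chat completion; the chain-of-thought tokens are generated internally, and only the final answer is returned. Here is a minimal sketch using the official OpenAI Python SDK, assuming the "o1-preview" model identifier:

```python
# Minimal sketch using the official OpenAI Python SDK (pip install openai),
# assuming the "o1-preview" model identifier. The chain-of-thought tokens
# are produced internally; only the final answer comes back.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": (
                "Write a four-line wedding poem in which every line "
                "starts with the letter S. Verify the constraint "
                "before answering."
            ),
        }
    ],
)
print(response.choices[0].message.content)
```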
OpenAI's tests have shown remarkable results. The model ranks in the 89th percentile on coding challenges from Codeforces and would place among the top 500 high school students in a qualifier for the USA Math Olympiad. It is also trained to answer PhD-level questions across subjects like astrophysics and organic chemistry. On math olympiad qualifier questions, o1 achieved 83.3% accuracy, compared to GPT-4o's 13.4%. On PhD-level questions, it averaged 78% accuracy, outperforming human experts at 69.7% and GPT-4o at 56.1%. While it still made some mistakes with our wedding poem, using more Ts and Ss than instructed, it was far superior to GPT-4o.
Why does this matter? Up until now, most advancements in large language models (LLMs) have focused on language-driven tasks like chatbots and voice assistants. But these models often struggle with facts and lack the skills needed for important fields like drug discovery, materials science, and coding. OpenAI’s o1 represents a breakthrough by bringing “chain-of-thought” reasoning to a wider audience. Matt Welsh, an AI researcher and founder of Fixie, believes this raises expectations for what AI models can achieve.
However, some experts urge caution when comparing AI models to human abilities. Yves-Alexandre de Montjoye, a professor at Imperial College London, points out that it’s difficult to compare how LLMs and humans solve problems from scratch. Additionally, AI researcher François Chollet notes that while o1 is impressive, measuring its reasoning abilities is challenging. Did the model truly reason its way to the correct answer, or did it rely on its pre-existing knowledge base?
Lastly, the new model comes with a high cost. While it’s available to premium subscribers, developers using o1 through the API will pay three times as much as GPT-4o—$15 per 1 million input tokens versus $5 for GPT-4o. OpenAI suggests that for language-heavy tasks, GPT-4o remains the better option.
In related AI news, researchers from MIT and Cornell found that AI chatbots could help reduce belief in conspiracy theories by around 20%, and Google’s new tool, DataGemma, helps LLMs fact-check their responses by referencing reliable data sources. Meanwhile, OpenAI’s valuation has reached $150 billion as it seeks to raise $6.5 billion, though it may face challenges, with projected losses of up to $5 billion this year.
New OpenAI o1 Model Achieves 83% Success Rate in Math Olympiad Challenges

OpenAI has unveiled its latest artificial intelligence (AI) model, o1, now available in preview. This marks the introduction of a new class of reasoning-focused AI models and follows speculation about the "Strawberry" AI release. Along with o1, the company also debuted o1-mini, a lighter and more cost-effective version tailored for tasks such as coding and problem-solving.
What can o1 do?
o1 is designed to tackle complex, multi-step problems, particularly in areas like math and coding. It not only solves these problems but also explains its reasoning along the way, mimicking human thought processes. The model promises higher accuracy and significantly reduces hallucinations—instances where AI generates false or misleading information. Additionally, o1 can serve as a powerful tool for scientific research in fields like physics, chemistry, and engineering, where precise reasoning and problem-solving are critical.
How is o1 different from previous OpenAI models?
Unlike older models that rely on pattern-mimicking training, o1 uses reinforcement learning, where the system improves based on rewards or penalties. It also employs a "chain of thought" approach, which breaks down problems into logical, sequential steps, mimicking human cognition. According to OpenAI's Chief Research Officer Bob McGrew, o1 significantly outperforms earlier models, especially in math tasks, solving 83% of the problems on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o's 13%.
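OpenAI's training recipe is proprietary, but the reward-and-penalty idea can be sketched conceptually. Everything in the snippet below is a hypothetical placeholder, meant only to contrast outcome-based updates with pattern-mimicking pretraining:

```python
# Conceptual sketch of reward-based training -- NOT OpenAI's actual recipe.
# `solve_with_reasoning` and `update` are hypothetical placeholders used
# only to contrast outcome-based rewards with pattern-mimicking training.

def solve_with_reasoning(model, problem):
    """Placeholder: emit chain-of-thought steps plus a final answer."""
    raise NotImplementedError

def update(model, reasoning_steps, reward):
    """Placeholder: adjust the model to favor rewarded reasoning."""
    raise NotImplementedError

def reinforcement_step(model, problem, reference_answer):
    reasoning_steps, answer = solve_with_reasoning(model, problem)
    # Reward a correct final answer, penalize an incorrect one.
    reward = 1.0 if answer == reference_answer else -1.0
    # The update signal comes from the outcome of the model's own
    # reasoning, not from imitating examples in the training corpus.
    update(model, reasoning_steps, reward)
```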
When will o1 be available?
ChatGPT Plus and Team users already have access to the o1 preview and o1-mini, while Enterprise and Edu users will gain access next week. OpenAI also plans to offer o1-mini to free users, though no specific date has been announced. For developers, o1-preview is priced at $15 per million input tokens and $60 per million output tokens—three times the cost of GPT-4o.
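As a rough back-of-the-envelope check on those prices (the request sizes below are illustrative, and GPT-4o's $15-per-million-token output price is an assumption based on its published pricing at the time):

```python
# Rough cost comparison from the listed per-million-token API prices.
# Token counts below are illustrative; GPT-4o's $15/1M output price is
# an assumption based on its published pricing at the time.

def cost_usd(input_tokens: int, output_tokens: int,
             price_in: float, price_out: float) -> float:
    """Request cost in USD, given prices per 1 million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example request: 2,000 input tokens and 1,000 output tokens.
print(cost_usd(2_000, 1_000, 15.00, 60.00))  # o1-preview: 0.09
print(cost_usd(2_000, 1_000, 5.00, 15.00))   # GPT-4o:     0.025
```

Even modest output lengths dominate the bill at o1's output rate, which is one reason OpenAI steers language-heavy workloads back to GPT-4o.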
What are the limitations of o1?
o1 is more expensive and slower than previous models, and it’s not optimized for web browsing or processing files and images.
What does o1 mean for the future?
For OpenAI, o1 represents a step toward creating AI systems that act as autonomous agents, capable of making decisions, taking actions for users, and solving real-world problems, potentially revolutionizing industries like healthcare and engineering.
Sam Altman’s Top Strategy for a Life Without Regrets: Prioritize Taking Risks

In 2005, Sam Altman made a pivotal decision that would shape his career, though he may not have realized it at the time. He dropped out of Stanford to develop Loopt, a location-based social networking app, marking the first step in his journey toward co-founding OpenAI, the groundbreaking AI company behind ChatGPT.
Altman, now 39, described the decision to leave college as something that "seemed like a really fun thing to try" during a talk at John Burroughs School, his former high school in the St. Louis area. He emphasized that leaving school wasn't a permanent choice: if entrepreneurship didn’t pan out, he could always return.
"The key to most risks is that they’re not a one-way door," Altman explained. "You can try something, and if it doesn’t work out, you can undo it and try something else."
This perspective on risk-taking is shared by other business leaders, such as Amazon founder Jeff Bezos. During a "Lex Fridman Podcast" interview last year, Bezos referred to "two-way door" risks, which are reversible and easier to take. In contrast, "one-way door" risks are harder to reverse and should be approached more cautiously, as they may not allow a return.
Altman stressed the importance of taking risks, noting that avoiding them can lead to missed opportunities. "The real risk is not trying the things that could turn out great," he said, warning that failing to take chances can result in long-term regret. “You might look back 10, 20, or 30 years later and think, ‘I wish I had tried that thing I really wanted to do.’"
He also encouraged students to remain flexible, suggesting that sticking to the traditional path of going to college, getting a job, and staying there "forever" may no longer guarantee financial security as it once did. Altman pointed out that the conventional career path is increasingly challenged and predicted that AI will bring even more disruptions.
Younger generations seem to be embracing this mindset of career flexibility. According to data from Handshake, 43% of students graduating in 2025 expect to change career fields at least once. Christine Cruzvergara, Handshake’s chief education strategy officer, noted that this generation values having options, knowing they'll be working for a long time.
Altman’s advice encourages young people to take bold steps in their careers, embracing risk and adaptability as the professional landscape continues to evolve.
Unlock the future of problem solving with Generative AI!

If you're a professional looking to elevate your strategic insights, enhance decision-making, and redefine problem-solving with cutting-edge technologies, the Consulting in the age of Gen AI course is your gateway. It is perfect for those ready to integrate Generative AI into their work and stay ahead of the curve.
In a world where AI is rapidly transforming industries, businesses need professionals and consultants who can navigate this evolving landscape. This learning experience arms you with the essential skills to leverage Generative AI for better problem-solving, decision-making, and client advising.
Join us and gain firsthand experience of how state-of-the-art GenAI can elevate your problem-solving skills to new heights. This isn't just learning; it's your competitive edge in an AI-driven world.
🦾 Master AI & ChatGPT for FREE in just 3 hours 🤯
1 Million+ people have attended, and are RAVING about this AI Workshop.
Don’t believe us? Attend it for free and see it for yourself.
Highly Recommended: 🚀
Join this 3-hour Power-Packed Masterclass worth $399 for absolutely free and learn 20+ AI tools to become 10x better & faster at what you do
🗓️ Tomorrow | ⏱️ 10 AM EST
In this Masterclass, you’ll learn how to:
🚀 Do quick excel analysis & make AI-powered PPTs
🚀 Build your own personal AI assistant to save 10+ hours
🚀 Become an expert at prompting & learn 20+ AI tools
🚀 Research faster & make your life a lot simpler & more…
In our quest to explore the dynamic and rapidly evolving field of Artificial Intelligence, this newsletter is your go-to source for the latest developments, breakthroughs, and discussions on Generative AI. Each edition brings you the most compelling news and insights from the forefront of Generative AI (GenAI), featuring cutting-edge research, transformative technologies, and the pioneering work of industry leaders.
Highlights from TheGen.AI, TheOpensource.AI, and TheClosedsource.AI: Dive into the latest projects and innovations from the leading organizations behind some of the most advanced open-source and closed-source AI models.
Stay Informed and Engaged: Whether you're a researcher, developer, entrepreneur, or enthusiast, "Towards AGI" aims to keep you informed and inspired. From technical deep-dives to ethical debates, our newsletter addresses the multifaceted aspects of AI development and its implications on society and industry.
Join us on this exciting journey as we navigate the complex landscape of artificial intelligence, moving steadily towards the realization of AGI. Stay tuned for exclusive interviews, expert opinions, and much more!