• "Towards AGI"
  • Posts
  • Apple Study Highlights Generative AI's Math Limitations

Apple Study Highlights Generative AI's Math Limitations

It could also make financial institution executives rethink their plans to implement AI for customer guidance.

A Thought Leadership platform to help the world navigate towards Artificial General Intelligence We are committed to navigate the path towards Artificial General Intelligence (AGI) by building a community of innovators, thinkers, and AI enthusiasts.

Whether you're passionate about machine learning, neural networks, or the ethics surrounding GenAI, our platform offers cutting-edge insights, resources, and collaborations on everything AI.

What to expect from Towards AGI: Know Your Inference (KYI): Ensuring your AI-generated insights are accurate, ethical, and fit for purpose Open vs Closed AI: Get expert analysis to help you navigate the open-source vs closed-source AI GenAI Maturity Assessment: Evaluate and audit your AI capabilities Expert Insights & Articles: Stay informed with deep dives into the latest AI advancements But that’s not all!

We are training specialised AI Analyst Agents for CxOs to interact and seek insights and answers for their most pressing questions. No more waiting for a Gartner Analyst appointment for weeks, You’ll be just a prompt away from getting the insights you need to make critical business decisions. Watch out this space!!

Visit us at https://www.towardsagi.ai to be part of the future of AI. Let’s build the next wave of AI innovations together!

Apple Study Highlights Generative AI's Math Limitations

A recent Apple study may prompt consumers to reconsider using ChatGPT and other Generative AI tools for financial advice. It could also make financial institution executives rethink their plans to implement AI for customer guidance.

According to a Motley Fool survey, a surprising 54% of Americans have used ChatGPT for finance recommendations, with Gen Z and Millennials being the most frequent users. Most inquiries involved credit cards and checking accounts. However, only half of the respondents expressed interest in seeking financial advice from ChatGPT in the future, and the satisfaction rate with its recommendations was moderate, averaging 3.7 on a 5-point scale.

Consumers identified three critical factors influencing their use of ChatGPT for financial advice: the accuracy of recommendations, the transparency of the logic behind them, and the ability to verify the information. However, Apple's research raises doubts about the reliability of Generative AI's reasoning capabilities, particularly in mathematical contexts.

Apple researchers found that large language models (LLMs) struggle with logical reasoning. When a single seemingly relevant clause is added to a question, LLMs perform worse, indicating fundamental issues in their reasoning that can’t be easily fixed. TechCrunch highlighted examples of simple math problems that AI models like Claude, Gemini, and Llama fail to solve correctly, attributing this to issues with tokenization, which disrupts the relationships between numbers.

Moreover, there's confusion around the term "machine learning," which often gets conflated with statistical methods like regression analysis. Machine learning, unlike these methods, involves a process of decision-making, error evaluation, and optimization. But when it comes to financial performance, people's spending habits often play a bigger role than investments, and spending is driven by emotional factors that machine learning models struggle to capture.

Salesforce Survey: 99% of Indian Leaders Prioritize Generative AI for Future Success

According to a recent Salesforce survey, an overwhelming 99% of Indian business leaders view generative AI as vital to future success, despite challenges like accessibility and governance. The survey, which included over 300 C-suite executives, underscores the growing recognition that AI integration is essential for staying competitive in an increasingly digital economy.

Confidence in AI is on the rise, with all respondents indicating that they would be comfortable delegating at least one task to AI within the next three years, without human intervention. However, several barriers to full adoption remain. These include accessibility concerns (38%), doubts about the accuracy of AI-generated outputs (34%), and a lack of governance structures (30%).

Salesforce highlighted that leadership will play a crucial role in managing risks related to data security and privacy as AI adoption progresses. Arun Parameswaran, Managing Director of Sales at Salesforce India, emphasized that the pressure on business leaders to rapidly and efficiently adopt AI is greater than ever.

It’s fantastic to see such a strong commitment to AI innovation, even as businesses work to overcome these challenges!

Introducing Aria: The Open-Source AI Challenging Big Tech

A new player has entered the artificial intelligence field, and it's fully open-source. Developed by Tokyo-based Rhymes AI, Aria is a multimodal large language model (LLM) capable of handling text, code, images, and video within a single architecture.

What stands out about Aria is not just its versatility but its efficiency. Unlike its larger multimodal counterparts, Aria is more energy- and hardware-efficient. This is due to its Mixture-of-Experts (MoE) framework, which functions like a team of specialized mini-experts trained to excel at specific tasks.

When Aria receives an input, only the relevant experts (a subset of the model) are activated, rather than engaging the entire model. This selective activation allows Aria to be more lightweight and efficient than models that process everything all at once. Specifically, Aria uses only 3.5 billion of its 24.9 billion parameters per token, significantly reducing computational load and improving task-specific performance.

This selective engagement also supports scalability, as new experts can be added for specialized tasks without overburdening the system.

Aria is notable for being the first open-source multimodal MoE. While there are existing MoEs, like Mixtral-8x7B, and multimodal LLMs, like Pixtral, Aria uniquely combines both architectures in one model.

In synthetic benchmark tests, Aria outperforms some prominent open-source models like Pixtral 12B and Llama 3.2-11B. Surprisingly, it even competes with proprietary models like GPT-4o, Gemini-1 Pro, and Claude 3.5 Sonnet, demonstrating multimodal performance on par with industry leaders like OpenAI.

Released under the Apache 2.0 license, Aria allows developers and researchers to modify and build upon it. It's a powerful addition to the growing pool of open-source AI models, performing on par with some of the more widely adopted closed-source counterparts like those from Meta and Mistral.

Aria excels across various tasks. In one experiment, it analyzed an entire financial report, accurately calculating profit margins and providing detailed data breakdowns. When tasked with visualizing weather data, it generated Python code for graph creation, including formatting details.

The model’s video processing is also impressive. In one test, Aria analyzed an hour-long video about Michelangelo's David, identifying 19 distinct scenes, complete with titles, start and end times, and descriptions—demonstrating context-driven understanding beyond simple keyword matching.

Aria also shines in coding, where it can analyze video tutorials, extract and debug code snippets. In one case, it identified and fixed a logic error involving nested loops, showing a deep grasp of programming concepts.

Open-Source AI Video Tool Pyramid Flow Launches Online

Pyramid Flow is an open-source AI model from China that can generate high-resolution (768p) virtual videos. Developed by a team of AI researchers from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications, the model is licensed under the MIT License and was trained on open-source datasets comprising around 10 million videos. 

The model excels in generating 384p videos, creating a five-second clip in under a minute using an A100 GPU. While it's gaining popularity in YouTube tutorials for its efficiency, Pyramid Flow also has limitations, particularly with certain text prompts, where results can be inconsistent—something common among generative AI tools.

One of its key advantages is its lower computational demand compared to other models, and its open-source nature allows developers to use it in both local and cloud-based applications without worrying about licensing. However, the creators have not addressed potential copyright issues related to using open-source datasets for generating virtual videos, a concern raised by some content creators. Pyramid Flow, though, could be valuable for fine-tuning content without needing third-party involvement.

OpenAI Unveils Swarm: A New Framework for Multi-Agent System Development

As the AI landscape shifts toward developing agent-based systems, OpenAI has introduced an open-source framework called Swarm. This new tool is designed for building, orchestrating, and deploying multi-agent systems. Though still in its experimental phase and not ready for production use, Swarm is now available on GitHub.

Developed by the OpenAI Solutions team, Swarm aims to streamline agent coordination and execution, making it lightweight, controllable, and easy to test. It operates through two key abstractions: Agents and handoffs. Each Agent carries out specific tasks and tools, and can transfer control to another Agent as needed.

These features enable flexible interactions among tools and networks of agents, supporting scalable, real-world applications with a simple learning curve. It’s important to note that Swarm Agents differ from Assistants in the Assistants API. While both use the Chat Completions API, Swarm remains stateless between calls and offers developers more control over the execution and context of tasks, unlike the fully-hosted Assistants API with built-in memory management.

Swarm is ideal for use cases involving multiple independent capabilities and instructions that don't fit into a single prompt. It's lightweight, scalable, and customizable for developers who need granular control over agent behavior.

In addition to Swarm, OpenAI has launched MLE-bench, a benchmark designed to evaluate AI agents' ability to complete machine learning engineering tasks. The benchmark includes 75 ML competitions from Kaggle, testing skills like model training and experiment execution. OpenAI evaluated various language models using open-source agent scaffolds, with the top-performing configuration — OpenAI's o1-preview paired with AIDE scaffolding — earning a Kaggle bronze medal in 16.9% of the challenges.

Adobe Rolls Out AI-Powered Video Tools in Competition with OpenAI and Meta

On Monday, Adobe announced the public distribution of its new AI model, Firefly Video Model, which generates video from text prompts. This move positions Adobe among other companies, such as OpenAI, ByteDance (owner of TikTok), and Meta, all of which have recently launched similar generative AI video tools. While Adobe is gradually giving access to those on its waiting list, it has not yet set a general release date.

Though Adobe hasn't revealed any companies currently using its video tools, it did mention that PepsiCo's Gatorade is utilizing its image generation model for custom bottle orders, and Mattel has been using Adobe's tools to design packaging for its Barbie dolls.

The Firefly Video Model is designed to be practical for everyday video creators and editors, with a particular focus on seamlessly integrating AI-generated footage with conventional video. Adobe's digital media CTO, Ely Greenfield, explained that they have focused on fine-grained control, teaching the model concepts familiar to video editors, such as camera positioning, angle, and motion.

To stay competitive against larger rivals, Adobe has prioritized building AI models based on data it has the legal rights to use, ensuring that the output can be utilized for commercial purposes.

Unlock the future of problem solving with Generative AI!

If you're a professional looking to elevate your strategic insights, enhance decision-making, and redefine problem-solving with cutting-edge technologies, the Consulting in the age of Gen AI course is your gateway. Perfect for those ready to integrate Generative AI into your work and stay ahead of the curve.

In a world where AI is rapidly transforming industries, businesses need professionals and consultants who can navigate this evolving landscape. This learning experience arms you with the essential skills to leverage Generative AI for improving problem-solving, decision-making, or advising clients.

Join us and gain firsthand experience in how state-of-the-art technology can elevate your problem solving skills using GenAI to new heights. This isn’t just learning; it’s your competitive edge in an AI-driven world.

There’s a reason 400,000 professionals read this daily.

Join The AI Report, trusted by 400,000+ professionals at Google, Microsoft, and OpenAI. Get daily insights, tools, and strategies to master practical AI skills that drive results.

In our quest to explore the dynamic and rapidly evolving field of Artificial Intelligence, this newsletter is your go-to source for the latest developments, breakthroughs, and discussions on Generative AI. Each edition brings you the most compelling news and insights from the forefront of Generative AI (GenAI), featuring cutting-edge research, transformative technologies, and the pioneering work of industry leaders.

Highlights from GenAI, OpenAI, and ClosedAI: Dive into the latest projects and innovations from the leading organizations behind some of the most advanced AI models in open-source, closed-sourced AI.

Stay Informed and Engaged: Whether you're a researcher, developer, entrepreneur, or enthusiast, "Towards AGI" aims to keep you informed and inspired. From technical deep-dives to ethical debates, our newsletter addresses the multifaceted aspects of AI development and its implications on society and industry.

Join us on this exciting journey as we navigate the complex landscape of artificial intelligence, moving steadily towards the realization of AGI. Stay tuned for exclusive interviews, expert opinions, and much more!