Inside Gemini 2.5 Pro: How Google's Smartest AI Model Actually Works

This architectural enhancement allows Gemini 2.5 to tackle complex problems with greater accuracy and contextual understanding.

Shen Pandi
March 28, 2025

Gen AI: Google's AI Crown Jewel

What’s New: The Low-Cost LLM Revolution?

Open AI: The AI Agent Bargain?

OpenAI: Bad News for AI Artists!

Closed AI: Say Hello to Smarter Chats

Announcing Gen Matrix 2.0: Now Featuring AI’s Most Influential Investors!

The AI revolution is accelerating, and Gen Matrix, your trusted compass in the AI-driven business landscape, is evolving with it.

We’re thrilled to launch the Second Edition of Gen Matrix, with a brand-new Investors category, because behind every groundbreaking AI innovation, there’s capital fueling its rise.

What’s New in Gen Matrix 2.0?

Investors Spotlight: Discover the top VCs, angels, and funds betting big on Generative AI, and learn where smart money is flowing.
Expanded Rankings: Updated leaderboards across Organizations, Startups, and Individuals, showcasing who’s pushing boundaries in 2024.
Deeper Industry Insights: Real-world case studies from banking, healthcare, retail, and tech, proving AI’s ROI, not just hype.

Google Just Upped The AI Ante? Meet Gemini 2.5 Pro, Its Most Advanced Model Yet

An artistic representation of Google's Gemini AI, symbolizing advanced AI capabilities.

Google has made a significant leap in artificial intelligence with the introduction of Gemini 2.5, its most advanced AI model to date. This latest iteration represents a major evolution in AI capabilities, introducing sophisticated "thinking" functionality that enables the model to process information through logical reasoning before generating responses.

This architectural enhancement allows Gemini 2.5 to tackle complex problems with greater accuracy and contextual understanding, marking a substantial improvement over previous versions. The flagship Gemini 2.5 Pro Experimental model demonstrates particularly impressive capabilities for handling intricate tasks.

According to Google's official announcement, this version significantly outperforms competitors on the LMArena leaderboard, which measures human preference in AI responses.

The model shows exceptional performance across multiple domains, including programming, mathematics, and scientific reasoning. Google says it achieves these results without cost-increasing test-time techniques, such as majority voting, that some competing models employ.

A key innovation in the Gemini 2.5 series is its built-in reasoning architecture. Unlike traditional language models that generate responses directly, these new models engage in an internal thought process before answering.

This fundamental redesign enables more sophisticated problem-solving abilities and allows the AI to maintain better contextual awareness throughout extended conversations.

Google has specifically highlighted the model's enhanced programming abilities as a major breakthrough. The 2.5 Pro version shows substantial improvements in web application development, code transformation, and editing compared to its predecessor.

These advancements position Gemini 2.5 as a powerful tool for developers and technical professionals, with Google promising even more coding-related improvements in future updates.

Google just launched Gemini 2.5 Pro
I tested it against o1-pro with the exact same prompt
— Flavio Adamo (@flavioAd)
9:16 PM • Mar 25, 2025

Currently, Gemini 2.5 Pro is available through Google AI Studio and the Gemini app for Advanced subscribers, with integration into Vertex AI coming soon. The model supports an expansive 1 million token context window, with plans to double this capacity to 2 million tokens in the near future.

Google will announce detailed pricing information and higher rate limits for enterprise-scale usage in the coming weeks, making this cutting-edge technology accessible for larger production implementations.

The Great AI Paradox: Bubble Worries Grow Even as Low-Cost LLMs Break Barriers

A roadmap to LLM models

When Chinese AI firm DeepSeek announced that its R1 large language model was developed for just $6 million, it sent shockwaves through the tech world. This figure, a fraction of the billions spent by U.S. giants like OpenAI, raised questions about whether AI development truly requires exorbitant investments.

Yet skepticism persists, even as OpenAI reportedly prepares for a $40 billion funding round at a staggering $300 billion valuation. Meanwhile, AI cloud provider CoreWeave aims to revive the IPO market, betting on continued AI hype despite growing concerns over unsustainable spending.

Diffusion language models are SO FAST!!
A new startup, Inception Labs, has released Mercury Coder, "the first commercial-scale diffusion large language model"
It's 5-10x faster than current gen LLMs, providing high-quality responses at low costs.
And you can try it now!
— Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr)
9:23 PM • Feb 26, 2025

The debate over AI’s financial bubble is intensifying. Alibaba co-founder Joe Tsai recently warned of signs of an AI bubble in the U.S., while the "Magnificent Seven" tech stocks have underperformed this year.

Geopolitical tensions further complicate the landscape: calls for stricter chip embargoes against China are growing even as venture capitalists double down on Chinese AI startups.

Yet, for some researchers, DeepSeek’s breakthrough has unlocked new possibilities, proving that high-performance AI can be achieved without limitless budgets.

The $30 AI Experiment

A team at UC Berkeley took DeepSeek’s approach even further, replicating key aspects of its reasoning capabilities for just $30. Using two rented Nvidia H200 GPUs, they trained a scaled-down "3B" (3-billion-parameter) model on a simple math game called “Countdown”, where the AI must reach a target number through arithmetic operations.
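To make the Countdown task concrete, here is a minimal sketch of how a proposed answer could be checked automatically. This is not the TinyZero team's code. The function name `countdown_reward`, the 0/1 scoring, and the rule that each number must be used exactly once are assumptions made for illustration.

```python
import ast
import operator
from collections import Counter

# Binary operators permitted in a Countdown expression.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Recursively evaluate a parsed expression, recording each integer it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (function calls, names, unary minus) is rejected.
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical scorer: 1.0 if the expression uses exactly the given
    numbers (assumed rule: each once) and equals the target, else 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0


print(countdown_reward("(25 - 5) * 3", [3, 5, 25], 60))
print(countdown_reward("25 + 5 + 3", [3, 5, 25], 60))
```

Because a checker like this can score an answer without any human labeling, a simple game of this kind is a cheap way to give a small model feedback on its reasoning.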

Unlike traditional models that rely on vast datasets, this experiment focused on reasoning, a core innovation in both DeepSeek R1 and OpenAI’s latest models.

"DeepSeek R1 was the first open research to demonstrate how AI could 'think' before answering," said Jiayi Pan, leader of the TinyZero project. "But even $6 million was too expensive for us."

New discovery! LLMs are just like humans!
Overthinking GREATLY HURTS their performance
If we select the solution with the lower overthinking score. We improve model performance by almost 30% while reducing costs by 43% (o1_low)
Is reasoning really the future of LLMs? 🧵
— Alejandro Cuadron (@Alex_Cuadron)
10:48 PM • Feb 14, 2025

By reducing both model size and task complexity, the team found that emergent reasoning behaviors could still be observed, a discovery that challenges the assumption that only trillion-parameter models can achieve advanced cognition.

Alibaba Takes on Big Tech With New Open-Source AI for Cost-Effective Automation

A creative depiction related to Alibaba, highlighting innovation in e-commerce and technology.

Alibaba Cloud has unveiled its latest AI innovation, the Qwen2.5-Omni-7B, a cutting-edge multimodal model capable of processing text, images, audio, and video while generating real-time responses in both text and natural speech.

The announcement, made on Thursday, signals Alibaba’s aggressive push in China’s rapidly evolving AI landscape, where competition has surged following DeepSeek’s breakthrough open-source release.

This is wild.
Alibaba just open-sourced Wan 2.1, AI that generates videos from text & images, edits video, & creates audio!
The videos look absolutely insane.
1. Physical Simulation of water surface
— Min Choi (@minchoi)
2:11 PM • Feb 26, 2025

A Lightweight Yet Powerful AI Solution

Unlike many large language models (LLMs) that require massive computing power, Alibaba’s new model is designed for edge deployment, meaning it can run efficiently on devices like smartphones without sacrificing performance.

According to the company, this makes it ideal for developing cost-effective AI agents, particularly in voice-enabled applications. One potential use case includes assisting visually impaired users by providing real-time audio descriptions of their surroundings.

China’s AI Market Heats Up

The release comes amid an AI arms race among Chinese tech giants. Last week, Baidu introduced its own multimodal and reasoning-focused models, while Alibaba has been rapidly iterating its AI offerings, launching Qwen 2.5 in January and an upgraded version of its AI assistant, Quark, earlier this month.

To solidify its position, Alibaba has pledged a staggering $53 billion investment in cloud and AI infrastructure over the next three years, surpassing its total spending in the past decade.

Analysts, including Morningstar’s Kai Wang, believe Alibaba is well-positioned to capitalize on China’s AI boom, given its dual strength in data centers and proprietary LLMs.

DeepSeek V3-0324 Makes History as Top-Performing Open-Source AI Model

A visual concept representing DeepSeek, possibly linked to AI research and exploration.

In a groundbreaking development for open-source artificial intelligence, DeepSeek's V3-0324 has claimed the highest score among non-reasoning models on the prestigious Artificial Analysis Intelligence Index.

The model's seven-point improvement in benchmark testing allowed it to outperform established competitors, including Google's Gemini 2.0 Pro, Anthropic's Claude 3.7 Sonnet, and Meta's Llama 3.3 70B.

🚀 DeepSeek-V3-0324 is out now!
🔹 Major boost in reasoning performance
🔹 Stronger front-end development skills
🔹 Smarter tool-use capabilities
✅ For non-complex reasoning tasks, we recommend using V3 — just turn off “DeepThink”
🔌 API usage remains unchanged
📜 Models are
— DeepSeek (@deepseek_ai)
1:32 PM • Mar 25, 2025

While still trailing behind advanced reasoning models like DeepSeek's own R1 and offerings from OpenAI and Alibaba, this achievement demonstrates the increasing competitiveness of open-source solutions in time-sensitive applications where instantaneous responses are crucial.

The Evolving Open-Source Landscape

The AI sector is witnessing a significant shift as open-source frameworks increasingly challenge proprietary systems. While closed-source reasoning models still lead overall benchmarks, DeepSeek's progress demonstrates the rapid advancement of community-driven alternatives.

"V3-0324's achievement is arguably more impressive than R1's breakthrough," suggests Artificial Analysis. The model's MIT License makes it a versatile tool for developers and enterprises, though its substantial computational needs may limit broader accessibility.

With DeepSeek now pushing the boundaries of open-weight non-reasoning models and the anticipated R2 model on the horizon, the AI community is watching closely for the next potential leap in performance capabilities.

This development signals a new era where open-source solutions are becoming increasingly competitive with their proprietary counterparts across various AI applications.

Why It Matters?

For Leaders: Benchmark your AI strategy against the best.
For Founders: Find investors aligned with your vision.
For Builders: Get inspired by the individuals shaping AI’s future.
For Investors: Track high-potential opportunities before they go mainstream.

Subscribe to the newsletter for the Gen Matrix report.

OpenAI's Voice Assistant Just Got Scary Good at Conversation

OpenAI logo

OpenAI unveiled improvements to its Advanced Voice Mode on Monday, refining the real-time conversational capabilities of ChatGPT's voice assistant. The updates specifically target two key areas: reducing unnecessary interruptions and making interactions feel more human-like.

Key Improvements in the Latest Update

Reduced Interruptions: The AI now better recognizes natural speech patterns, allowing users to pause without being cut off mid-thought.
Enhanced Personality: Paid subscribers (Plus, Teams, Edu, Business, and Pro tiers) receive more engaging, concise, and creative responses.
Free Tier Access: All users now benefit from the improved interruption handling during voice conversations.

OpenAI researcher Manuka Stratta demonstrated the upgrades in a company video, highlighting how the assistant now accommodates natural conversation flow, including moments when users pause to gather their thoughts.

🚨 BREAKING: OpenAI just unveiled its new voice assistant.
THIS IS INCREDIBLE!
— Zain Kahn (@heykahn)
5:22 PM • May 13, 2024

Competitive Pressure in the Voice AI Space

The enhancements arrive as competition intensifies in the conversational AI market:

Startup Sesame (backed by Andreessen Horowitz) gained attention for its remarkably natural-sounding assistants, Maya and Miles.
Tech giant Amazon is preparing to launch an LLM-powered version of Alexa.
The updates suggest OpenAI is responding to user feedback about robotic interaction patterns that plagued earlier versions.

The company appears to be balancing accessibility (with free tier improvements) against premium features for subscribers in this increasingly competitive space.

Sam Altman Drops Bombshell for AI Art Fans: ChatGPT's Free Ghibli-Style Images at Risk

A charming animation inspired by Studio Ghibli, featuring vibrant colors and whimsical details.

The internet's latest obsession with transforming photos into dreamy Studio Ghibli-style anime portraits has hit a snag for ChatGPT users. OpenAI has temporarily restricted this popular feature while implementing new usage limits, leaving many fans disappointed.

So you’re telling me I can take a popular meme, transform it into Studio Ghibli style anime, and people will just like it?
— Kaz 🎙️ (@btcKaz)
4:00 PM • Mar 26, 2025

What's Changing:

Free users can no longer generate Ghibli-inspired images (effective immediately).
Coming soon: Daily cap of 3 image generations for free-tier accounts.
Human face transformations remain blocked (only general illustrations are allowed).

OpenAI CEO Sam Altman addressed the situation on X, explaining: "We're thrilled people love creating images, but our GPUs are overwhelmed. Temporary rate limits will help while we improve efficiency. Free users will soon get 3 daily generations - hopefully this won't last long!"

Behind the Restrictions:

The massive popularity of GPT-4o's image generator - especially for creating Ghibli-esque artwork - has strained OpenAI's processing capacity. The viral trend caused:

Unprecedented server loads.
GPU resource shortages.
Frequent error messages for free users.

It's been 24 hours since OpenAI unexpectedly shook the AI image world with 4o image generation.
Here are the 14 most mindblowing examples so far (100% AI-generated):
1. Studio ghibli style memes
— Barsee 🐶 (@heyBarsee)
1:43 PM • Mar 26, 2025

While the company works on technical optimizations, art enthusiasts will need to either:

Be selective with their 3 daily free creations.
Consider upgrading to a paid plan (which currently has no announced limits).
Explore alternative AI art tools.

No timeline has been given for when full access might be restored, but Altman promises improvements are coming. The situation highlights the challenges of meeting explosive demand for creative AI tools while maintaining system stability.

Stay in touch with us by subscribing to our newsletter to receive weekly pieces and updates.

Thank you for reading

-Shen Pandi & Team