- "Towards AGI"
- Posts
- Nvidia Unveils Generative Physical AI and Agentic AI Breakthroughs at CES
Nvidia Unveils Generative Physical AI and Agentic AI Breakthroughs at CES
A thought leadership platform to help the world navigate towards Artificial General Intelligence. We are committed to navigating the path towards AGI by building a community of innovators, thinkers, and AI enthusiasts.
Welcome to Gen Matrix: Your Guide to GenAI Innovation and Adoption across Industries
Discover the trailblazers driving AI innovation with Gen Matrix.
Our platform showcases:
Organizations: Industry early adopters integrating Generative AI
Startups: GenAI innovators in hardware, infrastructure, and applications
Leaders: Influential figures shaping the GenAI ecosystem
Why Choose Gen Matrix?
Stay ahead with our comprehensive insights into the evolving Generative AI landscape. Coming Soon: Our inaugural Gen Matrix launches December 2024. Sign up now to access the report!
Click here for the Matrix report.
TheGen.AI News
Nvidia Unveils Generative Physical AI and Agentic AI Breakthroughs at CES
At CES in Las Vegas, Nvidia showcased a series of groundbreaking AI advancements, with a focus on generative physical AI poised to revolutionize factory and warehouse automation.
“Building AI factories requires an entirely new computing stack, with accelerated computing at data center scale,” said Rev Lebaredian, Nvidia’s VP of Omniverse and Simulation Technology, during a press briefing.
Nvidia defines physical AI as the integration of artificial intelligence into humanoids, industrial systems, and other devices. Unlike large language models (LLMs), which predict tokens (text), or generative media models, which predict pixels (images and videos), physical AI operates in three-dimensional environments, enabling applications in surgery, smart cities, and industrial automation. According to Lebaredian, investments in agentic AI today will lead to significant breakthroughs in physical AI innovation, with billions of physical and virtual robots expected to emerge.
Nvidia classifies robots into three categories: knowledge robots (agentic AI), generalist/humanoid robots, and transportation robots (autonomous vehicles). These robots utilize physical AI models to interact with their environments, executing actions based on instructions rather than generating text or images.
Lebaredian explained that physical AI will transform industrial markets by integrating AI into 10 million factories and 200,000 warehouses. He emphasized that Nvidia’s robotics and automotive businesses extend beyond hardware, focusing instead on AI factories that combine synthetic data and real-world data to train models for edge-case scenarios.
To accelerate this progress, Nvidia has introduced a “three computer solution”:
Nvidia DGX: AI training in data centers.
Nvidia Omniverse on OVX systems: Simulation and synthetic data generation.
Nvidia AGX: In-vehicle processing for real-time sensor data.
Nvidia unveiled Cosmos, a platform designed to streamline the training of physical AI models. Central to Cosmos are generative world foundation models (WFMs), which simulate real-world environments and predict outcomes using text, images, or videos. These models allow developers to create photoreal, physics-based synthetic data for training robots and autonomous vehicles, drastically reducing costs and time.
Cosmos supports:
Video search for specific scenarios (e.g., snowy roads, crowded warehouses).
Photorealistic video generation from controlled 3D environments.
Multiverse simulations to predict and evaluate AI model outcomes.
Cosmos WFMs, available under an open model license, can be tailored to developers' datasets, such as video recordings from autonomous vehicles or warehouse operations.
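To make the workflow concrete, here is a minimal Python sketch of how a developer might use such a world model to mass-produce edge-case training clips. The class and function names (ToyWorldModel, build_edge_case_dataset) are invented stand-ins for illustration only and are not part of Nvidia's Cosmos API; a stub generator takes the place of the real model so the sketch runs end to end.

```python
# Hypothetical sketch of the synthetic-data workflow described above.
# None of these names come from Nvidia's Cosmos SDK; they only illustrate
# prompting a world foundation model (WFM) for rare, edge-case scenarios
# and collecting the clips as labeled training data.
from dataclasses import dataclass
import numpy as np


@dataclass
class SyntheticClip:
    prompt: str          # text description of the scenario
    frames: np.ndarray   # (num_frames, height, width, 3) RGB video


class ToyWorldModel:
    """Stand-in for a generative WFM: text prompt in, video clip out."""

    def generate(self, prompt: str, num_frames: int = 16) -> SyntheticClip:
        # A real WFM would render photoreal, physics-consistent video here;
        # we return random pixels so the sketch runs end to end.
        frames = np.random.randint(0, 256, size=(num_frames, 64, 64, 3), dtype=np.uint8)
        return SyntheticClip(prompt=prompt, frames=frames)


def build_edge_case_dataset(model: ToyWorldModel, scenarios: list[str],
                            clips_per_scenario: int = 4) -> list[SyntheticClip]:
    """Collect prompted clips covering rare situations (snow, crowded aisles, ...)."""
    dataset = []
    for scenario in scenarios:
        for _ in range(clips_per_scenario):
            dataset.append(model.generate(scenario))
    return dataset


if __name__ == "__main__":
    edge_cases = ["snow-covered highway at dusk",
                  "forklift crossing a crowded warehouse aisle"]
    data = build_edge_case_dataset(ToyWorldModel(), edge_cases)
    print(f"generated {len(data)} synthetic clips")
```

In Nvidia's framing, clips produced this way would feed the DGX training stage of the three-computer pipeline described earlier, supplementing real-world recordings with the edge cases that are hardest to capture.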
In addition to physical AI, Nvidia expanded its support for agentic AI, introducing new models, blueprints, and ecosystem updates. “AI agents represent the digital workforce that will assist and collaborate with us,” said Lebaredian.
Nvidia introduced the Nemotron family of models, optimized versions of Meta’s open-source Llama models. They are designed for efficiency and scalability and are available in three variants:
Nano: Cost-efficient for PCs and edge devices.
Super: High performance on a single GPU.
Ultra: Maximum accuracy for data centers.
AI Blueprints and Partner Innovations
Nvidia revealed new AI blueprints:
PDF to podcast: Converts documents into interactive podcasts.
Video analytics agent: Enables video search, summarization, and report generation.
Nvidia partners also announced additional blueprints:
CrewAI: Code documentation for developers.
Daily: A voice agent blueprint.
LangChain: Enhanced report generation from web searches.
LlamaIndex: A document research assistant for blogs.
Weights & Biases: W&B Weave for debugging and tracking AI performance.
Accenture introduced an AI Refinery for Industry with 12 specialized agent solutions, addressing areas like clinical trials, asset troubleshooting, revenue management, and B2B marketing.
Nvidia CEO Jensen Huang emphasized the transformative potential of these innovations, predicting a future where AI-powered robots and systems reshape industries globally.
Beyond GenAI: Google DeepMind Ushers in the Era of GenWorld
On January 6, Google DeepMind announced the formation of a new team dedicated to developing "massive" generative models designed to "simulate the world." These models represent a significant leap in artificial intelligence (AI), enabling advancements in decision-making, planning, and creativity.
World models are computational frameworks that allow AI systems to understand and simulate real or virtual environments. They are instrumental in teaching AI to navigate spaces and have broad applications in robotics, gaming, and autonomous systems. For instance, autonomous vehicles rely on world models to simulate traffic and road conditions, while generalist AI robots use them to train in diverse settings. A common challenge in this field is the lack of rich and varied training environments for embodied AI.
DeepMind highlighted the importance of scaling AI models in its job postings, stating that pretraining on video and multimodal data is critical for achieving artificial general intelligence (AGI). These world models are expected to enhance areas such as visual reasoning, simulation, planning for embodied agents, and real-time interactive entertainment.
The new team will be led by Tim Brooks, who previously co-led the development of OpenAI’s video generation model, Sora. Job listings reveal that the team will build on the work of Google’s flagship AI models, including Gemini (a multimodal model), Veo (video generation model), and Genie (a world model).
DeepMind’s emphasis on world models aligns with efforts by other players in the AI industry. Last September, the startup World Labs emerged from stealth with $230 million in funding to develop large-scale world models. The company, founded by Stanford AI pioneer Fei-Fei Li, is backed by industry leaders including Geoffrey Hinton, Marc Benioff, and Reid Hoffman.
DeepMind has already made strides in world model development with tools like Genie and Genie 2. While Genie generated 2D worlds, Genie 2 can create dynamic 3D worlds that respond to user actions. Trained on extensive video datasets, Genie 2 compresses video frames into simplified representations using an autoencoder. A transformer model then analyzes these compressed frames to predict video progression, similar to text-generation processes in models like ChatGPT.
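That two-stage recipe, compressing frames with an autoencoder and then predicting the next latent with a causal transformer, can be sketched in a few lines of PyTorch. The module names and sizes below are illustrative assumptions for this article, not DeepMind's Genie 2 implementation.

```python
# Simplified, illustrative sketch of the two-stage design described above
# (NOT DeepMind's Genie 2 code): an autoencoder compresses each frame into a
# small latent vector, and a causal transformer predicts the next latent from
# the sequence so far, analogous to next-token prediction in a language model.
import torch
import torch.nn as nn


class FrameAutoencoder(nn.Module):
    """Compress 64x64 RGB frames to a latent vector and reconstruct them."""

    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )


class LatentDynamicsModel(nn.Module):
    """Causal transformer that predicts the next frame latent from past latents."""

    def __init__(self, latent_dim: int = 256, num_layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(latent_dim, latent_dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, time, latent_dim); the causal mask keeps prediction autoregressive.
        t = latents.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(t)
        hidden = self.transformer(latents, mask=mask)
        return self.head(hidden)  # predicted latent for each next step


# Toy rollout: encode a short clip, predict the next latent, decode it to pixels.
frames = torch.rand(1, 8, 3, 64, 64)                  # (batch, time, C, H, W)
ae, dyn = FrameAutoencoder(), LatentDynamicsModel()
latents = ae.encoder(frames.flatten(0, 1)).view(1, 8, -1)
next_latent = dyn(latents)[:, -1]                     # prediction for frame 9
next_frame = ae.decoder(next_latent)                  # (1, 3, 64, 64)
```

Genie 2 additionally conditions its predictions on user actions, which is what makes the generated worlds interactive rather than fixed video playback.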
Genie 2 showcases capabilities such as object interaction, character animation, realistic physics, and agent behavior modeling. These generated environments can last up to a minute, with most spanning 10 to 20 seconds.
DeepMind’s expansion into world models further enhances its competitive edge against rivals like OpenAI, Meta, Microsoft, and Amazon in delivering AI solutions for enterprises. The company’s innovations, including the groundbreaking AlphaFold2, which solved a 50-year-old protein-structure challenge in biochemistry, earned CEO Demis Hassabis and John M. Jumper the 2024 Nobel Prize in Chemistry.
In addition to world models, DeepMind has explored other applications of AI. For example, a recent study described the Habermas Machine, a large language model trained to mediate controversial discussions, such as Brexit, by drafting group statements that reflect shared viewpoints.
DeepMind’s focus on world models positions it to redefine AI’s capabilities, particularly in environments requiring complex simulations and interactive solutions.
TheOpensource.AI News
4M Framework Redefines Multimodal AI with Open-Source Accessibility
EPFL researchers have introduced 4M (Massively Masked Multimodal Modeling), an advanced open-source framework for training versatile and scalable multimodal foundation models. Unlike traditional large language models (LLMs) like OpenAI's ChatGPT, which are trained solely on text data, 4M is designed to process and integrate various modalities such as images, videos, sounds, and even biological or atmospheric data, significantly expanding the capabilities of generative AI.
Generative AI models have transformed tasks by leveraging vast amounts of text data, but the future lies in multimodal models capable of processing diverse types of input and output. These models are essential for applications in robotics, gaming, autonomous systems, and more. For example, autonomous vehicles can use multimodal models to simulate complex road conditions, and generalist robots can train across diverse environments.
However, developing a single model that handles a variety of inputs (e.g., text, images, videos) and tasks has been challenging. Past attempts often resulted in performance trade-offs compared to specialized models, with critical information from certain modalities being overlooked.
Developed by EPFL’s Visual Intelligence and Learning Laboratory (VILAB) with support from Apple, 4M represents a leap forward in multimodal modeling. Unveiled at the NeurIPS 2024 Conference, the framework addresses key limitations of previous models.
“4M goes beyond language—it can interpret various modalities like visual data or sensory information, providing a richer understanding of the physical world,” said Assistant Professor Amir Zamir, head of VILAB. For instance, 4M can model an orange not just by describing it in text but also by processing its appearance (pixels) and touch (texture).
This comprehensive approach creates a more grounded representation of reality, addressing one of the major criticisms of LLMs: their lack of connection to the physical world.
Despite its advancements, 4M still faces obstacles. Zamir notes that instead of forming a truly unified representation of multiple modalities, the model appears to rely on separate, task-specific solutions that work together to solve problems. This approach, while effective, limits the model's ability to fully integrate its knowledge.
The team is working to overcome these challenges by creating a more unified and structured architecture. Their goal is to make 4M a generic, open-source model that can be customized for specific applications, such as climate modeling, biomedical research, and more. This adaptability would allow experts in various fields to tailor the framework to their unique requirements.
One of the team’s main goals is to make 4M widely accessible, enabling other researchers to refine and apply it to their domains. Doctoral assistants Oguzhan Fatih Kar and Roman Bachmann, co-authors of the 4M research paper, highlighted the importance of open-source frameworks in fostering innovation across diverse industries.
Zamir envisions a future where AI models move beyond text to include sensory inputs, enabling them to mimic the way humans combine multiple senses to understand the world. “While humans ground their knowledge in sensory experiences and later add structure through language, current AI models work in reverse. Our aim is to develop grounded, multimodal models that can serve as effective world representations,” he explained.
4M marks a significant step forward in bridging the gap between language-centric AI and truly multimodal, versatile systems. While challenges remain, the team is optimistic about the potential of this technology to reshape AI applications across industries.
Open Source vs. Geopolitics: Why Global Cooperation Matters
Despite numerous positive reflections on open source achievements throughout 2024, concerns about its future cast a shadow. One of the most significant developments came on October 28, when Russian Linux Kernel maintainers were removed from the project due to U.S. sanctions. This action, though partly reversed for one maintainer after confirming he was no longer employed by a sanctioned company, underscores the growing impact of geopolitics on open source.
This event highlights a troubling shift in how international relations intersect with open source communities. The exclusion was a direct result of U.S. sanctions, raising questions about how political decisions affect global collaboration. Open source, historically celebrated for transcending borders, now faces the challenge of navigating these geopolitical entanglements.
The debate over geopolitical restrictions isn't limited to the Linux Kernel maintainers. In August 2024, the OpenTofu project took a more sweeping approach, banning Russian IP addresses and developers from contributing. Dan Lorenc, reflecting on this decision, argued on LinkedIn that U.S.-based projects shouldn’t concern themselves with supporting Russian users amid geopolitical tensions. While Lorenc’s stance sparked debate, it underscores the growing intersection of geopolitics and open source participation.
Unlike nationality, which is determined by birth and often immutable, participation in open source is a choice. Contributors willingly engage in collaborative communities, united by shared values of problem-solving and innovation. Historically, open source has emphasized inclusivity, allowing anyone with the necessary skills and adherence to project rules to contribute, regardless of nationality.
However, these recent exclusions challenge this principle. Excluding contributors based on their location or nationality threatens the foundational idea that open source should not be constrained by political boundaries. As some have argued, “open source is not local source,” and it thrives on its global nature.
The impact of geopolitics on open source isn’t new. Post-Brexit, UK representatives faced restrictions on participating in EU-funded open source initiatives. These exclusions, driven by political decisions, demonstrate how geopolitics can undermine the collaborative ethos of open source. For those committed to global technological collaboration, these barriers are both frustrating and counterproductive.
The exclusion of Russian Linux Kernel maintainers highlights the complexities of applying export control regulations to open source. While the Linux Foundation has not provided a detailed explanation, the action reflects a broader trend of political influence in open source governance. Interestingly, the Linux Foundation’s website notes that U.S. export control regulations typically do not apply to open source software, as its public availability exempts it from such restrictions. Despite this, geopolitical considerations have led to selective enforcement in this case.
These events raise critical questions about the future of open source in a politically divided world. Open source’s strength lies in its ability to unite contributors across borders. Restricting participation based on geopolitics risks undermining this foundational principle. As discussions continue at forums like the State of Open Con, the open source community must confront these challenges to preserve its global, inclusive nature. Without a collaborative future that transcends borders, the full potential of open source—and its role in shaping AI and digital innovation—may remain unrealized.
TheClosedsource.AI News
OpenAI Faces Losses on $200/Month ChatGPT Pro Plan, Says CEO Sam Altman
OpenAI CEO Sam Altman revealed on Sunday that the company is losing money on its $200-per-month ChatGPT Pro plan, as user activity has exceeded expectations.
“I personally set the price,” Altman shared on X, “and thought it would generate some profit.”
Launched late last year, ChatGPT Pro provides access to an enhanced version of OpenAI's o1 "reasoning" AI model, o1 pro mode, and removes rate limits on several other tools, including the Sora video generator. However, the $2,400 annual price tag initially faced skepticism, and the value of o1 pro mode remains unclear. Despite this, Altman’s comments suggest that users who subscribed are heavily utilizing the service, to OpenAI's financial detriment.
This isn’t the first time OpenAI has set prices without rigorous analysis. In an interview with Bloomberg, Altman admitted that the original pricing for ChatGPT’s premium plan was somewhat arbitrary. “We tested $20 and $42. People found $42 too high but were happy with $20. We went with $20. It wasn’t based on a formal pricing study.”
OpenAI, which has raised around $20 billion since its inception, is still not profitable. Last year, the company reportedly faced $5 billion in losses against $3.7 billion in revenue. High expenditures on staff, office space, and AI training infrastructure, including an estimated $700,000 daily operating cost for ChatGPT, are significant contributors.
OpenAI recently acknowledged the need for “more capital than anticipated” as it plans a corporate restructuring to attract new investments. To achieve profitability, the company is reportedly considering raising subscription prices and introducing usage-based pricing for certain services.
Despite its current financial challenges, OpenAI projects an ambitious revenue target of $11.6 billion for 2025 and $100 billion by 2029, a figure comparable to Nestlé’s annual sales.
Sam Altman Predicts AI Agents Will Enter Workforce in 2025, Eyes Superintelligence
OpenAI CEO Sam Altman predicts that in 2025, the workforce may welcome its first AI agents. Marking ChatGPT's second anniversary in a blog post, Altman expressed confidence in AI's potential. "We believe that providing people with exceptional tools leads to great, widely-shared outcomes," he wrote.
Reflecting on OpenAI's journey, Altman noted the uncertainty they faced nine years ago. "We had no idea what we would become, and even now, we only partially know. AI development has been full of twists and turns, and we expect more ahead," he said.
Altman shared OpenAI’s vision for a future product centered on superintelligence. "With superintelligence, we can achieve almost anything. These tools could significantly accelerate scientific discovery and innovation, driving abundance and prosperity far beyond human capability alone," he explained.
While the concept might sound like science fiction, Altman embraced the audacity of the idea. "We’ve been in this position before, and we’re comfortable being here again. In the next few years, we’re confident everyone will see what we see—that acting with care while maximizing benefits and empowerment is crucial."
Altman emphasized that OpenAI's work is inherently unconventional. "OpenAI cannot operate like a normal company. We founded it nearly nine years ago because we believed AGI was possible and could become the most transformative technology in human history. Back then, few cared, and most thought we had no chance of success," he said.
Altman remains steadfast in OpenAI's mission, driven by a vision of AI's transformative potential for humanity.
Start learning AI in 2025
Everyone talks about AI, but no one has the time to learn it. So, we found the easiest way to learn AI in as little time as possible: The Rundown AI.
It's a free AI newsletter that keeps you up-to-date on the latest AI news, and teaches you how to apply it in just 5 minutes a day.
Plus, complete the quiz after signing up and they’ll recommend the best AI tools, guides, and courses – tailored to your needs.
Don’t miss out on the insights driving the future of Artificial Intelligence! Join a community of researchers, developers, and AI enthusiasts to stay ahead of the curve in Generative AI. Each edition delivers exclusive updates, expert analysis, and thought-provoking discussions straight to your inbox. Subscribe today and be part of the journey toward AGI innovation.
Contact us for any paid collaborations and sponsorships.
Unlock the future of problem solving with Generative AI!

If you're a professional looking to elevate your strategic insights, enhance decision-making, and redefine problem-solving with cutting-edge technologies, the Consulting in the age of Gen AI course is your gateway. It is perfect for those ready to integrate Generative AI into their work and stay ahead of the curve.
In a world where AI is rapidly transforming industries, businesses need professionals and consultants who can navigate this evolving landscape. This learning experience arms you with the essential skills to leverage Generative AI for improving problem-solving, decision-making, and advising clients.
Join us and gain firsthand experience of how state-of-the-art GenAI can take your problem-solving skills to new heights. This isn’t just learning; it’s your competitive edge in an AI-driven world.