
Million Dollar Question! Why Perplexity and ChatGPT Are Free for Indians

The Long Game of Offering Free AI to India's Masses

Here is what’s new in the AI world.

AI News: The Real Reason AI Is Free for Millions in India

Hot Tea: Your Secrets Aren't Safe

Browser Security: The Unseen Dangers in Your Browser

Meta AI: Open-Source AI for the World's Languages

Democratizing AI or Creating Dependence? The Free-Tool Push in India

In a major strategic push, leading AI companies are giving millions of Indians free access to their chatbots for a year.

OpenAI is offering its new, low-cost ChatGPT Go plan, following similar moves by Google and Perplexity AI, which have partnered with major Indian telecom providers to bundle their AI tools with mobile data plans.

Analysts say this is not an act of generosity but a calculated, long-term investment. The goal is to get India's vast, young, digitally native population, nearly half of whom are under 25, "hooked" on generative AI before introducing paid subscriptions.

India presents a unique opportunity, with over 900 million internet users and some of the world's cheapest mobile data. Unlike China's more restricted environment, its open digital market lets global tech firms onboard millions of new users.

ChatGPT Plans

This provides invaluable, diverse data to train and improve their AI models, creating case studies that can be applied worldwide.

While this is a clear win for AI companies, it raises significant data privacy concerns. Experts note that users often trade their data for free services, and India's current regulatory landscape is flexible enough to allow these rapid rollouts.

The country lacks a dedicated AI law, and its broader data protection act is not yet fully enforced, making it easier for companies to operate here than in regions with stricter rules like the EU or South Korea.

The ultimate bet for these companies is on scale.

Even if a small percentage of the millions of free users eventually become paying subscribers, it represents a massive return on their initial investment, mirroring the strategy that successfully brought cheap internet to the masses.

This land-grab strategy is now unfolding in the race for Artificial General Intelligence (AGI). Just as telecom companies built the foundational infrastructure for the internet, Towards AGI is building the foundational models and research infrastructure for the AGI future.

By providing open-access research and scalable training frameworks today, we are not just participating in the market; we are actively shaping the ecosystem that will define the next era of intelligence.

Journey Towards AGI

A research and advisory firm guiding industry and its partners to meaningful, high-ROI change on the journey to Artificial General Intelligence.

Know Your Inference

Maximising GenAI impact on performance and efficiency.

Model Context Protocol

Connect AI assistants to all enterprise data sources through a single interface.

The Hidden Cost of Conversation: Data Privacy Risks in the Chatbot Era

You have likely used free online services for years, but as AI pioneer Geoffrey Hinton has pointed out, the real currency was never money; it was your data. Looking at today's landscape, you can see how valuable that data has become for tech giants.

Now, with India's government set to enforce the 2023 Digital Personal Data Protection Act (DPDP), you must ask a crucial question: Will this landmark law actually curb how your personal information is used, especially by the new wave of generative AI platforms like ChatGPT and Gemini that you might be using daily?

Let's break down what the DPDP Act promises you. It aims to create a strict framework for how your personal data is handled. It mandates that organizations, known as data fiduciaries, must obtain your informed, specific, and unambiguous consent before they can process your information.

The law grants you, the data principal, clear rights: you can access your data, demand corrections, have it deleted, and seek grievance redressal.

Crucially, it enforces "purpose limitation," meaning a company cannot collect your data for one reason and then use it for another without coming back to you for fresh consent. It also imposes strong security standards to prevent breaches.

However, you will quickly discover a fundamental conflict. Generative AI platforms operate on a principle that seems directly at odds with this framework. When you interact with a chatbot, you provide inputs: questions, prompts, and even sensitive personal details.

These platforms typically harvest this information to train and refine their AI models.

You are often unaware of the extent to which your conversations are stored, reused, and mined to improve the service or build new ones. The consent mechanism you encounter is usually a broad, all-encompassing "Terms of Service" that you likely accept without reading.

This lack of granularity and transparency threatens to make the DPDP's core principles of specific consent and purpose limitation almost useless in practice.

Consider this scenario: you ask a GenAI chatbot for medical or financial advice, sharing private details. Under the DPDP, the platform should clearly state how this data will be used and get your explicit consent for anything beyond generating an immediate answer.

In reality, your sensitive query could be fed directly into the AI's training dataset, to be used and reused indefinitely for purposes you never explicitly approved. This creates a situation where the legal safeguards designed to protect you are easily circumvented.

It's clear to you that the rapid rise of GenAI presents a compliance challenge that the DPDP may not be equipped to handle. This doesn't mean the law is entirely powerless in your favor. It will likely retain strong enforcement capabilities in areas like targeted marketing and direct sales communication.

When a company wants to use your data for advertising or product recommendations, these are distinct, auditable activities that fit more neatly within the DPDP's framework of specific consent and purpose.

The real regulatory challenge you should be aware of lies in the blurred lines. Data collected from you under the guise of "improving AI services" can easily be funneled into algorithms that build a detailed consumer profile about you, which then fuels hyper-targeted advertising.

This means the DPDP still has influence, but protecting you will require far more sophisticated enforcement tools and cross-sector coordination than currently exist.

So, while the DPDP Act is undoubtedly an important step toward balancing technological innovation with your privacy rights, the explosive growth of GenAI has exposed significant gaps in its implementation.

The government's own AI Governance Guidelines report acknowledges this tension, admitting that the DPDP's strict consent norms don't align well with the fluid, multi-purpose data pipelines of AI.

It proposes solutions like enhanced transparency, better grievance mechanisms, and evolving consent tools, and is open to future amendments.

Yet, for you, the ultimate outcome remains uncertain. Given the global and rapidly evolving nature of these AI platforms, ensuring meaningful protection of your personal data may be far more difficult than anyone anticipated.

One thing is becoming clear: the DPDP Act, despite its promise, may not be the robust shield against data exploitation that you and other citizens were led to believe it would be.

New Browser Security Report Reveals Emerging Threats for Enterprises

According to the latest Browser Security Report, security leaders are confronting a critical convergence point for identity, SaaS, and AI risks: the user's browser.

Traditional security controls like DLP, EDR, and SSE operate at lower layers, missing the sophisticated threats now emerging in this parallel attack surface.

Security teams are facing unmanaged extensions acting as supply chain implants, GenAI tools accessed through personal accounts, sensitive data being pasted directly into AI prompts, and sessions that completely bypass single sign-on (SSO) protocols.

This article examines the report's key findings about this fundamental shift in enterprise security control points.

GenAI Becomes Primary Data Exfiltration Channel

The widespread adoption of GenAI in workplace workflows has created a massive governance gap. While nearly half of employees use GenAI tools, most access them through unmanaged personal accounts, completely outside IT visibility.

Critical statistics reveal:

  • 77% of employees paste corporate data into GenAI prompts

  • 82% of these data pastes originate from personal accounts

  • 40% of uploaded files contain sensitive PII or PCI information

  • GenAI accounts for nearly one-third of all corporate-to-personal data movement

Legacy DLP solutions cannot address this threat, as the browser has become the dominant channel for unmonitored, policy-free data exfiltration through copy/paste actions.

AI Browsers Represent Emerging Threat Category

A new category of 'agentic' AI browsers combines traditional browser security risks with AI-specific concerns.

Platforms like OpenAI's Atlas, Arc Search, and Perplexity Browser integrate large language models directly into the browsing experience, enabling real-time page analysis and summarization.

While these tools enhance user productivity, they create significant unmonitored risks:

  • Session memory leakage exposes sensitive data through AI personalization

  • Invisible "auto-prompting" sends page content to third-party models

  • Shared cookies create identity boundary confusion

  • These browsers effectively bypass traditional DLP, SSE, and security tools

Browser Extensions: Unmanaged Supply Chain Threat

The extension ecosystem represents one of the most widespread yet least governed software supply chains. With 99% of enterprise users running at least one extension, and over half granting high-risk permissions, the threat landscape is substantial:

  • 26% of extensions are sideloaded outside official channels

  • 54% are published by unverified Gmail accounts

  • 51% haven't received updates in over a year

  • 6% of GenAI-related extensions are classified as malicious

Identity Governance Gaps in Browser Sessions

Identity management breaks down at the browser level, with over two-thirds of logins occurring outside SSO systems and nearly half using personal credentials.

This creates complete visibility gaps for security teams about who is accessing what resources from where.

Key findings include:

  • 68% of corporate logins bypass SSO entirely

  • 43% of SaaS access uses personal accounts

  • 26% of users reuse passwords across multiple services

  • 8% of browser extensions access user identity data or cookies

Recent attacks like Scattered Spider demonstrate that browser session tokens have replaced passwords as the primary attack target.

SaaS and Messaging Platforms Enable Silent Data Exfiltration

Modern workflows have shifted from file uploads to browser-based pasting, AI prompting, and third-party plugins, with most activity occurring at the browser layer rather than within applications themselves.

Concerning patterns observed:

  • 62% of pastes into messaging apps contain PII/PCI data

  • 87% occur through non-corporate accounts

  • Users average 4 sensitive data pastes daily into unmonitored tools

Incidents like the Rippling/Deel breach demonstrate that data leaks increasingly originate from unmonitored chat applications within browsers rather than traditional malware or phishing attacks.

Traditional Security Tools Lack Browser Visibility

Current security solutions operate at the wrong layer: EDR monitors processes, SSE tracks network traffic, and DLP scans files. None provides visibility into browser session activities, including which SaaS tabs are open, what data is being pasted, or which extensions are injecting scripts.

Security teams remain blind to:

  • Shadow AI usage and prompt inputs

  • Extension activity and code modifications

  • Personal versus corporate account crossover

  • Session hijacking and cookie theft attempts

Session-Native Controls Emerge as Critical Solution

Regaining security control requires browser-native visibility and capabilities that operate at the session level without disrupting user experience. Essential controls include:

  • Monitoring copy/paste and upload activities across applications

  • Detecting unmanaged GenAI tools and extensions

  • Enforcing session isolation and universal SSO

  • Applying DLP to non-file-based interactions

Modern browser security platforms, as detailed in the full report, can deliver these capabilities without requiring users to switch browsers, representing the next frontier in enterprise security.
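To make "applying DLP to non-file-based interactions" concrete, here is a minimal sketch of the kind of check a session-level control might run on a paste event before the data leaves the browser. The patterns and the `scan_paste`/`allow_paste` helpers are our own illustration, not code from the report or any specific product.

```python
import re

# Illustrative patterns only; production DLP engines use far richer
# detectors (checksums, context analysis, ML classifiers).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_paste(text: str) -> list[str]:
    """Return the names of the PII pattern types found in pasted text."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(text)]

def allow_paste(text: str) -> bool:
    """Block the paste if any sensitive pattern matches."""
    return not scan_paste(text)
```

A real browser-native control would attach a check like this to paste and upload events and also record the destination, so that corporate-to-personal data movement becomes visible rather than silent.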

Meta's Omnilingual ASR Covers 1,600+ Languages

Meta has unveiled a multilingual automatic speech recognition (ASR) system that dramatically surpasses existing solutions: it supports over 1,600 languages, far exceeding OpenAI's Whisper model, which covers just 99.

What makes this breakthrough even more significant is its extensible architecture that enables developers to expand support to thousands more languages through zero-shot in-context learning.

Key Innovation: Zero-Shot Learning for Unlimited Expansion

The system's most groundbreaking feature allows users to transcribe new languages without retraining. By providing just a few audio-text examples during inference, the model can immediately adapt to transcribe additional content in that language.

This extends potential coverage to over 5,400 languages, essentially every spoken language with a known writing system.

Complete Open-Source Access

Unlike Meta's previous restrictive licensing approaches, Omnilingual ASR is released under the permissive Apache 2.0 license, allowing immediate commercial and enterprise use without limitations. The November 10 release includes:

  • Multiple speech recognition model families

  • A 7-billion-parameter multilingual audio representation model

  • A massive speech corpus covering 350+ underserved languages

  • Complete availability on GitHub, Hugging Face, and Meta's platform

Technical Architecture and Performance

The system encompasses four model types trained on 4.3 million hours of audio:

  • wav2vec 2.0 models for speech representation (300M–7B parameters)

  • CTC-based models for efficient transcription

  • LLM-ASR models combining speech encoders with text decoders

  • LLM-ZeroShot for inference-time language adaptation

Performance metrics demonstrate remarkable accuracy, achieving character error rates below 10% in 78% of supported languages, including 500+ languages never before covered by any ASR system.
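For reference, character error rate (CER) is the character-level edit distance between the model's transcript and the reference text, divided by the reference length. The sketch below is a standard textbook implementation of that metric, not Meta's evaluation code.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance over reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else 0.0
```

A CER below 0.10, the threshold the report cites for 78% of supported languages, means fewer than one character in ten differs from the reference transcript.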

Strategic Context: Meta's AI Reset

This release represents a strategic pivot for Meta following the disappointing reception of Llama 4 and subsequent leadership changes, including the appointment of Scale AI's Alexandr Wang as Chief AI Officer.

Omnilingual ASR serves as both a technical demonstration and reputational recovery, returning Meta to its strengths in multilingual AI while embracing truly open-source principles.

Community-Driven Data Collection

The training corpus was developed through partnerships with global organizations, including African Next Voices, Mozilla Foundation's Common Voice, and Lanfrica/NaijaVoices. Data collection emphasized culturally relevant, unscripted speech with local speaker compensation and rigorous quality assurance.

Enterprise Implications and Applications

For businesses operating in multilingual markets, Omnilingual ASR eliminates previous language barriers in speech-to-text applications:

  • Customer Support: Voice-based systems for previously unsupported languages

  • Education: Transcription and accessibility tools for diverse linguistic communities

  • Media: Subtitling and content localization across thousands of languages

  • Research: Preservation and study of endangered languages

The system's flexible deployment options, from high-performance 7B parameter models requiring 17GB GPU memory to lighter 300M parameter versions, ensure compatibility across various enterprise infrastructure requirements.
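The 17GB figure is consistent with a quick back-of-envelope check: 7 billion parameters at 2 bytes each (fp16/bf16) is roughly 14GB of raw weights, with the remainder going to activations and runtime overhead. The helper below is our own illustration of that arithmetic.

```python
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# 7B parameters in 16-bit precision -> 14.0 GB of raw weights;
# the reported 17 GB total adds activations and runtime overhead.
# A 300M-parameter model needs only ~0.6 GB, hence the lighter deployments.
print(weights_gb(7e9), weights_gb(3e8))
```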

Access and Implementation

Developers can immediately access all resources through:

  • Code and Models: GitHub repository

  • Dataset: Hugging Face platform

  • Documentation: Meta AI blog

Installation is streamlined through PyPI with comprehensive API support for language identification and transcription pipelines.

Omnilingual ASR represents a paradigm shift from fixed-language models to adaptable frameworks that communities can extend themselves. As Meta states in their technical paper:

"No model can ever anticipate and include all of the world's languages in advance, but Omnilingual ASR makes it possible for communities to extend recognition with their own data."

This release not only advances the technical frontier of speech recognition but also demonstrates how open-source AI can bridge digital divides and empower linguistic diversity worldwide.

True technological empowerment, however, requires more than just access to models; it requires the infrastructure to connect them meaningfully to business applications and workflows. This is the mission of TowardsMCP.

Our platform provides the enterprise-grade orchestration layer that turns powerful, open-source AI models like Meta's into reliable, scalable, and secure business solutions.

We help you move from experimental potential to production-ready value, ensuring that breakthroughs in AI don't just exist in a repository, but actively drive your competitive advantage.

Your opinion matters!

We hope you enjoyed reading this newsletter as much as we enjoyed writing it.

Share your experience and feedback with us below, because we take your critique seriously.


Thank you for reading

-Shen & Towards AGI team