Towards AGI
Posts
Grammarly’s AI Genius Upgrade: How GenAI Is Crushing ChatGPT’s Reign

Grammarly’s AI Genius Upgrade: How GenAI Is Crushing ChatGPT’s Reign

ChatGPT’s Worst Nightmare.

Shen Pandi
May 06, 2025

Here is what’s new in the AI world.

AI news: AI Writing Wars Escalate.

What’s new: From Rivals to Riches.

OpenAI: Kevin Weil’s Warning

Open AI: Meta & Cisco Declare War on Hackers

Hot Tea: AI’s Boys’ Club

Grammarly’s Silent AI Takeover: Rewriting the Rules with GenAI Wizardry

To showcase the impact of its AI-driven writing tools, Grammarly has adopted a bold approach: A/B testing with enterprise clients. The company splits customer support teams into two groups, one equipped with its full suite of generative AI and large language model (LLM) capabilities, and the other using a pared-down version focused solely on error detection.

Over two to four weeks, the experiment reveals stark contrasts in performance, consistently showing that teams leveraging advanced AI features achieve a 17% rise in customer satisfaction.

Why It Works

The AI-enhanced group produces clearer, error-free communications aligned with brand voice, while also responding faster. “These tools transform writing from a task into a strategic asset,” explained Luke Behnke, Grammarly’s Head of Enterprise Product, on the Targeting AI podcast. The results underscore how LLMs elevate communication beyond basic proofreading, fostering efficiency and consistency.

Words that work. Work that flows. Say goodbye to juggling too many apps and hello to AI that works for you so you can get more done.
With Grammarly and @coda_hq , clear communication meets intuitive collaboration for results that speak volumes. gram.ly/4iSHTbI
— Grammarly (@Grammarly)
5:58 PM • Apr 2, 2025

From Grammar Checks to Productivity Powerhouse

Initially launched in 2009 with natural language processing (NLP) for error detection, Grammarly has evolved into a multifaceted productivity platform. By integrating generative AI and over 800 third-party systems, including Salesforce and Google Calendar, the company now positions itself as a hub for “productivity agents” that streamline workflows. “

AI has unlocked capabilities far beyond what we imagined,” Behnke noted, emphasizing their shift toward holistic productivity solutions.

Hosted by Esther Shittu (AI-focused journalist) and Shaun Sutner (senior tech editor), the Targeting AI podcast highlights how Grammarly’s strategy reflects broader trends in enterprise AI adoption.

By quantifying tangible benefits through controlled experiments, the vendor not only validates its technology but also sets a precedent for demonstrating ROI in AI investments.

As businesses increasingly seek AI tools that deliver measurable outcomes, Grammarly’s approach offers a blueprint for bridging the gap between innovation and proven value.

How OpenAI’s $3B Windsurf Buy Could Redefine AI

OpenAI is reportedly in advanced discussions to acquire Windsurf, an AI-powered coding assistant previously known as Codeium, in a deal valued at approximately $3 billion, according to Bloomberg.

While the transaction remains pending, it marks OpenAI’s largest acquisition to date, aimed at bolstering ChatGPT’s programming capabilities amid intensifying competition.

Windsurf’s Growth Trajectory

Valuation Surge: Windsurf, previously operating as Codeium, has recently held discussions with investors such as General Catalyst and Kleiner Perkins to secure funding targeting a $3 billion valuation, Bloomberg News reported.
Strategic Fit: The acquisition aligns with OpenAI’s push to enhance its coding tools, having already integrated upgrades into newer AI models.

OpenAI’s Expansion Strategy

Recent Acquisitions: Last year’s purchase of Rockset, a search and analytics startup, for hundreds of millions in stock, strengthened enterprise infrastructure.
User Growth: Weekly active users surged to 400 million in February, up from 300 million in December 2023.
Funding Ambitions: Plans for a 40 billion fundraising round led by SoftBank.
The deal, if finalized, would position OpenAI at the forefront of AI-driven software development innovation.

We're excited to announce we’ve launched several improvements to ChatGPT search, and today we’re starting to roll out a better shopping experience.
Search has become one of our most popular & fastest growing features, with over 1 billion web searches just in the past week 🧵
— OpenAI (@OpenAI)
8:06 PM • Apr 28, 2025

As AI coding assistants like GitHub Copilot and Amazon CodeWhisperer gain traction, OpenAI’s move signals a bid to dominate developer tools by merging Windsurf’s specialized capabilities with ChatGPT’s expansive reach.

The deal, if finalized, would position OpenAI at the forefront of AI-driven software development innovation. OpenAI declined to comment, while Windsurf has not yet responded to inquiries.

The Gen Matrix Advantage

In a world drowning in data but starved for clarity, Gen Matrix second edition cuts through the clutter. We don’t just report trends, we analyze them through the lens of actionable intelligence.

Our platform equips you with:

Strategic foresight to anticipate market shifts
Competitive benchmarks to refine your approach
Network-building tools to forge game-changing partnerships

US vs. China: OpenAI’s Secret Plan to Dominate the Open-Source AI Race

OpenAI is developing an open-weights AI model designed to be one generation behind its cutting-edge proprietary systems, aiming to establish it as the world’s most widely adopted open model. Built in the U.S., the initiative seeks to prioritize democratic values and counterbalance potential advancements by geopolitical rivals like China.

OpenAI CPO, Kevin Weil:
We're preparing to release an open-weight model soon, built on democratic values.
It won’t be the frontier model and will be one generation behind on purpose to avoid accelerating rivals like China.
— Haider. (@slow_developer)
2:07 PM • May 5, 2025

Key Objectives

Global Leadership: OpenAI CPO Kevin Weil emphasized the goal at The Hill and Valley Forum: “I want the best open-weights model in the world to be a U.S. model, reflecting democratic principles—not one developed by China.”
Balancing Access and Security: The model will offer robust capabilities for widespread use while mitigating risks of strategic acceleration by adversarial nations. “A frontier model release could aid China,” Weil noted, “but this open model allows global adoption without compromising security.”

Model Specifications

Performance Tier: The open-weights model will lag behind OpenAI’s proprietary “frontier” systems (e.g., GPT-5), serving as a high-quality but non-cutting-edge alternative.
Proprietary Frontier Tech: Advanced models will remain closed-source, ensuring competitive and security advantages.

Developer Collaboration & Vision

CEO Sam Altman highlighted feedback from developers as pivotal: “Their insights were unexpected but achievable. We’re poised to deliver something extraordinary.” The project aligns with OpenAI’s broader restructuring into a public benefit corporation (PBC) and its pending acquisition of AI coding platform Windsurf, signaling a focus on ethical scalability.

By releasing a democratically aligned open model, OpenAI aims to set a global standard that counters authoritarian tech influence, fostering innovation while safeguarding U.S. strategic interests.

Meta & Cisco’s Open-Source LLMs Are Winning the Arms Race

As cyberthreats escalate at machine-scale velocity, open-source large language models (LLMs) are emerging as pivotal infrastructure for adaptive, cost-efficient defenses. At RSAC 2025, Cisco, Meta, and ProjectDiscovery unveiled groundbreaking initiatives that signal a shift toward collaborative, community-driven innovation in cybersecurity.

Cisco’s Foundation-sec-8B: A Cybersecurity-First LLM

Cisco’s newly launched Foundation-sec-8B, developed by its Foundation AI group, is an open-source LLM purpose-built for cybersecurity. Trained on a curated dataset—including CVEs, MITRE ATT&CK frameworks, and real-world incident reports—the model excels in threat detection and response.

Key features:

Tailored Training: Combines vulnerability databases, red-team playbooks, and compliance guidelines for deep threat understanding.
Benchmark Superiority: Matches or outperforms larger general-purpose models like Llama-3.1-70B in cybersecurity tasks (e.g., 75.26 vs. 72.66 in CTI-RCM).
Accessibility: Runs efficiently on minimal hardware (1–2 Nvidia A100 GPUs) under Apache 2.0, avoiding vendor lock-in.

Cisco’s vision: Transform competitors into collaborators by open-sourcing tools that unify defenses across SOC, DevSecOps, and threat intelligence workflows.

Meta, Cisco put open-source LLMs at the core of next-gen SOC workflows ow.ly/Cw1S10682uj
— Dr Constantina (@startdoms)
5:49 AM • May 6, 2025

Meta’s AI Defenders Suite: Fortifying Generative AI

Meta expanded its open-source security toolkit with:

Llama Guard 4: Multimodal classifier for policy compliance in text/images.
LlamaFirewall: Real-time framework integrating PromptGuard 2 (detects prompt injections) and CodeShield (mitigates code vulnerabilities).
CyberSec Eval 4: Benchmarking suite co-developed with CrowdStrike, assessing AI efficacy in SOC scenarios and auto-patching capabilities.

The Llama Defenders Program offers early access to tools like sensitive-document classifiers, while Private Processing pilots on-device AI in WhatsApp, prioritizing privacy.

ProjectDiscovery’s Nuclei: Community-Powered Defense

Winning RSAC’s “Most Innovative Startup,” ProjectDiscovery’s Nuclei leverages a global community to maintain 11,000+ detection templates (3,000+ CVE-specific). This open-source scanner identifies vulnerabilities across APIs, clouds, and networks, embodying Gartner’s call for democratized security via SBOM frameworks and open-source governance.

Strategic Takeaways for Security Leaders

Collaborate, Don’t Compete: As Cisco’s Jeetu Patel emphasized, adversaries—not rivals—are the true enemy. Open-source models like Foundation-sec-8B enable shared defense infrastructure.
Prioritize Specialization: Generic AI models fall short. Domain-specific training (e.g., Meta’s threat behavior mappings) ensures accuracy and speed.
Embrace Affordability: Advanced threat analytics no longer require exorbitant costs. Open-source tools democratize access, slashing infrastructure expenses.

Why It Matters?

For Leaders: Benchmark your AI strategy against the best.
For Founders: Find investors aligned with your vision.
For Builders: Get inspired by the individuals shaping AI’s future.
For Investors: Track high-potential opportunities before they go mainstream.

Gender Bias in Code: Study Reveals Open-Source AI Bots Discriminate Against Women

A new study reveals that open-source AI models tend to favor men over women when recommending job candidates, especially for higher-paying roles, highlighting persistent gender bias issues as AI becomes more common in corporate hiring and HR processes.

While bias in AI is a known concern, researchers say the problem remains unresolved. “We don’t know exactly which companies are using these models, as they rarely disclose it,” said Rochana Chaturvedi, a PhD student at the University of Illinois and co-author of the study. She noted that more transparency could be critical for meeting regulatory standards.

Chaturvedi, along with Sugat Chaturvedi, an assistant professor at Ahmedabad University, analyzed several mid-sized open-source large language models (LLMs) to test for gender bias in hiring decisions.

AI are just more sophisticated chat bots. They regurgitate the data they are trained on. That includes a lot of leftist bullshit like gender ideology. If any actual AI with cold and calculating logic were asked to evaluate that incoherent mess of an ideology it would shit on it.
— Loki (@Lowkey1324)
6:04 PM • May 3, 2025

Their paper, "Who Gets the Callback? Generative AI and Gender Bias", evaluated models including Llama-3-8B-Instruct, Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Granite-3.1-8B-it, Mistral-8B-Instruct-2410, and Gemma-2-9B-it.

Using over 330,000 English-language job listings from India’s National Career Services portal, they asked each model to choose between two equally qualified male and female candidates.

They then measured how often each model chose the female candidate, known as the "female callback rate," and analyzed whether job ads hinted at gender preferences, which are technically illegal in most parts of India but appeared in 2% of listings.

The researchers concluded that most models reinforced gender stereotypes by frequently selecting women for lower-wage positions. They attributed these patterns to gender biases in training data and to "agreeableness bias" introduced during the reinforcement learning phase.

The models’ performance varied significantly. For example, Mistral had a female callback rate of just 1.4%, while Gemma hit 87.3%. Llama-3.1 showed the most balanced performance, with a 41% female callback rate and the highest tendency (6%) to avoid considering gender at all, suggesting Meta’s fairness mechanisms are stronger than those in other models.

Even when the researchers adjusted models to ensure equal callback rates for men and women, wage disparities persisted. Women were still more likely to be recommended for lower-paying roles. The smallest wage gaps appeared in Granite and Llama-3.1 (around 9 log points), while the worst gaps were found in Mistral (≈ 84 log points) and Gemma (≈ 65 log points). Interestingly, Llama-3 favored women slightly, showing a wage penalty for men.

The study also explored how AI "personality" traits, such as agreeableness and altruism, often embedded during training, impact outcomes. These personality traits may be communicated through system prompts or embedded in training data. As an example, a model might be prompted to act as a trustworthy and sympathetic persona.

To test this, researchers had models simulate 99 historical figures. Female callback rates soared above 95% when the models mimicked figures like Mary Wollstonecraft or Margaret Sanger. However, when prompted to emulate controversial figures like Hitler or Stalin, models often refused to make a decision, as safety filters were triggered.

Callback rates for women dropped slightly (2–5%) when models were instructed to act like Ronald Reagan, Queen Elizabeth I, Machiavelli, or D.W. Griffith. Women received the highest-paying job recommendations when the persona was Margaret Sanger or Vladimir Lenin.

The researchers advocate for using real-world data to audit AI models, arguing it complements traditional benchmark tests. They say models like the Llama-3.1 variant can be fine-tuned for fairer hiring outcomes and stress the importance of understanding these biases, especially in light of international regulations like the EU’s AI ethics guidelines and India’s governance framework.

However, with the U.S. having recently repealed its federal AI oversight measures, job seekers there may have to rely on luck—or Lenin’s approval.

Your opinion matters!

Hope you loved reading our piece of newsletter as much as we had fun writing it.

Share your experience and feedback with us below ‘cause we take your critique very critically.

How did you like our today's edition?

Thank you for reading

-Shen & Towards AGI team