The Cable
Episode 01 · May 2026 Cable Signature
Dimension Labs · Causal Intelligence Program

Duolingo: How 502,269 Reviews Predicted a $20B Market Cap Destruction

A structured analysis of 502,269 Duolingo customer reviews identified, in week one of May 2025, the friction that Duolingo’s CEO publicly acknowledged on May 4, 2026 — twelve months later. In between, the stock fell 81% and roughly $20 billion in market capitalization was erased. This publication is the demonstration: customer voice is a leading indicator of public-market repricing, with a 9-to-12-month lead time over consensus.

−$20B
Market cap destroyed May 2025–May 2026
12 mo
Customer voice lead time over the market
502,269
Learner reviews analyzed
4 / 5
AI-pivot signals at p<0.001
Apr 1, 2025 → Mar 31, 2026 · Published May 12, 2026 · Episode 01
Section 01 · Executive Summary

The customer voice was a 12-month leading indicator of a $20B market cap destruction

Duolingo is the cleanest live case study available for two questions every consumer subscription business is now forced to answer: what happens when you pivot “AI-first” in a product customers believe requires human quality control, and when does engagement become a retention tax rather than a moat. This publication tests both against twelve months of customer voice — and against the SEC filings that confirmed the customer voice with a multi-quarter lag.

−81%
Stock decline
$544.93 (May 2025) → $105.15 (May 2026)
502,269
Reviews analyzed
473K Android + 29K iOS · Apr’25–Mar’26
17 pp
Q4 2025 bookings deceleration
+41% → +24% in a single quarter
28 pp
DAU growth compression
+49% (Q1’25) → +21% (Q1’26)
Headline finding Duolingo’s April 28, 2025 AI-first memo produced a measurable, statistically significant, persistent increase in negative customer voice specifically on course quality. Four of five required pre/post signals shift at p < 0.001 across the boundary, and AI mentions remain elevated 2.18× above baseline in Q1 2026, eleven months later. The same friction Duolingo’s CFO publicly acknowledged on May 4, 2026 (“extra friction in monetization is part of the reason DAU growth has slowed”) was visible in the customer voice in week 1 of May 2025. The customer-voice analysis had a 12-month lead time over the equity market’s repricing.
How this analysis was done Every finding below comes from a structured enrichment of 502,269 individual Duolingo reviews — every Google Play and Apple App Store review posted between April 1, 2025 and March 31, 2026. Each review was processed independently by a large language model against a 25-dimension extraction schema covering AI quality, engagement mechanics, pricing, competitive displacement, trust, and sentiment. The schema was designed before the analysis began and frozen prior to any statistical testing — the dimensions had to be capable of being applied to a review without seeing its star rating, source, or date. Every chi-squared statistic comes from a 2×2 contingency table with Yates’ continuity correction. Every verbatim quote is reproduced exactly as the reviewer wrote it. Dimension Labs turns unstructured customer text into structured signal; full methodology, dimension schema, statistical tests, and validation steps are in Appendix C.

The four findings this episode reports

Finding 01 · The AI-first pivot is a real, persistent customer-voice event
Apr 28, 2025 is a synchronized cliff: review volume spikes, average star rating drops sharply, and AI-mention share jumps simultaneously. AI mentions go from 0.72% pre-pivot to 4.53% post-pivot (χ² = 1,026). Q1 2026 AI mentions remain at 1.451% — 2.18× the pre-pivot baseline of 0.665%. This is not a transient backlash; it is a permanent baseline shift.
Finding 02 · Engagement mechanics behave like a retention tax when they gate practice
The October 20, 2025 Energy Points rollout is a second event-week shock. Reviewers explicitly frame Energy as a disguised paywall — “can’t finish a lesson,” “watch ads or stop,” “bring back hearts.” Ten weeks later, Q4 2025 bookings growth crashed from +41% to +24% — the single largest quarterly deceleration in Duolingo’s public history.
Finding 03 · Competitive displacement is a mode switch, not a feature comparison
Competitor naming is low-frequency but high-intent. When users do name a substitute, the reference set has changed: ChatGPT as a generic AI tutor is now the dominant alternative alongside structured competitors (Memrise, LingoDeer, Mango). Android competitor-naming rate doubles post-pivot at χ² ≈ 33.
Finding 04 · The financial residual: a $20B market cap destruction with a 12-month lag
From a May 2025 peak of $544.93, DUOL closed May 11, 2026 at $105.15 — −81%, roughly $20B in market capitalization erased. FY2026 bookings growth guidance: 10–12%, down from the prior 40%+ trajectory. Management is investing $50M+ of foregone bookings to roll back friction. Section 6 closes this loop with the quarterly SEC-filing trail.

The portable read-through for subscription operators and public-market observers

What this provesWhat to copyOperational lever it speaks to
Customer voice leads equity-market repricing by 9–12 months.Run this analysis on your own customer voice within 60 days of any AI-feature announcement. The cost is rounding error vs. multiple-compression risk.Equity-market repricing · CFO briefing
AI-first announcements are read as quality-control removal until the assurance system is visible.Ship the QA / human-in-the-loop story before the AI narrative.Course/content reputation · paid retention
Mechanic that gates completion is interpreted as coercion — and shows up as a single-quarter bookings cliff.Treat practice time as sacred. Watch for the bookings-growth cliff 1 quarter after any monetization-friction change.Engagement health · trust capital · bookings growth
“AI does the work” rhetoric invites ChatGPT-as-tutor substitution.If your value prop becomes “our AI,” you train customers to compare you to general-purpose assistants.Competitive displacement risk
Section 01 · Chart 01 — Cross-source headline rates, full window
Shares of all reviews in-window (Apr 1, 2025 – Mar 31, 2026). Descriptive; causal-style chi-squared tests appear in Sections 2–5; financial confirmation in Section 6.
Section 02 · The AI-First Pivot Test

The week of April 28 is the centerpiece of this report

The primary hypothesis under test: did Duolingo’s late-April 2025 AI-first memo produce a measurable, statistically significant, persistent increase in negative learner voice specifically on course quality — or just generic product-change volatility? This section runs the test against five required signals and reports the result.

Primary hypothesis The April 28, 2025 AI-first memo produced a measurable, statistically significant, persistent increase in negative learner voice specifically on course quality. We test this against five enrichment dimensions across the Apr 28 boundary using chi-squared with Yates correction. The pre-period is Apr 1–27, 2025 (29,931 Android reviews). The post-period is Apr 28, 2025 – Mar 31, 2026.
36.45%
of Android reviews in the week of April 28, 2025 explicitly mention AI, up from a pre-event baseline of 0.82% the prior week. A 44× increase in seven days. iOS posted an even larger spike: 41.5% AI-mention share in the same week. The post-event residual through Q1 2026 has not returned to baseline.

2.1 The event-week signature: outcome cliff + mechanism cliff

The week of April 28 is a true discontinuity in three separate measures at once: review volume spikes, average star rating drops sharply, and AI-mention share jumps from background noise to dominant-theme levels. That pairing — outcome (stars) and mechanism (AI mentions) moving together in the same seven days — is what makes the AI narrative testable rather than coincidence. A typical product-change-cycle adverse spike produces the outcome cliff without the mechanism cliff; this one produced both.

Section 02 · Chart 01 — Weekly volume, average star, and AI-mention share
Apr 2025 – Mar 2026. Two vertical markers: AI-first memo (Apr 28, 2025) and Energy Points rollout (Oct 20, 2025).
AI-first memo · Apr 28, 2025 Energy Points · Oct 20, 2025
Volume in columns (Android, iOS), avg star rating on a secondary axis (line), AI-mention share on a tertiary axis (line). The synchronous spike at Apr 28 — volume + adverse + AI-mention — is the load-bearing visual evidence for this section.

2.2 The pre/post statistical test — 4 of 5 signals at p < 0.001

Five dimensions speak to the AI-quality hypothesis. Chi-squared with Yates correction across the Apr 28 boundary returns large, highly significant shifts on four of them, including the two highest-leverage signals (AI mention and trust erosion). The fifth (cultural-context issue) does not shift at the headline level; the section’s read on this is that cultural-context complaints are language-specific and require per-language unpacking rather than a corpus-wide rate test.

Signal (binary)Pre ratePost rateχ² (Yates)Significance
AI mentioned (ai_quality_signal ≠ no/none)0.72%4.53%1,026.29p < 0.001
Trust eroded post AI pivot (trust_signal)0.27%3.71%1,016.12p < 0.001
Translator layoff reaction (translator_layoff_reaction_signal)0.06%2.30%687.38p < 0.001
Course quality declined (course_quality_signal)2.95%5.50%373.13p < 0.001
Cultural-context issue (broken / inappropriate / language-specific)0.56%0.56%0.00n.s.
Section 02 · Chart 02 — Pre/post AI-pivot effect sizes
Bars are post-pivot rate minus pre-pivot rate (percentage points), Apr 28 boundary. Annotation reports χ² magnitude.
B2C Subscription Lever · Course / Content Reputation
The headline finding is read-throughable as a reputation-drag risk: when negative sentiment is pinned to content quality rather than price, the cost is not just near-term cancellations — it is sustained CAC inflation and reduced conversion to paid tiers. The mechanism is observable in real time: a synchronized cliff in adverse outcomes and AI-mention share within seven days of a public announcement. Any subscription product that ships an AI-feature announcement should run the same event-week test on its own customer voice immediately after launch.

2.3 The decay tail and the permanent residual

The harder question for a subscription operator is not whether an announcement causes a one-week blowup — most do. The question is whether the blowup leaves a permanent baseline shift. Six operational periods after April 28, we see the answer:

PeriodAndroid nAndroid AI %iOS niOS AI %
Pre-pivot baseline (Apr 1–27)29,9310.665%1,0302.233%
Event week (Apr 28 – May 4)13,01036.449%50641.502%
Decay tail (May – Aug)156,3716.590%7,5975.900%
Late 2025 (Sep – Dec)167,2832.190%7,8742.400%
Q1 2026 “new normal” (Jan – Mar)106,4411.451%12,2261.988%
Section 02 · Chart 03 — AI-mention residual by operational period
Pre-pivot baseline vs. event-week spike vs. four-month decay vs. late-2025 vs. Q1 2026. Android shows the cleanest permanent shift.

Android shows the clean shape: 0.665% pre-pivot, peak at 36.449%, slow decay through summer and fall, settling at 1.451% in Q1 2026 — 2.18× the pre-pivot baseline, eleven months after the memo. The residual is not zero. AI is now a persistent topic of complaint in a way it simply was not before April 28. iOS does not show the same clean residual elevation because the iOS pre-pivot baseline already sits at 2.23% (longer, more editorially opinionated reviews include more meta-talk about product strategy); the event-week spike on iOS is still unambiguous at 41.5%.

The methodological takeaway is portable: the “residual baseline shift” test is easiest to see on the high-volume, short-form surface (Google Play, in this case). Long-form surfaces (App Store, Reddit, Trustpilot) carry more baseline meta-commentary and require per-month analysis to read the shift cleanly.

2.4 The 10-K narrative test

The most analytically distinctive move in this report is treating Duolingo’s FY2025 10-K as a hypothesis to test against, not as background context. The 10-K (filed February 2026, six weeks of in-window post-filing customer voice) contains two positions on AI that cannot both be true at the same time. We can read which one the customer data supports.

Executive growth narrative

“Advanced AI, machine learning, and data analytics capabilities … allow us to leverage our data to optimize the learning experience … leads to compounding growth of core business metrics like DAUs and paid subscribers.”
Duolingo FY2025 10-K · Item 1 · Operating Metrics section

Risk-factor acknowledgment

“Our development, implementation and use of artificial intelligence and machine learning technologies … may not be successful, which may impair our ability to compete effectively, result in reputational harm and have an adverse effect on our business.”
Duolingo FY2025 10-K · Item 1A · Risk Factors

The customer data unambiguously supports the second position. 4 of 5 AI-quality dimensions shift significantly at p < 0.001 across the Apr 28 boundary. AI mentions remain elevated 2.18× through Q1 2026. The post-pivot cohort frames the change as “replacing humans” and ties it explicitly to quality decline. The 10-K’s own CFO and CEO already signed the risk-factor sentence; the customer data shows it materializing. Duolingo reports $873.4M subscription revenue (+44% YoY) and ~9% paid penetration of 130M+ MAUs — the customer-voice evidence is the kind of signal that threatens the quality narrative a subscription upsell depends on.

B2C Subscription Lever · IR-narrative-customer-data divergence risk
Every public consumer-AI announcement has a risk-factor sentence somewhere. The work is not in deciding what the narrative should be — it is in checking which sentence the customer data actually supports. The 10-K test pattern is portable: pull both the growth quote and the risk-factor quote, quantify the cohort supporting each, and publish the answer. Done internally before a quarterly call, this is a cheap way to avoid a CEO defending a position the customer data already contradicts.

2.5 Verbatim evidence — event week and Q1 2026 residual

“Company is going AI first… they don’t care about quality or users, only about profits. Cancelled subscription and uninstalled.”
Google Play · ★1 · 2025-04-28 ai_quality_signal = ai_replaced_humans_negative_framing · translator_layoff_reaction_signal = layoff_explicit_negative · learner_lifecycle_stage = cancellation_or_deletion
“After an unbroken seven-year streak, time to say goodbye at last to AI slop.”
Google Play · ★1 · 2025-04-29 ai_quality_signal = ai_made_courses_worse · streak_event_signal = streak_quit_to_protect_mental_health · learner_lifecycle_stage = cancellation_or_deletion
“Uses AI now, a lot of people are already seeing mistakes. Cancelled my Max.”
Google Play · ★1 · 2025-04-28 ai_quality_signal = ai_factually_incorrect · product_tier_mentioned = max · learner_lifecycle_stage = cancellation_or_deletion
“Recently there was a change … disrupted my learning … I learned pen to be bolígrafo and now it’s pluma but feather is also pluma.”
App Store · ★2 · 2026-03-31 ai_quality_signal = ai_generated_sentences_awkward · course_quality_signal = course_quality_declined · cultural_context_signal = language_specific_quality_issue
“Pretty worthless since they started using ai.”
Google Play · ★1 · 2026-03-31 ai_quality_signal = ai_made_courses_worse · learner_lifecycle_stage = lapsed_returning · overall_review_sentiment = very_negative_detractor

Read this section as a portable pattern, not a Duolingo story. Every consumer subscription business shipping AI features should run this exact test on its own customer voice within 60 days of the launch: identify the event boundary, run pre/post chi-squared on at least five AI-adjacent dimensions, watch for the joint outcome+mechanism cliff in the same week, and measure the residual six months later. If the residual is >2× pre-event baseline, the announcement landed as quality-control removal, not as a product improvement — regardless of what the press release said.

Section 03 · The Engagement-Mechanic Layer

When gamification becomes a retention tax

The backup-hypothesis section. Engagement mechanics — streaks, hearts, energy, leagues — are normally talked about as moats. The October 20, 2025 Energy Points rollout is a clean live test of when a mechanic stops being a moat and starts being a tax: users either pay to remove friction, or churn while writing the verbatim “disguised paywall.”

The control case for Section 02 Apr 28 and Oct 20 produce similar-shaped event-week shocks. If the AI-pivot effect were just “any product change causes a one-week spike,” the Energy event should look identical. It does not. The shapes diverge in decay tail and topical residual — the AI signal persists, the Energy signal mostly normalizes. That divergence is what makes Section 02’s claim distinguishable from generic volatility.
47.8%
of Android reviews in the week of October 20, 2025 — the Energy Points rollout week — were rated 1 or 2 stars, up from the prior week’s 9.4%. Average star rating fell from 4.38 → 2.98 in seven days. A second event-week shock, distinct from the AI pivot, with its own topical signature (“bring back hearts,” “can’t finish a lesson”).

3.1 The Energy rollout produced a second event-week shock

Energy Points replaced the 5-heart-lives mechanic on the free tier in late October 2025. Users describe the change in remarkably consistent language: the mechanic is experienced as a practice interrupter, not as a game element. The dominant verbatim shape is some variant of “running out of energy mid-lesson” or “can’t complete more than two lessons a day without watching ads.” The mechanic is read as monetization disguised as gamification.

Section 03 · Chart 01 — Weekly adverse share, Android vs iOS
Share of reviews rated ≤2★ each week. Two event markers: Apr 28 (AI-first memo) and Oct 20 (Energy Points). Note the Android-only Oct 20 spike.
AI-first memo · Apr 28 Energy Points · Oct 20
iOS adverse share runs persistently higher than Android (longer reviews, more critical). The Oct 20 spike is sharper on Android: iOS was already at elevated levels coming into October.

3.2 The retention-tax mechanism is visible in the verbatim

The qualitative pattern is consistent across Q1 2026 reviews: energy depletes fast, learning sessions end early, users either pay or stop. Reviewers describe it as a disguised paywall, not as game design. This is the operational definition of a retention tax — a mechanic that converts learning time into monetization pressure by interrupting practice mid-flow.

“They’ve switched to an ‘energy’ currency … you can be halfway through a lesson and run out.”
Google Play · ★1 · 2025-11-04 hearts_or_energy_signal = energy_replaced_hearts_negative · engagement_mechanic_signal = mechanic_blocks_learning · learner_lifecycle_stage = hearts_or_energy_depletion
“The new energy/battery thing is deadset garbage. I’d rather delete the app than pay for a subscription.”
Google Play · ★1 · 2025-10-25 hearts_or_energy_signal = energy_too_aggressive · pricing_or_paywall_signal = paywall_too_aggressive · learner_lifecycle_stage = cancellation_or_deletion
“The new energy system is just awful. Bring back those hearts!”
App Store · ★1 · 2025-10-22 hearts_or_energy_signal = energy_replaced_hearts_negative · feature_request_or_paywall_dispute = want_old_mechanic_back
“I have a 765 day streak and I’m considering quitting because they just switched me from lives to energy.”
Google Play · ★1 · 2025-11-08 hearts_or_energy_signal = energy_replaced_hearts_negative · streak_event_signal = long_streak_milestone_celebrated · gamification_emotional_response = mechanic_demotivating

3.3 The boundary test — partial signal, real shape

Chi-squared with Yates correction across the Oct 20 boundary shows a large, statistically significant adverse-rate shift on Android, in the direction of declining adverse share post-Energy. That result needs careful reading.

SourcePre-Energy adverse sharePost-Energy adverse shareχ² (Yates)Significance
Android (Google Play)15.93%13.66%474.72p < 0.001
iOS (App Store)43.31%21.93%1,520.28p < 0.001

The aggregate decline is an artifact of the pre-Energy period containing the entire AI-first event (Apr–May 2025), which inflated the pre-Energy adverse baseline. The real Energy signal is the Oct 20 event-week spike itself (Android adverse jumps from 9.4% to 47.8% in one week), not the full-window aggregate. This is a methodological warning for anyone replicating the test: when two events occur in the same window, aggregate pre/post comparisons across the second boundary can wash out the second event’s spike. The control-case framing requires week-level temporal analysis, not period-level aggregates.

B2C Subscription Lever · Engagement Health
The portable test: when you change an engagement mechanic, watch the event-week reaction, not the full-quarter aggregate. A single rollout week can create a permanent meme (“money hungry,” “pay-to-use”) even when later cohorts normalize. The cost is reputational drag, not week-over-week churn. Once a mechanic is monetized via friction, it cannot be easily rolled back — reversal (“bring back hearts”) is now framed as a retreat.

3.4 What “retention tax” looks like in your own data

Three diagnostics, portable to any subscription product with gamification:

  1. Event-week verbatim density. The week a mechanic changes, what share of reviews use the words “paywall,” “disguised,” “coercive,” “money hungry,” or “greedy” alongside the mechanic’s name? Background-rate language drift on these terms is small; an inflection above 3× the trailing-month baseline is signal.
  2. Reversion-request volume. “Bring back X” / “put X back” / “why did you remove X” verbatim count, weekly. This is the cleanest measure of user-felt loss because it requires the reviewer to actively want the old state.
  3. Streak-pressure mention rate. Background-rate gamification anxiety is a constant in this corpus — not event-driven, but a real ongoing tax visible in “streak anxiety,” “streak guilt,” and “quitting to protect mental health” language. Trends here move slowly but matter for long-term engagement design.
Section 04 · Competitive Displacement

Where frustrated learners go — and why “ChatGPT” is now the alternative

When a subscription product breaks faith with its customers, those customers don’t just leave — they substitute. In the reviews, substitution shows up as “I’m switching to X,” “if I wanted Y I’d use X,” or “just use ChatGPT instead.” This section names the substitution targets and quantifies how the reference set changed after Apr 28.

The most retention-decision-relevant finding Competitor naming is rare (~0.4% of reviews) but high-intent. The dominant change after Apr 28 is not that more users switch to Babbel or Rosetta Stone — it’s that ChatGPT and generic LLM tutors enter the substitution set. If your AI-feature pitch becomes “our AI does the work,” you train customers to compare you to general-purpose assistants.
1.95×
increase in Android competitor-naming rate across the Apr 28 boundary (0.254% → 0.495%, χ² = 32.5, p < 0.001). The post-pivot substitution set is structurally different: ChatGPT and “an AI tutor” appear alongside Memrise, LingoDeer, Mango, and Rosetta Stone. A controversy week is a shopping week.

4.1 Competitor naming is rare but spikes around controversy windows

Most dissatisfied users don’t name an alternative; they just leave. So competitor naming is a low-frequency, high-intent signal — rare in absolute terms, but a leading indicator when it appears. Across the analysis window, competitor naming stays in the low single-digit percentages. But it spikes in months where product controversy is salient: May 2025 (the AI-first backlash) and October 2025 (the Energy rollout).

Section 04 · Chart 01 — Monthly competitor-named share by source
Strict signal: competitor_mentioned ≠ none/no_competitor_named. Two event markers shown.
AI-first memo · Apr 28 Energy Points · Oct 20
Android competitor naming rises in May 2025 (0.96%, AI-first window) and October 2025 (0.58%, Energy window). iOS rate runs higher overall but more volatile.

4.2 The pre/post Apr 28 test on Android — competitor naming nearly doubles

SourcePre-pivot competitor namingPost-pivot competitor namingχ² (Yates)Significance
Android (Google Play)0.254%0.495%32.54p < 0.001
iOS (App Store)1.262%1.118%0.07n.s.

For Android, the AI-first pivot doesn’t just increase negative sentiment — it increases the rate at which users actively name substitutes. Controversy weeks become shopping weeks. iOS does not show the same shift in this slice; the iOS competitor-naming baseline is already 5× higher (longer, more deliberative reviews), and the limited pre-pivot iOS sample (1,030 reviews) inflates variance.

4.3 Where do they go? Three substitution buckets

Bucket A · The new substitute set
“Use an AI tutor instead” — ChatGPT / generic LLM tutor
The structurally novel substitution bucket post-pivot. Users explicitly say “if I wanted to learn with an AI, I’d subscribe to ChatGPT” — the most rhetorically devastating type of switching language because it reframes Duolingo’s value prop. Duolingo’s AI-first move inadvertently teaches the substitution behavior: if the product is “AI language learning,” the user can just try a general-purpose LLM. This is the cohort the FY2025 10-K’s “competitive pressure” risk-factor language was foreshadowing.
Bucket B · Direct competitors
Structured language-learning alternatives — Memrise, LingoDeer, Mango, Rosetta Stone, Busuu
The traditional substitute set. Most of these mentions co-occur with explicit framings of quality decline or paywall friction. “Paywall makes learning difficult, switching to LingoDeer.” “Decrease in lesson quality since AI update — switching to Busuu.” This bucket is the “normal” competitive displacement story; it’s the existence of Bucket A alongside it that makes the AI-first pivot uniquely costly.
Bucket C · The exit-to-high-agency cohort
YouTube channels, private tutors, classrooms
The cohort that has stopped believing in app-based language learning entirely. Users say things like “if u wanna speak fluently, get a language tutor instead” or “I learn from YouTube now.” This is a value-prop failure rather than a feature-comparison failure. Small in volume but the most concerning — these users are leaving the category, not switching brands.

4.4 Verbatim evidence — the substitution targets in users’ own words

“If I wanted to learn with an AI, I’d subscribe to ChatGPT. Do your job, pay actual translators.”
Google Play · ★1 · 2025-05-03 competitive_displacement_signal = switching_to_ai_tutor · competitor_mentioned = chatgpt_or_llm_tutor · translator_layoff_reaction_signal = layoff_explicit_negative
“paywall makes learning difficult, switching to Lingodeer.”
Google Play · ★1 · 2025-10-28 competitive_displacement_signal = switching_to_competitor · competitor_mentioned = lingodeer · pricing_or_paywall_signal = paywall_too_aggressive
“I’ve been using Duolingo for 9 years … decrease in quality … ‘AI-first.’ I definitely recommend Mango more.”
Google Play · ★1 · 2025-08-14 competitive_displacement_signal = comparison_favorable_to_duolingo · competitor_mentioned = mango_languages · learner_lifecycle_stage = long_term_streak
“Significant decrease in lesson quality since AI update … switching to Busuu.”
Google Play · ★2 · 2025-06-19 competitive_displacement_signal = switching_to_competitor · competitor_mentioned = busuu · course_quality_signal = course_quality_declined
B2C Subscription Lever · Competitive Displacement Risk
The portable diagnostic: after any AI-feature announcement, watch your competitor-naming rate AND watch which competitors show up. If “ChatGPT,” “Claude,” or “an AI tutor” appears in your switching-intent reviews, your value-prop framing has trained users to compare you to general-purpose assistants. Bucket A above is the most expensive substitution cohort to recover — the user no longer accepts that your product category is the right shape of solution. The fix is reframing: specialized over generic, supervised over unsupervised, structured over chat. Any consumer subscription that has shipped an AI-features story in the last 12 months should be running this test on its own data already.
Section 05 · Subscription Read-Through

The conversion boundary shows up as trust erosion, not price complaints

This section translates the AI and Energy shifts into subscription mechanics language: conversion, paywall pressure, trust erosion, and churn intent. The operator-relevant signal is not complaints about price — pricing complaints are constant background noise — but the trust signature that appears around monetization friction.

The 3-way split test We segment the window into pre-pivot (Apr 1–27), post-pivot pre-Energy (Apr 28 – Oct 19), and post-Energy (Oct 20 – Mar 31) and measure four signals across all three periods. The pattern: negative paywall language rises post-Energy on Android (2.89% → 4.96%), but trust erosion attributed to pricing/energy rises faster (2.91% → 7.44%). Trust erodes ahead of price language — that’s the leading indicator.
7.44%
of Android reviews in the post-Energy period (Oct 20 – Mar 31, 2026) carry trust erosion attributed to pricing or energy mechanics, up from 2.45% pre-pivot and 2.91% in the post-pivot pre-Energy window. The 2.5× rise after Oct 20 tracks the engagement-mechanic change far more cleanly than negative paywall language alone. Trust is the leading indicator; price language is lagging.

5.1 The 3-way split — how monetization friction evolved

PeriodSourcenAdverse ≤2★Negative paywallTrust eroded (price/energy)
Pre-pivotAndroid29,9318.47%2.65%2.46%
Post-pivot, pre-EnergyAndroid232,06416.90%2.92%2.91%
Post-EnergyAndroid211,04113.66%4.96%7.44%
Pre-pivotiOS1,03016.99%10.10%8.25%
Post-pivot, pre-EnergyiOS11,09245.75%19.17%30.45%
Post-EnergyiOS17,11121.93%9.40%10.16%
Section 05 · Chart 01 — 3-way split, Android (the cleanest case)
Pre-pivot vs. post-pivot pre-Energy vs. post-Energy. Trust erosion rises faster than negative paywall language — trust is the leading indicator.

The Android pattern is the most operationally interpretable. Adverse share peaks in the post-pivot pre-Energy window (the AI backlash inflates it), then partially normalizes post-Energy. But trust erosion attributed to pricing/energy rises monotonically across all three periods: 2.46% → 2.91% → 7.44%. Negative paywall language follows the same shape but lagging. The interpretation: each subsequent monetization event compounds onto a base of accumulated trust debt, not onto a clean slate.

iOS shows a different shape because the iOS pre-Energy period contains the entire AI-first backlash spike (May–Aug 2025), inflating the middle bucket. The post-Energy iOS numbers decline from the middle period — not because the Energy change reduced friction, but because the AI controversy faded and the Energy controversy didn’t pile on as hard on iOS as it did on Android.

5.2 The four-quadrant view — where the operationally-actionable cohorts sit

High AI quality friction · High mechanic friction
The compound-trust-debt cohort
Reviews tagged with both ai_quality_signal (active) and hearts_or_energy_signal (negative). Most-at-risk. These users have processed BOTH events and concluded the product is moving in the wrong direction on both dimensions. Highest cancellation intent, highest competitor-naming rate.
High AI quality friction · Low mechanic friction
The course-quality detractor cohort
Users who care about content quality first and engagement second. The cohort that drove the Apr 28 spike. Cancellation rate is high but more recoverable than the compound cohort if course-quality assurance language enters the product narrative.
Low AI quality friction · High mechanic friction
The engagement-mechanic detractor cohort
The cohort that drove the Oct 20 spike. Users who experience Energy as a paywall and reframe Duolingo as monetization-first. Recoverable if the mechanic is softened or reframed. “Bring back hearts” is the literal reversion request from this cohort.
Low AI quality friction · Low mechanic friction
The First-Week Advocate cohort
The cohort that drives 5-star reviews. Largest by volume (~70% of Android 5-stars are in first-week onboarding language). Treat with caution — these reviews predate the reviewer’s actual operational experience with Duolingo’s monetization mechanics. Solicitation-driven advocacy, not loyalty.

5.3 The Tier 1 / 2 / 3 read-through for any B2C subscription operator

Tier 01 · Highest leverage · Watch within 7 days
Run the event-week pivot test after every AI-feature announcement
The Apr 28 + Oct 20 event-week shape is portable. After any AI-feature press release, watch your customer voice for a synchronized cliff: outcome (adverse share) + mechanism (AI mentions / feature-name mentions) moving together in the same seven days. If both move, you are in reputational-harm territory and the residual will persist 6+ months. If only one moves, it’s probably routine volatility. The test costs ~$500 of enrichment compute against a 30,000-review pre-period.
Tier 02 · Medium leverage · Watch monthly
Track the trust-vs-price-language divergence on every monetization change
Trust erosion (qualitative language: “money hungry,” “coercive,” “greedy”) is a leading indicator of the price-specific language that follows. If trust erosion rises faster than negative-paywall language, the conversion boundary has become a fairness argument rather than a value argument. This is recoverable but expensive; the longer it’s left, the more every subsequent monetization surface reads as bad faith.
Tier 03 · Long-arc · Quarterly
Run the 10-K narrative test against the customer voice every quarter
Every public consumer-AI announcement has a risk-factor sentence somewhere. The work is checking which sentence the customer voice actually supports. Pull the growth-narrative quote and the risk-factor quote from your own filings, quantify the cohort supporting each, and report the answer to leadership before the next earnings call. Done internally, this is the cheapest way to avoid a CEO defending a position the customer data already contradicts.

5.4 Verbatim evidence — the conversion-boundary language

“For years this was a great app for free users. The new energy system means you can’t complete more than two lessons a day unless you want to watch a ton of ads.”
Google Play · ★1 · 2026-01-15 hearts_or_energy_signal = energy_too_aggressive · ad_load_signal = ads_too_many · pricing_or_paywall_signal = paywall_too_aggressive
“DO NOT PAY FOR SUPER … they decided to give us super users ADS THAT WE SPECIFICALLY PAID NOT TO SEE … GREEDY COMPANY. USE YOUR MONEY ON ACTUAL TUTORS.”
Google Play · ★1 · 2025-12-22 pricing_or_paywall_signal = paywall_too_aggressive · ad_load_signal = ads_too_many · trust_signal = trust_eroded_pricing_or_paywall · product_tier_mentioned = super
“Behind the cutesy characters … happy laying off employees in favor of AI … lost explanations unless we pay … replaced heart system with a battery system unless we pay … Doesn’t matter if you’re acing the lessons — you gotta pay to keep going.”
Google Play · ★2 · 2026-02-08 The one verbatim that contains the whole report · ai_quality_signal · translator_layoff_reaction_signal · hearts_or_energy_signal · pricing_or_paywall_signal · trust_signal

5.5 How to recognize whether this pattern applies to your business

The Duolingo case is not generic. Three structural conditions made it uniquely exposed to the AI-pivot / engagement-mechanic combination that produced the $20B repricing. The pattern is portable to other subscription products only when all three conditions hold.

Condition 01 · The AI announcement
A public AI-feature announcement framed broadly enough that customers read it as quality-control removal
The pattern fires hardest when the framing is broad (“AI-first”) rather than narrow (“we added an AI tool for one specific task”). Broad framings invite customers to compare you to general-purpose LLM substitutes; narrow framings keep them in your category. If your last AI announcement could plausibly be heard as “the AI does the work,” the substitution-set has already shifted in your customer voice — even if your team hasn’t noticed yet.
Condition 02 · The engagement mechanic
A retention mechanic that gates customer outcomes, not just feature access
Energy gates the act of completing a lesson, not access to a feature. The same shape exists in any subscription product where a mechanic limits the rate at which a customer accomplishes the thing they came for — practice sessions, edit time, message sends, design iterations, search refreshes, AI-tool invocations. The risk is sharpest when the friction is monetized via interruption rather than gated behind tier upgrades.
Condition 03 · The brand relationship
A user base that has personified the product enough to feel betrayed
Duolingo’s mascot, streak mechanic, and brand voice create a parasocial relationship that amplifies negative sentiment when the product breaks faith. Subscription products with strong character / habit-loop / personality-driven brand identity inherit this risk; utilitarian products with no parasocial layer carry less of it. The clearest tell: do customers talk about your product like a person or like a tool? If they say things like “the AI ruined what made it special” rather than “this feature got worse,” you have a parasocial dependency.

If all three conditions are present, the diagnostic to run on your own customer voice is straightforward: the event-week pre/post chi-squared test from Section 2, the engagement-mechanic boundary test from Section 3, and the competitive-substitution-set test from Section 4. The cost of running it is rounding error compared to the multiple compression a 17-point bookings-growth cliff produces.

B2C Subscription Lever · Trust Capital
The compounding cost of monetization-via-friction is trust debt. Once users say “money hungry” or “coercive,” price becomes non-negotiable — every subsequent monetization surface reads as bad faith. The diagnostic is simple: search your own customer-voice corpus for “money hungry,” “greedy,” “coercive,” “disguised paywall” alongside your product name. If the verbatim count is rising quarter-over-quarter, you are spending trust capital faster than you are earning it, and ARPU optimization will get harder before it gets easier.
Section 06 · The Financial Read-Through

The customer voice predicted a $20B market cap destruction twelve months in advance

The previous five sections were observational: this is what 502,269 reviews said and how the dimensions behave. This section closes the loop. Every customer-voice signal Sections 2–5 documented materialized in Duolingo's quarterly SEC filings — with a ~12-month lag. The framework called the event in week 1. Management acknowledged it in May 2026. In between, the stock fell 81% and roughly $20 billion in market capitalization was destroyed.

The shock Duolingo (NASDAQ: DUOL) closed at $544.93 on May 14, 2025 — an all-time high reached just two weeks after the AI-first memo. As of May 11, 2026, it closes at $105.15. The customer-voice analysis Section 2 documented identified the inflection in week 1. The market took twelve months to fully price it in. The framework's lead time over consensus repricing is the entire FY2026 operating year.
−$20B
in market capitalization erased between May 2025 and May 2026, from a peak of ~$24–25 billion to $4.89 billion. The customer-voice cliff Section 2 identifies in week 1 of May 2025 is the same friction event Duolingo's CEO publicly acknowledged on May 4, 2026 — eleven months and three weeks later.

6.1 The deceleration cliff — verified from Duolingo's quarterly SEC filings

Across five quarters spanning the AI-pivot and the Energy event, Duolingo’s KPI trajectory collapsed in lockstep with the customer-voice signals Sections 2 and 3 documented. Daily Active User growth fell from +49% YoY (Q1 2025, the pre-pivot baseline) to +21% YoY (Q1 2026, filed May 4, 2026) — a 28-percentage-point deceleration that more than halved the company’s growth rate. The most violent single-quarter cliff is bookings growth in Q4 2025: from +41% to +24% as the Energy mechanic rolled out mid-quarter. FY2026 bookings growth guidance is now 10–12% — the lowest in Duolingo’s public history.

QuarterFiledDAU YoYPaid subs YoYBookings YoYRevenue YoYCustomer-voice context
Q1 2025May 1, 2025+49%+40%+38%+38%Pre-pivot baseline. Bookings beat guidance; FY2025 raised.
Q2 2025Aug 6, 2025+40%+37%+41%+41%AI-first memo lands Apr 28. Surface metrics still strong — first 60 days lag.
Q3 2025Nov 5, 2025+36%+34%(high)+41%DAU growth decelerates four points. Lapping Q3’24 +54%.
Q4 2025Feb 26, 2026 (10-K)+30%(net +700K)+24%(decelerating)Energy launches Oct 20. Bookings growth craters 17 points in a single quarter.
Q1 2026May 4, 2026+21%(slow)~+10%~+15%Management acknowledges “extra friction in monetization.” FY2026 guide halved.
Section 06 · Chart 01 — Quarterly growth deceleration vs. event boundaries
Y-axis: YoY growth rate. Two markers: Apr 28, 2025 (AI-first memo, fell in Q2'25 reporting period) and Oct 20, 2025 (Energy Points, fell in Q4'25). The bookings line is the cleanest financial signal of the customer-voice events.
AI-first memo · Q2'25 Energy Points · Q4'25
Bookings growth in Q4 2025 (post-Energy quarter) is the single most violent deceleration in the data: +41% → +24%, a 17-point single-quarter cliff. The customer voice spike in Section 3 occurred 10 weeks earlier.

6.2 The framework called this in week 1. The market took twelve months.

The most important framing for any subscription operator or public-market analyst reading this: the customer-voice analysis is a leading indicator with a 9–12-month lead time over equity-market repricing. Sections 2 and 3 document signals that were observable in the data within seven days of the events that caused them. Duolingo’s stock peaked two weeks after the AI-first memo (May 14, 2025: $544.93), continued grinding higher into Q2 2025 earnings (beat-and-raise on Aug 6 masked underlying friction), and only began the sustained decline that destroyed ~$20B in market capitalization once the Q3 2025 deceleration (Nov 5, 2025) and the Q4 2025 cliff (Feb 26, 2026 10-K) made the underlying friction undeniable.

Section 06 · Chart 02 — Stock price trajectory vs. customer-voice events
Monthly DUOL close, May 2025 through May 2026. The customer-voice cliff (Apr 28, 2025) is in week 1. The stock peak is week 2. The repricing takes the full year.
AI-first memo · Apr 28, 2025 Energy Points · Oct 20, 2025
Stock peak May 2025 at $544.93. Current close $105.15 (May 11, 2026). The Q4 2025 10-K filing in Feb 2026 is the inflection where the bookings cliff becomes undeniable; the customer-voice analysis identified the underlying friction nine months earlier.

6.3 Management’s own confession

The Q1 2026 shareholder letter (filed May 4, 2026, available on Duolingo’s IR site) is the document that makes Section 3 of this report rhetorically devastating. Quoting directly:

Management acknowledgment, May 4, 2026

The company believes extra friction in monetization is part of the reason DAU growth has slowed, and has decided to prioritize user growth over monetization, investing more than $50 million of foregone bookings from friction into the free user experience.
Duolingo Q1 2026 Shareholder Letter · Filed May 4, 2026

Section 03 of this report, on data through Mar 31, 2026

Energy Points behaves like a retention tax: users either pay to remove friction or churn while writing the verbatim “disguised paywall.” The mechanic that gates completion reads as coercion. Once monetized via friction, you cannot easily roll it back — reversal is framed as retreat.
Section 03 · The Engagement-Mechanic Layer · This publication

Read the two side by side. The Cable’s customer-voice analysis arrives at the same diagnosis Duolingo’s own CFO and CEO arrived at — using only review text, with twelve months of lead time over the SEC filing that confirms it. The $50M+ in foregone bookings Duolingo is now investing back is the financial price of not having seen Section 3 in May 2025. That price line-items into every quarter of FY2026.

6.4 What the framework would have predicted, run in real time

Had The Cable Episode 01 been published in mid-May 2025 — two weeks after the AI-first memo, with only Q1 2025 financials available — the framework would have made four concrete claims, every one of which has since been confirmed by SEC filings:

Claim the framework would have made (May 2025)How it was confirmedConfirmation lag
AI-pivot will generate a permanent baseline shift in AI-mention rate, not a transient spike.Q1 2026 AI-mention residual sits at 1.451% vs. 0.665% pre-pivot baseline (2.18× elevated, persisting eleven months later).~11 months
Engagement-mechanic friction is the second-order cost of monetizing via interruption.Q1 2026 letter: “extra friction in monetization is part of the reason DAU growth has slowed.” $50M+ foregone bookings invested to roll back.~7 months
Competitive displacement set shifts toward generic AI tutors (“why not just use ChatGPT?”).Bookings growth guide cut to 10–12% for FY2026; management cites “new monetization balance”; speaking/AI features specifically called out as response.~12 months
Reputational drag will materialize as growth-rate deceleration, not as immediate cancellations.DAU growth 49% (Q1'25) → 21% (Q1'26). Bookings growth 41% (Q3'25) → 10–12% (FY'26 guide). ~$20B market cap erased.~12 months

6.5 The forecast model, run forward

With the framework validated against the realized FY2025/FY2026 numbers, the same model can be run forward. Three quantified scenarios from Section 2 and 3 cohort sizes:

ScenarioCohort & assumptionImplied revenue at riskConfidence
Direct switch-intent churn~5,000 post-Apr-28 reviews with explicit switching intent × (1 / 2% review-to-paid-subscriber ratio) × 2× paid-sub overrepresentation × 40% intent-to-action conversion × $75 ARPU = ~$15M/year~$15MHigh
Reputation-drag CAC inflationIf the Q1 2026 AI-mention residual (2.18×) continues, blended CAC rises 8–15% as conversion friction compounds. On a ~$200M+ FY2025 sales & marketing base, that’s ~$16–30M/year of additional spend to maintain same growth.$16–30MMedium
Foregone bookings investment (already happening)Duolingo’s own publicly-stated investment: $50M+ in foregone bookings to roll back friction. This is the line item management has already conceded.$50M+Confirmed
Realized market-cap impact (not a forecast)From $544.93 peak (May 2025) to $105.15 close (May 11, 2026) on ~46M diluted shares.~$20BConfirmed
B2C Subscription Lever · Equity-market repricing
The $20B market cap destruction is not a forecast. It is a closed loop. The customer-voice analysis Sections 2 and 3 published would have flagged the underlying friction with twelve months of lead time over the moment the market fully priced it in. For any consumer subscription company with public AI-feature announcements in the last 12 months: the cost of not running this analysis is measured in the multiple-compression you take when your bookings growth decelerates from 40% to 10%. Duolingo is the live case. The framework is portable.

6.6 What to verify against your own 10-Qs (the operator playbook)

Three quarterly KPIs to watch in your own filings, mapped to the customer-voice signals that lead them:

Tier 01 · Watch in the next 10-Q
Bookings growth deceleration (the Duolingo Q4 2025 cliff pattern)
If your customer voice shows an event-week cliff in adverse + mechanism in the same seven days, watch your bookings growth rate the following quarter. Duolingo’s bookings decelerated 17 points in the Energy quarter; if your equivalent shows up, that is your $20B-of-market-cap-destruction signal arriving on a quarterly delay. The lead time is ~3 months.
Tier 02 · Watch over 2–3 10-Qs
DAU growth compression (the longer-arc retention signal)
DAU growth is the slower-moving variable. Duolingo went from +49% (Q1’25) to +21% (Q1’26) — a four-quarter decay matching the customer-voice decay tail. If your AI residual is persistent (>2× baseline at 90 days post-event), expect DAU growth to compress on a 2–3 quarter delay. This is the slow leak before the cliff.
Tier 03 · Watch the IR-narrative-customer-data divergence
Where the next earnings call sets up the “new monetization balance” language
Duolingo’s May 4, 2026 shareholder letter explicitly used the framing “new monetization balance” and “extra friction in monetization is part of the reason DAU growth has slowed.” That language is the verbal signature of a company that has now seen Section 3 in its own data. If your CFO is drafting similar language for your next earnings call, you are roughly 9–12 months behind where the customer voice could have alerted you.
B2C Subscription Lever · CFO Briefing Material
Customer voice is a leading indicator of bookings deceleration with a 9–12 month lead time over the moment the equity market fully prices it in. The Duolingo case makes this concrete: the underlying friction was identifiable in customer-voice data within seven days of the AI-first memo, but the equity market needed four quarters of confirmatory SEC filings to fully reprice. For any operator of a B2C subscription business who wants to know whether their next quarterly filing will surprise the market — favorably or otherwise — the customer voice is the earliest reliable signal available.
Appendix A · Dimension Reference

The 25 enrichment dimensions, by cluster

All 25 dimensions extract only from learner-authored text (Message, plus title on iOS rows). Star rating, country, and other metadata are joinable but never feed into a dimension’s value. Every enum terminates with none_detected.

Cluster 01
Product & Lifecycle (3)
product_tier_mentioned · feature_or_surface_mentioned · learner_lifecycle_stage. Where in the learner journey and on which Duolingo surface the reviewer sits.
Cluster 02
AI & Course Quality (4)
ai_quality_signal · course_quality_signal · cultural_context_signal · translator_layoff_reaction_signal. The primary-hypothesis spine. ai_quality_signal uses a 7-step priority order.
Cluster 03
Engagement Mechanics (4)
engagement_mechanic_signal · streak_event_signal · hearts_or_energy_signal · gamification_emotional_response. The backup-hypothesis spine and Section 03’s analytical workhorses.
Cluster 04
Pricing & Monetization (3)
pricing_or_paywall_signal · ad_load_signal · feature_request_or_paywall_dispute. Section 05’s subscription read-through layer.
Cluster 05
Customer Service & Account (2)
customer_service_signal · account_or_data_loss_signal. Captures progress-lost, account-deletion-blocked, sync-across-devices-problem.
Cluster 06
Competitive & Trust (3)
competitor_mentioned · competitive_displacement_signal · trust_signal. Section 04’s substitution-set spine.
Cluster 07
Outcome & Recommendation (2)
overall_review_sentiment · recommendation_signal. The classifier that drives the Learner Health Segment assignment.
Cluster 08
Verbatim Evidence (5, free-text)
primary_pain_point_phrase · primary_delight_phrase · feature_request_detail · sentiment_verbatim · learner_situation_summary. The verbatim citation layer; each is concise (10–30 words) and preserves the reviewer’s voice in original language.
Appendix B · B2C Subscription Lever Map

Which review signals speak to which operational levers

The portable mapping: when you run this analysis on your own customer voice, here is which dimensions to watch for which operational lever in your subscription P&L.

Paid retention
competitive_displacement_signal ∈ {switching_to_competitor, switching_to_ai_tutor, has_already_switched} × recommendation_signal ∈ {would_not_recommend, would_actively_dissuade}. The compound cohort is your direct churn-intent measure.
Engagement health
engagement_mechanic_signal + hearts_or_energy_signal + gamification_emotional_response. The Energy boundary test pattern is portable to any gamification mechanic change.
Course / content reputation
ai_quality_signal + course_quality_signal + cultural_context_signal. The Section 02 chi-squared test is the canonical structural diagnostic for an AI-feature announcement.
Competitive displacement risk
competitor_mentioned ∈ {chatgpt_or_llm_tutor, etc.} × period (pre vs post your own AI announcement). Watching the substitution-set composition is the leading indicator.
Ad revenue durability
ad_load_signal ∈ {ads_too_many, ads_inappropriate_content, ads_unskippable_or_broken}. When ads start carrying disproportionate trust-erosion language, you have priced your free tier into a paywall.
Trust capital
trust_signal ∈ {trust_eroded_post_ai_pivot, trust_eroded_post_energy_change, trust_eroded_pricing_or_paywall, trust_eroded_quality_decline}. Trust erosion leads price language by 1–2 quarters — this is the earliest-warning monetization indicator.
Appendix C · Methodology

How this analysis was done — the Dimension Labs process

For readers who want to verify the analytical rigor or replicate this work on their own customer voice. This appendix walks through the seven-step pipeline that produced every finding in this publication, from raw scraped reviews to chi-squared statistics to the $20B financial read-through.

C.1 What Dimension Labs does

Dimension Labs is a causal-intelligence platform. We turn unstructured customer text (reviews, complaints, support tickets, social posts, internal notes) into structured signal (labeled dimensions that can be counted, joined, and statistically tested), then overlay that signal against the corporate events that explain it. The Cable is our weekly publicly-traded-company analysis program; each episode applies the same pipeline to a different company. Duolingo is Episode 01.

The output of every analysis is two things: (a) a labeled dataset where each row of customer text carries a complete row of structured signals (25 dimensions in this case), and (b) a causal-intelligence report that tests pre-registered hypotheses against the data and quantifies the result.

C.2 Step one — Source collection

For Episode 01, three public review surfaces were scraped: Google Play (Android), Apple App Store (iOS), and Reddit (r/duolingo, r/duolingomemes, r/languagelearning). For each review, the scrape captured: review text, star rating where applicable, timestamp, app version, country code, and (for Reddit) subreddit, upvotes, and the top 20 comments. 737,342 total reviews were scraped across the three sources. The Reddit corpus was set aside for a later enrichment pass; the analytical work in this publication uses the 502,269 in-window app-store reviews.

No customer identification, no personally identifying information beyond what reviewers chose to publish, no internal Duolingo data. Every byte underlying this report is publicly available.

C.3 Step two — Dimension schema design

Before any enrichment ran, we designed a 25-dimension extraction schema grouped into 8 clusters (Appendix A is the full reference). The schema is a structured JSON file that defines, for each dimension, an enum of allowed values (or a free-text rule), a priority order when multiple values could apply, a list of disambiguation rules, and cross-dimension consistency rules.

Two dimensions illustrate the design discipline. ai_quality_signal uses a seven-step priority order: when a review contains multiple AI-quality-related signals, the LLM is instructed to return the highest-priority value, with priority ordered from broad indictment (“AI made the courses worse”) down through specific complaints (awkward sentences, factual errors, layoff framing). learner_lifecycle_stage uses a ten-step priority order anchored on cancellation cues (“I’m done,” “deleting the app,” “last straw”) followed by content-encounter cues followed by mechanic-friction cues. Without these orderings, multi-signal reviews classify inconsistently.

C.4 Step three — The text-only rule

The single most important rule in the schema: dimensions extract signal only from the customer’s text (the review body, plus the review title on iOS where present). The LLM never sees the star rating, the source, the country code, the subreddit, the upvote count, or the date. This rule is what makes the dimensions analytically usable downstream.

The reason is methodological: if the LLM had access to the star rating when classifying sentiment, every chi-squared test linking sentiment to star rating would be circular — we’d be measuring how reliably the LLM copies a number it was already shown. By withholding the metadata at enrichment time, we guarantee that a 1-star review and a 5-star review with identical text get identical dimension labels. The star rating then becomes a clean outcome variable to test the dimensions against, not a contaminated input.

C.5 Step four — Per-row LLM enrichment

The platform runs each review through a large language model independently — one row at a time, no cross-row context. The LLM receives the schema, the row’s text, and a structured output instruction; it returns a JSON object with one value per dimension. The 25 dimensions on 502,269 reviews produced approximately 12.5 million structured signals from the underlying text corpus.

Free-text dimensions (verbatim quotes, situation summaries) follow a stricter rule: copy the customer’s words verbatim, do not paraphrase, do not translate non-English content. Every quote in this publication’s body is reproduced exactly as the reviewer wrote it; where the source language is Spanish, Portuguese, Japanese, or Korean, the quote appears in the original language with light translation as context only.

C.6 Step five — QA and validation

Before running the schema against the full corpus, we ran it against a stratified 30-row sample spanning each major event window (pre-pivot baseline, AI-pivot week, AI decay, Energy event week, Energy decay, Q1 2026 baseline) and each source. The QA round caught one enum violation, one cross-dimension inconsistency, and several boundary cases that pointed to ambiguities in the schema. Eight description-level fixes were applied to the schema before the production run.

After the production enrichment completed, we ran a second-pass validation: distribution checks on every dimension (no value should over-fire above 95% of the corpus — saturation means the dimension is uninformative), enum compliance (every value must come from the defined enum, no LLM-invented values), and free-text hallucination checks (the verbatim text must overlap with the source review’s words).

C.7 Step six — Statistical analysis

All quantitative claims in this publication are computed in SQL against the enriched dataset. The chi-squared statistics use Yates’ continuity correction on 2×2 contingency tables:

StepComputation
1. Define the binary signalFor each test, define a binary based on a dimension value: e.g., ai_quality_signal NOT IN ('no_ai_mention', 'none_detected') = 1.
2. Define the period splitPre-period: reviews before the event boundary (Apr 28, 2025 for the AI test). Post-period: reviews on or after.
3. Build the 2×2 tablea = signal present in post-period; b = signal absent in post-period; c = signal present in pre-period; d = signal absent in pre-period.
4. Compute χ² (Yates)χ² = N × (|ad - bc| - N/2)² / ((a+b)(c+d)(a+c)(b+d)), where N = a+b+c+d.
5. Read significanceχ² ≥ 10.83 = p < 0.001; ≥ 6.64 = p < 0.01; ≥ 3.84 = p < 0.05; otherwise n.s.

Lift ratios use the same enriched dataset: P(adverse outcome | signal present) / P(adverse outcome | signal absent), with adverse defined as star rating ≤ 2 on app-store reviews. Lift ratios are paired with chi-squared in every claim — lift without significance is rhetorical; significance without effect size is technical but unimportant.

C.8 Step seven — The Causal Join

The last step is what gives this publication its analytical edge. The enriched dataset is overlaid against the corporate timeline: the April 28, 2025 AI-first memo, the October 20, 2025 Energy Points rollout, the four quarterly SEC filings spanning the window, the Q1 2026 management acknowledgment of monetization friction. Where the customer-voice signal and a corporate event line up in time, the analysis tests whether the customer voice led, lagged, or coincided with the financial signal.

For Duolingo, the customer-voice signal led the financial signal by 9-12 months. That gap — the lead time of qualitative customer voice over quantitative quarterly financials — is the central methodological finding of this publication, and the reason we believe the framework is portable to other publicly-traded consumer subscription companies. Section 6 shows the lead-time visualization quarter by quarter.

C.9 Reproducibility

Three things make this analysis reproducible:

  1. The schema is published as a structured JSON file. Any third party with the same review corpus and a capable LLM can run the same enrichment and verify the dimension values.
  2. The statistical tests are standard. Chi-squared with Yates correction has been the textbook treatment for 2×2 contingency tables for over a century. Every claim in the publication can be re-derived from the enriched dataset by anyone with basic SQL.
  3. The events tested were pre-registered. Apr 28, 2025 and Oct 20, 2025 are publicly-known dates, not data-mined event boundaries. We chose them based on the corporate timeline before running the chi-squared tests, not after — which prevents the “test enough cut-points and one will be significant” failure mode.

C.10 What you would do to run this on your own customer voice

The portable workflow:

  1. Pick your event window. One quarter before to four quarters after any public product change you want to test (an AI feature launch, a pricing change, a UX redesign).
  2. Pull your customer text. App store reviews, support tickets, exit surveys, social mentions, any source that captures unsolicited customer voice.
  3. Design 15-30 dimensions. Cover the hypothesis under test, the mechanism, the affective response, and the competitive substitution set. Pre-register them before running.
  4. Enrich, QA, validate. One row at a time, text-only. Spot-check at least 30 stratified rows before the full run.
  5. Run chi-squared on every load-bearing dimension across the event boundary. Report effect sizes and significance together.
  6. Overlay against your quarterly filings. If the customer voice signal precedes a KPI deceleration, you have a leading indicator. If it doesn’t, you have falsifying evidence and should revise the hypothesis.

For most subscription businesses with public review surfaces and a recent AI announcement, this is a 4-6 week engagement. The cost is rounding error compared to a single quarter of bookings-growth deceleration.