A structured analysis of 502,269 Duolingo customer reviews identified, in week one of May 2025, the friction that Duolingo’s CEO publicly acknowledged on May 4, 2026 — twelve months later. In between, the stock fell 81% and roughly $20 billion in market capitalization was erased. This publication is the demonstration: customer voice is a leading indicator of public-market repricing, with a 9-to-12-month lead time over consensus.
Duolingo is the cleanest live case study available for two questions every consumer subscription business is now forced to answer: what happens when you pivot “AI-first” in a product customers believe requires human quality control, and when does engagement become a retention tax rather than a moat. This publication tests both against twelve months of customer voice — and against the SEC filings that confirmed the customer voice with a multi-quarter lag.
| What this proves | What to copy | Operational lever it speaks to |
|---|---|---|
| Customer voice leads equity-market repricing by 9–12 months. | Run this analysis on your own customer voice within 60 days of any AI-feature announcement. The cost is rounding error vs. multiple-compression risk. | Equity-market repricing · CFO briefing |
| AI-first announcements are read as quality-control removal until the assurance system is visible. | Ship the QA / human-in-the-loop story before the AI narrative. | Course/content reputation · paid retention |
| Mechanic that gates completion is interpreted as coercion — and shows up as a single-quarter bookings cliff. | Treat practice time as sacred. Watch for the bookings-growth cliff 1 quarter after any monetization-friction change. | Engagement health · trust capital · bookings growth |
| “AI does the work” rhetoric invites ChatGPT-as-tutor substitution. | If your value prop becomes “our AI,” you train customers to compare you to general-purpose assistants. | Competitive displacement risk |
| Source | n | AI mention | Trust eroded post-AI | Course quality declined | Adverse (≤2★) |
|---|---|---|---|---|---|
| Android (Google Play) | 473,036 | 4.32% | 3.56% | 5.12% | 14.92% |
| iOS (App Store) | 29,233 | 3.81% | 2.54% | 8.91% | 30.80% |
The primary hypothesis under test: did Duolingo’s late-April 2025 AI-first memo produce a measurable, statistically significant, persistent increase in negative learner voice specifically on course quality — or just generic product-change volatility? This section runs the test against five required signals and reports the result.
The week of April 28 is a true discontinuity in three separate measures at once: review volume spikes, average star rating drops sharply, and AI-mention share jumps from background noise to dominant-theme levels. That pairing — outcome (stars) and mechanism (AI mentions) moving together in the same seven days — is what makes the AI narrative testable rather than coincidence. A typical product-change-cycle adverse spike produces the outcome cliff without the mechanism cliff; this one produced both.
Five dimensions speak to the AI-quality hypothesis. Chi-squared with Yates correction across the Apr 28 boundary returns large, highly significant shifts on four of them, including the two highest-leverage signals (AI mention and trust erosion). The fifth (cultural-context issue) does not shift at the headline level; the section’s read on this is that cultural-context complaints are language-specific and require per-language unpacking rather than a corpus-wide rate test.
| Signal (binary) | Pre rate | Post rate | χ² (Yates) | Significance |
|---|---|---|---|---|
| AI mentioned (ai_quality_signal ≠ no/none) | 0.72% | 4.53% | 1,026.29 | p < 0.001 |
| Trust eroded post AI pivot (trust_signal) | 0.27% | 3.71% | 1,016.12 | p < 0.001 |
| Translator layoff reaction (translator_layoff_reaction_signal) | 0.06% | 2.30% | 687.38 | p < 0.001 |
| Course quality declined (course_quality_signal) | 2.95% | 5.50% | 373.13 | p < 0.001 |
| Cultural-context issue (broken / inappropriate / language-specific) | 0.56% | 0.56% | 0.00 | n.s. |
| Signal | Δ (pp) | χ² |
|---|---|---|
| AI mentioned | +3.81 | 1026 |
| Trust eroded post-AI | +3.44 | 1016 |
| Layoff reaction | +2.24 | 687 |
| Course quality declined | +2.55 | 373 |
| Cultural-context issue | 0.00 | 0.0 (n.s.) |
The harder question for a subscription operator is not whether an announcement causes a one-week blowup; most do. The question is whether the blowup leaves a permanent baseline shift. The five periods below, from the pre-pivot baseline through Q1 2026, show the answer:
| Period | Android n | Android AI % | iOS n | iOS AI % |
|---|---|---|---|---|
| Pre-pivot baseline (Apr 1–27) | 29,931 | 0.665% | 1,030 | 2.233% |
| Event week (Apr 28 – May 4) | 13,010 | 36.449% | 506 | 41.502% |
| Decay tail (May – Aug) | 156,371 | 6.590% | 7,597 | 5.900% |
| Late 2025 (Sep – Dec) | 167,283 | 2.190% | 7,874 | 2.400% |
| Q1 2026 “new normal” (Jan – Mar) | 106,441 | 1.451% | 12,226 | 1.988% |
| Period | Android AI % | iOS AI % |
|---|---|---|
| Pre-pivot | 0.665 | 2.233 |
| Event week | 36.449 | 41.502 |
| Decay tail | 6.590 | 5.900 |
| Late 2025 | 2.190 | 2.400 |
| Q1 2026 | 1.451 | 1.988 |
Android shows the clean shape: 0.665% pre-pivot, peak at 36.449%, slow decay through summer and fall, settling at 1.451% in Q1 2026 — 2.18× the pre-pivot baseline, eleven months after the memo. The residual is not zero. AI is now a persistent topic of complaint in a way it simply was not before April 28. iOS does not show the same clean residual elevation because the iOS pre-pivot baseline already sits at 2.23% (longer, more editorially opinionated reviews include more meta-talk about product strategy); the event-week spike on iOS is still unambiguous at 41.5%.
The methodological takeaway is portable: the “residual baseline shift” test is easiest to see on the high-volume, short-form surface (Google Play, in this case). Long-form surfaces (App Store, Reddit, Trustpilot) carry more baseline meta-commentary and require per-month analysis to read the shift cleanly.
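The residual check described above reduces to one ratio per source. The sketch below is a minimal illustration, not the report's own code: the rates are copied from the period table, the 2× threshold is the report's rule of thumb, and the function and constant names are hypothetical.

```python
# AI-mention rates (%) copied from the period table above.
AI_MENTION_RATE = {
    "android": {"pre_pivot": 0.665, "q1_2026": 1.451},
    "ios": {"pre_pivot": 2.233, "q1_2026": 1.988},
}

# Rule-of-thumb threshold from the report: a residual above 2x the baseline.
RESIDUAL_THRESHOLD = 2.0


def residual_ratio(pre_rate: float, later_rate: float) -> float:
    """Return the later rate as a multiple of the pre-event baseline."""
    if pre_rate <= 0:
        raise ValueError("pre-event rate must be positive")
    return later_rate / pre_rate


def has_baseline_shift(pre_rate: float, later_rate: float) -> bool:
    """True when the residual exceeds the threshold multiple of the baseline."""
    return residual_ratio(pre_rate, later_rate) > RESIDUAL_THRESHOLD


for source, rates in AI_MENTION_RATE.items():
    ratio = residual_ratio(rates["pre_pivot"], rates["q1_2026"])
    shifted = has_baseline_shift(rates["pre_pivot"], rates["q1_2026"])
    print(f"{source}: {ratio:.2f}x baseline, shift={shifted}")
```

On these inputs Android clears the threshold and iOS does not, which matches the reading in the text: the high-volume, short-form surface is where the residual is easiest to see.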
The most analytically distinctive move in this report is treating Duolingo’s FY2025 10-K as a hypothesis to test, not as background context. The 10-K (filed February 2026, leaving six weeks of post-filing customer voice inside the analysis window) contains two positions on AI that cannot both be true at the same time. The customer data shows which one holds.
“Advanced AI, machine learning, and data analytics capabilities … allow us to leverage our data to optimize the learning experience … leads to compounding growth of core business metrics like DAUs and paid subscribers.”
“Our development, implementation and use of artificial intelligence and machine learning technologies … may not be successful, which may impair our ability to compete effectively, result in reputational harm and have an adverse effect on our business.”
The customer data unambiguously supports the second position. Four of five AI-quality dimensions shift significantly at p < 0.001 across the Apr 28 boundary. AI mentions on Android remain elevated at 2.18× the pre-pivot baseline through Q1 2026. The post-pivot cohort frames the change as “replacing humans” and ties it explicitly to quality decline. Duolingo’s CFO and CEO already signed the filing that contains the risk-factor sentence; the customer data shows it materializing. Duolingo reports $873.4M subscription revenue (+44% YoY) and ~9% paid penetration of 130M+ MAUs, and the customer-voice evidence is the kind of signal that threatens the quality narrative a subscription upsell depends on.
“Company is going AI first… they don’t care about quality or users, only about profits. Cancelled subscription and uninstalled.” Google Play · ★1 · 2025-04-28 · ai_quality_signal = ai_replaced_humans_negative_framing · translator_layoff_reaction_signal = layoff_explicit_negative · learner_lifecycle_stage = cancellation_or_deletion
“After an unbroken seven-year streak, time to say goodbye at last to AI slop.” Google Play · ★1 · 2025-04-29 · ai_quality_signal = ai_made_courses_worse · streak_event_signal = streak_quit_to_protect_mental_health · learner_lifecycle_stage = cancellation_or_deletion
“Uses AI now, a lot of people are already seeing mistakes. Cancelled my Max.” Google Play · ★1 · 2025-04-28 · ai_quality_signal = ai_factually_incorrect · product_tier_mentioned = max · learner_lifecycle_stage = cancellation_or_deletion
“Recently there was a change … disrupted my learning … I learned pen to be bolígrafo and now it’s pluma but feather is also pluma.” App Store · ★2 · 2026-03-31 · ai_quality_signal = ai_generated_sentences_awkward · course_quality_signal = course_quality_declined · cultural_context_signal = language_specific_quality_issue
“Pretty worthless since they started using ai.” Google Play · ★1 · 2026-03-31 · ai_quality_signal = ai_made_courses_worse · learner_lifecycle_stage = lapsed_returning · overall_review_sentiment = very_negative_detractor
Read this section as a portable pattern, not a Duolingo story. Every consumer subscription business shipping AI features should run this exact test on its own customer voice within 60 days of the launch: identify the event boundary, run pre/post chi-squared on at least five AI-adjacent dimensions, watch for the joint outcome+mechanism cliff in the same week, and measure the residual six months later. If the residual is >2× pre-event baseline, the announcement landed as quality-control removal, not as a product improvement — regardless of what the press release said.
This section tests the backup hypothesis. Engagement mechanics (streaks, hearts, energy, leagues) are normally talked about as moats. The October 20, 2025 Energy Points rollout is a clean live test of when a mechanic stops being a moat and starts being a tax: users either pay to remove friction, or churn while writing the verbatim “disguised paywall.”
Energy Points replaced the 5-heart-lives mechanic on the free tier in late October 2025. Users describe the change in remarkably consistent language: the mechanic is experienced as a practice interrupter, not as a game element. The dominant verbatim shape is some variant of “running out of energy mid-lesson” or “can’t complete more than two lessons a day without watching ads.” The mechanic is read as monetization disguised as gamification.
The qualitative pattern is consistent across Q1 2026 reviews: energy depletes fast, learning sessions end early, users either pay or stop. Reviewers describe it as a disguised paywall, not as game design. This is the operational definition of a retention tax — a mechanic that converts learning time into monetization pressure by interrupting practice mid-flow.
“They’ve switched to an ‘energy’ currency … you can be halfway through a lesson and run out.” Google Play · ★1 · 2025-11-04 · hearts_or_energy_signal = energy_replaced_hearts_negative · engagement_mechanic_signal = mechanic_blocks_learning · learner_lifecycle_stage = hearts_or_energy_depletion
“The new energy/battery thing is deadset garbage. I’d rather delete the app than pay for a subscription.” Google Play · ★1 · 2025-10-25 · hearts_or_energy_signal = energy_too_aggressive · pricing_or_paywall_signal = paywall_too_aggressive · learner_lifecycle_stage = cancellation_or_deletion
“The new energy system is just awful. Bring back those hearts!” App Store · ★1 · 2025-10-22 · hearts_or_energy_signal = energy_replaced_hearts_negative · feature_request_or_paywall_dispute = want_old_mechanic_back
“I have a 765 day streak and I’m considering quitting because they just switched me from lives to energy.” Google Play · ★1 · 2025-11-08 · hearts_or_energy_signal = energy_replaced_hearts_negative · streak_event_signal = long_streak_milestone_celebrated · gamification_emotional_response = mechanic_demotivating
Chi-squared with Yates correction across the Oct 20 boundary shows a large, statistically significant adverse-rate shift on Android, in the direction of declining adverse share post-Energy. That result needs careful reading.
| Source | Pre-Energy adverse share | Post-Energy adverse share | χ² (Yates) | Significance |
|---|---|---|---|---|
| Android (Google Play) | 15.93% | 13.66% | 474.72 | p < 0.001 |
| iOS (App Store) | 43.31% | 21.93% | 1,520.28 | p < 0.001 |
The aggregate decline is an artifact of the pre-Energy period containing the entire AI-first event (Apr–May 2025), which inflated the pre-Energy adverse baseline. The real Energy signal is the Oct 20 event-week spike itself (Android adverse jumps from 9.4% to 47.8% in one week), not the full-window aggregate. This is a methodological warning for anyone replicating the test: when two events occur in the same window, aggregate pre/post comparisons across the second boundary can wash out the second event’s spike. The control-case framing requires week-level temporal analysis, not period-level aggregates.
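The week-level view recommended above can be sketched as a simple grouping step. This is an illustrative sketch only: the rows are synthetic, the function names are hypothetical, and weeks are assumed to start on Monday (both April 28 and October 20, 2025 fall on a Monday, so the event boundaries align with week starts).

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_adverse_share(reviews):
    """Map each week's Monday to the share of reviews rated 2 stars or lower.

    `reviews` is an iterable of (date, star_rating) pairs.
    """
    totals = defaultdict(int)
    adverse = defaultdict(int)
    for day, stars in reviews:
        key = week_start(day)
        totals[key] += 1
        if stars <= 2:
            adverse[key] += 1
    return {week: adverse[week] / totals[week] for week in sorted(totals)}


# Synthetic rows for illustration only (not drawn from the review corpus).
sample = [
    (date(2025, 10, 13), 5), (date(2025, 10, 15), 4),
    (date(2025, 10, 17), 1), (date(2025, 10, 18), 5),
    (date(2025, 10, 20), 1), (date(2025, 10, 21), 2),
    (date(2025, 10, 23), 5), (date(2025, 10, 26), 1),
]
for week, share in weekly_adverse_share(sample).items():
    print(week.isoformat(), f"{share:.0%}")
```

Reading the weekly series around the boundary keeps the second event's spike visible even when an earlier event has inflated the pre-period aggregate.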
Three diagnostics, portable to any subscription product with gamification:
When a subscription product breaks faith with its customers, those customers don’t just leave — they substitute. In the reviews, substitution shows up as “I’m switching to X,” “if I wanted Y I’d use X,” or “just use ChatGPT instead.” This section names the substitution targets and quantifies how the reference set changed after Apr 28.
Most dissatisfied users don’t name an alternative; they just leave. So competitor naming is a low-frequency, high-intent signal — rare in absolute terms, but a leading indicator when it appears. Across the analysis window, competitor naming stays in the low single-digit percentages. But it spikes in months where product controversy is salient: May 2025 (the AI-first backlash) and October 2025 (the Energy rollout).
competitor_mentioned ≠ none/no_competitor_named. Two event markers shown.
| Source | Pre-pivot competitor naming | Post-pivot competitor naming | χ² (Yates) | Significance |
|---|---|---|---|---|
| Android (Google Play) | 0.254% | 0.495% | 32.54 | p < 0.001 |
| iOS (App Store) | 1.262% | 1.118% | 0.07 | n.s. |
For Android, the AI-first pivot doesn’t just increase negative sentiment — it increases the rate at which users actively name substitutes. Controversy weeks become shopping weeks. iOS does not show the same shift in this slice; the iOS competitor-naming baseline is already 5× higher (longer, more deliberative reviews), and the limited pre-pivot iOS sample (1,030 reviews) inflates variance.
“If I wanted to learn with an AI, I’d subscribe to ChatGPT. Do your job, pay actual translators.” Google Play · ★1 · 2025-05-03 · competitive_displacement_signal = switching_to_ai_tutor · competitor_mentioned = chatgpt_or_llm_tutor · translator_layoff_reaction_signal = layoff_explicit_negative
“paywall makes learning difficult, switching to Lingodeer.” Google Play · ★1 · 2025-10-28 · competitive_displacement_signal = switching_to_competitor · competitor_mentioned = lingodeer · pricing_or_paywall_signal = paywall_too_aggressive
“I’ve been using Duolingo for 9 years … decrease in quality … ‘AI-first.’ I definitely recommend Mango more.” Google Play · ★1 · 2025-08-14 · competitive_displacement_signal = comparison_favorable_to_duolingo · competitor_mentioned = mango_languages · learner_lifecycle_stage = long_term_streak
“Significant decrease in lesson quality since AI update … switching to Busuu.” Google Play · ★2 · 2025-06-19 · competitive_displacement_signal = switching_to_competitor · competitor_mentioned = busuu · course_quality_signal = course_quality_declined
This section translates the AI and Energy shifts into subscription mechanics language: conversion, paywall pressure, trust erosion, and churn intent. The operator-relevant signal is not complaints about price — pricing complaints are constant background noise — but the trust signature that appears around monetization friction.
| Period | Source | n | Adverse ≤2★ | Negative paywall | Trust eroded (price/energy) |
|---|---|---|---|---|---|
| Pre-pivot | Android | 29,931 | 8.47% | 2.65% | 2.46% |
| Post-pivot, pre-Energy | Android | 232,064 | 16.90% | 2.92% | 2.91% |
| Post-Energy | Android | 211,041 | 13.66% | 4.96% | 7.44% |
| Pre-pivot | iOS | 1,030 | 16.99% | 10.10% | 8.25% |
| Post-pivot, pre-Energy | iOS | 11,092 | 45.75% | 19.17% | 30.45% |
| Post-Energy | iOS | 17,111 | 21.93% | 9.40% | 10.16% |
The Android pattern is the most operationally interpretable. Adverse share peaks in the post-pivot pre-Energy window (the AI backlash inflates it), then partially normalizes post-Energy. But trust erosion attributed to pricing/energy rises monotonically across all three periods: 2.46% → 2.91% → 7.44%. Negative paywall language follows the same shape with a lag. The interpretation: each subsequent monetization event compounds onto a base of accumulated trust debt, not onto a clean slate.
iOS shows a different shape because the iOS pre-Energy period contains the entire AI-first backlash spike (May–Aug 2025), inflating the middle bucket. The post-Energy iOS numbers decline from the middle period — not because the Energy change reduced friction, but because the AI controversy faded and the Energy controversy didn’t pile on as hard on iOS as it did on Android.
ai_quality_signal (active) and hearts_or_energy_signal (negative). Most-at-risk. These users have processed BOTH events and concluded the product is moving in the wrong direction on both dimensions. Highest cancellation intent, highest competitor-naming rate.
“For years this was a great app for free users. The new energy system means you can’t complete more than two lessons a day unless you want to watch a ton of ads.” Google Play · ★1 · 2026-01-15 · hearts_or_energy_signal = energy_too_aggressive · ad_load_signal = ads_too_many · pricing_or_paywall_signal = paywall_too_aggressive
“DO NOT PAY FOR SUPER … they decided to give us super users ADS THAT WE SPECIFICALLY PAID NOT TO SEE … GREEDY COMPANY. USE YOUR MONEY ON ACTUAL TUTORS.” Google Play · ★1 · 2025-12-22 · pricing_or_paywall_signal = paywall_too_aggressive · ad_load_signal = ads_too_many · trust_signal = trust_eroded_pricing_or_paywall · product_tier_mentioned = super
“Behind the cutesy characters … happy laying off employees in favor of AI … lost explanations unless we pay … replaced heart system with a battery system unless we pay … Doesn’t matter if you’re acing the lessons — you gotta pay to keep going.” Google Play · ★2 · 2026-02-08 · The one verbatim that contains the whole report · ai_quality_signal · translator_layoff_reaction_signal · hearts_or_energy_signal · pricing_or_paywall_signal · trust_signal
The Duolingo case is not generic. Three structural conditions made it uniquely exposed to the AI-pivot / engagement-mechanic combination that produced the $20B repricing. The pattern is portable to other subscription products only when all three conditions hold.
If all three conditions are present, the diagnostic to run on your own customer voice is straightforward: the event-week pre/post chi-squared test from Section 2, the engagement-mechanic boundary test from Section 3, and the competitive-substitution-set test from Section 4. The cost of running it is rounding error compared to the multiple compression a 17-point bookings-growth cliff produces.
The previous five sections were observational: this is what 502,269 reviews said and how the dimensions behave. This section closes the loop. Every customer-voice signal Sections 2–5 documented materialized in Duolingo's quarterly SEC filings — with a ~12-month lag. The framework called the event in week 1. Management acknowledged it in May 2026. In between, the stock fell 81% and roughly $20 billion in market capitalization was destroyed.
Across five quarters spanning the AI-pivot and the Energy event, Duolingo’s KPI trajectory collapsed in lockstep with the customer-voice signals Sections 2 and 3 documented. Daily Active User growth fell from +49% YoY (Q1 2025, the pre-pivot baseline) to +21% YoY (Q1 2026, filed May 4, 2026) — a 28-percentage-point deceleration that more than halved the company’s growth rate. The most violent single-quarter cliff is bookings growth in Q4 2025: from +41% to +24% as the Energy mechanic rolled out mid-quarter. FY2026 bookings growth guidance is now 10–12% — the lowest in Duolingo’s public history.
| Quarter | Filed | DAU YoY | Paid subs YoY | Bookings YoY | Revenue YoY | Customer-voice context |
|---|---|---|---|---|---|---|
| Q1 2025 | May 1, 2025 | +49% | +40% | +38% | +38% | Pre-pivot baseline. Bookings beat guidance; FY2025 raised. |
| Q2 2025 | Aug 6, 2025 | +40% | +37% | +41% | +41% | AI-first memo lands Apr 28. Surface metrics still strong — first 60 days lag. |
| Q3 2025 | Nov 5, 2025 | +36% | +34% | (high) | +41% | DAU growth decelerates four points. Lapping Q3’24 +54%. |
| Q4 2025 | Feb 26, 2026 (10-K) | +30% | (net +700K) | +24% | (decelerating) | Energy launches Oct 20. Bookings growth craters 17 points in a single quarter. |
| Q1 2026 | May 4, 2026 | +21% | (slow) | ~+10% | ~+15% | Management acknowledges “extra friction in monetization.” FY2026 guide halved. |
The most important framing for any subscription operator or public-market analyst reading this: the customer-voice analysis is a leading indicator with a 9–12-month lead time over equity-market repricing. Sections 2 and 3 document signals that were observable in the data within seven days of the events that caused them. Duolingo’s stock peaked two weeks after the AI-first memo (May 14, 2025: $544.93), continued grinding higher into Q2 2025 earnings (beat-and-raise on Aug 6 masked underlying friction), and only began the sustained decline that destroyed ~$20B in market capitalization once the Q3 2025 deceleration (Nov 5, 2025) and the Q4 2025 cliff (Feb 26, 2026 10-K) made the underlying friction undeniable.
The Q1 2026 shareholder letter (filed May 4, 2026, available on Duolingo’s IR site) is the document that makes Section 3 of this report rhetorically devastating. Quoting directly:
The company believes extra friction in monetization is part of the reason DAU growth has slowed, and has decided to prioritize user growth over monetization, investing more than $50 million of foregone bookings from friction into the free user experience.
Energy Points behaves like a retention tax: users either pay to remove friction or churn while writing the verbatim “disguised paywall.” The mechanic that gates completion reads as coercion. Once monetized via friction, you cannot easily roll it back — reversal is framed as retreat.
Read the two side by side. The Cable’s customer-voice analysis arrives at the same diagnosis Duolingo’s own CFO and CEO arrived at — using only review text, with twelve months of lead time over the SEC filing that confirms it. The $50M+ in foregone bookings Duolingo is now investing back is the financial price of not having seen Section 3 in May 2025. That price line-items into every quarter of FY2026.
Had The Cable Episode 01 been published in mid-May 2025 — two weeks after the AI-first memo, with only Q1 2025 financials available — the framework would have made four concrete claims, every one of which has since been confirmed by SEC filings:
| Claim the framework would have made (May 2025) | How it was confirmed | Confirmation lag |
|---|---|---|
| AI-pivot will generate a permanent baseline shift in AI-mention rate, not a transient spike. | Q1 2026 AI-mention residual sits at 1.451% vs. 0.665% pre-pivot baseline (2.18× elevated, persisting eleven months later). | ~11 months |
| Engagement-mechanic friction is the second-order cost of monetizing via interruption. | Q1 2026 letter: “extra friction in monetization is part of the reason DAU growth has slowed.” $50M+ foregone bookings invested to roll back. | ~7 months |
| Competitive displacement set shifts toward generic AI tutors (“why not just use ChatGPT?”). | Bookings growth guide cut to 10–12% for FY2026; management cites “new monetization balance”; speaking/AI features specifically called out as response. | ~12 months |
| Reputational drag will materialize as growth-rate deceleration, not as immediate cancellations. | DAU growth 49% (Q1'25) → 21% (Q1'26). Bookings growth 41% (Q3'25) → 10–12% (FY'26 guide). ~$20B market cap erased. | ~12 months |
With the framework validated against the realized FY2025/FY2026 numbers, the same model can be run forward. Three quantified scenarios from Section 2 and 3 cohort sizes:
| Scenario | Cohort & assumption | Implied revenue at risk | Confidence |
|---|---|---|---|
| Direct switch-intent churn | ~5,000 post-Apr-28 reviews with explicit switching intent × (1 / 2% review-to-paid-subscriber ratio) × 2× paid-sub overrepresentation × 40% intent-to-action conversion × $75 ARPU = ~$15M/year | ~$15M | High |
| Reputation-drag CAC inflation | If the Q1 2026 AI-mention residual (2.18×) continues, blended CAC rises 8–15% as conversion friction compounds. On a ~$200M+ FY2025 sales & marketing base, that’s ~$16–30M/year of additional spend to maintain same growth. | $16–30M | Medium |
| Foregone bookings investment (already happening) | Duolingo’s own publicly-stated investment: $50M+ in foregone bookings to roll back friction. This is the line item management has already conceded. | $50M+ | Confirmed |
| Realized market-cap impact (not a forecast) | From $544.93 peak (May 2025) to $105.15 close (May 11, 2026) on ~46M diluted shares. | ~$20B | Confirmed |
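The arithmetic behind the scenario table can be reproduced in a few lines. The sketch below uses only the inputs stated in the table; the function names are hypothetical, and the assumptions themselves (2% review ratio, 2× overrepresentation, 40% conversion, $75 ARPU, ~$200M sales and marketing base, ~46M diluted shares) are the report's, not independently verified here.

```python
def direct_churn_revenue_at_risk(
    switch_intent_reviews: int,
    review_to_subscriber_ratio: float,
    paid_overrepresentation: float,
    intent_to_action: float,
    arpu_usd: float,
) -> float:
    """Scale switch-intent reviews up to an annual revenue-at-risk estimate."""
    implied_users = switch_intent_reviews / review_to_subscriber_ratio
    implied_churners = implied_users * paid_overrepresentation * intent_to_action
    return implied_churners * arpu_usd


def cac_inflation_range(sm_base_usd: float, low_pct: float, high_pct: float):
    """Return the (low, high) extra annual spend implied by a CAC increase."""
    return sm_base_usd * low_pct, sm_base_usd * high_pct


def market_cap_change(peak_price: float, later_price: float, diluted_shares: float) -> float:
    """Return the market capitalization lost between two share prices."""
    return (peak_price - later_price) * diluted_shares


churn = direct_churn_revenue_at_risk(5_000, 0.02, 2.0, 0.40, 75.0)
low, high = cac_inflation_range(200e6, 0.08, 0.15)
erased = market_cap_change(544.93, 105.15, 46e6)

print(f"direct churn: ${churn / 1e6:.0f}M per year")
print(f"CAC inflation: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M per year")
print(f"market cap erased: ${erased / 1e9:.1f}B")
```

Keeping each multiplier as a named parameter makes it easy to see how sensitive the direct-churn figure is to the conversion and overrepresentation assumptions.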
Three quarterly KPIs to watch in your own filings, mapped to the customer-voice signals that lead them:
All 25 dimensions extract only from learner-authored text (Message, plus title on iOS rows). Star rating, country, and other metadata are joinable but never feed into a dimension’s value. Every enum terminates with none_detected.
product_tier_mentioned · feature_or_surface_mentioned · learner_lifecycle_stage. Where in the learner journey and on which Duolingo surface the reviewer sits.
ai_quality_signal · course_quality_signal · cultural_context_signal · translator_layoff_reaction_signal. The primary-hypothesis spine. ai_quality_signal uses a 7-step priority order.
engagement_mechanic_signal · streak_event_signal · hearts_or_energy_signal · gamification_emotional_response. The backup-hypothesis spine and Section 03’s analytical workhorses.
pricing_or_paywall_signal · ad_load_signal · feature_request_or_paywall_dispute. Section 05’s subscription read-through layer.
customer_service_signal · account_or_data_loss_signal. Captures progress-lost, account-deletion-blocked, sync-across-devices-problem.
competitor_mentioned · competitive_displacement_signal · trust_signal. Section 04’s substitution-set spine.
overall_review_sentiment · recommendation_signal. The classifier that drives the Learner Health Segment assignment.
primary_pain_point_phrase · primary_delight_phrase · feature_request_detail · sentiment_verbatim · learner_situation_summary. The verbatim citation layer; each is concise (10–30 words) and preserves the reviewer’s voice in original language.
The portable mapping: when you run this analysis on your own customer voice, here is which dimensions to watch for which operational lever in your subscription P&L.
competitive_displacement_signal ∈ {switching_to_competitor, switching_to_ai_tutor, has_already_switched} × recommendation_signal ∈ {would_not_recommend, would_actively_dissuade}. The compound cohort is your direct churn-intent measure.
engagement_mechanic_signal + hearts_or_energy_signal + gamification_emotional_response. The Energy boundary test pattern is portable to any gamification mechanic change.
ai_quality_signal + course_quality_signal + cultural_context_signal. The Section 02 chi-squared test is the canonical structural diagnostic for an AI-feature announcement.
competitor_mentioned ∈ {chatgpt_or_llm_tutor, etc.} × period (pre vs post your own AI announcement). Watching the substitution-set composition is the leading indicator.
ad_load_signal ∈ {ads_too_many, ads_inappropriate_content, ads_unskippable_or_broken}. When ads start carrying disproportionate trust-erosion language, you have priced your free tier into a paywall.
trust_signal ∈ {trust_eroded_post_ai_pivot, trust_eroded_post_energy_change, trust_eroded_pricing_or_paywall, trust_eroded_quality_decline}. Trust erosion leads price language by 1–2 quarters; this is the earliest-warning monetization indicator.
For readers who want to verify the analytical rigor or replicate this work on their own customer voice. This appendix walks through the seven-step pipeline that produced every finding in this publication, from raw scraped reviews to chi-squared statistics to the $20B financial read-through.
Dimension Labs is a causal-intelligence platform. We turn unstructured customer text (reviews, complaints, support tickets, social posts, internal notes) into structured signal (labeled dimensions that can be counted, joined, and statistically tested), then overlay that signal against the corporate events that explain it. The Cable is our weekly publicly-traded-company analysis program; each episode applies the same pipeline to a different company. Duolingo is Episode 01.
The output of every analysis is two things: (a) a labeled dataset where each row of customer text carries a complete row of structured signals (25 dimensions in this case), and (b) a causal-intelligence report that tests pre-registered hypotheses against the data and quantifies the result.
For Episode 01, three public review surfaces were scraped: Google Play (Android), Apple App Store (iOS), and Reddit (r/duolingo, r/duolingomemes, r/languagelearning). For each review, the scrape captured: review text, star rating where applicable, timestamp, app version, country code, and (for Reddit) subreddit, upvotes, and the top 20 comments. 737,342 total reviews were scraped across the three sources. The Reddit corpus was set aside for a later enrichment pass; the analytical work in this publication uses the 502,269 in-window app-store reviews.
No customer identification, no personally identifying information beyond what reviewers chose to publish, no internal Duolingo data. Every byte underlying this report is publicly available.
Before any enrichment ran, we designed a 25-dimension extraction schema grouped into 8 clusters (Appendix A is the full reference). The schema is a structured JSON file that defines, for each dimension, an enum of allowed values (or a free-text rule), a priority order when multiple values could apply, a list of disambiguation rules, and cross-dimension consistency rules.
Two dimensions illustrate the design discipline. ai_quality_signal uses a seven-step priority order: when a review contains multiple AI-quality-related signals, the LLM is instructed to return the highest-priority value, with priority ordered from broad indictment (“AI made the courses worse”) down through specific complaints (awkward sentences, factual errors, layoff framing). learner_lifecycle_stage uses a ten-step priority order anchored on cancellation cues (“I’m done,” “deleting the app,” “last straw”) followed by content-encounter cues followed by mechanic-friction cues. Without these orderings, multi-signal reviews classify inconsistently.
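The effect of a priority order can be sketched as a deterministic tie-break. In the pipeline the LLM applies the ordering from the schema; the code below only illustrates the rule. The list is a partial, assumed ordering read off the paragraph above (the published schema has seven steps, and the exact sequence is not given here), and the function name is hypothetical.

```python
# Illustrative partial ordering only: the full seven-step sequence is not
# published, so this order is an assumption based on the paragraph above.
AI_QUALITY_PRIORITY = [
    "ai_made_courses_worse",                # broad indictment
    "ai_generated_sentences_awkward",       # specific complaint
    "ai_factually_incorrect",               # specific complaint
    "ai_replaced_humans_negative_framing",  # layoff framing
]
FALLBACK = "none_detected"


def resolve_by_priority(candidates, priority=AI_QUALITY_PRIORITY, fallback=FALLBACK):
    """Return the highest-priority candidate, or the fallback if none rank."""
    present = set(candidates)
    for value in priority:
        if value in present:
            return value
    return fallback


# A review that triggers two signals resolves to the broader one.
print(resolve_by_priority(["ai_factually_incorrect", "ai_made_courses_worse"]))
```

The point of the ordering is consistency: two reviews carrying the same mix of signals always receive the same single label.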
The single most important rule in the schema: dimensions extract signal only from the customer’s text (the review body, plus the review title on iOS where present). The LLM never sees the star rating, the source, the country code, the subreddit, the upvote count, or the date. This rule is what makes the dimensions analytically usable downstream.
The reason is methodological: if the LLM had access to the star rating when classifying sentiment, every chi-squared test linking sentiment to star rating would be circular — we’d be measuring how reliably the LLM copies a number it was already shown. By withholding the metadata at enrichment time, we guarantee that a 1-star review and a 5-star review with identical text get identical dimension labels. The star rating then becomes a clean outcome variable to test the dimensions against, not a contaminated input.
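The text-only rule can be enforced mechanically before any row reaches the model. The sketch below is an assumption about how such a filter might look; the field names (`title`, `body`, `stars`, `source`, `date`) and the metadata values are hypothetical, since the report does not publish its row format.

```python
# Hypothetical field names; only reviewer-authored text is allowed through.
TEXT_FIELDS = ("title", "body")


def build_enrichment_input(review: dict) -> dict:
    """Keep only non-empty reviewer-authored text; drop all metadata."""
    return {field: review[field] for field in TEXT_FIELDS if review.get(field)}


# Two synthetic rows with identical text and different metadata.
one_star = {"title": "", "body": "Bring back those hearts!",
            "stars": 1, "source": "ios", "date": "2025-10-22"}
five_star = {"title": "", "body": "Bring back those hearts!",
             "stars": 5, "source": "android", "date": "2025-11-01"}

# Identical text yields identical classifier input, whatever the metadata says.
print(build_enrichment_input(one_star) == build_enrichment_input(five_star))
```

Using an allow-list of text fields, as opposed to a deny-list of metadata fields, means a newly added metadata column cannot leak into the classifier input by accident.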
The platform runs each review through a large language model independently — one row at a time, no cross-row context. The LLM receives the schema, the row’s text, and a structured output instruction; it returns a JSON object with one value per dimension. The 25 dimensions on 502,269 reviews produced approximately 12.5 million structured signals from the underlying text corpus.
Free-text dimensions (verbatim quotes, situation summaries) follow a stricter rule: copy the customer’s words verbatim, do not paraphrase, do not translate non-English content. Every quote in this publication’s body is reproduced exactly as the reviewer wrote it; where the source language is Spanish, Portuguese, Japanese, or Korean, the quote appears in the original language with light translation as context only.
Before running the schema against the full corpus, we ran it against a stratified 30-row sample spanning each major event window (pre-pivot baseline, AI-pivot week, AI decay, Energy event week, Energy decay, Q1 2026 baseline) and each source. The QA round caught one enum violation, one cross-dimension inconsistency, and several boundary cases that pointed to ambiguities in the schema. Eight description-level fixes were applied to the schema before the production run.
After the production enrichment completed, we ran a second-pass validation: distribution checks on every dimension (no value should over-fire above 95% of the corpus — saturation means the dimension is uninformative), enum compliance (every value must come from the defined enum, no LLM-invented values), and free-text hallucination checks (the verbatim text must overlap with the source review’s words).
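The three second-pass checks can each be expressed as a small function. This is a minimal sketch under stated assumptions: the 95% saturation threshold comes from the text, while the function names and the word-overlap measure used for the hallucination check are illustrative choices, not the platform's actual implementation.

```python
import re
from collections import Counter


def enum_violations(values, allowed):
    """Return the set of values that fall outside the allowed enum."""
    return set(values) - set(allowed)


def is_saturated(values, threshold=0.95):
    """True when a single value covers more than `threshold` of the rows."""
    if not values:
        return False
    top_count = Counter(values).most_common(1)[0][1]
    return top_count / len(values) > threshold


def _tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def verbatim_overlap(quote, source_text):
    """Fraction of the quote's tokens that also appear in the source text."""
    quote_tokens = _tokens(quote)
    if not quote_tokens:
        return 1.0
    source_tokens = set(_tokens(source_text))
    matches = sum(token in source_tokens for token in quote_tokens)
    return matches / len(quote_tokens)


review = "The new energy system is just awful. Bring back those hearts!"
print(verbatim_overlap("energy system is just awful", review))
print(enum_violations(["none_detected", "made_up_value"], {"none_detected"}))
```

A low overlap score flags a verbatim field for manual review; the cutoff to use is a judgment call that the text does not specify.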
All quantitative claims in this publication are computed in SQL against the enriched dataset. The chi-squared statistics use Yates’ continuity correction on 2×2 contingency tables:
| Step | Computation |
|---|---|
| 1. Define the binary signal | For each test, define a binary based on a dimension value: e.g., ai_quality_signal NOT IN ('no_ai_mention', 'none_detected') = 1. |
| 2. Define the period split | Pre-period: reviews before the event boundary (Apr 28, 2025 for the AI test). Post-period: reviews on or after. |
| 3. Build the 2×2 table | a = signal present in post-period; b = signal absent in post-period; c = signal present in pre-period; d = signal absent in pre-period. |
| 4. Compute χ² (Yates) | χ² = N × (|ad - bc| - N/2)² / ((a+b)(c+d)(a+c)(b+d)), where N = a+b+c+d. |
| 5. Read significance | χ² ≥ 10.83 = p < 0.001; ≥ 6.64 = p < 0.01; ≥ 3.84 = p < 0.05; otherwise n.s. |
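Steps 3 to 5 of the table translate directly into code. The sketch below follows the formula and thresholds as written; the function names are hypothetical, and the clamp at zero is a common convention for the Yates correction that the table does not state.

```python
def yates_chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Yates-corrected chi-squared for the 2x2 table [[a, b], [c, d]].

    a, b = signal present / absent in the post-period;
    c, d = signal present / absent in the pre-period.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    # Clamp at zero so the correction never overshoots (an added convention,
    # not part of the formula in the table).
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / denominator


def significance_label(chi2: float) -> str:
    """Map a one-degree-of-freedom statistic to the thresholds in step 5."""
    if chi2 >= 10.83:
        return "p < 0.001"
    if chi2 >= 6.64:
        return "p < 0.01"
    if chi2 >= 3.84:
        return "p < 0.05"
    return "n.s."


# Synthetic counts for illustration only.
stat = yates_chi_squared(10, 20, 30, 40)
print(round(stat, 4), significance_label(stat))
```

The same two functions cover every pre/post test in the report; only the binary signal definition and the event boundary change between tests.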
Lift ratios use the same enriched dataset: P(adverse outcome | signal present) / P(adverse outcome | signal absent), with adverse defined as star rating ≤ 2 on app-store reviews. Lift ratios are paired with chi-squared in every claim — lift without significance is rhetorical; significance without effect size is technical but unimportant.
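The lift ratio defined above is a ratio of two conditional rates. The sketch below is a minimal illustration with synthetic counts; the function name and argument names are hypothetical.

```python
def lift_ratio(adverse_with: int, total_with: int,
               adverse_without: int, total_without: int) -> float:
    """P(adverse | signal present) / P(adverse | signal absent)."""
    if total_with == 0 or total_without == 0:
        raise ValueError("both groups need at least one review")
    rate_without = adverse_without / total_without
    if rate_without == 0:
        raise ValueError("lift is undefined when the comparison rate is zero")
    return (adverse_with / total_with) / rate_without


# Synthetic counts: 80 of 100 signal-bearing reviews are adverse,
# versus 150 of 1,000 reviews without the signal.
print(f"{lift_ratio(80, 100, 150, 1000):.2f}x")
```

A lift near 1 means the signal carries no information about the star rating; pairing it with the chi-squared statistic guards against reading a large lift off a tiny cohort.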
The last step is what gives this publication its analytical edge. The enriched dataset is overlaid against the corporate timeline: the April 28, 2025 AI-first memo, the October 20, 2025 Energy Points rollout, the four quarterly SEC filings spanning the window, the Q1 2026 management acknowledgment of monetization friction. Where the customer-voice signal and a corporate event line up in time, the analysis tests whether the customer voice led, lagged, or coincided with the financial signal.
For Duolingo, the customer-voice signal led the financial signal by 9–12 months. That gap, the lead time of qualitative customer voice over quantitative quarterly financials, is the central methodological finding of this publication, and the reason we believe the framework is portable to other publicly-traded consumer subscription companies. Section 6 shows the lead-time visualization quarter by quarter.
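The lead time can be checked with plain date arithmetic. The sketch below measures the gap from the AI-first memo to the two filings the report treats as confirmation; the dates come from the text, while the names and the average-month conversion are illustrative assumptions.

```python
from datetime import date

AI_FIRST_MEMO = date(2025, 4, 28)
CONFIRMING_FILINGS = {
    "FY2025 10-K": date(2026, 2, 26),
    "Q1 2026 shareholder letter": date(2026, 5, 4),
}
DAYS_PER_MONTH = 365.25 / 12  # average month length, an approximation


def lead_time(event: date, confirmation: date):
    """Return the gap between two dates as (days, approximate months)."""
    days = (confirmation - event).days
    return days, days / DAYS_PER_MONTH


for label, filed in CONFIRMING_FILINGS.items():
    days, months = lead_time(AI_FIRST_MEMO, filed)
    print(f"{label}: {days} days (~{months:.1f} months)")
```

Swapping in your own event date and filing dates gives the equivalent lead-time read for another company.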
Three things make this analysis reproducible:
The portable workflow:
For most subscription businesses with public review surfaces and a recent AI announcement, this is a 4–6-week engagement. The cost is rounding error compared to a single quarter of bookings-growth deceleration.