Dimension Labs · Research · June 2026
The Dimensionality of Meaning
A new foundation for enterprise research in the age of AI
Our early validation work on text dimensionality provides key support for an emerging class of AI-native research methods.
Executive summary
Text dimensionality provides a now-validated theoretical framework, and it begins by inverting an assumption most analytics quietly accepts: that subjective interpretation is a barrier to meaningful insight. When several readers take different meanings from the same message, that disagreement is not a sign that the text is too soft to measure but rather evidence that it carries several distinct and separately measurable meanings at once, each of which can be extracted and quantified at scale. The implication is that subjectivity is a signal that reveals the depth of information contained in language data.
We bring the concept into focus on real business data to show the value of dimensionality in practice. Reading 1,916 of Stake.com’s support conversations through six different roles, each with a stake in what customers say, we found that 92% of interactions carried information relevant to more than one business function at once. The clean single-topic ticket every support operation is built around accounted for fewer than five conversations in a hundred. The meanings proved to be not noise but part of a stable structure, in which support engagements, the work of actually solving a problem, occurred alongside signals relevant to stakeholders across the organization. The insights that dimensionality produced had direct implications for revenue, among them a driver of churn that started with a failed deposit, moved through a slow support exchange, and only then became a decision to leave.
The implication is that immense operational value sits inside the corpus of data the enterprise already owns but has long treated as too complex to analyze, and those signals can now be mapped to what actually moves its revenue. Set against the practical impossibility of having humans annotate this data at scale, the near-zero marginal cost of large-language-model annotation, paired with a direct line to revenue, is what makes text dimensionality the foundation for an emerging class of AI-native research. The durable advantage will accrue to whichever companies begin asking more of the unstructured data they already hold.
One customer message is really six messages at once
Consider an ordinary support conversation, the kind a consumer business generates by the thousand each day. A player writes in because a 24,000 deposit never arrived, and over the next few messages the exchange picks up more: the bonus tied to that deposit is missing, a verification step will not clear, the agent resolves nothing, and the player shifts from asking for help to threatening to leave. A single short transcript now holds five distinct meanings at once: a payments failure, a service breakdown, a marketing shortfall, a compliance block, and a churn risk, each of which a different leader in the business would pay to know. The words never change; what changes is how much a reader is equipped to take from them.
Conventional text analytics keeps one of those meanings and discards the rest. Sentiment scoring collapses the conversation to a single polarity, topic modeling assigns it to one cluster, and a support taxonomy routes it to one queue, and while each tool does its job, the job is to flatten. We call this the flattening problem, and its cost is hidden, because it never shows up as an error, only as absence: the four-fifths of every conversation no one reads, and the questions a business never thinks to ask of data it already pays to keep.
For thirty years that waste was rational, because reading even one dimension of meaning across a large corpus required a codebook, trained coders, and weeks of work, so teams answered the single question with the clearest business case and left the rest of the text dark. What has changed is not the value of the text, which was always the richest record a company held, but the cost of reading it, and the right response to that collapse in cost is to stop asking each conversation what it is about and start asking it everything at once.
Text dimensionality is the framework that makes this possible. It holds that a single passage carries information along several independent axes at once, each one a real and separable signal that can be extracted and quantified on its own, so that the deposit conversation is at the same time a payments record, a service record, a compliance record, a marketing record, and a churn record, and the meaning a reader recovers depends entirely on the question that reader brings. The disagreement among readers about what such a text is “really” about, long treated as a reason to dismiss unstructured data as too soft to measure, is in fact the opposite: a signal that reveals how much information the text contains.
The test: is meaning really multi-dimensional, or does it just look that way?
A fair objection stands in the way. A model asked six questions of a conversation will return six answers, and those answers are worthless unless they are genuinely distinct rather than one signal dressed six ways, so the burden of this study was to show that the multiplicity belongs to the text and not to the asking.
We took 1,916 support conversations from a single day of live traffic at Stake.com, the cryptocurrency betting platform Easygo operates, and read each one through six lenses, one for every role with a stake in what customers say: payments, support, marketing, compliance, product, and retention. Each lens is a single instruction to the model, telling it to read for one dimension, record whether that dimension is present, and, where it is, name the issue and quote the customer’s own words. The prompt is the instrument. We call the method observer-dependent measurement, because pointing a defined observer at a fixed text resolves one dimension of its meaning, and a different observer recovers another.
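One way to picture a lens as "a single instruction to the model" is as a small record plus a prompt template. The sketch below is illustrative only: the names `LENSES`, `LensReading`, and `build_prompt`, the wording of the instruction, and the JSON keys are assumptions rather than the prompts used in the study, and no model call is shown.

```python
from dataclasses import dataclass
from typing import Optional

# The six roles named in the study; the descriptions are assumed placeholders.
LENSES = {
    "payments": "deposits, withdrawals, and other money movement",
    "support": "the service experience and whether the problem was resolved",
    "marketing": "bonuses, promotions, and offers",
    "compliance": "verification, KYC, and account restrictions",
    "product": "betting or app behavior",
    "retention": "signals that the customer may leave",
}


@dataclass
class LensReading:
    """What one lens records for one conversation (hypothetical schema)."""
    lens: str
    present: bool
    issue: Optional[str] = None  # named issue, filled only when present
    quote: Optional[str] = None  # the customer's own words, only when present


def build_prompt(lens: str, transcript: str) -> str:
    """Compose the single instruction for one lens over one transcript."""
    focus = LENSES[lens]  # raises KeyError for an unknown lens
    return (
        f"Read the conversation below only for the {lens} dimension: {focus}.\n"
        "Record whether that dimension is present. If it is, name the issue "
        "and quote the customer's own words.\n"
        'Answer as JSON with keys "present", "issue", "quote".\n\n'
        f"Conversation:\n{transcript}"
    )


# One fixed text, six observers: the transcript is constant, only the lens varies.
prompts = {lens: build_prompt(lens, "My deposit never arrived.") for lens in LENSES}
```

Keeping the transcript fixed and varying only the lens is what makes each reading comparable across records.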
The timing is what makes this practical rather than theoretical, because the one hard technical question has already been settled from the outside. Across more than a thousand annotation tasks, Asirvatham, Mokski, and Shleifer (2026) showed that a model used this way matches trained human raters and holds steady however the request is worded. Reliability, in other words, is solved, and the value has moved up a level, to the question of what a company chooses to measure. That is where text dimensionality makes its claim: the most underused lever a business holds is not a better classifier but the recognition that one conversation answers many questions at once. Reliability was the precondition; multiplicity is the prize.
To keep the test honest, the data carries its own control, a field we call divergence, set only when two or more lenses surface genuinely different aspects of the same conversation rather than relabeling one shared point. It turns the objection into a number anyone can check. Were divergence rare, the lenses would be redundant and the framework would fail on its own evidence; a high rate is direct evidence that the meanings are distinct. Everything that follows rests on that flag and the structure beneath it.
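The logic of the divergence flag can be sketched in a few lines. This is a simplified illustration, not the study's implementation: it assumes each active lens has already been reduced to a normalized aspect label, and it treats label equality as a stand-in for the judgment that two lenses are relabeling one shared point. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def is_divergent(aspects: Dict[str, Optional[str]]) -> bool:
    """Return True when at least two active lenses surface different aspects.

    `aspects` maps lens name -> normalized aspect label, or None when the
    lens found nothing. Equal labels count as one shared point (assumption).
    """
    active = [label for label in aspects.values() if label is not None]
    # Two distinct labels imply at least two active lenses.
    return len(set(active)) >= 2


def divergence_rate(conversations: Iterable[Dict[str, Optional[str]]]) -> float:
    """Share of conversations flagged divergent (0.0 for an empty input)."""
    flags = [is_divergent(c) for c in conversations]
    return sum(flags) / len(flags) if flags else 0.0


sample = [
    {"payments": "deposit failed", "support": "no resolution", "retention": None},
    {"payments": "deposit failed", "support": "deposit failed"},  # same point relabeled
    {"marketing": "missing bonus"},                               # only one lens active
    {"payments": None, "support": None},                          # nothing active
]
rate = divergence_rate(sample)  # only the first conversation qualifies
```

Under this definition a conversation where several lenses fire on the same point does not count, which is what keeps the flag stricter than a raw lens count.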
Exhibit: Scope and readiness
| Metric | Value |
|---|---|
| Conversations analyzed | 1,916 |
| Window | one day, 22 to 23 March 2026 |
| Full conversations | 1,909 of 1,916 |
| Lenses applied per conversation | 6 |
| Rows where the lens count disagreed with the flags | 0 |
The table is clean: the recorded lens count matched the underlying flags on every row, and no conversation with only one active lens was marked divergent. The findings rest on the data as recorded.
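The two readiness checks described here are simple to express. The sketch below assumes a row layout with one boolean per lens, a recorded `lens_count`, and a boolean `divergent` field; those field names and the function name are hypothetical.

```python
LENS_NAMES = ("payments", "support", "marketing", "compliance", "product", "retention")


def find_inconsistent_rows(rows):
    """Return the indices of rows that fail either readiness check.

    Check 1: the recorded lens count equals the number of active lens flags.
    Check 2: a row is never marked divergent with fewer than two active lenses.
    """
    bad = []
    for i, row in enumerate(rows):
        active = sum(bool(row[lens]) for lens in LENS_NAMES)
        count_mismatch = row["lens_count"] != active
        divergent_on_one = row["divergent"] and active < 2
        if count_mismatch or divergent_on_one:
            bad.append(i)
    return bad


def make_row(active, lens_count, divergent):
    """Build a toy row from a set of active lens names."""
    row = {lens: lens in active for lens in LENS_NAMES}
    row.update(lens_count=lens_count, divergent=divergent)
    return row


rows = [
    make_row({"payments", "support"}, 2, True),   # consistent
    make_row({"marketing"}, 1, False),            # consistent single-lens row
    make_row({"payments", "support"}, 3, True),   # count disagrees with flags
    make_row({"support"}, 1, True),               # divergent on a single lens
]
problems = find_inconsistent_rows(rows)
```

A clean table in the sense used above corresponds to this function returning an empty list.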
Most conversations carry several meanings at once
The first result resets everything after it: in this data, multiple meanings are the rule, not the exception. Of the 1,916 conversations, 1,767, or 92%, carried two or more genuinely distinct meanings at once; the average conversation activated 2.74 of the six lenses; and the clean single-issue ticket that every routing system presumes proved to be the rare case, just 4.75% of the data. A business that reads its conversations one meaning at a time is not missing the occasional nuance; it is discarding most of what its customers are telling it.
Exhibit: How many meanings does one conversation carry?
The divergence flag is deliberately strict. It ignores the few conversations where two lenses point at the same thing, which is why 92% are flagged as multi-meaning while the raw count of conversations with two or more lenses is a touch higher.
The shape of the distribution matters as much as the headline. The mass sits in the middle, at two and three meanings each, not in a thin tail of unusually tangled threads, which means the multiplicity comes from ordinary contact, not outliers: the typical conversation is the multi-dimensional one. A failed deposit does not stay a payments issue. It becomes a service issue while the customer waits, a marketing issue when the bonus goes missing, a compliance issue when verification blocks the fix, and a retention issue once patience runs out. The conversation is layered because the experience was, and any reading that keeps one layer discards the rest by design.
The meanings aren’t noise, because they fall into a pattern
A high count of meanings proves little if the meanings are arbitrary, so the harder test is whether the six lenses measure distinct things or merely scatter correlated tags. The structure of the data settles it, and the structure is not random. One dimension organizes the rest: the support lens was active in 90% of all conversations, far ahead of payments at 61% and everything else near a third or below. What matters is not how often support appears, but what it appears with.
Exhibit: When any other problem appears, support is almost always there too
That pattern is the proof that the lenses measure something real. Random tagging does not produce a 94% overlap, a single dimension that binds all the others, and a stable ranking of recurring combinations. Noise fires independently and evenly; these organize around the support experience, exactly as the conversations read, because a customer rarely reports only that a deposit failed but describes, at length, the experience of waiting, re-explaining, and chasing a fix. The same order runs through the most common combinations, where money plus support leads by a wide margin and the next bundles simply stack a missing bonus or a verification block on the same core.
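The overlap statistic behind this argument is a conditional share: of the conversations where a given lens is active, how many also carry support. A minimal sketch on toy data follows; the function name and the input layout (one set of active lens names per conversation) are assumptions.

```python
def overlap_with_hub(conversations, hub="support"):
    """For each non-hub lens, the share of its conversations that also carry the hub.

    `conversations` is an iterable of sets of active lens names.
    """
    seen = {}      # lens -> conversations where the lens is active
    with_hub = {}  # lens -> of those, conversations where the hub is also active
    for active in conversations:
        for lens in active:
            if lens == hub:
                continue
            seen[lens] = seen.get(lens, 0) + 1
            if hub in active:
                with_hub[lens] = with_hub.get(lens, 0) + 1
    return {lens: with_hub.get(lens, 0) / n for lens, n in seen.items()}


toy = [
    {"payments", "support"},
    {"payments", "support", "marketing"},
    {"payments"},
    {"marketing", "support"},
    {"support"},
]
shares = overlap_with_hub(toy)
# payments: 2 of 3 conversations also carry support; marketing: 2 of 2.
```

If lenses fired independently, these shares would sit near the hub's overall base rate; shares well above it are what a hub structure looks like in the numbers.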
Exhibit: The default shape of a conversation: money, support, and one thing more
| Most common combinations | Conversations | Share of multi-meaning |
|---|---|---|
| Payments + Support | 385 | 21.79% |
| Marketing + Payments + Support | 144 | 8.15% |
| Compliance + Payments + Support | 142 | 8.04% |
| Marketing + Support | 100 | 5.66% |
| Payments + Support + Retention | 94 | 5.32% |
Every leading combination tells one story, of a money problem, the effort to resolve it, and one further concern on top, which means the conversation has a default architecture with money and support at its foundation, a structure worth far more to an operator than any sentiment score, because it locates the recurring spine of customer trouble rather than its passing mood.
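A ranking like the one in the exhibit can be produced by counting exact lens combinations. The sketch below is illustrative: it uses "two or more active lenses" as the denominator, a simplifying assumption, whereas the study's denominator is divergence-flagged conversations. The last lines recompute the exhibit's shares from its own counts against the 1,767 multi-meaning conversations.

```python
from collections import Counter


def top_combinations(conversations, k=5):
    """Rank exact lens combinations among multi-lens conversations.

    Returns a list of (label, count, share of multi-lens conversations).
    """
    multi = [frozenset(c) for c in conversations if len(c) >= 2]
    counts = Counter(multi)
    total = len(multi)
    # Sort by count descending, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    return [(" + ".join(sorted(combo)), n, n / total) for combo, n in ranked[:k]]


toy = [
    {"payments", "support"},
    {"support", "payments"},   # same combination, different order
    {"marketing", "support"},
    {"payments"},              # single-lens, excluded from the denominator
]
ranking = top_combinations(toy)

# Recompute the exhibit's shares from its counts (denominator: 1,767).
exhibit_counts = [385, 144, 142, 100, 94]
exhibit_shares = [round(100 * n / 1767, 2) for n in exhibit_counts]
```

The recomputed shares match the exhibit's third column, which confirms that its percentages are taken over the multi-meaning conversations rather than the full 1,916.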
What actually drives churn
Retention, the dimension closest to revenue, is the clearest case. Measured the usual way, churn is a bucket, a list of unhappy customers handed to a save team after the decision to leave has already formed. Measured across all six lenses at once, it is the last stage of a problem that began elsewhere, because retention risk almost never arrived alone. Of the conversations carrying departure language, 90% also carried a support problem and nearly half, 49%, also carried a payments problem. The most common high-value pattern in the corpus was support and retention together: 530 conversations in which the threat to leave was wrapped around a service experience, not a complaint about price.
Exhibit: Where the threat to leave actually comes from
| Pattern | Conversations | Share of multi-meaning |
|---|---|---|
| Support + Retention | 530 | 29.99% |
| Compliance + Payments + Support | 290 | 16.41% |
| Payments + Retention | 289 | 16.36% |
| Marketing + Retention | 289 | 16.36% |
| Compliance + Retention | 253 | 14.32% |
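The 90% and 49% figures quoted for retention can be cross-checked against the counts in this exhibit. The sketch assumes the rows count overlapping patterns, so one conversation can appear in more than one row, and it back-solves an approximate retention total from the rounded 90% figure. That total is an inference for illustration, not a number reported in the study.

```python
# Counts taken from the exhibit.
support_and_retention = 530
payments_and_retention = 289

# Rounded share of departure-language conversations that also carry support.
support_share = 0.90

# Back-solve the approximate number of conversations carrying departure language.
# Because support_share is rounded, this is an estimate, not an exact count.
retention_total = round(support_and_retention / support_share)

# Share of those conversations that also carry a payments problem.
payments_share = payments_and_retention / retention_total
payments_pct = round(100 * payments_share)
```

With these inputs the implied retention total is roughly 589 conversations, and the payments share rounds to 49%, consistent with the figure in the text.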
The same conversations point to the source. Inside the payments lens, deposit failures are the single largest issue, at 53%; inside retention, explicit departure language is half of everything we see. The two appear in the same conversations, beside a support exchange that did not put the problem right. The departure language is not where the trouble starts but what a customer writes once a failed deposit has gone unresolved. Because the failure that sets these conversations off happens before the customer ever opens the chat, the lever is deposit reliability, a fix the business controls, rather than anything said in the conversation itself.
Exhibit: What each lens mostly sees
| Lens | Leading issue | Share of that lens |
|---|---|---|
| Payments | Deposit failure | 52.57% |
| Support | Whether the problem was actually solved | 52.70% |
| Retention | Explicit churn (“I’m leaving”) | 50.67% |
| Marketing | Promised bonus never arrived | 87.65% |
| Compliance | Verification or KYC friction | 40.29% |
| Product | Betting or app friction | 32.84% |
The strategic implication is direct: the language of churn is a symptom, and the deposit failure that triggered it is the cause worth fixing, so a team watching only for departure language is treating the symptom and leaving the cause untouched in the payments data. This is what a driver of revenue actually looks like, a traceable line from a fixable cause to a financial outcome with the volume of conversations attached at every step, and it is the line a flattened, one-meaning reading can never draw. The aggregate holds up in the transcripts behind it: a player whose 24,000 deposit never cleared and whose bonus never came, signing off by asking whether the company ever tires of robbing its customers; a long-tenured player awaiting an 800,000-peso withdrawal, recalling years of trust and heavy spend in the same breath as the delay. Each is one experience carrying several business meanings at once, now captured as structured data rather than lost in prose.
The framework makes four predictions, and the study was built so that real data could have proven any of them wrong. Across a hard day of real conversations, all four predictions held.
Exhibit: The theory made four predictions; the data met all four
| Text dimensionality predicts | The Stake data showed |
|---|---|
| One text carries several meanings at once, as the norm | 92% multi-meaning; 2.74 lenses per conversation |
| The meanings are distinct, not the same point relabeled | The divergence flag, which requires different aspects, fired on 92% |
| The meanings form real structure, not random co-activation | Support is the hub at 90%-plus overlap; a stable order of combinations |
| The structure carries cause-and-effect a one-topic view misses | Churn sits downstream of payments and support: failed deposit, then support reply, then exit |
Surviving that test makes text dimensionality more than a concept, and because the same test showed exactly where revenue leaks, it marks the turn from proof to the more valuable question of what to do with it.
From theory to roadmap: cheap questions, joined to data you already have
What turns a single study into a method is a fact about cost, and it is obvious the moment you consider the seventh question rather than the first. Asking a new question of the text is no longer a project but an instruction, written once and run across conversations already in storage, with no new data to collect, no model to train, and no coders to hire, so the first question carries the cost of standing the system up while the seventh and the seventieth cost almost nothing. Set that against the alternative, the practical impossibility of having people read and annotate millions of conversations by hand, and the change is not incremental: questions worth asking only on the chance the answer is interesting become worth asking at all, and a corpus that used to sit on the books as a storage cost and a compliance liability becomes an asset that appreciates with every question put to it.
The larger return comes when each measured meaning, now a column with a timestamp and an account behind it, sits beside the structured data the business already trusts, the deposit ledger, the withdrawal logs, the lifetime-value table. The question stops being the soft and unfalsifiable “what are customers saying” and becomes one with a number attached: do the accounts whose deposits failed on Tuesday wager less by Friday, and by how much. The warehouse alone could never answer it, because it saw the failed deposit and the falling wager but not the conversation in between that turned a glitch into a decision to leave; the transcript alone could never answer it either, because it held that account in prose that could be neither counted nor joined. Measuring the conversation along defined dimensions is the step that lets both halves of a company’s data answer one question together, so that for Stake the central finding becomes a quantity rather than an anecdote: deposit reliability is a measurable lever on retention, and the size of that lever can be read straight off the ledger.
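The join described here can be sketched with plain dictionaries. Everything in the example is hypothetical: the field names (`account_id`, `deposit_failure`, `wager_after`), the function name, and the toy figures are assumptions, and a real analysis would also need time windows and controls that are omitted.

```python
from statistics import mean


def wager_gap(lens_rows, ledger_rows):
    """Difference in mean later wagering: accounts with a flagged deposit
    failure minus accounts without one.

    lens_rows:   dicts with `account_id` and boolean `deposit_failure`
                 (one measured dimension, keyed to an account).
    ledger_rows: dicts with `account_id` and numeric `wager_after`
                 (structured data the business already holds).
    Returns None when either group is empty.
    """
    failed = {r["account_id"] for r in lens_rows if r["deposit_failure"]}
    with_failure = [r["wager_after"] for r in ledger_rows if r["account_id"] in failed]
    without = [r["wager_after"] for r in ledger_rows if r["account_id"] not in failed]
    if not with_failure or not without:
        return None  # nothing to compare
    return mean(with_failure) - mean(without)


lens_rows = [
    {"account_id": "a", "deposit_failure": True},
    {"account_id": "b", "deposit_failure": False},
    {"account_id": "c", "deposit_failure": True},
]
ledger_rows = [
    {"account_id": "a", "wager_after": 10},
    {"account_id": "b", "wager_after": 50},
    {"account_id": "c", "wager_after": 30},
    {"account_id": "d", "wager_after": 70},  # no conversation on record
]
gap = wager_gap(lens_rows, ledger_rows)  # negative: failed-deposit accounts wager less
```

A difference in means like this is descriptive rather than causal; it shows where the two halves of the data meet, and any estimate of the lever's size would need a design that accounts for who experiences deposit failures in the first place.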
What this changes for the business
The first thing to give way is the assumption built into the org chart, that each conversation belongs to one team, because 92% of them do not, and the most useful reading of the Stake data is that payments and support are one problem seen from two desks. Payments is the highest-leverage place to start: deposit and withdrawal failures dominate that lens and pull support into the conversation 94% of the time, so reliability work there relieves support load and churn pressure at the same stroke. Support is the amplifier, because its presence in nearly every conversation means that faster, clearer resolution lowers the felt severity of problems that start elsewhere. Compliance is a customer-experience function whether or not it is staffed as one, since verification friction is 40% of its volume and routinely blocks both play and payouts. And retention should move its instruments upstream, because by the time departure language appears, the deposit or service failure that caused it has already happened, and the save team is being asked to recover a customer the business could have kept.
A different way to get value from text
For most of the history of analytics, getting more from your data meant collecting more of it. Text dimensionality reverses that logic. We added not a single conversation to the Stake corpus between the one-meaning reading and the six-meaning one; we asked more of what was already there, and a day of support chats a sentiment model would have crushed into one negative tick gave up six connected views of the business instead.
The shift is large enough to stand as a discipline of its own. The unit of analysis stops being the document and its one main topic and becomes the dimension, a specific question put to the text the same way on every record, so answers can be counted, related, and joined to everything else the business measures. The scarce skill stops being the cleaning of text or the tuning of a model and becomes the judgment of which questions are worth asking. And the corpus itself begins to appreciate, because every new question makes the same words worth more.
The proposition was that meaning is layered. The data agreed, on more than nine in ten conversations. The payoff is that those layers, once measured and joined to the figures a business already keeps, are the clearest map it will find of what drives its revenue, so that read at scale, across every corpus a company holds, text dimensionality becomes the central mechanism of enterprise-grade analytics and the foundation of the big-data analysis still to come. The durable advantage will go to whichever companies start asking the most of the unstructured data they already own.
Source: the pf2_observer_divergence analysis of 1,916 Stake.com support conversations, 22 to 23 March 2026. Measurement reliability per Asirvatham, H., Mokski, E., & Shleifer, A. (2026), validating large language models as measurement instruments statistically indistinguishable from trained human evaluators.