Big Five vs MBTI: What 16Personalities Won't Tell You

Roughly 100 million people have taken a test on 16Personalities. Most of them walked away with a four-letter type, a color, a flattering write-up, and the quiet feeling that something finally explained them.

That feeling is real. The science behind it is not.

This is not a takedown of anyone who loves their INFJ identity. The desire to be seen, named, and understood is one of the most human things there is, and MBTI-flavored sites are very good at scratching that itch. The problem is the part they leave out: the framework underneath has been failing scientific tests for forty years, and the people who built it never set out to build a science in the first place.

Here is what the research actually shows, why the descriptions still feel so accurate, and what a personality measurement looks like when it is built to hold up instead of to flatter.

What MBTI got right

Before the critique, the honest part.

MBTI got one thing more right than any academic personality model: it made self-understanding feel like something you would actually want to do. The four-letter types are sticky. They give you a vocabulary to talk to a friend about why you keep canceling plans, or why your partner needs ninety minutes alone after a dinner party. That language matters. Most clinical personality assessments have nothing remotely as readable.

It also took introversion seriously at a moment when American culture treated it as a defect. A lot of quiet kids in the 1980s and 90s first learned the word "introvert" from a Myers-Briggs handout and stopped trying to perform extraversion for a while. That was a real gift.

Hold onto that — because everything below is about why the framework still does not work as a measurement, not about whether people who took it found something useful in the experience.

Where MBTI breaks down

Three problems keep showing up in the peer-reviewed literature, and none of them are minor.

1. The test does not give you the same answer twice

When people retake the MBTI a few weeks apart, somewhere between 39% and 76% get a different four-letter type ¹. Read that again. A test that is supposed to capture a stable identity returns a different identity to roughly half its takers within a few weeks.

This is not a small noise issue. The MBTI sells itself on type — you are an INTJ, you are an ENFP — but the test cannot reliably tell you which one you are. If your bathroom scale gave you a different weight every other Tuesday, you would not call it a scale.
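The "roughly half" figure follows from simple multiplication. Here is an illustrative sketch. The per-letter agreement rates are assumptions, not numbers from the cited studies, and the four letters are treated as independent for simplicity. A four-letter type only survives a retest if all four letters do, so the agreement rate for each letter gets multiplied four times.

```python
# Illustrative sketch: the per-letter agreement rates below are assumptions,
# not figures from the MBTI literature, and letters are treated as independent.

def same_type_probability(per_letter_agreement: float, letters: int = 4) -> float:
    """Chance that every letter of the type comes back unchanged on retest."""
    return per_letter_agreement ** letters

for p in (0.95, 0.90, 0.85, 0.80):
    kept = same_type_probability(p)
    print(f"each letter stable {p:.0%} of the time -> same type {kept:.0%}, new type {1 - kept:.0%}")
```

Even a test that kept each individual letter 85% of the time would hand nearly half of its retakers a different type, which sits inside the range reported in the research.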

2. The either/or types invent a cliff that does not exist

Real human traits sit on a smooth distribution. Most people are not deeply introverted or deeply extraverted. They cluster in the middle, with a slight lean one way or the other.

MBTI takes that middle and shoves it to a side. A person who scores 49% on the extraversion items gets stamped with one letter. A person who scores 51% gets the other letter, and with it a different four-letter type, even though those two people are statistically almost identical ². The framework is built on cliffs that the underlying data does not contain.

This is the part that makes type-based descriptions feel so dramatic. INTJ and INTP sound like different species. The score difference between them is often a few percentage points.
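A short sketch shows how the cliff and the retest problem feed each other. It assumes a hypothetical 0-100 extraversion score with the cut at 50, and normally distributed measurement error with a standard deviation of 5 points. Neither number comes from the MBTI itself.

```python
from statistics import NormalDist

CUTOFF = 50.0  # hypothetical midpoint of a 0-100 extraversion scale

def letter(score: float) -> str:
    """Collapse a continuous score into one MBTI-style letter."""
    return "E" if score >= CUTOFF else "I"

def flip_probability(true_score: float, error_sd: float = 5.0) -> float:
    """Chance that a single noisy retest lands on the other side of the cutoff,
    assuming normally distributed measurement error (error_sd is an assumption)."""
    below_cutoff = NormalDist(mu=true_score, sigma=error_sd).cdf(CUTOFF)
    return below_cutoff if letter(true_score) == "E" else 1.0 - below_cutoff

print(letter(49), letter(51), letter(90))  # prints: I E E (49 and 51 split, 51 and 90 match)
for score in (51, 60, 80):
    print(f"true score {score}: {flip_probability(score):.1%} chance of flipping")
```

Under these assumptions, someone who truly sits at 51 has roughly a 42% chance of coming back with the other letter on any single retest, while someone at 80 almost never does. The people near the middle, which is where most people are, absorb nearly all of the flips.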

3. The framework does not predict the things it claims to

The strongest claim a personality test can make is predictive validity — give me your score, and I can tell you something useful about how your life is likely to go. Job performance. Relationship outcomes. Mental health. Income trajectory.

Decades of attempts to use MBTI scores this way have come back weak or null. A 1991 National Academy of Sciences review concluded there was "not sufficient, well-designed research to justify the use of the MBTI in career counselling programs" ³. A 2019 Social and Personality Psychology Compass review of the MBTI's validity reached a similar conclusion: the type structure is not supported by the data, and the predictive findings are thin ⁴.

This matters because predictive validity is the whole point. A personality measurement that cannot tell you anything about your future life is a vibe. There is nothing wrong with a vibe — but it should not be priced or used like a science.

The part about who built it

Most people assume MBTI came out of a research lab. It did not.

Katharine Cook Briggs and her daughter Isabel Briggs Myers developed the indicator in the 1940s based on their reading of Carl Jung's Psychological Types. Neither had formal training in psychology, psychometrics, or statistics. Jung himself wrote that his types were "in no sense meant to classify human beings into categories" but rather to describe rough tendencies ⁴. The mother-daughter team took the tendencies and built categories anyway.

This is not a smear. It is a fact about provenance. The MBTI was built by two thoughtful, self-taught women trying to help women find suitable work during World War II. That is a respectable origin story. It is also not the origin story of a scientific instrument, and pretending otherwise does a disservice to everyone involved.

Why the descriptions still feel accurate

Here is the part most critiques skip. If MBTI is so broken, why does the description on 16Personalities feel like someone read your diary?

There is a specific psychological phenomenon for this. It is called the Barnum effect (sometimes the Forer effect), and it was demonstrated in a 1949 classroom experiment. Bertram Forer gave students a personality test, handed each of them a "personalized" result a week later, and asked them to rate its accuracy. Average rating: 4.3 out of 5. Forer had given every student the exact same description, assembled largely from a newsstand astrology book ⁵.

The trick has three ingredients, and most type-based descriptions use all three:

Statements vague enough to fit almost anyone. "You have a great deal of unused capacity which you have not turned to your advantage." Who reads that and disagrees?
A flattering tilt. People rate positive descriptions as more accurate than negative ones, even when both are equally generic.
Reader-supplied specifics. When a sentence is general enough, your brain quietly fills in a real memory that fits. The sentence then feels personally targeted.

Run that pattern through a confident four-letter label and a clean visual design and you get the 16Personalities experience. The recognition feels like accuracy. It is not the same thing.

The cleanest test: does any of the description not apply to you? If everything lands as true, the description is doing Barnum work, not measurement work.

Why the Big Five holds up

The Big Five (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) came out of a very different process. Researchers started with the words real people use to describe each other in everyday language, ran the patterns through factor analysis across decades, languages, and cultures, and asked which clusters of words consistently traveled together ⁶. Five did, first in English and then, in broadly similar form, in German, Czech, Turkish, Filipino, and many other languages.

That work has paid off in three ways that MBTI never has.

Predictive validity that replicates. Conscientiousness is one of the strongest non-cognitive predictors of job performance and income across almost every occupation studied ⁷. Neuroticism predicts relationship dissatisfaction and divorce risk ⁸. High Openness predicts who pursues creative or unconventional life paths. Each individual correlation is modest, but personality traits predict outcomes such as mortality, divorce, and occupational attainment about as well as socioeconomic status and cognitive ability do ⁷, and these are among the most replicated findings in personality psychology.

Dimensional, not categorical. A Big Five report does not say you are an INTJ. It says you are at the 73rd percentile on Conscientiousness, the 22nd on Extraversion, and the 64th on Openness. That is a specific position on a distribution, and it cannot be quietly rewritten into "you have a mix of both" without losing information. The 22nd percentile is the 22nd percentile.

Honest about cost. Each trait has tradeoffs that show up in the research. High Conscientiousness predicts income and longevity but, at the extreme, also rigidity. High Agreeableness predicts warmth in relationships but also lower earnings. High Openness predicts creativity and lower religiosity. A good Big Five report is willing to name the cost, not just the upside. Type-based reports almost never take that step.
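To make the "dimensional, not categorical" point concrete, here is a minimal sketch of how a raw trait score becomes a percentile. It assumes the trait is normally distributed in the comparison group and uses a made-up norm table (means and standard deviations of average item scores on a 1-5 scale). A real inventory would use its own published norms, and some use empirical lookup tables instead of a normal curve.

```python
from statistics import NormalDist

# Hypothetical norm table: (mean, standard deviation) of the average item score
# on a 1-5 scale. Real inventories publish their own norms, often by age and sex.
NORMS = {
    "Conscientiousness": (3.5, 0.6),
    "Extraversion": (3.3, 0.7),
    "Openness": (3.6, 0.6),
}

def percentile(trait: str, raw_mean: float) -> int:
    """Convert an average item score to a percentile, assuming the trait is
    normally distributed in the norm group."""
    mean, sd = NORMS[trait]
    z = (raw_mean - mean) / sd
    return round(NormalDist().cdf(z) * 100)

def ordinal(n: int) -> str:
    """Format a number as an English ordinal: 1st, 22nd, 73rd, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Hypothetical respondent: average item score per trait.
scores = {"Conscientiousness": 3.87, "Extraversion": 2.77, "Openness": 3.81}
for trait, raw in scores.items():
    print(f"{trait}: {ordinal(percentile(trait, raw))} percentile")
```

With these invented norms, the hypothetical respondent lands at roughly the 73rd, 22nd, and 64th percentiles, the same profile used as an example above. Nothing gets rounded off into a letter: a slightly different raw score produces a slightly different percentile, not a different identity.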

"But I like being an INFJ"

That is fine, and it is allowed.

A test does not need to be scientifically valid to be a useful prompt for self-reflection. If your MBTI type gave you language to explain something about yourself to your partner, your therapist, or yourself, the language did real work even if the underlying framework did not.

The argument here is narrower. It is that if you want a measurement you can lean on for actual decisions — what job to take, why a relationship keeps hitting the same wall, whether a pattern you keep noticing in yourself is real or imagined — you want the model that has fifty years of replication behind it, not the one that gives you a different answer every time you take it.

You can keep the INFJ as a nickname. You should not use it to choose a career.

What a real measurement reads like

Compare two sentences.

"You are an empathetic idealist who values deep connection and may struggle when the world fails to live up to your standards."

"You are at the 78th percentile on Agreeableness and the 81st on Neuroticism. That combination predicts strong empathy in relationships and a real cost: difficulty enforcing your own limits, and a tendency to absorb conflict you did not start. The pattern often shows up as resentment that arrives later than the moment that caused it."

The first sentence could be about almost anyone. The second is a specific claim about a specific score range, and it names a cost the reader might rather not see. Some readers will recognize it. Others will not. That asymmetry — the fact that some of it should not apply to you — is what an actual measurement looks like when it is written honestly.

What to do with this

If you want to keep the four-letter type as a piece of personal vocabulary, keep it. It costs nothing.

If you want to know what your personality actually looks like by the standards of the field that studies personality for a living, take a Big Five assessment instead. The version we use, the IPIP-NEO-120, is in the public domain, takes about twelve minutes, and is one of the most validated free instruments in the literature ⁹.
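Scoring an inventory like this is mostly averaging. The sketch below assumes a 1-5 response scale and scores reverse-keyed items (statements worded against the trait) as 6 minus the response. The four-item key is a hypothetical placeholder: the real IPIP-NEO-120 key assigns 24 items to each domain, and its item numbers and keying should come from the published key, not from this example.

```python
from statistics import fmean

# Hypothetical scoring key: item number -> (domain, reverse_keyed).
# Placeholder items only; these are NOT real IPIP-NEO-120 item numbers.
KEY: dict[int, tuple[str, bool]] = {
    1: ("Neuroticism", False),
    2: ("Extraversion", False),
    3: ("Extraversion", True),
    4: ("Agreeableness", False),
}

def score(responses: dict[int, int]) -> dict[str, float]:
    """Average 1-5 responses per domain, flipping reverse-keyed items."""
    by_domain: dict[str, list[int]] = {}
    for item, answer in responses.items():
        if not 1 <= answer <= 5:
            raise ValueError(f"item {item}: response {answer} is outside 1-5")
        domain, reverse = KEY[item]
        # A reverse-keyed 5 (strong agreement with an anti-trait statement) counts as a 1.
        by_domain.setdefault(domain, []).append(6 - answer if reverse else answer)
    return {domain: fmean(values) for domain, values in by_domain.items()}

print(score({1: 4, 2: 5, 3: 2, 4: 3}))
```

The domain averages would then be compared against norms to produce percentile scores like the ones described above.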

You will not get a color or a four-letter label. You will get five percentile scores and a report that is willing to tell you what the high ones cost you.

That is the difference between a personality test that wants you to feel seen and a personality test that wants to be right.

See your actual Big Five profile (free, 12 min) →

References

1. Pittenger, D. J. (1993). The utility of the Myers-Briggs Type Indicator. Review of Educational Research, 63(4), 467–488. https://doi.org/10.3102/00346543063004467
2. McCrae, R. R., & Costa, P. T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17–40. https://doi.org/10.1111/j.1467-6494.1989.tb00759.x
3. Druckman, D., & Bjork, R. A. (Eds.). (1991). In the Mind's Eye: Enhancing Human Performance. National Academy Press.
4. Stein, R., & Swan, A. B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass, 13(2), e12434. https://doi.org/10.1111/spc3.12434
5. Forer, B. R. (1949). The fallacy of personal validation: A classroom demonstration of gullibility. Journal of Abnormal and Social Psychology, 44(1), 118–123. https://doi.org/10.1037/h0059240
6. Goldberg, L. R. (1990). An alternative "description of personality": The Big-Five factor structure. Journal of Personality and Social Psychology, 59(6), 1216–1229. https://doi.org/10.1037/0022-3514.59.6.1216
7. Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The power of personality: The comparative validity of personality traits, socioeconomic status, and cognitive ability for predicting important life outcomes. Perspectives on Psychological Science, 2(4), 313–345. https://doi.org/10.1111/j.1745-6916.2007.00047.x
8. Solomon, B. C., & Jackson, J. J. (2014). Why do personality traits predict divorce? Multiple pathways through satisfaction. Journal of Personality and Social Psychology, 106(6), 978–996. https://doi.org/10.1037/a0036190
9. Johnson, J. A. (2014). Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory. Journal of Research in Personality, 51, 78–89. https://doi.org/10.1016/j.jrp.2014.05.003