The short answer: the Myers-Briggs Type Indicator (MBTI) does not meet the standards that academic psychology uses to judge whether a test "works." The longer answer is more interesting, because it explains how something so widely used can also be so widely criticized — and what that means for anyone who has been told they are an INTJ or ENFP.
This post sticks close to the published research. Every claim is cited.
What "scientifically valid" actually means
A test in psychology is considered scientifically valid if it meets three rough standards:
- Reliability — you get roughly the same result when you retake the test.
- Predictive validity — the score predicts things outside the test (job performance, life satisfaction, health).
- Independent replication — the result holds up when researchers who did not build the test try to test it.
The MBTI struggles with the first two and has an unusual problem with the third.
Reliability: the retest problem
When researchers give the same person the MBTI twice with five weeks between tests, somewhere between 39% and 76% of people get a different four-letter type [1]. That is not a small error band. It is the test telling roughly half the people who take it that they are someone different than they were last month.
The math behind this is not mysterious. The MBTI uses sharp either/or cutoffs on traits that actually sit on a smooth continuum. If you score 51% Extraversion on Monday, you are an E. If you score 49% Extraversion five weeks later, you are an I, even though almost nothing about you has changed.
The Big Five model handles the same situation differently. It would report a continuous score (say, 52nd percentile on Extraversion) and call it stable. The change from 51% to 49% would not flip a label.
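To make the cutoff arithmetic concrete, here is a minimal simulation sketch. It is not a model of the real MBTI scoring: the normally distributed traits, the noise level (`noise_sd`), the four independent scales, and the function name `simulate_retest` are all assumptions chosen for illustration.

```python
import random


def simulate_retest(n_people=20_000, noise_sd=0.5, n_scales=4, seed=0):
    """Simulate two sittings of a four-scale test with a cutoff at zero.

    Each true trait is standard normal, and each sitting adds independent
    normal measurement noise. Both choices are illustrative assumptions.
    Returns (share of single-scale letters that flip,
             share of people whose full type changes).
    """
    rng = random.Random(seed)
    scale_flips = 0   # single-scale letter changes, summed over everyone
    type_changes = 0  # people with at least one changed letter
    for _ in range(n_people):
        changed = False
        for _ in range(n_scales):
            true_score = rng.gauss(0.0, 1.0)
            first = true_score + rng.gauss(0.0, noise_sd)
            second = true_score + rng.gauss(0.0, noise_sd)
            # The letter is decided only by which side of zero the score lands on.
            if (first >= 0) != (second >= 0):
                scale_flips += 1
                changed = True
        if changed:
            type_changes += 1
    return scale_flips / (n_people * n_scales), type_changes / n_people


flip_rate, type_change_rate = simulate_retest()
print(f"share of single-scale letters that flip: {flip_rate:.1%}")
print(f"share of people whose four-letter type changes: {type_change_rate:.1%}")
```

Under these assumed settings each scale is fairly stable on its own, yet a majority of simulated people end up with a different four-letter type, because one flipped letter out of four is enough to change the label. A continuous score would record the same retest as a small shift.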
Predictive validity: does the type predict anything?
This is where the gap is largest.
A 1991 review by the National Academy of Sciences, asked to evaluate the MBTI for use in U.S. career counseling, concluded that there was "not sufficient, well-designed research to justify the use of the MBTI in career counselling programs" [2]. That review is now decades old. The follow-up research has not changed the picture.
A 2001 review of meta-analyses on personality and job performance found that Big Five Conscientiousness was a robust predictor across nearly every job, but did not find the same kind of evidence for MBTI types [3]. A 2019 review of the validity of MBTI theory reached similar conclusions [4].
The pattern is not that the MBTI is "wrong." It is that the four-letter types do not carry enough information to predict outcomes the way the underlying continuous traits do.
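The information loss can be sketched with a small simulation. Everything here is assumed for illustration: a normally distributed trait, an outcome that correlates with it at `true_r = 0.3`, and a letter assigned by splitting the trait at its midpoint. None of these numbers come from the studies cited above.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)
true_r = 0.3   # assumed trait-outcome correlation (hypothetical)
n = 50_000

# Continuous trait, and an outcome built to correlate with it at true_r.
trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
outcome = [true_r * t + math.sqrt(1 - true_r ** 2) * rng.gauss(0.0, 1.0)
           for t in trait]

# The "type" view: keep only which side of the midpoint each person is on.
letter = [1.0 if t >= 0 else 0.0 for t in trait]

r_continuous = pearson(trait, outcome)
r_letter = pearson(letter, outcome)
print(f"correlation using the continuous score: {r_continuous:.3f}")
print(f"correlation using only the letter:      {r_letter:.3f}")
```

For normally distributed scores split at the median, the letter-only correlation is expected to be about 0.8 times the continuous one (a factor of sqrt(2/pi)). The letter still carries signal, just less of it, and that is before any retest instability is added.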
The independence problem
Most of the research that finds the MBTI useful has been published by the Center for Applications of Psychological Type (CAPT), an organization run by the Myers-Briggs Foundation, and printed in its in-house journal [5]. That is not unusual for a commercial test. It also means the evidence base looks less independent than it would if outside academics were doing most of the work.
When researchers outside that ecosystem do replicate the analyses, they typically report weaker effects than the in-house studies do.
Why the MBTI still feels accurate
Here is the part that does not get said enough: a test can be statistically weak and personally meaningful.
There are real reasons why your MBTI description may feel like it nails you, even if the test itself has reliability problems:
1. The Barnum effect. Vague, mostly-positive descriptions that could fit almost anyone tend to read as deeply personal. This is the same effect that makes horoscopes work [6].
2. The dichotomy still captures real signal. Even though the I/E line is too sharp, the underlying question — do you draw energy from people or from solitude? — is asking about a real thing. A blurry photo of a real thing is still a photo of a real thing.
3. The descriptions are written to flatter. Every MBTI type writeup includes a list of strengths. None of them say "you may be hard to be around." If the descriptions called out the bad days as clearly as the good ones, the test would feel less magical and more like a mirror.
What this means for your INTJ result
If the MBTI has helped you reflect, that is real. A test does not need to be scientifically rigorous to be a useful prompt.
But if you are making decisions based on the type — job choices, partner choices, hiring choices — the research is clear that the four-letter label is too coarse and too unstable to lean on. For those decisions, the underlying continuous traits the Big Five measures (and that the MBTI is, in a way, a low-resolution version of) carry more signal.
The good news is that the most rigorous free Big Five assessment, the IPIP-NEO-120, takes 12 minutes and is in the public domain [7]. It is the assessment we use at Defaults for that reason.
If you have ever wanted to know "but what is the version of MBTI that actually predicts things?" — that is the closest answer.
References
1. Pittenger, D. J. (1993). The utility of the Myers-Briggs Type Indicator. Review of Educational Research, 63(4), 467–488. https://doi.org/10.3102/00346543063004467
2. Druckman, D., & Bjork, R. A. (Eds.). (1991). In the Mind's Eye: Enhancing Human Performance. National Academy Press.
3. Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at the beginning of the new millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9(1–2), 9–30. https://doi.org/10.1111/1468-2389.00160
4. Stein, R., & Swan, A. B. (2019). Evaluating the validity of Myers-Briggs Type Indicator theory: A teaching tool and window into intuitive psychology. Social and Personality Psychology Compass, 13(2), e12434. https://doi.org/10.1111/spc3.12434
5. McCrae, R. R., & Costa, P. T. (1989). Reinterpreting the Myers-Briggs Type Indicator from the perspective of the five-factor model of personality. Journal of Personality, 57(1), 17–40. https://doi.org/10.1111/j.1467-6494.1989.tb00759.x
6. Forer, B. R. (1949). The fallacy of personal validation. Journal of Abnormal and Social Psychology, 44(1), 118–123. https://doi.org/10.1037/h0059240
7. Johnson, J. A. (2014). Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory. Journal of Research in Personality, 51, 78–89. https://doi.org/10.1016/j.jrp.2014.05.003