A technical study of the distribution of the Mandela Effect

Statistics

The process of creating meaningful statistics for the Mandela Effect began with data collection.

First, a system was put in place to solicit and curate a list of the incidents themselves. Its scope was broadened to satisfy the definition of the Mass Memory Discrepancy Effect (MMDE), of which the Mandela Effect is a part. Sources included user submissions, the authors' own research, and similar channels. Each incident was verified, allocated a reference code, and documented in its own published article.

Next, a yes/no question was written for each applicable entry and added to the online test, making it a candidate to be asked once per test run. All answers were stored, and no user-identifiable data was collected. In total, over 600,000 questions were asked.
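The collection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the site's actual code: the `Question` type, the `ME-0001`-style reference codes and the `run_test`/`tally` helpers are hypothetical, and an answer is modelled as `True` when the participant recalls the alternate version. Questions are drawn without replacement, and only reference codes and answers are kept.

```python
import random
from collections import Counter
from dataclasses import dataclass

QUESTIONS_PER_TEST = 12  # figure quoted in the article

@dataclass(frozen=True)
class Question:
    ref: str   # hypothetical reference-code format, e.g. "ME-0001"
    text: str  # yes/no question shown to the participant

def run_test(pool, answer_fn, rng=random):
    """Run one test: draw distinct questions and record yes/no answers.

    Sampling is without replacement, so a question appears at most once per run.
    Only (reference code, answer) pairs are returned; nothing identifies the user.
    answer_fn(question) returns True if the participant recalls the alternate version.
    """
    chosen = rng.sample(pool, k=min(QUESTIONS_PER_TEST, len(pool)))
    return [(q.ref, bool(answer_fn(q))) for q in chosen]

def tally(results, store=None):
    """Accumulate answers per reference code as Counter({True: alternate, False: current})."""
    store = {} if store is None else store
    for ref, alternate in results:
        store.setdefault(ref, Counter())[alternate] += 1
    return store
```

Each completed run would be passed to `tally` against a shared store, so the per-question counts grow with every test while no per-user record is ever created.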

Published

After the test had been running for one year, with more than 50,000 tests processed at 12 questions per test, analytics were run on the collected answers and the results published.
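As a consistency check, these figures agree with the total number of questions asked:

$$50{,}000 \times 12 = 600{,}000$$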

Care was taken to ensure that sample sizes were meaningful. Of particular concern was the fact that the question pool grew constantly; had this not been taken into account, the results would have been skewed. For example, if a question was added to the system today and came up for only one person in all of today's runs, that person's answer would by definition be either 100% current and 0% alternate, or the reverse. In the ordered lists, such a result would greatly distort the overall outcome. The analysis therefore took into account both the total number of answers to each question and how recently the question was added. As the number of answers to a particular question rises, so does its weight in the results.
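The article does not publish its weighting formula. One common way to stop a one-answer question from topping a list is additive (Bayesian) smoothing: each question's alternate-answer rate is pulled toward the pool-wide rate by a fixed number of pseudo-answers, so its influence grows with the number of real answers. The sketch below uses that technique and handles recency simply by excluding questions younger than a minimum age; `PRIOR_WEIGHT`, `MIN_AGE_DAYS`, `QuestionStats` and `rank_questions` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical tuning constants; the article does not state its actual values.
PRIOR_WEIGHT = 20   # pseudo-answers pulling small samples toward the prior
MIN_AGE_DAYS = 30   # questions younger than this are left out of the ranking

@dataclass
class QuestionStats:
    ref: str         # reference code of the incident
    alternate: int   # answers matching the alternate version
    total: int       # all answers received for this question
    added: date      # date the question entered the pool

def smoothed_alternate_rate(stats, prior):
    """Shrink the raw rate alternate / total toward `prior`.

    With few answers the result stays near the prior; as `total` grows,
    it converges on the raw rate.
    """
    return (stats.alternate + PRIOR_WEIGHT * prior) / (stats.total + PRIOR_WEIGHT)

def rank_questions(all_stats, today):
    """Return (ref, smoothed rate) pairs for sufficiently old questions, highest first."""
    eligible = [s for s in all_stats if (today - s.added).days >= MIN_AGE_DAYS]
    answers = sum(s.total for s in eligible)
    # Pool-wide alternate rate acts as the prior; fall back to 0.5 if there is no data.
    prior = sum(s.alternate for s in eligible) / answers if answers else 0.5
    scored = [(s.ref, smoothed_alternate_rate(s, prior)) for s in eligible]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with `PRIOR_WEIGHT = 20` and a pool-wide rate of 0.5, a question with a single alternate answer scores (1 + 10) / 21 ≈ 0.52 rather than 100%, while one with 600 alternate answers out of 1,000 scores (600 + 10) / 1,020 ≈ 0.60 and ranks above it.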

This weighting algorithm must, by design, make compromises, striking a balance between each answer having too little or too much influence on the outcome, depending on its context. If only one person from a country, for example Taured, scored 100% on the test, should the report of the most affected countries have Taured at the top? Of course not. By the same token, there must come a point at which the growing number of people from Taured taking the test does start to affect the results, and that point must be based on the totals for the other countries.
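The country ranking can be handled in the same spirit. In this hypothetical sketch, each country's mean score is shrunk toward the global mean by `m` pseudo-tests, where `m` is the median number of tests per country; a country therefore only moves far from the global mean once its own test count is comparable to that of a typical country. The use of the median, the `rank_countries` name and the per-test score format (a fraction between 0 and 1) are assumptions, not the published method.

```python
from statistics import median

def rank_countries(country_scores):
    """Rank countries by a shrunken mean test score, highest first.

    country_scores maps country -> list of per-test scores in [0, 1].
    The shrinkage weight m is the median number of tests per country, so the
    point at which a country starts to count is set relative to the others.
    """
    counts = {c: len(s) for c, s in country_scores.items() if s}
    all_scores = [x for c in counts for x in country_scores[c]]
    global_mean = sum(all_scores) / len(all_scores)
    m = median(counts.values())
    adjusted = {
        c: (sum(country_scores[c]) + m * global_mean) / (counts[c] + m)
        for c in counts
    }
    return sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
```

With a single Taured test at 100% set against countries with hundreds of tests each, Taured lands close to the global mean rather than at the top; once Taured accumulates thousands of high-scoring tests, it rises to first place.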

Further work

The process is ongoing and live: new tests continue to influence the results, and the analytics are updated daily.

New reports are under consideration, as is a study of the corpus for emerging patterns using deep mining techniques, with the possibility of some form of prediction based on the characteristics of a given MMDE.