How Random Am I?

Human vs. Computer Random Number Generation

I've always thought I could be a good random number generator, so I decided to compete against an actual one and see how random I really am.

To make patterns easier to detect, I chose numbers from 1–5 instead of 1–100. With fewer possible values, statistical tests should reveal any bias much more quickly. I generated 500 numbers using Python's pseudorandom number generator and then manually entered 500 numbers myself.
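The exact call used to produce the computer list isn't shown. A minimal sketch, assuming `random.randint` and an arbitrary seed of 42 chosen purely for illustration (so its output will not reproduce the list below), could look like this. `generate_sequence` is a hypothetical helper name:

```python
import random

def generate_sequence(n=500, low=1, high=5, seed=None):
    """Draw n integers uniformly from low..high (inclusive) using Python's Mersenne Twister PRNG."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    return [rng.randint(low, high) for _ in range(n)]

# Seed 42 is an arbitrary, hypothetical choice that makes the run repeatable
sample = generate_sequence(seed=42)
print(len(sample), min(sample), max(sample))
```

Passing a seed makes the sequence reproducible. Leaving it as `None` seeds from system entropy, which is closer to a fresh "live" draw.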

Computer-generated sequence

import random

# The 500 values drawn with Python's random module, stored as a flat list
computer_list = [1, 4, 3, 1, 4, 5, 2, 1, 4, 1, 2, 3, 5, 4, 1, 5, 4, 2, 3, 5, 5, 5, 1, 4, 2, 4, 1, 4, 1, 5, 1, 2, 3, 3, 4, 2, 5, 5, 3, 5, 1, 4, 3, 2, 5, 5, 5, 3, 1, 1, 5, 4, 3, 4, 1, 1, 1, 2, 1, 3, 4, 3, 3, 4, 2, 2, 2, 5, 5, 2, 4, 1, 1, 1, 1, 5, 2, 2, 3, 4, 4, 3, 5, 5, 2, 4, 2, 4, 5, 5, 5, 5, 3, 5, 3, 1, 5, 3, 1, 3, 4, 4, 1, 4, 4, 2, 2, 4, 1, 3, 4, 2, 5, 2, 4, 5, 1, 3, 2, 1, 1, 3, 4, 4, 5, 5, 2, 5, 5, 5, 5, 3, 3, 4, 2, 2, 4, 5, 5, 3, 3, 4, 3, 2, 5, 4, 2, 4, 4, 2, 1, 4, 2, 3, 2, 4, 2, 5, 5, 1, 5, 4, 1, 2, 2, 1, 2, 5, 3, 2, 4, 2, 4, 5, 1, 5, 5, 3, 4, 1, 2, 3, 1, 4, 5, 5, 3, 5, 1, 2, 3, 5, 3, 1, 1, 5, 2, 4, 4, 3, 3, 3, 2, 3, 5, 3, 5, 5, 3, 5, 1, 3, 5, 2, 1, 2, 1, 2, 4, 5, 1, 5, 1, 4, 5, 2, 2, 3, 5, 2, 1, 4, 4, 4, 5, 5, 2, 1, 1, 4, 5, 4, 1, 1, 2, 4, 4, 5, 3, 3, 3, 1, 4, 2, 3, 4, 5, 4, 3, 1, 2, 5, 4, 3, 3, 2, 3, 5, 5, 4, 1, 4, 2, 2, 2, 4, 2, 3, 3, 1, 1, 4, 1, 3, 5, 2, 3, 3, 3, 1, 1, 3, 3, 3, 1, 5, 5, 1, 4, 1, 2, 2, 3, 1, 1, 4, 2, 4, 2, 2, 1, 4, 3, 5, 2, 4, 3, 5, 2, 1, 1, 3, 3, 2, 1, 5, 3, 4, 4, 5, 4, 1, 4, 4, 3, 4, 3, 2, 4, 5, 2, 3, 3, 3, 1, 4, 2, 3, 3, 3, 3, 3, 5, 5, 4, 4, 1, 5, 4, 5, 5, 3, 1, 5, 4, 2, 3, 5, 5, 3, 2, 4, 2, 1, 4, 1, 4, 5, 3, 4, 5, 1, 2, 2, 3, 4, 4, 2, 3, 2, 5, 5, 5, 4, 3, 4, 1, 3, 3, 5, 4, 5, 4, 4, 1, 4, 1, 1, 3, 2, 1, 3, 1, 2, 4, 5, 2, 2, 1, 5, 1, 3, 3, 2, 5, 1, 4, 2, 4, 4, 3, 3, 5, 2, 4, 4, 3, 1, 2, 5, 4, 2, 5, 4, 3, 4, 3, 1, 3, 1, 3, 4, 2, 1, 5, 2, 1, 4, 2, 3, 1, 1, 5, 5, 5, 1, 1, 4, 4, 5, 2, 3, 4, 4, 5, 4, 5, 1, 2, 1, 1, 2, 4, 5, 3, 4, 4, 4, 3, 3, 5, 3, 2, 3, 1, 4, 1, 2, 5, 2]

Human-generated sequence

# The 500 values I typed in by hand, stored as a flat list
human_list = [1, 2, 3, 5, 2, 3, 1, 3, 2, 2, 3, 5, 3, 4, 3, 4, 1, 3, 2, 4, 2, 3, 3, 4, 4, 3, 5, 3, 4, 3, 3, 2, 1, 3, 2, 4, 3, 3, 4, 5, 3, 4, 2, 1, 3, 2, 3, 2, 1, 3, 4, 2, 3, 2, 3, 2, 3, 2, 4, 2, 4, 3, 4, 2, 4, 4, 3, 4, 5, 3, 3, 4, 3, 2, 1, 1, 1, 2, 1, 3, 3, 2, 4, 4, 3, 3, 2, 5, 3, 3, 4, 4, 4, 2, 3, 2, 3, 2, 1, 2, 3, 3, 1, 1, 1, 3, 2, 3, 2, 4, 3, 3, 4, 5, 3, 5, 3, 4, 3, 4, 2, 3, 2, 2, 3, 1, 4, 2, 4, 3, 5, 3, 5, 3, 4, 3, 4, 3, 4, 3, 4, 2, 3, 1, 2, 2, 1, 1, 3, 3, 2, 5, 4, 3, 5, 5, 3, 5, 3, 2, 2, 3, 4, 4, 5, 5, 3, 2, 1, 2, 4, 2, 2, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 4, 3, 4, 4, 4, 3, 4, 3, 4, 3, 4, 3, 3, 4, 3, 4, 2, 3, 2, 3, 1, 3, 2, 4, 3, 5, 3, 5, 3, 5, 3, 4, 4, 2, 3, 2, 3, 2, 4, 3, 4, 3, 4, 3, 4, 4, 3, 4, 3, 4, 1, 2, 2, 1, 2, 1, 3, 4, 4, 4, 5, 3, 4, 2, 3, 2, 3, 2, 5, 3, 5, 3, 4, 4, 3, 4, 3, 2, 3, 1, 1, 2, 1, 3, 2, 4, 3, 3, 4, 3, 4, 3, 4, 3, 5, 3, 5, 3, 5, 4, 5, 4, 5, 2, 2, 1, 3, 1, 2, 3, 3, 3, 4, 4, 5, 5, 2, 3, 4, 3, 2, 1, 2, 3, 4, 5, 2, 2, 2, 4, 4, 5, 4, 3, 2, 4, 2, 5, 3, 5, 3, 2, 2, 3, 2, 1, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 1, 3, 3, 4, 2, 4, 2, 1, 2, 2, 4, 3, 4, 3, 5, 3, 4, 2, 4, 2, 4, 1, 3, 2, 4, 3, 5, 2, 4, 2, 4, 2, 5, 2, 5, 3, 5, 3, 3, 4, 3, 5, 3, 5, 3, 5, 3, 4, 5, 3, 3, 4, 3, 4, 3, 4, 3, 4, 3, 3, 4, 4, 5, 5, 5, 5, 5, 3, 3, 2, 4, 4, 3, 5, 5, 3, 2, 3, 3, 3, 3, 3, 3, 3, 2, 3, 5, 2, 2, 2, 2, 3, 3, 3, 4, 5, 4, 3, 4, 3, 2, 3, 2, 2, 2, 1, 2, 2, 4, 4, 3, 4, 3, 5, 3, 3, 4, 3, 4, 3, 4, 3, 5, 3, 5, 4, 3, 2, 3, 2, 1, 2, 2, 1, 2, 1, 3, 2, 4, 3, 5, 3, 5, 3, 5, 4, 5, 4, 5, 4, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4]

Entering 500 numbers manually was surprisingly exhausting. Toward the end I stopped trying to be clever and just started typing whatever came to mind.


Shannon Entropy

The first metric I tested was Shannon entropy, which measures the uncertainty of a probability distribution.

For a discrete random variable X with n possible values, the entropy in bits is

$$H(X) = -\sum_{i=1}^{n} p_i \log_2(p_i)$$

where

$$p_i = \frac{\text{count of value } i}{N}$$

is the observed proportion of value i and N is the length of the sequence (here N = 500).

For five equally likely outcomes, the theoretical maximum entropy is

$$\log_2(5) \approx 2.3219 \text{ bits}.$$
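As a quick check of the formula, the entropy can be computed directly from proportions. This sketch uses a hypothetical helper name, `entropy_from_proportions`. It confirms that the uniform distribution reaches log2(5). It then plugs in the human frequencies from the frequency test further down to show that a skewed distribution lands lower:

```python
import math

def entropy_from_proportions(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.2] * 5
human = [0.078, 0.204, 0.346, 0.240, 0.132]  # observed shares of 1..5 in my sequence

print(round(entropy_from_proportions(uniform), 4))  # 2.3219
print(round(entropy_from_proportions(human), 4))    # roughly 2.16, matching the result below
```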

Using the entropy function below,

from collections import Counter
import math

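def calculate_shannon_entropy(data):
    """Return the Shannon entropy (in bits) of the empirical distribution of data."""
    counts = Counter(data)
    total = len(data)

    entropy = 0.0
    for count in counts.values():
        p = count / total            # observed proportion p_i
        entropy -= p * math.log2(p)  # only values that actually occur contribute

    return entropy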

I obtained

Computer List Entropy: 2.3186
Human List Entropy:    2.1645

The computer sequence is essentially at the theoretical maximum (2.3186 vs. 2.3219 bits), while my sequence falls about 0.16 bits short. This suggests that I subconsciously favored some values over others.
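Even a perfect generator will rarely hit log2(5) exactly with only 500 samples. The observed proportions always wobble a little around 1/5, and any wobble lowers the entropy. The sketch below estimates how far short a truly uniform source typically falls. The seed, the 200 trials, and the helper name `sample_entropy` are arbitrary illustrative choices:

```python
import math
import random
from collections import Counter

def sample_entropy(data):
    """Plug-in Shannon entropy (bits) of a sequence's empirical distribution."""
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

rng = random.Random(0)  # arbitrary seed for repeatability
trials = [sample_entropy([rng.randint(1, 5) for _ in range(500)]) for _ in range(200)]

mean_h = sum(trials) / len(trials)
print(f"max possible:          {math.log2(5):.4f}")
print(f"mean over 200 samples: {mean_h:.4f}")
```

Expect the mean to land a few thousandths of a bit below the maximum. The plug-in estimator is biased downward by roughly (k - 1) / (2N ln 2) ≈ 0.006 bits for k = 5 outcomes and N = 500 (the Miller-Madow correction). The computer's 2.3186 is consistent with that; my 2.1645 is not.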


Frequency Test

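Entropy condenses the distribution into a single number, but it doesn't say whether a shortfall is larger than chance alone would produce. So I next performed a chi-square goodness-of-fit test against the uniform distribution.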

The test statistic is

$$\chi^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i}$$

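where O_i is the observed count of value i and

$$E_i = \frac{N}{5}$$

is the expected count under a uniform distribution (100 for N = 500).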
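The statistic itself takes only a few lines of code. This sketch uses a hypothetical function name, `chi_square_uniform`. It recovers both values reported below from raw counts, which are back-calculated from the printed percentages (for example, 18.80% of 500 is 94):

```python
from collections import Counter

def chi_square_uniform(data, values=range(1, 6)):
    """Pearson chi-square statistic of data against a uniform distribution over values."""
    values = list(values)
    counts = Counter(data)
    expected = len(data) / len(values)  # E_i = N / 5
    return sum((counts[v] - expected) ** 2 / expected for v in values)

# Rebuild sequences with the observed counts of 1..5; order doesn't matter for this test.
computer_counts = [94, 91, 101, 110, 104]
human_counts = [39, 102, 173, 120, 66]
computer = [v for v, c in zip(range(1, 6), computer_counts) for _ in range(c)]
human = [v for v, c in zip(range(1, 6), human_counts) for _ in range(c)]

print(round(chi_square_uniform(computer), 4))  # 2.34
print(round(chi_square_uniform(human), 4))     # 106.1
```

The function iterates over the full set of values rather than over the Counter's keys. That way a value that never appears still contributes (0 - E_i)^2 / E_i.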

The results were

--- Computer Frequency Test ---
Total elements: 500

Number 1: 18.80%
Number 2: 18.20%
Number 3: 20.20%
Number 4: 22.00%
Number 5: 20.80%

Chi-Square Statistic: 2.3400

--- Human Frequency Test ---
Total elements: 500

Number 1: 7.80%
Number 2: 20.40%
Number 3: 34.60%
Number 4: 24.00%
Number 5: 13.20%

Chi-Square Statistic: 106.1000

The bias is immediately obvious. I strongly favored 3, used 4 somewhat more often than expected, and avoided the extremes: 1 appeared only 7.8% of the time and 5 only 13.2%, against an expected 20%. One possible explanation for the shortage of 1s is that pressing 1 requires my left pinky, which makes it slightly less natural to type repeatedly, although that may simply be post-hoc rationalization.

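A chi-square statistic of

$$\chi^2 = 106.1$$

with four degrees of freedom (five categories minus one) is extraordinarily unlikely under a truly uniform random process. For comparison, the critical value at the 5% significance level is only about 9.49. The computer's 2.34 sits comfortably below that threshold.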
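Putting a number on "extraordinarily unlikely" is easy here. A chi-square distribution with an even number of degrees of freedom has a closed-form tail, which for four degrees of freedom is P(χ² > x) = e^(-x/2) (1 + x/2). The sketch below uses only the standard library, and `chi2_sf_df4` is a hypothetical helper name. With SciPy installed, `scipy.stats.chi2.sf(x, 4)` should give the same values.

```python
import math

def chi2_sf_df4(x):
    """Upper-tail probability P(chi^2 > x) for a chi-square distribution with 4 degrees of freedom."""
    half = x / 2
    return math.exp(-half) * (1 + half)

print(f"computer: p = {chi2_sf_df4(2.34):.2f}")   # about 0.67
print(f"human:    p = {chi2_sf_df4(106.1):.1e}")
print(f"5% critical value check: {chi2_sf_df4(9.4877):.3f}")  # 0.050
```

A p-value around 0.67 means the computer's small deviations are entirely ordinary. The human p-value is on the order of 10^-22.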


Transition Matrix (Lag-1 Structure)

Uniform frequencies are only one aspect of randomness. A sequence can have perfectly balanced frequencies while still containing predictable patterns.
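A deliberately bad "generator" makes the point concrete. Cycling through 1, 2, 3, 4, 5 forever passes both frequency checks perfectly, yet every value is predictable from the one before it. The sequence here is a hypothetical illustration, not data from this experiment.

```python
import math
from collections import Counter

cyclic = [1, 2, 3, 4, 5] * 100  # 500 values, each appearing exactly 100 times

counts = Counter(cyclic)
total = len(cyclic)
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
chi_square = sum((counts[v] - total / 5) ** 2 / (total / 5) for v in range(1, 6))

# Fraction of steps where the next value is exactly "previous + 1, wrapping 5 -> 1"
predictable = sum(b == a % 5 + 1 for a, b in zip(cyclic, cyclic[1:])) / (total - 1)

print(f"entropy = {entropy:.4f} bits, chi-square = {chi_square:.1f}, predictable = {predictable:.0%}")
# entropy = 2.3219 bits, chi-square = 0.0, predictable = 100%
```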

To investigate this, I computed the lag-1 transition matrix, whose entries estimate

$$P(X_{t+1} = j \mid X_t = i).$$
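Concretely, the estimate is just a normalized count. Let n_ij denote the number of times value i is immediately followed by value j (notation introduced here for illustration). Then:

```latex
\hat{P}(X_{t+1} = j \mid X_t = i) = \frac{n_{ij}}{\sum_{k=1}^{5} n_{ik}},
\qquad n_{ij} = \#\{\, t : X_t = i,\ X_{t+1} = j \,\}
```

Each row therefore sums to 1. With 500 values there are 499 consecutive pairs, or roughly 100 per row, so individual entries can easily drift a few percentage points from 1/5 by chance alone.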

For an ideal random generator, every row should be approximately

$$\left[\frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}\right].$$

The transition matrix was computed using

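import numpy as np

def get_transition_matrix(data):
    # counts[i, j] = number of times value i+1 is immediately followed by value j+1
    matrix = np.zeros((5, 5))

    for i in range(len(data) - 1):
        matrix[data[i] - 1, data[i + 1] - 1] += 1

    # Normalize each row so it estimates P(X_{t+1} = j | X_t = i)
    row_sums = matrix.sum(axis=1)
    return matrix / row_sums[:, np.newaxis]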
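As a usage sketch, the function can be fed the hypothetical cyclic sequence 1, 2, 3, 4, 5, 1, 2, .... The result shows what a maximally predictable source looks like: every row collapses onto a single 1.0 instead of five entries near 0.2. The function is repeated so the block runs on its own. `max_deviation`, the largest distance of any entry from 1/5, is an illustrative summary and not part of the original analysis.

```python
import numpy as np

def get_transition_matrix(data):
    """Row-normalized lag-1 transition matrix for values 1..5."""
    matrix = np.zeros((5, 5))
    for i in range(len(data) - 1):
        matrix[data[i] - 1, data[i + 1] - 1] += 1
    return matrix / matrix.sum(axis=1, keepdims=True)

cyclic = [1, 2, 3, 4, 5] * 100
P = get_transition_matrix(cyclic)
print(P)  # 1.0 at 1->2, 2->3, 3->4, 4->5 and 5->1; zeros elsewhere

max_deviation = np.abs(P - 0.2).max()  # 0.8 here; a fair generator at this sample size stays far smaller
print(max_deviation)
```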

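The resulting heat maps make the difference visually apparent: the computer-generated sequence is close to uniform, while the human sequence develops noticeable preferred transitions between values. Scanning the raw list confirms this, since it is full of back-and-forth runs such as 3, 4, 3, 4, 3 and 5, 3, 5, 3, 5.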
