Open computational mathematics. AI-audited, not peer-reviewed. All code and data open for independent verification.

Interactive Mathematics

Live visualizations of our computational results. Text layout powered by Pretext — measuring and flowing text without DOM reflows.

Distributed Mathematics Research

Every Google Colab instance comes with a free NVIDIA T4 GPU. When you open one of these notebooks, it detects your GPU, compiles the CUDA kernels for your hardware, and has you running experiments on open mathematical conjectures within a minute.

This enables collaborative computation across multiple users. Each Colab session that runs an experiment expands the range of data checked — a new digit set, a new density, a new character table. The results can be downloaded and submitted as a PR. No specialized degree required. No funding required. Just a browser and some curiosity.

MCP Explorer + GPU Compute

Query 23 tools (arXiv, zbMATH, OEIS, LMFDB, Lean/Mathlib), explore datasets, then compile and run CUDA experiments on your free GPU. No API key needed for exploration; GPU compute is one click.

Open in Colab · Open MCP Explorer

Research Agent + GPU Compute

The full autonomous loop: compile CUDA kernels, run experiments, harvest results, AI peer review, fix issues. Auto-detects your GPU (T4/A100/L4/4090/H100). Free Gemini API key for reviews, or use OpenAI/Anthropic.

Open in Colab · Open Research Agent

The GPU Zoo

These CUDA kernels run on GPUs from a free Colab T4 to multi-GPU setups. Tap any card to expand.

Free

Tesla T4

16 GB VRAM
300 GB/s bandwidth
8.1 FP32 TFLOPS
70 W power

Turing architecture (2018) · 12nm · Colab free tier

Can verify Zaremba to 10^10 (1.25 GB bitset). Kronecker S₂₀ in 4 sec. Quick wins in seconds.

Cannot: Zaremba 10^11+ (12.5 GB bitset), full S₄₀ Kronecker triple-sum.

Budget

RTX 4060

8 GB VRAM
272 GB/s bandwidth
15.1 FP32 TFLOPS
115 W power

Ada Lovelace architecture (2023) · 5nm · ~$300 street price

Quick wins instantly. Zaremba to 10^9 (0.12 GB). Kronecker S₂₀. Good starter GPU for math.

Cannot: Zaremba 10^10+ (1.25 GB barely fits). Limited by 8 GB VRAM.

Enthusiast

RTX 4090

24 GB VRAM
1.0 TB/s bandwidth
82.6 FP32 TFLOPS
450 W power

Ada Lovelace architecture (2022) · 5nm · The gamer's beast (~$1,600)

Zaremba to 10^11 (12.5 GB fits in 24 GB). Kronecker S₃₀. Serious compute for the price.

Cannot: multi-GPU NVLink. No HBM (GDDR6X latency). Single card only.

Pro

A100

40–80 GB VRAM
2.0 TB/s bandwidth
19.5 FP32 TFLOPS
400 W power

Ampere architecture (2020) · 7nm · Colab Pro or cloud

Zaremba to 10^11 easily. Kronecker S₃₀ in minutes. Class numbers at serious scale.

Cannot: 8-GPU NVLink workloads (single card).

Consumer

RTX 5090

32 GB VRAM
1.8 TB/s bandwidth
105 FP32 TFLOPS
575 W power

Blackwell architecture (2025) · 4nm · Our local dev machine

Hausdorff spectrum (all 2²⁰ subsets, 72 min). Faster than H100 on FP32. Zaremba to 10^10 in seconds.

Cannot: multi-GPU NVLink. Limited to 32 GB (no S₄₀ full triple-sum).

Datacenter

H100 SXM

80 GB VRAM
3.4 TB/s bandwidth
67 FP32 TFLOPS
700 W power

Hopper architecture (2022) · 4nm · Cloud providers

80 GB VRAM handles most large problems. NVLink for multi-GPU. The workhorse of AI datacenters.

Cannot: match B200 bandwidth (3.4 vs 8 TB/s). No FP4 tensor cores.

Our Cluster

B200 SXM

192 GB VRAM
8 TB/s bandwidth
75 FP32 TFLOPS
1000 W power

Blackwell architecture (2024) · 4nm · Our 8×B200 DGX cluster (1.43 TB total)

Everything. Zaremba 10^12 (122 GB bitset). Ramsey 4.4T extensions in 130 sec. Kronecker S₃₀ in 7 min. 8 GPUs linked via NVLink 5.

Costs: ~$2M for the DGX system. That's what the merch funds.

Relative VRAM (chart): T4 · A100 · 5090 · H100 · 8×B200 = 1,536 GB
Kernels compile automatically for your hardware via nvidia-smi --query-gpu=compute_cap
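That one-liner can be wrapped in a small build helper. A minimal sketch (the function names are illustrative, not the site's actual build code):

```python
import subprocess


def arch_flag(compute_cap: str) -> str:
    """Map an nvidia-smi compute_cap string like '7.5' (T4) or '9.0'
    (H100) to the matching nvcc architecture flag."""
    major, minor = compute_cap.strip().split(".")
    return f"-arch=sm_{major}{minor}"


def detect_arch_flag() -> str:
    """Query the first visible GPU; requires nvidia-smi on PATH."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        text=True,
    )
    return arch_flag(out.splitlines()[0])
```

The resulting flag is what you'd pass to `nvcc` (e.g. `-arch=sm_75` on a free Colab T4).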

How This Site Was Built

A human picked the problems. An AI wrote the CUDA kernels. The human decided which results mattered. The AI managed the data pipeline. Neither could have done this alone.

That's not a grand statement about the future of mathematics — it's just what happened here. One person with a GPU cluster and an AI assistant produced computational results across a dozen open problems in number theory and combinatorics. Some of those computations would have taken months of manual coding. With AI collaboration, they took hours.

Headline counters (animated on the live page):
Zaremba exceptions with A={1,2,3}
Kronecker coefficients for S₃₀
class numbers computed
Hausdorff dimensions for all subsets of {1,...,20}

These are computational results, not proofs. None of this work has been peer-reviewed. The code is open, the data is open, and the reproduction commands are on every page — so anyone can check our work. That's the point. We're not claiming authority. We're publishing the computations and inviting scrutiny.

The AI didn't have mathematical insight. It had speed. It could write a CUDA kernel in minutes, debug it against test cases, optimize memory access patterns, and manage uploads to Hugging Face — all tasks that are tedious for a human but trivial for an AI. The human decided what to compute and why it mattered. The AI decided how to compute it efficiently.

The bottleneck in computational mathematics has always been engineering, not ideas.

There are thousands of open conjectures where the next step is "compute this for larger values and see what happens." Most of those computations never get done because writing the GPU kernel, setting up the data pipeline, and publishing the results is more work than most researchers want to do for a speculative computation. AI collaboration removes that friction. The ideas were always there. Now they can be tested.

This site is an experiment in that process. Some of the results will turn out to be interesting. Some won't survive peer review. All of the data is here for you to use.

— Cahlen Humphreys, 2026. Built with Claude.

Zaremba Witness Distribution

Drag the slider to explore how the witness ratio α(d)/d concentrates around 0.171 as d grows. Each bubble is a value of d, sized by its CF length, colored by max partial quotient. Finding → Experiment →
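For concreteness, take the "witness" for d to be the least coprime a whose continued fraction has all partial quotients at most 5 (our reading of the caption; the function name is illustrative). A brute-force sketch:

```python
from math import gcd


def zaremba_witness(d, bound=5):
    """Least a coprime to d such that a/d has all CF partial quotients
    <= bound, or None if d is an exception."""
    for a in range(1, d):
        if gcd(a, d) != 1:
            continue
        # Euclid's algorithm on (d, a) gives the CF digits of a/d
        q, r = d, a
        digits = []
        while r:
            c, r2 = divmod(q, r)
            digits.append(c)
            q, r = r, r2
        # accept the canonical CF, or its twin ending ..., c-1, 1
        if all(c <= bound for c in digits[:-1]) and digits[-1] <= bound + 1:
            return a
    return None
```

The plotted ratio is then `zaremba_witness(d) / d`.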

Spectral Gap Landscape

The congruence spectral gaps σ_m for all 1,214 square-free moduli up to m=1999. Hover over any point to see the modulus, factorization, and gap value. The red dashed line marks the Bourgain-Kontorovich threshold. Finding → Experiment →

CF Tree Explorer

The continued fraction tree with partial quotients in {1, …, 5}. Each path from root to leaf gives a Zaremba denominator. Watch the tree grow and see how quickly it covers the integers. Finding →

Zaremba Density Phase Transition

For digit set A, the Zaremba density measures what fraction of integers d have a coprime a/d with all CF partial quotients in A. A sharp phase transition occurs between A={1,2} (58%) and A={1,2,3} (99.997%). The transition is controlled by the Hausdorff dimension crossing 1/2. Finding →

Class Number Distribution

Distribution of class numbers h(d) for 2.74 billion real quadratic fields Q(√d) with d ∈ [10⁹, 10¹⁰). Dominated by powers of 2 (genus theory). The Cohen-Lenstra prediction of 75% with h=1 is far from realized at this scale — convergence is non-monotone. Finding → Experiment →

Hausdorff Dimension Spectrum

Hausdorff dimension dim_H(E_A) for every subset A ⊆ {1,…,10}, grouped by cardinality |A|. Each dot is one of the 1,023 nonempty subsets. Adding digit 1 has the largest effect on dimension — 5 digits including 1 beat 9 digits without it. Finding → Experiment →

CUDA Kernels: A Rosetta Stone

The same computation explained three ways. The mathematician and computer scientist think they speak different languages — but they're describing the same structure. The third column shows what actually happened: a human directing an AI agent. That's how the kernel got built. Not by hand-coding CUDA. By conversation.

The Problem: Is this number "Zaremba-good"?

Given an integer d, does there exist a coprime a/d whose continued fraction has all partial quotients at most 5?

For the Mathematician

The object

We study the semigroup generated by matrices M(a) = ((a,1),(1,0)) for a = 1,...,5 acting on pairs (p,q). Each product gives a convergent p/q of a bounded CF.

The recursion

The convergent recurrence: p_new = a*p + p_prev, q_new = a*q + q_prev. This is just matrix multiplication in SL(2,Z). Each step extends the CF by one partial quotient.
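The recurrence is easy to sanity-check. A quick sketch in plain Python (exact arithmetic via `Fraction`; helper names are ours):

```python
from fractions import Fraction


def convergent(digits):
    """Run p_new = a*p + p_prev, q_new = a*q + q_prev over the
    partial quotients of [0; a1, a2, ...]."""
    pp, p = 1, 0   # p_{-1}, p_0 for the empty CF of 0
    qp, q = 0, 1   # q_{-1}, q_0
    for a in digits:
        pp, p = p, a * p + pp
        qp, q = q, a * q + qp
    return p, q


def cf_value(digits):
    """Evaluate [0; a1, ..., ak] from the inside out, exactly."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x
```

Because each step multiplies by M(a), which has determinant ±1, gcd(p, q) = 1 comes for free.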

The search

Enumerate the tree of all finite products. At each node, the denominator q is a "covered" integer. If every integer up to N appears as some q, Zaremba's conjecture holds to N.

The insight

The tree grows exponentially (branching factor 5), but most branches produce large q quickly. The density of covered integers depends on the Hausdorff dimension of the underlying Cantor set E_5 — specifically, on whether 2*delta > 1.

What AI does

Translates this tree enumeration into a GPU kernel where each thread explores an independent subtree. The mathematician says "enumerate the semigroup." The AI writes the parallel DFS.

For the Computer Scientist

The data structure

A 5-ary tree where each node stores a state (p_prev, p, q_prev, q). Children are generated by multiplying the state by 5 different small integers. We need to mark which values of q appear anywhere in the tree.

The recurrence

State transition: (pp, p, qp, q) + digit a gives (p, a*p+pp, q, a*q+qp). Four integer multiplies and two adds per step. The tree terminates when q exceeds a threshold N.

The parallelism

Enumerate the tree to depth k on CPU, producing ~5^k independent subtrees. Each GPU thread takes one subtree and does iterative DFS, marking q values in a shared bitset via atomicOr.
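The prefix pass is ordinary CPU code. A sketch of the idea (names like `gen_prefixes` are illustrative; the real pipeline lives in the repo):

```python
def gen_prefixes(depth, digits=(1, 2, 3, 4, 5), max_q=10**9):
    """BFS the CF tree to a fixed depth. Each surviving state
    (p_prev, p, q_prev, q) seeds one GPU thread's iterative DFS."""
    level = [(1, 0, 0, 1)]              # state of the empty CF 0/1
    for _ in range(depth):
        nxt = []
        for pp, p, qp, q in level:
            for a in digits:
                qn = a * q + qp
                if qn <= max_q:         # prune subtrees already out of range
                    nxt.append((p, a * p + pp, q, qn))
        level = nxt
    return level
```

Note that the intermediate levels are discarded here, so a full pipeline must mark their denominators separately — CFs shorter than the prefix depth cover integers too.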

The optimization

The tree is unbalanced — branches with small digits (a=1) go deepest. Prefix depth k should be tuned so each GPU thread gets roughly equal work. Memory: one bit per integer up to N. For N=10^9, that's 125 MB — fits in any GPU.
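Putting the pieces together, here is a CPU reference of the whole search, useful for validating a kernel at small N (a sketch under the same state convention as above, not the site's production code):

```python
def covered_denominators(max_q, digits=(1, 2, 3, 4, 5)):
    """Iterative DFS over the CF tree, marking every denominator
    q <= max_q in a bitset (one bit per integer, so max_q = 10**9
    needs about 125 MB)."""
    bitset = bytearray(max_q // 8 + 1)
    stack = [(1, 0, 0, 1)]              # (p_prev, p, q_prev, q) of the empty CF
    while stack:
        pp, p, qp, q = stack.pop()
        bitset[q >> 3] |= 1 << (q & 7)
        for a in digits:
            qn = a * q + qp
            if qn <= max_q:
                stack.append((p, a * p + pp, q, qn))
    return bitset


def is_covered(bitset, d):
    return bool(bitset[d >> 3] & (1 << (d & 7)))
```

The GPU version replaces the single stack with one DFS stack per thread and the `|=` with an atomicOr.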

What AI does

Understands the mathematical structure well enough to choose the right parallelization strategy. The CS person says "parallelize the tree search." The AI knows that the branching factor and Hausdorff dimension determine the optimal prefix depth.

What Actually Happened

These are real quotes from the session that built this kernel. The human caught a validation bug (76 vs 27) and pushed back on single-GPU usage. The agent debugged and fixed. This is how the code on this site was actually written — not by hand, but through conversation.

Human

The CPU density code has been running 12 hours. A CUDA kernel would finish in minutes.

Agent

The CF tree enumeration is embarrassingly parallel. Each branch is independent. I'll generate prefixes on CPU to depth 8, giving ~6500 GPU threads, then each thread does DFS on its subtree.

Human

Why are we only using 1 GPU? We have 8.

Agent

Each digit set at each range is independent. Let me launch all 8 — different digit sets in parallel.

Human

Validation says 76 uncovered instead of 27. Something's wrong.

Agent

The prefix depth skips CFs shorter than 8. Depths 4-7 aren't being marked. I need the CPU to handle all depths below PREFIX_DEPTH, then GPU handles the rest.

Human

27 exceptions confirmed. Same 27 as at 10^6. Zero new ones to 10^10.

The point

The human never wrote a line of CUDA. The human never debugged a thread index. The human said what the computation should do and whether the answer was right. The agent handled the engineering. That's how a mathematician who understands the problem but doesn't want to spend weeks on GPU plumbing gets a result in an afternoon.

Notice: the mathematician's "semigroup enumeration" and the computer scientist's "parallel tree search" are the same algorithm. The "Hausdorff dimension" and the "tree balance factor" are the same number. The "convergent recurrence" and the "state transition" are the same four lines of code.

The fields use different words for identical structures. AI doesn't care about the vocabulary — it sees the computation underneath. That's why it can translate between them, and why a mathematician with an AI assistant can write GPU kernels, and a systems programmer with an AI assistant can explore number theory.

The Actual Kernel

Here's the CUDA code. 30 lines. Both audiences can now read it.

// One thread per CPU-generated prefix; each thread DFS-enumerates its subtree.
__global__ void enumerate_subtrees(
    const unsigned long long *prefixes, int num_prefixes,
    const int *digits, int num_digits,
    unsigned int *bitset, unsigned long long max_d)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= num_prefixes) return;

    // Load this thread's starting state (from the CPU-generated prefix)
    unsigned long long pp = prefixes[tid*4],   p = prefixes[tid*4+1];
    unsigned long long qp = prefixes[tid*4+2], q = prefixes[tid*4+3];

    // Mark the denominator (CUDA has no byte-wide atomicOr, so the
    // bitset is stored as 32-bit words)
    atomicOr(&bitset[q >> 5], 1u << (q & 31));

    // DFS: iterative stack, each thread independent
    struct State { unsigned long long pp, p, qp, q; } stack[200];
    int sp = 0;

    for (int i = num_digits - 1; i >= 0; i--) {
        unsigned long long qn = digits[i]*q + qp;
        if (qn > max_d) continue;
        stack[sp++] = {p, digits[i]*p + pp, q, qn};
    }

    while (sp > 0) {
        State s = stack[--sp];
        atomicOr(&bitset[s.q >> 5], 1u << (s.q & 31));
        for (int i = num_digits - 1; i >= 0; i--) {
            unsigned long long qn = digits[i]*s.q + s.qp;
            if (qn > max_d) continue;
            if (sp < 200)   // stack is ample: CF depth is O(log q)
                stack[sp++] = {s.p, digits[i]*s.p + s.pp, s.q, qn};
        }
    }
}

Recent Updates

[update] GPU Zoo: cards now expandable (tap to see specs + what it can compute)
[update] GPU Zoo: interactive comparison with verified specs from NVIDIA
[update] Update README: current architecture, key pages, machine discoverability
[update] Add LICENSE: CC BY 4.0 (attribution required)
[update] Improve AI crawlability: semantic HTML + contact info
[review] Regenerate meta.json + certifications.json (now auto-generated)
[update] Add /meta.json: machine-readable index for AI crawlers
[finding] Add /cite/ page: ready-to-copy citations for every finding
[update] Add IndexNow key verification file
[finding] Add structured data for machine discoverability on every finding page