Open computational mathematics. AI-audited, not peer-reviewed. All code and data open for independent verification.
Interactive Mathematics
Live visualizations of our computational results. Text layout powered by Pretext — measuring and flowing text without DOM reflows.
Distributed Mathematics Research
Every Google Colab instance comes with a free NVIDIA T4 GPU. When you open one of these notebooks, it detects your GPU, compiles the CUDA kernels for your hardware, and has you running experiments on open mathematical conjectures within a minute.
This enables collaborative computation across multiple users. Each Colab session that runs an experiment expands the range of data checked — a new digit set, a new density, a new character table. The results are downloadable and submittable as a PR. No specialized degree required. No funding required. Just a browser and some curiosity.
MCP Explorer + GPU Compute
Query 23 tools (arXiv, zbMATH, OEIS, LMFDB, Lean/Mathlib), explore datasets, then compile and run CUDA experiments on your free GPU. No API key needed for exploration; GPU compute is one click.
Research Agent + GPU Compute
The full autonomous loop: compile CUDA kernels, run experiments, harvest results, AI peer review, fix issues. Auto-detects your GPU (T4/A100/L4/4090/H100). Free Gemini API key for reviews, or use OpenAI/Anthropic.
The GPU Zoo
These CUDA kernels run on GPUs from a free Colab T4 to multi-GPU setups. Tap any card to expand.
Free Tesla T4
Turing architecture (2018) · 12nm · Colab free tier
Can verify Zaremba to 10^10 (1.25 GB bitset). Kronecker S₂₀ in 4 sec. Quick wins in seconds.
Cannot: Zaremba 10^11+ (12.5 GB bitset), full S₄₀ Kronecker triple-sum.
Budget RTX 4060
Ada Lovelace architecture (2023) · 5nm · ~$300 street price
Quick wins instantly. Zaremba to 10^9 (0.12 GB). Kronecker S₂₀. Good starter GPU for math.
Cannot: Zaremba 10^10+ (1.25 GB barely fits). Limited by 8 GB VRAM.
Enthusiast RTX 4090
Ada Lovelace architecture (2022) · 5nm · The gamer's beast (~$1,600)
Zaremba to 10^11 (12.5 GB fits in 24 GB). Kronecker S₃₀. Serious compute for the price.
Cannot: multi-GPU NVLink. No HBM (GDDR6X latency). Single card only.
Pro A100
Ampere architecture (2020) · 7nm · Colab Pro or cloud
Zaremba to 10^11 easily. Kronecker S₃₀ in minutes. Class numbers at serious scale.
Cannot: 8-GPU NVLink workloads (single card).
Consumer RTX 5090
Blackwell architecture (2025) · 4nm · Our local dev machine
Hausdorff spectrum (all 2²⁰ subsets, 72 min). Faster than H100 on FP32. Zaremba to 10^10 in seconds.
Cannot: multi-GPU NVLink. Limited to 32 GB (no S₄₀ full triple-sum).
Datacenter H100 SXM
Hopper architecture (2022) · 4nm · Cloud providers
80 GB VRAM handles most large problems. NVLink for multi-GPU. The workhorse of AI datacenters.
Cannot: match B200 bandwidth (3.4 vs 8 TB/s). No FP4 tensor cores.
Our Cluster B200 SXM
Blackwell architecture (2024) · 4nm · Our 8×B200 DGX cluster (1.43 TB total)
Everything. Zaremba 10^12 (122 GB bitset). Ramsey 4.4T extensions in 130 sec. Kronecker S₃₀ in 7 min. 8 GPUs linked via NVLink 5.
Costs: ~$2M for the DGX system. That's what the merch funds.
How This Site Was Built
A human picked the problems. An AI wrote the CUDA kernels. The human decided which results mattered. The AI managed the data pipeline. Neither could have done this alone.
That's not a grand statement about the future of mathematics — it's just what happened here. One person with a GPU cluster and an AI assistant produced computational results across a dozen open problems in number theory and combinatorics. Some of those computations would have taken months of manual coding. With AI collaboration, they took hours.
These are computational results, not proofs. None of this work has been peer-reviewed. The code is open, the data is open, and the reproduction commands are on every page — so anyone can check our work. That's the point. We're not claiming authority. We're publishing the computations and inviting scrutiny.
The AI didn't have mathematical insight. It had speed. It could write a CUDA kernel in minutes, debug it against test cases, optimize memory access patterns, and manage uploads to Hugging Face — all tasks that are tedious for a human but trivial for an AI. The human decided what to compute and why it mattered. The AI decided how to compute it efficiently.
There are thousands of open conjectures where the next step is "compute this for larger values and see what happens." Most of those computations never get done because writing the GPU kernel, setting up the data pipeline, and publishing the results is more work than most researchers want to do for a speculative computation. AI collaboration removes that friction. The ideas were always there. Now they can be tested.
This site is an experiment in that process. Some of the results will turn out to be interesting. Some won't survive peer review. All of the data is here for you to use.
— Cahlen Humphreys, 2026. Built with Claude.
Zaremba Witness Distribution
Drag the slider to explore how the witness ratio α(d)/d concentrates around 0.171 as d grows. Each bubble is a value of d, sized by its CF length, colored by max partial quotient. Finding → Experiment →
Spectral Gap Landscape
The congruence spectral gaps σm for all 1,214 square-free moduli up to m=1999. Hover over any point to see the modulus, factorization, and gap value. The red dashed line marks the Bourgain-Kontorovich threshold. Finding → Experiment →
CF Tree Explorer
The continued fraction tree with partial quotients in {1, …, 5}. Each path from root to leaf gives a Zaremba denominator. Watch the tree grow and see how quickly it covers the integers. Finding →
Zaremba Density Phase Transition
For digit set A, the Zaremba density measures what fraction of integers d have a coprime a/d with all CF partial quotients in A. A sharp phase transition occurs between A={1,2} (58%) and A={1,2,3} (99.997%). The transition is controlled by the Hausdorff dimension crossing 1/2. Finding →
Class Number Distribution
Distribution of class numbers h(d) for 2.74 billion real quadratic fields Q(√d) with d ∈ [10⁹, 10¹⁰). Dominated by powers of 2 (genus theory). The Cohen-Lenstra prediction of 75% with h=1 is far from realized at this scale — convergence is non-monotone. Finding → Experiment →
Hausdorff Dimension Spectrum
Hausdorff dimension dimH(EA) for every subset A ⊆ {1,…,10}, grouped by cardinality |A|. Each dot is one of the 1,023 nonempty subsets. Adding digit 1 has the largest effect on dimension — 5 digits including 1 beat 9 digits without it. Finding → Experiment →
CUDA Kernels: A Rosetta Stone
The same computation explained three ways. The mathematician and computer scientist think they speak different languages — but they're describing the same structure. The third column shows what actually happened: a human directing an AI agent. That's how the kernel got built. Not by hand-coding CUDA. By conversation.
The Problem: Is this number "Zaremba-good"?
Given an integer d, does there exist a coprime a/d whose continued fraction has all partial quotients at most 5?
For the Mathematician
We study the semigroup generated by matrices M(a) = ((a,1),(1,0)) for a = 1,...,5 acting on pairs (p,q). Each product gives a convergent p/q of a bounded CF.
The convergent recurrence: p_new = a*p + p_prev, q_new = a*q + q_prev. This is just matrix multiplication in GL(2,Z) — each M(a) has determinant −1. Each step extends the CF by one partial quotient.
Enumerate the tree of all finite products. At each node, the denominator q is a "covered" integer. If every integer up to N appears as some q, Zaremba's conjecture holds to N.
The tree grows exponentially (branching factor 5), but most branches produce large q quickly. The density of covered integers depends on the Hausdorff dimension δ of the underlying Cantor set E₅ — specifically, on whether 2δ > 1.
Translates this tree enumeration into a GPU kernel where each thread explores an independent subtree. The mathematician says "enumerate the semigroup." The AI writes the parallel DFS.
For the Computer Scientist
A 5-ary tree where each node stores a state (p_prev, p, q_prev, q). Children are generated by multiplying the state by 5 different small integers. We need to mark which values of q appear anywhere in the tree.
State transition: (pp, p, qp, q) + digit a gives (p, a*p+pp, q, a*q+qp). Four integer multiplies and two adds per step. The tree terminates when q exceeds a threshold N.
Enumerate the tree to depth k on CPU, producing ~5^k independent subtrees. Each GPU thread takes one subtree and does iterative DFS, marking q values in a shared bitset via atomicOr.
The tree is unbalanced — branches with small digits (a=1) go deepest. Prefix depth k should be tuned so each GPU thread gets roughly equal work. Memory: one bit per integer up to N. For N=10^9, that's 125 MB — fits in any GPU.
Understands the mathematical structure well enough to choose the right parallelization strategy. The CS person says "parallelize the tree search." The AI knows that the branching factor and Hausdorff dimension determine the optimal prefix depth.
What Actually Happened
These are real quotes from the session that built this kernel. The human caught a validation bug (76 vs 27) and pushed back on single-GPU usage. The agent debugged and fixed. This is how the code on this site was actually written — not by hand, but through conversation.
The CPU density code has been running 12 hours. A CUDA kernel would finish in minutes.
The CF tree enumeration is embarrassingly parallel. Each branch is independent. I'll generate prefixes on CPU to depth 8, giving ~6500 GPU threads, then each thread does DFS on its subtree.
Why are we only using 1 GPU? We have 8.
Each digit set at each range is independent. Let me launch all 8 — different digit sets in parallel.
Validation says 76 uncovered instead of 27. Something's wrong.
The prefix depth skips CFs shorter than 8. Depths 4-7 aren't being marked. I need the CPU to handle all depths below PREFIX_DEPTH, then GPU handles the rest.
27 exceptions confirmed. Same 27 as at 10^6. Zero new ones to 10^10.
The human never wrote a line of CUDA. The human never debugged a thread index. The human said what the computation should do and whether the answer was right. The agent handled the engineering. That's how a mathematician who understands the problem but doesn't want to spend weeks on GPU plumbing gets a result in an afternoon.
Notice: the mathematician's "semigroup enumeration" and the computer scientist's "parallel tree search" are the same algorithm. The "Hausdorff dimension" and the "tree balance factor" are the same number. The "convergent recurrence" and the "state transition" are the same four lines of code.
The fields use different words for identical structures. AI doesn't care about the vocabulary — it sees the computation underneath. That's why it can translate between them, and why a mathematician with an AI assistant can write GPU kernels, and a systems programmer with an AI assistant can explore number theory.
The Actual Kernel
Here's the CUDA code. 30 lines. Both audiences can now read it.
__global__ void enumerate_subtrees(
    const uint64_t *prefixes, int num_prefixes,
    const int *digits, int num_digits,
    unsigned int *bitset, uint64_t max_d)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= num_prefixes) return;
    // Load this thread's starting state (from CPU-generated prefix)
    uint64_t pp = prefixes[tid*4],   p = prefixes[tid*4+1];
    uint64_t qp = prefixes[tid*4+2], q = prefixes[tid*4+3];
    // Mark the denominator (atomicOr needs a 32-bit word, not a byte)
    atomicOr(&bitset[q >> 5], 1u << (q & 31));
    // DFS: iterative stack, each thread independent
    struct { uint64_t pp, p, qp, q; } stack[200];
    int sp = 0;
    for (int i = num_digits-1; i >= 0; i--) {
        uint64_t qn = digits[i]*q + qp;
        if (qn > max_d) continue;
        stack[sp++] = {p, digits[i]*p + pp, q, qn};
    }
    while (sp > 0) {
        auto s = stack[--sp];
        atomicOr(&bitset[s.q >> 5], 1u << (s.q & 31));
        for (int i = num_digits-1; i >= 0; i--) {
            uint64_t qn = digits[i]*s.q + s.qp;
            if (qn > max_d) continue;
            if (sp < 200)
                stack[sp++] = {s.p, digits[i]*s.p + s.pp, s.q, qn};
        }
    }
}