
Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch

Hacker News · Jun 28, 2026, 7:38 PM · Also reported by 1 other source

Key takeaways

A GPT-2-class language model built entirely from scratch in C/CUDA: no PyTorch, no autograd, no ML libraries.
This is a research/educational artifact, built in public.
A residual block computes x = x + f(x), which can be read as a step of numerical integration.

A GPT-2-class language model built entirely from scratch in C/CUDA: no PyTorch, no autograd, no ML libraries. The forward and backward passes are written and verified by hand, and the whole training pipeline lives in this repo: a hand-written byte-level BPE tokenizer, pretraining on a books + web corpus, and supervised fine-tuning into a chat model (RLHF/DPO planned). A small showcase model runs on CPU (libm + OpenMP). A full from-scratch CUDA engine, with cuBLAS matmuls and a hand-written FlashAttention, validated against a CPU reference by a full-model gradient check, trains a ~116M-parameter model on a single RTX 4070.

Status & honesty. This is a research/educational artifact, built in public. At ~116M parameters trained on a single consumer GPU, it is a text generator in the spirit of GPT-2-small: fluent-ish English, no real world knowledge. It is not a capable assistant: the chat model demonstrates that the pretrain→SFT pipeline works end to end, but it is not a useful chatbot. The point of the project is the from-scratch engineering and the complete, understandable training pipeline.

make check              # verify the backward pass (gradient check, double precision)
make                    # build the training binary
./nanoeuler train       # train the small showcase model (~0.76M params)
./nanoeuler train big   # train the larger model (~10M params; meant for a GPU)
./nanoeuler chat        # REPL: type a prompt, the model continues it

Why "Euler"? A residual block computes
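The key takeaways give the residual update as x = x + f(x) and suggest reading it as a step of numerical integration. The sketch below illustrates that reading under assumptions that are not taken from the NanoEuler repo: the function names are hypothetical, `f_decay` is a stand-in for a block's sublayer with f(x) = -x, and a step size `h` is made explicit so the update becomes the forward Euler step x ← x + h·f(x) for the ODE dx/dt = f(x). Setting `h = 1` recovers the plain residual form.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical stand-in for a block's sublayer f(x).
   With f(x) = -x, the ODE dx/dt = f(x) has the known solution
   x(t) = x(0) * exp(-t), which makes the sketch easy to check. */
void f_decay(const double *x, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = -x[i];
}

/* One residual update x <- x + h * f(x): a forward Euler step of size h.
   scratch must have room for n doubles. */
void residual_step(double *x, double *scratch, size_t n, double h) {
    assert(n > 0);
    f_decay(x, scratch, n);
    for (size_t i = 0; i < n; i++) x[i] += h * scratch[i];
}

/* Apply `layers` residual steps in sequence, like a stack of blocks.
   Assumption: every layer shares the same f, unlike a real transformer,
   where each block has its own weights. */
void integrate(double *x, double *scratch, size_t n, double h, int layers) {
    for (int l = 0; l < layers; l++) residual_step(x, scratch, n, h);
}
```

With `h = 0.01` and 100 layers, the stacked updates should approximate the exact solution at t = 1, that is, x(0)·exp(-1), up to the first-order error of the Euler method. In that sense depth plays the role of integration time.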

Article preview — originally published by Hacker News. Full story at the source.
