Micro-LLM
Training a tiny language model from scratch on an RTX 3080 - learning CUDA the hard way
I got myself an RTX 3080 and had no idea what I was getting into. I wanted to actually understand what CUDA could do - not just read about it, but get my hands dirty.

Not My First Rodeo
Back in 2019, I built Sadhu-Kamra - a GPT-2-powered Twitter bot that mashed together tweets from Sadhguru and Kunal Kamra. But I was just using a pre-trained model. I never actually trained anything from scratch, and I didn't understand what was happening under the hood.
This time, I wanted to do it properly.
Memory Is Everything
Turns out, CUDA is pretty wild. The RTX 3080 has 10GB of VRAM, which sounds like a lot until you start loading models and data.
Key learnings:
- Mixed precision training keeps activations in fp16, roughly halving their memory footprint, so the same 10GB fits about twice the batch
- Gradient accumulation lets you simulate batch sizes that don't fit on the card (both shown in the sketch after this list)
- You can't just throw everything at the GPU and hope it works
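Here's roughly what those two tricks look like together in PyTorch. This is a minimal sketch with toy stand-ins for the model and data, not the repo's actual train.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins so the sketch runs on its own - the real training
# script uses the actual GPT model and dataset, not these.
vocab_size, block_size = 256, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, 128),
    nn.Linear(128, vocab_size),
).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

scaler = torch.cuda.amp.GradScaler()
accum_steps = 4  # effective batch size = micro-batch size * 4

optimizer.zero_grad(set_to_none=True)
for step in range(100):
    # Random token batches in place of real data
    x = torch.randint(vocab_size, (8, block_size), device="cuda")
    y = torch.randint(vocab_size, (8, block_size), device="cuda")

    # Run the forward pass in fp16 where it's numerically safe
    with torch.cuda.amp.autocast():
        logits = model(x)
        loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))

    # Scale the loss so fp16 gradients don't underflow; divide by
    # accum_steps so the accumulated gradient matches a big-batch average
    scaler.scale(loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)  # unscales grads, skips the step on inf/nan
        scaler.update()
        optimizer.zero_grad(set_to_none=True)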
It's Actually Learning
I built a tiny GPT-style model with ~7M parameters. It's not going to write Shakespeare (well, it tries), but it's a real language model that I trained myself.
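For a sense of scale: a decoder-only transformer has roughly 12·L·d² parameters in its blocks plus V·d in the token embeddings. The numbers below are my guess at one plausible shape that lands near 7M, not necessarily what's in this repo's config:

# Back-of-the-envelope parameter count for a small GPT-style model.
# These dimensions are illustrative guesses, not the repo's actual config.
n_layer, d_model, vocab_size = 8, 256, 4096

# Each block: ~4*d^2 for attention (Q, K, V, output projections)
# plus ~8*d^2 for the MLP (two linear layers with a 4x expansion)
block_params = 12 * n_layer * d_model**2
embed_params = vocab_size * d_model  # token embeddings, often tied to the output head

total = block_params + embed_params
print(f"~{total / 1e6:.1f}M parameters")  # ~7.3M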
The training process is fascinating. Watching the loss decrease, seeing the model learn patterns, and then generating text that's... sometimes coherent, sometimes hilariously bad. But it's learning.
What I Took Away
- Transformer architectures (attention is all you need, apparently)
- Learning rate scheduling and gradient clipping (sketched after this list)
- Tokenization (way more complex than I thought)
- Mixed precision training
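To make the scheduling point concrete, here's a warmup-plus-cosine learning rate schedule with gradient clipping. The shape of the schedule is a common choice for small GPTs; the specific numbers and the toy model are illustrative, not pulled from train.py:

import math
import torch

max_lr, min_lr = 3e-4, 3e-5
warmup_steps, max_steps = 100, 5000

def lr_at(step: int) -> float:
    if step < warmup_steps:
        # Linear warmup from ~0 up to max_lr
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay from max_lr down to min_lr
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

model = torch.nn.Linear(16, 16)  # toy stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr)

for step in range(max_steps):
    loss = model(torch.randn(4, 16)).pow(2).mean()  # dummy loss
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    # Cap the global gradient norm so one bad batch can't blow up training
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)  # apply the scheduled learning rate
    optimizer.step()

Warmup keeps the first few hundred updates from being dominated by noisy early gradients; clipping protects against the occasional exploding batch.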
The RTX 3080 handles a model this size comfortably. For learning and experimentation it's perfect: training runs that would take days on a CPU finish in hours.
Getting Started
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# Train a model
python train.py
# Test it out
python test.py --checkpoint checkpoints/model_final.pt --prompt "Hello"
# Or run the web interface
python chatbot_web.py
If you're curious about CUDA, GPU computing, or just want to train a tiny language model yourself, feel free to poke around the code.