Thoughts on AI/ML, research findings, and technical insights
Democratizing AI: Fine-tuning Large Language Models with Limited Resources
I was impressed by how fine-tuned large language models can outperform retrieval-augmented systems, especially at inference time. So I set out to fine-tune an open-source model, Meta's Llama 3.1 (8B parameters). But most sources insisted you needed huge, expensive GPUs and storage capacity I simply didn't have...
A deep dive into porting transformer kernels from NVIDIA CUDA to AMD ROCm/HIP, achieving 12.1 ms token generation with 79-83% memory bandwidth utilization on the MI300X architecture.
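To make the bandwidth figure concrete: decode-phase token generation is usually memory-bound, so utilization is essentially the bytes moved per token, divided by latency, relative to peak bandwidth. Below is a minimal sketch of that arithmetic, assuming the MI300X's quoted peak HBM bandwidth of roughly 5.3 TB/s; the per-token byte count is a placeholder chosen to illustrate the formula, not a measurement from the article.

```python
def bandwidth_utilization(bytes_per_token: float,
                          token_latency_s: float,
                          peak_bw_bytes_per_s: float) -> float:
    """Fraction of peak memory bandwidth achieved while generating one token."""
    achieved_bw = bytes_per_token / token_latency_s  # bytes/s actually moved
    return achieved_bw / peak_bw_bytes_per_s

# Placeholder figures for illustration only: ~51 GB moved per token
# (weights plus KV-cache traffic), 12.1 ms/token, MI300X peak ~5.3 TB/s.
print(f"{bandwidth_utilization(51e9, 12.1e-3, 5.3e12):.0%}")  # -> 80%
```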
Learn how to fine-tune large language models without expensive hardware, using LoRA and smart optimization techniques.
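For a sense of how this works in practice, here is a minimal sketch of the LoRA-plus-quantization recipe using Hugging Face's transformers, peft, and bitsandbytes libraries; the rank, alpha, and target modules are common illustrative defaults, not the article's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the 8B base model with 4-bit quantized weights so it fits on a
# single consumer GPU instead of a datacenter card.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model = prepare_model_for_kbit_training(model)

# Train small low-rank adapters instead of the full weight matrices.
lora = LoraConfig(
    r=16,                                 # adapter rank (illustrative)
    lora_alpha=32,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of weights
```

From there a standard training loop (e.g. transformers' Trainer) applies, and only the small adapter weights need to be saved.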
Discover additional articles, research insights, and technical deep-dives on my Substack
Visit My Substack

Get notified about new blog posts, research findings, and AI/ML insights