About & Experience

My background, education, professional journey, and technical expertise

Who I Am

I'm a passionate AI/ML Engineer and Researcher who recently graduated from Northeastern University with a Master's in Information Systems. My journey in technology began with a deep fascination with artificial intelligence and its potential to solve complex real-world problems.

With a strong foundation in computer science and extensive experience in machine learning, I specialize in developing innovative AI solutions, optimizing ML pipelines, and building scalable data systems. My work spans computer vision, natural language processing, and distributed computing, with a particular focus on large language models and diffusion models.

I believe in the power of open-source collaboration and actively contribute to the AI/ML community through research, technical writing, and knowledge sharing. I recently contributed to the LLVM project by refactoring the `ilogbf128` math function to a header-only implementation in libc (LLVM libc open-source contribution). When I'm not coding or researching, you'll find me exploring the latest AI trends, mentoring fellow students, or contributing to open-source projects.

Education

Northeastern University

Boston, MA

Master of Science in Information Systems

2023 - 2025

Specializing in AI/ML, Data Engineering, and Distributed Systems. Focus on cutting-edge technologies including large language models, computer vision, and scalable data processing platforms.

Master's Thesis:

ModelOpt: Research Framework for Zero-Shot Computer Vision Model Optimization With Tree Search and Federated Knowledge Sharing, Northeastern University, 2025.

Anna University

Chennai, India

Bachelor of Technology in Computer Science

2019 - 2023

Comprehensive foundation in computer science fundamentals, algorithms, data structures, and software engineering principles. Graduated with distinction and active involvement in technical projects.

Work Experience

AI Engineer

Community Dream Foundation | Remote | Apr 2026 – Present
  • Designed and deployed LLM-powered backend services using FastAPI and LangChain, integrating RAG pipelines with grounded retrieval to surface context-aware, real-world information for end users across web and mobile surfaces.
  • Built generative UX features backed by structured LLM prompting strategies (chain-of-thought, few-shot, tool-use), enabling proactive and location-aware AI responses.
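The grounded-retrieval idea behind these RAG pipelines can be sketched in a few lines: score candidate documents against the query, keep the best matches, and assemble a prompt that constrains the model to that context. This is a minimal illustrative sketch only; the function names are hypothetical, a toy token-overlap score stands in for real vector embeddings, and the production system used LangChain components rather than hand-rolled retrieval.

```python
# Minimal sketch of a RAG retrieval step: score documents against a query,
# keep the top matches, and assemble a grounded prompt for the LLM.
# Toy Jaccard similarity stands in for real vector embeddings.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings (toy stand-in for embeddings)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    return sorted(docs, key=lambda d: jaccard(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model's answer in the retrieved context only."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The food bank opens at 9am on weekdays.",
    "Volunteer signup closes every Friday.",
    "Shelter beds are assigned nightly at 6pm.",
]
prompt = build_prompt("When does the food bank open?", docs)
```

Swapping the toy similarity for embedding-based search and the prompt template for a chain-of-thought or tool-use template gives the shape of the production pipeline.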

Graduate Teaching and Research Assistant

Northeastern University | Boston, MA | Jan 2025 – Dec 2025
  • Implemented knowledge distillation pipeline compressing Qwen image-to-image model into lightweight GAN architectures, achieving 65% parameter reduction while maintaining visual quality; integrated distillation with quantization-aware training for efficient deployment.
  • Authored the research paper BitSkip, investigating the compositional effects of quantization and early exit in LLMs.
  • Served as Teaching Assistant, mentoring students on distributed computing workflows including SLURM job scheduling, parallel processing architectures, and debugging assignments across CPU and GPU environments.
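The core of the knowledge-distillation step above is a temperature-scaled loss that pushes the small student's output distribution toward the large teacher's. This is an illustrative NumPy sketch of that loss only, not the actual pipeline (which distilled a Qwen image-to-image model into a GAN and combined distillation with quantization-aware training).

```python
import numpy as np

# Sketch of the temperature-scaled distillation loss: KL divergence between
# the teacher's and student's softened output distributions at temperature T.

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 so
    gradient magnitudes stay comparable across temperatures."""
    p = softmax(np.asarray(teacher_logits, dtype=float), T)
    q = softmax(np.asarray(student_logits, dtype=float), T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Identical logits give zero loss; mismatched logits give a positive loss.
match = distillation_loss([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
mismatch = distillation_loss([[1.0, 2.0, 3.0]], [[3.0, 2.0, 1.0]])
```

A higher temperature softens both distributions, transferring more of the teacher's "dark knowledge" about relative class similarities.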

AI Engineer

BulkBeings | Chennai, India | May 2024 – Aug 2024
  • Optimized deep learning frameworks for GPU performance by developing custom CUDA kernels and integrating them into PyTorch training pipelines for Mixtral and Llama models, achieving 42% training acceleration in an 8× A100 FSDP configuration through memory coalescing, warp-level primitives, and kernel fusion.
  • Designed and scaled ML cluster orchestration across cloud (GCP) and on-prem environments using Kubernetes; deployed Ray clusters for distributed training and online inference, managing GPU scheduling and resource allocation for multi-node training jobs.
  • Engineered high-performance GPU kernels for attention mechanisms and feedforward layers using CUDA (CUTLASS) and OpenMP; debugged gradient explosion in multi-GPU distributed training by implementing mixed-precision strategies and gradient clipping, reducing OOM errors by 85% through systematic memory profiling.
  • Built distributed data preprocessing pipeline with PySpark processing 200GB+ datasets, implementing sparse feature selection algorithms that reduced data transfer overhead by 35% while maintaining training convergence.
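The gradient-clipping fix mentioned above is typically done by global norm: all gradients are rescaled together so their combined L2 norm never exceeds a cap. This is an illustrative NumPy sketch of that idea; the real training loop would use PyTorch's built-in `torch.nn.utils.clip_grad_norm_`.

```python
import numpy as np

# Sketch of global-norm gradient clipping, a standard fix for gradient
# explosion in distributed training: rescale ALL gradients by one factor
# so their combined L2 norm stays below max_norm.

def clip_by_global_norm(grads: list[np.ndarray], max_norm: float) -> list[np.ndarray]:
    """Rescale gradients jointly so the global L2 norm is <= max_norm."""
    total = sum(float((g ** 2).sum()) for g in grads) ** 0.5
    scale = min(1.0, max_norm / (total + 1e-12))  # no-op when already small
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # global norm = 13
clipped = clip_by_global_norm(grads, max_norm=1.0)
```

Clipping by the global norm (rather than per-tensor) preserves the direction of the overall update, which is why it pairs well with mixed-precision training where occasional loss-scale spikes inflate every gradient at once.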

ML Engineer (Research)

BulkBeings | Chennai, India | May 2023 – Dec 2023
  • Prototyped and productionized an OCR model (ViT+CRNN) achieving 98% precision, and made it 42% faster via ONNX/CUDA optimization; deployed on an AWS EC2 instance behind API Gateway with sub-100ms p99 latency.
  • Implemented observability stack with Prometheus, Grafana, and OpenTelemetry for ML inference services; built custom metrics dashboards tracking GPU utilization, model latency (P50/P95/P99), and throughput; configured alerting for SLA violations.
  • Developed a two-stage Conv1D-Transformer architecture for beat-level ECG classification, achieving an 89% F1-score on both ectopic and normal beats; applied quantization and kernel fusion to deploy on an L4 GPU within SLA constraints.
  • Built automated retraining pipelines with Python scripts that extracted data from production databases using SQL and Airflow, reducing model refresh cycle from 2 weeks to 3 days and improving prediction accuracy by 12%.
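The P50/P95/P99 latency metrics tracked on those observability dashboards reduce to percentile computations over request samples. This sketch shows the nearest-rank version for intuition only; in production these came from Prometheus histograms (e.g. `histogram_quantile` over bucketed counters), not in-process arrays.

```python
# Sketch of P50/P99 latency metrics: nearest-rank percentile over a batch of
# per-request latencies. Illustrative only; production used Prometheus
# histograms, and the sample values below are made up.

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample covering p% of requests."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

latencies_ms = [12, 15, 14, 90, 13, 16, 11, 14, 15, 13]  # one tail outlier
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
```

The point of alerting on P99 rather than the mean is visible here: a single 90 ms outlier barely moves the median but dominates the tail, which is exactly what SLA violations look like in practice.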

ML Engineer Intern

Velozity Global Solutions Pvt | Chennai, India | Jan 2022 – Aug 2022
  • Architected end-to-end ML pipelines in AWS, implementing predictive segments (high-value, at-risk) using XGBoost on behavioral patterns and validating clusters against campaign response data, resulting in a 45% increase in campaign conversion rates.
  • Identified critical data leakage in feature pipeline causing 20% overestimation of model performance; redesigned temporal feature extraction logic ensuring proper train-test split, leading to more reliable production deployments.
  • Led a cross-functional team developing a retail mix optimization system for a supermarket chain in India using Bayesian hierarchical models (PyMC3), increasing recurring-customer LTV prediction accuracy by 34%; shared optimization techniques with the ML community.
  • Built monitoring system detecting performance degradation in production models; implemented automated retraining pipeline triggered by drift detection, maintaining model accuracy above 90% threshold and preventing $50K potential revenue loss.
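The temporal-leakage fix above comes down to one rule: features for any prediction date may use only records strictly before the cutoff, so the test period never informs training. A minimal sketch of that split, with hypothetical field names:

```python
# Sketch of a leakage-safe temporal split: train on rows strictly before the
# cutoff, test on rows at or after it. Field names ("date", "clicks") are
# hypothetical; ISO-format date strings compare chronologically.

def temporal_split(rows: list[dict], cutoff: str):
    """Split time-stamped rows into train (< cutoff) and test (>= cutoff)."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test

rows = [
    {"date": "2022-01-10", "clicks": 3},
    {"date": "2022-02-01", "clicks": 7},
    {"date": "2022-03-05", "clicks": 2},
]
train, test = temporal_split(rows, cutoff="2022-02-15")
```

A random shuffle-split on the same data would let post-cutoff behavior leak into training features, producing exactly the kind of overestimated offline metrics described in the bullet.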

Technical Skills

Programming Languages

C++, Python, CUDA, HIP, Triton, C, SQL, PySpark, R, Shell Scripting

ML Frameworks & Libraries

PyTorch, TensorFlow, JAX, Hugging Face TRL, Unsloth, vLLM, Scikit-learn, XGBoost

Cloud & Distributed

AWS, GCP, Docker, Kubernetes, HPC, Multi-GPU, SageMaker, Vertex AI

Technical Achievements

GPU Kernel Optimization

Developed and optimized high-performance GPU kernels for inference/training workloads, demonstrating deep knowledge of memory hierarchy and compute/memory-bound optimization strategies.

Compiler Integration

Worked with GPU programming platforms (CUDA, HIP) to optimize deep learning frameworks across hardware architectures, including AMD GPUs, with correct stream integration.

Performance Crisis Resolution

Identified 70% redundant model calls through profiling; architected Redis caching system reducing monthly GPU costs by $12K and inference latency by 42%.
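The caching idea behind that win can be sketched simply: key each inference request by a hash of its input and serve repeats from the cache instead of the GPU. This illustrative version uses an in-memory dict and hypothetical names; the production system used Redis with TTLs and eviction policies.

```python
import hashlib

# Sketch of input-hash inference caching: repeated prompts are served from
# the cache instead of re-running the model. In-memory dict stands in for
# the Redis layer used in production.

class InferenceCache:
    def __init__(self):
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt: str, model_fn) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.store:
            self.hits += 1              # redundant model call avoided
            return self.store[key]
        self.misses += 1
        self.store[key] = model_fn(prompt)  # only misses hit the GPU
        return self.store[key]

cache = InferenceCache()
for p in ["hello", "hello", "world", "hello"]:
    cache.get_or_compute(p, lambda s: s.upper())  # model_fn is a stand-in
```

With 70% of calls redundant, the hit rate directly converts to GPU capacity: every cache hit is a model invocation, and its cost, that never happens.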