llm by bayjarvis


LLM Training and Fine-tuning Projects

This repository contains various LLM training implementations using different optimization approaches.

Projects by Training Approach

Self-Supervised Alignment

  • MLX-GRPO: Group Relative Policy Optimization - A complete GRPO implementation for Apple Silicon using the MLX framework, with Qwen3-0.6B model support. It uses group-wise reward comparisons for alignment, without requiring human feedback data.
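The core idea behind GRPO can be sketched in a few lines: instead of training a learned value critic, each sampled completion is scored relative to the mean and standard deviation of the rewards in its own sampling group. This is a minimal illustrative sketch, not the MLX-GRPO code itself; the function name and reward values are hypothetical.

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantages: each completion is compared against
    the mean/std of its own sampling group (no learned value critic)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # guard against zero std for uniform groups
    return [(r - mean) / std for r in rewards]

# One prompt, a group of 4 sampled completions scored by a rule-based reward:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# A positive advantage means the completion beat its group's average.
```

These advantages then weight the policy-gradient update, so no separate human-preference reward model is needed when a programmatic reward (e.g. answer correctness) is available.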

Human Feedback-Based Training

Supervised Fine-tuning

Architectural Implementations

  • Mixture of Experts (MoE) in PyTorch - A from-scratch implementation of a sparse Mixture of Experts layer in PyTorch, demonstrating a key technique for building large, efficient language models.
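The sparse-MoE routing idea the project above demonstrates can be summarized as: a gating network scores all experts per token, only the top-k experts actually run, and their outputs are combined with softmax weights over the selected scores. The sketch below is a simplified NumPy illustration under assumed shapes (experts as plain linear maps), not the repository's PyTorch implementation.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Sparse MoE layer: route each token to its top-k experts and
    return the softmax-weighted sum of those experts' outputs."""
    logits = x @ gate_w                         # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, topk[t]]
        w = np.exp(chosen - chosen.max())
        w /= w.sum()                            # softmax over the selected experts only
        for weight, e in zip(w, topk[t]):
            out[t] += weight * (x[t] @ expert_ws[e])  # each expert: a simple linear map
    return out

# Hypothetical shapes: 4 tokens, model dim 8, 4 experts, top-2 routing.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = rng.normal(size=(4, 8, 8))
y = moe_forward(x, gate_w, expert_ws, k=2)      # same shape as x: (4, 8)
```

Because only k of the experts run per token, parameter count grows with the number of experts while per-token compute stays roughly constant, which is the efficiency argument behind sparse MoE layers.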