llm by bayjarvis
This repository collects LLM training implementations, each built around a different optimization approach.
- MLX-GRPO: Group Relative Policy Optimization - A complete GRPO implementation for Apple Silicon using the MLX framework, with support for the Qwen3-0.6B model. It relies on group comparisons among sampled completions for alignment, so no human feedback data is required (see the group-relative advantage sketch after this list).
- Harnessing Zephyr's Breeze: DPO Training on Mistral-7B-GPTQ - Direct Preference Optimization for language model alignment using human preference datasets on quantized models (see the DPO loss sketch after this list).
- Fine-tuning Zephyr 7B GPTQ with 4-Bit Quantization - Fine-tuning on custom data with 4-bit quantization for efficient inference and deployment (see the LoRA-on-GPTQ sketch after this list).
- Mixture of Experts (MoE) in PyTorch - A from-scratch implementation of a sparse Mixture of Experts layer in PyTorch, demonstrating a key technique for building large, efficient language models (see the top-k routing sketch after this list).
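
The following sketches are simplified illustrations of the techniques listed above, not the repository's actual code; names and hyperparameters are illustrative. First, a minimal sketch of GRPO's core idea: rewards for a group of completions sampled from the same prompt are normalized against the group mean and standard deviation, giving per-completion advantages without a learned value function or human preference labels.

```python
# Hypothetical sketch of GRPO's group-relative advantage, not the repo's actual code.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize rewards within each group of sampled completions.

    rewards: (num_prompts, group_size) scalar rewards, one row per prompt.
    Returns advantages of the same shape, used to weight the policy-gradient
    term for each completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.2],
                        [0.9, 0.9, 0.1, 0.3]])
advantages = group_relative_advantages(rewards)
```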
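Next, a minimal sketch of the DPO objective, assuming the standard formulation over chosen/rejected preference pairs; the repo's training script may differ in details such as the value of beta or how log-probabilities are aggregated.

```python
# Hypothetical sketch of the DPO loss, not the repo's training script.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each argument is a (batch,) tensor of summed token log-probabilities for
    the chosen / rejected completions under the policy and a frozen reference model."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    logits = beta * (pi_logratios - ref_logratios)
    # Maximize the margin by which the policy prefers the chosen completion
    # relative to the reference model.
    return -F.logsigmoid(logits).mean()
```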
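For the 4-bit fine-tuning entry, a hedged sketch of the usual pattern: load a pre-quantized GPTQ checkpoint with Hugging Face transformers, then attach trainable LoRA adapters with peft so only a small number of parameters are updated. The model id and LoRA hyperparameters below are assumptions for illustration, not the repo's actual configuration, and loading GPTQ weights requires the appropriate GPTQ backend to be installed.

```python
# Hypothetical sketch: LoRA adapters on a pre-quantized GPTQ model (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "TheBloke/zephyr-7B-beta-GPTQ"  # assumed model id for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Freeze the 4-bit base weights and train only small LoRA adapter matrices.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,          # assumed hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```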
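Finally, a minimal sketch of a sparse (top-k routed) Mixture of Experts layer in PyTorch: a router scores every expert per token, only the top-k experts run for each token, and their outputs are combined with the renormalized router weights. This is a simplified illustration, not the repository's exact implementation.

```python
# Minimal sketch of a sparse top-k MoE layer; illustrative, not the repo's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2, hidden: int = 2048):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)  # router: one score per expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        scores = self.gate(x)                               # (batch, seq, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                 # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route a small batch of token embeddings through the layer.
layer = SparseMoE(dim=512)
y = layer(torch.randn(2, 16, 512))
```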