rlhf

Here are 593 public repositories matching this topic...

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Updated Jun 1, 2026
Python

LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

python machine-learning ai nextjs discord-bot assistant language-model chatgpt rlhf

Updated Aug 17, 2024
Python

RUCAIBox / LLMSurvey

The official GitHub page for the survey paper "A Survey of Large Language Models".

natural-language-processing pre-training pre-trained-language-models in-context-learning large-language-models llm llms chain-of-thought chatgpt rlhf instruction-tuning

Updated Mar 11, 2025
Python

InternLM / InternLM

Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).

chatbot chinese gpt pretrained-models llm long-context rlhf large-language-model flash-attention fine-tuning-llm

Updated Oct 30, 2025
Python

ymcui / Chinese-LLaMA-Alpaca-2

Phase two of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

nlp yarn llama alpaca 64k large-language-models llm rlhf flash-attention llama2 llama-2 alpaca-2 alpaca2

Updated Apr 19, 2026
Python

huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences

transformers llm rlhf

Updated May 26, 2026
Python

Gen-Verse / OpenClaw-RL

OpenClaw-RL: Train any agent simply by talking

async gui-application coding slime tinker memory-systems skill-learning rlhf sglang grpo on-policy-distillation openclaw-skills open-claw

Updated May 23, 2026
Python

transformerlab / transformerlab-app

The open source research environment for AI researchers to seamlessly train, evaluate, and scale models from local hardware to GPU clusters.

electron transformers llama lora diffusion mlx diffusion-models llms stability-diffusion rlhf

Updated Jun 1, 2026
Python

argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

nlp machine-learning natural-language-processing ai weak-supervision developer-tools active-learning annotation-tool text-annotation weakly-supervised-learning human-in-the-loop mlops text-labeling gpt-4 llm langchain rlhf

Updated Jun 1, 2026
Python

Kiln-AI / Kiln

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

Updated Jun 2, 2026
Python

PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

chameleon multimodal dpo large-language-models rlhf vision-language-model

Updated Nov 27, 2025
Python

rasbt / reasoning-from-scratch

Implement a reasoning LLM in PyTorch from scratch, step by step

python machine-learning reinforcement-learning ai deep-learning pytorch artificial-intelligence reasoning distillation large-language-models llm llms chain-of-thought rlhf test-time-compute reasoning-models grpo math-reasoning inference-time-scaling

Updated Jun 1, 2026
Jupyter Notebook

opendilab / awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

reinforcement-learning deep-learning deep-reinforcement-learning large-language-models human-feedback rlhf

Updated May 20, 2026

hiyouga / ChatGLM-Efficient-Tuning

Efficient fine-tuning of ChatGLM-6B with PEFT

transformers pytorch lora language-model alpaca fine-tuning peft huggingface chatgpt rlhf chatglm qlora chatglm2

Updated Oct 12, 2023
Python

Docta-ai / docta

A Doctor for your data

data language-model data-curation data-centric-ai data-diagnosis data-centric-machine-learning rlhf

Updated Jan 14, 2025
Python

argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

python ai openai synthetic-data synthetic-dataset-generation huggingface llms rlhf rlaif

Updated Jun 1, 2026
Python

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

rlhf agentic rlvr

Updated Jun 2, 2026
Python

walkinglabs / hands-on-modern-rl

🚀 An open-source, hands-on curriculum bridging the gap from basic RL concepts to LLM alignment, RLVR, and advanced Agentic systems.

agent tutorial pytorch dpo reinforcemen llm rlhf agentic agentic-ai grpo llm-alignment agentic-rl

Updated Jun 1, 2026
Python

tatsu-lab / alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

nlp deep-learning leaderboard evaluation instruction-following foundation-models large-language-models rlhf

Updated Aug 9, 2025
Jupyter Notebook

natolambert / rlhf-book

Textbook on reinforcement learning from human feedback

ai alignment rlhf

Updated Jun 1, 2026
Python
