Adding a lot of skills for Hermes Gerhard
@@ -1,6 +1,6 @@
 ---
 name: fine-tuning-with-trl
-description: Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace Transformers.
+description: "TRL: SFT, DPO, PPO, GRPO, reward modeling for LLM RLHF."
 version: 1.0.0
 author: Orchestra Research
 license: MIT