parent 37730095a9
commit fd32bcacfd
@@ -1,5 +0,0 @@
# Agent Action

- [ ] Research a bit more on this because I'm a bit outdated on the training side
- [ ] What does the dataset look like?
- [ ] How to evaluate the performance?
@@ -0,0 +1,10 @@
# Agentic Reward Modeling

- <https://medium.com/@techsachin/agentic-reward-modeling-combine-human-preferences-with-verifiable-correctness-signals-for-reliable-76c408b3491c>
- <https://arxiv.org/pdf/2502.19328>
- <https://github.com/THU-KEG/Agentic-Reward-Modeling>
- <https://www.themoonlight.io/en/review/agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals-for-reliable-reward-systems>

- [x] Research a bit more on this because I'm a bit outdated on the training side
- [x] What does the dataset look like?
- [x] How to evaluate the performance?
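The core recipe, as I understand it from the links above: combine a learned preference reward with verifiable correctness checks (factuality, instruction following) instead of trusting the reward model alone. A minimal sketch of that combination — every function name and weight below is a placeholder of mine, not anything from the THU-KEG repo:

```python
# Hedged sketch of agentic reward modeling: mix a learned preference score
# with verifiable correctness signals. All names/weights are placeholders.
def preference_score(prompt: str, response: str) -> float:
    """Stand-in for a learned reward model; returns a score in [0, 1]."""
    return 0.5  # placeholder

def factuality_check(response: str) -> float:
    """Stand-in for a verification agent (e.g. claim checking); 0 or 1."""
    return 1.0  # placeholder

def instruction_check(prompt: str, response: str) -> float:
    """Stand-in for a hard-constraint checker (length, format, ...); 0 or 1."""
    return 1.0  # placeholder

def agentic_reward(prompt: str, response: str,
                   w_pref: float = 0.5, w_fact: float = 0.25, w_inst: float = 0.25) -> float:
    # Weighted mix; real systems may gate on the checks instead of averaging
    return (w_pref * preference_score(prompt, response)
            + w_fact * factuality_check(response)
            + w_inst * instruction_check(prompt, response))
```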
@@ -0,0 +1,21 @@
# Anti-dumb reward exact match chunk prompt

@reward-functions.md @train_autodidact_1B.py @rl_helpers.py

I need to implement this function; check the idea in @reward-functions.md. The function needs to compare against the ground-truth document chunk that the question and answer were created from, which is:

- data in data/data_v1/saved_data/questions.json
- data sample:

```
{
    "chunk_id": 1,
    "question": "What was the location of the first pad abort of the mission?",
    "answer": "White Sands Missile Range",
    "difficulty": "easy"
},
```

- chunk content is in data/data_v1/saved_data/chunks.pkl
- the chunk id is mapped to the chunk content
- I'm dumb, please make it easy for me to implement
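For reference, a minimal sketch of what this reward could look like: load the chunk_id → chunk mapping, then score a rollout 1.0 if the ground-truth chunk shows up in the agent's retrieved context. The function name, signature, pickle layout, and exact-substring matching rule are my assumptions, not the actual interface in rl_helpers.py:

```python
# Hedged sketch only; signature and matching rule are assumptions.
import json
import pickle

with open("data/data_v1/saved_data/questions.json") as f:
    questions = json.load(f)
with open("data/data_v1/saved_data/chunks.pkl", "rb") as f:
    chunks = pickle.load(f)  # assumed: chunk_id -> chunk text

def chunk_match_reward(question_idx: int, retrieved_text: str) -> float:
    """1.0 if the ground-truth chunk appears in what the agent retrieved."""
    chunk_id = questions[question_idx]["chunk_id"]
    gt_chunk = chunks[chunk_id]
    return 1.0 if gt_chunk.strip() in retrieved_text else 0.0
```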
@@ -1,7 +1,7 @@
 # Adaptive Search Behavior
 
 - [Agent Action](agent-action.md) -> mostly recognize missing something -> perform "refined query"
-- [ ] As a model trainer, I want to inspect the full chat state of the agent to know what's going on so I can improve it -> implement a simple cli inspect tool after training, just print out full chat state.
+- [x] As a model trainer, I want to inspect the full chat state of the agent to know what's going on so I can improve it -> implement a simple cli inspect tool after training, just print out full chat state.
 - Example from AutoDidact:
 
 ```markdown
[3 binary image files added: 1.6 KiB, 2.1 MiB, 4.8 MiB]
@@ -0,0 +1,373 @@
# Brain Rotting Multiple GPU Workflow for Dummies

## Problem: Working with Multiple GPUs Without Race Conditions

Running multiple training processes on different GPUs can lead to:

- Output directory conflicts
- Checkpoint corruption
- Resource contention
- Difficult debugging and tracking

This guide gives you dead simple solutions using only basic scripts.

## Directory Structure for Sanity

First, set up a clean directory structure to keep runs separate:

```
project/
├── scripts/
│   ├── train_gpu0.sh
│   ├── train_gpu1.sh
│   └── monitor_gpus.sh
├── src/
│   └── train.py
└── runs/
    ├── gpu0/             # Training on GPU 0
    │   ├── checkpoints/
    │   └── logs/
    └── gpu1/             # Training on GPU 1
        ├── checkpoints/
        └── logs/
```
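One way to bootstrap that layout in a single command (brace expansion is a bashism, so this assumes bash):

```bash
# Create the whole runs layout in one go (requires bash brace expansion)
mkdir -p scripts src runs/gpu{0,1}/{checkpoints,logs}
```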
## Simple Shell Scripts for GPU Management

### 1. Dedicated GPU Training Script (train_gpu0.sh)

```bash
#!/bin/bash

# Assign this process to GPU 0 only
export CUDA_VISIBLE_DEVICES=0

# Create timestamped run directory
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
OUTPUT_DIR="runs/gpu0/${TIMESTAMP}"
mkdir -p "$OUTPUT_DIR/checkpoints"
mkdir -p "$OUTPUT_DIR/logs"

# Run with output redirected to a log file
python src/train.py \
    --output_dir "$OUTPUT_DIR" \
    --batch_size 32 \
    --learning_rate 1e-4 \
    > "$OUTPUT_DIR/logs/training.log" 2>&1
```

### 2. Second GPU Script (train_gpu1.sh)

Identical to the first script except for the GPU index and output path:

```bash
#!/bin/bash

# Assign this process to GPU 1 only
export CUDA_VISIBLE_DEVICES=1

# Create timestamped run directory
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
OUTPUT_DIR="runs/gpu1/${TIMESTAMP}"
mkdir -p "$OUTPUT_DIR/checkpoints"
mkdir -p "$OUTPUT_DIR/logs"

# Run with output redirected to a log file
python src/train.py \
    --output_dir "$OUTPUT_DIR" \
    --batch_size 32 \
    --learning_rate 1e-4 \
    > "$OUTPUT_DIR/logs/training.log" 2>&1
```

### 3. Simple GPU Monitoring Script (monitor_gpus.sh)

```bash
#!/bin/bash

# Simple GPU monitoring loop with timestamps
while true; do
    clear
    echo "======== $(date) ========"
    nvidia-smi
    sleep 5
done
```

## Checkpoint Management Without Race Conditions

In your `train.py`, implement safe checkpoint saving:

```python
import os
import time
import shutil
from pathlib import Path

import torch


def save_checkpoint(model, optimizer, epoch, step, args):
    """Save a checkpoint safely, without race conditions."""
    # Get process-specific info for uniqueness
    pid = os.getpid()
    timestamp = time.strftime("%Y%m%d_%H%M%S")

    # Create a temporary directory with a unique name
    checkpoint_dir = Path(args.output_dir) / "checkpoints"
    checkpoint_dir.mkdir(exist_ok=True)

    temp_dir = checkpoint_dir / f"temp_{pid}_{timestamp}"
    temp_dir.mkdir(exist_ok=True)

    # Save to the temporary location first
    checkpoint_path = temp_dir / "checkpoint.pt"
    torch.save({
        'epoch': epoch,
        'step': step,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, checkpoint_path)

    # Create the final directory name
    final_dir = checkpoint_dir / f"checkpoint_epoch{epoch}_step{step}"

    # Atomic rename operation (safer than copying files)
    shutil.move(str(temp_dir), str(final_dir))

    # Clean up old checkpoints (keep only the last 5).
    # Sort by modification time: a plain name sort would put
    # checkpoint_epoch10 before checkpoint_epoch2.
    checkpoints = sorted(
        [d for d in checkpoint_dir.iterdir()
         if d.is_dir() and d.name.startswith("checkpoint_")],
        key=lambda d: d.stat().st_mtime,
    )
    for old_checkpoint in checkpoints[:-5]:
        shutil.rmtree(old_checkpoint)

    print(f"Saved checkpoint to {final_dir}")
    return final_dir
```
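The guide doesn't show the matching loader; a minimal sketch under the same layout assumptions (pick the newest `checkpoint_*` directory by mtime, the mirror image of the cleanup logic above) could look like:

```python
from pathlib import Path

import torch


def load_latest_checkpoint(model, optimizer, output_dir):
    """Sketch: restore the newest checkpoint written by save_checkpoint()."""
    checkpoint_dir = Path(output_dir) / "checkpoints"
    candidates = [d for d in checkpoint_dir.iterdir()
                  if d.is_dir() and d.name.startswith("checkpoint_")]
    if not candidates:
        return 0, 0  # nothing to resume from
    latest = max(candidates, key=lambda d: d.stat().st_mtime)
    state = torch.load(latest / "checkpoint.pt")
    model.load_state_dict(state['model_state_dict'])
    optimizer.load_state_dict(state['optimizer_state_dict'])
    return state['epoch'], state['step']
```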
## Running Multiple Training Jobs with Different Parameters

Create a parameter sweep script that launches multiple runs with different configs:

```bash
#!/bin/bash
# param_sweep.sh

# Define the parameter grid
LEARNING_RATES=("1e-3" "5e-4" "1e-4")
BATCH_SIZES=(16 32 64)

# Loop through parameters and assign runs to GPUs
GPU=0
for lr in "${LEARNING_RATES[@]}"; do
    for bs in "${BATCH_SIZES[@]}"; do
        # Select GPU using modulo to cycle through available GPUs
        SELECTED_GPU=$((GPU % 2))  # Assuming 2 GPUs (0 and 1)
        GPU=$((GPU + 1))

        # Create the run directory
        TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
        RUN_NAME="lr${lr}_bs${bs}"
        OUTPUT_DIR="runs/gpu${SELECTED_GPU}/${RUN_NAME}_${TIMESTAMP}"
        mkdir -p "$OUTPUT_DIR/checkpoints"
        mkdir -p "$OUTPUT_DIR/logs"

        # Launch training in the background
        echo "Starting run on GPU ${SELECTED_GPU}: lr=${lr}, bs=${bs}"
        CUDA_VISIBLE_DEVICES=$SELECTED_GPU python src/train.py \
            --output_dir "$OUTPUT_DIR" \
            --batch_size "$bs" \
            --learning_rate "$lr" \
            > "$OUTPUT_DIR/logs/training.log" 2>&1 &

        # Wait a bit to stagger the starts
        sleep 10
    done
done

echo "All jobs launched. Monitor with './scripts/monitor_gpus.sh'"
```

## Dead Simple Experiment Tracking Without MLflow

Create a simple CSV logger in your training script:

```python
import csv
from pathlib import Path


class SimpleLogger:
    def __init__(self, log_dir):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(exist_ok=True, parents=True)

        # Metrics CSV file (header is written on the first log call)
        self.metrics_file = self.log_dir / "metrics.csv"

        # Keep track of best metrics.
        # NOTE: "best" is taken as the minimum, which fits loss-like
        # metrics; flip the comparison for accuracy-like metrics.
        self.best_metrics = {}

    def log_metrics(self, metrics, step):
        """Log metrics to the CSV file."""
        metrics["step"] = step

        # Create or append to the CSV
        write_header = not self.metrics_file.exists()

        with open(self.metrics_file, mode='a', newline='') as file:
            writer = csv.DictWriter(file, fieldnames=metrics.keys())
            if write_header:
                writer.writeheader()
            writer.writerow(metrics)

        # Update best metrics
        for key, value in metrics.items():
            if key != "step":
                if key not in self.best_metrics or value < self.best_metrics[key]["value"]:
                    self.best_metrics[key] = {"value": value, "step": step}

        # Write the best-metrics summary
        with open(self.log_dir / "best_metrics.txt", 'w') as f:
            for key, data in self.best_metrics.items():
                f.write(f"Best {key}: {data['value']} (step {data['step']})\n")
```
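Hooked into a training loop, it might be used like this (hypothetical loss values, just to show the call shape; assumes the `SimpleLogger` class above is in scope):

```python
# Hypothetical usage inside the training loop
logger = SimpleLogger("runs/gpu0/example/logs")
for step in range(3):
    loss = 1.0 / (step + 1)  # stand-in for the real training loss
    logger.log_metrics({"loss": loss}, step=step)
# -> appends rows to metrics.csv and refreshes best_metrics.txt
```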
## Finding and Comparing Results

Create a simple results aggregation script:

```bash
#!/bin/bash
# aggregate_results.sh

echo "Run Directory,Best Loss,Best Accuracy,Training Time"

find runs/ -name "best_metrics.txt" | while read metrics_file; do
    run_dir=$(dirname "$metrics_file")
    best_loss=$(grep "Best loss" "$metrics_file" | cut -d' ' -f3)
    best_acc=$(grep "Best accuracy" "$metrics_file" | cut -d' ' -f3)

    # Get the training time from the log
    log_file="$run_dir/logs/training.log"
    start_time=$(head -n 1 "$log_file" | grep -oE '[0-9]{2}:[0-9]{2}:[0-9]{2}')
    end_time=$(tail -n 10 "$log_file" | grep -oE '[0-9]{2}:[0-9]{2}:[0-9]{2}' | tail -n 1)

    echo "$run_dir,$best_loss,$best_acc,$start_time-$end_time"
done | sort -t',' -k2n
```

## Simple Visualization Without External Tools

Create a basic plotting script using matplotlib:

```python
# plot_results.py
import glob
from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt

# Find all metrics.csv files
metrics_files = glob.glob("runs/**/metrics.csv", recursive=True)

plt.figure(figsize=(12, 8))

# Plot the loss curve of each run
for metrics_file in metrics_files:
    run_name = Path(metrics_file).parent.name
    df = pd.read_csv(metrics_file)

    plt.plot(df['step'], df['loss'], label=f"{run_name} - loss")

plt.xlabel('Step')
plt.ylabel('Loss')
plt.title('Training Loss Comparison')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig('loss_comparison.png')
plt.close()

# Create the accuracy plot if available
plt.figure(figsize=(12, 8))
for metrics_file in metrics_files:
    run_name = Path(metrics_file).parent.name
    df = pd.read_csv(metrics_file)

    if 'accuracy' in df.columns:
        plt.plot(df['step'], df['accuracy'], label=f"{run_name} - accuracy")

plt.xlabel('Step')
plt.ylabel('Accuracy')
plt.title('Training Accuracy Comparison')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig('accuracy_comparison.png')
```

## Process Management and GPU Allocation

Create a script to check GPU usage and allocate new jobs:

```bash
#!/bin/bash
# allocate_gpu.sh

# Check GPU usage and return the index of the least utilized GPU
LEAST_BUSY_GPU=$(nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits |
    sort -t',' -k2n |
    head -n 1 |
    cut -d',' -f1)

echo "$LEAST_BUSY_GPU"
```
## Tips for Avoiding Race Conditions

1. **Always use unique output directories for each run**:
   - Include timestamp, GPU ID, and PID in directory names
   - Never share output directories between processes

2. **For checkpoint saving**:
   - Save to a temp directory first
   - Use atomic operations like `shutil.move()` for final placement
   - Don't depend on file locks (often unreliable on network filesystems)

3. **For data loading**:
   - Use different random seeds per process (see the seeding sketch after this list)
   - Set `num_workers` appropriately (2-4 per GPU usually works well)
   - Add a process-specific buffer to avoid filesystem contention

4. **For logging**:
   - Each process should write to its own log file
   - Use timestamps in log entries
   - Include GPU ID and PID in log messages
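For tip 3, a minimal sketch of per-process seeding: derive the seed from the PID so two GPUs never shuffle data identically. The PID-offset scheme is just one arbitrary choice, not a standard recipe:

```python
import os
import random

import numpy as np
import torch


def seed_everything(base_seed: int = 42):
    """Sketch: derive a per-process seed so parallel runs diverge."""
    seed = base_seed + os.getpid() % 10_000  # arbitrary per-process offset
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```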
## Quick Commands Reference

```bash
# Start training on GPU 0
./scripts/train_gpu0.sh

# Start training on GPU 1
./scripts/train_gpu1.sh

# Run a parameter sweep across GPUs
./scripts/param_sweep.sh

# Monitor GPU usage
./scripts/monitor_gpus.sh

# Find the GPU with the lowest utilization
BEST_GPU=$(./scripts/allocate_gpu.sh)
echo "Least busy GPU is: $BEST_GPU"

# Aggregate results into a CSV
./scripts/aggregate_results.sh > results_summary.csv

# Generate comparison plots
python scripts/plot_results.py
```

Remember: the simplest solution is usually the most maintainable. Keep your scripts straightforward, make each run independent, and use filesystem organization to avoid conflicts.

# TODO: Replace print statements with loguru logging for better debugging and log file management
@@ -0,0 +1,25 @@
# Chat Template 101

This repo was originally created with the chat template of the Llama instruct model family, so I need to hack around it a bit to be able to train new models based on deepseek-r1-distil-xxx.

## Getting the intuition

- <https://huggingface.co/docs/transformers/main/chat_templating>
- > A chat template is **a part of the tokenizer** and it specifies how to convert conversations into a single tokenizable string in the expected model format.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)
# Output:
# <s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]
```

- 💡 OHhhhh can just make a jupyter notebook to play around with this
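The same trick works for comparing templates across model families; a sketch for the notebook idea above, printing the raw Jinja template and a rendered turn (the DeepSeek model ID is my guess at which distill checkpoint is meant):

```python
# Sketch: peek at another model's chat template before swapping it in.
# The model ID here is an assumption, not confirmed by this repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
print(tok.chat_template)  # raw Jinja template stored in the tokenizer
print(tok.apply_chat_template(
    [{"role": "user", "content": "Hello, how are you?"}],
    tokenize=False,
    add_generation_prompt=True,  # append the assistant-turn prefix
))
```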
@@ -0,0 +1,12 @@
I'll give you two files below. Your job is to create a script that brings the content of the chunk file into the question file, mapped by chunk_id, which is the sequential number of the chunk in the chunk file. The new column should be called "chunk_content".

/home/thinhlpg/code/DeepSearch/data/data_v1/saved_data/questions.json

[
    {
        "chunk_id": 1,
        "question": "What was the location of the first pad abort of the mission?",
        "answer": "White Sands Missile Range",
        "difficulty": "easy"
    },

/home/thinhlpg/code/DeepSearch/data/data_v1/saved_data/chunks.pkl
@@ -1 +0,0 @@
# Paraphrase Prompt
@@ -0,0 +1,15 @@
# Random Popup Idea 💡

```
# There are actually two ways to handle multiple function calls:

# 1. Sequential (one at a time)
Assistant: *makes search call 1*
System: *returns result 1*
Assistant: *analyzes result 1, makes search call 2 if needed*
System: *returns result 2*

# 2. Parallel (using the tool_calls array) 💡 -> how about training with this? each assistant response can have multiple function calls with different search queries
Assistant: *makes multiple search calls at once*
System: *returns all results together*
```
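For the parallel case, this is roughly the message shape in the OpenAI-style tool-calling format, shown as a Python dict (the query strings are made up):

```python
# Sketch of one assistant turn carrying several tool calls at once
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function", "function": {
            "name": "search", "arguments": '{"query": "first pad abort location"}'}},
        {"id": "call_2", "type": "function", "function": {
            "name": "search", "arguments": '{"query": "White Sands Missile Range tests"}'}},
    ],
}
```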
@@ -0,0 +1 @@
# Note on stuff that didn't work ❌

@@ -0,0 +1 @@
# Note on stuff that worked ✅