parent 37730095a9
commit fd32bcacfd

@@ -1,5 +0,0 @@
# Agent Action

- [ ] Research a bit more on this because I'm a bit outdated on the training side
- [ ] What does the dataset look like?
- [ ] How to evaluate the performance?
@@ -0,0 +1,10 @@
# Agentic Reward Modeling

- <https://medium.com/@techsachin/agentic-reward-modeling-combine-human-preferences-with-verifiable-correctness-signals-for-reliable-76c408b3491c>
- <https://arxiv.org/pdf/2502.19328>
- <https://github.com/THU-KEG/Agentic-Reward-Modeling>
- <https://www.themoonlight.io/en/review/agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals-for-reliable-reward-systems>

- [x] Research a bit more on this because I'm a bit outdated on the training side
- [x] What does the dataset look like?
- [x] How to evaluate the performance?
@@ -0,0 +1,21 @@
# Anti-dumb reward exact match chunk prompt

@reward-functions.md @train_autodidact_1B.py @rl_helpers.py

I need to implement this function; you can check the idea in @reward-functions.md. The function needs to somehow compare against the ground truth document chunk that the question and answer were created from, which is:

- data in data/data_v1/saved_data/questions.json
- data sample:

```
{
    "chunk_id": 1,
    "question": "What was the location of the first pad abort of the mission?",
    "answer": "White Sands Missile Range",
    "difficulty": "easy"
},
```

- chunk content in data/data_v1/saved_data/chunks.pkl
- chunk id is mapped to the chunk content
- I'm dumb, please make it easy for me to implement
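
A rough sketch of the reward I have in mind (names are made up; it assumes the ground-truth chunk text has already been attached to each question, e.g. as a "chunk_content" field):

```python
def reward_exact_match_chunk(retrieved_text: str, ground_truth_chunk: str) -> float:
    """1.0 if the ground-truth chunk appears verbatim in what the agent retrieved, else 0.0."""
    return 1.0 if ground_truth_chunk.strip() in retrieved_text else 0.0

# Hypothetical wiring: the question dict already carries its source chunk
# reward = reward_exact_match_chunk(agent_search_results, question["chunk_content"])
```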

@@ -1,7 +1,7 @@
# Adaptive Search Behavior

- [Agent Action](agent-action.md) -> mostly recognize missing something -> perform "refined query"
- [ ] As a model trainer, I want to inspect the full chat state of the agent to know what's going on so I can improve it -> implement a simple CLI inspect tool after training, just print out the full chat state.
- [x] As a model trainer, I want to inspect the full chat state of the agent to know what's going on so I can improve it -> implement a simple CLI inspect tool after training, just print out the full chat state.
- Example from AutoDidact:
@@ -0,0 +1,373 @@
# Brain Rotting Multiple GPU Workflow for Dummies

## Problem: Working with Multiple GPUs Without Race Conditions

Running multiple training processes on different GPUs can lead to:

- Output directory conflicts
- Checkpoint corruption
- Resource contention
- Difficult debugging and tracking

This guide gives you dead simple solutions using only basic scripts.

## Directory Structure for Sanity

First, set up a clean directory structure to keep runs separate:

```
project/
├── scripts/
│   ├── train_gpu0.sh
│   ├── train_gpu1.sh
│   └── monitor_gpus.sh
├── src/
│   └── train.py
└── runs/
    ├── gpu0/                # Training on GPU 0
    │   ├── checkpoints/
    │   └── logs/
    └── gpu1/                # Training on GPU 1
        ├── checkpoints/
        └── logs/
```

## Simple Shell Scripts for GPU Management

### 1. Dedicated GPU Training Script (train_gpu0.sh)

```bash
#!/bin/bash

# Assign this process to GPU 0 only
export CUDA_VISIBLE_DEVICES=0

# Create timestamped run directory
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
OUTPUT_DIR="runs/gpu0/${TIMESTAMP}"
mkdir -p $OUTPUT_DIR/checkpoints
mkdir -p $OUTPUT_DIR/logs

# Run with output redirect to log file
python src/train.py \
    --output_dir $OUTPUT_DIR \
    --batch_size 32 \
    --learning_rate 1e-4 \
    > $OUTPUT_DIR/logs/training.log 2>&1
```

### 2. Second GPU Script (train_gpu1.sh)

```bash
#!/bin/bash

# Assign this process to GPU 1 only
export CUDA_VISIBLE_DEVICES=1

# Create timestamped run directory
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
OUTPUT_DIR="runs/gpu1/${TIMESTAMP}"
mkdir -p $OUTPUT_DIR/checkpoints
mkdir -p $OUTPUT_DIR/logs

# Run with output redirect to log file
python src/train.py \
    --output_dir $OUTPUT_DIR \
    --batch_size 32 \
    --learning_rate 1e-4 \
    > $OUTPUT_DIR/logs/training.log 2>&1
```

### 3. Simple GPU Monitoring Script (monitor_gpus.sh)

```bash
#!/bin/bash

# Simple GPU monitoring loop with timestamps
while true; do
    clear
    echo "======== $(date) ========"
    nvidia-smi
    sleep 5
done
```

## Checkpoint Management Without Race Conditions

In your `train.py`, implement safe checkpoint saving:

```python
import os
import time
import torch
import shutil
from pathlib import Path

def save_checkpoint(model, optimizer, epoch, step, args):
    """Save checkpoint safely without race conditions"""
    # Get process-specific info for uniqueness
    pid = os.getpid()
    timestamp = time.strftime("%Y%m%d_%H%M%S")

    # Create temporary directory with unique name
    checkpoint_dir = Path(args.output_dir) / "checkpoints"
    checkpoint_dir.mkdir(exist_ok=True)

    temp_dir = checkpoint_dir / f"temp_{pid}_{timestamp}"
    temp_dir.mkdir(exist_ok=True)

    # Save to temporary location first
    checkpoint_path = temp_dir / "checkpoint.pt"
    torch.save({
        'epoch': epoch,
        'step': step,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, checkpoint_path)

    # Create final directory name
    final_dir = checkpoint_dir / f"checkpoint_epoch{epoch}_step{step}"

    # Atomic rename operation (safer than copying files)
    shutil.move(str(temp_dir), str(final_dir))

    # Clean up old checkpoints (keep only the 5 most recent).
    # Sort by modification time, not by name: lexicographic order would
    # put checkpoint_epoch10 before checkpoint_epoch2.
    checkpoints = sorted(
        [d for d in checkpoint_dir.iterdir()
         if d.is_dir() and d.name.startswith("checkpoint_")],
        key=lambda d: d.stat().st_mtime)
    for old_checkpoint in checkpoints[:-5]:
        shutil.rmtree(old_checkpoint)

    print(f"Saved checkpoint to {final_dir}")
    return final_dir
```

## Running Multiple Training Jobs with Different Parameters

Create a parameter sweep script that launches multiple runs with different configs:

```bash
#!/bin/bash
# param_sweep.sh

# Define parameter grid
LEARNING_RATES=("1e-3" "5e-4" "1e-4")
BATCH_SIZES=(16 32 64)

# Loop through parameters and assign to GPUs
GPU=0
for lr in "${LEARNING_RATES[@]}"; do
    for bs in "${BATCH_SIZES[@]}"; do
        # Select GPU using modulo to cycle through available GPUs
        SELECTED_GPU=$(($GPU % 2))  # Assuming 2 GPUs (0 and 1)
        GPU=$((GPU + 1))

        # Create run directory
        TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
        RUN_NAME="lr${lr}_bs${bs}"
        OUTPUT_DIR="runs/gpu${SELECTED_GPU}/${RUN_NAME}_${TIMESTAMP}"
        mkdir -p $OUTPUT_DIR/checkpoints
        mkdir -p $OUTPUT_DIR/logs

        # Launch training in background
        echo "Starting run on GPU ${SELECTED_GPU}: lr=${lr}, bs=${bs}"
        CUDA_VISIBLE_DEVICES=$SELECTED_GPU python src/train.py \
            --output_dir $OUTPUT_DIR \
            --batch_size $bs \
            --learning_rate $lr \
            > $OUTPUT_DIR/logs/training.log 2>&1 &

        # Wait a bit to stagger the starts
        sleep 10
    done
done

echo "All jobs launched. Monitor with './scripts/monitor_gpus.sh'"
```

## Dead Simple Experiment Tracking Without MLflow

Create a simple CSV logger in your training script:

```python
import csv
from pathlib import Path

class SimpleLogger:
    def __init__(self, log_dir):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(exist_ok=True, parents=True)

        # Initialize metrics CSV
        self.metrics_file = self.log_dir / "metrics.csv"

        # Keep track of best metrics
        self.best_metrics = {}

    def log_metrics(self, metrics, step):
        """Log metrics to CSV file"""
        metrics["step"] = step

        # Create or append to CSV
        write_header = not self.metrics_file.exists()

        with open(self.metrics_file, mode='a', newline='') as file:
            writer = csv.DictWriter(file, fieldnames=metrics.keys())
            if write_header:
                writer.writeheader()
            writer.writerow(metrics)

        # Update best metrics ("best" means highest for accuracy-style
        # metrics and lowest for everything else, e.g. loss)
        for key, value in metrics.items():
            if key == "step":
                continue
            higher_is_better = "acc" in key
            best = self.best_metrics.get(key)
            if (best is None
                    or (higher_is_better and value > best["value"])
                    or (not higher_is_better and value < best["value"])):
                self.best_metrics[key] = {"value": value, "step": step}

        # Write best metrics summary
        with open(self.log_dir / "best_metrics.txt", 'w') as f:
            for key, data in self.best_metrics.items():
                f.write(f"Best {key}: {data['value']} (step {data['step']})\n")
```
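
Typical usage in `train.py` might look like this (a sketch; the paths and metric values are just examples):

```python
from pathlib import Path

# One logger per run, living inside that run's output directory
logger = SimpleLogger(Path("runs/gpu0/example_run/logs"))
logger.log_metrics({"loss": 0.42, "accuracy": 0.81}, step=100)
# -> appends a row to logs/metrics.csv and refreshes logs/best_metrics.txt
```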

## Finding and Comparing Results

Create a simple results aggregation script:

```bash
#!/bin/bash
# aggregate_results.sh

echo "Run Directory,Best Loss,Best Accuracy,Training Time"

find runs/ -name "best_metrics.txt" | while read metrics_file; do
    run_dir=$(dirname "$metrics_file")
    best_loss=$(grep "Best loss" "$metrics_file" | cut -d' ' -f3)
    best_acc=$(grep "Best accuracy" "$metrics_file" | cut -d' ' -f3)

    # Get training time from log
    log_file="$run_dir/logs/training.log"
    start_time=$(head -n 1 "$log_file" | grep -oE '[0-9]{2}:[0-9]{2}:[0-9]{2}')
    end_time=$(tail -n 10 "$log_file" | grep -oE '[0-9]{2}:[0-9]{2}:[0-9]{2}' | tail -n 1)

    echo "$run_dir,$best_loss,$best_acc,$start_time-$end_time"
done | sort -t',' -k2n
```

## Simple Visualization Without External Tools

Create a basic plotting script using matplotlib:

```python
# plot_results.py
import glob
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Find all metrics.csv files
metrics_files = glob.glob("runs/**/metrics.csv", recursive=True)

plt.figure(figsize=(12, 8))

# Plot each run
for metrics_file in metrics_files:
    run_name = Path(metrics_file).parent.name
    df = pd.read_csv(metrics_file)

    plt.plot(df['step'], df['loss'], label=f"{run_name} - loss")

plt.xlabel('Step')
plt.ylabel('Loss')
plt.title('Training Loss Comparison')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig('loss_comparison.png')
plt.close()

# Create accuracy plot if available
plt.figure(figsize=(12, 8))
for metrics_file in metrics_files:
    run_name = Path(metrics_file).parent.name
    df = pd.read_csv(metrics_file)

    if 'accuracy' in df.columns:
        plt.plot(df['step'], df['accuracy'], label=f"{run_name} - accuracy")

plt.xlabel('Step')
plt.ylabel('Accuracy')
plt.title('Training Accuracy Comparison')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig('accuracy_comparison.png')
```

## Process Management and GPU Allocation

Create a script to check GPU usage and allocate new jobs:

```bash
#!/bin/bash
# allocate_gpu.sh

# This script checks GPU usage and returns the index of the least utilized GPU
LEAST_BUSY_GPU=$(nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits |
    sort -t',' -k2n |
    head -n 1 |
    cut -d',' -f1)

echo $LEAST_BUSY_GPU
```

## Tips for Avoiding Race Conditions

1. **Always use unique output directories for each run** (see the sketch after this list):
   - Include timestamp, GPU ID, and PID in directory names
   - Never share output directories between processes

2. **For checkpoint saving**:
   - Save to temp directory first
   - Use atomic operations like `shutil.move()` for final placement
   - Don't depend on file locks (often unreliable with network filesystems)

3. **For data loading**:
   - Use different random seeds per process
   - Set `num_workers` appropriately (2-4 per GPU usually works well)
   - Add process-specific buffer to avoid filesystem contention

4. **For logging**:
   - Each process should write to its own log file
   - Use timestamps in log entries
   - Include GPU ID and PID in log messages
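
A minimal sketch of tips 1 and 3 as they might look in `train.py` (names like `make_run_dir` and the default seed are placeholders, not code from this repo):

```python
import os
import time
import random
import numpy as np
import torch
from pathlib import Path

def make_run_dir(base="runs", gpu_id=0):
    """Tip 1: unique output directory per run (timestamp + GPU ID + PID)."""
    run_dir = Path(base) / f"gpu{gpu_id}" / f"{time.strftime('%Y%m%d_%H%M%S')}_pid{os.getpid()}"
    (run_dir / "checkpoints").mkdir(parents=True, exist_ok=True)
    (run_dir / "logs").mkdir(parents=True, exist_ok=True)
    return run_dir

def seed_everything(base_seed=42):
    """Tip 3: derive a per-process seed so parallel runs don't shuffle data identically."""
    seed = base_seed + os.getpid() % 10_000
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    return seed
```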

## Quick Commands Reference

```bash
# Start training on GPU 0
./scripts/train_gpu0.sh

# Start training on GPU 1
./scripts/train_gpu1.sh

# Run parameter sweep across GPUs
./scripts/param_sweep.sh

# Monitor GPU usage
./scripts/monitor_gpus.sh

# Find GPU with lowest utilization
BEST_GPU=$(./scripts/allocate_gpu.sh)
echo "Least busy GPU is: $BEST_GPU"

# Aggregate results into CSV
./scripts/aggregate_results.sh > results_summary.csv

# Generate comparison plots
python scripts/plot_results.py
```

Remember: The simplest solution is usually the most maintainable. Keep your scripts straightforward, make each run independent, and use filesystem organization to avoid conflicts.

# TODO: Replace print statements with loguru logging for better debugging and log file management
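
For that TODO, a minimal loguru setup might look something like this (a sketch, not code that exists in the repo yet; the log path is just an example):

```python
import os
from loguru import logger

# One rotating log file per run directory
logger.add("runs/gpu0/logs/training_{time}.log", rotation="10 MB", level="INFO")

logger.info("Starting run on GPU {} (pid {})", 0, os.getpid())
logger.debug("batch_size={} lr={}", 32, 1e-4)
```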

@@ -0,0 +1,25 @@
# Chat Template 101

This repo was originally created with the chat template of the Llama instruct model family, so I need to somehow hack around it to be able to train new models based on deepseek-r1-distil-xxx.

## Getting the intuition

- <https://huggingface.co/docs/transformers/main/chat_templating>
- > A chat template is **a part of the tokenizer** and it specifies how to convert conversations into a single tokenizable string in the expected model format.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)
# Output:
# <s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]
```

- 💡 OHhhhh can just make a Jupyter notebook to play around with this
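
For that notebook, a quick way to compare templates is to load each tokenizer and look at its `chat_template` (the model IDs below are just examples; swap in whichever Llama / DeepSeek-R1-Distill checkpoints are actually being used):

```python
from transformers import AutoTokenizer

chat = [{"role": "user", "content": "Hello, how are you?"}]

for model_id in [
    "mistralai/Mistral-7B-Instruct-v0.1",         # example from the docs snippet above
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # example R1-distill checkpoint
]:
    tok = AutoTokenizer.from_pretrained(model_id)
    print(f"===== {model_id} =====")
    print(tok.chat_template)  # the Jinja template stored inside the tokenizer
    print(tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
```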
@@ -0,0 +1,12 @@
I'll give you two files below. Your job is to create a script that brings the content of the chunk file into the question file, mapped by chunk_id, which is the sequential number of the chunk in the chunk file. The new column should be called "chunk_content".

/home/thinhlpg/code/DeepSearch/data/data_v1/saved_data/questions.json
[
    {
        "chunk_id": 1,
        "question": "What was the location of the first pad abort of the mission?",
        "answer": "White Sands Missile Range",
        "difficulty": "easy"
    },

/home/thinhlpg/code/DeepSearch/data/data_v1/saved_data/chunks.pkl
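
A sketch of that script (assuming chunks.pkl unpickles to a sequential list of chunk strings where chunk_id 1 is the first element; the output filename is made up so the original file isn't overwritten):

```python
import json
import pickle

QUESTIONS = "/home/thinhlpg/code/DeepSearch/data/data_v1/saved_data/questions.json"
CHUNKS = "/home/thinhlpg/code/DeepSearch/data/data_v1/saved_data/chunks.pkl"

with open(QUESTIONS) as f:
    questions = json.load(f)
with open(CHUNKS, "rb") as f:
    chunks = pickle.load(f)  # assumed: ordered list, chunk_id is 1-based

# Attach the source chunk text to every question
for q in questions:
    q["chunk_content"] = chunks[q["chunk_id"] - 1]

with open(QUESTIONS.replace(".json", "_with_chunks.json"), "w") as f:
    json.dump(questions, f, indent=2, ensure_ascii=False)
```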

@@ -1 +0,0 @@
# Paraphrase Prompt

@@ -0,0 +1,15 @@
# Random Popup Idea 💡

```
# There are actually two ways to handle multiple function calls:

# 1. Sequential (One at a time)
Assistant: *makes search call 1*
System: *returns result 1*
Assistant: *analyzes result 1, makes search call 2 if needed*
System: *returns result 2*

# 2. Parallel (Using tool_calls array) 💡 -> how about training with this? each assistant response can have multiple function calls with different search queries
Assistant: *makes multiple search calls at once*
System: *returns all results together*
```
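
What the parallel case looks like as actual messages (an OpenAI-style sketch; the function name and queries are made up):

```python
# One assistant turn carrying several search calls at once
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "search", "arguments": '{"query": "first pad abort location"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "search", "arguments": '{"query": "White Sands Missile Range mission"}'}},
    ],
}

# Each call is answered by its own tool message, matched via tool_call_id
tool_results = [
    {"role": "tool", "tool_call_id": "call_1", "content": "...search results 1..."},
    {"role": "tool", "tool_call_id": "call_2", "content": "...search results 2..."},
]
```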

@@ -0,0 +1 @@
# Note on stuff that didn't work ❌

@@ -0,0 +1 @@
# Note on stuff that worked ✅