46 Commits (56911a73f9aadca357ff292a067c14c130bb04d4)
 

Author SHA1 Message Date
automaticcat 56911a73f9 Update README.md
1 month ago
thinhlpg 1a18cd7bfd feat: update training and evaluation configurations (editable agent generation scripts)
1 month ago
thinhlpg 77f121662f test: add tests for reward_retry function scenarios
1 month ago
thinhlpg c8714e0f6b feat: enhance reward_retry function to handle missing answer tags
1 month ago
thinhlpg bf480574a2 fix: minor bug
1 month ago
thinhlpg 3081d6e36b test: added tests for new reward functions: search strategy and search diversity
1 month ago
thinhlpg 4de31e0f30 feat: expand reward functions with new strategies and diversity checks
1 month ago
thinhlpg d0e6068055 fix: strengthen reward correctness logic to handle a final message that is not an answer from the assistant. Also update reward function logs for easier debugging
1 month ago
thinhlpg 1bd609dfae test: enhance reward correctness tests with validation logic
1 month ago
thinhlpg 338655e563 feat: refine user prompt logic for improved clarity and structure
1 month ago
thinhlpg 6d994feeb2 feat: enhance evaluation scripts for base and LoRA models
1 month ago
thinhlpg da60b52bd1 feat: refactor download and upload scripts for improved argument handling (more notebook friendly :D)
1 month ago
thinhlpg fa3c0562fe feat: add evaluation scripts for base and LoRA models
1 month ago
thinhlpg 1047e2fa1c chore: update .gitignore and requirements for unsloth versions
1 month ago
thinhlpg 83f86869f6 chore: update .gitignore and add new toys data files
1 month ago
thinhlpg 133cb1ab90 test: add Qwen tokenizer adapter tests
1 month ago
thinhlpg 6efe01d5ff chore: update Makefile and requirements for testing
1 month ago
thinhlpg af7f38c792 feat: add code for qwen architecture
1 month ago
thinhlpg e7915a6a8e feat: add util script to upload/download checkpoints
1 month ago
thinhlpg 9009440663 chore: disable logging, enable torch compile
1 month ago
thinhlpg d2f03b96ab feat: enhance evaluation script and remove deprecated shell script
1 month ago
thinhlpg 908768458c chore: update Makefile and requirements for testing
1 month ago
thinhlpg 90b45c62ab docs: update docs and notebooks for the past few days (observation, debugging)
1 month ago
thinhlpg 3910ef343a test: add unit tests for agent, reward functions, and tokenizer adapters
1 month ago
thinhlpg 31dcbf5d8a feat: refactor whole code base, add logic for training R1 distil base models, change some templates and reward logic
1 month ago
thinhlpg c90c03267e feat: change user prompt template to search-r1 inspired format
1 month ago
thinhlpg 58dcf9a99d refactor: simplify inference script by removing logger, load 16 bit model instead of raw LoRA finetuned model
1 month ago
thinhlpg da79e986b6 feat: add new script and functionality in train script to save model in 16 bit format
1 month ago
thinhlpg f6b6cca2ce feat: add multiple reference notebooks for model training and inference
1 month ago
thinhlpg 04593fa8fd style: change line length to 119, organize imports
1 month ago
thinhlpg abb18b10d8 feat: add CLI inference script with search functionality
1 month ago
thinhlpg fe70896023 chore: add Makefile for installation, code quality checks, style formatting, cleanup, and other tasks
1 month ago
thinhlpg 60233f2113 chore: update .gitignore
1 month ago
thinhlpg fd32bcacfd chore: update worklog and research progress
1 month ago
thinhlpg 37730095a9 feat: add eval scripts that compare base model performance with the grpo trained model
1 month ago
thinhlpg 7f2f43aa46 chore: clean up notebooks
1 month ago
thinhlpg 3c2deaced9 refactor: restructure code base, better centralize logging logic
1 month ago
thinhlpg 04d56325bb feat: add new reward functions, add less dumb data generation logic, implement better logging
2 months ago
thinhlpg b22b02ea1d feat: changed `<reasoning>` tags to `<think>`
2 months ago
thinhlpg 7d4de89186 chore: update worklog 250324
2 months ago
thinhlpg 1bdee261b6 feat: add draft data generation and documentation
2 months ago
thinhlpg f19354a8c9 chore: clean up notebook output
2 months ago
thinhlpg f60ab499eb chore: update worklog
2 months ago
thinhlpg a58722e16f feat: add initial project structure and core functionality
2 months ago
Thinh Le 91c2476c28 chore: initial commit - the ugliest code i've ever written 💀
2 months ago
Thinh Le bf32fdd897 Initial commit
2 months ago