0b4bf54833feat: update demo from DeepSearch to ReZero, adjusting related logging and UI components
thinhlpg
2025-04-15 05:19:52 +0000
9738b80353feat: update max generations and output length in evaluation scripts, add memory fraction to server launch
thinhlpg
2025-04-15 05:04:33 +0000
7ee65269fbfeat: add new evaluation notebook for model testing and checkpoint evaluation
thinhlpg
2025-04-15 05:02:29 +0000
bec864038bfeat: increase max tokens and new tokens in evaluation scripts
thinhlpg
2025-04-14 09:09:01 +0000
dfa420fa49feat: expand Makefile with serving and evaluation commands
thinhlpg
2025-04-14 07:28:10 +0000
6ba963aca3feat: streamline data preparation in Makefile with a single command
thinhlpg
2025-04-14 06:31:23 +0000
424459d840feat: update evaluation scripts to enhance model configuration and dataset loading, including increased max tokens and added logging
thinhlpg
2025-04-14 05:58:27 +0000
bf9f2c4102docs: update README with setup instructions, quick demo, and data preparation steps for better clarity and usability
thinhlpg
2025-04-14 03:13:20 +0000
1e7514f98echore: remove outdated documentation files to clean up project structure
thinhlpg
2025-04-14 02:49:25 +0000
333d1e596efeat: add prepare-dev-data target and script for Musique dev data transformation
thinhlpg
2025-04-13 20:04:35 +0000
504f0c6c8efeat: update reward_em_chunk to match only the LAST required paragraph of the reasoning chain and adjust related tests
thinhlpg
2025-04-11 18:39:18 +0000
358875a035feat: enhance reward_em_chunk function to match multiple paragraphs, add test
thinhlpg
2025-04-11 17:21:51 +0000
2df9f39fdafeat: update model configuration (longer context) and dataset loading logic for improved performance and flexibility
thinhlpg
2025-04-11 17:20:57 +0000
4a1d45271dfeat: add scripts for musique data processing
thinhlpg
2025-04-11 17:18:18 +0000
74aa673866chore: add cook notebook for musique and model reasoning pattern
thinhlpg
2025-04-11 00:59:12 +0000
1a18cd7bfdfeat: update training and evaluation configurations (editable agent generation scripts)
thinhlpg
2025-04-04 10:11:23 +0700
77f121662ftest: add tests for reward_retry function scenarios
thinhlpg
2025-04-04 09:59:07 +0700
c8714e0f6bfeat: enhance reward_retry function to handle missing answer tags
thinhlpg
2025-04-04 09:58:44 +0700
bf480574a2fix: minor bug
thinhlpg
2025-04-04 00:54:40 +0700
3081d6e36btest: added tests for new reward functions: search strategy and search diversity
thinhlpg
2025-04-04 00:28:04 +0700
4de31e0f30feat: expand reward functions with new strategies and diversity checks
thinhlpg
2025-04-04 00:27:40 +0700
d0e6068055fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update logs for reward functions for better debugging
thinhlpg
2025-04-03 23:23:42 +0700
908768458cchore: update Makefile and requirements for testing
thinhlpg
2025-04-03 10:28:32 +0700
90b45c62abdocs: update docs and notebooks for the past few days (observation, debugging)
thinhlpg
2025-04-03 10:27:17 +0700
3910ef343atest: add unit tests for agent, reward functions, and tokenizer adapters
thinhlpg
2025-04-03 10:20:40 +0700
31dcbf5d8afeat: refactor whole code base, add logic for training R1 distil base models, change some template and reward logics
thinhlpg
2025-04-03 10:19:06 +0700
c90c03267efeat: change user prompt template to search-r1-inspired format
thinhlpg
2025-04-01 06:55:38 +0700
58dcf9a99drefactor: simplify inference script by removing logger, load 16-bit model instead of raw LoRA fine-tuned model
thinhlpg
2025-04-01 04:52:13 +0700
da79e986b6feat: add new script and functionality in train script to save model in 16 bit format
thinhlpg
2025-04-01 04:51:24 +0700
f6b6cca2cefeat: add multiple reference notebooks for model training and inference
thinhlpg
2025-04-01 04:18:39 +0700
04593fa8fdstyle: change line length to 119, organize imports
thinhlpg
2025-04-01 04:08:31 +0700