- Added 'logs/' directory to .gitignore to exclude log files.
- Introduced log_chat_state function to log chat states and rewards to JSONL files.
- Updated reward functions to log chat states with validation results for better tracking and debugging.
- Updated test cases to include role and tag validation for assistant messages.
- Ensured that only properly formatted messages with answer tags are accepted.
- Added new test for validating various incorrect formats and their expected outcomes.
- Introduced `eval_base.py` for evaluating base model performance.
- Introduced `eval_lora.py` for evaluating LoRA model performance with additional LoRA weight handling.
- Updated eval.py to streamline model evaluation using vLLM and unsloth.
- Deleted eval.sh as its functionality is now integrated into eval.py.
- Updated .gitignore to exclude eval_logs directory.
- observation: model hallucniate the search result, docs about debugigng and adapting to r1 distil base model, notebooks on the detail of making training r1 distil works
- Break down rl_helpers into smaller modules
- Removed deprecated rl_helpers module to streamline the codebase.
- Enhance initial user prompt template inspired by Search-R1
- Added `train_autodidact_1B.py` for quick test.
- Update `00_worklog.md`, `dataset.md`, and `reward-functions.md` to reflect new training strategies and reward functions.
- Updated `00_worklog.md` to reflect optimizations for speed and quality in dataset generation.
- Introduced new documentation files: `choosing-llm-and-prompt-101.md`, `ds-pipeline-v0.md`, and `paraphrase-prompt.md` for better clarity on LLM choices and dataset pipeline.
- Added a Jupyter notebook `250324_generate_data_anatomy.ipynb` to explore the data generation process
- Added initial files from AutoDiact as starting point
- Enhanced `README.md` with project overview and setup instructions. .
- Removed `ugly_code_file.py` as part of cleanup.
- Added various documentation files and assets for project clarity.
- Included Jupyter notebooks for training and experimentation.
Dropping this absolute disaster of a code file to break the paralysis.
No more overthinking, no more perfectionism—just write, make it work, and refine later.
Starting this repo with the most unreadable, unformatted, and ugly code possible.
The goal? Trick my brain into not caring about style—just build.
This mess exists to remind me that progress > perfection.
Ship first, clean up later.