Added logic to return a reward of 0 if the assistant's final message does not contain answer tags (no answer, no reward, no matter how hard the model tried 💀)
- Added 'logs/' directory to .gitignore to exclude log files.
- Introduced log_chat_state function to log chat states and rewards to JSONL files.
- Updated reward functions to log chat states with validation results for better tracking and debugging.
- Updated test cases to include role and tag validation for assistant messages.
- Ensured that only properly formatted messages with answer tags are accepted.
- Added new test for validating various incorrect formats and their expected outcomes.
- Introduced `eval_base.py` for evaluating base model performance.
- Introduced `eval_lora.py` for evaluating LoRA model performance with additional LoRA weight handling.
- Updated eval.py to streamline model evaluation using vLLM and unsloth.
- Deleted eval.sh as its functionality is now integrated into eval.py.
- Updated .gitignore to exclude eval_logs directory.
- Observation: the model hallucinates search results. Added docs on debugging and adapting to the R1-Distill base model, plus notebooks detailing how to make R1-Distill training work.
- Break down rl_helpers into smaller modules
- Removed deprecated rl_helpers module to streamline the codebase.
- Enhance initial user prompt template inspired by Search-R1
- Added `train_autodidact_1B.py` for quick testing.
- Update `00_worklog.md`, `dataset.md`, and `reward-functions.md` to reflect new training strategies and reward functions.
- Updated `00_worklog.md` to reflect optimizations for speed and quality in dataset generation.
- Introduced new documentation files: `choosing-llm-and-prompt-101.md`, `ds-pipeline-v0.md`, and `paraphrase-prompt.md` for better clarity on LLM choices and dataset pipeline.
- Added a Jupyter notebook `250324_generate_data_anatomy.ipynb` to explore the data generation process
- Added initial files from AutoDidact as a starting point.
- Enhanced `README.md` with project overview and setup instructions.
- Removed `ugly_code_file.py` as part of cleanup.
- Added various documentation files and assets for project clarity.
- Included Jupyter notebooks for training and experimentation.
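
For reference, the answer-tag check and the JSONL logging described above could look roughly like the sketch below. This is not the code from this commit: the `<answer>...</answer>` tag name, the `final_answer_reward` helper, the `chat_states.jsonl` file name, and the `log_chat_state` signature shown here are all assumptions made for illustration.

```python
import json
import re
import time
from pathlib import Path

# Assumed tag format; the actual tag name used in the repo may differ.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def final_answer_reward(messages):
    """Hypothetical helper: 1.0 if the final message is an assistant
    message containing non-empty answer tags, otherwise 0.0."""
    if not messages:
        return 0.0
    last = messages[-1]
    # Role validation: only the assistant's final message counts.
    if last.get("role") != "assistant":
        return 0.0
    match = ANSWER_RE.search(last.get("content", ""))
    # Tag validation: missing or empty answer tags earn nothing.
    if match is None or not match.group(1).strip():
        return 0.0
    return 1.0


def log_chat_state(messages, reward, log_dir="logs"):
    """Hypothetical signature: append one chat state and its reward
    as a single JSON line, and return the path of the log file."""
    path = Path(log_dir)
    path.mkdir(parents=True, exist_ok=True)
    log_file = path / "chat_states.jsonl"
    record = {"timestamp": time.time(), "reward": reward, "messages": messages}
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return log_file
```

Appending one JSON object per line keeps the log easy to tail and to load incrementally while debugging reward behavior.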