- Observation: the model hallucinates the search results; docs on debugging and adapting to the R1 distill base model; notebooks detailing how to make R1 distill training work.
- Added `train_autodidact_1B.py` for quick testing.
- Updated `00_worklog.md`, `dataset.md`, and `reward-functions.md` to reflect new training strategies and reward functions.
- Added initial files from AutoDidact as a starting point.
- Enhanced `README.md` with project overview and setup instructions.
- Removed `ugly_code_file.py` as part of cleanup.
- Added various documentation files and assets for project clarity.
- Included Jupyter notebooks for training and experimentation.