- [ ] Modify `generate_dataset.py` (**ONLY AFTER** the simple training and benchmark work):
- [ ] As a dataset maker, I want to switch from Llama 3.1 8B to API calls (Claude, Gemini, or OpenAI). The original work uses Llama 3.1 8B to demonstrate `Self-Bootstrapping`, but the resulting dataset quality is clearly low.
- [ ] Experiment with different chunking strategies
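A minimal sketch of the API swap above (OpenAI-style client shown; Claude or Gemini would be analogous). The model name, prompt template, and helper names are placeholder assumptions, not the actual `generate_dataset.py` code.

```python
# Sketch: generate a QA pair from a document chunk via an API call instead of
# local Llama 3.1 8B. Model and prompt wording are illustrative assumptions.

def build_qa_prompt(chunk):
    """Format a document chunk into a QA-generation prompt."""
    return (
        "Write one question that can be answered from the passage below, "
        "then the answer.\n\nPassage:\n" + chunk
    )

def generate_pair(chunk, model="gpt-4o-mini"):
    """Call the API for one (question, answer) string; needs OPENAI_API_KEY."""
    from openai import OpenAI  # imported lazily so the sketch stays importable
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_qa_prompt(chunk)}],
    )
    return resp.choices[0].message.content
```

Swapping providers only means replacing the client call in `generate_pair`; the prompt-building step stays the same.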
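Two candidate chunking strategies worth comparing are sketched below (fixed-size character windows vs. sentence-aware packing); the sizes and overlap values are illustrative assumptions.

```python
import re

def fixed_size_chunks(text, size=200, overlap=50):
    """Split text into overlapping character windows."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

def sentence_chunks(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Fixed-size windows are simpler and predictable for retrieval; sentence-aware packing avoids cutting answers mid-sentence, which likely matters for QA-pair generation.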
## 250324
- [ ] @thinhlpg transfers the project to @bachvudinh
- [ ] Train the model v0
- [ ] Make the dataset v0
## 250323
- [ ] Upload dataset v0 to HF Hub
- [ ] Train the model
- [ ] Make the dataset
- [ ] Upload datasets to HF Hub
- Initial dataset from AutoDidact
- Paraphrased dataset
- [ ] Make a simple gradio demo app
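For the HF Hub upload task, a minimal sketch with the `datasets` library follows; the repo id, column names, and `to_records` helper are placeholder assumptions.

```python
def to_records(questions, answers):
    """Pair questions with answers as rows for `Dataset.from_list`."""
    return [{"question": q, "answer": a} for q, a in zip(questions, answers)]

def upload_v0(records, repo_id="your-org/deepsearch-dataset-v0"):
    """Push the rows to the Hub (needs `huggingface-cli login` or HF_TOKEN)."""
    from datasets import Dataset  # imported lazily so to_records stays testable
    ds = Dataset.from_list(records)
    ds.push_to_hub(repo_id)
    return ds
```

The same `upload_v0` call works for both the AutoDidact-derived and paraphrased variants; only `repo_id` (or a config name) changes.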
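For the Gradio demo task, a minimal one-box sketch; the `answer` function is a placeholder assumption standing in for the trained model.

```python
def answer(question):
    """Placeholder for model inference."""
    return f"(model answer for: {question})"

def make_demo():
    """Wire the placeholder into a single text-in/text-out Gradio app."""
    import gradio as gr  # imported lazily so the helper above stays testable
    return gr.Interface(fn=answer, inputs="text", outputs="text")

# make_demo().launch()  # uncomment to serve locally
```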
- brain.exe and back.exe refused to work 😭
## 250322
- [x] Moved all the scattered and disorganized stuff I've been working on for the past week into this repo.