Worklog

Backlog

yymmdd

  • task description

250324

  • @thinhlpg transfers the project to @bachvudinh

250323

  • Train the model
  • Make the dataset
  • Upload datasets to HF Hub
    • Initial dataset from AutoDidact
    • Paraphrased dataset
  • Make a simple gradio demo app

250322

  • Move all the scattered, disorganized work from the past week into this repo.
  • Write proposal for DeepSearch
  • As a new member of the research team, I'm curious how we did GRPO with AlphaMaze, so that I can inherit the good parts and improve the workflow!
    • AlphaMaze?
    • https://www.menlo.ai/blog/alpha-maze
    • https://arxiv.org/pdf/2502.14669
    • Our training process involved two key stages: creating a specialized dataset and then using a combination of supervised fine-tuning (SFT) and reinforcement learning (RL) to train the model.

    • LLaMA-Factory for SFT (1.5B model, 6x A6000, 1.5 hours) and Unsloth for GRPO
    • 💡 Hmm, so for SFT we have 50% successful data and 50% retry data, and 100% successful data for GRPO. Can I apply this to DeepSearch as well? #HACK
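
A rough sketch of what that data split could look like if carried over to DeepSearch. This is not the AlphaMaze pipeline itself: the function name `mix_sft_examples`, the record fields, and the toy data are all hypothetical, and the only things taken from the notes above are the 50/50 success/retry ratio for SFT and the successful-only pool for GRPO.

```python
import random


def mix_sft_examples(successful, retry, total, retry_fraction=0.5, seed=0):
    """Sample `total` examples so that `retry_fraction` of them are retry examples.

    Hypothetical helper: the 50/50 default mirrors the SFT mix noted above.
    """
    n_retry = round(total * retry_fraction)
    n_success = total - n_retry
    if n_retry > len(retry) or n_success > len(successful):
        raise ValueError("not enough examples to reach the requested mix")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    mixed = rng.sample(successful, n_success) + rng.sample(retry, n_retry)
    rng.shuffle(mixed)  # avoid all successes first, then all retries
    return mixed


# Toy records standing in for real trajectories (assumed shape, not real data).
successful = [{"id": f"s{i}", "kind": "success"} for i in range(10)]
retry = [{"id": f"r{i}", "kind": "retry"} for i in range(10)]

# SFT set: half successful, half retry.
sft_data = mix_sft_examples(successful, retry, total=8)

# GRPO pool: successful examples only.
grpo_data = list(successful)
```

Keeping the ratio as a parameter makes it cheap to try other mixes later if 50/50 turns out not to suit DeepSearch.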

250321

250320

250319

250318