{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "IqM-T1RTzY6C"
},
"source": [
"To run this, press \"*Runtime*\" and then \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n",
"<div class=\"align-center\">\n",
" <a href=\"https://github.com/unslothai/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
" <a href=\"https://discord.gg/u54VK8m8tk\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord button.png\" width=\"145\"></a>\n",
" <a href=\"https://ko-fi.com/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png\" width=\"145\"></a> Join the Discord if you need help, and support us if you can!\n",
"</div>\n",
"\n",
"To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2eSvM9zX_2d3"
},
"outputs": [],
"source": [
"%%capture\n",
"# Installs Unsloth, Xformers (Flash Attention) and all other packages!\n",
"!pip install unsloth\n",
"# Get latest Unsloth\n",
"!pip install --upgrade --no-deps \"unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git\""
]
},
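{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: a quick sanity check that a CUDA GPU (e.g. the free Tesla T4) is visible before loading the model. It only uses standard PyTorch calls, so feel free to skip it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# Optional check: Unsloth needs an NVIDIA GPU, so make sure CUDA is visible.\n",
"if torch.cuda.is_available():\n",
"    print(\"GPU detected:\", torch.cuda.get_device_name(0))\n",
"else:\n",
"    print(\"No CUDA GPU detected. Switch the Colab runtime to a GPU such as the T4.\")"
]
},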
{
"cell_type": "markdown",
"metadata": {
"id": "r2v_X2fA0Df5"
},
"source": [
"If you want to finetune Llama-3 2x faster and use 70% less VRAM, go to our [finetuning notebook](https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing)!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QmUBVEnvCDJv"
},
"outputs": [],
"source": [
"from unsloth import FastLanguageModel\n",
"\n",
"# 4-bit pre-quantized models we support for 4x faster downloading + no OOMs.\n",
"fourbit_models = [\n",
" \"unsloth/mistral-7b-instruct-v0.2-bnb-4bit\",\n",
" \"unsloth/gemma-7b-it-bnb-4bit\",\n",
"] # More models at https://huggingface.co/unsloth\n",
"\n",
"model, tokenizer = FastLanguageModel.from_pretrained(\n",
" model_name = \"unsloth/Meta-Llama-3.1-8B-Instruct\",\n",
" max_seq_length = 8192,\n",
" load_in_4bit = True,\n",
" # token = \"hf_...\", # Use one if loading gated models like meta-llama/Llama-2-7b-hf\n",
")"
]
},
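{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: a rough look at how much VRAM the 4-bit model actually occupies. This is only an illustrative sketch using standard `torch.cuda` memory statistics, and the exact numbers will vary by GPU and driver."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"# Illustrative only: compare reserved VRAM against total VRAM after the 4-bit load.\n",
"gpu = torch.cuda.get_device_properties(0)\n",
"reserved_gb = torch.cuda.max_memory_reserved() / 1024**3\n",
"total_gb = gpu.total_memory / 1024**3\n",
"print(f\"{gpu.name}: {reserved_gb:.2f} GB reserved out of {total_gb:.2f} GB total.\")"
]
},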
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Yvwf0OEtchS_"
},
"outputs": [],
"source": [
"from transformers import TextStreamer\n",
"from unsloth.chat_templates import get_chat_template\n",
"tokenizer = get_chat_template(\n",
" tokenizer,\n",
" chat_template = \"llama-3.1\",\n",
" mapping = {\"role\" : \"from\", \"content\" : \"value\", \"user\" : \"human\", \"assistant\" : \"gpt\"}, # ShareGPT style\n",
")\n",
"FastLanguageModel.for_inference(model) # Enable native 2x faster inference"
]
},
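{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: to see what the `llama-3.1` chat template and the ShareGPT-style mapping actually produce, you can render a prompt as plain text first. `apply_chat_template(..., tokenize = False)` is a standard `transformers` tokenizer method, and the example message below is purely illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Render an illustrative ShareGPT-style message as text to inspect the llama-3.1 prompt format.\n",
"example_messages = [\n",
"    {\"from\": \"human\", \"value\": \"Hello! How are you?\"},\n",
"]\n",
"print(tokenizer.apply_chat_template(example_messages, tokenize = False, add_generation_prompt = True))"
]
},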
{
"cell_type": "markdown",
"metadata": {
"id": "qTRSvafleB2Q"
},
"source": [
"Change the \"value\" field in the messages below to call the model with your own prompt!\n",
"\n",
"Unsloth makes inference natively 2x faster, with no changes needed on your side!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "e2pEuRb1r2Vg"
},
"outputs": [],
"source": [
"messages = [\n",
" # EDIT HERE!\n",
" {\"from\": \"human\", \"value\": \"Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,\"},\n",
"]\n",
"inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = \"pt\").to(\"cuda\")\n",
"\n",
"text_streamer = TextStreamer(tokenizer)\n",
"_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 1024, use_cache = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yo7qGcu-cqCW"
},
"outputs": [],
"source": [
"messages = [\n",
" {\"from\": \"human\", \"value\": \"Describe the tallest tower in the world.\"},\n",
"]\n",
"inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = \"pt\").to(\"cuda\")\n",
"\n",
"text_streamer = TextStreamer(tokenizer)\n",
"_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 1024, use_cache = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xmQcutr7cxmc"
},
"outputs": [],
"source": [
"messages = [\n",
" {\"from\": \"human\", \"value\": \"What is Unsloth?\"},\n",
"]\n",
"inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = \"pt\").to(\"cuda\")\n",
"\n",
"text_streamer = TextStreamer(tokenizer)\n",
"_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 1024, use_cache = True)"
]
},
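{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional: `model.generate` also accepts the usual Hugging Face sampling arguments (`do_sample`, `temperature`, `top_p`, ...), so you can trade deterministic output for more variety. The prompt and the sampling values below are purely illustrative, not recommended settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Same generation call as above, but with sampling enabled via standard transformers kwargs.\n",
"messages = [\n",
"    {\"from\": \"human\", \"value\": \"Write a short poem about sloths.\"}, # Illustrative prompt.\n",
"]\n",
"inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = \"pt\").to(\"cuda\")\n",
"\n",
"text_streamer = TextStreamer(tokenizer)\n",
"_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 256,\n",
"                   use_cache = True, do_sample = True, temperature = 0.7, top_p = 0.9)"
]
},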
{
"cell_type": "markdown",
"metadata": {
"id": "Zt9CHJqO6p30"
},
"source": [
"And we're done! If you have any questions about Unsloth, we have a [Discord](https://discord.gg/u54VK8m8tk) channel! Join it if you find any bugs, want to keep up with the latest LLM news, need help, or want to join community projects!\n",
"\n",
"If you want to finetune Llama-3 2x faster and use 70% less VRAM, go to our [finetuning notebook](https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing)!\n",
"\n",
"Some other links:\n",
"1. Zephyr DPO 2x faster [free Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing)\n",
"2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)\n",
"3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)\n",
"4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)\n",
"5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)\n",
"6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!\n",
"7. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)\n",
"8. Gemma, trained on 6 trillion tokens, is 2.5x faster! [free Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)\n",
"\n",
"<div class=\"align-center\">\n",
" <a href=\"https://github.com/unslothai/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
" <a href=\"https://discord.gg/u54VK8m8tk\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord.png\" width=\"145\"></a>\n",
" <a href=\"https://ko-fi.com/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png\" width=\"145\"></a> Support our work if you can! Thanks!\n",
"</div>"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}