{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LLama 3.1 \n",
"\n",
"\"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\nCutting Knowledge Date: December 2023\\nToday Date: 26 Jul 2024\\n\\n<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\nHello, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\nI'm doing great. How can I help you today?<|eot_id|><|start_header_id|>system<|end_header_id|>\\n\\nYou are a friendly chatbot who always responds in the style of a pirate<|eot_id|>\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoTokenizer\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"meta-llama/meta-Llama-3.1-8B-Instruct\")\n",
"chat = [\n",
" {\"role\": \"user\", \"content\": \"Hello, how are you?\"},\n",
" {\"role\": \"assistant\", \"content\": \"I'm doing great. How can I help you today?\"},\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a friendly chatbot who always responds in the style of a pirate\",\n",
" },\n",
"]\n",
"\n",
"tokenizer.apply_chat_template(chat, tokenize=False)"
]
},
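{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the rendered string is meant to be sent to the model for generation, `apply_chat_template` also accepts `add_generation_prompt=True`, which appends the opening of an assistant turn. The cell below is a minimal sketch, assuming `tokenizer` and `chat` from the previous cell are still in scope. Only the first user turn is passed, so the prompt ends where a reply would start."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes `tokenizer` and `chat` from the previous cell are still defined.\n",
"prompt = tokenizer.apply_chat_template(\n",
"    chat[:1],  # just the user turn\n",
"    tokenize=False,  # return a string, not token ids\n",
"    add_generation_prompt=True,  # append the assistant header so the model answers next\n",
")\n",
"print(prompt)"
]
},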
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Qwen 1.5B R1 Distill\n",
"\n",
"\"<begin▁of▁sentence>You are a friendly chatbot who always responds in the style of a pirate<User>Hello, how are you?<Assistant>I'm doing great. How can I help you today?<end▁of▁sentence>\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoTokenizer\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B\")\n",
"chat = [\n",
" {\"role\": \"user\", \"content\": \"Hello, how are you?\"},\n",
" {\"role\": \"assistant\", \"content\": \"I'm doing great. How can I help you today?\"},\n",
" {\n",
" \"role\": \"system\",\n",
" \"content\": \"You are a friendly chatbot who always responds in the style of a pirate\",\n",
" },\n",
"]\n",
"\n",
"tokenizer.apply_chat_template(chat, tokenize=False)\n",
"tokenizer.apply_chat_template(chat, tokenize=False)"
]
},
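{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why the system message ends up at the front of the string even though it is last in `chat`, one option is to read the template itself. The tokenizer's `chat_template` attribute holds the Jinja source that `apply_chat_template` renders. The cell below is a small sketch that assumes the `tokenizer` loaded in the previous cell is still in scope."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes `tokenizer` from the previous cell is still loaded.\n",
"# Print the raw Jinja template to inspect how each role is rendered.\n",
"print(tokenizer.chat_template)"
]
},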
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ✅ Compare the two\n",
"\n",
"- \"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\nCutting Knowledge Date: December 2023\\nToday Date: 26 Jul 2024\\n\\n<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\n**Hello, how are you?**<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n**I'm doing great. How can I help you today**?<|eot_id|><|start_header_id|>system<|end_header_id|>\\n\\n**You are a friendly chatbot who always responds in the style of a pirate**<|eot_id|>\"\n",
"- \"<begin▁of▁sentence>**You are a friendly chatbot who always responds in the style of a pirate**<User>**Hello, how are you?**<Assistant>**I'm doing great. How can I help you today?**<end▁of▁sentence>\"\n",
"- Ok make sense now!, so the structure of r1-distil doesn't have closing tags for most of the tags\n"
]
},
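{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the closing-tag observation concrete, one option is to count the `<...>` markers in a rendered string. This is only a sketch: `marker_counts` is a hypothetical helper, and its regex assumes that every special token is an angle-bracketed run with no whitespace inside. That holds for the two strings compared above but is not a general rule. For the Llama 3.1 string, each `<|start_header_id|>` is matched by one `<|eot_id|>`; passing the R1-Distill string to the same function lets you check the imbalance noted above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"from collections import Counter\n",
"\n",
"\n",
"def marker_counts(rendered: str) -> Counter:\n",
"    # Hypothetical helper: count angle-bracketed markers in a rendered chat string.\n",
"    # Assumption: a marker is '<', then characters that are neither whitespace\n",
"    # nor angle brackets, then '>'.\n",
"    return Counter(re.findall(r\"<[^<>\\s]+>\", rendered))\n",
"\n",
"\n",
"# Llama 3.1 string copied from the first cell of this notebook.\n",
"llama_text = (\n",
"    \"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\n\"\n",
"    \"Cutting Knowledge Date: December 2023\\nToday Date: 26 Jul 2024\\n\\n<|eot_id|>\"\n",
"    \"<|start_header_id|>user<|end_header_id|>\\n\\nHello, how are you?<|eot_id|>\"\n",
"    \"<|start_header_id|>assistant<|end_header_id|>\\n\\n\"\n",
"    \"I'm doing great. How can I help you today?<|eot_id|>\"\n",
"    \"<|start_header_id|>system<|end_header_id|>\\n\\n\"\n",
"    \"You are a friendly chatbot who always responds in the style of a pirate<|eot_id|>\"\n",
")\n",
"\n",
"# Print each marker with the number of times it occurs.\n",
"for marker, count in marker_counts(llama_text).items():\n",
"    print(marker, count)"
]
},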
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Just curious, did Alpha Maze touch anything with the chat template?\n",
"- Nope, as alpha maze task isn't as complicated as agent tool call stuffs, so it doesn't need to tweak the chat template"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "deepsearch-py311",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}