{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cook vllm - sglang - test eval script\n",
"## OpenAI API for reference\n",
"- https://platform.openai.com/docs/api-reference/completions/create"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cook vllm\n",
"- https://docs.vllm.ai/en/latest/getting_started/examples/gradio_webserver.html\n",
"- https://docs.vllm.ai/en/latest/getting_started/examples/gradio_openai_chatbot_webserver.html\n",
"- https://docs.vllm.ai/en/latest/getting_started/examples/basic.html#gguf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!vllm serve meta-llama/Llama-3.2-1B-Instruct --dtype auto"
]
},
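{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the server is up, it can be queried through its OpenAI-compatible endpoint (see the API reference links above). A minimal sketch, assuming vLLM's default port 8000 (pass `--port` to change it) and that the prompt is just an illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# Assumes the vLLM server launched above is listening on its default port 8000\n",
"url = \"http://localhost:8000/v1/chat/completions\"\n",
"data = {\n",
"    \"model\": \"meta-llama/Llama-3.2-1B-Instruct\",\n",
"    \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in one sentence.\"}],\n",
"    \"max_tokens\": 64,\n",
"}\n",
"\n",
"response = requests.post(url, json=data)\n",
"print(response.json()[\"choices\"][0][\"message\"][\"content\"])"
]
},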
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cook sglang\n",
"- https://docs.sglang.ai/backend/server_arguments.html\n",
"- > lora_paths: You may provide a list of adapters to your model as a list. Each batch element will get a model response with the corresponding lora adapter applied. Currently cuda_graph and radix_attention are not supported with this option, so you need to disable them manually. We are still working through these issues.\n",
"- > To enable multi-GPU data parallelism, add --dp 2. Data parallelism is better for throughput if there is enough memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %pip install \"sglang[all]>=0.4.5\" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python -q\n",
"!python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-1B-Instruct --dp 2 --port 30000"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"url = \"http://localhost:30000/v1/chat/completions\"\n",
"\n",
"data = {\n",
"    \"model\": \"meta-llama/Llama-3.2-1B-Instruct\",\n",
"    \"messages\": [{\"role\": \"user\", \"content\": \"Hello, have you eaten yet?\"}],\n",
"}\n",
"\n",
"response = requests.post(url, json=data)\n",
"print(response.json())"
]
},
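{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same request can also go through the `openai` Python client, since the sglang server exposes an OpenAI-compatible API. A minimal sketch, assuming the server from the cell above is still running on port 30000 and the `openai` package is installed (the API key value is unused by a local server):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"\n",
"# Point the client at the local sglang server instead of api.openai.com\n",
"client = OpenAI(base_url=\"http://localhost:30000/v1\", api_key=\"EMPTY\")\n",
"\n",
"response = client.chat.completions.create(\n",
"    model=\"meta-llama/Llama-3.2-1B-Instruct\",\n",
"    messages=[{\"role\": \"user\", \"content\": \"Hello, have you eaten yet?\"}],\n",
")\n",
"print(response.choices[0].message.content)"
]
},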
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cook eval script"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Command to launch SGLang server\n",
"!python3 -m sglang.launch_server \\\n",
"    --model-path \"janhq/250404-llama-3.2-3b-instruct-grpo-03-s250\" \\\n",
"    --context-length 8192 \\\n",
"    --enable-metrics \\\n",
"    --dtype bfloat16 \\\n",
"    --host 0.0.0.0 \\\n",
"    --port 8002 \\\n",
"    --trust-remote-code \\\n",
"    --disable-overlap \\\n",
"    --disable-radix-cache"
]
},
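{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before kicking off the eval, it is worth confirming the server is ready. A minimal sketch, assuming sglang's `/health` and `/get_model_info` endpoints and the port 8002 used above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# Assumes the sglang server launched above is listening on port 8002\n",
"print(requests.get(\"http://localhost:8002/health\").status_code)\n",
"print(requests.get(\"http://localhost:8002/get_model_info\").json())"
]
},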
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# cd scripts/evaluation\n",
"!python run_eval.py \\\n",
"    --config_path eval_config.yaml \\\n",
"    --method_name research \\\n",
"    --data_dir {root/path/to/evaluation/data} \\\n",
"    --dataset_name bamboogle \\\n",
"    --split test \\\n",
"    --save_dir {your-save-dir} \\\n",
"    --save_note research_qwen7b_ins \\\n",
"    --sgl_remote_url {your-launched-sgl-url} \\\n",
"    --remote_retriever_url {your-hosted-retriever-url} \\\n",
"    --generator_model {your-local-model-path} \\\n",
"    --apply_chat True"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "deepsearch-py311",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}