Phi-3.5-mini-instruct代码实例：Python调用vLLM API+Chainlit前端示例-编程阁

Phi-3.5-mini-instruct代码实例：Python调用vLLM API+Chainlit前端示例

1. 模型简介

Phi-3.5-mini 是一个轻量级的开放模型，属于 Phi-3 模型家族。它基于高质量的数据集构建，包括合成数据和经过筛选的公开网站数据，特别关注推理密集型任务。该模型支持长达128K令牌的上下文长度，并经过严格的训练过程：

监督微调（Supervised Fine-Tuning）
近端策略优化（Proximal Policy Optimization）
直接偏好优化（Direct Preference Optimization）

这些训练方法确保了模型能够精确遵循指令，同时具备强大的安全措施。

2. 环境准备

2.1 验证模型部署

在开始使用前，我们需要确认模型服务已成功部署。可以通过以下命令检查日志：

cat /root/workspace/llm.log

如果看到类似下面的输出，表示模型已成功加载：

[INFO] Model loaded successfully [INFO] vLLM API server started on port 8000

2.2 安装必要依赖

确保已安装以下Python包：

pip install chainlit requests

3. Python调用vLLM API

3.1 基础API调用示例

下面是一个简单的Python脚本，演示如何通过vLLM API调用Phi-3.5-mini-instruct模型：

import requests import json def generate_text(prompt, max_tokens=200): url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} data = { "model": "phi-3.5-mini-instruct", "prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7 } response = requests.post(url, headers=headers, data=json.dumps(data)) return response.json() # 示例调用 result = generate_text("请用简单的语言解释量子计算") print(result["choices"][0]["text"])

3.2 流式响应处理

对于长文本生成，可以使用流式响应来提高用户体验：

def generate_text_stream(prompt, max_tokens=200): url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} data = { "model": "phi-3.5-mini-instruct", "prompt": prompt, "max_tokens": max_tokens, "temperature": 0.7, "stream": True } with requests.post(url, headers=headers, data=json.dumps(data), stream=True) as response: for chunk in response.iter_lines(): if chunk: decoded_chunk = chunk.decode('utf-8') if decoded_chunk.startswith('data:'): data = json.loads(decoded_chunk[5:]) yield data["choices"][0]["text"] # 示例调用 for text in generate_text_stream("写一篇关于人工智能的短文"): print(text, end="", flush=True)

4. Chainlit前端集成

4.1 基础Chainlit应用

创建一个简单的Chainlit应用来与模型交互：

import chainlit as cl import requests import json @cl.on_message async def main(message: cl.Message): # 调用vLLM API url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} data = { "model": "phi-3.5-mini-instruct", "prompt": message.content, "max_tokens": 500, "temperature": 0.7 } response = requests.post(url, headers=headers, data=json.dumps(data)) result = response.json() # 发送响应 await cl.Message(content=result["choices"][0]["text"]).send()

4.2 增强版Chainlit应用

添加更多交互功能和更好的用户体验：

import chainlit as cl import requests import json from typing import Optional @cl.on_chat_start async def start_chat(): settings = await cl.ChatSettings( [ cl.input_widget.Slider( id="temperature", label="Temperature", initial=0.7, min=0, max=1, step=0.1 ), cl.input_widget.Slider( id="max_tokens", label="Max Tokens", initial=500, min=100, max=2000, step=100 ) ] ).send() @cl.on_message async def main(message: cl.Message): # 获取用户设置 settings = cl.user_session.get("settings") # 创建消息元素 msg = cl.Message(content="") await msg.send() # 调用vLLM API url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} data = { "model": "phi-3.5-mini-instruct", "prompt": message.content, "max_tokens": settings["max_tokens"] if settings else 500, "temperature": settings["temperature"] if settings else 0.7, "stream": True } full_response = "" with requests.post(url, headers=headers, data=json.dumps(data), stream=True) as response: for chunk in response.iter_lines(): if chunk: decoded_chunk = chunk.decode('utf-8') if decoded_chunk.startswith('data:'): data = json.loads(decoded_chunk[5:]) token = data["choices"][0]["text"] full_response += token await msg.stream_token(token) await msg.update()

5. 实际应用示例

5.1 代码解释器

创建一个能够解释代码的Chainlit应用：

import chainlit as cl import requests import json SYSTEM_PROMPT = """你是一个专业的代码解释器。用户会提供一段代码，你需要： 1. 解释代码的功能 2. 指出可能的改进点 3. 提供优化建议 4. 用简单的语言说明复杂概念""" @cl.on_message async def explain_code(message: cl.Message): prompt = f"{SYSTEM_PROMPT}\n\n请解释以下代码：\n```\n{message.content}\n```" # 调用vLLM API url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} data = { "model": "phi-3.5-mini-instruct", "prompt": prompt, "max_tokens": 800, "temperature": 0.5 } response = requests.post(url, headers=headers, data=json.dumps(data)) result = response.json() # 发送响应 await cl.Message(content=result["choices"][0]["text"]).send()

5.2 文档生成器

创建一个自动生成文档的应用：

import chainlit as cl import requests import json @cl.on_message async def generate_docs(message: cl.Message): prompt = f"""根据以下需求生成详细的技术文档： 需求描述： {message.content} 文档要求： 1. 包含概述、功能说明、使用示例三部分 2. 使用Markdown格式 3. 代码示例要完整可运行 4. 语言简洁专业""" # 调用vLLM API url = "http://localhost:8000/v1/completions" headers = {"Content-Type": "application/json"} data = { "model": "phi-3.5-mini-instruct", "prompt": prompt, "max_tokens": 1000, "temperature": 0.6 } response = requests.post(url, headers=headers, data=json.dumps(data)) result = response.json() # 发送响应 await cl.Message(content=result["choices"][0]["text"]).send()