news 2026/4/24 11:30:29

Repurposing the Huawei Ascend 310P: LLM Inference Serving


张小明

Front-end development engineer



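Notes: the 310P does not support bf16 or W4A4.
Memory bandwidth is about 200 GB/s. The dual-chip version, the 300I Duo, comes in 48 GB and 96 GB variants.
No Ascend card currently on the market supports FP8.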

Final result after performance tuning:
Model: Qwen3-8B-W8A8
Throughput (TPS): 15 tokens/s
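
As a rough sanity check on that number, the sketch below computes a bandwidth-bound ceiling for single-stream decoding. It assumes decoding is limited by memory bandwidth and that every generated token reads all weights once. The inputs are assumptions, not measurements from this setup: the "200G" figure above is read as 200 GB/s, Qwen3-8B is taken to have roughly 8.2 billion parameters, and W8 is taken as one byte per weight. KV-cache and activation traffic are ignored.

```python
def decode_tps_ceiling(bandwidth_gb_s: float,
                       params_billion: float,
                       bytes_per_weight: float) -> float:
    """Upper bound on tokens/s when each token streams all weights once."""
    # 1e9 parameters * bytes per weight = size in GB
    weight_gb = params_billion * bytes_per_weight
    return bandwidth_gb_s / weight_gb


# Assumed inputs (see the paragraph above); only `measured` comes from the article.
ceiling = decode_tps_ceiling(bandwidth_gb_s=200.0,
                             params_billion=8.2,
                             bytes_per_weight=1.0)
measured = 15.0
ratio = measured / ceiling

print(f"bandwidth-bound ceiling: {ceiling:.1f} tokens/s")
print(f"measured / ceiling:      {ratio:.2f}")
```

Under these assumptions the measured 15 tokens/s sits at a bit over half of the ceiling. If the W8A8SC compression step shrinks the weights below one byte per parameter, the real ceiling would be higher than this estimate.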

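Based on Ascend's PyTorch graph-mode usage guide and the vllm-ascend source code: graph mode offers two modes, reduce-overhead and max-autotune. reduce-overhead is supported only on the 910B and 910C, and vllm-ascend hard-codes reduce-overhead, which appears to rule out vllm-ascend's graph mode on the 310P.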

MindIE + Qwen3-8B-W8A8

1. Launch the container on the host

    docker run -it -d --net=host --shm-size=16g \
      --name mindie-qwen3-8b-310p \
      -w /workspace/MindIE-LLM/examples/atb_models \
      --device=/dev/davinci0:rwm \
      --device=/dev/davinci1:rwm \
      --device=/dev/davinci2:rwm \
      --device=/dev/davinci3:rwm \
      --device=/dev/davinci_manager:rwm \
      --device=/dev/hisi_hdc:rwm \
      --device=/dev/devmm_svm:rwm \
      -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
      -v /usr/local/dcmi:/usr/local/dcmi:ro \
      -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi:ro \
      -v /usr/local/sbin:/usr/local/sbin:ro \
      -v /Users/zhaojiacheng/repos/MindIE-LLM:/workspace/MindIE-LLM \
      -v /home/s_zhaojiacheng:/home/s_zhaojiacheng \
      swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:3.0.0b2-300I-Duo-py311-openeuler24.03-lts \
      bash

Enter the container:

    docker exec -it mindie-qwen3-8b-310p bash

2. Prepare the environment inside the container

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh prepare-env

3. Download the model from ModelScope

Recommended: download directly into a normal directory, not only into the default cache.

    mkdir -p /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s
    modelscope download \
      --model Eco-Tech/Qwen3-8B-w8a8s-310 \
      --local_dir /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s

If you already downloaded it earlier into the default cache with:

    modelscope download --model Eco-Tech/Qwen3-8B-w8a8s-310

then flatten it into a real directory first:

    mkdir -p /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s
    cp -aL \
      /home/s_zhaojiacheng/.cache/modelscope/hub/models/Eco-Tech/Qwen3-8B-w8a8s-310/. \
      /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s/

Check the files exist:

    ls /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s

4. Compress W8A8S into W8A8SC

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh compress \
      --w8a8s-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc

After it finishes, check the output directory exists:

    ls /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc

5. Start the OpenAI-compatible server

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh serve \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc \
      --model-name qwen3-8b-w8a8sc \
      --port 1025

This should start mindie_llm_server and expose the OpenAI-compatible endpoint on 127.0.0.1:1025.

6. Verify the service

List models:

    curl http://127.0.0.1:1025/v1/models

Expected model id: qwen3-8b-w8a8sc

Test one inference request:

    curl http://127.0.0.1:1025/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "qwen3-8b-w8a8sc",
        "messages": [
          {"role": "user", "content": "What is deep learning?"}
        ],
        "max_tokens": 128,
        "stream": false
      }'

Short version

If you want the shortest working sequence inside the container:

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh prepare-env
    modelscope download \
      --model Eco-Tech/Qwen3-8B-w8a8s-310 \
      --local_dir /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s
    scripts/qwen3_8b_310p_w8a8sc.sh compress \
      --w8a8s-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc
    scripts/qwen3_8b_310p_w8a8sc.sh serve \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc \
      --model-name qwen3-8b-w8a8sc \
      --port 1025

Then test:

    curl http://127.0.0.1:1025/v1/models

One important detail: for this single-310P flow, do not try to serve Qwen3-8B-w8a8s-310 directly. The supported path is download W8A8S -> compress to W8A8SC -> serve W8A8SC.
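
The curl checks above can also be scripted. The following is a minimal sketch of a standard-library Python client, assuming the server is reachable at 127.0.0.1:1025 with the model name qwen3-8b-w8a8sc as configured in the serve step, and that responses follow the usual OpenAI chat-completions layout. The helper names (`build_chat_payload`, `post_json`, `chat`) are illustrative and not part of MindIE.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1025/v1"  # endpoint from the serve step
MODEL_ID = "qwen3-8b-w8a8sc"           # value passed to --model-name


def build_chat_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build the same request body as the curl example."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }


def post_json(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def chat(prompt: str, max_tokens: int = 128) -> str:
    """Send one non-streaming chat request and return the reply text."""
    reply = post_json(f"{BASE_URL}/chat/completions",
                      build_chat_payload(prompt, max_tokens))
    # Assumed response layout: OpenAI-style choices[0].message.content
    return reply["choices"][0]["message"]["content"]
```

With the server running, `chat("What is deep learning?")` sends the same request as the curl test. Nothing here contacts the network at import time, so the module can be loaded on a machine without the NPU.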