Putting a Huawei Ascend 310P Back to Work: LLM Inference Serving


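Note: the 310P does not support bf16 or W4A4.
Bandwidth is 200G. The dual-chip variant is the 300I Duo, which comes in 48 GB and 96 GB versions.
None of the Ascend cards currently on the market support FP8.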

Final result after performance tuning:
Qwen3-8B-W8A8
TPS: 15 tokens/s
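
As a rough sanity check on that figure, single-stream decoding is usually limited by memory bandwidth: every generated token has to read all weights once, so tokens/s cannot exceed bandwidth divided by weight size. The sketch below works through that arithmetic. It rests on two assumptions that are not stated in this post: that "200G" means roughly 200 GB/s of memory bandwidth, and that the W8A8 weights of an 8B model take roughly 8 GB. The function name `decode_tps_bound` is made up for illustration.

```shell
#!/usr/bin/env bash
# Rough ceiling on single-stream decode speed for a bandwidth-bound model:
#   tokens/s <= memory bandwidth (GB/s) / weight size (GB)
# ASSUMPTIONS (not from the article): 200 GB/s bandwidth, ~8 GB of int8 weights.
decode_tps_bound() {
  # $1 = bandwidth in GB/s, $2 = weight size in GB
  awk -v bw="$1" -v w="$2" 'BEGIN { printf "%.1f\n", bw / w }'
}

bound="$(decode_tps_bound 200 8)"
echo "bandwidth-bound ceiling: ${bound} tokens/s"

# Compare the measured 15 tokens/s against that ceiling.
awk -v m=15 -v b="$bound" \
  'BEGIN { printf "measured 15 tokens/s = %.0f%% of the ceiling\n", 100 * m / b }'
```

Under these assumptions the ceiling is 25 tokens/s, so 15 tokens/s would be about 60% of it. If the compressed W8A8SC weights are smaller than 8 GB, the ceiling is higher and the ratio lower.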

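According to Ascend's PyTorch graph-mode documentation and the vllm-ascend source code, graph mode comes in two modes: reduce-overhead and max-autotune. reduce-overhead supports only the 910B and 910C, and vllm-ascend hard-codes reduce-overhead, which rules out vllm-ascend's graph mode on the 310P.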

MindIE + Qwen3-8B-W8A8

1. Launch the container on the host

    docker run -it -d --net=host --shm-size=16g \
      --name mindie-qwen3-8b-310p \
      -w /workspace/MindIE-LLM/examples/atb_models \
      --device=/dev/davinci0:rwm \
      --device=/dev/davinci1:rwm \
      --device=/dev/davinci2:rwm \
      --device=/dev/davinci3:rwm \
      --device=/dev/davinci_manager:rwm \
      --device=/dev/hisi_hdc:rwm \
      --device=/dev/devmm_svm:rwm \
      -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
      -v /usr/local/dcmi:/usr/local/dcmi:ro \
      -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi:ro \
      -v /usr/local/sbin:/usr/local/sbin:ro \
      -v /Users/zhaojiacheng/repos/MindIE-LLM:/workspace/MindIE-LLM \
      -v /home/s_zhaojiacheng:/home/s_zhaojiacheng \
      swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:3.0.0b2-300I-Duo-py311-openeuler24.03-lts \
      bash

Enter the container:

    docker exec -it mindie-qwen3-8b-310p bash

2. Prepare the environment inside the container

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh prepare-env

3. Download the model from ModelScope

Recommended: download directly into a normal directory, not only into the default cache.

    mkdir -p /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s
    modelscope download \
      --model Eco-Tech/Qwen3-8B-w8a8s-310 \
      --local_dir /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s

If you already downloaded it earlier into the default cache with:

    modelscope download --model Eco-Tech/Qwen3-8B-w8a8s-310

then flatten it into a real directory first:

    mkdir -p /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s
    cp -aL \
      /home/s_zhaojiacheng/.cache/modelscope/hub/models/Eco-Tech/Qwen3-8B-w8a8s-310/. \
      /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s/

Check the files exist:

    ls /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s

4. Compress W8A8S into W8A8SC

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh compress \
      --w8a8s-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc

After it finishes, check the output directory exists:

    ls /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc

5. Start the OpenAI-compatible server

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh serve \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc \
      --model-name qwen3-8b-w8a8sc \
      --port 1025

This should start mindie_llm_server and expose the OpenAI-compatible endpoint on 127.0.0.1:1025.

6. Verify the service

List models:

    curl http://127.0.0.1:1025/v1/models

Expected model id: qwen3-8b-w8a8sc

Test one inference request:

    curl http://127.0.0.1:1025/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "qwen3-8b-w8a8sc",
        "messages": [
          {"role": "user", "content": "What is deep learning?"}
        ],
        "max_tokens": 128,
        "stream": false
      }'

Short version

If you want the shortest working sequence inside the container:

    cd /workspace/MindIE-LLM
    scripts/qwen3_8b_310p_w8a8sc.sh prepare-env
    modelscope download \
      --model Eco-Tech/Qwen3-8B-w8a8s-310 \
      --local_dir /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s
    scripts/qwen3_8b_310p_w8a8sc.sh compress \
      --w8a8s-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8s \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc
    scripts/qwen3_8b_310p_w8a8sc.sh serve \
      --w8a8sc-weight /home/s_zhaojiacheng/models/Qwen3-8B-w8a8sc \
      --model-name qwen3-8b-w8a8sc \
      --port 1025

Then test:

    curl http://127.0.0.1:1025/v1/models

One important detail: for this single-310P flow, do not try to serve Qwen3-8B-w8a8s-310 directly. The supported path is download W8A8S -> compress to W8A8SC -> serve W8A8SC.
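
The verification step can also produce a rough tokens/s number to compare with the 15 tokens/s reported at the top. The following is a minimal sketch, not part of the MindIE tooling. It assumes the server's response carries the usual OpenAI-style `usage.completion_tokens` field, which this post does not confirm, and the helper names (`wait_for_server`, `extract_completion_tokens`, `tokens_per_second`, `measure_tps`) are hypothetical. The timing comes from curl's `%{time_total}` and covers the whole request, prefill included, so it slightly understates pure decode speed.

```shell
#!/usr/bin/env bash
# Sketch only: endpoint and model name match the serve step; helpers are hypothetical.
BASE_URL="${BASE_URL:-http://127.0.0.1:1025}"
MODEL="${MODEL:-qwen3-8b-w8a8sc}"

# Poll /v1/models every 5 s until the server answers or the timeout (seconds) expires.
wait_for_server() {
  local timeout="${1:-300}" waited=0
  until curl -sf "$BASE_URL/v1/models" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then return 1; fi
    sleep 5
    waited=$((waited + 5))
  done
}

# Read a chat-completion JSON response on stdin and print usage.completion_tokens.
# ASSUMPTION: the response includes an OpenAI-style "usage" object.
extract_completion_tokens() {
  sed -n 's/.*"completion_tokens"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print tokens / seconds with one decimal place.
tokens_per_second() {
  awk -v n="$1" -v t="$2" 'BEGIN { if (t > 0) printf "%.1f\n", n / t; else print "n/a" }'
}

# Send one non-streaming request and report end-to-end tokens/s.
measure_tps() {
  local body seconds tokens
  body="$(mktemp)"
  seconds="$(curl -s -o "$body" -w '%{time_total}' \
    "$BASE_URL/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"What is deep learning?\"}], \"max_tokens\": 128, \"stream\": false}")"
  tokens="$(extract_completion_tokens < "$body")"
  rm -f "$body"
  if [ -z "$tokens" ]; then
    echo "no usage.completion_tokens in response" >&2
    return 1
  fi
  echo "completion_tokens=$tokens seconds=$seconds tps=$(tokens_per_second "$tokens" "$seconds")"
}

# Usage, once the serve step is running:
#   wait_for_server 300 && measure_tps
```

A single short request is noisy, so repeating `measure_tps` a few times and using a larger `max_tokens` gives a steadier number.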
