news 2026/5/9 19:06:30

PTO-ISA Library Developer Rules

张小明

Front-end development engineer

This file lists some rules and limitations on the implementation of this library for pto-isa developers.

[Free download link] pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms. Project address: https://gitcode.com/cann/pto-isa

Not following the rules can result in any of the following:

  1. Compilation failures (source-level compile errors as well as compiler crashes)
  2. Functionally incorrect (e.g., precision issues)
  3. Bad performance

1 - Remember that pto::(Conv)Tile::data() returns a vector type instead of a pointer type in auto mode

The return type of the .data() member function is TileDType, which is defined differently in manual and auto mode. In manual mode it is simply a pointer, while in auto mode it is a vector type. See include/pto/common/memory.hpp for details.

Always keep this in mind, and never use the value returned by .data() directly as a pointer outside tile functions.
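The difference can be pictured with a small stand-alone mock. This is only a sketch: MockVec, MockTile and DataIsPointer are hypothetical names, and the mode is selected by a template flag here purely so both variants fit in one file. The real definitions are in include/pto/common/memory.hpp and may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the value-like vector type used in auto mode.
template <typename T, std::size_t N>
struct MockVec {
    T lanes[N];
};

// Hypothetical mock of a tile whose data() type depends on the mode.
template <typename T, std::size_t N, bool AutoMode>
struct MockTile {
    using DType = T;
    // Manual mode: plain pointer. Auto mode: vector type.
    using TileDType = std::conditional_t<AutoMode, MockVec<T, N>, T *>;

    TileDType storage;

    TileDType &data() { return storage; }
};

// Code outside tile functions can test its assumption explicitly
// instead of silently treating data() as a pointer.
template <typename Tile>
constexpr bool DataIsPointer()
{
    return std::is_pointer_v<typename Tile::TileDType>;
}

// Exercises both variants of the mock.
inline void CheckMockModes()
{
    assert((DataIsPointer<MockTile<float, 16, false>>()));  // manual: pointer
    assert((!DataIsPointer<MockTile<float, 16, true>>()));  // auto: not a pointer
}
```

Pointer arithmetic or dereferencing on the result of data() compiles only for the first variant of the mock, which is the kind of breakage this rule warns about.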

2 - Avoid default initializer for a struct/class member

It's a very common practice to default-initialize data members in a struct or class in C++, for instance:

    struct ConvTile {
    public:
        ...
        int shape[ConvTileDetail::MAX_CONVTILE_DIM] = {1};
    };

This turns out to cause problems for the SROA pass in the compiler (SROA can't eliminate the AllocaInst of the struct, plus all the load and store instructions associated with it). At least in auto mode, please DON'T default-initialize the members:

    #ifdef __PTO_AUTO__
        // In auto mode, do not default-initialize members in the class definition itself
        int shape[ConvTileDetail::MAX_CONVTILE_DIM];
    #else
        int shape[ConvTileDetail::MAX_CONVTILE_DIM] = {1};
    #endif

Even though we are programming in C++, we encourage describing structs and classes as POD (Plain Old Data) aggregates that remain compatible with the C programming language.
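At the language level, the effect of a default member initializer can be checked with standard type traits. The sketch below uses hypothetical names (kMaxDim stands in for ConvTileDetail::MAX_CONVTILE_DIM, ShapePod and ShapeWithInit are illustrative) and only shows the C++ side of the rule; it makes no claim about what the SROA pass does with either type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for ConvTileDetail::MAX_CONVTILE_DIM.
constexpr int kMaxDim = 6;

// No default member initializer: trivially default-constructible aggregate.
struct ShapePod {
    int shape[kMaxDim];
};

// A default member initializer makes the default constructor non-trivial.
struct ShapeWithInit {
    int shape[kMaxDim] = {1};
};

static_assert(std::is_aggregate_v<ShapePod>);
static_assert(std::is_trivially_default_constructible_v<ShapePod>);
static_assert(!std::is_trivially_default_constructible_v<ShapeWithInit>);

// Initialize explicitly at the point of use instead.
// {{1}} reproduces "= {1}": first element 1, remaining elements 0.
inline ShapePod MakeDefaultShape()
{
    ShapePod s = {{1}};
    return s;
}

// Confirms the explicit initialization matches the in-class initializer.
inline void CheckDefaultShape()
{
    const ShapePod s = MakeDefaultShape();
    assert(s.shape[0] == 1);
    assert(s.shape[kMaxDim - 1] == 0);
}
```

Moving the initialization into a helper such as the hypothetical MakeDefaultShape keeps the same starting values while leaving the type itself free of in-class initializers.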

3 - Explicit synchronization is still needed inside tile functions and their callees

TL;DR:

  • Use set_flag, wait_flag or pipe_barrier explicitly in tile functions and all of their callees.
  • Use PtoSetWaitFlag or TSYNC anywhere else.

Reason: auto-sync does NOT traverse into tile functions. In fact, the whole auto-mode compiler works at the tile-function level, so everything inside a tile function is a complete black box to auto mode.

For this reason, if any synchronization is needed inside a tile function, library developers must still add it manually. PtoSetWaitFlag and TSYNC cannot be used for this, because they are no-ops in auto mode; in most cases those interfaces are used by kernel developers.
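The shape of such manual synchronization can be sketched with a host-side mock. Everything here is an assumption made for illustration: mock_set_flag and mock_wait_flag merely record that they were called, the Pipe enum and MockTileFunction are hypothetical, and the loops stand in for hardware pipeline stages. The real set_flag, wait_flag and pipe_barrier are device intrinsics whose exact signatures are not given in this document.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pipeline identifiers for the mock.
enum class Pipe { MTE2, V, MTE3 };

// Records the order of operations so the explicit sync is visible.
inline std::vector<std::string> &CallLog()
{
    static std::vector<std::string> log;
    return log;
}

// Mock sync calls: they only log, they do not synchronize anything.
inline void mock_set_flag(Pipe, Pipe, int) { CallLog().push_back("set_flag"); }
inline void mock_wait_flag(Pipe, Pipe, int) { CallLog().push_back("wait_flag"); }

// Mock "tile function": a load stage followed by a compute stage.
inline void MockTileFunction(const float *src, float *dst, int n)
{
    float staged[64];
    assert(n >= 0 && n <= 64);

    for (int i = 0; i < n; ++i) staged[i] = src[i];  // stands in for a load
    CallLog().push_back("load");

    // Explicit sync written by the library developer: compute must not
    // start before the load has finished. Auto-sync would not add this,
    // because it does not look inside tile functions.
    mock_set_flag(Pipe::MTE2, Pipe::V, 0);
    mock_wait_flag(Pipe::MTE2, Pipe::V, 0);

    for (int i = 0; i < n; ++i) dst[i] = staged[i] * 2.0f;  // stands in for compute
    CallLog().push_back("compute");
}
```

In the mock the set/wait pair sits between the producer stage and the consumer stage, which is where the library developer, not the auto-mode compiler, has to place it.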

4 - Avoid using TASSIGN for implementation

Currently, the implementations of some PTO instructions call TASSIGN_IMPL directly. This may be a problem for auto mode, where it is a no-op.

If you use TASSIGN just to alias two tiles, use TRESHAPE or TSUBVIEW instead, depending on your needs. Any other use won't work in auto mode.

For instance, if you call TASSIGN to allocate memory according to some algorithm, this will never work in auto mode, because the compiler cannot recognize the specific algorithm and reproduce the allocation you would make in manual mode.

After all, memory allocation in auto mode is based entirely on the liveness analysis of each individual tile, without any other context. This is why the current implementations of TPUSH and TPOP won't work in auto mode.

5 - Some general rules for *_IMPL functions

The *_IMPL functions must stay consistent with the tile function interface:

  • The function signature must carry the PTO_INTERNAL macro.
  • The implementation should call tile functions directly. Don't call any non-tile functions unless they are inlined.
  • Always pass the result of .data() straight into tile functions, or bind every value returned by .data() by reference. For example:

    TExp(dstTile.data(), srcTile.data()); // correct

    auto dst = dstTile.data();  // wrong: return by value
    auto &src = srcTile.data(); // correct: return by reference
    TExp(dst, src);
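The difference between the two bindings is ordinary C++ type deduction: plain auto drops the reference and makes a copy, while auto & keeps referring to the original object. A minimal sketch, assuming a data() that returns an lvalue reference (MockHandle and MockTile are hypothetical stand-ins, not the library's types):

```cpp
#include <cassert>

// Hypothetical handle type standing in for TileDType.
struct MockHandle {
    int id;
};

// Hypothetical tile whose data() returns a reference to its own handle.
struct MockTile {
    MockHandle handle;
    MockHandle &data() { return handle; }
};

// True when h is the very object owned by tile, not a copy of it.
inline bool RefersToTile(const MockTile &tile, const MockHandle &h)
{
    return &tile.handle == &h;
}

inline bool CopyAliasesTile(MockTile &tile)
{
    auto copy = tile.data();  // plain auto: a new, separate object
    return RefersToTile(tile, copy);
}

inline bool RefAliasesTile(MockTile &tile)
{
    auto &ref = tile.data();  // auto &: still the tile's own handle
    return RefersToTile(tile, ref);
}

// Exercises both bindings on one tile.
inline void CheckBinding()
{
    MockTile t{{7}};
    assert(!CopyAliasesTile(t));
    assert(RefAliasesTile(t));
}
```

Only the reference binding preserves the identity of the object returned by data(); the copy is a different object with the same contents.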

6 - Some general rules for tile functions

  • Use typename <...>::TileDType instead of typename <...>::DType * for tile types in tile function parameters.
  • Make sure these typename <...>::TileDType parameters are passed by value, not by pointer or reference.
  • Make sure the __in__ or __out__ attributes are properly attached to these typename <...>::TileDType parameters.
  • Always call __cce_get_tile_ptr on these typename <...>::TileDType arguments to get a tile's underlying buffer pointer.
  • The return type should always be void. Otherwise the compiler's assumptions about the tile function (TF) interface are broken and the behavior is undefined. Please turn all return values into pass-by-value arguments, even for a single scalar.

7 - Avoid having runtime control flow before tile functions

Runtime control flow makes it very hard for auto-sync to work properly. We encourage developers to either remove these runtime conditions if they are not necessary, or move them inside tile functions where possible.

Some examples include TROWEXPANDDIV_IMPL and TMULS_IMPL.
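The refactoring suggested above can be sketched on the host with plain functions. All names below are hypothetical, and the scalar-multiply logic is invented for illustration; it is not taken from TMULS_IMPL or any other library function. The point is only the shape of the change: the caller makes a single unconditional call, and the runtime condition lives inside the function that stands in for a tile function.

```cpp
#include <cassert>

// Hypothetical mock tile functions.
inline void MockTileCopy(float *dst, const float *src, int n)
{
    for (int i = 0; i < n; ++i) dst[i] = src[i];
}

inline void MockTileMulScalar(float *dst, const float *src, float s, int n)
{
    for (int i = 0; i < n; ++i) dst[i] = src[i] * s;
}

// Discouraged shape: a runtime branch in front of the tile functions.
inline void MulsBranchOutside(float *dst, const float *src, float s, int n)
{
    if (s == 1.0f) {
        MockTileCopy(dst, src, n);
    } else {
        MockTileMulScalar(dst, src, s, n);
    }
}

// Preferred shape: the branch is moved inside one mock tile function.
inline void MockTileMulsFused(float *dst, const float *src, float s, int n)
{
    assert(n >= 0);
    if (s == 1.0f) {
        for (int i = 0; i < n; ++i) dst[i] = src[i];
    } else {
        for (int i = 0; i < n; ++i) dst[i] = src[i] * s;
    }
}

// The caller now contains no runtime control flow at all.
inline void MulsBranchInside(float *dst, const float *src, float s, int n)
{
    MockTileMulsFused(dst, src, s, n);
}
```

Both variants compute the same result; the second keeps the code outside the tile function free of branches.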
