news 2026/5/9 12:58:35

CANNBot Vector Mask Constraints


张小明

Front-end development engineer


Vector Mask Constraints

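[Free download link] cannbot-skills: CANNBot is a family of agents for CANN development aimed at improving development efficiency; this repository provides reusable Skills modules for them. Project URL: https://gitcode.com/cann/cannbot-skills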

Read this file when implementing or debugging A2 vec mask behavior in the simulator.

Goal

Keep mask semantics explicit so that:

  • every vec op either clearly consumes the current mask or clearly ignores it
  • set_mask / reset_mask state is reproducible across tests
  • masked arithmetic matches the repository's agreed behavior instead of ad-hoc guesses

1. Current mask state

Each vec lane/runtime owns one vector_mask: uint8[256].

Default state:

  • all 256 entries are 1

Interpretation:

  • one mask slot corresponds to one logical vec element
  • the active prefix depends on dtype / logical repeat payload size

Typical active prefix lengths:

  • float / int32: mask[0:64]
  • half / int16: mask[0:128]
  • int8 / uint8: mask[0:256]

Current simulator behavior:

  • each repeat reuses the same mask prefix for the dtype
  • the mask does not advance per repeat
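The prefix lengths above are consistent with a 256-byte repeat payload divided by the element size. The sketch below models that relationship with NumPy. The constant REPEAT_BYTES and the helper names new_mask, active_prefix_len and repeat_masks are illustrative assumptions, not simulator APIs.

```python
import numpy as np

# Assumption: one full repeat carries 256 bytes of payload, which matches
# the documented prefixes (64 / 128 / 256 slots).
REPEAT_BYTES = 256


def new_mask() -> np.ndarray:
    """Default mask state: all 256 slots set to 1."""
    return np.ones(256, dtype=np.uint8)


def active_prefix_len(dtype) -> int:
    """Number of mask slots one repeat of the given dtype uses."""
    return REPEAT_BYTES // np.dtype(dtype).itemsize


def repeat_masks(mask: np.ndarray, dtype, repeats: int) -> np.ndarray:
    """Mask seen by each repeat: the same prefix every time, never advanced."""
    prefix = mask[: active_prefix_len(dtype)]
    return np.tile(prefix, (repeats, 1))
```

For example, repeat_masks(new_mask(), np.float16, 3) yields three identical rows of 128 slots, because the mask does not advance between repeats.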

2. set_mask / reset_mask

Logical mask parts

Instruction semantics:

  • low and high are treated as uint64
  • bit i of low writes mask[i]
  • bit i of high writes mask[64 + i]
  • mask[128:256] stays unchanged by set_mask

Current bit order:

  • bit 0 -> lowest mask slot in the covered range
  • bit 63 -> highest slot in the covered range
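The bit-to-slot mapping above can be sketched as a small function operating on a 256-entry uint8 array. The name apply_set_mask is hypothetical; it models the instruction semantics described here, not the simulator's actual code.

```python
import numpy as np

U64_MAX = (1 << 64) - 1


def apply_set_mask(mask: np.ndarray, high: int, low: int) -> None:
    """Write the bits of low into mask[0:64] and of high into mask[64:128]."""
    low &= U64_MAX    # both parts are treated as uint64
    high &= U64_MAX
    for i in range(64):
        mask[i] = (low >> i) & 1         # bit i of low  -> mask[i]
        mask[64 + i] = (high >> i) & 1   # bit i of high -> mask[64 + i]
    # mask[128:256] is deliberately left untouched
```

With low = 0b101 and high = 0, only slots 0 and 2 of mask[0:128] end up set, while mask[128:256] keeps whatever it held before.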

Stub call shape

Current A2 stub API is:

  • set_mask(mask_high, mask_low)

Note that the high part comes first in the call. The emitted instruction maps the two arguments by name, without reordering them:

  • set_mask(hi, lo) emits instruction fields high=hi, low=lo

This is why many call sites that only want the low 64-bit prefix use:

  • set_mask(0, low_mask)

Validated repository tests:

  • testcases/simulator/micro/test_simulator_v2_muladddst_mask.py
  • testcases/simulator/micro/test_simulator_v2_vec_ops_extended.py
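Because the stub takes the high part first, it is easy to pass a prefix mask in the wrong position. A minimal helper, assuming the caller wants the first n slots enabled, could build both arguments in the right order. The name prefix_mask_args is hypothetical and not part of the stub API.

```python
def prefix_mask_args(n_active: int) -> tuple[int, int]:
    """Return (mask_high, mask_low) enabling the first n_active mask slots."""
    if not 0 <= n_active <= 128:
        raise ValueError("set_mask only covers mask[0:128]")
    full = (1 << n_active) - 1        # n_active low bits set
    low = full & ((1 << 64) - 1)      # bits 0..63   -> mask[0:64]
    high = full >> 64                 # bits 64..127 -> mask[64:128]
    return high, low
```

A call such as set_mask(*prefix_mask_args(16)) would then be equivalent to set_mask(0, 0xFFFF).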

reset_mask()

  • resets the full mask[0:256] back to 1

3. Ops that consume the current mask

3.1 Unary / binary / unaryscalar

These ops consume mask at dst writeback.

Rule:

  • mask == 1 -> write the newly computed result
  • mask == 0 -> keep the old dst value unchanged

This currently applies to:

Unary
  • exp
  • ln
  • abs
  • rec
  • sqrt
  • rsqrt
  • vnot
  • relu
Binary
  • add
  • sub
  • mul
  • div
  • vmax
  • vmin
  • vand
  • vor
  • muladddst
Unaryscalar
  • adds
  • muls
  • vmaxs
  • vmins
  • lrelu
  • axpy
Dup
  • dup
Cast
  • cast
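The writeback rule for these ops can be illustrated for a single repeat with NumPy. This is a sketch of the rule only: masked_writeback is a hypothetical helper, and abs stands in for any op in the lists above.

```python
import numpy as np


def masked_writeback(dst: np.ndarray, result: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Return dst after a mask-gated write of result (one repeat)."""
    n = dst.shape[-1]
    # mask == 1 -> take the new result, mask == 0 -> keep the old dst value
    return np.where(mask[:n] == 1, result, dst)


old_dst = np.full(64, -1.0, dtype=np.float32)
src = -np.arange(64, dtype=np.float32)
mask = np.zeros(256, dtype=np.uint8)
mask[:4] = 1                          # only the first four lanes are active
out = masked_writeback(old_dst, np.abs(src), mask)
```

Here the full result is computed first and the mask only decides which lanes reach dst, so lanes 4..63 still hold -1.0 afterwards.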

Additional cast rule:

  • the active cast-element domain is determined by the wider of src / dst dtypes
  • for example, float <-> half uses 64 mask slots per full repeat, not 128
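A minimal sketch of the cast rule, assuming the same 256-byte repeat payload that the prefix lengths in section 1 imply. Both cast_active_slots and REPEAT_BYTES are illustrative names; only the float <-> half case is stated in this document, the other pairs simply apply the same rule.

```python
import numpy as np

REPEAT_BYTES = 256  # assumption, consistent with the 64 / 128 / 256 prefixes


def cast_active_slots(src_dtype, dst_dtype) -> int:
    """Mask slots used per full repeat of a cast: set by the wider dtype."""
    wider = max(np.dtype(src_dtype).itemsize, np.dtype(dst_dtype).itemsize)
    return REPEAT_BYTES // wider
```

The result is symmetric in its arguments, so float -> half and half -> float both use 64 slots.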

3.2 Additive group reductions

For these ops, mask does not preserve old dst. Instead, mask == 0 means the corresponding src slot contributes 0.

This currently applies to:

  • cadd
  • cgadd
  • cpadd

Detailed behavior:

  • cadd: masked-off elements do not contribute to the full-repeat sum; if all lanes in the active prefix are masked off, the destination scalar is not overwritten
  • cgadd: masked-off elements do not contribute to their block-local sum; if one block's mask prefix is entirely zero, that block's destination scalar is not overwritten
  • cpadd: masked-off elements are zeroed before flat-stream pairwise add
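The three additive behaviors can be sketched for a single repeat as follows. These are reference-style models rather than the simulator's implementation: the Python function names reuse the op names for readability, block_len is a parameter because the block size is not stated here, and cpadd is assumed to add adjacent element pairs.

```python
import numpy as np


def cadd(src, mask, old_dst):
    """Full-repeat sum; masked-off lanes contribute 0."""
    active = mask[: src.size] == 1
    if not active.any():
        return old_dst                    # all masked off: dst not overwritten
    return np.where(active, src, 0).sum(dtype=src.dtype)


def cgadd(src, mask, old_dst, block_len):
    """Block-local sums; a fully masked block keeps its old dst scalar."""
    active = (mask[: src.size] == 1).reshape(-1, block_len)
    sums = np.where(active, src.reshape(-1, block_len), 0).sum(
        axis=1, dtype=src.dtype)
    return np.where(active.any(axis=1), sums, old_dst)


def cpadd(src, mask):
    """Zero masked-off lanes, then add adjacent pairs of the flat stream."""
    active = mask[: src.size] == 1
    zeroed = np.where(active, src, 0).astype(src.dtype)
    return zeroed[0::2] + zeroed[1::2]
```

Note the asymmetry: cadd and cgadd skip the write when nothing is active, while cpadd in this sketch always produces a result.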

3.3 Max/min reductions with masked infinities

For these ops, masked-off source slots do not preserve old dst directly. Instead they are replaced with an extreme sentinel before reduction.

Max family

Masked-off slots act like -inf.

This currently applies to:

  • cmax
  • cgmax

Detailed behavior:

  • cmax: masked-off elements are replaced by -inf; if all lanes in the active prefix are masked off, the destination scalar is not overwritten
  • cgmax: masked-off elements are replaced by -inf; if one block's mask prefix is entirely zero, that block's destination scalar is not overwritten

Min family

Masked-off slots act like +inf.

This currently applies to:

  • cmin
  • cgmin

Detailed behavior:

  • cmin: masked-off elements are replaced by +inf; if all lanes in the active prefix are masked off, the destination scalar is not overwritten
  • cgmin: masked-off elements are replaced by +inf; if one block's mask prefix is entirely zero, that block's destination scalar is not overwritten
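The sentinel approach for both families can be sketched with one pair of helpers for floating-point inputs. The names c_reduce and cg_reduce are hypothetical, block_len is again a parameter because the block size is not stated here, and integer dtypes would need a dtype-specific extreme value instead of an infinity.

```python
import numpy as np


def _sentinel(kind: str) -> float:
    """-inf for the max family, +inf for the min family."""
    return -np.inf if kind == "max" else np.inf


def c_reduce(src, mask, old_dst, kind):
    """cmax / cmin over one repeat; kind is 'max' or 'min'."""
    active = mask[: src.size] == 1
    if not active.any():
        return old_dst                    # all masked off: dst not overwritten
    filled = np.where(active, src, _sentinel(kind))
    return filled.max() if kind == "max" else filled.min()


def cg_reduce(src, mask, old_dst, block_len, kind):
    """cgmax / cgmin: per-block reduction with the same sentinel rule."""
    active = (mask[: src.size] == 1).reshape(-1, block_len)
    filled = np.where(active, src.reshape(-1, block_len), _sentinel(kind))
    red = filled.max(axis=1) if kind == "max" else filled.min(axis=1)
    return np.where(active.any(axis=1), red, old_dst)
```

Using a sentinel instead of 0 matters when all active values are negative (for max) or positive (for min): a zero fill would win the reduction incorrectly.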

4. Ops that explicitly do NOT consume the current mask

These ops currently ignore the vec mask entirely and follow only their own normal semantics:

  • select
  • compare
  • compare_scalar
  • gather
  • scatter
  • sort32
  • mergesort4
  • mergesort_2seq
  • brcb

For the current V2 path, compare(...) / compare_scalar(...) and select(...) also use their own explicit control tensor semantics:

  • the current vec mask from set_mask(...) is irrelevant to them
  • when the control tensor is a packed-bit uint8 mask, the shape is [..., N // 8], not an expanded [..., N] byte-per-element mask
  • practical example: packed causal control for a [64, 64] half-tile is Tensor(DT.uint8, [64, 8], Position.UB)
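The shape relationship between an expanded and a packed control mask can be checked on the host with NumPy. This sketch only demonstrates the [64, 64] -> [64, 8] shape; the little-endian bit order within each byte is an assumption, since the packing order is not specified here.

```python
import numpy as np

N = 64
# Expanded byte-per-element causal mask: row i enables columns 0..i.
causal = np.tril(np.ones((N, N), dtype=np.uint8))

# Packed form: 8 mask bits per uint8 along the last axis -> shape [64, 8].
# bitorder="little" (bit 0 = first element of each group) is an assumption.
packed = np.packbits(causal, axis=-1, bitorder="little")

# Unpacking recovers the expanded mask, so no information is lost.
restored = np.unpackbits(packed, axis=-1, bitorder="little")
```

The packed array is what would match the Tensor(DT.uint8, [64, 8], Position.UB) shape from the example above.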

5. Deprecated / pending pieces

Deprecated

  • set_cmpmask
    • deprecated because its role overlaps with compare() / compare_scalar() result generation
    • kept only for compatibility, not for simulator feature growth

Pending / not yet decided

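Mask semantics are still not fixed for:

  • cast (the writeback gating and active-domain rule in section 3.1 describe what is documented today; treat them as provisional until this entry is resolved)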

If you touch new undecided ops later, document the decision here first.

6. Implementation rules

When adding mask support to a new vec op, decide which of these two patterns it belongs to:

Pattern A: mask gates writeback

Use for elementwise ops.

Meaning:

  • compute the full result
  • write only where mask is 1
  • preserve old dst where mask is 0

Pattern B: mask zeroes src contribution

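Use for additive reductions.

Meaning:

  • masked-off src slots act like 0
  • reduction result is still written normally, except that cadd and cgadd skip the write when the whole active prefix (or a block's prefix) is masked off (section 3.2)

The max/min reductions in section 3.3 follow the same shape, with -inf / +inf in place of 0.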

Do not mix the two patterns casually.

7. Validation checklist

When testing mask behavior:

  • verify default mask is all ones
  • verify set_mask changes only mask[0:128]
  • verify reset_mask restores full ones
  • for writeback-gated ops, confirm masked-off lanes keep old dst
  • for additive reductions, confirm masked-off lanes contribute zero instead of preserving old dst
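The checklist can be turned into assertions against a toy model, which may serve as a template for real tests. ToyMaskState is a hypothetical stand-in written for this sketch; it is not the simulator's API, and the repository tests listed in section 2 remain the reference.

```python
import numpy as np


class ToyMaskState:
    """Hypothetical stand-in for the simulator's mask state."""

    def __init__(self):
        self.mask = np.ones(256, dtype=np.uint8)

    def set_mask(self, mask_high: int, mask_low: int) -> None:
        for i in range(64):
            self.mask[i] = (mask_low >> i) & 1
            self.mask[64 + i] = (mask_high >> i) & 1

    def reset_mask(self) -> None:
        self.mask[:] = 1


def run_checklist() -> bool:
    st = ToyMaskState()
    assert st.mask.all()                              # default is all ones
    st.set_mask(0, 0b11)
    assert st.mask[:2].all() and not st.mask[2:128].any()
    assert st.mask[128:].all()                        # upper half untouched
    active = st.mask[:64] == 1
    dst = np.full(64, 7.0, dtype=np.float32)
    res = np.zeros(64, dtype=np.float32)
    out = np.where(active, res, dst)                  # writeback-gated op
    assert (out[:2] == 0).all() and (out[2:] == 7).all()
    src = np.ones(64, dtype=np.float32)
    assert np.where(active, src, 0).sum() == 2        # additive reduction
    st.reset_mask()
    assert st.mask.all()                              # reset restores ones
    return True
```

Calling run_checklist() walks the five checklist items in order and raises AssertionError at the first one that fails.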


Creation statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
