Kuiper Infer

Published 2024-12-24, updated 2025-01-03

github repo: https://github.com/zjhellofss/kuiperdatawhale.git

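Contents
Tensor
Compute Graph
How KuiperInfer wraps the compute graph
Building graph relations and execution order
- Topological sort
- Steps of the DFS-based topological sort
Operator & Register Factory
- Definition of the Layer type
Convolution & Pooling Operator
- Definition of the pooling operator
- Definition of the convolution operator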
Expression Layer
ResNet & YOLOv5 Infer
homework
- course1
- course2
- course3
- course4
- course5
- course6
- course7

Tensor

Class design

template <>
class Tensor<float>{
 public:
     Tensor(const std::vector<uint32_t>& shapes); // constructor for a three-dimensional tensor
     uint32_t rows() const;
     uint32_t cols() const;
     uint32_t channels() const;
     uint32_t size() const; // number of elements
     const std::vector<uint32_t>& raw_shapes() const; // actual shape of the tensor
     void Fill(const std::vector<float>& values, bool row_major); // fill with the given values
     ···
     ···
     ···
 private:
     std::vector<uint32_t> raw_shapes_;  // actual shape of the tensor data
     arma::fcube data_;                  // tensor data
};

Compute Graph

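Compute graph concepts
- Operator: a compute node in the deep learning compute graph.
- Layer: the component that actually carries out a node's computation. A Layer reads the data in the input tensors, computes on it, and stores the result in the node's output tensors. The computation differs from one operator to another.
- Tensor: a data structure for multi-dimensional data that makes it easy to pass data between compute nodes. It also wraps basic matrix operations such as matrix multiplication and dot product.
- Graph: a directed acyclic graph formed by chaining multiple Operators. It fixes the flow and order in which the compute nodes (Operators) execute.
Advantages of the PNNX compute graph
- PNNX uses pattern matching to replace each matched subgraph with an equivalent larger operator, so the graph is not as fragmented as the operators of a model exported to ONNX.
- A simple arithmetic expression written in PyTorch keeps its overall structure after conversion to PNNX instead of being split into many small add, subtract, multiply, and divide operators.
- The PNNX project implements many graph optimizations, including operator fusion, constant folding and elimination, and common subexpression elimination.
PNNX compute graph format
- PNNX is made up of three structures: the graph (Graph), operators (Operator), and operands (Operand). The design is very compact.
- The core job of Graph is to manage the operators and operands in the compute graph. The three classes are:
  1. Operator represents an operator in the compute graph, such as the Convolution or Pooling operator of a model;
  2. Operand represents an operand in the compute graph, that is, an input or output tensor of an operator;
  3. Graph provides member functions for creating and accessing operators and operands, so that the graph can be built and traversed. It is also the collection of all operators and operands in the model.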

PNNX operator structure

class Operator
{
public:
   std::vector<Operand*> inputs; // inputs
   std::vector<Operand*> outputs; // outputs

   std::string type; // type: conv/linear/pooling
   std::string name; // name

   std::vector<std::string> inputnames; 
   std::map<std::string, Parameter> params; // parameters such as `stride`, `padding`, `kernel size`
   std::map<std::string, Attribute> attrs; // weight attributes
};

PNNX operand structure

class Operand
{
public:
   void remove_consumer(const Operator* c);
   Operator* producer;
   std::vector<Operator*> consumers;
   
   int type;
   std::vector<int> shape;

   std::string name;
   std::map<std::string, Parameter> params;
};

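In the operand structure, producer and consumers are the operator that produces the operand and the operators that use it, respectively.
Note that an operand has exactly one producer but may have many consumers.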
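
The one-producer, many-consumers rule can be sketched with simplified stand-ins for the two classes. The structs below keep only the linking fields, and the helpers SetProducer and AddConsumer are hypothetical names introduced for this illustration; they are not part of PNNX.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Operator;  // forward declaration

// Simplified stand-in for the PNNX Operand: one producer, many consumers.
struct Operand {
  std::string name;
  Operator* producer = nullptr;
  std::vector<Operator*> consumers;
};

// Simplified stand-in for the PNNX Operator.
struct Operator {
  std::string name;
  std::vector<Operand*> inputs;
  std::vector<Operand*> outputs;
};

// Hypothetical helper: record that `op` writes `operand`.
inline void SetProducer(Operator& op, Operand& operand) {
  assert(operand.producer == nullptr && "an operand has exactly one producer");
  operand.producer = &op;
  op.outputs.push_back(&operand);
}

// Hypothetical helper: record that `op` reads `operand`.
inline void AddConsumer(Operator& op, Operand& operand) {
  operand.consumers.push_back(&op);
  op.inputs.push_back(&operand);
}
```

Wiring a conv output into both a relu and an add node, for example, leaves the operand with one producer and two consumers.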

How KuiperInfer wraps the compute graph

PNNX Operator -> RuntimeOperator

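Building graph relations and execution order

Topological sort

For a directed acyclic graph, a topological sort always finds a node sequence in which every node's predecessors come before the node itself. A predecessor is defined per edge: for any edge in the directed graph, its start node is a predecessor of its end node.

Steps of the DFS-based topological sort

The sort is implemented by the function ReverseTopo, which takes one parameter, current_op.

1. Pick a node with in-degree zero (current_op), meaning a node that has no predecessors or whose predecessors have all been executed. Set its executed flag to true and pass it to ReverseTopo;
2. Iterate over the successors of the node from step 1 (current_op->output_operators);
3. If a successor has not been executed yet (its executed flag is false), recursively pass it to ReverseTopo;
4. After the iteration in step 2 finishes, append the current node to the execution queue (topo_operators_).

When the function returns, reversing the execution queue gives the final topological order. The code is: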

void RuntimeGraph::ReverseTopo(
    const std::shared_ptr<RuntimeOperator>& current_op) {
  CHECK(current_op != nullptr) << "current operator is nullptr";
  current_op->has_forward = true;
  const auto& next_ops = current_op->output_operators;
  for (const auto& [_, op] : next_ops) {
    if (op != nullptr) {
      if (!op->has_forward) {
        this->ReverseTopo(op);
      }
    }
  }
  for (const auto& [_, op] : next_ops) {
    CHECK_EQ(op->has_forward, true);
  }
  this->topo_operators_.push_back(current_op);
}
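
To see the reversal step in isolation, here is a minimal self-contained sketch of the same idea. Node, ReverseTopoSketch, and TopoSort are hypothetical names, and Node keeps only the fields the sort needs; it is not the RuntimeOperator from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for RuntimeOperator (assumption: only what the sort needs).
struct Node {
  std::string name;
  bool visited = false;                        // plays the role of has_forward
  std::vector<std::shared_ptr<Node>> outputs;  // plays the role of output_operators
};

// Post-order DFS: a node is appended only after all of its successors.
void ReverseTopoSketch(const std::shared_ptr<Node>& node,
                       std::vector<std::shared_ptr<Node>>& order) {
  assert(node != nullptr);
  node->visited = true;
  for (const auto& next : node->outputs) {
    if (next != nullptr && !next->visited) {
      ReverseTopoSketch(next, order);
    }
  }
  order.push_back(node);
}

// Runs the DFS from every unvisited node, then reverses the post-order.
std::vector<std::shared_ptr<Node>> TopoSort(
    const std::vector<std::shared_ptr<Node>>& nodes) {
  std::vector<std::shared_ptr<Node>> order;
  for (const auto& node : nodes) {
    if (!node->visited) ReverseTopoSketch(node, order);
  }
  std::reverse(order.begin(), order.end());
  return order;
}
```

For a DAG, the reversed post-order is a valid topological order regardless of which unvisited node each DFS starts from, which is why the sketch does not search for in-degree-zero nodes first.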

Operator & Register Factory

Definition of the Layer type

A compute node is called a RuntimeOperator. Its structure is defined as follows:

struct RuntimeOperator {
virtual ~RuntimeOperator();

bool has_forward = false;
std::string name;      /// name of the compute node
std::string type;      /// type of the compute node
std::shared_ptr<Layer> layer;  /// Layer that performs this node's computation

std::map<std::string, std::shared_ptr<RuntimeOperand>>
      input_operands;  /// input operands of the node
std::shared_ptr<RuntimeOperand> output_operands;  /// output operand of the node
std::vector<std::shared_ptr<RuntimeOperand>>
      input_operands_seq;  /// input operands of the node, in order
std::map<std::string, std::shared_ptr<RuntimeOperator>>
      output_operators;  /// maps each successor's name to the successor node
...
};

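A compute node (RuntimeOperator) records its type, its name, and its input and output operands. The most important member is layer, which carries out the actual computation.

Through the node's input operands (input_operands), the layer obtains the input tensors it needs and processes them with the compute function (Forward) defined by the concrete Layer subclass. When the computation finishes, the result is stored in the node's output operand (output_operands).

The following code lives in include/abstract/layer.hpp. It is the parent class of all operators: every other operator in the project must derive from it and override its compute function (Forward).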

class Layer {
public:
explicit Layer(std::string layer_name) : layer_name_(std::move(layer_name)) {}

virtual ~Layer() = default;

/**
   * Runs the layer's computation
   * @param inputs inputs of the layer
   * @param outputs outputs of the layer
   * @return execution status
   */
virtual InferStatus Forward(
      const std::vector<std::shared_ptr<Tensor<float>>>& inputs,
      std::vector<std::shared_ptr<Tensor<float>>>& outputs);

/**
   * Runs the layer on the operands of its associated RuntimeOperator
   * @return execution status
   */
virtual InferStatus Forward();
};

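The code above defines the constructor of Layer, which only needs a layer_name to name the operator. The method to focus on is the Forward overload that takes arguments: it is the compute function of the operator.

It takes two parameters, inputs and outputs, the arrays of input and output tensors used during the computation. Every derived operator class must override this Forward overload and define its own compute logic in it.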

class Layer {
   ...
   ...
protected:
std::weak_ptr<RuntimeOperator> runtime_operator_;
std::string layer_name_;  /// name of the Layer
};

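Layer has two member variables: the operator name layer_name_ set in the constructor, and runtime_operator_, the compute node (RuntimeOperator) associated with the operator. Recall the definition of RuntimeOperator from above: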

struct RuntimeOperator {
...
std::shared_ptr<Layer> layer;  /// Layer that performs this node's computation
...
};

classDiagram
      RuntimeOperator <-- Layer
      Layer <-- RuntimeOperator 
      class RuntimeOperator{
         - Tensor Array inputs
         - Tensor Array outputs
         - Layer Reference layer
      }
      class Layer{
      - RuntimeOperator Reference runtime_operator
         + Forward(void) InferStatus
      + Forward(inputs,outputs) InferStatus
      }
      Layer <|-- ReLULayer
      Layer <|-- ConvLayer
      Layer <|-- MaxPoolingLayer
      class ReLULayer{
      + Forward(inputs,outputs) InferStatus
      }
      class ConvLayer{
      + Forward(inputs,outputs) InferStatus
      }
         class MaxPoolingLayer{
      + Forward(inputs,outputs) InferStatus
      }

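A RuntimeOperator holds the Layer that belongs to the node, and the Layer holds the RuntimeOperator it belongs to, so the association is bidirectional.

Layer also has a Forward method without arguments. It is implemented in the parent class shared by all operators. It prepares the input and output data and then uses them to call the computation implemented by each derived operator (the Forward overload with arguments described above).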

InferStatus Layer::Forward() {
LOG_IF(FATAL, this->runtime_operator_.expired())
      << "Runtime operator is expired or nullptr";
// Get the compute node associated with this operator
const auto& runtime_operator = this->runtime_operator_.lock();
// Collect the inputs that the node's layer needs
const std::vector<std::shared_ptr<RuntimeOperand>>& input_operand_datas = runtime_operator->input_operands_seq;
// Inputs of the layer
std::vector<std::shared_ptr<Tensor<float>>> layer_input_datas;
for (const auto& input_operand_data : input_operand_datas) {
   for (const auto& input_data : input_operand_data->datas) {
      layer_input_datas.push_back(input_data);
   }
}
...
...
}

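The no-argument Forward method first obtains the RuntimeOperator that corresponds to the Layer. The two are linked in both directions: one operator (Layer) corresponds to one compute node (RuntimeOperator), and one compute node corresponds to one operator.

From the compute node it takes the input operands input_operand_datas and gathers the tensors they hold into layer_input_datas. It then takes the node's output operand, output_operand_datas.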

const std::shared_ptr<RuntimeOperand>& output_operand_datas =
      runtime_operator->output_operands;
InferStatus status = runtime_operator->layer->Forward(
      layer_input_datas, output_operand_datas->datas);

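In the steps above, the input and output operands are taken from the RuntimeOperator, and the corresponding input and output tensors are then passed to the Forward overload with arguments that each derived operator implements.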

graph LR

A["Layer::Forward() in the parent class, no arguments"] -- prepare inputs and outputs --> B["Forward(inputs, outputs) in each derived Layer"];
B --> C["Relu::Forward(inputs, outputs)"]
B --> D["Conv::Forward(inputs, outputs)"]
B --> E["MaxPool::Forward(inputs, outputs)"]
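
The dispatch between the two Forward overloads can be sketched without the rest of the framework. Everything below is a simplified assumption for illustration: LayerSketch, OperatorSketch, and ReluSketch are hypothetical names, tensors are plain std::vector<float>, and the status is a bool rather than InferStatus.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Tensor = std::vector<float>;  // stand-in for Tensor<float>

struct OperatorSketch;  // stand-in for RuntimeOperator

class LayerSketch {
 public:
  virtual ~LayerSketch() = default;
  // Computation, overridden by each derived layer.
  virtual bool Forward(const std::vector<Tensor>& inputs,
                       std::vector<Tensor>& outputs) = 0;
  // Gathers inputs and outputs from the owning operator, then dispatches.
  bool Forward();
  std::weak_ptr<OperatorSketch> op;  // weak back-reference avoids an ownership cycle
};

struct OperatorSketch {
  std::vector<Tensor> inputs;
  std::vector<Tensor> outputs;
  std::shared_ptr<LayerSketch> layer;
};

bool LayerSketch::Forward() {
  const std::shared_ptr<OperatorSketch> owner = op.lock();
  assert(owner != nullptr && "runtime operator expired");
  return Forward(owner->inputs, owner->outputs);  // virtual call into the subclass
}

class ReluSketch : public LayerSketch {
 public:
  using LayerSketch::Forward;  // keep the no-argument overload visible
  bool Forward(const std::vector<Tensor>& inputs,
               std::vector<Tensor>& outputs) override {
    outputs.resize(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
      outputs[i].resize(inputs[i].size());
      std::transform(inputs[i].begin(), inputs[i].end(), outputs[i].begin(),
                     [](float v) { return std::max(v, 0.f); });
    }
    return true;
  }
};
```

The weak_ptr mirrors the runtime_operator_ member shown earlier: the operator owns the layer, and the layer only observes the operator.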

Convolution & Pooling Operator

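Definition of the pooling operator

Pooling operators are commonly used to make deep neural networks less sensitive to the exact position of features.

A pooling operator computes over the elements of the input inside a fixed-shape window (the pooling window). The result is either the maximum or the average of the elements in the window, which is called max pooling or average pooling respectively.

For a pooling operator with padding, the output feature map size and the input feature map size satisfy:
$$
\text{output size} = \left\lfloor \frac{\text{input size} + 2 \times \text{padding} - \text{pooling size}}{\text{stride}} + 1 \right\rfloor
$$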
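
As a rough illustration of the formula and of the window computation, here is a minimal max pooling sketch over a single channel without padding. PoolOutputSize and MaxPool2D are hypothetical names, and the nested std::vector matrix is an assumption made to keep the example self-contained; the project itself stores data in arma::fcube.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// floor((input + 2 * padding - window) / stride + 1)
std::size_t PoolOutputSize(std::size_t input, std::size_t padding,
                           std::size_t window, std::size_t stride) {
  assert(stride > 0 && input + 2 * padding >= window);
  return (input + 2 * padding - window) / stride + 1;  // integer division floors
}

// Max pooling over one channel, square window, no padding.
Matrix MaxPool2D(const Matrix& input, std::size_t window, std::size_t stride) {
  assert(!input.empty() && !input[0].empty());
  const std::size_t out_rows = PoolOutputSize(input.size(), 0, window, stride);
  const std::size_t out_cols = PoolOutputSize(input[0].size(), 0, window, stride);
  Matrix output(out_rows, std::vector<float>(out_cols));
  for (std::size_t r = 0; r < out_rows; ++r) {
    for (std::size_t c = 0; c < out_cols; ++c) {
      float best = std::numeric_limits<float>::lowest();
      for (std::size_t i = 0; i < window; ++i) {
        for (std::size_t j = 0; j < window; ++j) {
          best = std::max(best, input[r * stride + i][c * stride + j]);
        }
      }
      output[r][c] = best;
    }
  }
  return output;
}
```

Average pooling follows the same loop structure, accumulating a sum and dividing by the window area instead of tracking the maximum.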

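Definition of the convolution operator

Convolution is one of the most common operations in signal processing and image processing. It multiplies an input signal (an image, audio, and so on) with a convolution kernel (also called a filter or weights) and accumulates the products. In deep neural networks it is used to extract specific features, which makes it one of the most frequently used operators.

The two-dimensional form of the definition is:

$$Y[i, j] = \sum_{m} \sum_{n} H[m, n] \cdot X[i+m, j+n]$$

Here $X$ is the input matrix, $H$ is the kernel, and $Y$ is the output matrix. $i$ and $j$ are the coordinates of an output element, $m$ and $n$ are coordinates inside the kernel, and $i+m$ and $j+n$ align the kernel with the input: they are the coordinates of the input element that is paired with $H[m, n]$. These offsets determine where the kernel sits on the input matrix. Each kernel weight is multiplied by the input value at the matching position, and the products are summed to give the output element $Y[i,j]$.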
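
The formula translates almost literally into four nested loops. The sketch below assumes stride 1, no padding, and a single channel; Conv2D and the nested std::vector matrix type are hypothetical choices made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Y[i][j] = sum_m sum_n H[m][n] * X[i + m][j + n], stride 1, no padding.
Matrix Conv2D(const Matrix& x, const Matrix& h) {
  assert(!x.empty() && !h.empty());
  assert(x.size() >= h.size() && x[0].size() >= h[0].size());
  const std::size_t out_rows = x.size() - h.size() + 1;
  const std::size_t out_cols = x[0].size() - h[0].size() + 1;
  Matrix y(out_rows, std::vector<float>(out_cols, 0.f));
  for (std::size_t i = 0; i < out_rows; ++i) {
    for (std::size_t j = 0; j < out_cols; ++j) {
      for (std::size_t m = 0; m < h.size(); ++m) {
        for (std::size_t n = 0; n < h[0].size(); ++n) {
          y[i][j] += h[m][n] * x[i + m][j + n];
        }
      }
    }
  }
  return y;
}
```

For a 4x4 input holding 1 to 16 row by row and the 3x3 kernel {1,2,3; 3,2,1; 1,2,3}, this gives the 2x2 output {110, 128; 182, 200}.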

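The figure below illustrates the two-dimensional convolution: the kernel slides over the input as a window, the element-wise products in each window are summed, and the result is stored in output.
The single-channel case extends directly to multiple channels by summing the per-channel convolution results (note that the kernels in the figure below are different channels of the same convolution kernel). The number of input channels must equal the number of kernel channels.

As the figure below shows, convolving a multi-channel input with a multi-channel kernel produces a single-channel output. The input tensor and the kernel must have the same number of channels, two in this example.

The first channel of input is convolved with the first channel of kernel at the corresponding position

The second channel of input is convolved with the second channel of kernel at the corresponding position

The two results are added to give the output at that position
One kernel is enough for a single-channel output. For a multi-channel output, several different kernels are needed: the number of kernels equals the number of output channels.

As the figure below shows, using two kernels produces a multi-channel output with two channels, c1 and c2.

There is one input with 2 channels

Take input channel 1. It is convolved with channel 1 of kernel 1 and with channel 1 of kernel 2; the two results contribute to output channels c1 and c2 respectively.

Take the first input channel, input c1

Take input channel 2. It is convolved with channel 2 of kernel 1 and with channel 2 of kernel 2; each result is added to the matching partial result from channel 1 to give c1 and c2.
Group convolution (group conv), as the name suggests, splits the convolution into groups along the channel (depth) dimension. With group=2 the input is split into 2 groups, as in the figure above. Without groups, one kernel covers all input channels; with groups, one kernel only covers $\frac{\text{input channel}}{\text{group}} = 2 / 2 = 1$ channel, as in the figure below.
Group convolution was already used in AlexNet. Alex argued that it increases the diagonal correlation between kernels and reduces the number of trainable parameters, which makes overfitting less likely and acts like a regularizer. As the figure below shows, applying group convolution to a multi-channel input yields a multi-channel output with two channels, c1 and c2.

Summary: the above covers the basic definition of two-dimensional convolution and an intuitive explanation of it. For an ordinary convolution, the number of kernel channels must equal the number of input channels, and the number of kernels gives the number of output channels. For a group convolution, each kernel has (input channels / groups) channels. The input and output sizes of a convolution are related by:
$$
\text{output size} = \left\lfloor \frac{\text{input size} + 2 \times \text{padding} - \text{kernel size}}{\text{stride}} + 1 \right\rfloor
$$

In the example from the figure above: output size = floor((4 + 2*0 - 3)/1 + 1) = 2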
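
The channel bookkeeping from the summary can be made concrete with a small grouped convolution sketch. It assumes stride 1 and no padding; GroupConv2D and the nested std::vector tensor types are hypothetical and only illustrate the indexing, not the project's ConvolutionLayer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;
using Tensor3 = std::vector<Matrix>;  // indexed as [channel][row][col]

// kernels[k] has in_channels / groups channels; the output has kernels.size() channels.
Tensor3 GroupConv2D(const Tensor3& input, const std::vector<Tensor3>& kernels,
                    std::size_t groups) {
  assert(groups > 0 && !input.empty() && !kernels.empty());
  const std::size_t in_channels = input.size();
  const std::size_t kernel_count = kernels.size();
  assert(in_channels % groups == 0 && kernel_count % groups == 0);
  const std::size_t channels_per_group = in_channels / groups;
  const std::size_t kernels_per_group = kernel_count / groups;
  const std::size_t kernel_h = kernels[0][0].size();
  const std::size_t kernel_w = kernels[0][0][0].size();
  const std::size_t out_rows = input[0].size() - kernel_h + 1;
  const std::size_t out_cols = input[0][0].size() - kernel_w + 1;

  Tensor3 output(kernel_count,
                 Matrix(out_rows, std::vector<float>(out_cols, 0.f)));
  for (std::size_t k = 0; k < kernel_count; ++k) {
    assert(kernels[k].size() == channels_per_group);
    const std::size_t group = k / kernels_per_group;  // group this kernel belongs to
    for (std::size_t c = 0; c < channels_per_group; ++c) {
      const Matrix& x = input[group * channels_per_group + c];
      const Matrix& h = kernels[k][c];
      for (std::size_t i = 0; i < out_rows; ++i)
        for (std::size_t j = 0; j < out_cols; ++j)
          for (std::size_t m = 0; m < kernel_h; ++m)
            for (std::size_t n = 0; n < kernel_w; ++n)
              output[k][i][j] += h[m][n] * x[i + m][j + n];
    }
  }
  return output;
}
```

With groups set to 1 the function reduces to the ordinary multi-channel convolution, where every kernel spans all input channels and the per-channel results are summed.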

Expression Layer

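Definition of an expression

An expression in PNNX is a chain of binary operations, for example:

output_mid = input1 + input2;
output = output_mid * input3;

The expression layer (Expression Layer) in PNNX provides a compute expression that folds such a computation to some extent and eliminates the intermediate variables. For example, the add operation in a residual block is an expression layer in PNNX.

Below is the PNNX compute expression for the computation above. @0, @1, and @2 stand for the operands (RuntimeOperand) mentioned earlier and denote the input nodes of the expression.

mul(@2, add(@0, @1));

This expression looks simple, but much more complex cases occur in practice, such as the example below. They call for a robust expression lexer and syntax tree builder.

add(add(mul(@0, @1), mul(@2, add(add(add(@0, @2), @3), @4))), @5);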

Lexical analysis

Definition of the tokens

Lexical analysis splits add(@0, mul(@1, @2)) into tokens. In order, the tokens are:

Identifier: add
Left bracket: (
Input number: @0
Comma: ,
Identifier: mul
Left bracket: (
Input number: @1
Comma: ,
Input number: @2
Right bracket: )
Right bracket: )

The token types are defined as follows:

enum class TokenType {
  TokenUnknown = -9,
  TokenInputNumber = -8,
  TokenComma = -7,
  TokenAdd = -6,
  TokenMul = -5,
  TokenLeftBracket = -4,
  TokenRightBracket = -3,
};

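A Token is defined as follows and holds these variables:

The token type, such as add (addition), mul (multiplication), or bracket (left and right brackets);
The start and end positions of the token in the original statement, start_pos and end_pos;

The expression add(@0, mul(@1, @2)) is split into several tokens. Token(add) has start_pos 0 and end_pos 3, Token(left bracket) has start_pos 3 and end_pos 4, Token(@0) has start_pos 4 and end_pos 6, and so on. end_pos is exclusive.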

// A token
struct Token {
    TokenType token_type = TokenType::TokenUnknown;
    int32_t start_pos = 0; // start position of the token
    int32_t end_pos = 0;   // end position of the token (exclusive)
    Token(TokenType token_type, int32_t start_pos, int32_t end_pos)
        : token_type(token_type), start_pos(start_pos), end_pos(end_pos) {

        }
};

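Finally, once lexical analysis is done, the tokens are assembled into a syntax tree according to their order of appearance and nesting.

// Node of the syntax tree
struct TokenNode {
    int32_t num_index = -1;
    std::shared_ptr<TokenNode> left = nullptr;   // left child of the syntax tree node
    std::shared_ptr<TokenNode> right = nullptr;  // right child of the syntax tree node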
    TokenNode(int32_t num_index, std::shared_ptr<TokenNode> left,
              std::shared_ptr<TokenNode> right);
    TokenNode() = default;
};

Parsing the tokens

Check that the statement is not empty

CHECK(!statement_.empty()) << "The input statement is empty!";

Remove the whitespace in the statement

statement_.erase(std::remove_if(statement_.begin(), statement_.end(),
                                [](char c) { return std::isspace(c); }),
                 statement_.end());
CHECK(!statement_.empty()) << "The input statement is empty!";

If the expression layer holds the expression add(@0, @1), removing the whitespace gives the new expression add(@0,@1).

Parse the statement character by character

for (int32_t i = 0; i < statement_.size();) {
    char c = statement_.at(i);
    if (c == 'a') {
        CHECK(i + 1 < statement_.size() && statement_.at(i + 1) == 'd')
            << "Parse add token failed, illegal character: "
            << statement_.at(i + 1);
        CHECK(i + 2 < statement_.size() && statement_.at(i + 2) == 'd')
            << "Parse add token failed, illegal character: "
            << statement_.at(i + 2);
        Token token(TokenType::TokenAdd, i, i + 3);
        tokens_.push_back(token);
        std::string token_operation =
            std::string(statement_.begin() + i, statement_.begin() + i + 3);
        token_strs_.push_back(token_operation);
        i = i + 3;
    } 
}

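Let c be the current character. If c equals 'a', then by the lexical rules the only token that starts with 'a' is add, so the next two characters must be 'd' and 'd'. If they are not, an error is raised. If they are, a new Token is created that records its start and end positions in the expression.

For example, a word in the expression that starts with 'a' can only be add. It cannot be a word outside the vocabulary, such as axc.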

CHECK(i + 1 < statement_.size() && statement_.at(i + 1) == 'd')
    << "Parse add token failed, illegal character: "
    << statement_.at(i + 1);
CHECK(i + 2 < statement_.size() && statement_.at(i + 2) == 'd')
    << "Parse add token failed, illegal character: "
    << statement_.at(i + 2);
Token token(TokenType::TokenAdd, i, i + 3);
tokens_.push_back(token);
std::string token_operation =
    std::string(statement_.begin() + i, statement_.begin() + i + 3);
token_strs_.push_back(token_operation);

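The first CHECK verifies that the second character is 'd', and the second CHECK verifies that the third character is also 'd'. If both hold, a Token instance is created that records the start and end positions of the word in the statement.

In the same way, if a character c is 'm', the following characters must be 'u' and 'l'. Otherwise the expression contains a word outside the vocabulary (the only word in the vocabulary that starts with 'm' is "mul"). If the check passes, a Token instance is created that records the start and end positions of the word and the token type.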

else if (c == '@') {
    CHECK(i + 1 < statement_.size() && std::isdigit(statement_.at(i + 1)))
        << "Parse number token failed, illegal character: " << c;
    int32_t j = i + 1;
    for (; j < statement_.size(); ++j) {
        if (!std::isdigit(statement_.at(j))) {
            break;
        }
    }
    Token token(TokenType::TokenInputNumber, i, j);
    CHECK(token.start_pos < token.end_pos);
    tokens_.push_back(token);
    std::string token_input_number = std::string(statement_.begin() + i, statement_.begin() + j);
    token_strs_.push_back(token_input_number);
    i = j;
}

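If the current character is '@', all digits after the '@' are read. For @31231, for example, every digit after the @ sign is read. If the character right after '@' is not a digit, an error is raised. Otherwise the digits are read in full and, together with the '@', form one token.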

else if (c == ',') {
      Token token(TokenType::TokenComma, i, i + 1);
      tokens_.push_back(token);
      std::string token_comma =
          std::string(statement_.begin() + i, statement_.begin() + i + 1);
      token_strs_.push_back(token_comma);
      i += 1;
}

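If the current character is a comma ',', that single character is read as a new Token.

Finally, each token that was parsed and created successfully is appended to the array tokens_ for later processing.

tokens_.push_back(token);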
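
Putting the branches together, a compact standalone lexer might look like the sketch below. It is an illustration under assumptions, not the project's ExpressionParser: TokenKind, TokenSketch, and Tokenize are hypothetical names, errors are reported with exceptions instead of CHECK, and the bracket branches are filled in by analogy with the comma branch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenKind { InputNumber, Comma, Add, Mul, LeftBracket, RightBracket };

struct TokenSketch {
  TokenKind kind;
  std::size_t start_pos;  // inclusive
  std::size_t end_pos;    // exclusive
};

std::vector<TokenSketch> Tokenize(const std::string& statement) {
  // Drop whitespace first so that positions refer to the compact string.
  std::string s;
  for (unsigned char c : statement) {
    if (!std::isspace(c)) s.push_back(static_cast<char>(c));
  }
  std::vector<TokenSketch> tokens;
  std::size_t i = 0;
  while (i < s.size()) {
    if (s.compare(i, 3, "add") == 0) {
      tokens.push_back({TokenKind::Add, i, i + 3});
      i += 3;
    } else if (s.compare(i, 3, "mul") == 0) {
      tokens.push_back({TokenKind::Mul, i, i + 3});
      i += 3;
    } else if (s[i] == '@') {
      std::size_t j = i + 1;
      while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) ++j;
      if (j == i + 1) throw std::invalid_argument("'@' must be followed by digits");
      assert(i < j);
      tokens.push_back({TokenKind::InputNumber, i, j});
      i = j;
    } else if (s[i] == ',') {
      tokens.push_back({TokenKind::Comma, i, i + 1});
      i += 1;
    } else if (s[i] == '(') {
      tokens.push_back({TokenKind::LeftBracket, i, i + 1});
      i += 1;
    } else if (s[i] == ')') {
      tokens.push_back({TokenKind::RightBracket, i, i + 1});
      i += 1;
    } else {
      throw std::invalid_argument("unknown character in expression");
    }
  }
  return tokens;
}
```

For add(@0, mul(@1, @2)) the sketch produces 11 tokens, and the token for @0 spans positions 4 to 6 of the whitespace-free string.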

Syntax analysis

Definition of the syntax tree

struct TokenNode {
    int32_t num_index = -1;
    std::shared_ptr<TokenNode> left = nullptr;
    std::shared_ptr<TokenNode> right = nullptr;
    TokenNode(int32_t num_index, std::shared_ptr<TokenNode> left, std::shared_ptr<TokenNode> right);
    TokenNode() = default;
};

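During syntax analysis, an abstract syntax tree is built from the token array produced by lexical analysis. The abstract syntax tree is a binary tree: each node stores an operator or an operand and links to other nodes through its left and right children.

For the expression "add(@0, @1)", num_index equal to 0 denotes the operand @0 and num_index equal to 1 denotes the operand @1. A negative num_index means that the node is a compute node such as "mul" or "add".

A simple example:

   add
  /   \
@0     @1

In this example the root node is "add", the left child is "@0", and the right child is "@1". The tree represents an expression that adds "@0" and "@1".

With the abstract syntax tree built from the token array, the expression can then be analyzed semantically and evaluated.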

Recursive descent parsing

Syntax analysis is done by recursive descent and is defined in the function Generate_.

std::shared_ptr<TokenNode> ExpressionParser::Generate_(int32_t &index) {
    CHECK(index < this->tokens_.size());
    const auto current_token = this->tokens_.at(index);
    CHECK(current_token.token_type == TokenType::TokenInputNumber
          || current_token.token_type == TokenType::TokenAdd || current_token.token_type == TokenType::TokenMul);
}

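This function works on the Token array produced by lexical analysis. Because Generate_ is recursive, the index parameter points to the position in the Token array that is currently being processed.

current_token is the token being processed. As the first token of the current recursion level, it must have one of the following types.

TokenInputNumber = -8,
TokenAdd = -6,
TokenMul = -5,

If the current token is an input number, a leaf node for the operand is returned directly and no further recursion takes place (see below). In add(@0, @1), for example, @0 and @1 are input-number tokens, and when the parser reaches them it directly creates and returns a TokenNode.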

if (current_token.token_type == TokenType::TokenInputNumber) {
    uint32_t start_pos = current_token.start_pos + 1;
    uint32_t end_pos = current_token.end_pos;
    CHECK(end_pos > start_pos);
    CHECK(end_pos <= this->statement_.length());
    const std::string &str_number =
        std::string(this->statement_.begin() + start_pos, this->statement_.begin() + end_pos);
    return std::make_shared<TokenNode>(std::stoi(str_number), nullptr, nullptr);

}

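If the current token is mul or add, the parser recurses one level down to build the left and right children.

When it processes add(@1,@2), for example, and meets the add token (the first line of the code below), it takes two steps:

1. Check that a left bracket follows.
2. Recurse to obtain @1 (the call current_node->left = Generate_(index) below). Because @1 is an input number, the recursion returns immediately through the input-number branch shown above.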

else if (current_token.token_type == TokenType::TokenMul || current_token.token_type == TokenType::TokenAdd) {
    std::shared_ptr<TokenNode> current_node = std::make_shared<TokenNode>();
    current_node->num_index = int(current_token.token_type);

    index += 1;
    CHECK(index < this->tokens_.size());
    // Check that a left bracket follows add
    CHECK(this->tokens_.at(index).token_type == TokenType::TokenLeftBracket);

    index += 1;
    CHECK(index < this->tokens_.size());
    const auto left_token = this->tokens_.at(index);
    // Check that the left token has a legal type
    if (left_token.token_type == TokenType::TokenInputNumber
        || left_token.token_type == TokenType::TokenAdd || left_token.token_type == TokenType::TokenMul) {
        // Recurse after the left bracket to obtain the left operand
        current_node->left = Generate_(index);
    } else {
        LOG(FATAL) << "Unknown token type: " << int(left_token.token_type);
    }
}

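Once the left subtree has been built recursively, it is attached as the left subtree of the add node. For the expression add(@0, @1), the left subtree is stored in the left pointer of current_node, and the construction of the right subtree begins.

graph TB;
1((add))-->2((ant 0))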

index += 1;
// index now points to the comma in add(@1,@2)
CHECK(index < this->tokens_.size());
// Check that the token is a comma
CHECK(this->tokens_.at(index).token_type == TokenType::TokenComma);

index += 1;
CHECK(index < this->tokens_.size());
// Build the right subtree: current_node->right = Generate_(index);
const auto right_token = this->tokens_.at(index);
if (right_token.token_type == TokenType::TokenInputNumber
    || right_token.token_type == TokenType::TokenAdd || right_token.token_type == TokenType::TokenMul) {
  current_node->right = Generate_(index);
} else {
  LOG(FATAL) << "Unknown token type: " << int(right_token.token_type);
}

index += 1;
CHECK(index < this->tokens_.size());
CHECK(this->tokens_.at(index).token_type == TokenType::TokenRightBracket);
return current_node;

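Next, the parser checks that a comma token follows the left operand (the CHECK on TokenComma in the code above). For the expression add(@1,@2), index points to the comma at this stage, so the comma is verified first. The parser then builds the right subtree, and the recursion yields @2 as a leaf node.

When the right subtree is complete, the node (the TokenNode returned by Generate_, here a leaf holding @2) is stored in the right pointer of current_node.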

graph TB;
1((add))-->2((ant 0))

1((add))-->3((ant 1))
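
The whole recursive descent fits in a short standalone sketch. To stay self-contained it parses a vector of token strings rather than Token objects; NodeSketch, Expect, and Parse are hypothetical names. Unlike Generate_, which leaves index on the last token it consumed, this sketch advances index one past the parsed expression.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct NodeSketch {
  std::string op;        // "add" or "mul" for operator nodes, empty for leaves
  int input_index = -1;  // N for a leaf "@N", -1 for operator nodes
  std::shared_ptr<NodeSketch> left;
  std::shared_ptr<NodeSketch> right;
};

// Consumes tokens[index] and fails if it differs from `expected`.
void Expect(const std::vector<std::string>& tokens, std::size_t& index,
            const std::string& expected) {
  if (index >= tokens.size() || tokens[index] != expected) {
    throw std::invalid_argument("expected '" + expected + "'");
  }
  ++index;
}

// expr := "@N" | ("add" | "mul") "(" expr "," expr ")"
std::shared_ptr<NodeSketch> Parse(const std::vector<std::string>& tokens,
                                  std::size_t& index) {
  if (index >= tokens.size()) throw std::invalid_argument("unexpected end");
  const std::string token = tokens[index++];
  auto node = std::make_shared<NodeSketch>();
  if (token.size() > 1 && token[0] == '@') {
    node->input_index = std::stoi(token.substr(1));  // leaf: no further recursion
    assert(node->input_index >= 0);
    return node;
  }
  if (token != "add" && token != "mul") {
    throw std::invalid_argument("unknown token: " + token);
  }
  node->op = token;
  Expect(tokens, index, "(");
  node->left = Parse(tokens, index);   // build the left subtree
  Expect(tokens, index, ",");
  node->right = Parse(tokens, index);  // build the right subtree
  Expect(tokens, index, ")");
  return node;
}
```

Each operator node triggers exactly two recursive calls, one per operand, which is why nested expressions of any depth are handled by the same few lines.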

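Transforming the syntax tree

Reverse Polish notation

Take the expression add(@0,@1) as a simple example. Traversing the tree from the root, the first node visited is add, but at that point the data it needs, @0 and @1, is not yet available.

The tree is therefore converted to reverse Polish notation, which puts the operands first and the operation after them. The conversion is simple: a post-order traversal of the binary tree: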

void ReversePolish(const std::shared_ptr<TokenNode> &root_node,
                   std::vector<std::shared_ptr<TokenNode>> &reverse_polish) {
    if (root_node != nullptr) {
        ReversePolish(root_node->left, reverse_polish);
        ReversePolish(root_node->right, reverse_polish);
        reverse_polish.push_back(root_node);
    }
}

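The resulting reverse Polish expressions are:

For add(@0,@1), the reverse Polish expression is: @0,@1,add

For add(mul(@0,@1),@2), the reverse Polish expression is: @0,@1,mul,@2,add

The conversion rewrites the expression so that the inputs come first and the operators follow. Reverse Polish notation needs no brackets and makes the order of evaluation explicit.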
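
Why this ordering helps becomes clear when the expression is evaluated with a stack: operands are pushed, and each operator pops the two values it needs. The sketch below works on scalar inputs and token strings for brevity; EvalReversePolish is a hypothetical name, and the real expression layer operates on tensors rather than floats.

```cpp
#include <cassert>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates a reverse Polish token sequence such as {"@0", "@1", "mul", "@2", "add"}.
float EvalReversePolish(const std::vector<std::string>& rpn,
                        const std::vector<float>& inputs) {
  assert(!rpn.empty());
  std::stack<float> operands;
  for (const std::string& token : rpn) {
    if (token.size() > 1 && token[0] == '@') {
      // "@N" pushes the N-th input; at() throws if N is out of range.
      operands.push(inputs.at(std::stoul(token.substr(1))));
    } else if (token == "add" || token == "mul") {
      if (operands.size() < 2) throw std::invalid_argument("missing operand");
      const float rhs = operands.top();
      operands.pop();
      const float lhs = operands.top();
      operands.pop();
      operands.push(token == "add" ? lhs + rhs : lhs * rhs);
    } else {
      throw std::invalid_argument("unknown token: " + token);
    }
  }
  if (operands.size() != 1) throw std::invalid_argument("malformed expression");
  return operands.top();
}
```

With inputs {2, 3, 4}, the sequence @0,@1,mul,@2,add evaluates to 2 * 3 + 4 = 10.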

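Summary of the process

This conversion guarantees that whenever a compute node is reached, its operands are already available.

First, an expression string is passed in, for example add(mul(@0,@1),@2).
Next, lexical analysis splits add(mul(@0,@1),@2) into tokens, validating them along the way.
Then, syntax analysis walks the token array by recursive descent and produces the corresponding binary compute tree, whose nodes are add, mul, or operands such as @0 and @1.
Finally, the compute tree is converted to reverse Polish notation, which gives: @0,@1,mul,@2,add.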

ResNet & YOLOv5 Infer

TODO

homework

course1

// axby.cpp
void Axby(const arma::fmat &x, const arma::fmat &w, const arma::fmat &b,
          arma::fmat &y) {
  // Write your code here to compute y = w * x + b
  y = w * x + b;
}

void EPowerMinus(const arma::fmat &x, arma::fmat &y) {
  // Write your code here to compute y = e^{-x}
  y = arma::exp(-x);  // element-wise exponential
}

course2

Tensor::Flatten
Tensor::Padding

// tensor.cpp
void Tensor<float>::Flatten(bool row_major) {
  const std::vector<uint32_t> flatten_size = {this->size()};
  this->Reshape(flatten_size, row_major);
}

void Tensor<float>::Padding(const std::vector<uint32_t>& pads,
                            float padding_value) {
    CHECK(!this->data_.empty());
    CHECK_EQ(pads.size(), 4);
    // Padding sizes on the four sides
    uint32_t pad_rows1 = pads.at(0);  // up
    uint32_t pad_rows2 = pads.at(1);  // bottom
    uint32_t pad_cols1 = pads.at(2);  // left
    uint32_t pad_cols2 = pads.at(3);  // right

    const uint32_t rows = this->rows();
    const uint32_t cols = this->cols();
    const uint32_t channels = this->data_.n_slices;
    const uint32_t new_rows = rows + pad_rows1 + pad_rows2;
    const uint32_t new_cols = cols + pad_cols1 + pad_cols2;

    arma::fcube new_data = arma::fcube(new_rows, new_cols, channels);
    new_data.fill(padding_value);

    // Option 1: fill element by element in a loop (timed in nanoseconds)
    auto start_loop = std::chrono::high_resolution_clock::now();
    for (uint32_t c = 0; c < channels; ++c) {
        for (uint32_t i = 0; i < rows; ++i) {
            for (uint32_t j = 0; j < cols; ++j) {
                new_data.at(i + pad_rows1, j + pad_cols1, c) = this->data_.at(i, j, c);
            }
        }
    }
    auto end_loop = std::chrono::high_resolution_clock::now();
    auto duration_loop = std::chrono::duration_cast<std::chrono::nanoseconds>(end_loop - start_loop).count();
    std::cout << "Time taken by loop-based padding (in nanoseconds): " << duration_loop << " ns" << std::endl;

    // Reset new_data to the padding value before option 2
    new_data.fill(padding_value);

    // Option 2: fill with a subcube assignment (timed in nanoseconds)
    auto start_subcube = std::chrono::high_resolution_clock::now();
    new_data.subcube(pad_rows1, pad_cols1, 0, new_rows - pad_rows2 - 1,
                     new_cols - pad_cols2 - 1, channels - 1) = this->data_;
    auto end_subcube = std::chrono::high_resolution_clock::now();
    auto duration_subcube = std::chrono::duration_cast<std::chrono::nanoseconds>(end_subcube - start_subcube).count();
    std::cout << "Time taken by subcube-based padding (in nanoseconds): " << duration_subcube << " ns" << std::endl;

    this->data_ = std::move(new_data);
    this->raw_shapes_ = std::vector<uint32_t>{channels, new_rows, new_cols};
}

course3

RuntimeGraph::InitGraphParams

// runtime_ir.cpp
void RuntimeGraph::InitGraphParams(
      const std::map<std::string, pnnx::Parameter> &params,
      const std::shared_ptr<RuntimeOperator> &runtime_operator) {
   for (const auto &[name, parameter]: params) {
      const int type = parameter.type;
      switch (type) {
            case int(RuntimeParameterType::kParameterUnknown): {
               RuntimeParameter *runtime_parameter = new RuntimeParameter;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }

            case int(RuntimeParameterType::kParameterBool): {
               RuntimeParameterBool *runtime_parameter = new RuntimeParameterBool;
               runtime_parameter->value = parameter.b;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }

            case int(RuntimeParameterType::kParameterInt): {
               RuntimeParameterInt *runtime_parameter = new RuntimeParameterInt;
               runtime_parameter->value = parameter.i;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }

            case int(RuntimeParameterType::kParameterFloat): {
               RuntimeParameterFloat *runtime_parameter = new RuntimeParameterFloat;
               runtime_parameter->value = parameter.f;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }

            case int(RuntimeParameterType::kParameterString): {
               RuntimeParameterString *runtime_parameter = new RuntimeParameterString;
               runtime_parameter->value = parameter.s;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }

            case int(RuntimeParameterType::kParameterIntArray): {
               RuntimeParameterIntArray *runtime_parameter =
                        new RuntimeParameterIntArray;
               runtime_parameter->value = parameter.ai;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }

            case int(RuntimeParameterType::kParameterFloatArray): {
               RuntimeParameterFloatArray *runtime_parameter =
                        new RuntimeParameterFloatArray;
               runtime_parameter->value = parameter.af;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }
            case int(RuntimeParameterType::kParameterStringArray): {
               RuntimeParameterStringArray *runtime_parameter =
                        new RuntimeParameterStringArray;
               runtime_parameter->value = parameter.as;
               runtime_operator->params.insert({name, runtime_parameter});
               break;
            }
            default: {
               LOG(FATAL) << "Unknown parameter type: " << type;
            }
      }
   }
}

course4

TopoSort

// runtime_ir.hpp
void KahnTopoSort();

// runtime_ir.cpp
void RuntimeGraph::KahnTopoSort() {
    std::unordered_map<std::shared_ptr<RuntimeOperator>, int> in_degree;
    std::queue<std::shared_ptr<RuntimeOperator>> zero_in_degree_queue;

    // Compute the in-degree of every node
    for (const auto& op : operators_) {
        in_degree[op] = 0;
    }
    for (const auto& op : operators_) {
        for (const auto& [_, next_op] : op->output_operators) {
            if (next_op != nullptr) {
                in_degree[next_op]++;
            }
        }
    }

    // Collect all nodes with in-degree 0
    for (const auto& [op, degree] : in_degree) {
        if (degree == 0) {
            zero_in_degree_queue.push(op);
        }
    }

    // Process the nodes in the queue
    while (!zero_in_degree_queue.empty()) {
        auto op = zero_in_degree_queue.front();
        zero_in_degree_queue.pop();
        topo_operators_.push_back(op);

        for (const auto& [_, next_op] : op->output_operators) {
            if (next_op != nullptr) {
                in_degree[next_op]--;
                if (in_degree[next_op] == 0) {
                    zero_in_degree_queue.push(next_op);
                }
            }
        }
    }

    // Check for a cycle
    if (topo_operators_.size() != operators_.size()) {
        throw std::runtime_error("Graph has a cycle");
    }
}

course5

Sigmoid Layer

// sigmoid.hpp
#ifndef KUIPER_INFER_SOURCE_LAYER_BINOCULAR_SIGMOID_HPP_
#define KUIPER_INFER_SOURCE_LAYER_BINOCULAR_SIGMOID_HPP_
#include "layer/abstract/non_param_layer.hpp"

namespace kuiper_infer {
class SigmoidLayer : public NonParamLayer {
    public:
        SigmoidLayer() : NonParamLayer("Sigmoid") {}
        InferStatus Forward(
            const std::vector<std::shared_ptr<Tensor<float>>>& inputs,
            std::vector<std::shared_ptr<Tensor<float>>>& outputs) override;
        static ParseParameterAttrStatus GetInstance(
            const std::shared_ptr<RuntimeOperator>& op,
            std::shared_ptr<Layer>& sigmoid_layer);
};
} // namespace kuiper_infer
#endif  // KUIPER_INFER_SOURCE_LAYER_BINOCULAR_SIGMOID_HPP_

// sigmoid.cpp
#include "sigmoid.hpp"
#include <cmath>
#include "layer/abstract/layer_factory.hpp"

namespace kuiper_infer {
InferStatus SigmoidLayer::Forward(
   const std::vector<std::shared_ptr<Tensor<float>>> &inputs,
   std::vector<std::shared_ptr<Tensor<float>>> &outputs) { 
  if (inputs.empty()) {
    LOG(ERROR) << "The input tensor array in the sigmoid layer is empty";
    return InferStatus::kInferFailedInputEmpty;
  }
  if (inputs.size() != outputs.size()) {
    LOG(ERROR) << "The input and output tensor array size of the sigmoid layer "
                  "do not match";
    return InferStatus::kInferFailedInputOutSizeMatchError;
  }

  const uint32_t batch_size = inputs.size();
  for (uint32_t i = 0; i < batch_size; ++i) {
    const sftensor &input_data = inputs.at(i);
    const sftensor &output_data = outputs.at(i);
    if (input_data == nullptr || input_data->empty()) {
      LOG(ERROR)
          << "The input tensor array in the sigmoid layer has an empty tensor "
          << i << " th";
      return InferStatus::kInferFailedInputEmpty;
    }
    if (output_data != nullptr && !output_data->empty()) {
      if (input_data->shapes() != output_data->shapes()) {
        LOG(ERROR) << "The input and output tensor shapes of the sigmoid "
                      "layer do not match "
                   << i << " th";
        return InferStatus::kInferFailedInputOutSizeMatchError;
      }
    }
  }

  for (uint32_t i = 0; i < batch_size; ++i) {
    const std::shared_ptr<Tensor<float>> &input = inputs.at(i);
    CHECK(input != nullptr && !input->empty())
            << "The input tensor array in the sigmoid layer has an empty tensor " << i
            << " th";

    std::shared_ptr<Tensor<float>> output = outputs.at(i);
    if (output == nullptr || output->empty()) {
      DLOG(ERROR)
          << "The output tensor array in the sigmoid layer has an empty tensor "
          << i << " th";
      output = std::make_shared<Tensor<float>>(input->shapes());
      outputs.at(i) = output;
    }
    CHECK(output->shapes() == input->shapes())
            << "The input and output tensor shapes of the sigmoid layer do not match "
            << i << " th";
    for (uint32_t j = 0; j < input->size(); ++j) {
      float value = input->index(j);
      output->index(j) = 1.f / (1.f + expf(-value));
    }
  }
  return InferStatus::kInferSuccess;
}

ParseParameterAttrStatus SigmoidLayer::GetInstance(
    const std::shared_ptr<RuntimeOperator> &op,
    std::shared_ptr<Layer> &sigmoid_layer) {
  CHECK(op != nullptr) << "Sigmoid layer op is nullptr";
  sigmoid_layer = std::make_shared<SigmoidLayer>();
  return ParseParameterAttrStatus::kParameterAttrParseSuccess;
}

LayerRegistererWrapper kSigmoidGetInstance("nn.Sigmoid", SigmoidLayer::GetInstance);
}  // namespace kuiper_infer

course6

create_layer_group_convforward

// test_conv.cpp
TEST(test_registry, create_layer_group_convforward) {
  const uint32_t batch_size = 1;
  std::vector<sftensor> inputs(batch_size);
  std::vector<sftensor> outputs(batch_size);

  const uint32_t in_channel = 2;
  for (uint32_t i = 0; i < batch_size; ++i) {
    sftensor input = std::make_shared<ftensor>(in_channel, 4, 4);
    input->data().slice(0) = "1,2,3,4;"
                             "5,6,7,8;"
                             "9,10,11,12;"
                             "13,14,15,16;";

    input->data().slice(1) = "1,2,3,4;"
                             "5,6,7,8;"
                             "9,10,11,12;"
                             "13,14,15,16;";
    inputs.at(i) = input;
  }
  const uint32_t kernel_h = 3;
  const uint32_t kernel_w = 3;
  const uint32_t stride_h = 1;
  const uint32_t stride_w = 1;
  const uint32_t kernel_count = 2;
  const uint32_t group = 2;
  std::vector<sftensor> weights;
  for (uint32_t i = 0; i < kernel_count; ++i) {
    sftensor kernel = std::make_shared<ftensor>(in_channel / group, kernel_h, kernel_w);
    for (uint32_t j = 0; j < (in_channel / group); ++j) {
      kernel->data().slice(j) = arma::fmat("1,2,3;"
                                           "3,2,1;"
                                           "1,2,3;");
    }
    weights.push_back(kernel);
  }
  ConvolutionLayer conv_layer(kernel_count, in_channel, kernel_h, kernel_w, 0,
                              0, stride_h, stride_w, group, false);
  conv_layer.set_weights(weights);
  conv_layer.Forward(inputs, outputs);
  outputs.at(0)->Show();
}

course7

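Support the sin (trigonometric) operation in lexical and syntax analysis
If an operator takes a single input, such as the sin function from problem 1, what changes does the Forward function need in order to compute the correct result?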

// tensor_utils.hpp
/**
 * sin(@num)
 * @param tensor input tensor
 * @return tensor holding the element-wise sin of the input
 */
std::shared_ptr<Tensor<float>> TensorElementSin(
    const std::shared_ptr<Tensor<float>>& tensor);

// tensor_utils.cpp
std::shared_ptr<Tensor<float>> TensorElementSin(
            const std::shared_ptr<Tensor<float>>& tensor) {
    CHECK(tensor != nullptr);

    sftensor output_tensor = TensorCreate(tensor->shapes());
    const auto& input_data = tensor->data();
    auto& output_data = output_tensor->data();

    for (size_t i = 0; i < input_data.size(); i++) {
        output_data[i] = std::sin(input_data[i]);
    }

    return output_tensor;
}
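
The parser code in this section checks for TokenType::TokenSin, which presupposes a lexer that emits such a token; that lexer change is not shown here. The following is a minimal standalone sketch of how it could look, under stated assumptions: the names SketchTokenType, SketchToken and TokenizeSketch are hypothetical, and the numeric enum values (in particular -2 for sin) are illustrative only. The one property that does follow from the Forward code in this section is that operator codes must be negative, because Forward treats any num_index >= 0 as an input index. The positions follow the convention used by Generate_: start_pos points at '@' and end_pos is one past the last digit.

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch. Operator kinds are negative so
// they can never collide with a non-negative input index.
enum class SketchTokenType : int32_t {
  kInputNumber = -8,
  kComma = -7,
  kAdd = -6,
  kMul = -5,
  kLeftBracket = -4,
  kRightBracket = -3,
  kSin = -2,  // assumed value for the new unary operator
};

struct SketchToken {
  SketchTokenType type;
  uint32_t start_pos;  // index of the first character of the token
  uint32_t end_pos;    // one past the last character of the token
};

// Splits statements such as "add(sin(@0),@1)" into tokens.
std::vector<SketchToken> TokenizeSketch(const std::string& statement) {
  std::vector<SketchToken> tokens;
  const uint32_t n = static_cast<uint32_t>(statement.size());
  uint32_t i = 0;
  while (i < n) {
    const char c = statement[i];
    if (std::isspace(static_cast<unsigned char>(c))) {
      ++i;  // whitespace carries no meaning
    } else if (statement.compare(i, 3, "add") == 0) {
      tokens.push_back({SketchTokenType::kAdd, i, i + 3});
      i += 3;
    } else if (statement.compare(i, 3, "mul") == 0) {
      tokens.push_back({SketchTokenType::kMul, i, i + 3});
      i += 3;
    } else if (statement.compare(i, 3, "sin") == 0) {
      tokens.push_back({SketchTokenType::kSin, i, i + 3});
      i += 3;
    } else if (c == '@') {
      uint32_t j = i + 1;  // consume the digits that follow '@'
      while (j < n && std::isdigit(static_cast<unsigned char>(statement[j]))) {
        ++j;
      }
      if (j == i + 1) {
        throw std::invalid_argument("'@' must be followed by digits");
      }
      tokens.push_back({SketchTokenType::kInputNumber, i, j});
      i = j;
    } else if (c == ',') {
      tokens.push_back({SketchTokenType::kComma, i, i + 1});
      ++i;
    } else if (c == '(') {
      tokens.push_back({SketchTokenType::kLeftBracket, i, i + 1});
      ++i;
    } else if (c == ')') {
      tokens.push_back({SketchTokenType::kRightBracket, i, i + 1});
      ++i;
    } else {
      throw std::invalid_argument(std::string("unexpected character: ") + c);
    }
  }
  return tokens;
}
```

For "add(sin(@0),@1)" this sketch produces nine tokens, with the sin token at positions [4, 7) and the first input token at [8, 10).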

// parse_expression.cpp
std::shared_ptr<TokenNode> ExpressionParser::Generate_(int32_t &index) { // recursive generate
  CHECK(index < this->tokens_.size());
  const auto current_token = this->tokens_.at(index);
  CHECK(current_token.token_type == TokenType::TokenInputNumber ||
      current_token.token_type == TokenType::TokenAdd ||
      current_token.token_type == TokenType::TokenMul ||
      current_token.token_type == TokenType::TokenSin);
  if (current_token.token_type == TokenType::TokenInputNumber) {
    uint32_t start_pos = current_token.start_pos + 1;
    uint32_t end_pos = current_token.end_pos;
    CHECK(end_pos > start_pos && end_pos <= this->statement_.length())
            << "Current token has a wrong length";
    const std::string &str_number =
        std::string(this->statement_.begin() + start_pos,
                    this->statement_.begin() + end_pos);
    return std::make_shared<TokenNode>(std::stoi(str_number), nullptr, nullptr);

  } else if (current_token.token_type == TokenType::TokenMul ||
      current_token.token_type == TokenType::TokenAdd) {
    std::shared_ptr<TokenNode> current_node = std::make_shared<TokenNode>();
    current_node->num_index = int(current_token.token_type);

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing left bracket!";
    CHECK(this->tokens_.at(index).token_type == TokenType::TokenLeftBracket);

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing correspond left token!";
    const auto left_token = this->tokens_.at(index);

    if (left_token.token_type == TokenType::TokenInputNumber ||
        left_token.token_type == TokenType::TokenAdd ||
        left_token.token_type == TokenType::TokenMul ||
        left_token.token_type == TokenType::TokenSin) {
      current_node->left = Generate_(index);
    } else {
      LOG(FATAL) << "Unknown token type: " << int(left_token.token_type);
    }

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing comma!";
    CHECK(this->tokens_.at(index).token_type == TokenType::TokenComma);

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing correspond right token!";
    const auto right_token = this->tokens_.at(index);
    if (right_token.token_type == TokenType::TokenInputNumber ||
        right_token.token_type == TokenType::TokenAdd ||
        right_token.token_type == TokenType::TokenMul ||
        right_token.token_type == TokenType::TokenSin) {
      current_node->right = Generate_(index);
    } else {
      LOG(FATAL) << "Unknown token type: " << int(right_token.token_type);
    }

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing right bracket!";
    CHECK(this->tokens_.at(index).token_type == TokenType::TokenRightBracket);
    return current_node;
  } else if (current_token.token_type == TokenType::TokenSin){
    std::shared_ptr<TokenNode> current_node = std::make_shared<TokenNode>();
    current_node->num_index = int(current_token.token_type);

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing left bracket!";
    CHECK(this->tokens_.at(index).token_type == TokenType::TokenLeftBracket);

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing operand token for sin!";
    const auto cur_token = this->tokens_.at(index);
    if (cur_token.token_type == TokenType::TokenInputNumber ||
        cur_token.token_type == TokenType::TokenAdd ||
        cur_token.token_type == TokenType::TokenMul ||
        cur_token.token_type == TokenType::TokenSin) {
      current_node->left = Generate_(index);
    } else {
      LOG(FATAL) << "Unknown token type: " << int(cur_token.token_type);
    }

    index += 1;
    CHECK(index < this->tokens_.size()) << "Missing right bracket!";
    CHECK(this->tokens_.at(index).token_type == TokenType::TokenRightBracket);
    return current_node;
  } else {
    LOG(FATAL) << "Unknown token type: " << int(current_token.token_type);
  }
}
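
Generate_ stores the single operand of sin in left and leaves right as a null pointer. Forward consumes the nodes with a stack, which implies they arrive in post-order (reverse Polish) sequence. The following standalone sketch illustrates why a post-order traversal that simply skips null children needs no special case for unary nodes. SketchNode, MakeNode, PostOrder and the operator codes are hypothetical names and values chosen for illustration, not KuiperInfer's own definitions.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for TokenNode.
struct SketchNode {
  int32_t num_index = 0;  // >= 0: input index; < 0: operator code
  std::shared_ptr<SketchNode> left;   // null by default
  std::shared_ptr<SketchNode> right;  // stays null for unary operators
};

// Illustrative operator codes (assumed values).
constexpr int32_t kSketchAddCode = -6;
constexpr int32_t kSketchSinCode = -2;

std::shared_ptr<SketchNode> MakeNode(
    int32_t num_index, std::shared_ptr<SketchNode> left = nullptr,
    std::shared_ptr<SketchNode> right = nullptr) {
  auto node = std::make_shared<SketchNode>();
  node->num_index = num_index;
  node->left = std::move(left);
  node->right = std::move(right);
  return node;
}

// Appends the tree in post-order: children first, then the node itself.
// A null child contributes nothing, so unary nodes work unchanged.
void PostOrder(const std::shared_ptr<SketchNode>& node,
               std::vector<int32_t>& out) {
  if (node == nullptr) {
    return;
  }
  PostOrder(node->left, out);
  PostOrder(node->right, out);
  out.push_back(node->num_index);
}
```

For the tree of add(sin(@0),@1), this traversal yields the sequence 0, sin, 1, add, which is the order a stack machine needs: the operand of sin is already on the stack when sin is reached.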

// expression.cpp
InferStatus ExpressionLayer::Forward(
    const std::vector<std::shared_ptr<Tensor<float>>>& inputs,
    std::vector<std::shared_ptr<Tensor<float>>>& outputs) {
  if (inputs.empty()) {
    LOG(ERROR) << "The input tensor array in the expression layer is empty";
    return InferStatus::kInferFailedInputEmpty;
  }

  if (outputs.empty()) {
    LOG(ERROR) << "The output tensor array in the expression layer is empty";
    return InferStatus::kInferFailedOutputEmpty;
  }

  CHECK(this->parser_ != nullptr)
      << "The parser in the expression layer is null!";
  this->parser_->Tokenizer(false);
  const auto& expressions = this->parser_->tokens();
  CHECK(!expressions.empty())
      << "The expression parser failed to parse " << statement_;

  for (uint32_t i = 0; i < inputs.size(); ++i) {
    const sftensor& input_data = inputs.at(i);
    if (input_data == nullptr || input_data->empty()) {
      LOG(ERROR) << "The input tensor array in the expression layer has an "
                    "empty tensor "
                 << i << "th";
      return InferStatus::kInferFailedInputEmpty;
    }
  }

  const uint32_t batch_size = outputs.size();
  for (uint32_t i = 0; i < batch_size; ++i) {
    if (outputs.at(i) == nullptr || outputs.at(i)->empty()) {
      DLOG(ERROR) << "The output tensor array in the expression layer has an "
                     "empty tensor "
                  << i << "th";
      return InferStatus::kInferFailedOutputEmpty;
    }
    outputs.at(i)->Fill(0.f);
  }

  std::stack<std::vector<std::shared_ptr<Tensor<float>>>> op_stack;
  const std::vector<std::shared_ptr<TokenNode>>& token_nodes =
      this->parser_->Generate();
  for (const auto& token_node : token_nodes) {
    if (token_node->num_index >= 0) {
      // process operand: a non-negative num_index selects a group of input tensors
      uint32_t start_pos = token_node->num_index * batch_size;
      std::vector<std::shared_ptr<Tensor<float>>> input_token_nodes;
      for (uint32_t i = 0; i < batch_size; ++i) {
        CHECK(i + start_pos < inputs.size())
            << "The " << i
            << "th operand doesn't have appropriate number of tensors";
        // FIXME: is the tensor copy here necessary?
        input_token_nodes.push_back(inputs.at(i + start_pos));
      }
      op_stack.push(input_token_nodes);
    } else {
      // process operator: a negative num_index encodes a TokenType
      const int32_t op = token_node->num_index;
      if (op != int(TokenType::TokenAdd) && op != int(TokenType::TokenMul) && op != int(TokenType::TokenSin)) {
        LOG(FATAL) << "Unknown operator type: " << op;
      }
      if (op == int(TokenType::TokenSin)) {
        // sin is unary: pop a single operand from the stack
        CHECK(op_stack.size() >= 1)
            << "The number of operand is less than one for sin operation";
        std::vector<std::shared_ptr<Tensor<float>>> input_node = op_stack.top();
        CHECK(input_node.size() == batch_size)
            << "The operand doesn't have appropriate number of tensors, "
               "which need "
            << batch_size;
        op_stack.pop();
        std::vector<std::shared_ptr<Tensor<float>>> output_token_nodes(
            batch_size);
        for (uint32_t i = 0; i < batch_size; ++i) {
          // do execution
          output_token_nodes.at(i) = TensorElementSin(input_node.at(i));
        }
        op_stack.push(output_token_nodes);
      } else {
        CHECK(op_stack.size() >= 2) << "The number of operand is less than two";
        std::vector<std::shared_ptr<Tensor<float>>> input_node1 = op_stack.top();

        CHECK(input_node1.size() == batch_size)
            << "The first operand doesn't have appropriate number of tensors, "
              "which need "
            << batch_size;
        op_stack.pop();

        std::vector<std::shared_ptr<Tensor<float>>> input_node2 = op_stack.top();
        CHECK(input_node2.size() == batch_size)
            << "The second operand doesn't have appropriate number of tensors, "
              "which need "
            << batch_size;
        op_stack.pop();

        std::vector<std::shared_ptr<Tensor<float>>> output_token_nodes(
            batch_size);
        for (uint32_t i = 0; i < batch_size; ++i) {
          // do execution
          if (op == int(TokenType::TokenAdd)) {
            output_token_nodes.at(i) =
                TensorElementAdd(input_node1.at(i), input_node2.at(i));
          } else if (op == int(TokenType::TokenMul)) {
            output_token_nodes.at(i) =
                TensorElementMultiply(input_node1.at(i), input_node2.at(i));
          } else {
            LOG(FATAL) << "Unknown operator type: " << op;
          }
        }
        op_stack.push(output_token_nodes);
      }
    }
  }

  CHECK(op_stack.size() == 1)
      << "The expression has more than one output operand!";
  std::vector<sftensor> output_node = op_stack.top();
  op_stack.pop();
  for (uint32_t i = 0; i < batch_size; ++i) {
    CHECK(outputs.at(i) != nullptr && !outputs.at(i)->empty());
    CHECK(outputs.at(i)->shapes() == output_node.at(i)->shapes());
    outputs.at(i) = output_node.at(i);
  }
  return InferStatus::kInferSuccess;
}
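
The core change in Forward is that the number of operands popped from the stack now depends on the operator: one for sin, two for add and mul. The following standalone sketch isolates that idea. It is a simplification under stated assumptions: it evaluates scalars rather than batches of tensors, and EvalReversePolish and the operator codes are hypothetical names and values, not part of KuiperInfer.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stack>
#include <stdexcept>
#include <vector>

// Illustrative operator codes (assumed values); all are negative so they
// cannot be confused with an input index.
constexpr int32_t kSketchAdd = -6;
constexpr int32_t kSketchMul = -5;
constexpr int32_t kSketchSin = -2;

// Evaluates a reverse Polish sequence in which a non-negative entry is an
// index into `inputs` and a negative entry is an operator code.
float EvalReversePolish(const std::vector<int32_t>& rpn,
                        const std::vector<float>& inputs) {
  std::stack<float> operands;
  for (const int32_t token : rpn) {
    if (token >= 0) {
      // Operand: look up the input and push it.
      operands.push(inputs.at(static_cast<std::size_t>(token)));
      continue;
    }
    if (token != kSketchAdd && token != kSketchMul && token != kSketchSin) {
      throw std::invalid_argument("unknown operator code");
    }
    // Arity-aware pop: sin needs one operand, add and mul need two.
    const std::size_t arity = (token == kSketchSin) ? 1 : 2;
    if (operands.size() < arity) {
      throw std::invalid_argument("not enough operands on the stack");
    }
    const float rhs = operands.top();
    operands.pop();
    if (token == kSketchSin) {
      operands.push(std::sin(rhs));
      continue;
    }
    const float lhs = operands.top();
    operands.pop();
    operands.push(token == kSketchAdd ? lhs + rhs : lhs * rhs);
  }
  if (operands.size() != 1) {
    throw std::invalid_argument("expression must reduce to one value");
  }
  return operands.top();
}
```

For example, the sequence 0, sin, 1, add with inputs {0, 2} corresponds to add(sin(@0),@1) and evaluates to sin(0) + 2 = 2. Without the arity check, sin would consume a second operand that belongs to an enclosing operator.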
