API Reference

This page provides a quick reference for all model classes, result types, and configuration methods available in the RZ/V2H RDK AI model packages.

For guidance on creating your own model, see How to Add a New Model.

Core Data Types (rzv_model)

These types are defined in rzv_model/base_model.hpp and rzv_model/utils.hpp.

ModelInput

struct ModelInput
{
  cv::Mat original_image;  // Input image (YUV422 or RGB format)
  cv::Rect roi;            // Region of interest within the image
};

ModelResult (base class)

All result types inherit from this. It carries timing information from each inference stage.

struct ModelResult
{
  float score = 0.0f;
  float preprocess_ms = 0.0f;   // Time spent in preprocessing
  float inference_ms = 0.0f;    // Time spent in DRP-AI inference
  float postprocess_ms = 0.0f;  // Time spent in postprocessing
};

KeyPoint / KeyPointResult

Used by pose estimation models (HRNetV2, RTMPose, MediaPipe).

struct KeyPoint
{
  float x;
  float y;
  float confidence;
  int class_id;
};

struct KeyPointResult : public ModelResult
{
  std::vector<KeyPoint> keypoints;
};

ModelShapeInfo

Provides tensor shape information extracted from the loaded model.

struct ModelShapeInfo
{
  std::vector<int64_t> input_shape;
  std::string input_dtype;
  std::vector<std::vector<int64_t>> output_shapes;
  std::vector<std::string> output_dtypes;

  int input_height() const;   // input_shape[2]
  int input_width() const;    // input_shape[3]
  int input_channels() const; // input_shape[1]
};

YUV422Format

enum class YUV422Format { YUYV, UYVY };

BaseModel Class

The base class for all AI models. Defined in rzv_model/base_model.hpp.

Public methods:

  • bool load(const std::string & model_path) – Load a DRP-AI model from the given directory path.

  • bool is_loaded() const – Check whether a model has been loaded.

  • const ModelShapeInfo & get_shape_info() const – Get input/output tensor shape information.

  • std::unique_ptr<T> run<T>(const ModelInput & input) – Run inference and return a typed result. Returns nullptr on failure.

Protected methods (override in subclasses):

  • postprocess(output_tensors) (required) – Parse raw output tensors into a ModelResult.

  • preprocess(input) – Custom preprocessing before inference.

  • fallback_preprocess(input) – CPU fallback when hardware preprocessing is unavailable.

  • software_preprocess(input, imagenet, mean, std) – CPU preprocessing with optional ImageNet normalization.

  • extract_model_specific_shapes(shape_info) – Extract custom shapes after model load.

Protected helper methods:

  • letterbox(im, new_shape, color, center_align, do_resize) – Resize and pad image while maintaining aspect ratio.

  • is_preprocess_loaded() – Check if DRP-AI hardware preprocessing is available.

  • map_coordinates_to_original(point) – Map a point from preprocessed to original image coordinates.

  • map_size_to_original(size) – Map a size from preprocessed to original image coordinates.

  • set_padding_color(color) – Set the padding color for letterbox.

MODEL_INFO / MODEL_DEBUG / MODEL_WARN / MODEL_ERROR

Logging macros backed by spdlog.

Utils Class

Static utility functions defined in rzv_model/utils.hpp.

  • Utils::bgr_to_yuv422(bgr_image, format) – Convert BGR image to YUV422 (YUYV or UYVY).

  • Utils::rgba_to_yuv422(rgba_image, format) – Convert RGBA image to YUV422 (YUYV or UYVY).

  • Utils::non_maximum_suppression_batched(boxes, scores, class_ids, score_thresh, iou_thresh) – Batched NMS for cv::Rect2f (axis-aligned) or cv::RotatedRect (oriented) boxes.

Object Detection Models

The following models are provided for object detection tasks. Each model class inherits from BaseModel and implements the required methods for loading, preprocessing, inference, and postprocessing.

rzv_yolox – YoloxModel

Header: rzv_yolox/yolox_model.hpp | Inherits: BaseModel

Result type:

struct YOLOXDetection
{
  cv::Rect bbox;
  int class_id;
  float confidence;
  bool is_valid = false;
  std::string class_name;
};

struct YOLOXDetectionResult : public ModelResult
{
  std::vector<YOLOXDetection> detections;
};

Configuration methods:

  • set_class_names(class_names) – Set class labels (must match model training).

  • set_confidence_threshold(threshold) – Set detection confidence threshold (0.0 - 1.0).

  • set_iou_threshold(threshold) – Set NMS IoU threshold (0.0 - 1.0).

Quick example:

auto model = std::make_unique<rzv_model::YoloxModel>();
model->set_class_names({"hand"});
model->set_confidence_threshold(0.5f);
model->set_iou_threshold(0.4f);
model->load("path/to/yolox_model");

auto result = model->run<rzv_model::YOLOXDetectionResult>(input);

Model preparation: YOLOX | YOLOX - Convert for V2H

rzv_yolov8 – YOLOv8DetectModel

Header: rzv_yolov8/yolov8_detect_model.hpp | Inherits: YOLOv8Base -> BaseModel

Result type:

struct YOLOv8Detection
{
  cv::Rect bbox;
  int class_id;
  float confidence;
  bool is_valid = false;
  std::string class_name;
};

struct YOLOv8DetectionResult : public ModelResult
{
  std::vector<YOLOv8Detection> detections;
};

Configuration methods:

  • set_class_names(class_names) – Set class labels (must match model training).

  • set_confidence_threshold(threshold) – Set detection confidence threshold (0.0 - 1.0).

  • set_nms_threshold(threshold) – Set NMS threshold (0.0 - 1.0).

  • set_dfl_sigmoid_mode(mode) – Set DFL sigmoid optimization mode (see below).

  • set_cpu_dfl_multi_thread(enable) – Enable/disable multi-threaded CPU DFL processing.

DFL Sigmoid Modes (DFLSigmoidMode enum):

  • InDfl – Apply sigmoid during DFL processing (original).

  • AfterArgmax – Skip sigmoid in DFL, apply after argmax (faster).

  • AfterThreshold – Skip sigmoid in DFL, apply after threshold filtering (fastest, default).

Quick example:

auto model = std::make_unique<rzv_model::YOLOv8DetectModel>();
model->set_class_names({"paper", "rock", "scissor"});
model->set_confidence_threshold(0.5f);
model->set_nms_threshold(0.4f);
model->set_dfl_sigmoid_mode(rzv_model::DFLSigmoidMode::AfterThreshold);
model->set_cpu_dfl_multi_thread(false);
model->load("path/to/yolov8_model");

auto result = model->run<rzv_model::YOLOv8DetectionResult>(input);

Model preparation: Ultralytics YOLO | YOLOv8 - Convert for V2H

rzv_yolov8 – YOLOv8OBBModel

Header: rzv_yolov8/yolov8_obb_model.hpp | Inherits: YOLOv8Base -> BaseModel

For oriented bounding box detection (e.g., aerial/satellite imagery).

Result type:

struct YOLOv8OBBDetection
{
  cv::RotatedRect obbox;  // Oriented bounding box
  int class_id;
  float confidence;
  bool is_valid = false;
  std::string class_name;
};

struct YOLOv8OBBDetectionResult : public ModelResult
{
  std::vector<YOLOv8OBBDetection> detections;
};

Configuration methods: Same as YOLOv8DetectModel (inherits from YOLOv8Base).

Quick example:

auto model = std::make_unique<rzv_model::YOLOv8OBBModel>();
model->set_class_names({"ship", "plane", "vehicle"});
model->set_confidence_threshold(0.6f);
model->set_nms_threshold(0.5f);
model->load("path/to/yolov8_obb_model");

auto result = model->run<rzv_model::YOLOv8OBBDetectionResult>(input);

rzv_gold_yolo – GoldYoloModel

Header: rzv_gold_yolo/gold_yolo_model.hpp | Inherits: BaseModel

Result type:

struct GOLDYOLODetection
{
  cv::Rect bbox;
  int class_id;
  float confidence;
  bool is_valid = false;
  std::string class_name;
};

struct GOLDYOLODetectionResult : public ModelResult
{
  std::vector<GOLDYOLODetection> detections;
};

Configuration methods: Same as YoloxModel (set_class_names, set_confidence_threshold, set_iou_threshold).

Quick example:

auto model = std::make_unique<rzv_model::GoldYoloModel>();
model->set_class_names({"hand"});
model->set_confidence_threshold(0.5f);
model->set_iou_threshold(0.4f);
model->load("path/to/gold_yolo_model");

auto result = model->run<rzv_model::GOLDYOLODetectionResult>(input);

Pose Estimation Models

The following models are provided for pose estimation tasks. Each model class inherits from BaseModel and implements the required methods for loading, preprocessing, inference, and postprocessing.

rzv_hrnetv2 – HRNetV2Model

Header: rzv_hrnetv2/hrnetv2_model.hpp | Inherits: BaseModel

Returns KeyPointResult. No additional configuration methods beyond BaseModel.

Quick example:

auto model = std::make_unique<rzv_model::HRNetV2Model>();
model->load("path/to/hrnetv2_model");

auto result = model->run<rzv_model::KeyPointResult>(input);
for (const auto & kp : result->keypoints) {
    std::cout << "x=" << kp.x << " y=" << kp.y
              << " conf=" << kp.confidence << std::endl;
}

Model preparation: MMPose | Convert for V2H

rzv_rtmpose – RTMPoseModel

Header: rzv_rtmpose/rtmpose_model.hpp | Inherits: BaseModel

Returns KeyPointResult. No additional configuration methods beyond BaseModel.

Quick example:

auto model = std::make_unique<rzv_model::RTMPoseModel>();
model->load("path/to/rtmpose_model");

auto result = model->run<rzv_model::KeyPointResult>(input);

Model preparation: Same as HRNetV2 (MMPose).

rzv_mediapipe – MediaPipeHandLandmarkModel

Header: rzv_mediapipe/mediapipe_hand_landmark_model.hpp | Inherits: BaseModel

Returns HandLandmarkResult which extends KeyPointResult with handedness classification.

Result type:

struct HandLandmarkResult : public KeyPointResult
{
  float handedness;  // 0.0 = left hand, 1.0 = right hand
};

Quick example:

auto model = std::make_unique<rzv_model::MediaPipeHandLandmarkModel>();
model->load("path/to/mediapipe_hand_landmark_model");

auto result = model->run<rzv_model::HandLandmarkResult>(input);
std::cout << "Hand: " << (result->handedness > 0.5 ? "Right" : "Left") << std::endl;

Model preparation: MediaPipe

ROS 2 Utilities (rzv_model_utils_ros2)

Header: rzv_model_utils_ros2/model_utils.hpp

Provides helper functions for integrating AI models into ROS 2 nodes.

ModelConfig

struct ModelConfig
{
  std::string model_path;
  std::vector<std::string> class_names;
};

UtilsROS

  • UtilsROS::load_model_info(package, model_type, path_override, class_override) – Load model configuration from YAML with optional parameter overrides.

  • UtilsROS::encode_bboxes_to_pose_array(detections) – Convert detection bounding boxes to geometry_msgs/PoseArray.

  • UtilsROS::encode_diagonal_timing(result) – Encode inference timing into a diagnostic message.

YAML configuration format (config/models/models_config.yaml):

models:
  my_model:
    path: "models/my_model_name"
    names:
      0: class_a
      1: class_b