
Lightweight YOLO Detection and Object Tracking from Scratch

ztj100 2024-12-12 16:14


1. Introduction to YOLO Object Detection

Before YOLO (and apart from R-CNN), another simple framework was to slide a window across the entire input frame and feed each window, one at a time, into a single CNN. This naive approach is easy to implement but computationally prohibitive, making it unsuitable for real-time object detection.
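To make the cost concrete, here is a minimal sketch of that sliding-window approach; classifier stands in for some trained single-object CNN, and all names here are illustrative rather than taken from this article's code:

import numpy as np

def sliding_window_detect(frame, classifier, window=64, stride=16):
    """Run a single-object classifier on every window position.
    A 600x600 frame with these settings needs 34 * 34 = 1156 forward
    passes, which is why this approach cannot run in real time."""
    detections = []
    h, w = frame.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = frame[y:y + window, x:x + window]
            score = float(classifier(patch[np.newaxis, ...])[0, 0])  # one CNN pass per window
            if score > 0.5:
                detections.append({'x': x, 'y': y, 'size': window, 'score': score})
    return detections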

The YOLO model inverts the sliding-window framework: it passes the entire input frame through a single convolutional neural network (CNN) and outputs a 3D tensor in which each cross-section corresponds to one cell of a grid subdividing the original frame. The channels of this 3D tensor encode whether an object of interest is detected, the class of the detected object, and the object's dimensions within each grid cell.

I illustrate the YOLO concept in the diagram below. Consider a picture of cars on a road, and suppose we want to split the image into a 3 x 3 grid. We would then build a CNN whose output dimensions are 3 x 3 x (number of channels), where the channel vector could be:

[probability that an object center is detected, X of the center, Y of the center, object height relative to the grid cell, object width relative to the grid cell, class A, class B, class C]

Note that the vector above only holds information about a single object per grid cell. We can allow a grid cell to contain multiple objects by extending the vector, appending the corresponding entries for a second or further object.

Image by author.

Image by author.
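For concreteness, the label tensor for the 3 x 3 example above could be built like this (a sketch with made-up numbers, assuming one object slot per cell):

import numpy as np

# Channels: [p_object, x_center, y_center, height, width, class_A, class_B, class_C]
label = np.zeros((3, 3, 8))

# Suppose a car (class A) is centered in grid cell (row=1, col=2), 40% into
# the cell along x and 70% along y, spanning 1.5 cells in height and width.
label[1, 2] = [1.0, 0.4, 0.7, 1.5, 1.5, 1.0, 0.0, 0.0]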

Next, we need to create a dataset containing these output tensors and the associated information, so we will use an OpenCV simulation to build our YOLO dataset.

2. Particle Simulation and Data Collection for YOLO

Our dataset is based on an OpenCV simulation of multi-colored particles moving across a black canvas in all directions. The dataset simplifies the YOLO detection task by enforcing a uniform sphere radius, so the labeled width and height are identical across the entire dataset.

At the start of the simulation, particles appear from every edge and "swim" forward at angles between 0 and 180 degrees until they vanish off another edge. Throughout the simulation we collect every frame together with its associated bounding boxes.

Using a simulation frees us from the tedious process of manually annotating bounding boxes and speeds up testing the YOLO model. The code for generating the data is as follows:

import random  
import time  
import cv2  
import numpy as np  
  
frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('simulation_detection.mp4', fourcc, 50.0, (frame_width, frame_height))  
  
def create_particle():  
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius
    uniform_random = np.random.uniform()  
      
    if uniform_random <= 0.25:  
        # start from the bottom
        position = (random.randint(radius, frame_width - radius), radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 0.5:  
        # start from the top
        position = (random.randint(radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
    elif uniform_random <= 0.75:  
        # start from the left
        position = (radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(-90, 90)  
        start_pos = "left"  
    else:  
        # start from the right
        position = (frame_width - radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(90, 270)  
        start_pos = "right"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos}  
  
  
def move_particle(particle):  
      
    if particle['start_pos']=='bottom':  
        angle = random.randint(0, 180)  
    elif particle['start_pos']=='top':  
        angle = random.randint(180, 360)  
    elif particle['start_pos']=='left':  
        angle = random.randint(-90, 90)  
    elif particle['start_pos']=='right':  
        angle = random.randint(90, 270)  
      
    angle_rad = np.deg2rad(angle)  
    dx = int(particle['radius'] * np.cos(angle_rad))  
    dy = int(particle['radius'] * np.sin(angle_rad))  
    x, y = particle['position']  
    particle['position'] = (x + dx, y + dy)  
  
def is_off_screen(particle):  
    x, y = particle['position']  
    return x < 1 or x > frame_width-1 or y < 1 or y > frame_height-1  
  
def draw_frame(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
        x, y = particle['position']  
        # cv2.rectangle(frame, (x - 2* particle['radius'], y - 2 * particle['radius']), (x + 2 * particle['radius'], y + 2 * particle['radius']), (0, 255, 0), 1)  
        bounding_boxes.append({'x_center': x, 'y_center': y, 'width': particle['radius'], 'height': particle['radius']})  
          
    return frame, bounding_boxes  
  
  
def simulate_particles(total_data):  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0   
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame, bounding_boxes = draw_frame(particles)  
        total_data.append({'frame': frame, 'boundary_boxes': bounding_boxes})  
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    out.release()  
    cv2.destroyAllWindows()  
      
    return total_data  
  
  
total_data = []  
  
for i in range(12):  
    total_data = simulate_particles(total_data)

Simulation of jittering multi-colored particles. GIF by author.

After collecting the raw frames and their associated bounding boxes, we arrange the data into 30 x 30 grid tensors with 9 output channels, where each grid cell can hold up to 3 particles. Since the particles' width and height are fixed and there is only one object class to detect, the problem is greatly simplified, and for each particle in each grid cell we only need this vector:

[probability that an object center is detected, X of the center, Y of the center]

Each 30 x 30 grid tensor becomes a single data point in the y_true list. We also resize each 600 x 600 frame down to 240 x 240 as a data point for the X_true list.

def convert_data(total_data):  
  
    grid_size = 30  
    cell_size = 600 // grid_size  # each cell is 20x20 pixels
  
    X_true = np.array([data['frame'] for data in total_data])  
    y_true = np.zeros((len(total_data), grid_size, grid_size, 9))    
  
    for i, data in enumerate(total_data):  
        frame = data['frame']  
        boxes = data['boundary_boxes']  
        for box in boxes:  
            x_center = box['x_center']  
            y_center = box['y_center']  
            width = box['width']  
            height = box['height']  
  
            # determine the grid cell indices
            grid_x = int(x_center / cell_size)   
            grid_y = int(y_center / cell_size)   
  
            if y_true[i, grid_y, grid_x, 0] == 0:  # check whether the first slot is free
                y_true[i, grid_y, grid_x, 0] = 1  # particle present
                y_true[i, grid_y, grid_x, 1] = (x_center % cell_size) / cell_size   # local x_center
                y_true[i, grid_y, grid_x, 2] = (y_center % cell_size) / cell_size   # local y_center

            elif y_true[i, grid_y, grid_x, 3] == 0:  # check whether the second slot is free
                y_true[i, grid_y, grid_x, 3] = 1  # particle present
                y_true[i, grid_y, grid_x, 4] = (x_center % cell_size) / cell_size   # local x_center
                y_true[i, grid_y, grid_x, 5] = (y_center % cell_size) / cell_size   # local y_center

            elif y_true[i, grid_y, grid_x, 6] == 0:  # check whether the third slot is free
                y_true[i, grid_y, grid_x, 6] = 1  # particle present
                y_true[i, grid_y, grid_x, 7] = (x_center % cell_size) / cell_size   # local x_center
                y_true[i, grid_y, grid_x, 8] = (y_center % cell_size) / cell_size   # local y_center
  
  
    return X_true, y_true  
  
X_true, y_true = convert_data(total_data)
from sklearn.model_selection import train_test_split  
  
resized_images = np.zeros((len(X_true), 240, 240, 3))    
  
for i in range(X_true.shape[0]):  
    resized_images[i] = cv2.resize(X_true[i], (240, 240))  
  
resized_images = resized_images / 255.0  
X_true = resized_images  
  
X_train, X_test, y_train, y_test = train_test_split(  
    X_true,  
    y_true,  
    test_size=0.03,  
    random_state=42  
)

3. Training the YOLO Model

Next we are ready to instantiate our YOLO model, which is straightforward to do with the TensorFlow Keras framework. Besides the convolutional layers, we apply three 2x2 max-pooling layers to reduce the 240x240 input to a 30x30 output.

The more interesting part is the design of the loss function: the loss from grid cells that contain an object is amplified (by a factor of 10 below) relative to the loss from empty cells, so the model learns to prioritize "attending to" cells that contain objects, while the bounding-box loss for cells without objects is ignored entirely.

import tensorflow as tf  
from tensorflow.keras.models import Sequential  
from tensorflow.keras.optimizers import RMSprop, Adam  
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Reshape, Resizing  
  
model = Sequential([  
    Conv2D(32, (3, 3), padding='same', activation='relu'),  # input is (240, 240, 3)
    MaxPooling2D(2, 2),  
    Conv2D(64, (3, 3), padding='same', activation='relu'),  
    MaxPooling2D(2, 2),  
    Conv2D(128, (3, 3), padding='same', activation='relu'),  
    MaxPooling2D(2, 2),  
    Conv2D(128, (3, 3), padding='same', activation='relu'),  
    Conv2D(256, (3, 3), padding='same', activation='relu'),  
    Conv2D(9, (1, 1), padding='same', activation='sigmoid'),  # output is (30, 30, 9)
])  
  
def yolo_loss(y_true, y_pred):  
    # presence masks (1 where an object is present)
    object_mask = y_true[:,:,:, 0]  
    object_mask_2 = y_true[:,:,:, 3]  
    object_mask_3 = y_true[:,:,:, 6]  
  
    # absence masks (1 where no object is present)
    no_object_mask = 1 - object_mask  
    no_object_mask_2 = 1 - object_mask_2  
    no_object_mask_3 = 1 - object_mask_3  
  
    # object confidence loss (binary cross-entropy over cells containing objects)
    object_loss_1 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 0], -1), tf.expand_dims(y_pred[:,:,:, 0], -1))  
    object_loss_2 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 3], -1), tf.expand_dims(y_pred[:,:,:, 3], -1))  
    object_loss_3 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 6], -1), tf.expand_dims(y_pred[:,:,:, 6], -1))  
    object_loss = tf.reduce_sum(object_loss_1 * object_mask) + tf.reduce_sum(object_loss_2 * object_mask_2) + tf.reduce_sum(object_loss_3 * object_mask_3)  
    object_loss *= 10  
  
    # no-object confidence loss (binary cross-entropy over empty cells)
    no_object_loss_1 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 0], -1), tf.expand_dims(y_pred[:,:,:, 0], -1))  
    no_object_loss_2 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 3], -1), tf.expand_dims(y_pred[:,:,:, 3], -1))  
    no_object_loss_3 = tf.keras.losses.binary_crossentropy(tf.expand_dims(y_true[:,:,:, 6], -1), tf.expand_dims(y_pred[:,:,:, 6], -1))  
    no_object_loss = tf.reduce_sum(no_object_loss_1 * no_object_mask) + tf.reduce_sum(no_object_loss_2 * no_object_mask_2) + tf.reduce_sum(no_object_loss_3 * no_object_mask_3)  
  
    # bounding-box loss (only for cells containing objects)
    bbox_loss = tf.reduce_sum(tf.square(y_true[:,:,:, 1:3] - y_pred[:,:,:, 1:3]) * tf.expand_dims(object_mask, -1))  
    bbox_loss += tf.reduce_sum(tf.square(y_true[:,:,:, 4:6] - y_pred[:,:,:, 4:6]) * tf.expand_dims(object_mask_2, -1))  
    bbox_loss += tf.reduce_sum(tf.square(y_true[:,:,:, 7:9] - y_pred[:,:,:, 7:9]) * tf.expand_dims(object_mask_3, -1))  
  
    # the total loss combines the object, no-object, and box terms
    total_loss = object_loss + no_object_loss + bbox_loss  
      
    return total_loss  
  
model.compile(  
    optimizer=RMSprop(learning_rate=1e-3),   
    loss=yolo_loss  
)
# callbacks is referenced but never defined in the original; a checkpoint
# callback saving to the path loaded later is a reasonable stand-in.
callbacks = [tf.keras.callbacks.ModelCheckpoint("YOLO_particle_detector", save_best_only=True)]

model.fit(
    X_train,
    y_train,
    epochs=300,
    batch_size=8,
    validation_data=(X_test, y_test),
    verbose=1,
    callbacks=callbacks
)

4. Inference with the YOLO Model

When running inference with a YOLO model, we sometimes need to implement non-maximum suppression (NMS) to filter out multiple bounding boxes that point at the same object. The algorithm is as follows:

  1. Sort the bounding boxes by detected-object confidence in descending order.
  2. Starting from the box with the highest confidence, compute the intersection over union (IoU) of its area with every other bounding box; if the IoU exceeds some threshold, remove that other box's detection.

If each grid cell can contain multiple objects, non-maximum suppression must also be computed across objects detected in different grid cells.

That said, the problem of multiple bounding boxes per object is most pronounced when objects are large relative to the grid cells. In our case a particle is about as wide as a grid cell, so we can safely omit non-maximum suppression from inference; a sketch of the algorithm is still shown below for reference.
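A minimal sketch of the NMS algorithm just described could look like the following, assuming boxes are dictionaries in the same center/width/height format our detector emits:

def iou(a, b):
    # intersection over union of two boxes in center/width/height format
    ax1, ay1 = a['x_center'] - a['width'] / 2, a['y_center'] - a['height'] / 2
    ax2, ay2 = a['x_center'] + a['width'] / 2, a['y_center'] + a['height'] / 2
    bx1, by1 = b['x_center'] - b['width'] / 2, b['y_center'] - b['height'] / 2
    bx2, by2 = b['x_center'] + b['width'] / 2, b['y_center'] + b['height'] / 2
    inter = max(0, min(ax2, bx2) - max(ax1, bx1)) * max(0, min(ay2, by2) - max(ay1, by1))
    union = a['width'] * a['height'] + b['width'] * b['height'] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, iou_threshold=0.5):
    # keep the highest-confidence boxes; drop any box overlapping a kept one too much
    kept = []
    for box in sorted(boxes, key=lambda b: b['confidence'], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept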

Here is our inference implementation:

model = tf.keras.models.load_model("YOLO_particle_detector", custom_objects={'yolo_loss': yolo_loss})
frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  # or 'DIVX'
out = cv2.VideoWriter('inference_detections.mp4', fourcc, 50.0, (frame_width, frame_height))  
  
  
def convert_to_absolute_coordinates(predictions, cell_size=20):  
      
    absolute_predictions = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
            if x_grid[0] > 0.5:  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index)})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index)})
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index)})
              
    return absolute_predictions  
  
  
def detect(frame):  
      
    frame_resized = cv2.resize(frame,(240,240))  
    frame_normalized = frame_resized / 255  
      
    detections = model(np.expand_dims(frame_normalized,axis=0))  
    predictions = convert_to_absolute_coordinates(np.array(detections))  
      
    for prediction in predictions:      
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
        cv2.rectangle(frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
          
    return frame  
  
def draw_particles(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
          
    return frame  
  
def test_particles():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame = draw_particles(particles)  
        frame = detect(frame)  
          
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    out.release()  
    cv2.destroyAllWindows()

Jitter simulation of multi-colored particles, with bounding boxes inferred using the trained model. GIF by author.

5. Introduction to Object Tracking

Now that we have a YOLO model for object detection, we can use it for the downstream task of object tracking. Here we build a custom object-tracking model from scratch, without reference to any prior work.

Our object-tracking model runs inference on two consecutive frames together with their bounding-box detections. When a new, unlabeled object enters the detections, the model assigns it an arbitrary (or incremental) label in the later frame. In the next step, that frame is designated the previous frame, with all of its detections labeled. Each new subsequent frame is then paired with the previous frame and its labeled detections, and the model infers labels for the unassigned detections in the new frame. The cycle continues, and each unique object and its corresponding label are propagated across the canvas.

A simple illustration of how the object tracking works, assuming an 8 x 8 grid canvas. Image by author.
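In pseudocode, the propagation cycle described above looks roughly like this (every function name is a placeholder; the actual implementation appears in section 8):

# Sketch of the label-propagation cycle; all function names are placeholders.
prev_frame = first_frame
prev_labels = assign_initial_labels(yolo_detect(first_frame))  # arbitrary/incremental labels

for frame in remaining_frames:
    detections = yolo_detect(frame)                               # unlabeled boxes
    labels = tracker(prev_frame, prev_labels, frame, detections)  # propagate identities
    labels = label_newcomers(labels, detections)                  # label any new objects
    prev_frame, prev_labels = frame, labels                       # the frame pair rolls forward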

We then design a multi-input CNN architecture that simultaneously takes the consecutive frames, the YOLO detection outputs, and a tensor of assigned detection labels (for the earlier frame) as inputs, and produces as its training output the detection labels for the later frame.

The diagram below shows a simple sketch of the architecture.

Note that the detection identities in both the input (earlier frame) and the output (later frame) must be one-hot encoded, which also means we must fix the maximum number of object labels a frame can hold; see the small sketch after the diagram.

A simple sketch of the object-tracking CNN architecture. Image by author.
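As a small illustration of that one-hot encoding, with the cap set to 24 labels to match the model built in section 7:

import numpy as np

MAX_LABELS = 24  # maximum number of simultaneous object labels per frame

def one_hot_label(particle_index):
    # encode a 1-based particle label as a 24-dimensional one-hot vector
    vec = np.zeros(MAX_LABELS)
    vec[particle_index - 1] = 1.0
    return vec

one_hot_label(5)  # 1.0 at index 4, zeros elsewhere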

6. Particle Simulation and Data Collection for Object Tracking

Data collection for object tracking is very similar to that for YOLO detection, except that particle labels must also be simulated. In our scheme, the first particle to appear gets index 1, and the index increments as new particles join. When a particle disappears from view, its label is recycled and queued; a queued label is reused as soon as a new particle appears, instead of allocating a new incremental label.

The code is shown below:

frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('simulation_tracking.mp4', fourcc, 40.0, (frame_width, frame_height))  
particles_disappeared = []  
particles_appeared = []  
particle_max_index = 0  
  
def create_particle_tracking():  
      
    global particles_disappeared, particles_appeared, particle_max_index  
      
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius
    uniform_random = np.random.uniform()  
      
    if not particles_disappeared:  
        particle_max_index += 1  
        particle_index = particle_max_index   
        particles_appeared.append(particle_index)  
    else:  
        particle_index = particles_disappeared[0]  
        particles_disappeared = particles_disappeared[1:]  
        particles_appeared.append(particle_index)  
      
      
    if uniform_random <= 0.25:  
        # start from the bottom
        position = (random.randint(radius, frame_width - radius), radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 0.5:  
        # start from the top
        position = (random.randint(radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
    elif uniform_random <= 0.75:  
        # start from the left
        position = (radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(-90, 90)  
        start_pos = "left"  
    else:  
        # start from the right
        position = (frame_width - radius, random.randint(radius, frame_height - radius))  
        angle = random.randint(90, 270)  
        start_pos = "right"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos, 'particle_index': particle_index}  
  
  
def draw_frame_tracking(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
        # draw the bounding box
        x, y = particle['position']  
        cv2.rectangle(frame, (x - 2* particle['radius'], y - 2 * particle['radius']), (x + 2 * particle['radius'], y + 2 * particle['radius']), (0, 255, 0), 1)  
        cv2.putText(frame,f"#{particle['particle_index']}", (x - particle['radius'] - 10,y - particle['radius'] - 15), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
        confidence = np.random.uniform(0.99,0.9999999)  
  
        bounding_boxes.append({'x_center': x, 'y_center': y, 'width': particle['radius']*4, 'height': particle['radius']*4, 'index': particle['particle_index'], 'confidence': confidence})  
          
    return frame, bounding_boxes  
  
total_data = []  
  
def simulate_particles_tracking():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 5 == 0:  
            total_particles_created += 1  
            particles.append(create_particle_tracking())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles_appeared.remove(particle['particle_index'])  
                particles_disappeared.append(particle['particle_index'])  
                particles.remove(particle)  
  
        frame, bounding_boxes = draw_frame_tracking(particles)  
        total_data.append({'frame': frame, 'boundary_boxes': bounding_boxes})  
        out.write(frame)  
        cv2.imshow('Frame', frame)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    out.release()  
    cv2.destroyAllWindows()  
      
    return total_data  
  
  
for i in range(80):  
    total_data = simulate_particles_tracking()

After collecting the raw data, shaping it into the format required by our object-tracking CNN architecture takes a bit more work. Note that the model expects several input arrays; the code below extracts and assembles them cleanly from the raw Python dictionaries we collected.

When splitting the formatted data into training and test sets, we split chronologically rather than randomly, so that the test data carries no correlation with the training data, since the frames were collected sequentially.

In addition, to train on the large volume of simulated data, we convert the training and test sets into generators that can be trained on within limited GPU memory.

import tqdm

def resize(X_true):
  
    resized_images = np.zeros((len(X_true), 240, 240, 3))    
    for i in range(X_true.shape[0]):  
        resized_images[i] = cv2.resize(X_true[i], (240, 240))  
  
    resized_images = resized_images / 255.0      
    return resized_images  
  
def convert_data_tracking(total_data):  
  
    grid_size = 30  
    cell_size = 600 // grid_size  # each cell is 20x20 pixels
  
    # initialize the arrays
    first_frames = resize(np.array([data['frame'] for data in total_data[:-1]]))  
    second_frames = resize(np.array([data['frame'] for data in total_data[1:]]))  
  
    X_true_frames = np.concatenate([first_frames, second_frames],axis=-1)  
  
    del first_frames  
    del second_frames  
      
    X_true_detection = np.zeros((len(total_data), grid_size, grid_size, 12))  # 12 channels per cell: 9 detection values + 3 label slots
    y_true = np.zeros((len(total_data)-1, grid_size, grid_size, 24))  
    X_true_first_indices = np.zeros((len(total_data)-1, grid_size, grid_size, 24))  
  
  
    for i, data in tqdm.tqdm(enumerate(total_data)):  
  
        boxes = data['boundary_boxes']  
        for box in boxes:  
            x_center = box['x_center']  
            y_center = box['y_center']  
            confidence = box['confidence']  
            particle_index = box['index']  
              
            # determine the grid cell indices
            grid_x = int(x_center / cell_size)   
            grid_y = int(y_center / cell_size)  
  
            if X_true_detection[i, grid_y, grid_x, 0] == 0:  # check whether the first slot is free
                X_true_detection[i, grid_y, grid_x, 0] = confidence  # particle present
                X_true_detection[i, grid_y, grid_x, 1] = (x_center % cell_size) / cell_size   # local x_center
                X_true_detection[i, grid_y, grid_x, 2] = (y_center % cell_size) / cell_size   # local y_center
                X_true_detection[i, grid_y, grid_x, 9] = particle_index

            elif X_true_detection[i, grid_y, grid_x, 3] == 0:  # check whether the second slot is free
                X_true_detection[i, grid_y, grid_x, 3] = confidence  # particle present
                X_true_detection[i, grid_y, grid_x, 4] = (x_center % cell_size) / cell_size   # local x_center
                X_true_detection[i, grid_y, grid_x, 5] = (y_center % cell_size) / cell_size   # local y_center
                X_true_detection[i, grid_y, grid_x, 10] = particle_index

            elif X_true_detection[i, grid_y, grid_x, 6] == 0:  # check whether the third slot is free
                X_true_detection[i, grid_y, grid_x, 6] = confidence  # particle present
                X_true_detection[i, grid_y, grid_x, 7] = (x_center % cell_size) / cell_size   # local x_center
                X_true_detection[i, grid_y, grid_x, 8] = (y_center % cell_size) / cell_size   # local y_center
                X_true_detection[i, grid_y, grid_x, 11] = particle_index
                  
      
    for i, data in enumerate(X_true_detection[1:,:,:,9:]):  
        for j, y_index in enumerate(data):  
            for k, x_index in enumerate(y_index):  
                for particle in x_index:  
                    if particle > 0:  
                        y_true[i, j, k, int(particle)-1] = 1  
  
    for i, data in enumerate(X_true_detection[:-1,:,:,9:]):  
        for j, y_index in enumerate(data):  
            for k, x_index in enumerate(y_index):  
                for particle in x_index:  
                    if particle > 0:  
                        X_true_first_indices[i, j, k, int(particle)-1] = 1  
                          
    X_true_first_detection = X_true_detection[:-1,:,:,:9]  
    X_true_second_detection = X_true_detection[1:,:,:,:9]  
  
    del X_true_detection  
  
    X_true_both_detection = np.concatenate([X_true_first_detection, X_true_first_indices],axis=-1)   
    X_true_both_detection = np.concatenate([X_true_both_detection, X_true_second_detection],axis=-1)   
  
    X_true = [X_true_frames, X_true_both_detection]  
  
    return X_true, y_true  
  
X_true, y_true = convert_data_tracking(total_data)
[X_true_frames, X_true_both_detection] =  X_true  
  
split_index = int(len(X_true_frames) * 0.97)  
  
X_frames_train = X_true_frames[:split_index]  
X_frames_test = X_true_frames[split_index:]  
  
X_detections_train = X_true_both_detection[:split_index]  
X_detections_test = X_true_both_detection[split_index:]  
  
y_train = y_true[:split_index]  
y_test = y_true[split_index:]
def train_generator():  
    for i in range(len(X_frames_train)):  
        yield ((X_frames_train[i], X_detections_train[i]), y_train[i])  
  
def test_generator():  
    for i in range(len(X_frames_test)):  
        yield ((X_frames_test[i], X_detections_test[i]), y_test[i])  
          
          
train_dataset = tf.data.Dataset.from_generator(
    train_generator,
    output_types=((tf.float32, tf.float32), tf.float32),
    output_shapes=(((None, None, None), (None, None, None)), (None, None, None))  # adjust to the actual data shapes
)

# create the test dataset
test_dataset = tf.data.Dataset.from_generator(
    test_generator,
    output_types=((tf.float32, tf.float32), tf.float32),
    output_shapes=(((None, None, None), (None, None, None)), (None, None, None))  # adjust to the actual data shapes
)

# batch size and prefetching
train_dataset = train_dataset.batch(32).prefetch(tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(32).prefetch(tf.data.AUTOTUNE)

7. Training the Object-Tracking Model

Using the functional API, the object-tracking model can likewise be built and trained with the TensorFlow Keras framework, with the simple architecture shown below. The output resembles the YOLO model's 30 x 30 grid, except that the output tensor now has 24 channels, meaning the canvas can hold at most 24 labeled particles.

Also, the output uses a sigmoid activation instead of softmax, because each grid cell can hold up to 3 particles. So, for example, if all of a grid cell's channels are 0 except those for labels 5 and 12, which are close to 1, it means that the particles labeled 5 and 12 are present in that grid cell.
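A small sketch of this multi-label decoding for a single grid cell, thresholding at 0.5 as we do at inference time:

import numpy as np

def decode_cell(cell_vector, threshold=0.5):
    # return the 1-based particle labels whose sigmoid channels fire
    return [i + 1 for i, p in enumerate(cell_vector) if p > threshold]

cell = np.zeros(24)
cell[[4, 11]] = 0.97      # the channels for labels 5 and 12 are close to 1
print(decode_cell(cell))  # -> [5, 12]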

Under this framework the output tensor is sparse, and during inference we only inspect the grid cells where the YOLO model detected an object. We therefore design a custom tracking loss that only considers loss values in grid cells containing at least one detected object, and then scales up the loss on the channels where object labels are present.

from tensorflow.keras.optimizers import RMSprop, Adam  
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Reshape, Resizing, concatenate  
  
input_frames = tf.keras.Input(shape=X_frames_train.shape[1:])  
input_detections = tf.keras.Input(shape=X_detections_train.shape[1:])  
  
x = Conv2D(32, (3, 3), padding='same', activation='relu')(input_frames)   # input is (240, 240, 6): two stacked RGB frames
x = MaxPooling2D(2, 2)(x)  
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)  
x = MaxPooling2D(2, 2)(x)  
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)  
x = MaxPooling2D(2, 2)(x)  
  
x = concatenate([x, input_detections])  
x = Conv2D(256, (3, 3), padding='same', activation='relu')(x)  
  
x = Conv2D(256, (1, 1), padding='same', activation='relu')(x)  
x = Conv2D(128, (1, 1), padding='same', activation='relu')(x)  
output = Conv2D(24, (1, 1), padding='same', activation='sigmoid')(x)    # output is (30, 30, 24)
  
model = tf.keras.Model(inputs=[input_frames, input_detections], outputs=output)   
  
def tracking_loss(y_true, y_pred):  
    # presence mask (1 where an object label is present)
    object_mask = y_true  
  
    # absence mask (1 where no object label is present; defined for symmetry but unused below)
    no_object_mask = 1 - y_true  
  
    mask = tf.reduce_max(y_true, axis=-1, keepdims=True)  
    mask = tf.cast(mask, dtype=tf.float32)  
    expanded_mask = tf.repeat(mask, repeats=24, axis=-1)  
  
    # object loss (binary cross-entropy, amplified on channels where a label is present)
    object_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)  
    object_loss = tf.reduce_sum(tf.expand_dims(object_loss,-1) * object_mask)   
    object_loss *= 5  
  
    # no-object loss (binary cross-entropy on the remaining channels, restricted to grid cells that contain at least one object)
    no_object_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)  
    no_object_loss = tf.reduce_sum(tf.expand_dims(no_object_loss,-1) * expanded_mask)   
  
    total_loss = object_loss + no_object_loss  
    return total_loss  
  
def thresholded_accuracy(y_true, y_pred):  
  
    threshold = 0.5  
    y_pred_thresholded = tf.cast(y_pred > threshold, tf.float32)  
    return tf.keras.metrics.binary_accuracy(y_true, y_pred_thresholded)  
  
model.compile(  
    optimizer=RMSprop(learning_rate=1e-3),   
    loss=tracking_loss,  
    metrics=thresholded_accuracy  
)  
  
# callbacks is again undefined in the original; a checkpoint callback
# saving to the path loaded in the next section is a reasonable stand-in.
callbacks = [tf.keras.callbacks.ModelCheckpoint("YOLO_particle_tracker", save_best_only=True)]

model.fit(
    train_dataset,
    epochs=300,
    validation_data=test_dataset,
    verbose=1,
    callbacks=callbacks
)

8. Inference with the Object-Tracking Model

Finally, with trained YOLO and tracker models in hand, we arrive at the heart of the project, and the trickiest part to code. Once the logic wiring up the inputs and outputs of this multi-model system is in place, the hardest part is ensuring that particle labels are initialized incrementally and never duplicated within a single canvas.

Although the tracking model was successfully trained to propagate labels between consecutive frames, two issues had to be hard-coded manually during inference:

  1. Enforcing incrementally increasing labels as new particles appear. If an old particle has left the view, its recycled, queued label is applied to the new particle instead. When the tracking model is applied without any intervention, labels are assigned almost at random.
  2. Duplicate labels occur when new particles appear. When this happens, the tracking model's output must be readjusted so that the new particle is labeled according to our intended scheme.

We were ultimately able to achieve the intended behavior after applying the code detailed below:

from collections import deque

detection_model = tf.keras.models.load_model("YOLO_particle_detector", custom_objects={'yolo_loss': yolo_loss})
tracking_model = tf.keras.models.load_model("YOLO_particle_tracker", custom_objects={'tracking_loss': tracking_loss})
frame_height, frame_width = 600, 600  
  
fourcc = cv2.VideoWriter_fourcc(*'XVID')  
out = cv2.VideoWriter('inference_tracking.mp4', fourcc, 40.0, (frame_width, frame_height))  
particles_disappeared = []  
particles_appeared = []  
particles_appeared_pos = []  
particle_max_index = 0  
consecutive_frames = deque(maxlen=2)  
indices_matrix = []  
  
  
def convert_to_absolute_coordinates(predictions, cell_size=20):  
      
    absolute_predictions = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
            if x_grid[0] > 0.5:  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index)})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index)})  
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index)})  
              
    return absolute_predictions  
  
  
def convert_to_tracking_data(predictions, cell_size=20):  
      
    global particles_disappeared, particles_appeared, particle_max_index, particles_appeared_pos  
      
    absolute_predictions = []  
    current_particles = []  
  
    for y_index, y_grid in enumerate(predictions[0]):  
          
        for x_index, x_grid in enumerate(y_grid):  
              
              
            detection_indices = x_grid[9:]              
            sorted_detection_indices = np.argsort(detection_indices)[::-1]  
            sorted_probabilities = np.sort(detection_indices)[::-1]             
              
            if x_grid[0] > 0.5:  
  
                x_center = x_grid[1] * cell_size + (x_index * cell_size)  
                y_center = x_grid[2] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                particle_index = sorted_detection_indices[0] + 1  
  
                if particle_index not in particles_appeared:  
                    if particle_index > particle_max_index + 1:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            while particle_max_index in particles_appeared:  
                                particle_max_index += 1  
                                if particle_max_index == 25:  
                                    particle_max_index = 1  
                            particle_index = particle_max_index  
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
  
                while particle_index in particles_appeared:  
  
                    x, y = particles_appeared_pos[particles_appeared.index(particle_index)]  
                    distance = np.sqrt((x_center-x)**2 + (y_center-y)**2)  
  
                    if distance < 20:  
                        break  
                    else:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            particle_index = particle_max_index                 
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
  
                if particle_index not in particles_appeared:  
                    particles_appeared.append(particle_index)       
                    particles_appeared_pos.append((x_center,y_center))   
                else:  
                     particles_appeared_pos[particles_appeared.index(particle_index)] = (x_center,y_center)  
  
                current_particles.append(particle_index)   
                  
                  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[0], 'grid': (y_index,x_index), 'index': particle_index})  
                  
            if x_grid[3] > 0.5:  
                x_center = x_grid[4] * cell_size + (x_index * cell_size)  
                y_center = x_grid[5] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                particle_index = sorted_detection_indices[1] + 1  
  
                if particle_index not in particles_appeared:  
                    if particle_index > particle_max_index + 1:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            while particle_max_index in particles_appeared:  
                                particle_max_index += 1  
                                if particle_max_index == 25:  
                                    particle_max_index = 1  
                            particle_index = particle_max_index  
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
  
                while particle_index in particles_appeared:  
  
                    x, y = particles_appeared_pos[particles_appeared.index(particle_index)]  
                    distance = np.sqrt((x_center-x)**2 + (y_center-y)**2)  
  
                    if distance < 20:  
                        break  
                    else:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            particle_index = particle_max_index                
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
                  
                if particle_index not in particles_appeared:  
                    particles_appeared.append(particle_index)       
                    particles_appeared_pos.append((x_center,y_center))   
                else:  
                     particles_appeared_pos[particles_appeared.index(particle_index)] = (x_center,y_center)  
  
  
                current_particles.append(particle_index)                    
                  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[3], 'grid': (y_index,x_index), 'index': particle_index})  
                  
            if x_grid[6] > 0.5:  
                x_center = x_grid[7] * cell_size + (x_index * cell_size)  
                y_center = x_grid[8] * cell_size + (y_index * cell_size)  
                width = 40  
                height = 40  
                particle_index = sorted_detection_indices[2] + 1  
  
                if particle_index not in particles_appeared:  
                    if particle_index > particle_max_index + 1:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            while particle_max_index in particles_appeared:  
                                particle_max_index += 1  
                                if particle_max_index == 25:  
                                    particle_max_index = 1  
                            particle_index = particle_max_index  
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
         
                while particle_index in particles_appeared:  
  
                    x, y = particles_appeared_pos[particles_appeared.index(particle_index)]  
                    distance = np.sqrt((x_center-x)**2 + (y_center-y)**2)  
  
                    if distance < 20:  
                        break  
                    else:  
                        if not particles_disappeared:  
                            particle_max_index += 1  
                            if particle_max_index == 25:  
                                particle_max_index = 1  
                            particle_index = particle_max_index                   
                        else:  
                            particle_index = particles_disappeared[0]  
                            particles_disappeared = particles_disappeared[1:]  
                  
                if particle_index not in particles_appeared:  
                    particles_appeared.append(particle_index)       
                    particles_appeared_pos.append((x_center,y_center))   
                else:  
                     particles_appeared_pos[particles_appeared.index(particle_index)] = (x_center,y_center)   
  
  
                current_particles.append(particle_index)                            
                  
                absolute_predictions.append({'x_center': x_center, 'y_center': y_center, 'width': width, 'height': height, 'confidence': x_grid[6], 'grid': (y_index,x_index), 'index': particle_index})  
                  
    for particle_index in particles_appeared:  
        if particle_index not in current_particles:  
            particles_disappeared.append(particle_index)  
            particles_appeared_pos.pop(particles_appeared.index(particle_index))  
            particles_appeared.remove(particle_index)  
              
    return absolute_predictions  
  
  
def create_particle():  
    color = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))  
    radius = 10  # particle radius
    uniform_random = np.random.uniform()  
      
    if uniform_random <= 0.50:  
        # start from the bottom
        position = (random.randint(radius, int((frame_width - radius)/2))-radius, radius)  
        angle = random.randint(0, 180)  
        start_pos = "bottom"  
    elif uniform_random <= 1.0:  
        # start from the top
        position = (random.randint(int((frame_width - radius)/2)+radius, frame_width - radius), frame_height - radius)  
        angle = random.randint(180, 360)  
        start_pos = "top"  
      
    return {'position': position, 'color': color, 'radius': radius, 'angle': angle, 'start_pos': start_pos}  
  
  
def move_particle(particle):  
      
    if particle['start_pos']=='bottom':  
        angle = 90  
    elif particle['start_pos']=='top':  
        angle = 270  
    elif particle['start_pos']=='left':  
        angle = 0  
    elif particle['start_pos']=='right':  
        angle = 180  
      
    angle_rad = np.deg2rad(angle)  
    dx = int(particle['radius'] * np.cos(angle_rad))  
    dy = int(particle['radius'] * np.sin(angle_rad))  
    x, y = particle['position']  
    particle['position'] = (x + dx, y + dy)  
    particle['displacement'] = np.sqrt(dx**2 + dy**2)  
  
  
def draw_particles(particles):  
    frame = np.zeros((frame_height, frame_width, 3), dtype=np.uint8)  
    bounding_boxes = []  
    for particle in particles:  
        cv2.circle(frame, particle['position'], particle['radius'], particle['color'], -1)  
          
    return frame  
  
  
def detect(consecutive_frames):  
      
    global particles_disappeared, particles_appeared, particle_max_index, particles_appeared_pos  
      
    frame = consecutive_frames[0]  
  
    frame_resized = cv2.resize(frame,(240,240))  
    frame_normalized = frame_resized / 255  
      
    detections = detection_model(np.expand_dims(frame_normalized,axis=0))  
      
    detection_indices = np.zeros((1, 30, 30, 24))    
    predictions = convert_to_absolute_coordinates(np.array(detections))  
      
    for prediction in predictions:    
          
        if not particles_disappeared:  
            particle_max_index += 1  
            particle_index = particle_max_index  
            particles_appeared.append(particle_index)  
  
        else:  
            particle_index = particles_disappeared[0]  
            particles_disappeared = particles_disappeared[1:]  
            particles_appeared.append(particle_index)    
  
              
        (grid_y, grid_x) = prediction['grid']  
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
  
        particles_appeared_pos.append((x,y))  
          
        cv2.rectangle(frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
        cv2.putText(frame,f"#{particle_index}", (x -20, y - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
          
        detection_indices[0,grid_y, grid_x, particle_index-1] = 1  
           
    X_first_detection = np.concatenate([detections, detection_indices],axis=-1)    
          
    return frame, X_first_detection     
      
      
def detect_and_track(consecutive_frames, X_first_detection):  
      
    global particles_disappeared, particles_appeared, particle_max_index, indices_matrix  
          
    first_frame = consecutive_frames[0]  
    second_frame = consecutive_frames[1]  
      
    first_frame_resized = cv2.resize(first_frame,(240,240))  
    first_frame_normalized = first_frame_resized / 255  
      
    second_frame_resized = cv2.resize(second_frame,(240,240))  
    second_frame_normalized = second_frame_resized / 255  
      
    second_detections = detection_model(np.expand_dims(second_frame_normalized,axis=0))  
    second_detection_indices = np.zeros((1, 30, 30, 24))    
      
    X_detections = np.concatenate([X_first_detection, second_detections], axis=-1)  
      
    X_frames = np.concatenate([first_frame_normalized, second_frame_normalized],axis=-1)  
    X_frames = np.expand_dims(X_frames,axis=0)  
  
    second_indices = tracking_model([X_frames, X_detections])  
  
    indices_matrix.append(second_indices)  
      
    second_data = np.concatenate([second_detections, second_indices], axis=-1)  
      
    predictions = convert_to_tracking_data(second_data)  
  
      
    for prediction in predictions:  
        (grid_y, grid_x) = prediction['grid']  
        x = int(prediction['x_center'])  
        y = int(prediction['y_center'])  
        particle_index = int(prediction['index'])  
        cv2.rectangle(second_frame, (x - 20, y - 20), (x + 20, y + 20), (0, 255, 0), 1)  
        cv2.putText(second_frame,f"#{particle_index}", (x -20, y - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255),1)  
          
        second_detection_indices[0,grid_y, grid_x, particle_index-1] = 1  
              
    X_first_detection = np.concatenate([second_detections, second_detection_indices],axis=-1)    
  
      
    return second_frame, X_first_detection  
  
  
def test_particles_tracking():  
    particles = []  
    max_particles = 50  
    total_particles_created = 0  
    timer = 0     
  
    while len(particles) > 0 or total_particles_created < max_particles:  
        if total_particles_created < max_particles and timer % 10 == 0:  
            total_particles_created += 1  
            particles.append(create_particle())  
  
        for particle in particles[:]:  
            move_particle(particle)  
            if is_off_screen(particle):  
                particles.remove(particle)  
  
        frame = draw_particles(particles)  
        consecutive_frames.append(frame)  
          
        if len(consecutive_frames) == 1:  
              
            frame_to_display, X_first_detection = detect(consecutive_frames)  
              
        elif len(consecutive_frames) == 2:  
              
            frame_to_display, X_first_detection = detect_and_track(  
                consecutive_frames,  
                X  
            )  
              
        X = X_first_detection  
          
        out.write(frame_to_display)  
          
        cv2.imshow('Frame', frame_to_display)  
        if cv2.waitKey(1) & 0xFF == ord('q'):  
            break  
        timer += 1  
  
    out.release()  
    cv2.destroyAllWindows()  
  
  
test_particles_tracking()

Simulated detection and tracking of multi-colored particles in an upper and a lower lane, like cars on a road. GIF by author.

9. Conclusion and Final Thoughts

The inference results are quite robust, but one final problem remains, one that plagues other object-tracking models as well and is likely an active research area: the tracking model is more prone to confusion when two or more objects overlap.

For example, when object #4 and object #8 cross paths, their labels may end up swapped after they move apart. This is a vexing problem in object tracking. One way to address it is to use more than 2 frames as input; however, this becomes useless if two objects stay close together for a long time before separating.

Another idea that comes to mind is to use a fixed-length embedding vector representing the cropped object, which could be concatenated into an intermediate layer of the model. That remains to be seen, and I will experiment with it in the near future.

[Update, August 3, 2024]: I managed to make some tweaks to the tracking algorithm and training, and it now handles overlapping objects much better:

The jittering multi-colored particles sometimes overlap, but tracking remains robust. GIF by author.

Finally, congratulations on completing this tutorial! I hope this article has successfully guided you through coding YOLO and object tracking from scratch. Next, I intend to develop a vision model that uses a graph-based vision transformer (call it GraphViT). If you are interested in my work, stay tuned!
