
This is a review I originally wrote on 2022.10.13 and am posting to the blog some time later. Please note that it is not the latest information.

 


YOLO (You Only Look Once)

  • Single-stage real-time object detector
  • Fast, accurate, active community

  • Input is featurized through a backbone, combined and mixed in the neck, and passed along to the head (a minimal structural sketch is below)
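
To make the backbone / neck / head split concrete, here is a minimal PyTorch sketch of that data flow. The layers, channel sizes, and head layout are illustrative assumptions, not the actual YOLO modules.

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Toy single-stage detector: backbone -> neck -> head (illustrative only)."""
    def __init__(self, num_classes: int = 80, num_anchors: int = 3):
        super().__init__()
        # Backbone: extracts feature maps from the image
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Neck: combines/mixes features (a single conv here for brevity)
        self.neck = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.SiLU())
        # Head: per anchor, predicts box (4) + objectness (1) + class scores
        self.head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.neck(self.backbone(x)))

preds = TinyDetector()(torch.randn(1, 3, 640, 640))
print(preds.shape)  # torch.Size([1, 255, 160, 160])
```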


YOLO-v7

  • AlexeyAB, WongKinYiu
    • AlexeyAB took up the YOLO torch from the original author, Joseph Redmon, when Redmon left the CV industry due to ethical concerns
    • Scaled-YOLOv4, YOLOR, YOLOv6

SOTA Real-time object detector

  • beats Transformer-based and ConvNeXt-based detectors
    • vs ‘transformer based’ (SWIN-L Cascade-Mask R-CNN)
      • 509% faster and 2% more accurate
    • vs ‘convolution based’ (ConvNeXt-XL Cascade-Mask R-CNN)
      • 551% faster and 0.7% AP more accurate
    • also outperforms DETR, YOLOR, YOLOX, Scaled-YOLOv4, Deformable DETR, etc.

 

  • Contribution
    • Design new trainable bag-of-freebies
      • Optimized modules and optimization methods that may increase the training cost to improve object detection accuracy, but without increasing the inference cost
        • more robust loss function
        • more efficient label assignment method
        • more efficient training method
    • Propose a new architecture for real-time object detectors and a corresponding model scaling method

 

Trainable bag-of-freebies

1. Planned re-parameterized convolution

  • Model re-parameterization
    • merge multiple computational modules into one at inference stage
    • model-level ensemble
      • train multiple identical models with different training data, and then average the weights of multiple trained models
      • weighted average of the weights of models at different iteration number
    • module-level re-parameterization
      • splits a module into multiple identical or different module branches during training and integrates multiple branched modules into a completely equivalent module during inference
  • RepVGG (RepConv)
    • training time → multi-branch (3×3 conv + 1×1 conv + identity)
    • inference time → re-parameterized into a single branch
    • good performance in plain VGG-style networks, but not in ResNet or DenseNet

  • YOLOv7 uses RepConv without the identity connection (RepConvN) to design the architecture of planned re-parameterized convolution (a fusion sketch follows below)
    • the identity connection in RepConv destroys the residual in ResNet and the concatenation in DenseNet
      • it would duplicate the identity/residual connection those architectures already provide
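
To make the re-parameterization concrete, below is a minimal PyTorch sketch that fuses a two-branch RepConvN-style block (3×3 conv + 1×1 conv, no identity branch) into a single 3×3 conv. BatchNorm fusion is omitted for brevity; this is an assumed illustration, not the repository's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

c = 16
conv3 = nn.Conv2d(c, c, 3, padding=1)   # training-time 3x3 branch
conv1 = nn.Conv2d(c, c, 1)              # training-time 1x1 branch

# Fuse: pad the 1x1 kernel to 3x3 (value in the center) and add weights/biases.
fused = nn.Conv2d(c, c, 3, padding=1)
with torch.no_grad():
    fused.weight.copy_(conv3.weight + F.pad(conv1.weight, [1, 1, 1, 1]))
    fused.bias.copy_(conv3.bias + conv1.bias)

x = torch.randn(2, c, 8, 8)
y_train = conv3(x) + conv1(x)           # multi-branch output (training time)
y_infer = fused(x)                      # single-branch output (inference time)
assert torch.allclose(y_train, y_infer, atol=1e-5)  # completely equivalent
```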

2. Coarse for auxiliary and fine for lead loss

  • Deep supervision
    • Add an extra auxiliary head in the middle layers of the network, with an assistant loss guiding the shallow network weights

  • Improve the deep supervision
    • consider the network prediction results together with the ground truth, and use calculation and optimization methods to generate a reliable soft label
    • Call the mechanism that considers the network prediction results together with the ground truth and then assigns soft labels as “label assigner.”
  • (c) Independent assigner
    • previous approach
  • (d) Lead head guided label assigner
    • soft labels are generated from the lead head's predictions and the ground truth through an optimization process
    • the shallower auxiliary head directly learns the information that the lead head has learned, so the lead head can focus on learning the residual information that has not yet been learned
  • (e) Coarse-to-fine lead head guided label assigner
    • lead head prediction as guidance to generate coarse-to-fine hierarchical labels, which are used for auxiliary head and lead head learning
      • Coarse label : allowing more grids → focus on optimizing the recall of auxiliary head
      • Fine label: same as the soft label generated by the lead head guided assigner in (d)
    • This makes the optimizable upper bound of the fine label always higher than that of the coarse label.
    • if the auxiliary head learns the lead-guided soft label, it will indeed help the lead head extract the residual information from the consistent targets (a toy sketch of the assigner follows this list)
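
The toy sketch below illustrates the two ideas: a soft objectness target derived from the lead head's prediction quality (IoU with the ground truth), and a coarse positive set that admits more grid cells than the fine one. The thresholds and the IoU-as-target rule are simplifications assumed for illustration; the actual assigner also uses optimization-based matching.

```python
import torch

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """IoU between two (x1, y1, x2, y2) boxes."""
    lt = torch.maximum(a[:2], b[:2])
    rb = torch.minimum(a[2:], b[2:])
    inter = (rb - lt).clamp(min=0).prod()
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

lead_pred_box = torch.tensor([0.1, 0.1, 0.52, 0.48])  # lead head prediction
gt_box        = torch.tensor([0.0, 0.0, 0.50, 0.50])  # ground truth

# Soft label: prediction quality (IoU with GT) replaces the hard 1.0 target.
soft_obj_target = box_iou(lead_pred_box, gt_box)

# Fine label (lead head): only grid cells near the GT center are positive.
# Coarse label (aux head): relax the threshold to allow more positive grids,
# trading precision for recall in the auxiliary head.
grid_centers = torch.stack(torch.meshgrid(
    torch.arange(8), torch.arange(8), indexing="ij"), -1).float().view(-1, 2) / 8
gt_center = (gt_box[:2] + gt_box[2:]) / 2
dist = (grid_centers - gt_center).abs().max(dim=1).values
fine_pos   = dist < 0.10   # few positives  -> fine label for the lead head
coarse_pos = dist < 0.20   # more positives -> coarse label for the aux head
print(soft_obj_target.item(), fine_pos.sum().item(), coarse_pos.sum().item())
```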

Architecture

1. Extended efficient layer aggregation networks

  • How to design an efficient network? → By controlling the shortest longest gradient path, a deeper network can learn and converge effectively
  • CSPVoVNet analyzes the gradient path so that the weights of different layers learn more diverse features
  • E-ELAN uses expand, shuffle, and merge cardinality to continuously enhance the learning ability of the network without destroying the original gradient path (see the sketch below)
    • uses group convolution to expand the channel and cardinality of the computational blocks
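
As a rough illustration of expand / shuffle / merge cardinality, here is a hedged PyTorch sketch. It only mimics the idea (group conv to expand cardinality, channel shuffle to mix features across groups, a 1×1 conv to merge back); the block topology and channel counts are assumptions and do not reproduce the real E-ELAN layout.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """ShuffleNet-style channel shuffle: interleave channels across groups."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class EELANLikeBlock(nn.Module):
    def __init__(self, c: int, groups: int = 2):
        super().__init__()
        self.groups = groups
        # Expand: group conv multiplies the channels (cardinality) by `groups`
        self.expand = nn.Conv2d(c, c * groups, 3, padding=1, groups=groups)
        # Merge: fuse the shuffled feature back to c channels
        self.merge = nn.Conv2d(c * groups, c, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.expand(x)                   # expand cardinality
        y = channel_shuffle(y, self.groups)  # shuffle: mix features across groups
        # Residual add here is an illustrative stand-in for keeping the
        # original gradient path intact; the real merge uses concatenation.
        return x + self.merge(y)

out = EELANLikeBlock(64)(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```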

2. Model scaling for concatenation-based models

  • Purpose
    • adjust some attributes of the model and generate models of different scales to meet the needs of different inference speeds

  • scale example in EfficientNet (width, depth, resolution)

  • when depth is scaled up or down, the in-degree of the translation layer immediately after a concatenation-based computational block changes accordingly
    • so the scaling factors of a concatenation-based model cannot be analyzed separately; they must be considered together
  • propose a corresponding compound model scaling method for concatenation-based models
    • when we scale the depth factor of a computational block, we must also calculate the resulting change in that block's output channels and width-scale the following transition layer by the same amount (a small sketch follows)
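
A small sketch of the idea, with hypothetical helper names: depth-scaling a concatenation-based block changes how many outputs get concatenated, so the channel count entering the transition layer must be recomputed rather than width-scaled independently.

```python
def scaled_block_channels(base_channels: int, base_depth: int,
                          depth_factor: float) -> tuple[int, int]:
    """Return (new depth, channels seen by the following transition layer).

    Assumes each layer in the block contributes `base_channels` channels
    to the concatenation, a simplification for illustration.
    """
    new_depth = max(1, round(base_depth * depth_factor))
    concat_channels = base_channels * new_depth
    return new_depth, concat_channels

# Scaling depth x1.5 changes the transition layer's in-degree, so its width
# must be scaled by the same ratio instead of being tuned separately.
print(scaled_block_channels(64, 4, 1.0))  # (4, 256)
print(scaled_block_channels(64, 4, 1.5))  # (6, 384)
```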

 

Experiments

  • COCO dataset
  • Edge GPU / normal GPU / cloud GPU → YOLOv7-tiny, YOLOv7, YOLOv7-W6
  • compound scaling up → YOLOv7-X, YOLOv7-E6, YOLOv7-D6
  • E-ELAN + YOLOv7-E6 → YOLOv7-E6E

ETC

  • GitHub: 6.2k stars, very active community
  • supports export to TensorRT