[Review] YOLOv7 Review, Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
꿈꾸는컴퓨터 2023. 12. 21. 17:49
I originally wrote this review on 2022.10.13 and am only now posting it to the blog, so please note that it is not the latest information.
YOLO (You Only Look Once)
- Single stage real-time object detector
- Fast and accurate, with an active community
- Images are featurized through a backbone, combined and mixed in the neck, and passed along to the head (sketch below)
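As a minimal PyTorch sketch of that backbone → neck → head flow (all module names and sizes below are illustrative placeholders, not YOLOv7's actual layers):

```python
import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Illustrative backbone -> neck -> head data flow of a YOLO-style detector."""
    def __init__(self, num_classes=80, num_anchors=3):
        super().__init__()
        # Backbone: extracts image features (stand-in for the real CSP/ELAN stacks).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Neck: combines and mixes features (stand-in for FPN/PAN fusion).
        self.neck = nn.Sequential(nn.Conv2d(64, 64, 1), nn.SiLU())
        # Head: per-anchor predictions of box (4) + objectness (1) + class scores.
        self.head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)

    def forward(self, x):
        return self.head(self.neck(self.backbone(x)))

preds = TinyDetector()(torch.randn(1, 3, 640, 640))
print(preds.shape)  # torch.Size([1, 255, 160, 160])
```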
YOLO-v7
- AlexeyAB, WongKinYiu
- AlexeyAB took up the YOLO torch from the original author, Joseph Redmon, when Redmon quit the CV industry due to ethical concerns
- Lineage: Scaled-YOLOv4, YOLOR (by the same authors); YOLOv6 (a concurrent release from a separate team)
SOTA Real-time object detector
- beats Transformer-based and ConvNeXt-based detectors
- vs ‘transformer based’ (SWIN-L Cascade-Mask R-CNN)
- 509% in speed and 2% in accuracy
- vs ‘convolution based’ (ConvNeXt-XL Cascade-Mask R-CNN)
- 551% in speed and 0.7% AP in accuracy
- also outperforms DETR, Deformable DETR, YOLOR, YOLOX, Scaled-YOLOv4, etc.
- Contribution
- Design new trainable bag-of-freebies
- Optimized modules and optimization methods that may increase the training cost to improve the accuracy of object detection, but without increasing the inference cost
- a more robust loss function
- a more efficient label assignment method
- a more efficient training method
- Propose a new architecture for real-time object detectors and a corresponding compound model scaling method
Trainable bag-of-freebies
1. Planned re-parameterized convolution
- Model re-parameterization
- merge multiple computational modules into one at inference stage
- model-level ensemble
- train multiple identical models with different training data, and then average the weights of multiple trained models
- weighted average of the weights of a model at different iteration numbers
- module-level re-parameterization
- splits a module into multiple identical or different module branches during training and integrates multiple branched modules into a completely equivalent module during inference
- RepVGG (RepConv)
- training time → multi-branch
- inference time → re-parameterized into a single branch
- performs well on plain VGG-style networks, but not on ResNet or DenseNet
- the identity connection in RepConv destroys the residual in ResNet and the concatenation in DenseNet (it duplicates the existing identity/residual connection)
- so YOLOv7 uses RepConv without the identity connection (RepConvN) to design the architecture of planned re-parameterized convolution (fusion sketch below)
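To make module-level re-parameterization concrete, here is a minimal PyTorch sketch that fuses a 3x3 and a 1x1 training-time branch into one 3x3 inference convolution (RepVGG-style fusion; BN and the identity branch are omitted, consistent with RepConvN, and the layer sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Training-time branches of a RepConv-style block (BN omitted for brevity).
conv3 = nn.Conv2d(16, 16, 3, padding=1)  # 3x3 branch
conv1 = nn.Conv2d(16, 16, 1)             # 1x1 branch

def fuse(conv3, conv1):
    """Merge both branches into a single 3x3 conv for inference."""
    fused = nn.Conv2d(16, 16, 3, padding=1)
    # Zero-pad the 1x1 kernel to 3x3 so the kernels align, then add weights.
    fused.weight.data = conv3.weight.data + F.pad(conv1.weight.data, [1, 1, 1, 1])
    fused.bias.data = conv3.bias.data + conv1.bias.data
    return fused

x = torch.randn(2, 16, 32, 32)
y_train = conv3(x) + conv1(x)    # multi-branch forward (training time)
y_infer = fuse(conv3, conv1)(x)  # single-branch forward (inference time)
assert torch.allclose(y_train, y_infer, atol=1e-5)
```

The same linearity argument is what lets an identity branch be folded in as well; YOLOv7 drops that branch (RepConvN) precisely because residual and concatenation architectures already provide it.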
2. Coarse for auxiliary and fine for lead loss
- Deep supervision
- Add an extra auxiliary head in the middle layers of the network, and use the assistant loss on the shallow network weights as a guide
- Improve the deep supervision
- consider the network prediction results together with the ground truth, then use some calculation and optimization methods to generate a reliable soft label
- Call the mechanism that considers the network prediction results together with the ground truth and then assigns soft labels as “label assigner.”
- (c) Independent assigner
- previous approach
- (d) Lead head guided label assigner
- soft labels are calculated from the lead head's prediction results and the ground truth through an optimization process
- the shallower auxiliary head directly learns the information the lead head has already learned, so the lead head can focus on learning the residual information that has not yet been learned
- (e) Coarse-to-fine lead head guided label assigner
- lead head prediction as guidance to generate coarse-to-fine hierarchical labels, which are used for auxiliary head and lead head learning
- Coarse label: allow more grids to be treated as positive targets → focuses on optimizing the recall of the auxiliary head
- Fine label: the same as the soft label generated by the lead head guided assigner in (d)
- this keeps the optimizable upper bound of the fine label always higher than that of the coarse label
- if the auxiliary head learns the lead-guided soft label, the lead head can indeed extract the residual information from the consistent targets (toy assignment sketch below)
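A toy sketch of the "allow more grids" idea behind the coarse label (this only illustrates the positive-cell count; the actual assigner builds soft labels from the lead head's predictions together with the ground truth, which is omitted here; all numbers are hypothetical):

```python
def positive_cells(gt_cx, gt_cy, stride, radius):
    """Grid cells treated as positives for one GT center (toy version).
    A larger radius admits more cells as positive targets."""
    gx, gy = int(gt_cx / stride), int(gt_cy / stride)
    return [(gx + dx, gy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)]

# Hypothetical GT center at (100, 60) on a stride-8 feature map.
fine   = positive_cells(100, 60, stride=8, radius=0)  # fine label: strict, for the lead head
coarse = positive_cells(100, 60, stride=8, radius=1)  # coarse label: relaxed, boosts aux-head recall
print(len(fine), len(coarse))  # 1 9
```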
Architecture
1. Extended efficient layer aggregation networks
- How to design an efficient network? → By controlling the shortest and longest gradient paths, a deeper network can learn and converge effectively
- CSPVoVNet analyzes the gradient path so that the weights of different layers learn more diverse features
- E-ELAN uses expand, shuffle, merge cardinality to achieve the ability to continuously enhance the learning ability of the network without destroying the original gradient path
- use group convolution to expand the channel and cardinality of computational blocks
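A toy PyTorch rendering of expand, shuffle, merge cardinality under stated assumptions (group sizes and shapes are illustrative; this is not YOLOv7's actual E-ELAN block): group convolution expands channels and cardinality, the shuffled groups are then added back together, so the block's output keeps the original channel count and gradient path.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """Interleave channels across groups (ShuffleNet-style)."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class EELANSketch(nn.Module):
    """Toy expand -> shuffle -> merge-cardinality block."""
    def __init__(self, channels=64, groups=2):
        super().__init__()
        self.groups = groups
        # Expand: group convolution widens both channel count and cardinality.
        self.expand = nn.Conv2d(channels, channels * groups, 3, padding=1, groups=groups)

    def forward(self, x):
        y = self.expand(x)                   # [b, channels*groups, h, w]
        y = channel_shuffle(y, self.groups)  # let the groups exchange information
        # Merge cardinality: add the groups back -> original channel count.
        return sum(torch.chunk(y, self.groups, dim=1))

out = EELANSketch()(torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```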
2. Model scaling for concatenation-based models
- Purpose
- adjust some attributes of the model and generate models of different scales to meet the needs of different inference speeds
- scaling example: EfficientNet scales width, depth, and resolution
- when scaling up or scaling down is performed on depth, the in-degree of a translation layer which is immediately after a concatenation-based computational block will decrease or increase
- for a concatenation-based model, different scaling factors cannot be analyzed separately; they must be considered together
- propose the corresponding compound model scaling method for a concatenation-based model
- When we scale the depth factor of a computational block, we must also calculate the change in that block's output channels, and then scale the width of the transition layers by the same amount
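A toy calculation of that coupling (the helper and its channel arithmetic are hypothetical simplifications; a real concatenation block's output channels depend on its exact topology):

```python
def compound_scale(base_depth, base_width, depth_factor):
    """Toy compound scaling for a concatenation-based block.

    Scaling depth changes how many feature maps get concatenated, so the
    transition layer's input width changes with it and must be scaled by
    the same ratio rather than chosen independently."""
    new_depth = round(base_depth * depth_factor)
    # Concatenated output grows in proportion to the number of stacked layers.
    transition_width = base_width * new_depth
    return new_depth, transition_width

print(compound_scale(base_depth=4, base_width=64, depth_factor=1.5))  # (6, 384)
```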
Experiments
- COCO dataset
- Edge GPU / normal GPU / cloud GPU → YOLOv7-tiny, YOLOv7, YOLOv7-W6
- compound scaling up → YOLOv7-X, YOLOv7-E6, YOLOv7-D6
- E-ELAN + YOLOv7-E6 → YOLOv7-E6E
ETC
- GitHub repo has 6.2k stars and a very active community
- supports export to TensorRT
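One common deployment route is PyTorch → ONNX → TensorRT (e.g., feeding the ONNX file to trtexec). The repository ships its own export tooling; the snippet below is only a generic sketch with a placeholder model:

```python
import torch
import torch.nn as nn

# Placeholder standing in for a loaded YOLOv7 checkpoint.
model = nn.Conv2d(3, 255, 1).eval()
dummy = torch.randn(1, 3, 640, 640)  # standard 640x640 input

# Export to ONNX; TensorRT can then build an engine from "yolov7.onnx",
# e.g. with: trtexec --onnx=yolov7.onnx --saveEngine=yolov7.engine
torch.onnx.export(model, dummy, "yolov7.onnx", opset_version=12,
                  input_names=["images"], output_names=["preds"])
```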