[Review] Unity Perception: Generate Synthetic Data for Computer Vision
2022.04.05 Review
---
https://github.com/Unity-Technologies/com.unity.perception
https://arxiv.org/abs/2107.04259
---
Goal
- To simplify and accelerate the process of generating synthetic datasets for computer vision tasks by offering an easy-to-use and highly customizable toolset
Contributions
- Extends the Unity Editor and engine components to generate perfectly annotated examples
- Offers an extensible Randomization framework that lets the user quickly construct and configure randomized simulation parameters to introduce variation into the generated datasets
- Provides an overview of the included tools and how they work, and demonstrates the value of the generated synthetic datasets by training a 2D object detection model
Advantages of synthetic data
- The computer vision community has shown interest in more complex tasks such as object detection, semantic segmentation, and instance segmentation
- These more complex tasks require increasingly complex models, datasets, and labels.
- Challenges of large and complex datasets for deep learning
- cost
- the cost of annotating each example increases from labeling frames to labeling objects and even pixels in the image
- workflows and tools become more complex
- creates a need to review or audit annotations, leading to additional costs for each labeled example
- bias
- the requirements of data collection become more challenging; some scenarios may rarely occur in the real world, yet correctly handling these events is crucial
- example: misplaced obstacles on the road need to be detected by autonomous vehicles
- privacy
- privacy concerns become increasingly important, further complicating data collection
- Advantages of using a rendering engine to generate synthetic data
- requires compute time rather than human time to generate examples
- has perfect information about the scenes it renders, making it possible to bypass the time and cost of human annotations and review
- makes it possible to generate rare examples
- does not rely on any individual’s private data by design.
- the environments are reusable
- Faster iterations on the generated datasets and the computer vision model
The Unity Perception Package: Details
- The Unity Perception package extends the Unity Editor with tools for generating synthetic datasets that include ground truth annotations
- The package supports domain randomization for introducing variety into the generated datasets
- can generate millions of annotated images relatively quickly, without the need for powerful local computing resources
1. Ground Truth Generation
- The package includes a set of Labelers, which capture ground-truth information along with each captured frame
- The user manually annotates the project’s library of assets with semantic labels such as “chair” and “motorcycle” to provide the Labelers with the data required to capture datasets that match the target task
- Each Labeler uses a label-to-ID mapping (a label configuration) to associate these semantic labels with the IDs written to the output dataset (see the sketch below)
- During simulation, the Labelers compute ground truth based on the state of the 3D scene and the rendering results, through a custom rendering pipeline
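To make the label-to-ID idea concrete, here is a minimal Python sketch of what a label configuration amounts to. All names here are hypothetical; the actual package implements this in C# inside the Unity Editor.

```python
# Minimal sketch of a label-to-ID configuration, analogous to the label
# configs used by Unity Perception's Labelers. Names are hypothetical.

ID_LABEL_CONFIG = {
    "chair": 1,
    "motorcycle": 2,
    # ... one entry per semantic label used in the project
}

def annotate_objects(objects):
    """Attach the configured integer ID to each labeled scene object.

    `objects` is a list of dicts like {"name": ..., "label": ...}.
    Objects whose label is absent from the config produce no ground
    truth, mirroring how unlabeled assets are ignored by the Labelers.
    """
    annotations = []
    for obj in objects:
        label_id = ID_LABEL_CONFIG.get(obj["label"])
        if label_id is not None:
            annotations.append({"instance": obj["name"], "label_id": label_id})
    return annotations

print(annotate_objects([{"name": "chair_01", "label": "chair"},
                        {"name": "plant_01", "label": "plant"}]))
# -> [{'instance': 'chair_01', 'label_id': 1}]
```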
2. Randomization tools
- provides a randomization framework that simplifies introducing variation into synthetic environments, leading to varied data
- The Scenario controls and coordinates all randomizations in the scene
- Involves triggering a set of Randomizers in a predetermined order
- Each Scenario’s execution is called an Iteration and each Iteration can run for a user-defined number of frames
- Users can configure Randomizers to act at various timestamps during each Iteration, including at the start and end or per each frame
- Randomizers expose the environment parameters to be randomized and use Samplers to pick random values for these parameters
- Users can create new Randomizers by extending the base Randomizer class to control various parameters in their environments (a conceptual sketch follows this list)
- Several sample Randomizers are provided to assist with common randomization tasks (e.g. random object placement, position, rotation, texture, and hue)
- Sampler distributions are configurable (including normal and uniform), and users can also provide custom distribution curves by drawing them graphically
- Built-in support for distributed data generation
- can scale randomized simulations to the cloud by launching a Unity Simulation run directly from the Unity Editor
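To make the Scenario/Iteration/Randomizer relationship concrete, here is a conceptual Python sketch of the control flow described above. All class and method names are hypothetical; Unity Perception implements this pattern in C#.

```python
import random

class Randomizer:
    """Base class; subclasses override hooks that the Scenario triggers."""
    def on_iteration_start(self, rng): pass  # e.g. place and pose objects
    def on_frame_start(self, rng): pass      # e.g. per-frame lighting jitter
    def on_iteration_end(self): pass

class RotationRandomizer(Randomizer):
    """Samples a uniformly random rotation for a target object each Iteration."""
    def __init__(self, target):
        self.target = target
    def on_iteration_start(self, rng):
        self.target["rotation"] = [rng.uniform(0, 360) for _ in range(3)]

class Scenario:
    """Controls and coordinates all Randomizers, triggering them in order."""
    def __init__(self, randomizers, iterations, frames_per_iteration, seed=0):
        self.randomizers = randomizers
        self.iterations = iterations
        self.frames_per_iteration = frames_per_iteration
        self.rng = random.Random(seed)

    def run(self):
        for _ in range(self.iterations):
            for r in self.randomizers:             # predetermined order
                r.on_iteration_start(self.rng)
            for _ in range(self.frames_per_iteration):
                for r in self.randomizers:
                    r.on_frame_start(self.rng)
                # ... here the engine would render the frame and the
                # Labelers would capture ground truth for it
            for r in self.randomizers:
                r.on_iteration_end()

cube = {"rotation": None}
Scenario([RotationRandomizer(cube)], iterations=3, frames_per_iteration=1).run()
print(cube["rotation"])
```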
3. Dataset Insights
- The authors provide an accompanying Python package, Dataset Insights
- it supports generating and visualizing dataset statistics and performing model training tasks
- it includes dataset IO modules that allow users to parse, load, and transform datasets in memory (a parsing sketch follows this list)
- statistics cover elements such as total and per frame object count, visible pixels per object, and frame by frame visualization of the captured ground truth
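As a rough illustration of what such statistics involve, the sketch below parses the captures JSON that the Perception package writes and counts labeled objects per class. The exact file layout and field names vary across package versions, so treat them as illustrative assumptions rather than the package's documented schema.

```python
import glob
import json
from collections import Counter

def object_counts(dataset_root):
    """Count labeled objects per class across all captures files.

    Assumes files named captures_*.json, each holding a "captures" list
    whose entries carry "annotations" with per-object "values".
    """
    counts = Counter()
    for path in sorted(glob.glob(f"{dataset_root}/captures_*.json")):
        with open(path) as f:
            captures = json.load(f)["captures"]
        for capture in captures:
            for annotation in capture.get("annotations", []):
                for value in annotation.get("values", []):
                    counts[value.get("label_name", "unknown")] += 1
    return counts

if __name__ == "__main__":
    print(object_counts("Dataset").most_common(10))
```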
Example Project
- Built the SynthDet dataset
- https://github.com/Unity-Technologies/SynthDet
- set of 63 common grocery objects
- 3D models
- created from 3D scans of the actual grocery objects
- Real-world dataset
- collected using the same products by taking numerous pictures of them in various formations and locations
- Randomizations
- Grocery (foreground) objects: A randomly selected subset of the 63 grocery objects is instantiated and randomly positioned in front of the camera per frame
- randomized density of these objects
- randomized scale of these objects
- uniformly random rotation (a placement sketch follows this list)
- Background objects
- A group of primitive 3D objects are randomly placed close to each other, creating a “wall”
- have a random texture, chosen from a set of 530 varied images of fruits and vegetables
- Randomized: rotation and color
- Occluding objects
- foreground occluding objects are placed randomly at a distance closer to the camera
- same primitive objects used for the background, but placed farther apart
- texture, hue, and rotation randomization
- Lighting
- A combination of four directional lights
- randomized intensity and color
- one has randomized rotation as well
- Three of the lights affect all objects, while one significantly brighter light only affects the background objects. This light is switched on with a small probability, causing the background to become overexposed in some frames and creating more visual separation between it and the grocery objects
- Camera post-processing
- The contrast and saturation of the output are randomized by small percentages
- in some frames, a small amount of blur is applied to the camera to simulate real test images
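As a rough sketch of the foreground randomization described above (names and value ranges are illustrative assumptions, not SynthDet's actual C# randomizers), each Iteration samples a subset of the grocery objects and a random pose for each:

```python
import random

GROCERY_OBJECTS = [f"grocery_{i:02d}" for i in range(63)]  # 63 scanned items

def randomize_foreground(rng, max_objects=20):
    """Pick a random subset of objects and sample a pose for each.

    Positions lie in a plane in front of the camera; scale and rotation
    are drawn uniformly. All ranges are illustrative assumptions.
    """
    placed = []
    for name in rng.sample(GROCERY_OBJECTS, k=rng.randint(1, max_objects)):
        placed.append({
            "name": name,
            "position": (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(3, 5)),
            "scale": rng.uniform(0.5, 1.5),
            "rotation": tuple(rng.uniform(0, 360) for _ in range(3)),
        })
    return placed

frame_objects = randomize_foreground(random.Random(42))
print(len(frame_objects), frame_objects[0])
```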
- Results
- Dataset
- randomized synthetic dataset: 400,000 images
- real-world dataset: 1,267 images covering the 63 target classes
- split 60% train / 20% validation / 20% test
- Training sets for model fine-tuning
- subsets of the 760-image real training split (1,267 × 60% ≈ 760): 76 / 380 / 760 images
- Architecture
- Backbone: ResNet-50 (pretrained on ImageNet)
- Detector: Faster R-CNN
- Training strategies
- 1. Train with only the real-world dataset
- 2. Train with the synthetic dataset
- 3. Strategy 2, then fine-tune with the real-world training set (a fine-tuning sketch follows this list)
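For reference, here is a minimal PyTorch/torchvision sketch of strategy 3: a Faster R-CNN detector with an ImageNet-pretrained ResNet-50 backbone, with its box head resized for the grocery classes. The data loading and hyperparameters are placeholders, not the paper's exact recipe.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 63 + 1  # 63 grocery classes plus background

# Faster R-CNN with a ResNet-50 FPN backbone; swap the box predictor so the
# head matches our class count. (torchvision's FPN variant is an assumption;
# the paper only specifies Faster R-CNN with a ResNet-50 backbone.)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_one_epoch(model, optimizer, data_loader, device="cpu"):
    """One pass over the (synthetic or real) detection dataset."""
    model.train()
    for images, targets in data_loader:  # targets: dicts with "boxes", "labels"
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)  # detection losses in train mode
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```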
- Results
- the number of false positives and false negatives dropped significantly
- with more synthetic data, model performance improves
- bounding-box localization also improves
Future work
- Extensible sensor framework
- make it easier to add new passive and active sensor types to support environments that rely on radar or lidar sensors, such as robotics and autonomous vehicles
- Improved Labelers
- e.g. labeling sequences of frames for object tracking
For us
- Synthetic data is important for our work
- We need to find randomization variables that work for us
- We may need to divide objects into foreground / background / occluding groups
- more