In this document, we provide a guide to preparing datasets for MMPose. We will cover various aspects of dataset preparation, including using built-in datasets, creating custom datasets, combining datasets for training, and browsing the dataset.
Step 1: Prepare Data
MMPose supports multiple tasks and corresponding datasets. You can find them in the dataset zoo. To properly prepare your data, please follow the guidelines associated with your chosen dataset.
Step 2: Configure Dataset Settings in the Config File
Before training or evaluating models, you must configure the dataset settings. Take td-hm_hrnet-w32_8xb64-210e_coco-256x192.py
for example, which can be used to train or evaluate the HRNet pose estimator on the COCO dataset. We will go through the dataset configuration.
# base dataset settings
dataset_type = 'CocoDataset'
data_mode = 'topdown'
data_root = 'data/coco/'
- dataset_type specifies the class name of the dataset. Users can refer to Datasets APIs to find the class name of their desired dataset.
- data_mode determines the output format of the dataset, with two options available: 'topdown' and 'bottomup'. If data_mode='topdown', the data element represents a single instance with its pose; otherwise, the data element is an entire image containing multiple instances and poses.
- data_root designates the root directory of the dataset.
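These base settings vary by dataset. As an illustration, a hypothetical adaptation of the same fields to the MPII dataset might look like the following (MpiiDataset is the built-in class name; check the official MPII configs in the repo for the authoritative settings):
# hypothetical base settings for the MPII dataset; see the official
# MPII configs for the authoritative values
dataset_type = 'MpiiDataset'
data_mode = 'topdown'
data_root = 'data/mpii/'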
Data Processing Pipelines
# pipelines
train_pipeline = [
dict(type='LoadImage'),
dict(type='GetBBoxCenterScale'),
dict(type='RandomFlip', direction='horizontal'),
dict(type='RandomHalfBody'),
dict(type='RandomBBoxTransform'),
dict(type='TopdownAffine', input_size=codec['input_size']),
dict(type='GenerateTarget', encoder=codec),
dict(type='PackPoseInputs')
]
val_pipeline = [
dict(type='LoadImage'),
dict(type='GetBBoxCenterScale'),
dict(type='TopdownAffine', input_size=codec['input_size']),
dict(type='PackPoseInputs')
]
The train_pipeline
and val_pipeline
define the steps to process data elements during the training and evaluation phases, respectively. In addition to loading images and packing inputs, the train_pipeline
primarily consists of data augmentation techniques and the target generator, while the val_pipeline
focuses on transforming data elements into a unified format.
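Both pipelines refer to codec, which is defined earlier in the same config file. The codec controls how keypoint annotations are encoded into training targets and how model outputs are decoded back into keypoints. In the example config it is defined as follows (copied here for reference; the config file itself is authoritative):
# codec settings
codec = dict(
    type='MSRAHeatmap', input_size=(192, 256), heatmap_size=(48, 64), sigma=2)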
# data loaders
train_dataloader = dict(
batch_size=64,
num_workers=2,
persistent_workers=True,
sampler=dict(type='DefaultSampler', shuffle=True),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_mode=data_mode,
ann_file='annotations/person_keypoints_train2017.json',
data_prefix=dict(img='train2017/'),
pipeline=train_pipeline,
))
val_dataloader = dict(
batch_size=32,
num_workers=2,
persistent_workers=True,
drop_last=False,
sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
dataset=dict(
type=dataset_type,
data_root=data_root,
data_mode=data_mode,
ann_file='annotations/person_keypoints_val2017.json',
bbox_file='data/coco/person_detection_results/'
'COCO_val2017_detections_AP_H_56_person.json',
data_prefix=dict(img='val2017/'),
test_mode=True,
pipeline=val_pipeline,
))
test_dataloader = val_dataloader
This section is crucial for configuring the dataset in the config file. In addition to the basic dataset arguments and pipelines discussed earlier, other important parameters are defined here. The batch_size determines the batch size per GPU; the ann_file specifies the annotation file of the dataset (relative to data_root); and data_prefix specifies the image folder. The bbox_file, which supplies detected bounding box information, is only used in the val/test data loader for top-down datasets.
We recommend copying the dataset configuration from provided config files that use the same dataset, rather than writing it from scratch, in order to minimize potential errors. By doing so, users can simply make the necessary modifications as needed, ensuring a more reliable and efficient setup process.
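To sanity-check a dataset configuration before launching a full training run, the dataset can be built directly from the config. Below is a minimal sketch, assuming MMPose 1.x with MMEngine installed and the COCO data prepared as described in Step 1:
from mmengine.config import Config
from mmengine.registry import init_default_scope

from mmpose.registry import DATASETS

# load the example config and switch the default registry scope to mmpose,
# so that unscoped type names like 'CocoDataset' resolve correctly
cfg = Config.fromfile(
    'configs/body_2d_keypoint/topdown_heatmap/coco/'
    'td-hm_hrnet-w32_8xb64-210e_coco-256x192.py')
init_default_scope('mmpose')

# build the training dataset and inspect one processed data element
dataset = DATASETS.build(cfg.train_dataloader.dataset)
print('number of samples:', len(dataset))
sample = dataset[0]  # a dict produced by PackPoseInputs
print(sample.keys())  # expected: dict_keys(['inputs', 'data_samples'])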
Use Custom Datasets
The Customize Datasets guide provides detailed information on how to build a custom dataset. In this section, we will highlight some key tips for using and configuring custom datasets.
Determine the dataset class name. If you reorganize your dataset into the COCO format, you can simply use CocoDataset
as the value for dataset_type
. Otherwise, you will need to use the name of the custom dataset class you added.
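If a custom class is needed, it must be registered so that its name can be used as dataset_type. Below is a minimal sketch following the pattern in the Customize Datasets guide, where MyCustomDataset is a placeholder name:
from mmpose.datasets.datasets.base import BaseCocoStyleDataset
from mmpose.registry import DATASETS


@DATASETS.register_module()
class MyCustomDataset(BaseCocoStyleDataset):
    """A placeholder dataset class reusing the COCO-style loading logic."""

    # dataset meta information, see the next tip
    METAINFO: dict = dict(from_file='configs/_base_/datasets/custom.py')
Once the class is registered and importable, dataset_type = 'MyCustomDataset' can be used in the dataloaders.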
Specify the meta information config file. MMPose 1.x employs a different strategy for specifying meta information compared to MMPose 0.x. In MMPose 1.x, users can specify the meta information config file as follows:
train_dataloader = dict(
...
dataset=dict(
type=dataset_type,
data_root='root/of/your/train/data',
ann_file='path/to/your/train/json',
data_prefix=dict(img='path/to/your/train/img'),
# specify dataset meta information
metainfo=dict(from_file='configs/_base_/datasets/custom.py'),
...),
)
Note that the argument metainfo
must be specified in the val/test data loaders as well.
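The meta information file referenced above defines a dataset_info dict. Below is an illustrative skeleton for a hypothetical two-keypoint dataset; the field names follow the built-in files under configs/_base_/datasets/, while all values here are placeholders:
# configs/_base_/datasets/custom.py -- illustrative placeholder content
dataset_info = dict(
    dataset_name='custom',
    paper_info=dict(),
    # per-keypoint definitions: name, visualization color, body part type
    # ('upper'/'lower'), and the name of the horizontally-swapped keypoint
    keypoint_info={
        0: dict(name='head', id=0, color=[51, 153, 255], type='upper', swap=''),
        1: dict(name='tail', id=1, color=[255, 128, 0], type='lower', swap=''),
    },
    # skeleton links drawn between keypoints during visualization
    skeleton_info={
        0: dict(link=('head', 'tail'), id=0, color=[0, 255, 0]),
    },
    # per-keypoint loss weights and OKS sigmas used in evaluation
    joint_weights=[1.0, 1.0],
    sigmas=[0.025, 0.025],
)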
Use Mixed Datasets for Training
MMPose offers a convenient and versatile solution for training with mixed datasets. Please refer to Use Mixed Datasets for Training.
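As a condensed illustration of the approach described in that guide, sub-datasets are wrapped in a CombinedDataset, and a KeypointConverter in each sub-dataset's pipeline maps its keypoints into a shared order, e.g. mixing COCO and AIC data (adapted from the guide; consult it for the complete, verified settings):
# sub-dataset configs; leave their pipelines empty except for converters,
# since the shared train_pipeline is set on the CombinedDataset
dataset_coco = dict(
    type='CocoDataset',
    data_root='data/coco/',
    ann_file='annotations/person_keypoints_train2017.json',
    data_prefix=dict(img='train2017/'),
    pipeline=[],
)
dataset_aic = dict(
    type='AicDataset',
    data_root='data/aic/',
    ann_file='annotations/aic_train.json',
    data_prefix=dict(img='ai_challenger_keypoint_train_20170902/'
                     'keypoint_train_images_20170902/'),
    pipeline=[
        # map AIC keypoint indices onto the COCO keypoint order
        dict(
            type='KeypointConverter',
            num_keypoints=17,
            mapping=[(0, 6), (1, 8), (2, 10), (3, 5), (4, 7), (5, 9),
                     (6, 12), (7, 14), (8, 16), (9, 11), (10, 13), (11, 15)])
    ],
)
train_dataloader = dict(
    ...,
    dataset=dict(
        type='CombinedDataset',
        # use the COCO meta information for the combined dataset
        metainfo=dict(from_file='configs/_base_/datasets/coco.py'),
        datasets=[dataset_coco, dataset_aic],
        pipeline=train_pipeline,
    ),
)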
Browse Dataset
tools/misc/browse_dataset.py helps users visually browse a pose dataset, or save the visualized images to a designated directory.
python tools/misc/browse_dataset.py ${CONFIG} [-h] [--output-dir ${OUTPUT_DIR}] [--not-show] [--phase ${PHASE}] [--mode ${MODE}] [--show-interval ${SHOW_INTERVAL}]
ARGS | Description
---|---
CONFIG | The path to the config file.
--output-dir OUTPUT_DIR | The target folder to save visualization results. If not specified, the visualization results will not be saved.
--not-show | Do not show the visualization results in an external window.
--phase {train, val, test} | The dataset split to visualize.
--mode {original, transformed} | Specify the type of visualized images. original means to show images without pre-processing; transformed means to show images after pre-processing.
--show-interval SHOW_INTERVAL | Time interval between visualizing two images.
For instance, users who want to visualize images and annotations in the COCO dataset can use:
python tools/misc/browse_dataset.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py --mode original
The bounding boxes and keypoints will be plotted on the original image.
The original images need to be processed before being fed into models. To visualize pre-processed images and annotations, users need to modify the argument mode
to transformed
. For example:
python tools/misc/browse_dataset.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py --mode transformed
The heatmap target will be visualized together if it is generated in the pipeline.
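For example, to save the transformed samples to a folder without opening a display window, the options above can be combined (vis_results is an arbitrary output directory):
python tools/misc/browse_dataset.py configs/body_2d_keypoint/topdown_heatmap/coco/td-hm_hrnet-w32_8xb64-210e_coco-256x192.py --mode transformed --output-dir vis_results --not-show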