We apply a pruning algorithm to RTMPose models: we prune a larger RTMPose model down to the size of a smaller one, e.g. pruning RTMPose-S to the size of RTMPose-T. Our experiments show that the pruned model achieves better performance (AP) than the original RTMPose model of similar size and inference speed.
Concretely, we select RTMPose-S as the base model and prune it to the size of RTMPose-T, using the GroupFisher pruning algorithm, which determines the pruning structure automatically. We provide two versions of the pruned model: one trained on COCO only, and one trained on both COCO and AI Challenger.
| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | Flops (G) | Params (M) | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rtmpose-s-pruned | 256x192 | 0.691 | 0.885 | 0.765 | 0.745 | 0.925 | 0.34 | 3.42 | pruned \| finetuned | log |
| rtmpose-s-aic-coco-pruned | 256x192 | 0.694 | 0.884 | 0.771 | 0.747 | 0.922 | 0.35 | 3.43 | pruned \| finetuned | log |
Applying GroupFisher to your model involves three steps: Prune, Finetune, and Deploy.
Note: please use torch>=1.12, as we need the fx tracer to parse models automatically.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_train.sh \
{config_folder}/group_fisher_{normalization_type}_prune_{model_name}.py 8 \
--work-dir $WORK_DIR
In the pruning config file, you have to fill in the args below.
"""
_base_ (str): The path to the config file of your pretrained model.
pretrained_path (str): The path to your pretrained model checkpoint.
interval (int): The interval (in iterations) between pruning two channels. You
    should ensure you can reach your target pruning ratio before the training
    ends.
normalization_type (str): GroupFisher uses two methods to normalize the channel
    importance: ['flops', 'act']. The former uses FLOPs, while the latter uses
    the memory occupation of activation feature maps.
lr_ratio (float): The ratio by which to decrease the learning rate. As the
    pruning process is unstable, you need to decrease the original learning
    rate until the pruning training runs steadily without getting NaN.
target_flop_ratio (float): The target flop ratio to prune your model.
input_shape (Tuple): The input shape used to measure the FLOPs.
"""
After the pruning process, you will get a checkpoint of the pruned model named flops_{target_flop_ratio}.pth in your workdir.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_train.sh \
{config_folder}/group_fisher_{normalization_type}_finetune_{model_name}.py 8 \
--work-dir $WORK_DIR
There are also some args to fill in the finetuning config file, as below.
"""
_base_ (str): The path to your pruning config file.
pruned_path (str): The path to the checkpoint of the pruned model.
finetune_lr (float): The learning rate for finetuning. Usually, we directly use
    the learning rate of the pretraining stage.
"""
After finetuning, besides a checkpoint of the best model, there is also a fix_subnet.json, which records the structure of the pruned model. It will be used when deploying.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_test.sh \
{config_folder}/group_fisher_{normalization_type}_finetune_{model_name}.py {checkpoint_path} 8
For a pruned model, you only need to use the pruning deploy config instead of the pretrain config to deploy the pruned version of your model. If you are not familiar with MMDeploy, please refer to the MMDeploy documentation.
python {mmdeploy}/tools/deploy.py \
{mmdeploy}/{mmdeploy_config}.py \
{config_folder}/group_fisher_{normalization_type}_deploy_{model_name}.py \
{path_to_finetuned_checkpoint}.pth \
{mmdeploy}/tests/data/tiger.jpeg
The deploy config has some args as below:
"""
_base_ (str): The path to your pretrain config file.
fix_subnet (Union[dict, str]): The dict that stores the pruning structure, or
    the path to the json file containing it.
divisor (int): The divisor to make the channel numbers divisible.
"""
The divisor is important for the actual inference speed; we suggest trying values in [1, 2, 4, 8, 16, 32] to find the fastest one.
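To see why the divisor matters, the toy snippet below shows how a pruned channel count gets rounded up to a multiple of the divisor; this is a simplified illustration, not MMRazor's exact rounding logic. Larger divisors yield more hardware-friendly channel numbers but deviate further from the learned pruning structure, so the fastest setting has to be found empirically.

```python
# Toy illustration of channel rounding (simplified; not MMRazor's exact implementation).
def round_up_channels(channels: int, divisor: int) -> int:
    """Round a pruned channel count up to the nearest multiple of `divisor`."""
    return ((channels + divisor - 1) // divisor) * divisor

for divisor in [1, 2, 4, 8, 16, 32]:
    print(f'divisor={divisor:2d} -> channels={round_up_channels(37, divisor)}')
```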