
Corruption Benchmarking

Introduction

We provide tools to test object detection and instance segmentation models on the image corruption benchmark defined in "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming". This page provides basic tutorials on how to use the benchmark.

@article{michaelis2019winter,
  title={Benchmarking Robustness in Object Detection:
    Autonomous Driving when Winter is Coming},
  author={Michaelis, Claudio and Mitzkus, Benjamin and
    Geirhos, Robert and Rusak, Evgenia and
    Bringmann, Oliver and Ecker, Alexander S. and
    Bethge, Matthias and Brendel, Wieland},
  journal={arXiv:1907.07484},
  year={2019}
}

[Figure: image corruption example]

About the benchmark

To submit results to the benchmark, please visit the benchmark homepage.

The benchmark is modelled after the ImageNet-C benchmark, which was originally published in "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations" (ICLR 2019) by Dan Hendrycks and Thomas Dietterich.

The image corruption functions are included in this library, but they can also be installed separately using:

pip install imagecorruptions

Compared to ImageNet-C, a few changes were needed to handle images of arbitrary size and greyscale images. We also modified the 'motion blur' and 'snow' corruptions to remove the dependency on a Linux-specific library that would otherwise have to be installed separately. For details, please refer to the imagecorruptions repository.

Inference with pretrained models

We provide a testing script to evaluate a model's performance on any combination of the corruptions provided in the benchmark.

Test a dataset

  • single GPU testing
  • multiple GPU testing
  • visualize detection results

You can use the following command to test a model's performance under the 15 corruptions used in the benchmark.

# single-gpu testing
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}]

Alternatively, a different group of corruptions can be selected.

# noise
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --corruptions noise

# blur
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --corruptions blur

# weather
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --corruptions weather

# digital
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --corruptions digital
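Together, the four groups make up the 15 benchmark corruptions. The sketch below expands group names into individual corruption names. The membership of each group is an assumption based on the imagecorruptions package (which follows ImageNet-C) and is not read from test_robustness.py; `CORRUPTION_GROUPS` and `expand_groups` are hypothetical names used only for illustration.

```python
# Hypothetical mapping: group names match the --corruptions options above,
# members are assumed to follow the imagecorruptions / ImageNet-C grouping.
CORRUPTION_GROUPS = {
    "noise": ["gaussian_noise", "shot_noise", "impulse_noise"],
    "blur": ["defocus_blur", "glass_blur", "motion_blur", "zoom_blur"],
    "weather": ["snow", "frost", "fog", "brightness"],
    "digital": ["contrast", "elastic_transform", "pixelate", "jpeg_compression"],
}


def expand_groups(*groups):
    """Return the individual corruption names of the given groups, in order."""
    names = []
    for group in groups:
        if group not in CORRUPTION_GROUPS:
            raise ValueError(f"unknown corruption group: {group!r}")
        names.extend(CORRUPTION_GROUPS[group])
    return names


# All four groups together give the 15 benchmark corruptions.
ALL_BENCHMARK = expand_groups("noise", "blur", "weather", "digital")
assert len(ALL_BENCHMARK) == 15
```

A list built this way, for example `expand_groups("noise") + ["snow"]`, could then be passed to `--corruptions` as individual names.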

Or a custom set of corruptions, e.g.:

# gaussian noise, zoom blur and snow
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions gaussian_noise zoom_blur snow

Finally, the corruption severities to evaluate can be chosen. Severity 0 corresponds to clean data, and the strength of the corruption increases from 1 to 5.

# severity 1
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --severities 1

# severities 0,2,4
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --severities 0 2 4
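When sweeping several combinations of corruptions and severities from a Python script, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the arguments shown above; `build_robustness_cmd` is a hypothetical helper, and the config, checkpoint and output file names are placeholders.

```python
import shlex

SCRIPT = "tools/analysis_tools/test_robustness.py"


def build_robustness_cmd(config, checkpoint, corruptions=None, severities=None, out=None):
    """Return the test_robustness.py invocation as a list of arguments.

    Hypothetical helper: it only uses the arguments documented on this page
    (positional config and checkpoint, --out, --corruptions, --severities).
    """
    cmd = ["python", SCRIPT, config, checkpoint]
    if out is not None:
        cmd += ["--out", out]
    if corruptions:
        # Group names (noise, blur, ...) or individual corruption names.
        cmd += ["--corruptions", *corruptions]
    if severities:
        # Severity 0 is clean data; 1 to 5 are increasingly strong corruptions.
        for severity in severities:
            if severity not in range(6):
                raise ValueError(f"severity must be between 0 and 5, got {severity}")
        cmd += ["--severities", *(str(s) for s in severities)]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace them with a real config and checkpoint.
    cmd = build_robustness_cmd(
        "my_config.py",
        "my_checkpoint.pth",
        corruptions=["gaussian_noise", "zoom_blur", "snow"],
        severities=[0, 2, 4],
        out="results.pkl",
    )
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with paths that contain spaces.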

Results for modelzoo models

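The results on COCO 2017 val are shown in the table below. "AP corr." is the AP averaged over the 15 benchmark corruptions and severities 1 to 5, and "%" is the corrupted AP relative to the clean AP.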

Model                Backbone             Style    Lr schd  box AP clean  box AP corr.  box %  mask AP clean  mask AP corr.  mask %
Faster R-CNN         R-50-FPN             pytorch  1x       36.3          18.2          50.2   -              -              -
Faster R-CNN         R-101-FPN            pytorch  1x       38.5          20.9          54.2   -              -              -
Faster R-CNN         X-101-32x4d-FPN      pytorch  1x       40.1          22.3          55.5   -              -              -
Faster R-CNN         X-101-64x4d-FPN      pytorch  1x       41.3          23.4          56.6   -              -              -
Faster R-CNN         R-50-FPN-DCN         pytorch  1x       40.0          22.4          56.1   -              -              -
Faster R-CNN         X-101-32x4d-FPN-DCN  pytorch  1x       43.4          26.7          61.6   -              -              -
Mask R-CNN           R-50-FPN             pytorch  1x       37.3          18.7          50.1   34.2           16.8           49.1
Mask R-CNN           R-50-FPN-DCN         pytorch  1x       41.1          23.3          56.7   37.2           20.7           55.7
Cascade R-CNN        R-50-FPN             pytorch  1x       40.4          20.1          49.7   -              -              -
Cascade Mask R-CNN   R-50-FPN             pytorch  1x       41.2          20.7          50.2   35.7           17.6           49.3
RetinaNet            R-50-FPN             pytorch  1x       35.6          17.8          50.1   -              -              -
Hybrid Task Cascade  X-101-64x4d-FPN-DCN  pytorch  1x       50.6          32.7          64.7   43.8           28.1           64.0

Results may vary slightly due to the stochastic application of the corruptions.
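The "corr." and "%" columns can be related to per-corruption results as follows. In the benchmark paper, mean performance under corruption (mPC) is the AP averaged over severities 1 to 5 and then over all corruptions, and relative performance under corruption (rPC) is mPC divided by the clean AP. The sketch below assumes the per-corruption results are available as a nested dict `{corruption: {severity: AP}}`; this is a hypothetical layout chosen for illustration, not the format test_robustness.py writes, and the function names are likewise hypothetical.

```python
from statistics import mean


def mean_performance_under_corruption(results):
    """mPC: AP averaged over severities 1-5, then over corruptions.

    `results` uses a hypothetical layout {corruption: {severity: AP}}.
    Severity 0 (clean data) is skipped because it is not a corrupted result.
    """
    per_corruption = []
    for by_severity in results.values():
        corrupted = [ap for severity, ap in by_severity.items() if severity != 0]
        per_corruption.append(mean(corrupted))
    return mean(per_corruption)


def relative_performance_under_corruption(clean_ap, mpc):
    """rPC in percent: mPC relative to the AP on clean data."""
    return 100.0 * mpc / clean_ap


# Toy numbers (not benchmark results) with only two severities per corruption.
clean_ap = 40.0
toy = {
    "gaussian_noise": {0: clean_ap, 1: 30.0, 2: 20.0},
    "snow": {0: clean_ap, 1: 25.0, 2: 15.0},
}
mpc = mean_performance_under_corruption(toy)  # (25.0 + 20.0) / 2 = 22.5
rpc = relative_performance_under_corruption(clean_ap, mpc)  # 56.25
```

Applied to the rounded table entries for Faster R-CNN with R-50-FPN, `relative_performance_under_corruption(36.3, 18.2)` gives about 50.1 rather than the listed 50.2, presumably because the table was computed from unrounded AP values.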