Masked-attention Mask Transformer for Universal Image Segmentation
Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).
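
As a rough illustration of the masked attention described above, here is a minimal single-head sketch in plain Python. It assumes the binarized mask predicted by the previous decoder layer is already available as a boolean matrix (`mask[i][j]` is True when query `i`'s mask covers pixel `j`). The function name `masked_attention` is hypothetical and is not part of the mmdet API. Pixels outside a query's mask get a logit of -inf, so they receive zero weight after the softmax. Learned projections, multiple heads and the residual connection are omitted. A query with an empty mask falls back to attending to every pixel, which guards against a softmax over a row that is entirely -inf.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention(q: Matrix, k: Matrix, v: Matrix,
                     mask: List[List[bool]]) -> Matrix:
    """Single-head masked cross-attention (hypothetical helper).

    q: N query vectors of size d; k, v: HW pixel keys / values.
    mask[i][j] is True if query i's predicted mask covers pixel j.
    """
    scale = 1.0 / math.sqrt(len(q[0]))  # standard scaled dot-product
    out: Matrix = []
    for qi, row in zip(q, mask):
        # Empty mask: fall back to attending to every pixel.
        allowed = row if any(row) else [True] * len(k)
        # Pixels outside the mask get -inf, i.e. zero weight after softmax.
        logits = [
            scale * sum(a * b for a, b in zip(qi, kj)) if ok else -math.inf
            for kj, ok in zip(k, allowed)
        ]
        peak = max(logits)  # finite, since at least one pixel is allowed
        weights = [math.exp(x - peak) for x in logits]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


# One query, three pixels; the mask restricts attention to pixel 1 only.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0], [2.0], [3.0]]
print(masked_attention(q, k, v, [[False, True, False]]))  # [[2.0]]
```

Because the masked pixel is the only one with non-zero weight, the output equals that pixel's value, regardless of how well the query matches the other keys.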
Mask2Former requires the COCO and COCO-panoptic datasets for training and evaluation. Download and extract them into the COCO dataset path so that the directory looks like this:
mmdetection
├── mmdet
├── tools
├── configs
├── data
│ ├── coco
│ │ ├── annotations
│ │ │ ├── instances_train2017.json
│ │ │ ├── instances_val2017.json
│ │ │ ├── panoptic_train2017.json
│ │ │ ├── panoptic_train2017
│ │ │ ├── panoptic_val2017.json
│ │ │ ├── panoptic_val2017
│ │ ├── train2017
│ │ ├── val2017
│ │ ├── test2017
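
Before launching training, it can help to confirm this layout. The sketch below mirrors the tree above, relative to `data/coco`; `EXPECTED_PATHS` and `missing_coco_paths` are hypothetical names, and the check only verifies that each file or folder exists, not that the annotation files are valid.

```python
from pathlib import Path
from typing import List, Union

# Entries from the expected layout, relative to mmdetection/data/coco.
EXPECTED_PATHS = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "annotations/panoptic_train2017.json",
    "annotations/panoptic_train2017",
    "annotations/panoptic_val2017.json",
    "annotations/panoptic_val2017",
    "train2017",
    "val2017",
    "test2017",
]


def missing_coco_paths(coco_root: Union[str, Path]) -> List[str]:
    """Return the expected entries that do not exist under coco_root."""
    root = Path(coco_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths("data/coco")
    print("layout OK" if not missing else f"missing: {missing}")
```

Run it from the `mmdetection` root; an empty list means every entry in the tree is present.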

Panoptic segmentation (COCO):

Backbone | style | Pretrain | Lr schd | Mem (GB) | Inf time (fps) | PQ | box mAP | mask mAP | Config | Download |
---|---|---|---|---|---|---|---|---|---|---|
R-50 | pytorch | ImageNet-1K | 50e | 13.9 | - | 52.0 | 44.5 | 41.8 | config | model / log |
R-101 | pytorch | ImageNet-1K | 50e | 16.1 | - | 52.4 | 45.3 | 42.4 | config | model / log |
Swin-T | - | ImageNet-1K | 50e | 15.9 | - | 53.4 | 46.3 | 43.4 | config | model / log |
Swin-S | - | ImageNet-1K | 50e | 19.1 | - | 54.5 | 47.8 | 44.5 | config | model / log |
Swin-B | - | ImageNet-1K | 50e | 26.0 | - | 55.1 | 48.2 | 44.9 | config | model / log |
Swin-B | - | ImageNet-21K | 50e | 25.8 | - | 56.3 | 50.0 | 46.3 | config | model / log |
Swin-L | - | ImageNet-21K | 100e | 21.1 | - | 57.6 | 52.2 | 48.5 | config | model / log |

Instance segmentation (COCO):

Backbone | style | Pretrain | Lr schd | Mem (GB) | Inf time (fps) | box mAP | mask mAP | Config | Download |
---|---|---|---|---|---|---|---|---|---|
R-50 | pytorch | ImageNet-1K | 50e | 13.7 | - | 45.7 | 42.9 | config | model / log |
R-101 | pytorch | ImageNet-1K | 50e | 15.5 | - | 46.7 | 44.0 | config | model / log |
Swin-T | - | ImageNet-1K | 50e | 15.3 | - | 47.7 | 44.7 | config | model / log |
Swin-S | - | ImageNet-1K | 50e | 18.8 | - | 49.3 | 46.1 | config | model / log |
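
The config files follow the OpenMMLab naming scheme, in which a segment such as `8xb2-lsj-50e` (for example in `mask2former_r50_8xb2-lsj-50e_coco-panoptic.py`) encodes the training setup: 8 GPUs with 2 images per GPU (an effective batch of 16), large-scale jittering (LSJ) augmentation, and a 50-epoch schedule. The sketch below pulls these numbers out of a config name. `parse_train_setting` is a hypothetical helper, and its regex only covers the `<N>xb<M>-lsj-<E>e` pattern used by the Mask2Former configs in this folder.

```python
import re
from typing import NamedTuple


class TrainSetting(NamedTuple):
    gpus: int
    imgs_per_gpu: int
    epochs: int
    total_batch: int  # gpus * imgs_per_gpu


# Matches e.g. "_8xb2-lsj-50e_" or "_16xb1-lsj-100e_".
_PATTERN = re.compile(r"_(\d+)xb(\d+)-lsj-(\d+)e_")


def parse_train_setting(config_name: str) -> TrainSetting:
    """Extract GPUs, images per GPU and epochs from a config file name."""
    m = _PATTERN.search(config_name)
    if m is None:
        raise ValueError(f"no '<N>xb<M>-lsj-<E>e' segment in {config_name!r}")
    gpus, per_gpu, epochs = (int(g) for g in m.groups())
    return TrainSetting(gpus, per_gpu, epochs, gpus * per_gpu)


print(parse_train_setting(
    "mask2former_swin-l-p4-w12-384-in21k_16xb1-lsj-100e_coco-panoptic.py"))
# TrainSetting(gpus=16, imgs_per_gpu=1, epochs=100, total_batch=16)
```

The Swin-L config trades GPUs for images per GPU (16xb1 instead of 8xb2), so its effective batch size stays at 16, while its 100-epoch schedule matches the `100e` entry in the panoptic table.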

Note: the performance of Mask2Former-R50-coco-panoptic may fluctuate by about 0.2 PQ. The models other than Mask2Former-R50-coco-panoptic were trained with mmdet 2.x and have been converted for mmdet 3.x.

@article{cheng2021mask2former,
title={Masked-attention Mask Transformer for Universal Image Segmentation},
author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
journal={arXiv},
year={2021}
}