Detecting Twenty-thousand Classes using Image-level Supervision

Description

Detic: A Detector with image classes that can use image-level labels to easily train detectors.

Detecting Twenty-thousand Classes using Image-level Supervision, Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra, ECCV 2022 (arXiv 2201.02605)

Usage

Installation

Detic requires CLIP to be installed.

pip install git+https://github.com/openai/CLIP.git

Demo

Inference with existing dataset vocabulary embeddings

First, go to the Detic project folder.

cd projects/Detic

Then, download the CLIP embeddings pre-computed from the dataset metainfo into the datasets/metadata folder. These embeddings are loaded into the zero-shot classifier during inference. For example, you can download the LVIS class name embeddings with the following command:

wget -P datasets/metadata https://raw.githubusercontent.com/facebookresearch/Detic/main/datasets/metadata/lvis_v1_clip_a%2Bcname.npy
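As a quick sanity check after the download, the embedding file can be inspected with NumPy. The following is a minimal sketch, assuming the .npy file stores a 2-D floating-point array with one row per class name. The helper name `load_class_embeddings` and the row-wise L2 normalization are illustrative assumptions, not part of the Detic code.

```python
import numpy as np


def load_class_embeddings(path):
    """Load a 2-D embedding array from a .npy file and L2-normalize each row.

    Hypothetical helper for inspection only; it assumes one row per class.
    """
    emb = np.load(path).astype(np.float32)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    # Guard against division by zero for all-zero rows.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.clip(norms, 1e-12, None)
```

Calling this helper with the path of the downloaded file and printing the `.shape` of the result shows how many classes and embedding dimensions the file contains.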

You can run the demo like this:

python demo.py \
  ${IMAGE_PATH} \
  ${CONFIG_PATH} \
  ${MODEL_PATH} \
  --show \
  --score-thr 0.5 \
  --dataset lvis
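The effect of --score-thr can be pictured as a simple filter over the predicted detections. The sketch below is a toy illustration, assuming the threshold discards every detection whose confidence is not above it. The `filter_by_score` helper, the dictionary layout, and the strict comparison are assumptions made for this example, not the demo's actual code.

```python
def filter_by_score(detections, score_thr=0.5):
    """Keep only detections whose score exceeds score_thr.

    Each detection is assumed to be a dict with 'label' and 'score' keys
    (a hypothetical layout chosen for this example).
    """
    return [d for d in detections if d["score"] > score_thr]


# Toy predictions; the labels and scores are made up for illustration.
example = [
    {"label": "cup", "score": 0.91},
    {"label": "spoon", "score": 0.42},
    {"label": "table", "score": 0.67},
]
kept = filter_by_score(example, score_thr=0.5)
```

A lower threshold keeps more detections but lets through more false positives.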

Inference with custom vocabularies

Detic can detect any class given its class name by using CLIP.

You can detect custom classes with the --class-name option:

python demo.py \
  ${IMAGE_PATH} \
  ${CONFIG_PATH} \
  ${MODEL_PATH} \
  --show \
  --score-thr 0.3 \
  --class-name headphone webcam paper coffe

Note that headphone, paper and coffe (typo intended) are not LVIS classes. Despite the misspelled class name, Detic can produce a reasonable detection for coffe.
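The idea behind custom vocabularies can be sketched as follows: each class name is mapped to a text embedding, and a region is assigned the class whose embedding is most similar to the region's feature. The sketch below uses small hand-written NumPy vectors in place of real CLIP embeddings. The function names, the shapes, and the use of plain cosine similarity are illustrative assumptions rather than Detic's actual implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def classify_regions(region_feats, class_embs):
    """Assign each region the class with the highest cosine similarity.

    region_feats: (num_regions, dim) region features (stand-ins here).
    class_embs:   (num_classes, dim) text embeddings of the class names.
    Returns (labels, similarities), one entry per region.
    """
    sims = l2_normalize(region_feats) @ l2_normalize(class_embs).T
    return sims.argmax(axis=1), sims.max(axis=1)


# Stand-in vectors; real embeddings would come from a text encoder.
class_names = ["headphone", "webcam"]
class_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
region_feats = np.array([[2.0, 0.1], [0.1, 3.0]])
labels, sims = classify_regions(region_feats, class_embs)
predicted = [class_names[i] for i in labels]
```

Because the classifier only depends on the text embeddings, changing the vocabulary amounts to swapping the rows of `class_embs`, with no retraining of the detector in this simplified picture.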

Results

Here we only provide the Detic Swin-B model for the open-vocabulary demo. Multi-dataset training and open-vocabulary testing will be supported in the future.

To find more variants, please visit the official model zoo.

Backbone	Training data	Config	Download
Swin-B	ImageNet-21K & LVIS & COCO	config	model

Citation

If you find Detic useful in your research or applications, please consider giving a star 🌟 to the official repository and citing Detic with the following BibTeX entry.

@inproceedings{zhou2022detecting,
  title={Detecting Twenty-thousand Classes using Image-level Supervision},
  author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
  booktitle={ECCV},
  year={2022}
}

Checklist

[x] Milestone 1: PR-ready, and acceptable to be one of the projects/.
- Finish the code
- Basic docstrings & proper citation
- Test-time correctness
- A full README
[ ] Milestone 2: Indicates a successful model implementation.
- Training-time correctness
[ ] Milestone 3: Good to be a part of our core package!
- Type hints and docstrings
- Unit tests
- Code polishing
- Metafile.yml
[ ] Move your modules into the core package following the codebase's file hierarchy structure.

[ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
