Detecting Twenty-thousand Classes using Image-level Supervision

Description

Detic: A Detector with image classes that can use image-level labels to easily train detectors.

Detecting Twenty-thousand Classes using Image-level Supervision, Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra, ECCV 2022 (arXiv 2201.02605)

Usage

Installation

Detic requires CLIP to be installed.

pip install git+https://github.com/openai/CLIP.git

Demo

Inference with existing dataset vocabulary embeddings

First, go to the Detic project folder.

cd projects/Detic

Then, download the pre-computed CLIP embeddings of the dataset metainfo (the class names) into the datasets/metadata folder. These embeddings are loaded into the zero-shot classifier during inference. For example, you can download the LVIS class-name embeddings with the following command:

wget -P datasets/metadata https://raw.githubusercontent.com/facebookresearch/Detic/main/datasets/metadata/lvis_v1_clip_a%2Bcname.npy
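
To make the role of these embeddings more concrete, here is a minimal NumPy sketch of how a zero-shot classifier can score region features against class-name embeddings. It is a simplified illustration, not Detic's actual classifier: the function names, the temperature value, and the synthetic data are assumptions, and the downloaded file is assumed to hold one embedding row per class.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row vector to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def zero_shot_logits(region_feats: np.ndarray,
                     class_embeds: np.ndarray,
                     temperature: float = 50.0) -> np.ndarray:
    """Temperature-scaled cosine similarity, shape (num_regions, num_classes).

    `temperature` is an assumed value chosen for illustration.
    """
    return temperature * l2_normalize(region_feats) @ l2_normalize(class_embeds).T


# Synthetic stand-ins so the sketch runs without the downloaded file.
# With the real file one would instead load it, for example:
#   class_embeds = np.load("datasets/metadata/lvis_v1_clip_a+cname.npy")
rng = np.random.default_rng(0)
class_embeds = rng.normal(size=(5, 8))  # 5 classes, 8-dim embeddings
# Two "region features" that lie very close to the embeddings of classes 2 and 4.
region_feats = class_embeds[[2, 4]] + 0.01 * rng.normal(size=(2, 8))

logits = zero_shot_logits(region_feats, class_embeds)
predicted = logits.argmax(axis=1)  # index of the best-matching class per region
print(predicted)
```

Because the class embeddings are the only class-specific weights in this scheme, swapping in a different embedding file changes the vocabulary without retraining the rest of the detector.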

You can then run the demo as follows:

python demo.py \
  ${IMAGE_PATH} \
  ${CONFIG_PATH} \
  ${MODEL_PATH} \
  --show \
  --score-thr 0.5 \
  --dataset lvis
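
If you prefer to launch the demo from a Python script, the same command line can be assembled programmatically. The sketch below uses only the flags shown above; `build_demo_command` is a hypothetical helper, and the three file names are placeholders, not files shipped with the project.

```python
import shlex
from typing import List


def build_demo_command(image_path: str,
                       config_path: str,
                       model_path: str,
                       score_thr: float = 0.5,
                       dataset: str = "lvis",
                       show: bool = True) -> List[str]:
    """Assemble the demo.py argument list for inference with a dataset vocabulary."""
    cmd = ["python", "demo.py", image_path, config_path, model_path]
    if show:
        cmd.append("--show")
    cmd += ["--score-thr", str(score_thr)]
    cmd += ["--dataset", dataset]
    return cmd


# Placeholder paths; substitute your own image, config, and checkpoint.
cmd = build_demo_command("demo.jpg", "detic_config.py", "detic_model.pth")
print(shlex.join(cmd))
# To execute it from the Detic project folder, one could pass `cmd` to
# subprocess.run(cmd, check=True).
```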

Inference with custom vocabularies

Detic can detect any class, given its name, by using CLIP.

You can detect custom classes with the --class-name option:

python demo.py \
  ${IMAGE_PATH} \
  ${CONFIG_PATH} \
  ${MODEL_PATH} \
  --show \
  --score-thr 0.3 \
  --class-name headphone webcam paper coffe

Note that headphone, paper and coffe (typo intended) are not LVIS classes. Despite the misspelled class name, Detic can produce a reasonable detection for coffe.
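
For intuition, custom class names can be turned into classifier embeddings with CLIP's text encoder. The following is a rough sketch of that idea, not the code in demo.py: the "a " prompt prefix, the `ViT-B/32` model name, and both helper functions are assumptions made for illustration. The CLIP import is deferred so the prompt-building part runs even without CLIP installed.

```python
from typing import List


def build_prompts(class_names: List[str], prefix: str = "a ") -> List[str]:
    """Turn raw class names into short text prompts (the prefix is an assumed choice)."""
    return [prefix + name for name in class_names]


def embed_class_names(class_names: List[str],
                      model_name: str = "ViT-B/32",
                      device: str = "cpu"):
    """Encode the prompts with CLIP and return L2-normalized text features.

    Requires the `clip` package from the Installation section and PyTorch;
    the model weights are downloaded on first use.
    """
    import clip
    import torch

    model, _ = clip.load(model_name, device=device)
    tokens = clip.tokenize(build_prompts(class_names)).to(device)
    with torch.no_grad():
        text_features = model.encode_text(tokens)
    return text_features / text_features.norm(dim=-1, keepdim=True)


prompts = build_prompts(["headphone", "webcam", "paper", "coffe"])
print(prompts)
# embeddings = embed_class_names(["headphone", "webcam", "paper", "coffe"])
```

Since the text encoder maps similar strings to nearby embeddings, a misspelling such as coffe can still land close to the intended concept, which is consistent with the behavior noted above.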

Results

Here we only provide the Detic Swin-B model for the open vocabulary demo. Multi-dataset training and open-vocabulary testing will be supported in the future.

To find more variants, please visit the official model zoo.

Backbone	Training data	Config	Download
Swin-B	ImageNet-21K & LVIS & COCO	config	model

Citation

If you find Detic useful in your research or applications, please consider giving a star 🌟 to the official repository and citing Detic with the following BibTeX entry.

@inproceedings{zhou2022detecting,
  title={Detecting Twenty-thousand Classes using Image-level Supervision},
  author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
  booktitle={ECCV},
  year={2022}
}

Checklist

[x] Milestone 1: PR-ready, and acceptable to be one of the projects/.
- Finish the code
- Basic docstrings & proper citation
- Test-time correctness
- A full README
[ ] Milestone 2: Indicates a successful model implementation.
- Training-time correctness
[ ] Milestone 3: Good to be a part of our core package!
- Type hints and docstrings
- Unit tests
- Code polishing
- Metafile.yml
[ ] Move your modules into the core package following the codebase's file hierarchy structure.

[ ] Refactor your modules into the core package following the codebase's file hierarchy structure.
