
Detecting Twenty-thousand Classes using Image-level Supervision

Description

Detic: a detector with image classes that uses image-level labels to make detectors easier to train.

Detecting Twenty-thousand Classes using Image-level Supervision, Xingyi Zhou, Rohit Girdhar, Armand Joulin, Philipp Krähenbühl, Ishan Misra, ECCV 2022 (arXiv 2201.02605)

Usage

Installation

Detic requires CLIP to be installed.

pip install git+https://github.com/openai/CLIP.git

Demo

Inference with existing dataset vocabulary embeddings

First, go to the Detic project folder.

cd projects/Detic

Then, download the pre-computed CLIP embeddings of the dataset's class names (taken from the dataset metainfo) to the datasets/metadata folder. The zero-shot classifier loads these embeddings during inference. For example, you can download the LVIS class-name embeddings with the following command:

wget -P datasets/metadata https://raw.githubusercontent.com/facebookresearch/Detic/main/datasets/metadata/lvis_v1_clip_a%2Bcname.npy
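
As a quick sanity check after the download, the file can be inspected with NumPy. The sketch below assumes the file holds a 2-D array with one row per class name and one column per CLIP feature dimension. The helper name describe_embeddings is hypothetical and is not part of Detic.

```python
import numpy as np


def describe_embeddings(path):
    """Load a .npy embedding file and summarise its layout."""
    emb = np.load(path)
    # Assumption: rows are classes, columns are CLIP feature dimensions.
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    # Per-class L2 norms; useful to see whether rows are already normalised.
    norms = np.linalg.norm(emb, axis=1)
    return {
        "num_classes": emb.shape[0],
        "embed_dim": emb.shape[1],
        "dtype": str(emb.dtype),
        "mean_norm": float(norms.mean()),
    }
```

For example, calling describe_embeddings on the file saved under datasets/metadata (the exact file name depends on how wget decoded the URL) should report the number of classes and the embedding width.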

You can then run the demo as follows:

python demo.py \
  ${IMAGE_PATH} \
  ${CONFIG_PATH} \
  ${MODEL_PATH} \
  --show \
  --score-thr 0.5 \
  --dataset lvis
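
The --score-thr value is presumably a confidence cut-off: detections scoring below it are not shown. The sketch below illustrates that idea on a simplified, hypothetical record type (a label and a score). It is not the code used by demo.py, and whether the real comparison is inclusive is an assumption.

```python
from typing import List, Tuple

# Hypothetical, simplified detection record: (label, confidence score).
Detection = Tuple[str, float]


def filter_by_score(detections: List[Detection], score_thr: float) -> List[Detection]:
    """Keep only detections whose score is at least score_thr."""
    return [det for det in detections if det[1] >= score_thr]


dets = [("cup", 0.82), ("laptop", 0.47), ("mouse", 0.31)]
kept_strict = filter_by_score(dets, 0.5)  # higher threshold, fewer boxes
kept_loose = filter_by_score(dets, 0.3)   # lower threshold, more boxes
```

A lower threshold, such as the 0.3 used for custom vocabularies, keeps more low-confidence detections.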


Inference with custom vocabularies

  • Detic can detect any class, given its class name, by using CLIP.

You can detect custom classes with the --class-name option:

python demo.py \
  ${IMAGE_PATH} \
  ${CONFIG_PATH} \
  ${MODEL_PATH} \
  --show \
  --score-thr 0.3 \
  --class-name headphone webcam paper coffe


Note that headphone, paper and coffe (typo intended) are not LVIS classes. Despite the misspelled class name, Detic can produce a reasonable detection for coffe.
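
To illustrate how class names can become classifier weights, here is a minimal sketch that encodes names with CLIP's text encoder. It uses the public clip.load, clip.tokenize and encode_text calls from the CLIP package installed above. The "a " prompt prefix, the ViT-B/32 model name and both helper names are assumptions made for illustration, and demo.py may do this differently.

```python
from typing import List


def build_prompts(class_names: List[str], prefix: str = "a ") -> List[str]:
    """Turn raw class names into text prompts (the 'a ' prefix is an assumption)."""
    return [prefix + name.strip() for name in class_names]


def embed_class_names(class_names: List[str], model_name: str = "ViT-B/32"):
    """Encode class names with CLIP's text encoder and L2-normalise each row."""
    # Imported lazily so build_prompts works without torch or CLIP installed.
    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load(model_name, device=device)
    tokens = clip.tokenize(build_prompts(class_names)).to(device)
    with torch.no_grad():
        text_features = model.encode_text(tokens).float()
    # With unit-length rows, a dot product equals cosine similarity.
    return text_features / text_features.norm(dim=-1, keepdim=True)


prompts = build_prompts(["headphone", "webcam", "paper", "coffe"])
```

Because the text encoder accepts arbitrary strings, a misspelled name such as coffe still yields an embedding, which is consistent with the behaviour described above.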

Results

Here we provide only the Detic Swin-B model for the open-vocabulary demo. Multi-dataset training and open-vocabulary testing will be supported in the future.

To find more variants, please visit the official model zoo.

Backbone | Training data              | Config | Download
Swin-B   | ImageNet-21K & LVIS & COCO | config | model

Citation

If you find Detic useful in your research or applications, please consider giving a star 🌟 to the official repository and citing Detic with the following BibTeX entry.

@inproceedings{zhou2022detecting,
  title={Detecting Twenty-thousand Classes using Image-level Supervision},
  author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
  booktitle={ECCV},
  year={2022}
}

Checklist

  • [x] Milestone 1: PR-ready, and acceptable to be one of the projects/.

    • Finish the code
    • Basic docstrings & proper citation
    • Test-time correctness
    • A full README
  • [ ] Milestone 2: Indicates a successful model implementation.

    • Training-time correctness
  • [ ] Milestone 3: Good to be a part of our core package!

    • Type hints and docstrings
    • Unit tests
    • Code polishing
    • Metafile.yml
  • [ ] Move your modules into the core package following the codebase's file hierarchy structure.

  • [ ] Refactor your modules into the core package following the codebase's file hierarchy structure.