
RTMPose Benchmarks

Community users are welcome to contribute to this directory by running inference speed tests on other hardware devices.

Currently tested:

CPU
- Intel i7-11700
GPU
- NVIDIA GeForce GTX 1660 Ti
- NVIDIA GeForce RTX 3090
NVIDIA Jetson
- AGX Orin
- Orin NX
ARM
- Snapdragon 865

Body 2d (17 Keypoints)

Model Info

Config	Input Size	AP (COCO)	Params (M)	FLOPs (G)
RTMPose-t	256x192	68.5	3.34	0.36
RTMPose-s	256x192	72.2	5.47	0.68
RTMPose-m	256x192	75.8	13.59	1.93
RTMPose-l	256x192	76.5	27.66	4.16
RTMPose-m	384x288	77.0	13.72	4.33
RTMPose-l	384x288	77.3	27.79	9.35

Speed Benchmark

Numbers displayed in the table are inference latencies in milliseconds (ms). A dash (-) marks a configuration without a reported result.

Config	Input Size	ORT (i7-11700)	TRT-FP16 (GTX 1660 Ti)	TRT-FP16 (RTX 3090)	ncnn-FP16 (Snapdragon 865)	TRT-FP16 (Jetson AGX Orin)	TRT-FP16 (Jetson Orin NX)
RTMPose-t	256x192	3.20	1.06	0.98	9.02	1.63	1.97
RTMPose-s	256x192	4.48	1.39	1.12	13.89	1.85	2.18
RTMPose-m	256x192	11.06	2.29	1.18	26.44	2.72	3.35
RTMPose-l	256x192	18.85	3.46	1.37	45.37	3.67	4.78
RTMPose-m	384x288	24.78	3.66	1.20	26.44	3.45	5.08
RTMPose-l	384x288	-	6.05	1.74	-	4.93	7.23

WholeBody 2d (133 Keypoints)

Model Info

Config	Input Size	Whole AP	Whole AR	FLOPs (G)
RTMPose-m	256x192	60.4	66.7	2.22
RTMPose-l	256x192	63.2	69.4	4.52
RTMPose-l	384x288	67.0	72.3	10.07
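The Model Info tables list RTMPose-m and RTMPose-l at two input sizes. As an illustrative sanity check (an observation drawn from the numbers, not a statement from the benchmark authors), the Python sketch below compares how the FLOPs column grows from 256x192 to 384x288 with how the pixel count grows. The values are copied from the tables; `FLOPS_G`, `pixel_count` and `flops_growth` are hypothetical names.

```python
# FLOPs (G) copied from the Model Info tables, keyed by (table, config, input size).
FLOPS_G = {
    ("Body", "RTMPose-m", "256x192"): 1.93,
    ("Body", "RTMPose-m", "384x288"): 4.33,
    ("Body", "RTMPose-l", "256x192"): 4.16,
    ("Body", "RTMPose-l", "384x288"): 9.35,
    ("WholeBody", "RTMPose-l", "256x192"): 4.52,
    ("WholeBody", "RTMPose-l", "384x288"): 10.07,
}


def pixel_count(size: str) -> int:
    """Number of pixels in an 'HxW' input size string such as '256x192'."""
    height, width = (int(v) for v in size.split("x"))
    return height * width


def flops_growth(table: str, config: str) -> float:
    """FLOPs at 384x288 divided by FLOPs at 256x192 for one model."""
    return FLOPS_G[(table, config, "384x288")] / FLOPS_G[(table, config, "256x192")]


# Compare pixel growth with FLOPs growth for every model listed at both sizes.
print(f"pixel ratio: {pixel_count('384x288') / pixel_count('256x192'):.2f}")
for table, config in [("Body", "RTMPose-m"), ("Body", "RTMPose-l"), ("WholeBody", "RTMPose-l")]:
    print(f"{table} {config}: FLOPs x{flops_growth(table, config):.2f}")
```

The pixel count grows by exactly 2.25x, and each model's FLOPs grow by roughly 2.2x, so compute cost in these tables tracks input resolution closely.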

Speed Benchmark

Numbers displayed in the table are inference latencies in milliseconds (ms).
Where a cell holds two values, they come from different community users and are separated by |.

Config	Input Size	ORT (i7-11700)	TRT-FP16 (GTX 1660 Ti)	TRT-FP16 (RTX 3090)	TRT-FP16 (Jetson AGX Orin)	TRT-FP16 (Jetson Orin NX)
RTMPose-m	256x192	13.50	4.00	1.17 | 1.84	2.79	3.51
RTMPose-l	256x192	23.41	5.67	1.44 | 2.61	3.80	4.95
RTMPose-l	384x288	44.58	7.68	1.75 | 4.24	5.08	7.20

How To Test Speed

To test a model's inference speed under a deployment backend, MMDeploy provides a convenient tools/profiler.py script.

Prepare a folder of test images (test_images); the profiler randomly reads images from this directory for the speed test. In the command below, the folder is passed as ../test_images.

python tools/profiler.py \
    configs/mmpose/pose-detection_simcc_onnxruntime_dynamic.py \
    {RTMPOSE_PROJECT}/rtmpose/body_2d_keypoint/rtmpose-m_8xb256-420e_coco-256x192.py \
    ../test_images \
    --model {WORK_DIR}/end2end.onnx \
    --shape 256x192 \
    --device cpu \
    --warmup 50 \
    --num-iter 200

The result is as follows:

01/30 15:06:35 - mmengine - INFO - [onnxruntime]-70 times per count: 8.73 ms, 114.50 FPS
01/30 15:06:36 - mmengine - INFO - [onnxruntime]-90 times per count: 9.05 ms, 110.48 FPS
01/30 15:06:37 - mmengine - INFO - [onnxruntime]-110 times per count: 9.87 ms, 101.32 FPS
01/30 15:06:37 - mmengine - INFO - [onnxruntime]-130 times per count: 9.99 ms, 100.10 FPS
01/30 15:06:38 - mmengine - INFO - [onnxruntime]-150 times per count: 10.39 ms, 96.29 FPS
01/30 15:06:39 - mmengine - INFO - [onnxruntime]-170 times per count: 10.77 ms, 92.86 FPS
01/30 15:06:40 - mmengine - INFO - [onnxruntime]-190 times per count: 10.98 ms, 91.05 FPS
01/30 15:06:40 - mmengine - INFO - [onnxruntime]-210 times per count: 11.19 ms, 89.33 FPS
01/30 15:06:41 - mmengine - INFO - [onnxruntime]-230 times per count: 11.16 ms, 89.58 FPS
01/30 15:06:42 - mmengine - INFO - [onnxruntime]-250 times per count: 11.06 ms, 90.41 FPS
----- Settings:
+------------+---------+
| batch size |    1    |
|   shape    | 256x192 |
| iterations |   200   |
|   warmup   |    50   |
+------------+---------+
----- Results:
+--------+------------+---------+
| Stats  | Latency/ms |   FPS   |
+--------+------------+---------+
|  Mean  |   11.060   |  90.412 |
| Median |   11.852   |  84.375 |
|  Min   |   7.812    | 128.007 |
|  Max   |   13.690   |  73.044 |
+--------+------------+---------+

For more details on the profiler, refer to the Profiler Docs.
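The Settings and Results tables in the profiler's sample output follow a common measurement pattern: untimed warm-up runs, then a fixed number of timed iterations summarized as mean, median, min and max latency. The Python sketch below reproduces that pattern for illustration only; it is not MMDeploy's profiler implementation, and `profile` and `dummy_infer` are hypothetical names. In the sample output each FPS value is 1000 divided by the latency on the same row (for example, 1000 / 7.812 ms ≈ 128.0 FPS), which is why the Min latency row shows the highest FPS.

```python
import statistics
import time
from typing import Callable, Dict


def profile(infer: Callable[[], object], warmup: int = 50,
            num_iter: int = 200) -> Dict[str, Dict[str, float]]:
    """Time `infer` and summarize latency (ms) and throughput (FPS).

    Warm-up calls run first and are excluded from the statistics,
    similar in spirit to the --warmup / --num-iter settings above.
    """
    for _ in range(warmup):
        infer()  # untimed warm-up run

    latencies_ms = []
    for _ in range(num_iter):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    summary = {
        "Mean": statistics.mean(latencies_ms),
        "Median": statistics.median(latencies_ms),
        "Min": min(latencies_ms),
        "Max": max(latencies_ms),
    }
    # Each row's FPS is 1000 / latency_ms, so the Min row has the highest FPS.
    return {name: {"latency_ms": ms, "fps": 1000.0 / ms}
            for name, ms in summary.items()}


def dummy_infer() -> None:
    """Hypothetical stand-in for one backend inference call."""
    sum(i * i for i in range(10_000))


for name, row in profile(dummy_infer).items():
    print(f"{name:>6}: {row['latency_ms']:.3f} ms, {row['fps']:.3f} FPS")
```

To time a real model, `dummy_infer` would be replaced with a callable that runs one preprocessed image through the backend; for deployed models, tools/profiler.py remains the tool documented here.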
