
Real-time UAV Detection System

Combining multiscale processing and cross-head knowledge distillation to detect small UAVs in challenging environments using a YOLOv8-based architecture.

  • 0.84 mAP@0.5
  • 28 FPS
  • +29% improvement

Abstract

The proliferation of small Unmanned Aerial Vehicles (UAVs) poses significant security challenges to critical infrastructure. Detecting these small targets in real-time on edge devices is difficult due to their low radar cross-sections and complex environmental backgrounds. This report presents a computer vision system designed to address these challenges using a YOLOv8-based architecture. The system is enhanced by two key strategies: (1) Multiscale Processing using a 5-crop inference mechanism with Non-Maximum Weighted (NMW) fusion to simulate a zoom effect without interpolation loss; and (2) Cross-Head Knowledge Distillation (CrossKD) combined with Progressive KD to efficiently transfer detection-sensitive features from a large Teacher model to a lightweight Student model.

Technical Challenges

As commercial drones become more accessible, the risk of unauthorized surveillance and airspace intrusion increases.

Small Object Size

Drones often occupy a tiny fraction of the frame (<5%), and standard resizing in object detectors causes significant feature loss.

Computational Cost

High-accuracy models (e.g., YOLOv8-X) are too heavy for edge devices; the system needs the speed of 'Nano' models with the accuracy of 'Large' models.

Domain Generalization

The system must generalize across diverse environments including clear sky, fog, night, and various weather conditions.

Methodology

Our approach combines two powerful strategies to achieve high accuracy with real-time performance.

Multiscale Processing

To resolve the small object issue, we implemented a grid cropping strategy during inference. Instead of resizing the entire large image down to the model input size (which destroys small details), we process the image in segments.

5-Crop Pattern

  • The image is divided into 4 corner crops and 1 center crop
  • Each crop covers approximately 55-65% of the original frame
  • Each crop is fed to the model at its input size (320×256), creating a "zoom" effect
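The crop geometry above can be sketched as a small helper. This is an illustrative reconstruction, not the project's code; the 0.6 coverage fraction is an assumed default within the stated 55-65% range.

```python
def five_crop_boxes(width, height, frac=0.6):
    """Return (x0, y0, x1, y1) boxes for 4 corner crops and 1 center crop.

    `frac` is the fraction of each dimension covered by a crop; the report
    uses roughly 0.55-0.65, so 0.6 here is an illustrative default.
    """
    cw, ch = int(width * frac), int(height * frac)
    boxes = [
        (0, 0, cw, ch),                            # top-left
        (width - cw, 0, width, ch),                # top-right
        (0, height - ch, cw, height),              # bottom-left
        (width - cw, height - ch, width, height),  # bottom-right
    ]
    # center crop, aligned to the middle of the frame
    cx0, cy0 = (width - cw) // 2, (height - ch) // 2
    boxes.append((cx0, cy0, cx0 + cw, cy0 + ch))
    return boxes
```

Because each crop is smaller than the full frame but is resized to the same model input, small drones occupy proportionally more pixels, which is the "zoom" effect described above.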

NMW Fusion

Merging results with standard NMS proved too aggressive, discarding correct overlapping detections. We adopted Non-Maximum Weighted (NMW) fusion, which computes a confidence-weighted average of the coordinates of overlapping boxes.

Note: Using only the 5 crops (discarding the full-frame original image) yielded the best F1-score and AP, as the original resized frame often introduced false positives.
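A minimal NumPy sketch of NMW fusion, assuming a greedy IoU clustering followed by confidence-weighted coordinate averaging; the `iou_thr` default is an assumption, not a value from the report.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x0, y0, x1, y1) format."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nmw_fuse(boxes, scores, iou_thr=0.5):
    """Non-Maximum Weighted fusion: instead of discarding boxes that overlap
    the top-scoring one (as NMS does), average their coordinates weighted by
    confidence. Returns a list of (fused_box, max_score) pairs."""
    order = np.argsort(scores)[::-1]          # highest confidence first
    used = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in order:
        if used[i]:
            continue
        cluster = [i]
        used[i] = True
        for j in order:
            if not used[j] and iou(boxes[i], boxes[j]) >= iou_thr:
                cluster.append(j)
                used[j] = True
        w = np.array([scores[k] for k in cluster])
        b = np.array([boxes[k] for k in cluster], dtype=float)
        # confidence-weighted average of the clustered coordinates
        fused.append(((w[:, None] * b).sum(0) / w.sum(), float(w.max())))
    return fused
```

Two detections of the same drone from adjacent crops are thus merged into one box that leans toward the more confident prediction, rather than one of them being dropped outright.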

Knowledge Distillation

To achieve low latency, we distilled knowledge from a heavy Teacher model to a lightweight Student model (YOLOv8-Nano) using Cross-Head Knowledge Distillation (CrossKD).

Progressive Distillation

Directly distilling from a massive model to a tiny one often causes "Knowledge Shock" due to the capacity gap. We employed a progressive pipeline:

Teacher (v8-X) → Intermediate (v8-L) → Student (v8-Nano)
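Each arrow in this pipeline is a separate distillation stage. As a hedged illustration of what one stage minimizes, the generic soft-target loss (KL divergence between temperature-softened distributions) can be written as follows; the CrossKD-specific losses used in this project operate on head predictions, so this is only the textbook building block.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target distillation loss: KL(teacher || student) on
    temperature-softened distributions, scaled by T^2 as is standard."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

In the progressive pipeline this loss is applied twice: once with v8-X teaching v8-L, then again with the trained v8-L teaching v8-Nano, so the student never faces the full capacity gap at once.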

Cross-Head Knowledge Distillation (CrossKD)

Instead of traditional feature mimicking, CrossKD establishes cross-connections between Teacher and Student detection heads:

  • Student Features → Teacher Head: Forces the Student's backbone to learn features robust enough for the Teacher's powerful detection head
  • Teacher Features → Student Head: Trains the Student's head to process high-quality, complex features from the Teacher

By interacting directly at the head level, the model learns better bounding box regression and focuses on detection-sensitive features rather than background noise.
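The cross-connections above can be sketched as a routing function; `student_head` and `teacher_head` are placeholders for the detection heads, and the output keys are illustrative names, not the paper's API.

```python
def crosskd_outputs(student_feats, teacher_feats, student_head, teacher_head):
    """Sketch of CrossKD's cross-connections: distillation losses are
    computed on head outputs produced from *swapped* features, rather
    than by mimicking intermediate features directly."""
    return {
        # student features through the teacher's head: pushes the student
        # backbone to produce features a stronger head can use
        "s_feat_t_head": teacher_head(student_feats),
        # teacher features through the student's head: trains the student
        # head on high-quality, complex features
        "t_feat_s_head": student_head(teacher_feats),
        # normal forward passes, used for the task loss / soft targets
        "student_pred": student_head(student_feats),
        "teacher_pred": teacher_head(teacher_feats),
    }
```

The distillation loss then compares, e.g., `s_feat_t_head` against `teacher_pred`, so the gradient reaching the student backbone is shaped by the teacher's detection head rather than by raw feature differences.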

System Pipeline

Overview of our training and inference architecture

Phase 1: Training Pipeline

Step 1: Robust Init
Transfer Learning
Source: DUT Anti-UAV Dataset (10k images)

Pre-training YOLOv8 on a diverse dataset to learn general drone features (shape, motion) before seeing the target data.
Step 2: Progressive KD
Knowledge Distillation
Target: VIP Cup 2025 Dataset
YOLOv8-X → YOLOv8-L
YOLOv8-L → YOLOv8-n

Using Intermediate Teacher (L) to bridge the capacity gap between X and Nano.
Step 3: CrossKD Loss
Cross-Head Distillation
Applying Cross-Head Knowledge Distillation during training.

Cross-connecting Student backbone → Teacher head and Teacher backbone → Student head for detection-sensitive feature transfer.

Phase 2: Inference Pipeline

Input Frame (raw RGB/IR image) → 5-Crop Split (4 corners + 1 center, "zoom" effect) → YOLOv8-n (distilled model, batch inference) → NMW (Non-Maximum Weighted fusion) → Result (final bounding box)
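The inference flow can be sketched end to end. Here `run_model` and `fuse` are placeholders for the distilled YOLOv8-n and NMW fusion; the one detail the diagram leaves implicit is that each crop's detections must be shifted back by the crop's origin before fusion.

```python
def detect(crops, run_model, fuse):
    """Illustrative inference loop over the 5 crops.

    `crops` are (x0, y0, x1, y1) regions of the full frame; `run_model`
    returns [(crop_local_box, score), ...] for a crop; `fuse` stands in
    for NMW fusion over the combined detections."""
    all_boxes, all_scores = [], []
    for (x0, y0, x1, y1) in crops:
        # each crop is detected independently (batched in practice)
        for (bx0, by0, bx1, by1), score in run_model((x0, y0, x1, y1)):
            # shift crop-local coordinates back into full-frame coordinates
            all_boxes.append((bx0 + x0, by0 + y0, bx1 + x0, by1 + y0))
            all_scores.append(score)
    return fuse(all_boxes, all_scores)
```

Without the coordinate shift, the same drone seen in two overlapping crops would produce boxes in incompatible frames and could never be merged by NMW.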

Experimental Results

Evaluated on VIP Cup 2025 and DUT Anti-UAV datasets

Quantitative Results

Configuration                               mAP@0.5   mAP@0.5-0.9   FPS
YOLOv11-Nano                                0.51      0.23          72
YOLOv12-Nano                                0.55      0.22          69
YOLOv8-Nano (Baseline)                      0.55      0.22          77
YOLOv8-Nano (KD)                            0.65      0.25          77
YOLOv8-Nano (Pretrained)                    0.79      0.48          77
YOLOv8-Nano (Multiscale)                    0.61      0.23          28
YOLOv8-Nano (KD + Pretrained + Multiscale)  0.84      0.51          28

Multiscale Analysis

Experiments on the cropping strategy showed that including the original full-frame image often degraded precision.

Knowledge Distillation Results

We distilled knowledge from YOLOv8-X to YOLOv8-Nano via YOLOv8-Large as an intermediate, using Cross-Head Knowledge Distillation (CrossKD). This improved mAP@0.5 by 10 points over the baseline (0.55 → 0.65) by focusing on detection-sensitive features and improving localization.

Detection Demos

Visual comparison of our detection system on RGB and IR (Infrared) videos


Conclusion

This project developed a UAV detection system that balances accuracy and speed. The multiscale processing strategy handles small objects by simulating a zoom effect, while the cross-head knowledge distillation combined with progressive KD enables efficient knowledge transfer with enhanced localization capabilities.

+29% Accuracy

mAP improved from 0.55 to 0.84

Real-time Performance

28 FPS on edge devices

Better Localization

Improved bounding box regression via CrossKD

References

Research papers that informed our approach

1. Jun-Hwa Kim, Namho Kim, Chee Sun Won, "High-Speed Drone Detection Based On Yolo-V8," ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing, 2023.
2. Jie Zhao, Jingshu Zhang, Dongdong Li, Dong Wang, "Vision-based Anti-UAV Detection and Tracking," IEEE Transactions on Intelligent Transportation Systems, 2022.
3. Qinwei Xu, Ruipeng Zhang, Ya Zhang, Yanfeng Wang, Qi Tian, "A Fourier-based Framework for Domain Generalization," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
4. Rayson Laroca, Marcelo dos Santos, David Menotti, "Improving Small Drone Detection Through Multi-Scale Processing and Data Augmentation," International Joint Conference on Neural Networks (IJCNN), 2025.
5. Liang Yao, Fan Liu, Chuanyi Zhang, Zhiquan Ou, Ting Wu, "Domain-invariant Progressive Knowledge Distillation for UAV-based Object Detection," IEEE Geoscience and Remote Sensing Letters, 2024.
6. Jiabao Wang, Yuming Chen, Zhaohui Zheng, Xiang Li, Ming-Ming Cheng, Qibin Hou, "CrossKD: Cross-Head Knowledge Distillation for Object Detection," IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.