Lightweight Distilled Transformer-Based Vision Framework for Detection of Forest Fire and Smoke in Real-World Scenes

Authors: Hassan Akbar, Tahir Nawaz, Md Asaduzzaman, Mohammad Hasan, Waqar S. Qureshi, Faisal Shafait

Forest fires have become a devastating threat, with incidents growing rapidly across the globe. Several approaches for forest fire detection have been presented over the years; however, the need remains for an effective, computationally efficient, and unified vision-based solution that can be easily deployed on edge devices for real-world applications. To this end, we present a lightweight model based on a distilled vision transformer (D-ViT) to classify forest imagery into fire, smoke, and normal scenarios. We use ResNet50 as a teacher model trained on the target dataset and a compressed D-ViT as a student model trained using the knowledge distillation (KD) approach. Unlike existing approaches, the proposed D-ViT framework is computationally efficient, with fewer trainable parameters, and is unified in that it detects both fire and smoke (whichever is dominant in the scene) at longer ranges in visible imagery. For experimental validation, we deployed the model on a Jetson Nano board and performed an extensive evaluation and analysis of the proposed framework on data collected from public online sources, which we have made available on request for use by the research community. The proposed D-ViT model achieves a processing speed of 18.84 frames per second (FPS) and an accuracy of 94% using soft distillation, an improvement over the 90% accuracy obtained with the ViT without distillation. A comparison with several other standard deep classification models also shows encouraging results, with a better trade-off between accuracy and computational efficiency.
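The abstract does not state the exact form of the soft distillation objective, so the following is only a minimal sketch under assumptions. It uses the commonly cited formulation in which the student's loss mixes cross-entropy on the true label with a temperature-scaled KL divergence between teacher and student outputs. The function names, the weighting `alpha`, the `temperature`, and the class ordering (0 = fire, 1 = smoke, 2 = normal) are illustrative choices, not values taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def soft_distillation_loss(student_logits, teacher_logits, label,
                           alpha=0.5, temperature=3.0):
    """Assumed soft-distillation objective for a single image.

    loss = (1 - alpha) * CE(student, label)
           + alpha * T^2 * KL(teacher_T || student_T)

    alpha and temperature are hypothetical hyperparameters.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -log_softmax(student_logits)[label]

    # Soft-label term: KL divergence between softened distributions.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # T^2 keeps the soft term's gradient scale comparable across temperatures.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# Assumed class order: 0 = fire, 1 = smoke, 2 = normal.
teacher = [3.0, 0.5, -1.0]            # teacher (e.g. ResNet50) favours "fire"
agreeing_student = [2.5, 0.4, -0.8]
disagreeing_student = [-1.0, 0.5, 3.0]
print(soft_distillation_loss(agreeing_student, teacher, label=0))
print(soft_distillation_loss(disagreeing_student, teacher, label=0))
```

In this sketch, a student whose logits agree with the teacher and the label receives a smaller loss than one that disagrees, and the soft term vanishes when the two softened distributions coincide.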
