An efficient hyperspectral image classification method using retentive network

Arya R.K.; Paul S.; Srivastava R.

doi:https://doi.org/10.1016/j.asr.2024.10.001

Abstract

In recent computer vision tasks, the vision transformer (ViT) has demonstrated competitive ability. However, ViT has drawbacks: the self-attention layer makes inference expensive and slow, and because its complexity is quadratic in the number of tokens, processing all tokens of a high-resolution image can be slow. Recently, a retentive network with excellent performance, training parallelism, and low inference cost was proposed. For hyperspectral image (HSI) classification, this paper proposes a retention-based network model called the HSI retentive network (HSIRN). The proposed model allows memory usage that is independent of the token sequence length, facilitating the efficient processing of high-resolution images with low inference and computational costs. Although the retention encoder can extract global information, it pays limited attention to local information. A convolutional neural network (CNN) is a powerful tool for extracting local information, so the proposed HSIRN model uses a specific CNN-based block to extract local spectral-spatial information. To maintain decay between successive vertical and horizontal positions along with the depth dimension of the HSI, we propose a three-dimensional retention mechanism for the three-dimensional HSI data in the retention encoder. By efficiently using both local and global spectral-spatial information, the proposed method offers a potent tool for HSI classification. We evaluated the classification performance of the proposed HSIRN approach on four datasets through comprehensive experiments, and the results demonstrated its superiority over state-of-the-art methods. The source code will be made publicly available at https://github.com/RajatArya22/HSIRN. © 2024 COSPAR
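
The claim that retention supports both parallel training and low-cost inference can be illustrated with a minimal NumPy sketch. It assumes the generic one-dimensional, single-head retention formulation of retentive networks (a causal mask with exponential decay), not the three-dimensional mechanism of HSIRN, whose details the abstract does not give. The function names and the decay value `gamma = 0.9` are illustrative assumptions, and position rotation and normalization are omitted.

```python
import numpy as np

def decay_mask(n, gamma):
    # D[i, j] = gamma**(i - j) for i >= j, and 0 above the diagonal (causal).
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]
    return np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)

def retention_parallel(q, k, v, gamma):
    # Parallel form: all tokens at once, suitable for training.
    scores = (q @ k.T) * decay_mask(len(q), gamma)
    return scores @ v

def retention_recurrent(q, k, v, gamma):
    # Recurrent form: one token at a time with a fixed-size state,
    # so memory does not grow with the sequence length.
    state = np.zeros((k.shape[1], v.shape[1]))
    out = np.empty((q.shape[0], v.shape[1]))
    for t in range(q.shape[0]):
        state = gamma * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
same = np.allclose(retention_parallel(q, k, v, 0.9),
                   retention_recurrent(q, k, v, 0.9))
```

Both forms compute the same output under these assumptions, so `same` is true: the parallel form trades memory for throughput, while the recurrent form keeps only a `d_k x d_v` state per step.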