An efficient hyperspectral image classification method using retentive network
Abstract
In recent computer vision tasks, the vision transformer (ViT) has demonstrated competitive ability. However, ViT still has problems: the computational complexity of the self-attention layer makes inference expensive and slow, and processing all tokens of a high-resolution image is slowed further by the layer's quadratic complexity. Recently, a retentive network was proposed that offers excellent performance, training parallelism, and a low inference cost. For hyperspectral image (HSI) classification, this paper proposes a retention-based network model called the HSI retentive network (HSIRN). The proposed model keeps memory usage independent of the token sequence length, which allows high-resolution images to be processed efficiently with low inference and computational costs. Although the retention encoder can extract global information, it pays limited attention to local information. A convolutional neural network (CNN) is a powerful tool for extracting local information, so the proposed HSIRN model uses a specific CNN-based block to extract local spectral-spatial information. To maintain degradation between successive vertical and horizontal positions together with the depth dimension of the HSI, we propose a three-dimensional retention mechanism for the three-dimensional HSI dataset in the retention encoder. By efficiently using both local and global spectral-spatial information, the proposed method offers a potent tool for HSI classification. We evaluated the classification performance of the proposed HSIRN approach on four datasets through comprehensive experiments, and the results demonstrated its superiority over state-of-the-art methods. The source code will be made publicly available at https://github.com/RajatArya22/HSIRN. © 2024 COSPAR
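The claim that memory usage can be independent of the token sequence length follows from the recurrent form of retention. The minimal sketch below is not the HSIRN code and not the three-dimensional retention mechanism described in the abstract. It shows only the basic one-dimensional retention from the general retentive network formulation, with a single head and a scalar decay, and it omits positional rotation, normalization, and gating. The function names, the decay value `gamma = 0.9`, and the toy dimensions are illustrative assumptions. The parallel form builds a decayed, causally masked score for every pair of positions, while the recurrent form carries only a fixed-size state matrix from token to token, and both produce the same output.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retention_parallel(q, k, v, gamma):
    """Parallel form: out[i] = sum over j <= i of gamma**(i-j) * (q[i].k[j]) * v[j]."""
    d_v = len(v[0])
    out = []
    for i in range(len(q)):
        row = [0.0] * d_v
        for j in range(i + 1):  # causal mask: only positions j <= i contribute
            weight = (gamma ** (i - j)) * dot(q[i], k[j])
            for c in range(d_v):
                row[c] += weight * v[j][c]
        out.append(row)
    return out


def retention_recurrent(q, k, v, gamma):
    """Recurrent form: a d_k x d_v state is updated once per token."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # S_t = gamma * S_{t-1} + outer(k_t, v_t); the state size never grows
        state = [[gamma * state[r][c] + k_t[r] * v_t[c] for c in range(d_v)]
                 for r in range(d_k)]
        # o_t = q_t @ S_t
        out.append([sum(q_t[r] * state[r][c] for r in range(d_k))
                    for c in range(d_v)])
    return out


# Toy input (assumed sizes): 6 tokens, key/query width 3, value width 2.
random.seed(0)
n_tokens, d_k, d_v = 6, 3, 2
q = [[random.uniform(-1, 1) for _ in range(d_k)] for _ in range(n_tokens)]
k = [[random.uniform(-1, 1) for _ in range(d_k)] for _ in range(n_tokens)]
v = [[random.uniform(-1, 1) for _ in range(d_v)] for _ in range(n_tokens)]

parallel_out = retention_parallel(q, k, v, gamma=0.9)
recurrent_out = retention_recurrent(q, k, v, gamma=0.9)
```

In this sketch the recurrent path stores only the `d_k` by `d_v` state regardless of how many tokens have been seen, which is the property the abstract relies on for low inference cost. The parallel path is the one that would typically be used during training.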