Transformer-based Models for Supervised Monocular Depth Estimation
| dc.contributor.author | Gupta A.; Prince A.A.; Fredo A.R.J.; Robert F. | |
| dc.date.accessioned | 2025-05-23T11:24:27Z | |
| dc.description.abstract | Existing solutions for monocular depth estimation typically use convolutional networks as the backbone of their model architecture. This work presents an encoder-decoder network using a transformer architecture that performs monocular depth estimation on a single RGB image. For environment perception and autonomous navigation systems, where depth estimation runs on edge devices, lightweight and efficient models are needed. It is shown that transformer-based architectures provide results comparable to the currently used convolutional networks with significantly fewer parameters. Unlike convolutional networks, transformers do not progressively downsample the input at each layer; maintaining a similar resolution throughout the encoding process allows for global awareness at each stage. Two different decoder models are implemented on top of a transformer encoder and evaluated for depth estimation. On the KITTI outdoor dataset, the lighter transformer model outperforms a comparable convolutional network in both robustness and accuracy. © 2022 IEEE. | |
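The abstract's key architectural point is that a transformer encoder keeps its token-grid resolution fixed across layers, whereas a convolutional encoder downsamples progressively. The sketch below (not the paper's code; stage counts, patch size, and input dimensions are illustrative assumptions) traces the per-stage spatial resolution under each scheme:

```python
# Illustrative sketch, not the authors' implementation: compare how
# spatial resolution evolves per stage in a typical convolutional
# encoder versus a ViT-style transformer encoder.

def cnn_stage_resolutions(h, w, num_stages):
    """A typical CNN encoder halves the feature-map resolution at
    each stage (e.g. via strided convolution or pooling)."""
    res = [(h, w)]
    for _ in range(num_stages):
        h, w = h // 2, w // 2
        res.append((h, w))
    return res

def transformer_stage_resolutions(h, w, patch, num_layers):
    """A ViT-style encoder patchifies the image once, then every
    layer attends over the same (h // patch, w // patch) token
    grid, so resolution stays constant through encoding."""
    th, tw = h // patch, w // patch
    return [(th, tw)] * (num_layers + 1)

# Example with a KITTI-like crop size (assumed, for illustration):
print(cnn_stage_resolutions(352, 1216, 4))
# progressive downsampling: (352, 1216) -> ... -> (22, 76)
print(transformer_stage_resolutions(352, 1216, 16, 4))
# constant token grid: (22, 76) at every layer
```

Because every transformer layer sees the full token grid, each stage has global receptive field; a CNN only approaches that at its deepest, coarsest stages.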
| dc.identifier.doi | https://doi.org/10.1109/ICICCSP53532.2022.9862348 | |
| dc.identifier.uri | http://172.23.0.11:4000/handle/123456789/10107 | |
| dc.relation.ispartofseries | 2022 International Conference on Intelligent Controller and Computing for Smart Power, ICICCSP 2022 | |
| dc.title | Transformer-based Models for Supervised Monocular Depth Estimation |