EoCD is a simple and efficient change detection model based on early fusion that entirely eliminates the need for a sophisticated decoder.
The Efficient Multiscale Feature Fusion module contains no learnable parameters, yet it effectively aggregates multiscale encoder features for change detection (CD).
We demonstrate that CD performance depends predominantly on the encoder, which makes the decoder an additional and often unnecessary component. This points to a new direction for the remote sensing (RS) community.
Experiments with CNN, ViT, and Swin-based encoders confirm that EoCD strikes a favorable balance between performance and prediction speed across architectures.
Comparison of various CD frameworks. (a) In late fusion, I_pre and I_post are fed to a Siamese encoder, so the backbone processes each image separately, which increases computational cost. (b) Early fusion avoids this by concatenating the bitemporal images before passing them to the backbone; however, the sophisticated decoder still adds undesirable complexity. (c) EoCD introduces a simple design that avoids the overhead of both the Siamese encoder and the sophisticated decoder.
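The difference between panels (a) and (b) can be illustrated at the input level. The following is a minimal sketch, assuming a (channels, height, width) layout and dummy pre- and post-change images; the function names are hypothetical and no backbone is included.

```python
import numpy as np

def late_fusion_inputs(i_pre, i_post):
    """Siamese (late-fusion) setup: the shared backbone runs once per image."""
    return [i_pre, i_post]  # two backbone passes, 3 channels each

def early_fusion_input(i_pre, i_post):
    """Early fusion: stack the two images along the channel axis."""
    return np.concatenate([i_pre, i_post], axis=0)  # one pass, 6 channels

# Hypothetical bitemporal pair in (channels, height, width) layout.
i_pre = np.zeros((3, 224, 224), dtype=np.float32)
i_post = np.ones((3, 224, 224), dtype=np.float32)

late = late_fusion_inputs(i_pre, i_post)
early = early_fusion_input(i_pre, i_post)

print(len(late), late[0].shape)  # 2 (3, 224, 224)
print(early.shape)               # (6, 224, 224)
```

With early fusion the backbone sees a single 6-channel tensor, so it runs once per image pair rather than twice.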
Overall architecture of the proposed EoCD. It adopts a student–teacher framework with a decoder-less student network. The student network performs early fusion of the bitemporal images and combines the multiscale representations, which significantly improves the efficiency of the network.
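One way to combine multiscale features without learnable parameters is to resize every feature map to a common resolution and concatenate along the channel axis. The sketch below illustrates that general idea only; the text above does not specify the exact operations used by EoCD, so the nearest-neighbour upsampling, the concatenation, and the channel counts are all assumptions.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiscale(features):
    """Resize every map to the finest resolution and concatenate channels.

    Only resizing and concatenation are used, so there are no learnable weights.
    Assumes square maps whose sizes divide the finest resolution evenly.
    """
    target_h = features[0].shape[1]
    resized = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    return np.concatenate(resized, axis=0)

# Hypothetical 4-stage pyramid for a 224x224 input (strides 4, 8, 16, 32).
rng = np.random.default_rng(0)
stages = [(64, 56), (128, 28), (256, 14), (512, 7)]  # assumed (channels, size)
pyramid = [rng.standard_normal((c, s, s)) for c, s in stages]

fused = fuse_multiscale(pyramid)
print(fused.shape)  # (960, 56, 56)
```

Because the fused tensor is produced by fixed operations, any prediction layer placed on top of it is the only part that would carry parameters in this sketch.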
Performance comparison of state-of-the-art methods on LEVIR-CD. FLOPs and latency are computed using an RGB image size of 224×224. EoCD achieves superior performance in IoU, F1, and overall accuracy while showing favorable efficiency against existing approaches.
| Method | Backbone | Params (M) | FLOPs (G) | Latency (ms) | IoU (%) | F1 (%) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| EATDer | Custom | 7.12 | 21.30 | 26.2 | — | 91.20 | 98.75 |
| ELGCNet-LW | Custom | 6.78 | 15.17 | 24.5 | 82.36 | 90.33 | 99.03 |
| ChangeFormer | Custom | 41.03 | 106.00 | 26.6 | 82.48 | 90.40 | 99.04 |
| ConvFormer-CD/48 | Custom | 49.31 | 5.30 | 48.8 | 84.23 | 91.44 | 99.13 |
| RSMamba † | Mamba | 27.90 | 15.70 | — | 83.66 | 91.10 | — |
| CDMamba † | Mamba | 11.91 | 49.26 | 54.77 | 83.07 | 90.75 | 99.06 |
| BIT | ResNet-18 | 12.40 | 8.32 | 13.3 | 80.68 | 89.31 | 98.92 |
| STRobustNet | ResNet-18 | 13.73 | 19.32 | 12.7 | 83.66 | 91.11 | 99.10 |
| TMSF | ResNet-18 | 12.92 | 8.90 | 40.4 | 83.29 | 90.88 | — |
| FSG-Net | ResNet-18 | 13.76 | — | — | 83.94 | 91.27 | 99.10 |
| RHighNet † | ResNet-50 + ViT-B/16 | 120.80 | 69.47 | 98.9 | 84.01 | 91.31 | 99.13 |
| SFEARNet | SegFormer | 5.56 | 3.64 | 26.7 | 83.23 | 90.85 | 99.07 |
| DSFDcd | U-Net | 8.94 | — | — | 80.34 | 89.11 | 98.93 |
| EoCD (Ours) | mit-b1 | 13.37 | 2.49 | 8.1 | 83.20 | 90.83 | 99.08 |
| EoCD (Ours) | ResNet-34 | 21.50 | 4.39 | 3.8 | 83.33 | 90.91 | 99.09 |
| EoCD (Ours) | FocalNet-T | 30.32 | 6.46 | 12.1 | 84.78 | 91.76 | 99.17 |
† FLOPs and latency are reported at an image size of 256×256 due to model configuration constraints.
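Since the † rows are measured at a larger input, their FLOPs and latency are not directly comparable with the 224×224 rows. As a rough guide only, the ratio of pixel counts can be computed as below; how cost actually scales depends on the architecture (convolutional layers scale roughly with pixel count, while global self-attention can grow faster), so this is an illustration, not a conversion rule.

```python
def pixel_ratio(side_a, side_b):
    """Ratio of pixel counts between two square input sizes."""
    return (side_a * side_a) / (side_b * side_b)

ratio = pixel_ratio(256, 224)
print(round(ratio, 3))  # 1.306
```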
EoCD achieves superior performance across all metrics on the CDD-CD dataset.
| Method | IoU (%) | F1 (%) | Acc (%) |
|---|---|---|---|
| BIT | 80.01 | 88.90 | 97.47 |
| ChangeFormer | 81.53 | 89.83 | 97.68 |
| ChangeMamba | 81.99 | 90.10 | 97.72 |
| STRobustNet | 88.08 | 93.66 | 98.50 |
| ConvFormer-CD | 88.63 | 93.96 | 98.59 |
| CDMamba | 88.81 | 94.06 | 98.57 |
| DSFDcd | 88.81 | 94.06 | 98.57 |
| FSG-Net | 88.96 | 94.16 | 98.56 |
| TMSF | 90.44 | 94.98 | — |
| EATDer | — | 95.97 | 98.97 |
| ELGCNet-LW | 93.48 | 96.63 | 99.21 |
| RHighNet | 94.65 | 97.25 | 99.35 |
| EoCD (Ours) | 94.83 | 97.34 | 99.37 |
EoCD outperforms existing CD methods on SYSU-CD across all reported metrics.
| Method | IoU (%) | F1 (%) | Acc (%) |
|---|---|---|---|
| ChangeFormer | 60.60 | 75.46 | 89.20 |
| BIT | 61.40 | 76.08 | 88.95 |
| ConvFormer-CD/48 | 65.76 | 79.35 | 90.98 |
| ELGCNet | 66.62 | 79.97 | 90.72 |
| ChangeMamba | 66.39 | 79.80 | 90.85 |
| DSFDcd | 67.31 | 80.46 | 91.00 |
| RHighNet | 67.53 | 80.62 | 91.33 |
| STRobustNet | 67.59 | 80.66 | 91.13 |
| LCD-Net | 68.38 | 81.22 | — |
| EoCD (Ours) | 68.67 | 81.42 | 91.67 |
EoCD improves on all metrics on WHU-CD, indicating a better ability to capture semantic changes.
| Method | IoU (%) | F1 (%) | Accuracy (%) |
|---|---|---|---|
| BIT | 72.39 | 83.98 | 98.75 |
| ChangeFormer | 73.80 | 84.93 | 98.82 |
| ELGCNet | 80.86 | 89.42 | 99.20 |
| EATDer | — | 90.01 | 98.58 |
| TMSF | 80.09 | 88.95 | — |
| RSMamba | 84.96 | 91.87 | — |
| STRobustNet | 83.29 | 90.89 | 99.32 |
| RHighNet | 83.79 | 91.18 | 99.32 |
| ScratchFormer | 84.97 | 91.89 | 99.37 |
| ConvFormer-CD | 85.41 | 92.13 | 99.26 |
| SFEARNet | 85.81 | 92.36 | 99.38 |
| EoCD (Ours) | 87.17 | 93.15 | 99.47 |
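The IoU, F1, and accuracy columns in the tables above can be related through the confusion counts of a binary change map. The sketch below assumes the usual definitions with IoU and F1 computed for the change class; the evaluation protocol is not spelled out here, and the pixel counts are hypothetical. It also shows the identity F1 = 2·IoU / (1 + IoU), which is useful for sanity-checking a row.

```python
def change_metrics(tp, fp, fn, tn):
    """IoU, F1 (change class) and overall accuracy from confusion counts."""
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    acc = (tp + tn) / (tp + fp + fn + tn)
    return iou, f1, acc

def f1_from_iou(iou):
    """F1 implied by IoU when both are computed from the same counts."""
    return 2 * iou / (1 + iou)

# Hypothetical pixel counts for one predicted change map.
iou, f1, acc = change_metrics(tp=80, fp=10, fn=10, tn=900)
print(round(iou, 4), round(f1, 4), round(acc, 4))  # 0.8 0.8889 0.98

# Check against the EoCD row of the WHU-CD table (IoU 87.17, F1 93.15).
print(round(f1_from_iou(0.8717) * 100, 2))  # 93.15
```

Small mismatches in the second decimal place can occur for other rows, because the reported IoU values are already rounded.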
Qualitative comparison of EoCD with the BIT, ChangeFormer, and ELGC-Net CD methods. Rows one to four show samples from the LEVIR-CD, CDD-CD, SYSU-CD, and WHU-CD datasets, respectively. Compared to existing methods, our approach better detects the semantic changes highlighted in the yellow dotted boxes.
@article{noman2026eocd,
title = {EoCD: Encoder only Remote Sensing Change Detection},
author = {Noman, Mubashir and Fiaz, Mustansar and Debary, Hiyam
and Hannan, Abdul and Nawaz, Shah and Khan, Fahad Shahbaz
and Khan, Salman},
journal = {arXiv preprint arXiv:2602.05882},
year = {2026},
url = {https://arxiv.org/abs/2602.05882}
}