Squeeze-and-Excitation Networks (SENet) introduce a channel-wise attention mechanism that complements the spatial feature extraction of convolutional networks. The SE module consists of two operations: squeeze (global average pooling that summarizes each channel into a single descriptor) and excitation (two fully connected layers, with ReLU and sigmoid activations, that generate per-channel importance weights). Unlike Vision Transformers, which attend over spatial positions, SE modules assign importance weights to channels, on the assumption that each channel contributes differently to the final prediction. The implementation demonstrates integrating SE modules into a ResNeXt architecture and shows how the bottleneck design with reduction ratio r=16 balances accuracy against computational cost. Experimental results report consistent accuracy improvements across a range of CNN architectures with minimal computational overhead.
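The squeeze-and-excitation sequence described above can be sketched in plain NumPy. This is a minimal illustration, not the reference implementation: the function name `se_block`, the weight shapes, and the toy dimensions (C=32, 8x8 feature maps) are assumptions chosen for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a single feature map x of shape (C, H, W).

    w1: (C//r, C) weights of the first FC layer (dimensionality reduction)
    w2: (C, C//r) weights of the second FC layer (dimensionality restoration)
    Biases are omitted for brevity.
    """
    # Squeeze: global average pooling over spatial dims -> channel descriptor (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid -> per-channel weights in (0, 1)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Scale: reweight each channel of the original feature map
    return x * s[:, None, None]

# Toy example with C=32 channels and reduction ratio r=16 (bottleneck width 2)
rng = np.random.default_rng(0)
C, r = 32, 16
x = rng.standard_normal((C, 8, 8))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_block(x, w1, w2)
print(y.shape)  # (32, 8, 8) -- same shape as the input, channels rescaled
```

The bottleneck in the excitation step is what the reduction ratio controls: the first FC layer maps C channels down to C/r, so a larger r means fewer parameters at the cost of a lower-capacity gating function.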