Overview

Max pooling is a down-sampling technique commonly used in Convolutional Neural Networks (CNNs). The main objective of max pooling is to reduce the spatial dimensions (width and height) of the input feature maps while retaining the most important information. This operation helps in reducing computational cost, mitigating overfitting, and allowing the network to learn more abstract features.

How Max Pooling Works

Max pooling operates on a feature map by sliding a pooling window (typically 2x2) across the input with a chosen stride and taking the maximum value at each window position. Repeating this across the entire feature map produces a smaller output map.

Example:

Suppose we have a 4x4 feature map and apply a 2x2 max pooling operation with a stride of 2:

Input Feature Map:

1  3  2  4
5  6  7  8
9  8  4  3
2  1  5  7

After 2x2 Max Pooling:

6  8
9  7

In this example, the max pooling operation has reduced the 4x4 feature map to a 2x2 map.
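
The example above can be reproduced with a short NumPy sketch. This is a minimal illustration rather than a library implementation; the function name max_pool_2d and the plain nested loop are chosen for clarity.

import numpy as np

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [9, 8, 4, 3],
    [2, 1, 5, 7],
])

def max_pool_2d(x, window=2, stride=2):
    # Slide a window x window block over x and keep the maximum of each patch.
    out_h = (x.shape[0] - window) // stride + 1
    out_w = (x.shape[1] - window) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + window,
                      j * stride:j * stride + window]
            out[i, j] = patch.max()
    return out

print(max_pool_2d(feature_map))
# [[6 8]
#  [9 7]]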

Benefits of Max Pooling

  • Dimensionality Reduction: By reducing the size of the feature maps, max pooling decreases the computational load in subsequent layers of the network.
  • Translation Invariance: Max pooling provides a degree of local translation invariance, meaning small shifts in the input have little effect on the pooled feature map (illustrated in the sketch after this list).
  • Overfitting Mitigation: Shrinking the feature maps reduces the number of activations, and therefore the computations and parameters in subsequent layers, which helps prevent overfitting, especially in large networks.
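
To make the translation-invariance point concrete, here is a toy sketch with made-up values: shifting a bright pixel by one column keeps it inside the same 2x2 window, so the pooled output is unchanged.

import numpy as np

a = np.array([
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
b = np.roll(a, 1, axis=1)  # the bright pixel moves from column 0 to column 1

def pool_2x2(x):
    # Non-overlapping 2x2 max pooling via block reshaping.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

print(pool_2x2(a))  # [[9 0]
                    #  [0 0]]
print(pool_2x2(b))  # [[9 0]
                    #  [0 0]]  -- identical despite the one-pixel shift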

Common Variations

  • Average Pooling: Instead of taking the maximum value in each pooling window, average pooling computes the average value.
  • Global Max Pooling: In this variation, the pooling operation is applied to the entire feature map, producing a single value per feature map.
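
Both variations can be expressed in a few lines with PyTorch's built-in pooling layers (a sketch assuming PyTorch is available; the input values are arbitrary):

import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)  # (N, C, H, W)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)    # standard 2x2 max pooling
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)    # 2x2 average pooling
global_max = nn.AdaptiveMaxPool2d(output_size=1)    # global max pooling

print(max_pool(x).shape)    # torch.Size([1, 1, 2, 2])
print(avg_pool(x).shape)    # torch.Size([1, 1, 2, 2])
print(global_max(x).shape)  # torch.Size([1, 1, 1, 1])  -- one value per channel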

Practical Considerations

  • Pooling Window Size: Common choices are 2x2 and 3x3, but the window size can vary depending on the architecture and the size of the input feature maps.
  • Stride: The stride determines how the window moves across the feature map. A stride of 2 is common, but it can be adjusted based on the desired output size.
  • Padding: Sometimes, padding is applied to the input feature maps to maintain a specific output size.
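
These three choices jointly determine the output size: for an input of width W, the output width is floor((W + 2 * padding - window) / stride) + 1, and likewise for the height. A small PyTorch sketch with arbitrary example sizes:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)  # a 7x7 single-channel input

# floor((7 - 2) / 2) + 1 = 3
print(nn.MaxPool2d(kernel_size=2, stride=2)(x).shape)             # torch.Size([1, 1, 3, 3])

# floor((7 - 3) / 1) + 1 = 5
print(nn.MaxPool2d(kernel_size=3, stride=1)(x).shape)             # torch.Size([1, 1, 5, 5])

# floor((7 + 2*1 - 3) / 2) + 1 = 4
print(nn.MaxPool2d(kernel_size=3, stride=2, padding=1)(x).shape)  # torch.Size([1, 1, 4, 4])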

Applications in CNN Architectures

Max pooling is widely used in popular CNN architectures such as:

  • LeNet
  • AlexNet
  • VGGNet
  • ResNet

In these architectures, max pooling layers are typically inserted between convolutional layers to progressively reduce the spatial dimensions of the feature maps.
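
As a rough sketch of this pattern (a simplified, VGG-style stack, not the exact published architecture), each max pooling layer halves the spatial dimensions between groups of convolutions:

import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),   # 224 -> 112
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),   # 112 -> 56
)

x = torch.randn(1, 3, 224, 224)
print(features(x).shape)  # torch.Size([1, 128, 56, 56])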

Further Reading