API reference: C, C++

The pooling primitive performs forward or backward max or average pooling on 2D or 3D spatial data.

The pooling operation is defined by the following formulas. The formulas are shown only for 2D spatial data; they are straightforward to generalize to higher and lower dimensions. Variable names follow the standard Naming Conventions.

Forward

Max pooling:

\[ dst(n, c, oh, ow) = \max\limits_{kh, kw} \left( src(n, c, oh \cdot SH + kh - ph_0, ow \cdot SW +kw - pw_0) \right) \]

Average pooling:

\[ dst(n, c, oh, ow) = \frac{1}{DENOM} \sum\limits_{kh, kw} src(n, c, oh \cdot SH + kh - ph_0, ow \cdot SW +kw - pw_0) \]

where \(ph_0, pw_0\) are padding_l[0] and padding_l[1], respectively. The output spatial dimensions are calculated in the same way as for convolution.
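
As a rough illustration of the output size calculation, the sketch below computes one output extent from the input extent, kernel size, stride, and padding. It assumes the convolution-style formula without dilation, \(OH = (IH - KH + ph_0 + ph_1) / SH + 1\) with integer division, where \(ph_1\) stands for the padding at the end of the axis. The function name `pooled_extent` and its parameters are hypothetical and not part of the library API.

```cpp
#include <cassert>

// Output extent along one spatial axis, assuming the convolution-style
// formula without dilation: out = (in - kernel + pad_l + pad_r) / stride + 1.
// The name and signature are illustrative, not library identifiers.
inline int pooled_extent(int in, int kernel, int stride, int pad_l, int pad_r) {
    assert(stride > 0 && kernel > 0);
    const int span = in - kernel + pad_l + pad_r;
    assert(span >= 0); // the window must fit into the padded input
    return span / stride + 1; // integer division rounds down
}
```

For example, under this assumption a 112-wide input with a 3-wide kernel, stride 2, and padding 1 on both sides gives an output width of 56.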

Average pooling supports two algorithms:

mkldnn_pooling_avg_include_padding, in which case \(DENOM = KH \cdot KW\),
mkldnn_pooling_avg_exclude_padding, in which case \(DENOM\) equals the size of the overlap between the averaging window and the image.
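
The difference between the two denominators can be sketched with a naive reference loop over a single \(H \times W\) plane. This is not the library's implementation: `avg_pool_2d` and all of its parameters are hypothetical names, the plane is assumed to be stored row-major, and padded positions are assumed to contribute zero to the sum.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not a library call: average pooling over one H x W
// plane stored row-major. Padded positions contribute zero to the sum.
std::vector<float> avg_pool_2d(const std::vector<float> &src, int IH, int IW,
        int KH, int KW, int SH, int SW, int ph0, int pw0, int OH, int OW,
        bool include_padding) {
    assert(static_cast<int>(src.size()) == IH * IW);
    std::vector<float> dst(OH * OW, 0.f);
    for (int oh = 0; oh < OH; ++oh) {
        for (int ow = 0; ow < OW; ++ow) {
            // Window bounds in source coordinates, clipped to the image.
            const int h0 = std::max(oh * SH - ph0, 0);
            const int w0 = std::max(ow * SW - pw0, 0);
            const int h1 = std::min(oh * SH - ph0 + KH, IH);
            const int w1 = std::min(ow * SW - pw0 + KW, IW);
            float sum = 0.f;
            for (int h = h0; h < h1; ++h)
                for (int w = w0; w < w1; ++w)
                    sum += src[h * IW + w];
            // include_padding: DENOM = KH * KW
            // exclude_padding: DENOM = size of the window/image overlap
            const int denom
                    = include_padding ? KH * KW : (h1 - h0) * (w1 - w0);
            assert(denom > 0); // a window lying fully in padding is not handled
            dst[oh * OW + ow] = sum / denom;
        }
    }
    return dst;
}
```

With a 2x2 input, a 2x2 kernel, stride 2, and padding 1, every window overlaps exactly one image element, so the include-padding variant divides that element by 4 while the exclude-padding variant returns it unchanged.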

Difference Between Forward Training and Forward Inference

Max pooling requires workspace output for the mkldnn_forward_training propagation kind, and doesn't require it for mkldnn_forward_inference (see details below).

Backward

The backward propagation computes \(diff\_src(n, c, h, w)\) based on \(diff\_dst(n, c, h, w)\) and, in the case of max pooling, the workspace.
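
To make the role of the workspace concrete, the following sketch pairs a naive forward max pooling that records where each maximum was found with a backward pass that routes gradients to those positions. It is an illustration under stated assumptions, not the library's algorithm: the library's workspace format is opaque, whereas here it is a plain vector of flat source indices, padded positions are simply skipped, and `MaxPoolResult`, `max_pool_fwd`, and `max_pool_bwd` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative only: the library keeps its workspace in an opaque format.
// Here the workspace is a plain vector of flat source indices.
struct MaxPoolResult {
    std::vector<float> dst;
    std::vector<int> workspace; // argmax position in src for each dst element
};

// Forward max pooling over one H x W plane (row-major), training flavor.
MaxPoolResult max_pool_fwd(const std::vector<float> &src, int IH, int IW,
        int KH, int KW, int SH, int SW, int ph0, int pw0, int OH, int OW) {
    assert(static_cast<int>(src.size()) == IH * IW);
    MaxPoolResult r{std::vector<float>(OH * OW), std::vector<int>(OH * OW, -1)};
    for (int oh = 0; oh < OH; ++oh)
        for (int ow = 0; ow < OW; ++ow) {
            float best = std::numeric_limits<float>::lowest();
            int best_idx = -1;
            for (int kh = 0; kh < KH; ++kh)
                for (int kw = 0; kw < KW; ++kw) {
                    const int h = oh * SH + kh - ph0;
                    const int w = ow * SW + kw - pw0;
                    // Assumption: padded positions never win the maximum.
                    if (h < 0 || h >= IH || w < 0 || w >= IW) continue;
                    if (src[h * IW + w] > best) {
                        best = src[h * IW + w];
                        best_idx = h * IW + w;
                    }
                }
            r.dst[oh * OW + ow] = best;
            r.workspace[oh * OW + ow] = best_idx;
        }
    return r;
}

// Backward: each diff_dst value is routed to the source position that won.
std::vector<float> max_pool_bwd(const std::vector<float> &diff_dst,
        const std::vector<int> &workspace, int IH, int IW) {
    assert(diff_dst.size() == workspace.size());
    std::vector<float> diff_src(IH * IW, 0.f);
    for (std::size_t i = 0; i < diff_dst.size(); ++i)
        if (workspace[i] >= 0) diff_src[workspace[i]] += diff_dst[i];
    return diff_src;
}
```

The inference flavor of the forward pass would compute only `dst`, which is why no workspace is needed there.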

Implementation Details

General Notes

During training, max pooling requires a workspace on forward (mkldnn_forward_training) and backward passes to save indices where a maximum was found. The workspace format is opaque, and the indices cannot be restored from it. However, one can use backward pooling to perform up-sampling (used in some detection topologies).
A user can use the memory format tag mkldnn_format_tag_any for the dst memory descriptor when creating pooling forward propagation. The library then derives the appropriate format from the src memory descriptor, so the src memory descriptor itself must be fully defined. Similarly, a user can use the memory format tag mkldnn_format_tag_any for the diff_src memory descriptor when creating pooling backward propagation.

Data Type Support

The pooling primitive supports the following combinations of data types:

Propagation	Source / Destination	Accumulation data type
forward / backward	f32	f32
forward	f16	f16
forward	s8, u8, s32	s32

Warning: There might be hardware- and/or implementation-specific restrictions. Check the Implementation Limitations section below.
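
The reason for a wider accumulation type in the integer rows of the table can be seen in a small sketch: four u8 values of 200 already sum to 800, which does not fit into 8 bits. The helper `avg_u8_window` below is hypothetical, and its round-to-nearest step is an assumption of this sketch rather than documented library behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: average of a window of u8 values, summed in s32.
// Rounding to nearest is an assumption of this sketch, not documented behavior.
inline std::uint8_t avg_u8_window(const std::vector<std::uint8_t> &window) {
    assert(!window.empty());
    std::int32_t acc = 0; // wide accumulator: a u8 sum could exceed 255
    for (std::uint8_t v : window) acc += v;
    const std::int32_t denom = static_cast<std::int32_t>(window.size());
    return static_cast<std::uint8_t>((acc + denom / 2) / denom);
}
```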

Data Representation

Source, Destination, and Their Gradients

Like other CNN primitives, the pooling primitive expects data to be an \(N \times C \times H \times W\) tensor for 2D spatial data and an \(N \times C \times D \times H \times W\) tensor for 3D spatial data.

The pooling primitive is optimized for the following memory formats:

Spatial	Logical tensor	Data type	Implementations optimized for memory formats
2D	NCHW	f32	mkldnn_nchw (mkldnn_abcd), mkldnn_nhwc (mkldnn_acdb), optimized^
2D	NCHW	s32, s8, u8	mkldnn_nhwc (mkldnn_acdb), optimized^
3D	NCDHW	f32	mkldnn_ncdhw (mkldnn_abcde), mkldnn_ndhwc (mkldnn_acdeb), optimized^
3D	NCDHW	s32, s8, u8	mkldnn_ndhwc (mkldnn_acdeb), optimized^

Here optimized^ means the format that comes out of any preceding compute-intensive primitive.
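
The two plain 2D formats in the table differ only in how a logical \((n, c, h, w)\) index maps to a position in memory. The sketch below shows that mapping for dense, unpadded buffers. The function names are hypothetical, and the blocked formats covered by optimized^ are not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Flat offset of logical element (n, c, h, w) in a dense NCHW (abcd) buffer:
// the width index varies fastest.
inline std::size_t offset_nchw(std::size_t n, std::size_t c, std::size_t h,
        std::size_t w, std::size_t C, std::size_t H, std::size_t W) {
    assert(c < C && h < H && w < W);
    return ((n * C + c) * H + h) * W + w;
}

// Same logical element in a dense NHWC (acdb) buffer:
// the channel index varies fastest.
inline std::size_t offset_nhwc(std::size_t n, std::size_t c, std::size_t h,
        std::size_t w, std::size_t C, std::size_t H, std::size_t W) {
    assert(c < C && h < H && w < W);
    return ((n * H + h) * W + w) * C + c;
}
```

With \(C = 3\) and \(H = W = 2\), stepping from channel 0 to channel 1 moves 4 elements in NCHW but only 1 element in NHWC.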

Post-ops and Attributes

The pooling primitive doesn't support any post-ops or attributes.

Implementation Limitations

No primitive-specific limitations. Refer to Data Types for limitations related to data type support.

Performance Tips

N/A