1. Introduction

The Maxout activation function was proposed by Ian Goodfellow and his co-authors in the 2013 paper “Maxout Networks”. It applies several linear functions to the input and outputs the maximum of their values. Mathematically, it is represented as:

(1)   \begin{equation*}f(x) = \max(w_1 x + b_1, w_2 x + b_2, \dots, w_k x + b_k)\end{equation*}

where x is the input, and w_1, w_2, \dots, w_k and b_1, b_2, \dots, b_k are the weights and biases of the k linear functions.
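
To make Equation (1) concrete, the following minimal NumPy sketch evaluates a maxout unit with k = 3 linear pieces on a single two-dimensional input (the values of x, W, and b are arbitrary and chosen only for illustration):

import numpy as np

x = np.array([0.5, -1.0])                             # input vector
W = np.array([[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0]])   # one weight row per piece
b = np.array([0.0, 0.1, -0.2])                        # one bias per piece

z = W @ x + b       # the k linear outputs w_i x + b_i
f_x = np.max(z)     # maxout keeps the largest of them

print(z)            # [ 0.5  -0.15 -2.7 ]
print(f_x)          # 0.5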

2. Custom TensorFlow Implementation of Maxout

In TensorFlow, we can define a custom maxout implementation:

import tensorflow as tf
import numpy as np

def maxout(inputs, num_units, axis=None):
    shape = inputs.shape.as_list()
    if axis is None:
        axis = -1
    num_channels = shape[axis]
    if num_channels % num_units:
        raise ValueError('number of features({}) is not a multiple of num_units({})'
                         .format(num_channels, num_units))
    # Split the channel axis into num_units groups of num_channels // num_units values
    shape[axis] = -1
    shape += [num_channels // num_units]
    # Take the maximum within each group
    outputs = tf.reduce_max(tf.reshape(inputs, shape), axis=-1, keepdims=False)
    return outputs

The function takes three main arguments: inputs, num_units, and axis. The inputs parameter is the input tensor on which the maxout operation is applied. The num_units parameter specifies the number of output units of the maxout layer; each unit takes the maximum over a group of num_channels // num_units features, so the number of input features must be divisible by num_units. The axis parameter indicates the axis along which the max pooling operation is performed; if it is not specified, the function assumes that the last dimension corresponds to the channels or features and pools along it.

The function reshapes the input so that the channel axis is split into num_units groups of num_channels // num_units values each, and then applies tf.reduce_max along the new last axis. This calculates the maximum within each group, effectively performing the maxout operation.
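
As a quick sanity check on the shapes, we can call the function on a small random tensor (a minimal sketch assuming TensorFlow 2.x eager execution):

# 6 input features split into num_units=2 groups of 3
x = tf.random.normal((4, 6))
y = maxout(x, num_units=2)
print(y.shape)  # (4, 2): one maximum per group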

3. Comparison with Built-in TensorFlow Operations

Here, we use TensorFlow’s tf.reshape function to reshape the input tensor into a shape where the number of elements in each group is inferred. Then, we apply max pooling within each group to mimic the maxout operation:

# input_tensor is the example tensor of shape (2, 3, 6) defined in the full listing below

# Call the custom maxout function
output_tensor_custom = maxout(input_tensor, num_units=2, axis=-1)

# Reshape the input tensor to facilitate the maxout operation
input_tensor_reshaped = tf.reshape(input_tensor, (2, 3, -1))

# Apply tf.reduce_max within each group of 3 channels to mimic maxout
output_tensor_builtin = tf.reduce_max(tf.reshape(input_tensor_reshaped, (2, 3, -1, 3)), axis=-1)

Then, we use tf.reduce_max with axis=-1 to find the maximum value along the last dimension of the reshaped tensor, which performs the max pooling within each group.
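
The intermediate shapes make this easier to follow; the short trace below assumes the (2, 3, 6) example input used throughout the article:

grouped = tf.reshape(input_tensor, (2, 3, 2, 3))    # split the 6 channels into 2 groups of 3
print(grouped.shape)                                # (2, 3, 2, 3)
print(tf.reduce_max(grouped, axis=-1).shape)        # (2, 3, 2)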

4. Complete Code Comparison

Below is the entire code sample:

import tensorflow as tf
import numpy as np

def maxout(inputs, num_units, axis=None):
    shape = inputs.shape.as_list()
    if axis is None:
        axis = -1
    num_channels = shape[axis]
    if num_channels % num_units:
        raise ValueError('number of features({}) is not a multiple of num_units({})'
                         .format(num_channels, num_units))
    # Split the channel axis into num_units groups of num_channels // num_units values
    shape[axis] = -1
    shape += [num_channels // num_units]
    # Take the maximum within each group
    outputs = tf.reduce_max(tf.reshape(inputs, shape), axis=-1, keepdims=False)
    return outputs

input_tensor = tf.constant(np.random.randn(2, 3, 6))  # Example input tensor of shape (2, 3, 6)

# Custom maxout: 6 channels -> 2 units, each the maximum over a group of 3
output_tensor_custom = maxout(input_tensor, num_units=2, axis=-1)

# Mimic maxout with built-in ops: reshape into groups, then take the maximum within each group
input_tensor_reshaped = tf.reshape(input_tensor, (2, 3, -1))
output_tensor_builtin = tf.reduce_max(tf.reshape(input_tensor_reshaped, (2, 3, -1, 3)), axis=-1)

print("Output tensor from custom maxout:")
print(output_tensor_custom.numpy())

print("\nOutput tensor mimicking Maxout using tf.reduce_max:")
print(output_tensor_builtin.numpy())

if np.allclose(output_tensor_custom.numpy(), output_tensor_builtin.numpy()):
    print("\nCustom maxout function and tf.reduce_max produce the same output.")
else:
    print("\nCustom maxout function and tf.reduce_max produce different output.")

5. Conclusion

In this article, we discussed the maxout activation function. While it offers advantages such as improved efficiency and robustness, it also comes with higher computational cost and additional challenges in hyperparameter tuning.

We implemented and utilized the maxout activation function in the Python-based machine learning framework TensorFlow.
