User Guide
Standard Models

class network.Network(input_shape, device, gpu, track_dynamics=False)
Base class for all neural network modules.
Your network should also subclass this class.
Parameters:  input_shape (tuple) – input tensor shape
 device (torch.device or int) – GPU or CPU for training purposes
 gpu (bool) – true if GPU is enabled on the system, false otherwise
 track_dynamics (bool) – tracks the network dynamics during training (stores the input and output of every intermediate layer)

input_shape (tuple) – input tensor shape

layer_list (iterable) – an iterable of PyTorch torch.nn.modules.container.Sequential type layers

num_layers (int) – number of layers in the model

is_gpu (bool) – true if GPU is enabled on the system, false otherwise

device (torch.device or int) – GPU or CPU for training purposes

track_dynamics (bool) – tracks the network dynamics during training (stores the input and output of every intermediate layer)

criterion (callable) – loss function for the model

optimizer (torch.optim.Optimizer) – optimizer for training the model

metrics (str) – metric used for evaluating the performance of the model

add(layer_obj)
Adds the layer specified by the layer_obj argument to the model.
Parameters: layer_obj (glow.Layer) – object of the specific layer to be added

attach_evaluator(evaluator_obj)
Attaches an evaluator to the model. The evaluator is evaluated at every batch pass and obtains information-plane coordinates according to the criterion defined in evaluator_obj.
It appends evaluator_obj to evaluator_list, which contains all the evaluators attached to the model.
Parameters: evaluator_obj (glow.information_bottleneck.Estimator) – evaluator object with a defined criterion, which is evaluated over the dynamics of the training process

compile(optimizer='SGD', loss='cross_entropy', metrics=[], learning_rate=0.001, momentum=0.95, **kwargs)
Compiles the model by attaching the optimizer and loss function to it.
Parameters:  optimizer (torch.optim.Optimizer) – optimizer to be used during the training process
 loss (loss) – loss function for backpropagation
 metrics (list) – list of all performance metrics to be evaluated in the validation pass
 learning_rate (float, optional) – learning rate for the gradient descent step (default: 0.001)
 momentum (float, optional) – momentum for the optimizer variants that use it (default: 0.95)
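The learning_rate and momentum arguments of compile are handed to the chosen optimizer. As a rough illustration, assuming the default 'SGD' behaves like PyTorch-style SGD with momentum (no dampening, no Nesterov; an assumption, not confirmed by this page), one update on a single scalar parameter looks like the following sketch. The helper sgd_momentum_step is hypothetical.

```python
def sgd_momentum_step(param, grad, velocity, learning_rate=0.001, momentum=0.95):
    """One SGD-with-momentum step in the PyTorch formulation.

    velocity <- momentum * velocity + grad
    param    <- param - learning_rate * velocity
    """
    velocity = momentum * velocity + grad
    param = param - learning_rate * velocity
    return param, velocity


# Two steps on a scalar parameter with a constant gradient of 1.0
p, v = 1.0, 0.0
for _ in range(2):
    p, v = sgd_momentum_step(p, grad=1.0, velocity=v)
# After two steps the velocity is 1.95, so the second update is larger than the first
```

With momentum=0.95 the velocity keeps accumulating past gradients, which is why a small default learning rate such as 0.001 is paired with it.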

fit(x_train, y_train, batch_size, num_epochs, validation_split=0.2, show_plot=False)
Fits the model to a dataset passed as NumPy arrays (Keras-like pipeline).
Parameters:  x_train (numpy.ndarray) – training input dataset
 y_train (numpy.ndarray) – training ground-truth labels
 batch_size (int) – number of samples in one batch
 num_epochs (int) – number of epochs for training
 validation_split (float, optional) – proportion of the total dataset used for validation (default: 0.2)
 show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)
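To see how validation_split, batch_size and num_epochs interact, the sketch below computes the split sizes and the number of optimizer steps. It assumes the validation count is floored and that a trailing partial batch is kept (the default drop_last=False behaviour of torch.utils.data.DataLoader); how PyGlow rounds internally is not stated on this page. The helper split_and_batch_counts is hypothetical.

```python
import math


def split_and_batch_counts(num_samples, batch_size, validation_split=0.2):
    """Return (train_size, val_size, train_batches_per_epoch).

    Assumptions: the validation count is floored, and a trailing partial
    batch is kept rather than dropped.
    """
    val_size = int(num_samples * validation_split)
    train_size = num_samples - val_size
    train_batches = math.ceil(train_size / batch_size)
    return train_size, val_size, train_batches


# 60,000 samples, batch_size=64, default validation_split=0.2
train_size, val_size, batches = split_and_batch_counts(60000, 64)
total_steps = batches * 10  # optimizer steps over num_epochs=10
```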

fit_generator(train_loader, val_loader, num_epochs, show_plot=False)
Fits the model to a dataset supplied as DataLoader objects.
Parameters:  num_epochs (int) – number of epochs for training
 train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
 val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)
 show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)

forward(x)
Method defining the forward pass through the model.
This method must be overridden by your implementation and contain the logic of the forward pass through your model.
Parameters: x (torch.Tensor) – input tensor to the model
Returns: tuple containing:
 (torch.Tensor): output tensor of the network
 (iterable): list of hidden layer outputs for dynamics tracking purposes
Return type: (tuple)
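The contract of forward is to return a pair: the network output and the list of hidden-layer outputs used for dynamics tracking. The framework-free sketch below mimics that contract with plain Python callables standing in for layers. TinyNetwork is a hypothetical stand-in, not a PyGlow class; a real subclass would operate on torch.Tensor objects, and including the final output in the hidden list is a design choice made here for illustration.

```python
class TinyNetwork:
    """Hypothetical stand-in illustrating the (output, hidden_outputs) contract."""

    def __init__(self, layers):
        self.layer_list = list(layers)

    def forward(self, x):
        hidden_outputs = []
        for layer in self.layer_list:
            x = layer(x)
            hidden_outputs.append(x)  # record every intermediate output
        # In this sketch the last hidden output is also the network output
        return x, hidden_outputs


net = TinyNetwork([lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v]])
output, hidden = net.forward([1, 2, 3])
```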
Sequential

class network.Sequential(input_shape, gpu=False)
Keras-like Sequential model.
Parameters:  input_shape (tuple) – input tensor shape
 gpu (bool, optional) – if true, PyGlow will attempt to use the GPU; if false, the CPU is used (default: False)
IBSequential

class network.IBSequential(input_shape, gpu=False, track_dynamics=False, save_dynamics=False)
Keras-like Sequential model extended with Information Bottleneck functionality for analysing the dynamics of training.
Parameters:  input_shape (tuple) – input tensor shape
 gpu (bool, optional) – if true, PyGlow will attempt to use the GPU; if false, the CPU is used (default: False)
 track_dynamics (bool) – if true, tracks the input-hidden-output dynamics segment and allows evaluators to be attached to the model; if false, no dynamics are tracked
 save_dynamics (bool, optional) – if true, saves the dynamics of the whole training process into a distributed file (for efficiency)

evaluator_list (iterable) – list of glow.information_bottleneck.Estimator instances that stores the evaluators attached to the model

evaluated_dynamics (iterable) – list of evaluated dynamics-segment information-plane coordinates for each intermediate layer and each evaluator, averaged over batches for each epoch
 Shape:
 evaluated_dynamics has shape (N, E, L, 2) where:
 N: number of epochs
 E: number of evaluators
 L: number of layers with parameters (Flatten and Dropout excluded)
 and the last dimension, of size 2, stores the 2D information-plane coordinates
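To make the (N, E, L, 2) layout concrete, the sketch below builds a nested list with that shape, writes one hypothetical coordinate pair, and reads back a layer's trajectory across epochs. The placeholder values are invented for illustration, and treating the pair as (x, y) information-plane coordinates follows the description above rather than any confirmed storage format.

```python
N, E, L = 3, 2, 4  # epochs, evaluators, layers with parameters

# evaluated_dynamics[n][e][l] is a 2-element [x_coord, y_coord] pair
evaluated_dynamics = [
    [[[0.0, 0.0] for _ in range(L)] for _ in range(E)]
    for _ in range(N)
]

# Hypothetical value: the point for epoch 2, evaluator 1, layer 3
evaluated_dynamics[2][1][3] = [1.5, 0.7]


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Trajectory of layer 3 in the information plane as seen by evaluator 1
trajectory = [evaluated_dynamics[n][1][3] for n in range(N)]
```

Indexing by epoch first makes it cheap to extract one layer's path through the information plane, which is what is usually plotted.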
‘Models without Backprop’

class hsic.HSIC(input_shape, device, gpu, **kwargs)
The HSIC Bottleneck: Deep Learning without Back-Propagation.
Base class for all HSIC networks.
Your HSIC network should also subclass this class.
Parameters:  input_shape (tuple) – input tensor shape
 device (torch.device, optional) – GPU or CPU for training purposes
 gpu (bool) – true if GPU is enabled on the system, false otherwise

add(layer_obj, loss_criterion=None, regularize_coeff=0)
Adds the specified layer and loss criterion to the model. The criterion measures the objective between the layer’s current representation and the optimal representation (the representation with the minimum IB-based objective).
Parameters:  layer_obj (glow.Layer) – object of the specific layer to be added
 loss_criterion (glow.information_bottleneck.Estimator) – loss function for the added layer
 regularize_coeff (float) – regularization coefficient trading off generalization and compression in the IB-type objective

compile(loss_criterion=None, optimizer='SGD', regularize_coeff=100, learning_rate=0.001, momentum=0.95, **kwargs)
Compiles the HSIC network with a loss criterion (the criterion objective used as the loss function for intermediate representations).
Every layer that was added with ‘.add’ without a loss criterion automatically takes this loss criterion.
Parameters:  loss_criterion (glow.information_bottleneck.Estimator) – criterion function, which is an instance of glow.information_bottleneck.Estimator
 optimizer (torch.optim.Optimizer) – optimizer to be used during the training process for all the layers
 regularize_coeff (float) – trade-off parameter between generalization and compression according to IB-based theory
 learning_rate (float, optional) – learning rate for the gradient descent step (default: 0.001)
 momentum (float, optional) – momentum for the optimizer variants that use it (default: 0.95)
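The fallback rule in compile (layers added without a criterion inherit the compile-time one) can be pictured with this small sketch. LayerSpec and resolve_criteria are hypothetical names, and the criteria are plain strings standing in for Estimator instances; the sketch only models which criterion each layer ends up with, not how PyGlow stores it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerSpec:
    """Hypothetical record of what was passed to .add for one layer."""
    name: str
    loss_criterion: Optional[str] = None


def resolve_criteria(layers, compile_criterion):
    """Return the criterion each layer uses after compile.

    Layers added with an explicit loss_criterion keep it; the others
    fall back to the criterion passed to compile.
    """
    return {
        layer.name: layer.loss_criterion if layer.loss_criterion is not None else compile_criterion
        for layer in layers
    }


layers = [
    LayerSpec("dense_1"),
    LayerSpec("dense_2", loss_criterion="custom_hsic"),
    LayerSpec("dense_3"),
]
resolved = resolve_criteria(layers, compile_criterion="hsic_gaussian")
```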

forward(x)
Method defining the forward pass through the model.
This method must be overridden by your implementation and contain the logic of the forward pass through your model.
Parameters: x (torch.Tensor) – input tensor to the model
Returns: list of hidden layer outputs (objects of type torch.Tensor) that are detached from their previous layer’s gradients
Return type: (iterable)

pre_training_loop(num_epochs, train_loader, val_loader)
Pre-training phase in which hidden representations are learned using the HSIC training paradigm.
Parameters:  num_epochs (int) – number of epochs for the pre-training phase
 train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
 val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)

sequential_forward(x)
Sequentially calculates the output by treating the HSIC network as a sequential feedforward neural network; this is equivalent to the forward pass in standard models.
Parameters: x (torch.Tensor) – input tensor to the network
Returns: output of the sequential feedforward network
Return type: (torch.Tensor)
HSICSequential

class hsic.HSICSequential(input_shape, gpu=False, **kwargs)
Base implementation for HSIC networks.
This class builds instances of the multi-model sigma network described in the paper https://arxiv.org/abs/1908.01580.
Parameters:  input_shape (tuple) – input tensor shape
 gpu (bool, optional) – if true, PyGlow will attempt to use the GPU; if false, the CPU is used (default: False)
Layers

class layer.Layer(*args)
Base class for all layer modules.
Your layer should also subclass this class.

forward(x)
Overrides the PyTorch forward method and contains the logic of the forward pass through the custom layer.
Parameters: x (torch.Tensor) – input tensor to the layer
Returns: output tensor of the layer
Return type: y (torch.Tensor)

set_input(input_shape)
Takes input_shape and requires the user to define the attribute self.output_shape, which stores the output shape of the custom layer.
Parameters: input_shape (tuple) – shape of the input tensor that the layer expects to receive
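A custom layer must set self.output_shape inside set_input so that the next layer knows what it will receive. The sketch below shows that contract for a flatten-like layer on a plain-Python base class. StandInLayer and FlattenLike are hypothetical and do not subclass glow.layer.Layer; the assumption that input_shape excludes the batch dimension is also unconfirmed here.

```python
import math


class StandInLayer:
    """Hypothetical minimal base mirroring the set_input/forward contract."""

    def set_input(self, input_shape):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError


class FlattenLike(StandInLayer):
    def set_input(self, input_shape):
        # Assumption: input_shape excludes the batch dimension, e.g. (C, H, W)
        self.input_shape = tuple(input_shape)
        self.output_shape = (math.prod(self.input_shape),)

    def forward(self, x):
        # Recursively flatten a nested list into a single flat list
        if not isinstance(x, list):
            return [x]
        return [value for item in x for value in self.forward(item)]


layer = FlattenLike()
layer.set_input((1, 28, 28))
```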

Core

class core.Dense(output_dim, activation=None)
Bases: glow.layer.Layer
Class for a fully connected dense layer.
Parameters:  output_dim (int) – output dimension of the dense layer
 activation (str) – activation function to be used for the layer (default: None)

class core.Dropout(prob)
Bases: glow.layer.Layer
Class for the dropout layer: regularization using the noise stability of the output.
Parameters: prob (float) – probability with which each neuron of the previous layer is dropped
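As an illustration of what prob controls, here is a pure-Python sketch of inverted dropout, the scheme used by torch.nn.Dropout: during training each value is zeroed with probability prob and the survivors are scaled by 1 / (1 - prob), so the expected value is unchanged; at evaluation time the input passes through untouched. Whether core.Dropout wraps torch.nn.Dropout is an assumption, and this sketch rejects prob=1 for simplicity.

```python
import random


def inverted_dropout(values, prob, training=True, rng=None):
    """Zero each value with probability `prob` and rescale the survivors."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    if not training or prob == 0.0:
        return list(values)  # identity at evaluation time
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - prob)
    return [0.0 if rng.random() < prob else v * scale for v in values]


out = inverted_dropout([1.0] * 10, prob=0.5, rng=random.Random(0))
```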

class core.Flatten
Bases: glow.layer.Layer
Class for flattening the input shape.
Convolutional

class convolutional.Conv1d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)
Bases: convolutional._Conv
Convolutional layer of rank 1.
Parameters:  filters (int) – number of filters for the layer
 kernel_size (int) – size of the kernel used for the convolution operation
 stride (int) – stride of the kernel in the convolution operation
 padding (int, optional) – padding added to the input to handle edges while convolving (default: 0)
 dilation (int, optional) – dilation for the convolution operation (default: 1)
 activation (str) – activation function to be used for the layer (default: None)

class convolutional.Conv2d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)
Bases: convolutional._Conv
Convolutional layer of rank 2.
Parameters:  filters (int) – number of filters for the layer
 kernel_size (int) – size of the kernel used for the convolution operation
 stride (int) – stride of the kernel in the convolution operation
 padding (int, optional) – padding added to the input to handle edges while convolving (default: 0)
 dilation (int, optional) – dilation for the convolution operation (default: 1)
 activation (str) – activation function to be used for the layer (default: None)

class convolutional.Conv3d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)
Bases: convolutional._Conv
Convolutional layer of rank 3.
Parameters:  filters (int) – number of filters for the layer
 kernel_size (int) – size of the kernel used for the convolution operation
 stride (int) – stride of the kernel in the convolution operation
 padding (int, optional) – padding added to the input to handle edges while convolving (default: 0)
 dilation (int, optional) – dilation for the convolution operation (default: 1)
 activation (str) – activation function to be used for the layer (default: None)
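The spatial size produced by these layers follows the usual convolution arithmetic documented for torch.nn.Conv1d, Conv2d and Conv3d, applied independently to each spatial dimension. Assuming PyGlow delegates to those PyTorch modules and uses their channel-first layout (not confirmed on this page), a layer's output size can be computed with this sketch; conv_output_length is a hypothetical helper.

```python
def conv_output_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension.

    floor((length + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    """
    effective_kernel = dilation * (kernel_size - 1) + 1
    if length + 2 * padding < effective_kernel:
        raise ValueError("kernel is larger than the padded input")
    return (length + 2 * padding - effective_kernel) // stride + 1


# A 28x28 input through Conv2d(filters=32, kernel_size=3, stride=1)
height = conv_output_length(28, kernel_size=3, stride=1)
width = conv_output_length(28, kernel_size=3, stride=1)
output_shape = (32, height, width)  # (filters, H, W), batch dimension excluded
```

The same helper gives the per-dimension output length for Conv1d (one call) and Conv3d (three calls).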
Normalization

class normalization.BatchNorm1d(eps=1e-05, momentum=0.1)
Bases: normalization._BatchNorm
1D batch normalization layer.
See https://arxiv.org/abs/1502.03167 for more information on batch normalization.
Parameters:  eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
 momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)

class normalization.BatchNorm2d(eps=1e-05, momentum=0.1)
Bases: normalization._BatchNorm
2D batch normalization layer.
Parameters:  eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
 momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)

class normalization.BatchNorm3d(eps=1e-05, momentum=0.1)
Bases: normalization._BatchNorm
3D batch normalization layer.
Parameters:  eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
 momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
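The momentum argument here follows PyTorch's convention for running statistics, which differs from the optimizer sense of momentum: the new estimate is (1 - momentum) * running + momentum * batch_statistic, and momentum=None switches to a cumulative (simple) average over all batches seen. The pure-Python sketch below covers only the running-mean bookkeeping and the eps-stabilised normalization, assuming PyGlow forwards these arguments to the torch.nn.BatchNorm modules; RunningMean and normalize are hypothetical helpers.

```python
class RunningMean:
    """Running-mean update used by PyTorch-style batch normalization."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.value = 0.0  # PyTorch initialises running_mean to zeros
        self.num_batches = 0

    def update(self, batch_mean):
        self.num_batches += 1
        if self.momentum is None:
            factor = 1.0 / self.num_batches  # cumulative moving average
        else:
            factor = self.momentum  # exponential moving average
        self.value = (1.0 - factor) * self.value + factor * batch_mean
        return self.value


def normalize(values, eps=1e-5):
    """Normalize one feature over a batch (training mode, no affine parameters)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


ema = RunningMean(momentum=0.1)
ema.update(10.0)  # 0.9 * 0 + 0.1 * 10 -> 1.0

cma = RunningMean(momentum=None)
for m in (2.0, 4.0, 6.0):
    cma.update(m)  # simple average of all batch means so far -> 4.0
```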
Pooling
1D

class pooling.MaxPool1d(kernel_size, stride, padding=0, dilation=1)
Bases: pooling._Pooling1d
1D max pooling layer.
Parameters:  kernel_size (int) – size of kernel to be used for pooling operation
 stride (int) – stride for the kernel in pooling operations
 padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
 dilation (int, optional) – dilation for the pooling operation (default: 1)

class pooling.AvgPool1d(kernel_size, stride, padding=0)
Bases: pooling._Pooling1d
1D average pooling layer.
Parameters:  kernel_size (int) – size of kernel to be used for pooling operation
 stride (int) – stride for the kernel in pooling operations
 padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
2D

class pooling.MaxPool2d(kernel_size, stride, padding=0, dilation=1)
Bases: pooling._Pooling2d
2D max pooling layer.
Parameters:  kernel_size (int) – size of kernel to be used for pooling operation
 stride (int) – stride for the kernel in pooling operations
 padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
 dilation (int, optional) – dilation for the pooling operation (default: 1)

class pooling.AvgPool2d(kernel_size, stride, padding=0)
Bases: pooling._Pooling2d
2D average pooling layer.
Parameters:  kernel_size (int) – size of kernel to be used for pooling operation
 stride (int) – stride for the kernel in pooling operations
 padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
3D

class pooling.MaxPool3d(kernel_size, stride, padding=0, dilation=1)
Bases: pooling._Pooling3d
3D max pooling layer.
Parameters:  kernel_size (int) – size of kernel to be used for pooling operation
 stride (int) – stride for the kernel in pooling operations
 padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
 dilation (int, optional) – dilation for the pooling operation (default: 1)

class pooling.AvgPool3d(kernel_size, stride, padding=0)
Bases: pooling._Pooling3d
3D average pooling layer.
Parameters:  kernel_size (int) – size of kernel to be used for pooling operation
 stride (int) – stride for the kernel in pooling operations
 padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
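To illustrate what kernel_size, stride and padding do in the 1D case, here is a pure-Python sketch of max and average pooling over a single channel. It pads max pooling with -inf and average pooling with zeros (counting the padded zeros), matching the defaults of torch.nn.MaxPool1d and torch.nn.AvgPool1d; dilation is omitted for brevity, and whether PyGlow's pooling layers behave identically is an assumption. pool1d is a hypothetical helper.

```python
def pool1d(values, kernel_size, stride, padding=0, mode="max"):
    """1D pooling over one channel (no dilation, no ceil_mode)."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    pad_value = float("-inf") if mode == "max" else 0.0
    padded = [pad_value] * padding + list(values) + [pad_value] * padding
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window = padded[start:start + kernel_size]
        out.append(max(window) if mode == "max" else sum(window) / kernel_size)
    return out


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
max_out = pool1d(signal, kernel_size=2, stride=2)              # [3.0, 5.0, 4.0]
avg_out = pool1d(signal, kernel_size=2, stride=2, mode="avg")  # [2.0, 3.5, 2.0]
```

The 2D and 3D layers apply the same windowing along each spatial dimension.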
HSIC

class hsic_output.HSICoutput(output_dim, activation='softmax')
Bases: glow.layers.core.Dense
Class for the HSIC sigma network output layer. This class extends glow.layers.Dense with additional functionality for use in the HSIC sigma network.
Parameters:  output_dim (int) – output dimension of the HSIC output layer used after the pre-training phase
 activation (str, optional) – activation function to be used for the layer (default: softmax)
Information Bottleneck
Estimator

class estimator.Estimator(gpu, **kwargs)
Base class for all estimator modules.
Your estimator should also subclass this class.
This class implements functionality for estimating dependence criteria from information theory, such as mutual information. These estimates are then used to analyse the training dynamics of different architectures.
Parameters:  gpu (bool) – if true, all computation is carried out on the GPU; otherwise on the CPU
 **kwargs – keyword arguments that store parameters for the estimator

criterion(x, y)
Defines the criterion of the estimator; for example, the EDGE algorithm has mutual information as its criterion. Generally, the criterion is some measure of dependence or independence between x and y. In information theory, the most widely used criterion is the mutual information between the two arguments.
Parameters:  x (torch.Tensor) – first random variable
 y (torch.Tensor) – second random variable
Returns: calculated criterion of the two random variables ‘x’ and ‘y’
Return type: (torch.Tensor)
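As a concrete example of a criterion(x, y), the sketch below computes the plug-in (histogram) estimate of mutual information for discrete samples, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))], in nats. This is a simple stand-in chosen for clarity, not the EDGE estimator mentioned above, and it works on plain Python sequences rather than torch.Tensor objects.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


x = [0, 0, 1, 1]
same = mutual_information(x, x)                    # log 2, about 0.693 nats
independent = mutual_information(x, [0, 1, 0, 1])  # 0.0
```

Plug-in estimates like this are biased for continuous activations, which is one reason dedicated estimators such as EDGE exist.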

eval_dynamics_segment(dynamics_segment)
Processes the smallest segment of the dynamics and calculates coordinates using the defined criterion.
Parameters: dynamics_segment (iterable) – smallest segment of the dynamics of a batch, containing the input, hidden layer outputs and label as torch.Tensor objects
Returns: list of coordinates calculated according to the criterion, with length equal to len(dynamics_segment) - 2
Return type: (iterable)
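A return length of len(dynamics_segment) - 2 is what one gets from the layout [input, hidden_1, ..., hidden_k, label]: one coordinate pair per hidden output. The sketch below assumes each pair is (criterion(input, hidden), criterion(hidden, label)), the usual information-plane convention; that pairing and the toy overlap criterion are assumptions for illustration, not PyGlow's confirmed behaviour.

```python
def eval_dynamics_segment(dynamics_segment, criterion):
    """Return one (x-coordinate, y-coordinate) pair per hidden layer output.

    dynamics_segment: [input, hidden_1, ..., hidden_k, label]
    criterion: callable(a, b) -> float, e.g. a mutual-information estimate
    """
    if len(dynamics_segment) < 3:
        raise ValueError("segment needs an input, at least one hidden output and a label")
    x, *hidden, y = dynamics_segment
    return [(criterion(x, h), criterion(h, y)) for h in hidden]


def overlap(a, b):
    """Toy criterion for the example: fraction of positions where a and b agree."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


segment = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 1, 1, 1]]
coords = eval_dynamics_segment(segment, overlap)
```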

class estimator.HSIC(kernel, gpu=True, **kwargs)
Class for estimating the Hilbert-Schmidt Independence Criterion as done in the paper “The HSIC Bottleneck: Deep Learning without Back-Propagation”.
Parameters:  kernel (str) – kernel used for calculating the K matrix in the HSIC criterion
 gpu (bool) – if true, all computation is carried out on the GPU; otherwise on the CPU
 **kwargs – keyword arguments that store parameters for the HSIC criterion

criterion(x, y)
Defines the HSIC criterion.
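For reference, the biased empirical HSIC estimator used in the HSIC bottleneck paper is HSIC(X, Y) = (m - 1)^-2 tr(K_X H K_Y H), where K_X and K_Y are kernel matrices over m samples and H = I - (1/m) 11ᵀ is the centering matrix. The pure-Python sketch below uses a Gaussian kernel with bandwidth sigma; the kernel choice and the sigma parameter are assumptions about how the kernel argument and **kwargs might be used, not PyGlow's implementation.

```python
import math


def gaussian_kernel_matrix(samples, sigma=1.0):
    """K[i][j] = exp(-||s_i - s_j||^2 / (2 sigma^2)) for a list of vectors."""
    m = len(samples)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(samples[i], samples[j])) / (2 * sigma ** 2))
            for j in range(m)
        ]
        for i in range(m)
    ]


def center(k):
    """Return H K H with H = I - (1/m) 11^T (double centering)."""
    m = len(k)
    row_means = [sum(row) / m for row in k]
    col_means = [sum(k[i][j] for i in range(m)) / m for j in range(m)]
    grand_mean = sum(row_means) / m
    return [[k[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(m)] for i in range(m)]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: (m - 1)^-2 tr(K_x H K_y H)."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("x and y need the same number (>= 2) of samples")
    kx_centered = center(gaussian_kernel_matrix(x, sigma))
    ky = gaussian_kernel_matrix(y, sigma)
    # tr(K_x H K_y H) = tr(H K_x H K_y) = sum_ij (H K_x H)_ij (K_y)_ji
    trace = sum(kx_centered[i][j] * ky[j][i] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


x = [[0.0], [1.0], [2.0], [3.0]]
dependent = hsic(x, x)               # strictly positive
constant = hsic(x, [[5.0]] * 4)      # 0 up to rounding: a constant Y carries no information
```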
Preprocessing
Data Loading and Generation

class data_generator.DataGenerator
Class for implementing data generators and loaders.

prepare_numpy_data(x_train, y_train, batch_size, validation_split)
Converts a NumPy dataset into PyTorch DataLoader objects.
Parameters:  x_train (numpy.ndarray) – training input dataset
 y_train (numpy.ndarray) – training ground-truth labels
 batch_size (int) – number of samples in a single batch
 validation_split (float) – proportion of the total dataset used for validation
Returns:  train_loader (torch.utils.data.DataLoader) – training DataLoader with processed batches
 val_loader (torch.utils.data.DataLoader) – validation DataLoader with processed batches
Return type: (tuple)

set_dataset(X, y, batch_size, validation_split=0.2)
Converts a raw dataset into batched DataLoader objects for training and validation.
Parameters:  X (torch.Tensor) – input dataset
 y (torch.Tensor) – labels
 batch_size (int) – number of samples in a single batch
 validation_split (float) – proportion of the total dataset used for validation (default: 0.2)
Returns:  train_dataset (torch.utils.data.DataLoader) – training DataLoader with processed batches
 validation_dataset (torch.utils.data.DataLoader) – validation DataLoader with processed batches
Return type: (tuple)
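What prepare_numpy_data and set_dataset produce can be pictured without PyTorch: shuffle the sample indices, hold out a validation_split fraction, and cut each part into batches of batch_size. The sketch below does that on plain lists. The shuffling, the flooring of the validation size and the kept final partial batch are assumptions, the helper split_into_batches is hypothetical, and the real methods return torch.utils.data.DataLoader objects rather than lists.

```python
import random


def split_into_batches(X, y, batch_size, validation_split=0.2, seed=0):
    """Return (train_batches, val_batches); each batch is an (inputs, labels) pair."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # assumption: data are shuffled before splitting
    val_size = int(len(indices) * validation_split)
    val_idx, train_idx = indices[:val_size], indices[val_size:]

    def batches(idx):
        out = []
        for start in range(0, len(idx), batch_size):
            chunk = idx[start:start + batch_size]  # the last chunk may be smaller
            out.append(([X[i] for i in chunk], [y[i] for i in chunk]))
        return out

    return batches(train_idx), batches(val_idx)


X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
train_batches, val_batches = split_into_batches(X, y, batch_size=3)
```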
