User Guide¶

Standard Models¶

class network.Network(input_shape, device, gpu, track_dynamics=False)¶

Base class for all neural network modules.

Your network should also subclass this class.

Parameters:

input_shape (tuple) – input tensor shape
device (torch.device or int) – GPU or CPU for training purposes
gpu (bool) – true if GPU is enabled on the system, false otherwise
track_dynamics (bool) – tracks the NN dynamics during training (stores input-output for every intermediate layer)

input_shape¶: tuple – input tensor shape

layer_list¶: iterable – an iterable of layers of PyTorch type torch.nn.modules.container.Sequential

num_layers¶: int – number of layers in the model

is_gpu¶: bool – true if GPU is enabled on the system, false otherwise

device¶: torch.device or int – GPU or CPU for training purposes

track_dynamics¶: bool – tracks the NN dynamics during training (stores input-output for every intermediate layer)

criterion¶: callable – loss function for the model

optimizer¶: torch.optim.Optimizer – optimizer for training the model

metrics¶: str – metric to be used for evaluating performance of the model

add(layer_obj)¶

Adds the layer specified by the layer_obj argument to the model.

Parameters:	layer_obj (glow.Layer) – object of specific layer to be added

attach_evaluator(evaluator_obj)¶

Attaches an evaluator to the model. The evaluator is run on every batch pass and computes information plane coordinates according to the criterion defined in ‘evaluator_obj’.

It appends ‘evaluator_obj’ to evaluator_list, the list of all evaluators attached to the model.

Parameters:	evaluator_obj (glow.information_bottleneck.Estimator) – evaluator object which has a criterion defined; the criterion is evaluated on the dynamics of the training process

compile(optimizer='SGD', loss='cross_entropy', metrics=[], learning_rate=0.001, momentum=0.95, **kwargs)¶

Compiles the model by attaching an optimizer and a loss function to it.

Parameters:

optimizer (torch.optim.Optimizer) – optimizer to be used during training process
loss (loss) – loss function for back-propagation
metrics (list) – list of all performance metrics to be evaluated in the validation pass
learning_rate (float, optional) – learning rate for gradient descent step (default: 0.001)
momentum (float, optional) – momentum for different variants of optimizers (default: 0.95)
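
To make the roles of learning_rate and momentum concrete, the sketch below applies a momentum update to a single scalar parameter. It assumes the convention of torch.optim.SGD with zero dampening; whether PyGlow passes these values unchanged to the PyTorch optimizer is an assumption, and sgd_momentum_step is a hypothetical helper that is not part of the library.

```python
def sgd_momentum_step(param, grad, velocity, learning_rate=0.001, momentum=0.95):
    """One SGD-with-momentum step on a scalar (hypothetical helper).

    Follows the torch.optim.SGD convention with zero dampening:
        velocity <- momentum * velocity + grad
        param    <- param - learning_rate * velocity
    """
    velocity = momentum * velocity + grad
    param = param - learning_rate * velocity
    return param, velocity


# Two steps with a constant gradient of 1.0, starting from zero.
p, v = 0.0, 0.0
p, v = sgd_momentum_step(p, grad=1.0, velocity=v)
p, v = sgd_momentum_step(p, grad=1.0, velocity=v)
print(p, v)
```

With a constant gradient the velocity grows from step to step, so the second update is larger than the first; this is the acceleration effect that momentum provides.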

fit(x_train, y_train, batch_size, num_epochs, validation_split=0.2, show_plot=False)¶

Fits the model to a dataset passed as NumPy arrays (Keras-like pipeline).

Parameters:

x_train (numpy.ndarray) – training input dataset
y_train (numpy.ndarray) – training ground-truth labels
batch_size (int) – batch size of one batch
num_epochs (int) – number of epochs for training
validation_split (float, optional) – proportion of the total dataset to be used for validation (default: 0.2)
show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)

fit_generator(train_loader, val_loader, num_epochs, show_plot=False)¶

Fits the model using data loaders passed as arguments.

Parameters:

num_epochs (int) – number of epochs for training
train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)
show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)

forward(x)¶

Method for defining forward pass through the model.

This method needs to be overridden by your implementation, which must contain the logic of the forward pass through your model.

Parameters:	x (torch.Tensor) – input tensor to the model
Returns:	tuple containing:

(torch.Tensor) – output tensor of the network
(iterable) – list of hidden layer outputs for dynamics tracking purposes

Return type:	(tuple)
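
The return contract of forward (the output plus the list of hidden layer outputs) can be sketched without any framework. The example below uses plain callables and lists in place of torch modules and tensors; forward_with_tracking is a hypothetical name, and treating the final output as the last entry of the hidden list is an assumption rather than documented PyGlow behaviour.

```python
def forward_with_tracking(x, layers):
    """Apply layers in order and record every intermediate output.

    `layers` is any sequence of callables; in a real model these would be
    torch.nn modules and `x` a torch.Tensor.
    """
    hidden_outputs = []
    h = x
    for layer in layers:
        h = layer(h)
        hidden_outputs.append(h)
    # Assumption: the final output is also stored as the last hidden entry.
    return h, hidden_outputs


# Toy "layers" acting on a plain list of floats.
double = lambda v: [2.0 * a for a in v]
relu = lambda v: [max(0.0, a) for a in v]

output, hidden = forward_with_tracking([-1.0, 3.0], [double, relu])
print(output)  # final output of the stack
print(hidden)  # one entry per layer
```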

Sequential¶

class network.Sequential(input_shape, gpu=False)¶

Keras-like Sequential model.

Parameters:

input_shape (tuple) – input tensor shape
gpu (bool, optional) – if true then PyGlow will attempt to use GPU, for false CPU will be used (default: False)

IBSequential¶

class network.IBSequential(input_shape, gpu=False, track_dynamics=False, save_dynamics=False)¶

Keras-like Sequential model extended with Information Bottleneck functionality for analysing the dynamics of training.

Parameters:

input_shape (tuple) – input tensor shape
gpu (bool, optional) – if true then PyGlow will attempt to use GPU, for false CPU will be used (default: False)
track_dynamics (bool) – if true, tracks the input-hidden-output dynamics segment and allows evaluators to be attached to the model; if false, no dynamics are tracked
save_dynamics (bool, optional) – if true then saves the whole training process dynamics into a distributed file (for efficiency)

evaluator_list¶: iterable – list of glow.information_bottleneck.Estimator instances which stores the evaluators for the model

evaluated_dynamics¶: iterable – list of information plane coordinates evaluated on the dynamics segment, for each intermediate layer and each evaluator, averaged over batches for each epoch

Shape:

evaluated_dynamics has shape (N, E, L, 2) where:

N: Number of epochs
E: Number of evaluators
L: Number of layers with parameters (Flatten and Dropout excluded)

and the last dimension has size 2 and stores the 2-D information plane coordinates.
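
A nested list with the (N, E, L, 2) layout described above can be sliced to follow one layer across epochs. The sketch below uses placeholder zeros instead of real measurements, and layer_trajectory is a hypothetical helper introduced only for illustration.

```python
num_epochs, num_evaluators, num_layers = 3, 2, 4

# Placeholder data with the documented (N, E, L, 2) layout; each innermost
# pair is one point on the information plane.
dynamics = [
    [
        [(0.0, 0.0) for _ in range(num_layers)]
        for _ in range(num_evaluators)
    ]
    for _ in range(num_epochs)
]


def layer_trajectory(dynamics, evaluator_idx, layer_idx):
    """Collect one layer's coordinates across all epochs for one evaluator."""
    return [epoch[evaluator_idx][layer_idx] for epoch in dynamics]


trajectory = layer_trajectory(dynamics, evaluator_idx=0, layer_idx=3)
print(len(trajectory))  # one coordinate pair per epoch
```

Plotting such a trajectory with the first coordinate on the horizontal axis and the second on the vertical axis gives the usual information plane picture for that layer.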

‘Models without Back-prop’¶

class hsic.HSIC(input_shape, device, gpu, **kwargs)¶

The HSIC Bottleneck: Deep Learning without backpropagation.

Base class for all HSIC networks.

Your HSIC network should also subclass this class.

Parameters:

input_shape (tuple) – input tensor shape
device (torch.device, optional) –
gpu (bool) – true if GPU is enabled on the system, false otherwise

add(layer_obj, loss_criterion=None, regularize_coeff=0)¶

Adds the specified layer and its loss criterion to the model. The criterion measures the objective between the layer’s current representation and the optimal representation (the representation with minimum IB-based objective).

Parameters:

layer_obj (glow.Layer) – object of specific layer to be added
loss_criterion (glow.information_bottleneck.Estimator) – loss function for the layer which is added
regularize_coeff (float) – regularization coefficient between generalization and compression of IB type objective

compile(loss_criterion=None, optimizer='SGD', regularize_coeff=100, learning_rate=0.001, momentum=0.95, **kwargs)¶

Compiles the HSIC network with a loss criterion (the criterion objective used as the loss function for intermediate representations).

Every layer that was added with ‘.add’ without its own criterion automatically takes this loss criterion.

Parameters:

loss_criterion (glow.information_bottleneck.Estimator) – criterion function which is an instance of glow.information_bottleneck.Estimator
optimizer (torch.optim.Optimizer) – optimizer to be used during training process for all the layers
regularize_coeff (float) – trade-off parameter between generalization and compression according to IB-based theory
learning_rate (float, optional) – learning rate for gradient descent step (default: 0.001)
momentum (float, optional) – momentum for different variants of optimizers (default: 0.95)
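
The trade-off controlled by regularize_coeff can be illustrated with the per-layer objective from the HSIC bottleneck paper, HSIC(Z, X) - β·HSIC(Z, Y), where Z is a hidden representation. The sketch assumes regularize_coeff plays the role of β; that mapping is an assumption about PyGlow, and hsic_layer_objective is a hypothetical helper.

```python
def hsic_layer_objective(hsic_zx, hsic_zy, regularize_coeff=100.0):
    """IB-style objective for one hidden layer (to be minimised).

    hsic_zx: dependence between the hidden representation and the input
             (compression term, should become small).
    hsic_zy: dependence between the hidden representation and the label
             (prediction term, should become large).
    """
    return hsic_zx - regularize_coeff * hsic_zy


# A larger coefficient rewards dependence on the label more strongly.
print(hsic_layer_objective(0.5, 0.01))
print(hsic_layer_objective(0.5, 0.01, regularize_coeff=1.0))
```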

forward(x)¶

Method for defining forward pass through the model.

This method needs to be overridden by your implementation which contains the logic of the forward pass through your model.

Parameters:	x (torch.Tensor) – input tensor to the model
Returns:	list of hidden layer outputs (objects of type `torch.Tensor`) which are detached from their previous layer’s gradients
Return type:	(iterable)

pre_training_loop(num_epochs, train_loader, val_loader)¶

Pre-training phase in which hidden representations are learned using the HSIC training paradigm.

Parameters:

num_epochs (int) – number of epochs for pre-training phase
train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)

sequential_forward(x)¶

Sequentially computes the output by treating the HSIC network as a sequential feedforward neural network; this is equivalent to the forward pass in standard models.

Parameters:	x (torch.Tensor) – input tensor to the network
Returns:	output of the sequential feedforward network
Return type:	(torch.Tensor)

HSICSequential¶

class hsic.HSICSequential(input_shape, gpu=False, **kwargs)¶

Base implementation for HSIC networks.

This class creates instances of the multi-model sigma network described in the paper https://arxiv.org/abs/1908.01580 .

Parameters:

input_shape (tuple) – input tensor shape
gpu (bool, optional) – if true then PyGlow will attempt to use GPU, for false CPU will be used (default: False)

Layers¶

class layer.Layer(*args)¶

Base class for all layer modules.

Your layer should also subclass this class.

forward(x)¶

Overrides the PyTorch forward method and contains the logic for the forward pass through the custom layer.

Parameters:	x (torch.Tensor) – input tensor to the layer
Returns:	output tensor of the layer
Return type:	y (torch.Tensor)

set_input(input_shape)¶

Takes input_shape and requires the user to define the variable self.output_shape, which stores the output shape of the custom layer.

Parameters:	input_shape (tuple) – input shape of the tensor which the layer expects to receive
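
The set_input contract (receive input_shape, define self.output_shape) can be sketched with a stand-in class. FlattenLike below does not subclass the real glow Layer and works on nested Python lists instead of tensors; excluding the batch dimension from input_shape is an assumption.

```python
from math import prod


class FlattenLike:
    """Stand-in for a custom layer; it does NOT subclass the real glow Layer."""

    def set_input(self, input_shape):
        # Remember what the layer receives ...
        self.input_shape = tuple(input_shape)
        # ... and declare what it produces, so the next layer can be sized.
        self.output_shape = (prod(self.input_shape),)

    def forward(self, x):
        # Flatten an arbitrarily nested list into one flat list.
        if not isinstance(x, list):
            return [x]
        flat = []
        for item in x:
            flat.extend(self.forward(item))
        return flat


layer = FlattenLike()
layer.set_input((1, 2, 3))
print(layer.output_shape)
print(layer.forward([[[1, 2, 3], [4, 5, 6]]]))
```

Because every layer declares its output_shape up front, a model can chain layers by passing one layer's output_shape to the next layer's set_input.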

Core¶

class core.Dense(output_dim, activation=None)¶

Bases: glow.layer.Layer

Class for fully connected dense layer.

Parameters:

output_dim (int) – output dimension of the dense layer
activation (str) – activation function to be used for the layer (default: None)
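
What a fully connected layer computes for one input vector, activation(W·x + b), can be sketched in plain Python. The helper dense_forward and the activation name "relu" are assumptions for illustration; the real layer operates on torch tensors and creates its own weights.

```python
def dense_forward(x, weights, bias, activation=None):
    """Compute activation(W x + b) for one input vector.

    `weights` has one row per output unit, so len(weights) == output_dim.
    """
    out = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, bias)
    ]
    if activation == "relu":
        out = [max(0.0, v) for v in out]
    elif activation is not None:
        raise ValueError(f"unsupported activation in this sketch: {activation}")
    return out


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # output_dim = 3, input_dim = 2
b = [0.0, 0.0, -5.0]
print(dense_forward([1.0, 2.0], W, b, activation="relu"))
```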

class core.Dropout(prob)¶

Bases: glow.layer.Layer

Class for dropout layer - regularization using noise stability of output.

Parameters:	prob (float) – probability with which neurons in the previous layer are dropped
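
The effect of prob can be sketched with inverted dropout, the convention used by torch.nn.Dropout: each value is zeroed with probability prob and the survivors are rescaled by 1 / (1 - prob). That the PyGlow layer follows this convention is an assumption, and the dropout function below is a hypothetical pure-Python stand-in.

```python
import random


def dropout(values, prob, training=True, rng=None):
    """Zero each element with probability `prob`; rescale the survivors.

    Kept values are multiplied by 1 / (1 - prob) so the expected
    activation is unchanged. Outside training the input passes through.
    """
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    if not training or prob == 0.0:
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - prob)
    return [0.0 if rng.random() < prob else v * scale for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, prob=0.5, rng=random.Random(0)))
print(dropout(activations, prob=0.5, training=False))
```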

class core.Flatten¶

Bases: glow.layer.Layer

Class for flattening the input shape.

Convolutional¶

class convolutional.Conv1d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)¶

Bases: convolutional._Conv

Convolutional layer of rank 1.

Parameters:

filters (int) – number of filters for the layer
kernel_size (int) – size of kernel to be used for convolutional operation
stride (int) – stride for the kernel in convolutional operations
padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
dilation (int, optional) – dilation for the convolutional operation (default: 1)
activation (str) – activation function to be used for the layer (default: None)
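
How kernel_size, stride, padding and dilation determine the output size along one spatial axis can be computed with the formula documented for torch.nn.Conv1d. The helper name conv_output_length is hypothetical; that the PyGlow layers wrap the PyTorch convolutions without altering these semantics is an assumption.

```python
def conv_output_length(length, kernel_size, stride, padding=0, dilation=1):
    """Output size along one spatial axis (same formula as torch.nn.Conv1d)."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (length + 2 * padding - effective_kernel) // stride + 1


print(conv_output_length(28, kernel_size=4, stride=2, padding=1))
print(conv_output_length(10, kernel_size=3, stride=1, dilation=2))
```

For 2-D and 3-D convolutions the same computation applies independently to each spatial axis.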

class convolutional.Conv2d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)¶

Bases: convolutional._Conv

Convolutional layer of rank 2.

Parameters:

filters (int) – number of filters for the layer
kernel_size (int) – size of kernel to be used for convolutional operation
stride (int) – stride for the kernel in convolutional operations
padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
dilation (int, optional) – dilation for the convolutional operation (default: 1)
activation (str) – activation function to be used for the layer (default: None)

class convolutional.Conv3d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)¶

Bases: convolutional._Conv

Convolutional layer of rank 3.

Parameters:

filters (int) – number of filters for the layer
kernel_size (int) – size of kernel to be used for convolutional operation
stride (int) – stride for the kernel in convolutional operations
padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
dilation (int, optional) – dilation for the convolutional operation (default: 1)
activation (str) – activation function to be used for the layer (default: None)

Normalization¶

class normalization.BatchNorm1d(eps=1e-05, momentum=0.1)¶

Bases: normalization._BatchNorm

1-D batch normalization layer.

See https://arxiv.org/abs/1502.03167 for more information on batch normalization.

Parameters:

eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
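
The two behaviours of momentum (exponential moving average versus cumulative average when it is None) can be sketched for a single scalar statistic. The update rule follows the one documented for torch.nn.BatchNorm1d; update_running_stat and its num_batches_seen argument are hypothetical names.

```python
def update_running_stat(running, batch_stat, momentum=0.1, num_batches_seen=0):
    """Update a running mean or variance after one batch.

    With a float momentum this is an exponential moving average; with
    momentum=None it is the cumulative (simple) average over all batches.
    `num_batches_seen` counts batches processed before this one.
    """
    if momentum is None:
        factor = 1.0 / (num_batches_seen + 1)
    else:
        factor = momentum
    return (1.0 - factor) * running + factor * batch_stat


# Exponential moving average: the new batch contributes 10 %.
print(update_running_stat(0.0, 10.0, momentum=0.1))

# Cumulative average over three batch means.
running = 0.0
for seen, batch_mean in enumerate([2.0, 4.0, 6.0]):
    running = update_running_stat(running, batch_mean, None, seen)
print(running)
```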

class normalization.BatchNorm2d(eps=1e-05, momentum=0.1)¶

Bases: normalization._BatchNorm

2-D batch normalization layer.

Parameters:

eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)

class normalization.BatchNorm3d(eps=1e-05, momentum=0.1)¶

Bases: normalization._BatchNorm

3-D batch normalization layer.

Parameters:

eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)

Pooling¶

1-D¶

class pooling.MaxPool1d(kernel_size, stride, padding=0, dilation=1)¶

Bases: pooling._Pooling1d

1-D max pooling layer.

Parameters:

kernel_size (int) – size of kernel to be used for pooling operation
stride (int) – stride for the kernel in pooling operations
padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
dilation (int, optional) – dilation for the pooling operation (default: 1)

class pooling.AvgPool1d(kernel_size, stride, padding=0)¶

Bases: pooling._Pooling1d

1-D average pooling layer.

Parameters:

kernel_size (int) – size of kernel to be used for pooling operation
stride (int) – stride for the kernel in pooling operations
padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
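
The difference between max and average pooling can be shown on a plain list. The sketch below omits padding and dilation for brevity; pool1d, max_pool1d and avg_pool1d are hypothetical helper functions, not the layer classes documented here.

```python
def pool1d(values, kernel_size, stride, reduce_fn):
    """Slide a window over `values` and reduce each window (no padding)."""
    return [
        reduce_fn(values[start:start + kernel_size])
        for start in range(0, len(values) - kernel_size + 1, stride)
    ]


def max_pool1d(values, kernel_size, stride):
    return pool1d(values, kernel_size, stride, max)


def avg_pool1d(values, kernel_size, stride):
    return pool1d(values, kernel_size, stride, lambda w: sum(w) / len(w))


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
print(max_pool1d(signal, kernel_size=2, stride=2))
print(avg_pool1d(signal, kernel_size=2, stride=2))
```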

2-D¶

class pooling.MaxPool2d(kernel_size, stride, padding=0, dilation=1)¶

Bases: pooling._Pooling2d

2-D max pooling layer.

Parameters:

kernel_size (int) – size of kernel to be used for pooling operation
stride (int) – stride for the kernel in pooling operations
padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
dilation (int, optional) – dilation for the pooling operation (default: 1)

class pooling.AvgPool2d(kernel_size, stride, padding=0)¶

Bases: pooling._Pooling2d

2-D average pooling layer.

Parameters:

kernel_size (int) – size of kernel to be used for pooling operation
stride (int) – stride for the kernel in pooling operations
padding (int, optional) – padding for the image to handle edges while pooling (default: 0)

3-D¶

class pooling.MaxPool3d(kernel_size, stride, padding=0, dilation=1)¶

Bases: pooling._Pooling3d

3-D max pooling layer.

Parameters:

kernel_size (int) – size of kernel to be used for pooling operation
stride (int) – stride for the kernel in pooling operations
padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
dilation (int, optional) – dilation for the pooling operation (default: 1)

class pooling.AvgPool3d(kernel_size, stride, padding=0)¶

Bases: pooling._Pooling3d

3-D average pooling layer.

Parameters:

kernel_size (int) – size of kernel to be used for pooling operation
stride (int) – stride for the kernel in pooling operations
padding (int, optional) – padding for the image to handle edges while pooling (default: 0)

HSIC¶

class hsic_output.HSICoutput(output_dim, activation='softmax')¶

Bases: glow.layers.core.Dense

Class for HSIC sigma network output layer. This class extends the functionality of glow.layers.Dense with additional features needed by the HSIC sigma network.

Parameters:

output_dim (int) – output dimension of the HSIC output layer used after pre-training phase
activation (str, optional) – activation function to be used for the layer (default: softmax)

Information Bottleneck¶

Estimator¶

class estimator.Estimator(gpu, **kwargs)¶

Base class for all the estimator modules.

Your estimator should also subclass this class.

This class implements functionality for estimating different dependence criteria from information theory, such as mutual information. These methods are further used in analysing the training dynamics of different architectures.

Parameters:

gpu (bool) – if true then all the computation is carried on GPU else on CPU
**kwargs – the keyword that stores parameters for the estimators

criterion(x, y)¶

Defines the criterion of the estimator; for example, the EDGE algorithm has mutual information as its criterion. Generally, the criterion is some kind of dependence or independence measure between x and y. In the context of information theory, the most widely used criterion is the mutual information between the two arguments.

Parameters:

x (torch.Tensor) – first random variable
y (torch.Tensor) – second random variable
Returns:	calculated criterion of the two random variables ‘x’ and ‘y’
Return type:	(torch.Tensor)

eval_dynamics_segment(dynamics_segment)¶

Processes the smallest segment of dynamics and calculates coordinates using the defined criterion.

Parameters:	dynamics_segment (iterable) – smallest segment of the dynamics of a batch containing input, hidden layer output and label in form of `torch.Tensor` objects
Returns:	list of calculated coordinates according to the criterion with length equal to ‘len(dynamics_segment)-2’
Return type:	(iterable)
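
The bookkeeping behind the ‘len(dynamics_segment)-2’ result can be sketched as follows. The example assumes the segment is ordered as [input, hidden outputs..., label] and that each coordinate pairs the criterion against the input with the criterion against the label; both points are assumptions, and eval_segment with its toy criterion is purely illustrative.

```python
def eval_segment(dynamics_segment, criterion):
    """Return one (criterion(h, x), criterion(h, y)) pair per hidden output.

    Assumes the segment is ordered [input, hidden_1, ..., hidden_k, label].
    """
    x, *hidden_outputs, y = dynamics_segment
    return [(criterion(h, x), criterion(h, y)) for h in hidden_outputs]


# Toy criterion on plain numbers, used only to show the bookkeeping.
toy_criterion = lambda a, b: abs(a - b)

segment = [0.0, 1.0, 2.0, 3.0, 10.0]  # input, three hidden outputs, label
coords = eval_segment(segment, toy_criterion)
print(len(coords) == len(segment) - 2)
print(coords)
```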

class estimator.HSIC(kernel, gpu=True, **kwargs)¶

Class for estimating Hilbert-Schmidt Independence Criterion as done in paper “The HSIC Bottleneck: Deep Learning without Back-Propagation”.

Parameters:

kernel (str) – kernel which is used for calculating K matrix in HSIC criterion
gpu (bool) – if true then all the computation is carried on GPU else on CPU
**kwargs – the keyword that stores parameters for HSIC criterion

criterion(x, y)¶: Defines the HSIC criterion.
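
For orientation, the biased empirical HSIC estimator, trace(K H L H) / (m - 1)^2 with centering matrix H, can be written in a few lines of plain Python. This is a sketch under stated assumptions: a Gaussian kernel with a fixed sigma, samples given as lists of float vectors, and hypothetical function names. It is not the library's implementation, which works on torch tensors.

```python
import math


def gaussian_kernel_matrix(samples, sigma=1.0):
    """K[i][j] = exp(-||s_i - s_j||^2 / (2 sigma^2)) for a list of vectors."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [
        [math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2)) for b in samples]
        for a in samples
    ]


def center(matrix):
    """Return H M H with H = I - (1/m) 11^T (double centering)."""
    m = len(matrix)
    row_means = [sum(row) / m for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_means) / m
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + total_mean for j in range(m)]
        for i in range(m)
    ]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (m - 1)^2."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("x and y need the same number (>= 2) of samples")
    k_centered = center(gaussian_kernel_matrix(x, sigma))
    l_centered = center(gaussian_kernel_matrix(y, sigma))
    # trace(K H L H) = trace((HKH)(HLH)) because H is idempotent.
    trace = sum(
        k_centered[i][j] * l_centered[j][i] for i in range(m) for j in range(m)
    )
    return trace / (m - 1) ** 2


x = [[0.0], [1.0], [2.0], [3.0]]
constant = [[5.0]] * 4
print(hsic(x, x))         # strictly positive: x depends on itself
print(hsic(x, constant))  # zero: a constant carries no dependence
```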

Preprocessing¶

Data Loading and Generation¶

class data_generator.DataGenerator¶

Class for implementing data generators and loaders.

prepare_numpy_data(x_train, y_train, batch_size, validation_split)¶

Converts numpy type dataset into PyTorch data-loader type dataset.

Parameters:

x_train (numpy.ndarray) – training input dataset
y_train (numpy.ndarray) – training ground-truth labels
batch_size (int) – batch size of a single batch
validation_split (float) – proportion of the total dataset which is used for validation

Returns:

train_loader (torch.utils.data.DataLoader) – training data-loader with processed batches
val_loader (torch.utils.data.DataLoader) – validation data-loader with processed batches
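
The interplay of batch_size and validation_split can be sketched without NumPy or PyTorch. The example assumes the last fraction of the data is held out, no shuffling takes place, and a final smaller batch is kept; the library may behave differently on each point, and make_batches is a hypothetical helper.

```python
def make_batches(x, y, batch_size, validation_split):
    """Split paired samples into training and validation batch lists.

    Assumes the last `validation_split` fraction is held out and that no
    shuffling takes place; the library may do either differently.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n_val = round(len(x) * validation_split)
    n_train = len(x) - n_val
    pairs = list(zip(x, y))

    def batched(items):
        return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

    return batched(pairs[:n_train]), batched(pairs[n_train:])


x_train = list(range(10))
y_train = [v % 2 for v in x_train]
train_batches, val_batches = make_batches(x_train, y_train, 4, 0.2)
print([len(b) for b in train_batches])
print([len(b) for b in val_batches])
```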

set_dataset(X, y, batch_size, validation_split=0.2)¶

Converts raw dataset into processed batched dataset loaders for training and validation.

Parameters:

X (torch.Tensor) – input dataset
y (torch.Tensor) – labels
batch_size (int) – batch size of a single batch
validation_split (float) – proportion of the total dataset which is used for validation

Returns:

train_dataset (torch.utils.data.DataLoader) – training data-loader with processed batches
validation_dataset (torch.utils.data.DataLoader) – validation data-loader with processed batches