User Guide

Standard Models

class network.Network(input_shape, device, gpu, track_dynamics=False)

Base class for all neural network modules.

Your network should also subclass this class.

  • input_shape (tuple) – input tensor shape
  • device (torch.device or int) – GPU or CPU for training purposes
  • gpu (bool) – true if GPU is enabled on the system, false otherwise
  • track_dynamics (bool) – tracks the NN dynamics during training (stores input-output for every intermediate layer)

Attributes:
  • tuple – input tensor shape
  • iterable – an iterable of PyTorch torch.nn.modules.container.Sequential type layers
  • int – number of layers in the model
  • bool – true if GPU is enabled on the system, false otherwise
  • torch.device or int – GPU or CPU for training purposes
  • bool – tracks the NN dynamics during training (stores input-output for every intermediate layer)
  • callable – loss function for the model
  • torch.optim.Optimizer – optimizer for training the model
  • str – metric to be used for evaluating performance of the model


Adds the layer specified by the layer_obj argument to the model.

Parameters: layer_obj (glow.Layer) – object of the specific layer to be added

Attaches an evaluator to the model. The evaluator is evaluated on every batch pass and computes information-plane coordinates according to the criterion defined in ‘evaluator_obj’.

It appends ‘evaluator_obj’ to the list evaluator_list, which contains all evaluators attached to the model.

Parameters: evaluator_obj (glow.information_bottleneck.Estimator) – evaluator object whose criterion is evaluated to track the dynamics of the training process
compile(optimizer='SGD', loss='cross_entropy', metrics=[], learning_rate=0.001, momentum=0.95, **kwargs)

Compiles the model by attaching an optimizer and a loss function to it.

  • optimizer (torch.optim.Optimizer) – optimizer to be used during training process
  • loss (loss) – loss function for back-propagation
  • metrics (list) – list of all performance metrics to be evaluated in the validation pass
  • learning_rate (float, optional) – learning rate for gradient descent step (default: 0.001)
  • momentum (float, optional) – momentum for different variants of optimizers (default: 0.95)
fit(x_train, y_train, batch_size, num_epochs, validation_split=0.2, show_plot=False)

Fits the model to a dataset passed as NumPy arrays (Keras-like pipeline).

  • x_train (numpy.ndarray) – training input dataset
  • y_train (numpy.ndarray) – training ground-truth labels
  • batch_size (int) – number of samples in one batch
  • num_epochs (int) – number of epochs for training
  • validation_split (float, optional) – proportion of the total dataset to be used for validation (default: 0.2)
  • show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)
fit_generator(train_loader, val_loader, num_epochs, show_plot=False)

Fits the model to datasets provided as data loaders.

  • num_epochs (int) – number of epochs for training
  • train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
  • val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)
  • show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)

Method for defining forward pass through the model.

This method must be overridden by your implementation and contain the logic of the forward pass through your model.

Parameters: x (torch.Tensor) – input tensor to the model
Returns: tuple containing:
  • (torch.Tensor) – output tensor of the network
  • (iterable) – list of hidden layer outputs for dynamics tracking purposes
Return type: (tuple)


class network.Sequential(input_shape, gpu=False)

Keras-like Sequential model.

  • input_shape (tuple) – input tensor shape
  • gpu (bool, optional) – if true, PyGlow will attempt to use the GPU; if false, the CPU is used (default: False)


class network.IBSequential(input_shape, gpu=False, track_dynamics=False, save_dynamics=False)

Keras-like Sequential model extended with more sophisticated Information Bottleneck functionality for analysing the training dynamics.

  • input_shape (tuple) – input tensor shape
  • gpu (bool, optional) – if true, PyGlow will attempt to use the GPU; if false, the CPU is used (default: False)
  • track_dynamics (bool, optional) – if true, tracks the input-hidden-output dynamics segment and allows evaluators to be attached to the model; if false, no dynamics are tracked (default: False)
  • save_dynamics (bool, optional) – if true, saves the dynamics of the whole training process into a distributed file (for efficiency) (default: False)

Attributes:
  • iterable – list of glow.information_bottleneck.Estimator instances that stores the evaluators attached to the model
  • iterable – list of evaluated dynamics-segment information coordinates of the intermediate layers, for each evaluator, averaged over batches for each epoch

The coordinates list has shape (N, E, L, 2), where:
  • N: number of epochs
  • E: number of evaluators
  • L: number of layers with parameters (Flatten and Dropout excluded)
  • 2: the last dimension stores the 2-D information-plane coordinates
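The coordinates are described as averaged over batches for each epoch. The pure-Python sketch below shows that bookkeeping under stated assumptions: a plain arithmetic mean, and nested lists indexed [batch][evaluator][layer]. The function name average_epoch_coordinates is hypothetical and is not part of PyGlow.

```python
from typing import List, Tuple

Coord = Tuple[float, float]  # one 2-D information-plane point


def average_epoch_coordinates(batch_coords: List[List[List[Coord]]]) -> List[List[Coord]]:
    """Average batch_coords[b][e][l] over the batch index b, giving an (E, L, 2) nested list."""
    num_batches = len(batch_coords)
    num_evaluators = len(batch_coords[0])
    num_layers = len(batch_coords[0][0])
    epoch: List[List[Coord]] = []
    for ev in range(num_evaluators):
        per_layer: List[Coord] = []
        for layer in range(num_layers):
            first = sum(batch_coords[b][ev][layer][0] for b in range(num_batches)) / num_batches
            second = sum(batch_coords[b][ev][layer][1] for b in range(num_batches)) / num_batches
            per_layer.append((first, second))
        epoch.append(per_layer)
    return epoch


# Appending one averaged entry per epoch builds the (N, E, L, 2) structure.
history: List[List[List[Coord]]] = []
history.append(average_epoch_coordinates([
    [[(1.0, 0.5), (0.8, 0.4)]],  # batch 0: 1 evaluator, 2 layers
    [[(3.0, 1.5), (1.2, 0.6)]],  # batch 1
]))
```

After one epoch, history[0][0] holds the averaged points (2.0, 1.0) and (1.0, 0.5) for the two layers.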

Models without Back-prop

class hsic.HSIC(input_shape, device, gpu, **kwargs)

The HSIC Bottleneck: Deep Learning without Back-Propagation.

Base class for all HSIC networks.

Your HSIC network should also subclass this class.

  • input_shape (tuple) – input tensor shape
  • device (torch.device, optional) – GPU or CPU for training purposes
  • gpu (bool) – true if GPU is enabled on the system, false otherwise
add(layer_obj, loss_criterion=None, regularize_coeff=0)

Adds the specified layer and its loss criterion to the model. The criterion measures the objective between the layer’s current representation and the optimal representation (the representation with the minimum IB-based objective).

  • layer_obj (glow.Layer) – object of specific layer to be added
  • loss_criterion (glow.information_bottleneck.Estimator) – loss function for the layer which is added
  • regularize_coeff (float) – regularization coefficient trading off generalization and compression in the IB-type objective (default: 0)
compile(loss_criterion=None, optimizer='SGD', regularize_coeff=100, learning_rate=0.001, momentum=0.95, **kwargs)

Compiles the HSIC network with a loss criterion (the criterion objective used as the loss function for intermediate representations).

Any layer that was added with ‘.add’ without a loss criterion automatically uses this loss criterion.

  • loss_criterion (glow.information_bottleneck.Estimator) – criterion function which is an instance of glow.information_bottleneck.Estimator
  • optimizer (torch.optim.Optimizer) – optimizer to be used during training process for all the layers
  • regularize_coeff (float) – trade-off parameter between generalization and compression according to IB-based theory
  • learning_rate (float, optional) – learning rate for gradient descent step (default: 0.001)
  • momentum (float, optional) – momentum for different variants of optimizers (default: 0.95)
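For orientation, the objective from “The HSIC Bottleneck: Deep Learning without Back-Propagation” is written below. It assumes that regularize_coeff plays the role of the trade-off coefficient β. X is the network input, Y the labels, and Z_i the representation of hidden layer i:

```latex
Z_i^{*} = \arg\min_{Z_i} \; \mathrm{HSIC}(Z_i, X) \;-\; \beta \, \mathrm{HSIC}(Z_i, Y)
```

The first term rewards compression (little dependence on the input). The second term, weighted by β, rewards dependence on the labels. Under this reading, a larger regularize_coeff favours generalization over compression.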

Method for defining forward pass through the model.

This method needs to be overridden by your implementation which contains the logic of the forward pass through your model.

Parameters: x (torch.Tensor) – input tensor to the model
Returns: list of hidden layer outputs (objects of type torch.Tensor) which are detached from their previous layer’s gradients
Return type: (iterable)
pre_training_loop(num_epochs, train_loader, val_loader)

Pre-training phase in which hidden representations are learned using the HSIC training paradigm.

  • num_epochs (int) – number of epochs for pre-training phase
  • train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
  • val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)

Sequentially computes the output by treating the HSIC network as a sequential feedforward neural network; this is equivalent to the forward pass in standard models.

Parameters: x (torch.Tensor) – input tensor to the network
Returns: output of the sequential feedforward network
Return type: (torch.Tensor)


class hsic.HSICSequential(input_shape, gpu=False, **kwargs)

Base implementation for HSIC networks.

This class creates instances of the multi-model sigma network described in the paper.

  • input_shape (tuple) – input tensor shape
  • gpu (bool, optional) – if true, PyGlow will attempt to use the GPU; if false, the CPU is used (default: False)


class layer.Layer(*args)

Base class for all layer modules.

Your layer should also subclass this class.


Forward method that overrides the PyTorch forward method and contains the logic of the forward pass through the custom layer.

Parameters: x (torch.Tensor) – input tensor to the layer
Returns: output tensor of the layer
Return type: y (torch.Tensor)

Takes input_shape and requires the user to define the attribute self.output_shape, which stores the output shape of the custom layer.

Parameters: input_shape (tuple) – input shape of the tensor which the layer expects to receive


class core.Dense(output_dim, activation=None)

Bases: glow.layer.Layer

Class for the fully connected (dense) layer.

  • output_dim (int) – output dimension of the dense layer
  • activation (str) – activation function to be used for the layer (default: None)
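A dense layer computes activation(W x + b) for each sample. The single-sample, pure-Python sketch below illustrates that computation. The activation strings in ACTIVATIONS are hypothetical, because the set of names PyGlow accepts is not listed here. With activation=None the layer is purely affine.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical activation table; the exact strings PyGlow accepts are not documented above.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda v: max(0.0, v),
    "sigmoid": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "tanh": math.tanh,
}


def dense_forward(x: Sequence[float], weights: Sequence[Sequence[float]],
                  bias: Sequence[float], activation: Optional[str] = None) -> List[float]:
    """Compute activation(W x + b) for one sample; W has output_dim rows of len(x) weights."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    if activation is None:  # default: no activation, i.e. an affine layer
        return z
    return [ACTIVATIONS[activation](v) for v in z]


out = dense_forward([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0], "relu")
# out == [1.0, 2.0, 0.0]  (output_dim = 3)
```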
class core.Dropout(prob)

Bases: glow.layer.Layer

Class for the dropout layer: regularization through the noise stability of the output.

Parameters: prob (float) – probability with which neurons in the previous layer are dropped
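The sketch below shows the effect of prob on a list of activations. It assumes the layer follows the torch.nn.Dropout convention, known as inverted dropout. During training, each element is zeroed with probability prob and the survivors are scaled by 1/(1 − prob) so the expected value is unchanged. At evaluation time the layer is the identity. The function name dropout is illustrative only.

```python
import random
from typing import List, Optional, Sequence


def dropout(values: Sequence[float], prob: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero with probability prob, rescale the kept values by 1 / (1 - prob)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    if not training or prob == 0.0:
        return list(values)  # identity outside training
    if prob == 1.0:
        return [0.0] * len(values)  # everything is dropped
    rng = rng if rng is not None else random.Random()
    scale = 1.0 / (1.0 - prob)
    return [0.0 if rng.random() < prob else v * scale for v in values]


out = dropout([1.0] * 8, prob=0.5, rng=random.Random(0))
# every entry is either 0.0 (dropped) or 2.0 (kept and rescaled by 1 / (1 - 0.5))
```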
class core.Flatten

Bases: glow.layer.Layer

Class for flattening the input shape.


class convolutional.Conv1d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)

Bases: convolutional._Conv

Convolutional layer of rank 1.

  • filters (int) – number of filters for the layer
  • kernel_size (int) – size of kernel to be used for convolutional operation
  • stride (int) – stride for the kernel in convolutional operations
  • padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
  • dilation (int, optional) – dilation for the convolutional operation (default: 1)
  • activation (str) – activation function to be used for the layer (default: None)
class convolutional.Conv2d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)

Bases: convolutional._Conv

Convolutional layer of rank 2.

  • filters (int) – number of filters for the layer
  • kernel_size (int) – size of kernel to be used for convolutional operation
  • stride (int) – stride for the kernel in convolutional operations
  • padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
  • dilation (int, optional) – dilation for the convolutional operation (default: 1)
  • activation (str) – activation function to be used for the layer (default: None)
class convolutional.Conv3d(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)

Bases: convolutional._Conv

Convolutional layer of rank 3.

  • filters (int) – number of filters for the layer
  • kernel_size (int) – size of kernel to be used for convolutional operation
  • stride (int) – stride for the kernel in convolutional operations
  • padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
  • dilation (int, optional) – dilation for the convolutional operation (default: 1)
  • activation (str) – activation function to be used for the layer (default: None)
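The parameters kernel_size, stride, padding and dilation together determine the output size of each spatial dimension. The sketch below assumes these layers share the output-size arithmetic of torch.nn.Conv1d/2d/3d:

floor((length + 2·padding − dilation·(kernel_size − 1) − 1) / stride) + 1

The formula is applied independently to every spatial dimension. The helper name conv_output_length is hypothetical.

```python
def conv_output_length(length: int, kernel_size: int, stride: int,
                       padding: int = 0, dilation: int = 1) -> int:
    """Output size along one spatial dimension, using floor division as in torch.nn.ConvNd."""
    effective_kernel = dilation * (kernel_size - 1) + 1  # span covered by a dilated kernel
    padded = length + 2 * padding
    if padded < effective_kernel:
        raise ValueError("kernel is larger than the padded input")
    return (padded - effective_kernel) // stride + 1


# Conv2d on a 28x28 input with kernel_size=5, stride=1: each spatial dimension shrinks to 24.
shape_2d = tuple(conv_output_length(d, kernel_size=5, stride=1) for d in (28, 28))
```

Such a computation is also what a custom layer would need to fill in self.output_shape.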


class normalization.BatchNorm1d(eps=1e-05, momentum=0.1)

Bases: normalization._BatchNorm

1-D batch normalization layer.

See for more information on batch normalization.

  • eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
  • momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
class normalization.BatchNorm2d(eps=1e-05, momentum=0.1)

Bases: normalization._BatchNorm

2-D batch normalization layer.

  • eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
  • momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
class normalization.BatchNorm3d(eps=1e-05, momentum=0.1)

Bases: normalization._BatchNorm

3-D batch normalization layer.

  • eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
  • momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
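The sketch below illustrates how eps and momentum enter a single training step for one feature. It assumes these layers wrap torch.nn.BatchNorm1d/2d/3d and share their semantics:

  • Normalization uses the biased batch variance, with eps added inside the square root.
  • running_var stores the unbiased variance.
  • The running update is running = (1 − momentum)·running + momentum·batch_statistic.
  • momentum=None switches to a cumulative average with factor 1/num_batches_tracked.

The learnable affine scale and shift are omitted, and the name batch_norm_step is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def batch_norm_step(batch: Sequence[float], running_mean: float, running_var: float,
                    num_batches_tracked: int, eps: float = 1e-5,
                    momentum: Optional[float] = 0.1) -> Tuple[List[float], float, float, int]:
    """One training-mode batch-norm step for a single feature (no affine parameters)."""
    n = len(batch)
    if n < 2:
        raise ValueError("need at least two values per batch")
    mean = sum(batch) / n
    biased_var = sum((v - mean) ** 2 for v in batch) / n  # used to normalise the batch
    unbiased_var = biased_var * n / (n - 1)  # stored in the running estimate
    normalised = [(v - mean) / math.sqrt(biased_var + eps) for v in batch]
    num_batches_tracked += 1
    # momentum=None -> cumulative moving average (simple average over all batches seen)
    factor = 1.0 / num_batches_tracked if momentum is None else momentum
    running_mean = (1.0 - factor) * running_mean + factor * mean
    running_var = (1.0 - factor) * running_var + factor * unbiased_var
    return normalised, running_mean, running_var, num_batches_tracked


out, rm, rv, tracked = batch_norm_step([1.0, 2.0, 3.0, 4.0], running_mean=0.0,
                                       running_var=1.0, num_batches_tracked=0)
# rm == 0.25 (0.1 * 2.5); rv is about 1.0667 (0.9 * 1.0 + 0.1 * 5/3)
```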



class pooling.MaxPool1d(kernel_size, stride, padding=0, dilation=1)

Bases: pooling._Pooling1d

1-D max pooling layer.

  • kernel_size (int) – size of kernel to be used for pooling operation
  • stride (int) – stride for the kernel in pooling operations
  • padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
  • dilation (int, optional) – dilation for the pooling operation (default: 1)
class pooling.AvgPool1d(kernel_size, stride, padding=0)

Bases: pooling._Pooling1d

1-D average pooling layer.

  • kernel_size (int) – size of kernel to be used for pooling operation
  • stride (int) – stride for the kernel in pooling operations
  • padding (int, optional) – padding for the image to handle edges while pooling (default: 0)


class pooling.MaxPool2d(kernel_size, stride, padding=0, dilation=1)

Bases: pooling._Pooling2d

2-D max pooling layer.

  • kernel_size (int) – size of kernel to be used for pooling operation
  • stride (int) – stride for the kernel in pooling operations
  • padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
  • dilation (int, optional) – dilation for the pooling operation (default: 1)
class pooling.AvgPool2d(kernel_size, stride, padding=0)

Bases: pooling._Pooling2d

2-D average pooling layer.

  • kernel_size (int) – size of kernel to be used for pooling operation
  • stride (int) – stride for the kernel in pooling operations
  • padding (int, optional) – padding for the image to handle edges while pooling (default: 0)


class pooling.MaxPool3d(kernel_size, stride, padding=0, dilation=1)

Bases: pooling._Pooling3d

3-D max pooling layer.

  • kernel_size (int) – size of kernel to be used for pooling operation
  • stride (int) – stride for the kernel in pooling operations
  • padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
  • dilation (int, optional) – dilation for the pooling operation (default: 1)
class pooling.AvgPool3d(kernel_size, stride, padding=0)

Bases: pooling._Pooling3d

3-D average pooling layer.

  • kernel_size (int) – size of kernel to be used for pooling operation
  • stride (int) – stride for the kernel in pooling operations
  • padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
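The pure-Python sketch below contrasts the 1-D max and average pooling layers. It assumes PyTorch semantics:

  • MaxPool pads implicitly with −inf, so padded positions never win the max.
  • AvgPool pads with zeros that count toward the mean (count_include_pad=True, ceil_mode=False).

Only MaxPool accepts dilation, matching the signatures above. The 2-D and 3-D variants apply the same windowing along each spatial dimension. Function names are illustrative.

```python
import math
from typing import List, Sequence


def max_pool1d(xs: Sequence[float], kernel_size: int, stride: int,
               padding: int = 0, dilation: int = 1) -> List[float]:
    """1-D max pooling; padded positions hold -inf so they never win the max."""
    padded = [-math.inf] * padding + list(xs) + [-math.inf] * padding
    span = dilation * (kernel_size - 1) + 1  # width covered by a dilated window
    return [max(padded[start + i * dilation] for i in range(kernel_size))
            for start in range(0, len(padded) - span + 1, stride)]


def avg_pool1d(xs: Sequence[float], kernel_size: int, stride: int,
               padding: int = 0) -> List[float]:
    """1-D average pooling; zero padding counts toward the mean."""
    padded = [0.0] * padding + list(xs) + [0.0] * padding
    return [sum(padded[start:start + kernel_size]) / kernel_size
            for start in range(0, len(padded) - kernel_size + 1, stride)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0]
max_out = max_pool1d(signal, kernel_size=2, stride=2)  # [3.0, 5.0, 4.0]
avg_out = avg_pool1d(signal, kernel_size=2, stride=2)  # [2.0, 3.5, 2.0]
```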


class hsic_output.HSICoutput(output_dim, activation='softmax')

Bases: glow.layers.core.Dense

Class for the HSIC sigma network output layer. This class extends glow.layers.Dense with additional features needed for HSIC sigma networks.

  • output_dim (int) – output dimension of the HSIC output layer used after pre-training phase
  • activation (str, optional) – activation function to be used for the layer (default: softmax)

Information Bottleneck


class estimator.Estimator(gpu, **kwargs)

Base class for all the estimator modules.

Your estimator should also subclass this class.

This class implements functionality for estimating dependence criteria from information theory, such as mutual information. These estimates are further used to analyse the training dynamics of different architectures.

  • gpu (bool) – if true, all computation is carried out on the GPU; otherwise on the CPU
  • **kwargs – keyword arguments that store parameters for the estimator
criterion(x, y)

Defines the criterion of the estimator; for example, the EDGE algorithm uses mutual information as its criterion. Generally, the criterion is some dependence or independence measure between x and y. In information theory, the most widely used criterion is the mutual information between the two arguments.

  • x (torch.Tensor) – first random variable
  • y (torch.Tensor) – second random variable

Returns: calculated criterion of the two random variables ‘x’ and ‘y’



Process smallest segment of dynamics and calculate coordinates using the defined criterion.

Parameters: dynamics_segment (iterable) – smallest segment of the dynamics of a batch, containing the input, hidden layer outputs and label as torch.Tensor objects
Returns: list of calculated coordinates according to the criterion, with length equal to ‘len(dynamics_segment)-2’
Return type: (iterable)
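The length ‘len(dynamics_segment)-2’ follows from producing one coordinate per hidden layer, once the input and the label are set aside. The sketch below assumes the segment is ordered (input, hidden outputs..., label) and that each hidden output h gets the pair (criterion(input, h), criterion(h, label)), the usual information-plane axes. Squared Pearson correlation is a stand-in dependence measure chosen only to keep the example self-contained. PyGlow's estimators (for example HSIC or EDGE) compute different criteria, and process_segment is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple


def squared_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in dependence measure (squared Pearson correlation), not a mutual-information estimator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov * cov / (va * vb)


def process_segment(dynamics_segment: Sequence, criterion: Callable) -> List[Tuple[float, float]]:
    """Return one (criterion(input, h), criterion(h, label)) pair per hidden output h."""
    if len(dynamics_segment) < 2:
        raise ValueError("segment needs at least an input and a label")
    x, *hidden, y = dynamics_segment
    return [(criterion(x, h), criterion(h, y)) for h in hidden]


segment = [
    [1.0, 2.0, 3.0, 4.0],    # input
    [2.0, 4.0, 6.0, 8.0],    # hidden layer 1 (a scaled copy of the input)
    [1.0, -1.0, 1.0, -1.0],  # hidden layer 2
    [0.0, 0.0, 1.0, 1.0],    # label
]
coords = process_segment(segment, squared_correlation)
# len(coords) == len(segment) - 2 == 2
```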
class estimator.HSIC(kernel, gpu=True, **kwargs)

Class for estimating Hilbert-Schmidt Independence Criterion as done in paper “The HSIC Bottleneck: Deep Learning without Back-Propagation”.

  • kernel (str) – kernel used to compute the K (Gram) matrix in the HSIC criterion
  • gpu (bool) – if true, all computation is carried out on the GPU; otherwise on the CPU
  • **kwargs – keyword arguments that store parameters for the HSIC criterion
criterion(x, y)

Defines the HSIC criterion.
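For reference, the biased empirical HSIC used in the HSIC bottleneck paper is

HSIC(X, Y) = tr(K H L H) / (m − 1)²

where K and L are the Gram matrices of the m paired samples and H = I − (1/m)·11ᵀ is the centring matrix. The pure-Python sketch below assumes a Gaussian kernel with bandwidth sigma. The kernel names the kernel argument accepts, and any normalization PyGlow applies, are not documented here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_gram(samples: Sequence[Sequence[float]], sigma: float) -> Matrix:
    """K[i][j] = exp(-||s_i - s_j||^2 / (2 sigma^2))."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sigma ** 2))
             for v in samples] for u in samples]


def centre(k: Matrix) -> Matrix:
    """Return H K H with H = I - (1/m) 11^T, computed by double centring."""
    m = len(k)
    row = [sum(r) / m for r in k]
    col = [sum(k[i][j] for i in range(m)) / m for j in range(m)]
    grand = sum(row) / m
    return [[k[i][j] - row[i] - col[j] + grand for j in range(m)] for i in range(m)]


def hsic(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]], sigma: float = 1.0) -> float:
    """Biased empirical HSIC: tr(K H L H) / (m - 1)^2."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("x and y need the same number (>= 2) of samples")
    kc = centre(gaussian_gram(x, sigma))
    l = gaussian_gram(y, sigma)
    # tr(K H L H) = tr((H K H) L) = sum_ij (HKH)_ij * L_ij, because L is symmetric.
    return sum(kc[i][j] * l[i][j] for i in range(m) for j in range(m)) / (m - 1) ** 2


x = [[0.0], [1.0], [2.0], [3.0]]
y = [[0.0], [0.0], [1.0], [1.0]]
dependence = hsic(x, y)
```

A constant y yields an HSIC of zero, since the centred Gram matrix sums to zero. The estimate is also symmetric in its two arguments.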


Data Loading and Generation

class data_generator.DataGenerator

Class for implementing data generators and loaders.

prepare_numpy_data(x_train, y_train, batch_size, validation_split)

Converts a NumPy dataset into PyTorch data loaders.

  • x_train (numpy.ndarray) – training input dataset
  • y_train (numpy.ndarray) – training ground-truth labels
  • batch_size (int) – number of samples in a single batch
  • validation_split (float) – proportion of the total dataset which is used for validation

Returns:
  • train_loader (torch.utils.data.DataLoader) – training data-loader with processed batches
  • val_loader (torch.utils.data.DataLoader) – validation data-loader with processed batches
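The pure-Python sketch below shows the arithmetic behind batch_size and validation_split. It rests on several assumptions that the method's documentation does not confirm:

  • The last validation_split fraction of the samples, in their original order, forms the validation set.
  • The validation count is truncated with int().
  • The final batch may be smaller than batch_size.

The real method returns PyTorch data loaders rather than lists, and split_and_batch is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Batch = Tuple[list, list]


def split_and_batch(x: Sequence, y: Sequence, batch_size: int,
                    validation_split: float = 0.2) -> Tuple[List[Batch], List[Batch]]:
    """Split (x, y) into training and validation parts, then cut each into batches."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_val = int(len(x) * validation_split)  # assumed rounding rule: truncate
    n_train = len(x) - n_val

    def to_batches(xs: Sequence, ys: Sequence) -> List[Batch]:
        return [(list(xs[i:i + batch_size]), list(ys[i:i + batch_size]))
                for i in range(0, len(xs), batch_size)]

    return to_batches(x[:n_train], y[:n_train]), to_batches(x[n_train:], y[n_train:])


train_batches, val_batches = split_and_batch(list(range(100)), [0] * 100, batch_size=32)
# 80 training samples -> batches of 32, 32, 16; 20 validation samples -> one batch of 20
```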

set_dataset(X, y, batch_size, validation_split=0.2)

Converts raw dataset into processed batched dataset loaders for training and validation.

  • X (torch.Tensor) – input dataset
  • y (torch.Tensor) – labels
  • batch_size (int) – number of samples in a single batch
  • validation_split (float) – proportion of the total dataset which is used for validation

Returns:
  • train_dataset (torch.utils.data.DataLoader) – training data-loader with processed batches
  • validation_dataset (torch.utils.data.DataLoader) – validation data-loader with processed batches