User Guide¶
Standard Models¶
-
class
network.
Network
(input_shape, device, gpu, track_dynamics=False)¶ Base class for all neural network modules.
Your network should also subclass this class.
Parameters: - input_shape (tuple) – input tensor shape
- device (torch.device or int) – GPU or CPU for training purposes
- gpu (bool) – true if GPU is enabled on the system, false otherwise
- track_dynamics (bool) – tracks the NN dynamics during training (stores input-output for every intermediate layer)
-
input_shape
¶ tuple – input tensor shape
-
layer_list
¶ iterable – an iterable of PyTorch
torch.nn.modules.container.Sequential
type layers
-
num_layers
¶ int – number of layers in the model
-
is_gpu
¶ bool – true if GPU is enabled on the system, false otherwise
-
device
¶ torch.device or int – GPU or CPU for training purposes
-
track_dynamics
¶ bool – tracks the NN dynamics during training (stores input-output for every intermediate layer)
-
criterion
¶ callable – loss function for the model
-
optimizer
¶ torch.optim.Optimizer – optimizer for training the model
-
metrics
¶ str – metric to be used for evaluating performance of the model
-
add
(layer_obj)¶ Adds the layer specified by the layer_obj argument to the model.
Parameters: layer_obj (glow.Layer) – object of specific layer to be added
-
attach_evaluator
(evaluator_obj)¶ Attaches an evaluator to the model. The evaluator is run on every batch pass and computes information plane coordinates according to the criterion defined in ‘evaluator_obj’.
The ‘evaluator_obj’ is appended to evaluator_list, which holds all evaluators attached to the model.
Parameters: evaluator_obj (glow.information_bottleneck.Estimator) – evaluator object with a defined criterion, which is evaluated on the dynamics of the training process
-
compile
(optimizer='SGD', loss='cross_entropy', metrics=[], learning_rate=0.001, momentum=0.95, **kwargs)¶ Compiles the model by attaching an optimizer and a loss function to it.
Parameters: - optimizer (torch.optim.Optimizer) – optimizer to be used during training process
- loss (loss) – loss function for back-propagation
- metrics (list) – list of all performance metrics to be evaluated in the validation pass
- learning_rate (float, optional) – learning rate for gradient descent step (default: 0.001)
- momentum (float, optional) – momentum for different variants of optimizers (default: 0.95)
-
fit
(x_train, y_train, batch_size, num_epochs, validation_split=0.2, show_plot=False)¶ Fits the model on a dataset passed as NumPy arrays (Keras-like pipeline).
Parameters: - x_train (numpy.ndarray) – training input dataset
- y_train (numpy.ndarray) – training ground-truth labels
- batch_size (int) – batch size of one batch
- num_epochs (int) – number of epochs for training
- validation_split (float, optional) – proportion of the total dataset to be used for validation (default: 0.2)
- show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)
-
fit_generator
(train_loader, val_loader, num_epochs, show_plot=False)¶ Fits the model on a dataset supplied through data loaders.
Parameters: - num_epochs (int) – number of epochs for training
- train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
- val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)
- show_plot (bool, optional) – if true, plots the training loss (red) and validation loss (blue) against epochs (default: False)
-
forward
(x)¶ Method for defining forward pass through the model.
This method needs to be overridden by your implementation, which contains the logic of the forward pass through your model.
Parameters: x (torch.Tensor) – input tensor to the model
Returns: tuple containing:
- (torch.Tensor): output tensor of the network
- (iterable): list of hidden layer outputs for dynamics tracking purposes
Return type: (tuple)
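The return contract described above (final output plus a list of intermediate outputs) can be illustrated with a minimal, framework-free sketch. The function name `forward_with_tracking` is hypothetical, plain callables stand in for layers, and whether the final output is also included in the tracked list is an assumption made here for illustration.

```python
def forward_with_tracking(x, layers):
    """Apply each callable in `layers` in order and record every output."""
    hidden_outputs = []
    out = x
    for layer in layers:
        out = layer(out)
        hidden_outputs.append(out)
    # Mirrors the documented return: (network output, list of layer outputs).
    return out, hidden_outputs
```

In a real subclass the same pattern would be applied to `torch.Tensor` inputs and the model's own layers.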
Sequential¶
-
class
network.
Sequential
(input_shape, gpu=False)¶ Keras-like Sequential model.
Parameters: - input_shape (tuple) – input tensor shape
- gpu (bool, optional) – if true, PyGlow attempts to use the GPU; if false, the CPU is used (default: False)
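The Keras-like workflow (construct, `add`, `compile`) might look roughly like the sketch below. It is a hedged illustration rather than PyGlow's own example: the model and layer classes are passed in as arguments so that no import path is assumed, and the helper name `build_classifier`, the layer sizes, the flat `input_shape`, the `"relu"` activation string and the `"accuracy"` metric name are all illustrative assumptions.

```python
def build_classifier(sequential_cls, dense_cls, num_features=784, num_classes=10):
    """Build and compile a small fully connected classifier.

    sequential_cls and dense_cls are expected to behave like the Sequential
    and Dense classes documented here; they are injected rather than imported
    because the exact import path is an assumption.
    """
    model = sequential_cls(input_shape=(num_features,), gpu=False)
    # Hidden layer; "relu" is an assumed activation name.
    model.add(dense_cls(64, activation="relu"))
    # Output layer; "softmax" appears as a default elsewhere in this reference.
    model.add(dense_cls(num_classes, activation="softmax"))
    # Keyword names and defaults follow the documented compile signature;
    # the "accuracy" metric name is an assumption.
    model.compile(
        optimizer="SGD",
        loss="cross_entropy",
        metrics=["accuracy"],
        learning_rate=0.001,
        momentum=0.95,
    )
    return model
```

With PyGlow installed, one would pass the library's own `Sequential` and `Dense` classes and then call `fit(x_train, y_train, batch_size, num_epochs)` on the returned model.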
IBSequential¶
-
class
network.
IBSequential
(input_shape, gpu=False, track_dynamics=False, save_dynamics=False)¶ Keras-like Sequential model extended with Information Bottleneck functionality for analysing the dynamics of training.
Parameters: - input_shape (tuple) – input tensor shape
- gpu (bool, optional) – if true, PyGlow attempts to use the GPU; if false, the CPU is used (default: False)
- track_dynamics (bool) – if true, tracks the input-hidden-output dynamics segment and allows evaluators to be attached to the model; if false, no dynamics are tracked
- save_dynamics (bool, optional) – if true then saves the whole training process dynamics into a distributed file (for efficiency)
-
evaluator_list
¶ iterable – list of
glow.information_bottleneck.Estimator
instances, which are the evaluators attached to the model
-
evaluated_dynamics
¶ iterable – list of information plane coordinates, evaluated per dynamics segment for each intermediate layer and each evaluator, averaged over batches for each epoch
- Shape:
- evaluated_dynamics has shape (N, E, L, 2) where:
- N: Number of epochs
- E: Number of evaluators
- L: Number of layers with parameters (Flatten and Dropout excluded)
The last dimension has size 2 and stores the 2-D information plane coordinates.
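To make the (N, E, L, 2) layout concrete, the sketch below pulls out the path of a single layer across epochs. It assumes the data is available as nested Python lists indexed as [epoch][evaluator][layer]; the helper name `layer_trajectory` and the toy numbers are hypothetical.

```python
def layer_trajectory(evaluated_dynamics, evaluator_idx, layer_idx):
    """Return one layer's information plane path: one coordinate pair per epoch."""
    return [
        tuple(epoch[evaluator_idx][layer_idx])
        for epoch in evaluated_dynamics
    ]

# Toy data: N=2 epochs, E=1 evaluator, L=2 layers, 2 coordinates each.
toy_dynamics = [
    [[[0.9, 0.1], [0.8, 0.2]]],
    [[[0.7, 0.3], [0.6, 0.4]]],
]
print(layer_trajectory(toy_dynamics, evaluator_idx=0, layer_idx=1))
# prints [(0.8, 0.2), (0.6, 0.4)]
```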
‘Models without Back-prop’¶
-
class
hsic.
HSIC
(input_shape, device, gpu, **kwargs)¶ The HSIC Bottleneck: Deep Learning without backpropagation.
Base class for all HSIC networks.
Your HSIC network should also subclass this class.
Parameters: - input_shape (tuple) – input tensor shape
- device (torch.device, optional) – GPU or CPU for training purposes
- gpu (bool) – true if GPU is enabled on the system, false otherwise
-
add
(layer_obj, loss_criterion=None, regularize_coeff=0)¶ Adds the specified layer and its loss criterion to the model. The criterion is used to measure the objective between the layer’s current representation and the optimal representation (the representation with minimum IB-based objective).
Parameters: - layer_obj (glow.Layer) – object of specific layer to be added
- loss_criterion (glow.information_bottleneck.Estimator) – loss function for the layer which is added
- regularize_coeff (float) – regularization coefficient between generalization and compression of IB type objective
-
compile
(loss_criterion=None, optimizer='SGD', regularize_coeff=100, learning_rate=0.001, momentum=0.95, **kwargs)¶ Compiles the HSIC network with a loss criterion (the criterion objective used as the loss function for intermediate representations).
All layers that were not given a criterion when added with ‘.add’ automatically use this loss criterion.
Parameters: - loss_criterion (glow.information_bottleneck.Estimator) – criterion function which is an instance of
glow.information_bottleneck.Estimator
- optimizer (torch.optim.Optimizer) – optimizer to be used during training process for all the layers
- regularize_coeff (float) – trade-off parameter between generalization and compression according to IB-based theory
- learning_rate (float, optional) – learning rate for gradient descent step (default: 0.001)
- momentum (float, optional) – momentum for different variants of optimizers (default: 0.95)
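The role of regularize_coeff can be illustrated with the per-layer objective from the HSIC bottleneck paper referenced in this guide (arXiv:1908.01580), where a hidden representation Z minimizes HSIC(Z, X) minus a coefficient times HSIC(Z, Y). The sketch below only combines two already computed dependence values; the function name `ib_layer_objective` is hypothetical, and whether PyGlow combines the terms in exactly this way is an assumption.

```python
def ib_layer_objective(dep_with_input, dep_with_label, regularize_coeff=100):
    """IB-type objective: compress the input while keeping label information.

    dep_with_input: dependence between the hidden representation and the input
    dep_with_label: dependence between the hidden representation and the label
    """
    return dep_with_input - regularize_coeff * dep_with_label
```

A larger coefficient puts more weight on retaining information about the label relative to compressing the input.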
-
forward
(x)¶ Method for defining forward pass through the model.
This method needs to be overridden by your implementation which contains the logic of the forward pass through your model.
Parameters: x (torch.Tensor) – input tensor to the model
Returns: list of hidden layer outputs (objects of type torch.Tensor) which are detached from their previous layer’s gradients
Return type: (iterable)
-
pre_training_loop
(num_epochs, train_loader, val_loader)¶ Pre-training phase in which hidden representations are learned using the HSIC training paradigm.
Parameters: - num_epochs (int) – number of epochs for pre-training phase
- train_loader (torch.utils.data.DataLoader) – training dataset (with already processed batches)
- val_loader (torch.utils.data.DataLoader) – validation dataset (with already processed batches)
-
sequential_forward
(x)¶ Sequentially calculates the output by treating the HSIC network as a sequential feedforward neural network; this is equivalent to the forward pass in standard models.
Parameters: x (torch.Tensor) – input tensor to the network
Returns: output of the sequential feedforward network
Return type: (torch.Tensor)
HSICSequential¶
-
class
hsic.
HSICSequential
(input_shape, gpu=False, **kwargs)¶ Base implementation for HSIC networks.
This class builds instances of the multi-model sigma network described in the paper https://arxiv.org/abs/1908.01580.
Parameters: - input_shape (tuple) – input tensor shape
- gpu (bool, optional) – if true, PyGlow attempts to use the GPU; if false, the CPU is used (default: False)
Layers¶
-
class
layer.
Layer
(*args)¶ Base class for all layer modules.
Your layer should also subclass this class.
-
forward
(x)¶ Overrides the PyTorch forward method and contains the logic for the forward pass through the custom layer.
Parameters: x (torch.Tensor) – input tensor to the layer
Returns: output tensor of the layer
Return type: y (torch.Tensor)
-
set_input
(input_shape)¶ Takes input_shape and requires the user to define an attribute self.output_shape, which stores the output shape of the custom layer.
Parameters: input_shape (tuple) – input shape of the tensor which the layer expects to receive
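The two hooks a custom layer is expected to provide, `set_input` and `forward`, can be sketched as follows. To keep the example runnable without the library, `Layer` below is a stand-in for the documented base class, and `Scale` is a hypothetical layer invented for illustration; a real layer would subclass PyGlow's own `Layer` and operate on `torch.Tensor` inputs.

```python
class Layer:
    """Stand-in for the documented Layer base class (an assumption made so
    the sketch runs on its own; it is not PyGlow's implementation)."""

    def __init__(self, *args):
        self.args = args

    def set_input(self, input_shape):
        raise NotImplementedError

    def forward(self, x):
        raise NotImplementedError


class Scale(Layer):
    """Hypothetical custom layer that multiplies its input by a constant."""

    def __init__(self, factor):
        super().__init__(factor)
        self.factor = factor

    def set_input(self, input_shape):
        # Element-wise scaling keeps the shape, so output equals input.
        self.input_shape = input_shape
        self.output_shape = input_shape

    def forward(self, x):
        # Works for numbers and for tensor types that support `*`.
        return x * self.factor
```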
Core¶
-
class
core.
Dense
(output_dim, activation=None)¶ Bases:
glow.layer.Layer
Class for fully connected dense layer.
Parameters: - output_dim (int) – output dimension of the dense layer
- activation (str) – activation function to be used for the layer (default: None)
-
class
core.
Dropout
(prob)¶ Bases:
glow.layer.Layer
Class for dropout layer - regularization using noise stability of output.
Parameters: prob (float) – probability with which neurons in the previous layer are dropped
-
class
core.
Flatten
¶ Bases:
glow.layer.Layer
Class for flattening the input shape.
Convolutional¶
-
class
convolutional.
Conv1d
(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)¶ Bases:
convolutional._Conv
Convolutional layer of rank 1.
Parameters: - filters (int) – number of filters for the layer
- kernel_size (int) – size of kernel to be used for convolutional operation
- stride (int) – stride for the kernel in convolutional operations
- padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
- dilation (int, optional) – dilation for the convolutional operation (default: 1)
- activation (str) – activation function to be used for the layer (default: None)
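How kernel_size, stride, padding and dilation interact can be seen from the output-size formula used by PyTorch's convolution layers, applied once per spatial dimension. The sketch assumes that PyGlow's convolutional layers follow the same convention; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride, padding=0, dilation=1):
    """Output length along one dimension, using the PyTorch convolution formula."""
    # A dilated kernel covers dilation * (kernel_size - 1) + 1 input positions.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size + 2 * padding - effective_kernel) // stride + 1

print(conv_output_size(28, kernel_size=3, stride=1, padding=1))  # prints 28
print(conv_output_size(28, kernel_size=5, stride=2))             # prints 12
```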
-
class
convolutional.
Conv2d
(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)¶ Bases:
convolutional._Conv
Convolutional layer of rank 2.
Parameters: - filters (int) – number of filters for the layer
- kernel_size (int) – size of kernel to be used for convolutional operation
- stride (int) – stride for the kernel in convolutional operations
- padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
- dilation (int, optional) – dilation for the convolutional operation (default: 1)
- activation (str) – activation function to be used for the layer (default: None)
-
class
convolutional.
Conv3d
(filters, kernel_size, stride, padding=0, dilation=1, activation=None, **kwargs)¶ Bases:
convolutional._Conv
Convolutional layer of rank 3.
Parameters: - filters (int) – number of filters for the layer
- kernel_size (int) – size of kernel to be used for convolutional operation
- stride (int) – stride for the kernel in convolutional operations
- padding (int, optional) – padding for the image to handle edges while convolving (default: 0)
- dilation (int, optional) – dilation for the convolutional operation (default: 1)
- activation (str) – activation function to be used for the layer (default: None)
Normalization¶
-
class
normalization.
BatchNorm1d
(eps=1e-05, momentum=0.1)¶ Bases:
normalization._BatchNorm
1-D batch normalization layer.
See https://arxiv.org/abs/1502.03167 for more information on batch normalization.
Parameters: - eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
- momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
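The effect of momentum on the running statistics can be sketched with the update rule PyTorch documents for its batch normalization layers: the new running value is a weighted mix of the old running value and the current batch statistic. The sketch assumes that convention carries over here; the helper name `update_running_stat` is hypothetical.

```python
def update_running_stat(running, batch_stat, momentum=0.1):
    """One exponential-moving-average step for running_mean or running_var."""
    return (1.0 - momentum) * running + momentum * batch_stat

# Three batches with the same mean: the running value moves toward it slowly.
running_mean = 0.0
for batch_mean in [10.0, 10.0, 10.0]:
    running_mean = update_running_stat(running_mean, batch_mean)
```

With the default momentum of 0.1, each batch contributes 10% of the updated value, so small momentum values give smoother but slower-moving statistics.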
-
class
normalization.
BatchNorm2d
(eps=1e-05, momentum=0.1)¶ Bases:
normalization._BatchNorm
2-D batch normalization layer.
Parameters: - eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
- momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
-
class
normalization.
BatchNorm3d
(eps=1e-05, momentum=0.1)¶ Bases:
normalization._BatchNorm
3-D batch normalization layer.
Parameters: - eps (float) – a value added to the denominator for numerical stability (default: 1e-5)
- momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average) (default: 0.1)
Pooling¶
1-D¶
-
class
pooling.
MaxPool1d
(kernel_size, stride, padding=0, dilation=1)¶ Bases:
pooling._Pooling1d
1-D max pooling layer.
Parameters: - kernel_size (int) – size of kernel to be used for pooling operation
- stride (int) – stride for the kernel in pooling operations
- padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
- dilation (int, optional) – dilation for the pooling operation (default: 1)
-
class
pooling.
AvgPool1d
(kernel_size, stride, padding=0)¶ Bases:
pooling._Pooling1d
1-D average pooling layer.
Parameters: - kernel_size (int) – size of kernel to be used for pooling operation
- stride (int) – stride for the kernel in pooling operations
- padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
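What kernel_size and stride do in a 1-D pooling layer can be shown with a small pure-Python sketch. It omits padding and dilation for brevity, and the helper names `pool1d` and `mean` are hypothetical; it is meant to illustrate the sliding-window idea, not to reproduce the library's implementation.

```python
def pool1d(values, kernel_size, stride, reduce_fn):
    """Slide a window over `values` and reduce each window with `reduce_fn`."""
    last_start = len(values) - kernel_size
    return [
        reduce_fn(values[start:start + kernel_size])
        for start in range(0, last_start + 1, stride)
    ]

def mean(window):
    return sum(window) / len(window)

signal = [1, 3, 2, 5, 4, 6]
print(pool1d(signal, kernel_size=2, stride=2, reduce_fn=max))   # prints [3, 5, 6]
print(pool1d(signal, kernel_size=2, stride=2, reduce_fn=mean))  # prints [2.0, 3.5, 5.0]
```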
2-D¶
-
class
pooling.
MaxPool2d
(kernel_size, stride, padding=0, dilation=1)¶ Bases:
pooling._Pooling2d
2-D max pooling layer.
Parameters: - kernel_size (int) – size of kernel to be used for pooling operation
- stride (int) – stride for the kernel in pooling operations
- padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
- dilation (int, optional) – dilation for the pooling operation (default: 1)
-
class
pooling.
AvgPool2d
(kernel_size, stride, padding=0)¶ Bases:
pooling._Pooling2d
2-D average pooling layer.
Parameters: - kernel_size (int) – size of kernel to be used for pooling operation
- stride (int) – stride for the kernel in pooling operations
- padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
3-D¶
-
class
pooling.
MaxPool3d
(kernel_size, stride, padding=0, dilation=1)¶ Bases:
pooling._Pooling3d
3-D max pooling layer.
Parameters: - kernel_size (int) – size of kernel to be used for pooling operation
- stride (int) – stride for the kernel in pooling operations
- padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
- dilation (int, optional) – dilation for the pooling operation (default: 1)
-
class
pooling.
AvgPool3d
(kernel_size, stride, padding=0)¶ Bases:
pooling._Pooling3d
3-D average pooling layer.
Parameters: - kernel_size (int) – size of kernel to be used for pooling operation
- stride (int) – stride for the kernel in pooling operations
- padding (int, optional) – padding for the image to handle edges while pooling (default: 0)
HSIC¶
-
class
hsic_output.
HSICoutput
(output_dim, activation='softmax')¶ Bases:
glow.layers.core.Dense
Class for the HSIC sigma network output layer. This class extends the functionality of
glow.layers.Dense
with more robust features for HSIC sigma networks.
Parameters: - output_dim (int) – output dimension of the HSIC output layer used after pre-training phase
- activation (str, optional) – activation function to be used for the layer (default: softmax)
Information Bottleneck¶
Estimator¶
-
class
estimator.
Estimator
(gpu, **kwargs)¶ Base class for all the estimator modules.
Your estimator should also subclass this class.
This class implements functionality for estimating different dependence criteria from information theory, such as mutual information. These methods are further used in analysing the training dynamics of different architectures.
Parameters: - gpu (bool) – if true then all the computation is carried on GPU else on CPU
- **kwargs – the keyword that stores parameters for the estimators
-
criterion
(x, y)¶ Defines the criterion of the estimator; for example, the EDGE algorithm has mutual information as its criterion. Generally, the criterion is some kind of dependence or independence measure between x and y. In the context of information theory, the most widely used criterion is the mutual information between the two arguments.
Parameters: - x (torch.Tensor) – first random variable
- y (torch.Tensor) – second random variable
Returns: calculated criterion of the two random variables ‘x’ and ‘y’
Return type: (torch.Tensor)
-
eval_dynamics_segment
(dynamics_segment)¶ Processes the smallest segment of the dynamics and calculates coordinates using the defined criterion.
Parameters: dynamics_segment (iterable) – smallest segment of the dynamics of a batch, containing the input, hidden layer outputs and label as torch.Tensor objects
Returns: list of calculated coordinates according to the criterion with length equal to ‘len(dynamics_segment)-2’
Return type: (iterable)
-
class
estimator.
HSIC
(kernel, gpu=True, **kwargs)¶ Class for estimating the Hilbert-Schmidt Independence Criterion as done in the paper “The HSIC Bottleneck: Deep Learning without Back-Propagation”.
Parameters: - kernel (str) – kernel which is used for calculating K matrix in HSIC criterion
- gpu (bool) – if true then all the computation is carried on GPU else on CPU
- **kwargs – the keyword that stores parameters for HSIC criterion
-
criterion
(x, y)¶ Defines the HSIC criterion.
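For orientation, the standard biased empirical HSIC estimator is tr(K H L H) / (m - 1)^2, where K and L are kernel matrices over the m samples of x and y and H = I - (1/m) 11^T is the centering matrix. The pure-Python sketch below implements that textbook estimator with a Gaussian kernel; the kernel choice, the `sigma` parameter and all function names are assumptions for illustration, and this is not PyGlow's implementation.

```python
import math

def gaussian_kernel_matrix(samples, sigma):
    """K[i][j] = exp(-||s_i - s_j||^2 / (2 * sigma^2)) for a list of vectors."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [
        [math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2)) for b in samples]
        for a in samples
    ]

def double_center(K):
    """Return H K H, i.e. K with row and column means removed."""
    m = len(K)
    row_mean = [sum(row) / m for row in K]
    col_mean = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_mean) / m
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(m)]
        for i in range(m)
    ]

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two equally long lists of vectors."""
    m = len(x)
    centered_k = double_center(gaussian_kernel_matrix(x, sigma))
    kernel_l = gaussian_kernel_matrix(y, sigma)
    # tr(K H L H) = tr((H K H) L); L is symmetric, so this is an element-wise sum.
    trace = sum(
        centered_k[i][j] * kernel_l[i][j] for i in range(m) for j in range(m)
    )
    return trace / (m - 1) ** 2
```

A constant second argument gives a value of zero up to rounding, while a variable compared with itself gives a clearly positive value.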
Preprocessing¶
Data Loading and Generation¶
-
class
data_generator.
DataGenerator
¶ Class for implementing data generators and loaders.
-
prepare_numpy_data
(x_train, y_train, batch_size, validation_split)¶ Converts numpy type dataset into PyTorch data-loader type dataset.
Parameters: - x_train (numpy.ndarray) – training input dataset
- y_train (numpy.ndarray) – training ground-truth labels
- batch_size (int) – batch size of a single batch
- validation_split (float) – proportion of the total dataset which is used for validation
Returns: - train_loader (torch.utils.data.DataLoader) – training data-loader with processed batches
- val_loader (torch.utils.data.DataLoader) – validation data-loader with processed batches
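The interplay of batch_size and validation_split can be sketched without any framework. The helper `split_and_batch` below is hypothetical: it holds out the last portion of the samples for validation, does not shuffle, and rounds the validation size down, all of which are assumptions; the documented method returns PyTorch data loaders rather than lists.

```python
def split_and_batch(samples, batch_size, validation_split=0.2):
    """Split `samples` into train/validation parts and chunk each into batches."""
    num_val = int(len(samples) * validation_split)
    num_train = len(samples) - num_val
    train, val = samples[:num_train], samples[num_train:]

    def batches(items):
        # The final batch may be smaller than batch_size.
        return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

    return batches(train), batches(val)
```

For example, 10 samples with `batch_size=4` and `validation_split=0.2` yield two training batches of 4 samples and one validation batch of 2 samples.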
-
set_dataset
(X, y, batch_size, validation_split=0.2)¶ Converts raw dataset into processed batched dataset loaders for training and validation.
Parameters: - X (torch.Tensor) – input dataset
- y (torch.Tensor) – labels
- batch_size (int) – batch size of a single batch
- validation_split (float) – proportion of the total dataset which is used for validation
Returns: - train_dataset (torch.utils.data.DataLoader) – training data-loader with processed batches
- validation_dataset (torch.utils.data.DataLoader) – validation data-loader with processed batches