Introduction

In this project we implement a normalizing-flow hybrid model (DIGLM) following the idea presented in [NMT+19]: the DIGLM is a machine learning algorithm trainable in a single feed-forward step to perform two distinct tasks, i.e.

  1. Probability density estimation

  2. Classification (or, more generally, any regression problem)

The first task is accomplished with a trainable normalizing flow, NeuralSplineFlow in the spqr module, which uses coupling layers for efficient evaluation of the Jacobian (see [DBMP19]).

The second task is performed with a Generalized Linear Model (GLM). The feature vector fed to the GLM is not the initial feature vector, but the vector of “latent” features computed by the normalizing flow. Only the feature vectors used for this part of the training need labels, hence the whole algorithm can be trained in a semi-supervised fashion.

Normalizing flows and coupling layers

A normalizing flow ([PNR+21], [KPB21]) is a bijective and differentiable transformation \(g_{\theta}\) that maps a D-dimensional feature vector x into a transformed D-dimensional vector \(g_\theta(x) = z\). Since the transformation \(g_\theta\) is differentiable, the probability distributions of z and x are linked through a simple change of variables:

\[p(x) = p(z) \cdot \Bigl| \det \Bigl(\dfrac{\partial g_\theta}{\partial x}\Bigr) \Bigr|\]

Hence the name normalizing. The parameters \(\theta\) can be trained to map a simple distribution p(z) (typically a Gaussian) into the feature distribution p(x) through the inverse transformation \(g_\theta^{-1}\).
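
As a quick sanity check of the formula above, the following sketch (plain NumPy, illustrative only and not part of the spqr module) applies the change of variables to a one-dimensional affine bijector \(g_\theta(x) = (x - \mu)/\sigma\) with a standard Gaussian p(z), and compares the result with the analytic density \(\mathcal{N}(\mu, \sigma^2)\):

    import numpy as np

    # 1-D affine bijector: g(x) = (x - mu) / sigma maps x to the latent z.
    # With a standard Gaussian base density p(z), the change of variables
    # should give back the density of N(mu, sigma^2).
    mu, sigma = 2.0, 0.5

    def g(x):                       # forward map x -> z
        return (x - mu) / sigma

    def log_pz(z):                  # log p(z), standard Gaussian
        return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi)

    x = np.linspace(0.0, 4.0, 9)
    # change of variables: log p(x) = log p(g(x)) + log |dg/dx|
    log_px = log_pz(g(x)) + np.log(np.abs(1.0 / sigma))

    # analytic log-density of N(mu, sigma^2) for comparison
    log_px_exact = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

    print(np.allclose(log_px, log_px_exact))   # True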

It is possible to compose bijective functions in a chain to improve the expressivity of the model and to speed up the computation of the Jacobian: if at each step the bijector acts as the identity on part of the features, the Jacobian can be made triangular (\(\mathcal{O}(D)\) instead of \(\mathcal{O}(D^3)\) complexity for its determinant). This is why we use coupling layers (see [DBMP19]) of the RealNVP type ([DSohlDicksteinB16]) to define our bijector.
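
To make the triangular structure concrete, here is a minimal RealNVP-style affine coupling layer in plain NumPy (an illustrative sketch, not the NeuralSplineFlow bijector actually used in spqr): the first half of the features passes through unchanged, the second half is scaled and shifted by functions of the first half, so the Jacobian is triangular and its log-determinant is just the sum of the log-scales.

    import numpy as np

    def affine_coupling(x, scale_net, shift_net):
        """RealNVP-style coupling layer: identity on x1, affine map on x2.

        scale_net and shift_net are arbitrary functions of x1 (stand-ins
        for neural networks); since x1 is untouched, the Jacobian is
        triangular and its log-determinant costs O(D) instead of O(D^3).
        """
        d = x.shape[-1] // 2
        x1, x2 = x[..., :d], x[..., d:]
        s = scale_net(x1)                      # log-scales for x2
        t = shift_net(x1)                      # shifts for x2
        z = np.concatenate([x1, x2 * np.exp(s) + t], axis=-1)
        log_det_jacobian = np.sum(s, axis=-1)  # sum of the log-scales
        return z, log_det_jacobian

    # toy "networks": any smooth functions of x1 work for the sketch
    scale_net = lambda x1: 0.1 * np.tanh(x1)
    shift_net = lambda x1: 0.5 * x1

    x = np.array([0.3, -1.2, 0.7, 2.1])
    z, log_det = affine_coupling(x, scale_net, shift_net)
    print(z, log_det)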

GLM

Generalized Linear Model is a fancy name (there may be some historical reasons behind it) for a simple “perceptron”-like layer: the input features x are linearly transformed with trainable weights, and the result (the linear response) is passed through an activation function to produce the response. The weights are trained by minimizing an objective loss that depends on the labels of the inputs and on the response. In formulas:

\[x \rightarrow w \cdot x + b \rightarrow y = \dfrac{1}{1 + e^{-(w \cdot x + b)}} \rightarrow \mathrm{loss}(y_{true}, y)\]

In our case the data carry {0, 1} labels, therefore we chose a GLM with a sigmoid activation function.
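
With a sigmoid response the GLM reduces to logistic regression. A minimal NumPy sketch of the response and of the corresponding negative log-likelihood for {0, 1} labels (illustrative only, not the GLM class of this project):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def glm_response(z, w, b):
        # linear response w . z + b followed by the sigmoid activation
        return sigmoid(z @ w + b)

    def glm_nll(y_true, y_pred, eps=1e-12):
        # negative log-likelihood of {0, 1} labels (binary cross-entropy)
        y_pred = np.clip(y_pred, eps, 1.0 - eps)
        return -np.sum(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

    z = np.array([[0.2, -1.0], [1.5, 0.3]])    # features (latent ones, in the DIGLM)
    w, b = np.array([0.8, -0.4]), 0.1          # trainable GLM parameters
    y_true = np.array([0.0, 1.0])
    print(glm_nll(y_true, glm_response(z, w, b)))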

DIGLM

The architecture of a hybrid model mixing the characteristics of normalizing flows and GLMs is simple: instead of feeding the input features directly to the GLM, we map them with the bijector to the latent features z and use these transformed variables as inputs to the GLM.
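
Schematically, the forward pass of the hybrid model is just the composition of the two pieces sketched above (affine_coupling, scale_net, shift_net and glm_response are the hypothetical helpers from the previous sketches, not the spqr API):

    # schematic forward pass, reusing the hypothetical helpers defined above
    def diglm_forward(x, w, b):
        # 1. map the input features to the latent space with the bijector
        z, log_det_jacobian = affine_coupling(x, scale_net, shift_net)
        # 2. feed the latent features to the GLM
        y_pred = glm_response(z, w, b)
        return y_pred, z, log_det_jacobian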

Both parts of the algorithm are trained in a single feed-forward step by minimizing the negative log-likelihood:

\[\mathcal{L} = - \sum_i \log{ p(y_i, x_i) } = - \sum_i \Bigl( \log{ p(y_i \mid z_i; \beta) } + \log{ \bigl( p(z_i; \theta)\, \lvert \det \mathcal{J}_i \rvert \bigr) } \Bigr)\]

where \(\mathcal{J}_i\) is the Jacobian of the bijector evaluated at \(x_i\) and \(\beta\) are the GLM parameters. We see that minimizing this loss amounts to minimizing the objective functions of both parts of the algorithm, the GLM and the normalizing flow, at the same time.

As suggested in [NMT+19], we multiply the second term of the loss by a scaling constant \(\lambda\), which can be tuned to let the algorithm train more on one part than on the other, depending on the desired performance.
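
Putting the pieces together, here is a sketch of the scaled objective for a single example, assuming a standard Gaussian base density p(z) and the hypothetical diglm_forward and glm_nll helpers from the previous sketches (not the actual training loop of this project):

    import numpy as np

    def diglm_loss(x, y_true, w, b, lam=1.0):
        """Negative joint log-likelihood with the flow term scaled by lambda."""
        y_pred, z, log_det_jacobian = diglm_forward(x, w, b)
        # GLM term: - log p(y | z; beta)
        glm_term = glm_nll(y_true, y_pred)
        # flow term: - (log p(z; theta) + log |det J|), standard Gaussian p(z)
        log_pz = -0.5 * np.sum(z**2) - 0.5 * z.size * np.log(2.0 * np.pi)
        flow_term = -(log_pz + log_det_jacobian)
        return glm_term + lam * flow_term

    x = np.array([0.3, -1.2, 0.7, 2.1])
    w, b = np.array([0.8, -0.4, 0.1, 0.5]), 0.1
    print(diglm_loss(x, y_true=1.0, w=w, b=b, lam=0.1))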

References

DSohlDicksteinB16

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using real NVP. CoRR, 2016. URL: http://arxiv.org/abs/1605.08803, arXiv:1605.08803.

DBMP19

Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. 2019. URL: https://arxiv.org/abs/1906.04032, doi:10.48550/ARXIV.1906.04032.

KPB21

Ivan Kobyzev, Simon J.D. Prince, and Marcus A. Brubaker. Normalizing flows: an introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964–3979, Nov 2021. URL: http://dx.doi.org/10.1109/TPAMI.2020.2992934, doi:10.1109/tpami.2020.2992934.

NMT+19

Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Hybrid models with deep and invertible features. 2019. arXiv:1902.02767.

PNR+21

George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. 2021. arXiv:1912.02762.