Abstrakt: |
We review the current literature concerned with information plane (IP) analyses of neural network (NN) classifiers. While the underlying information bottleneck theory and the claim that information-theoretic compression is causally linked to generalization are plausible, empirical evidence was found to be both supporting and conflicting. We review this evidence together with a detailed analysis of how the respective information quantities were estimated. Our survey suggests that compression visualized in IPs is not necessarily information-theoretic but is rather often compatible with geometric compression of the latent representations. This insight gives the IP a renewed justification. Aside from this, we shed light on the problem of estimating mutual information in deterministic NNs and its consequences. Specifically, we argue that, even in feedforward NNs, the data processing inequality needs not to hold for estimates of mutual information. Similarly, while a fitting phase, in which the mutual information is between the latent representation and the target increases, is necessary (but not sufficient) for good classification performance, depending on the specifics of mutual information estimation, such a fitting phase needs to not be visible in the IP. |