Cross Entropy in Deep Learning of Classifiers Is Unnecessary—ISBE Error Is All You Need

Published: 2024-01-12
Volume: 26
Issue: 1
Page: 65
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

Skarbek Władysław¹

Affiliation:

1. Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-661 Warszawa, Poland

Abstract

In deep learning of classifiers, the cost function usually takes the form of a combination of SoftMax and CrossEntropy functions. The SoftMax unit transforms the scores predicted by the model network into assessments of the degree (probabilities) of an object’s membership in a given class. CrossEntropy, in turn, measures the divergence of this prediction from the distribution of target scores. This work introduces the ISBE functionality, justifying the thesis that cross-entropy computation is redundant in deep learning of classifiers. Not only can the calculation of entropy be omitted, but during back-propagation there is also no need to direct the error to the normalization unit for its backward transformation. Instead, the error is sent directly to the model’s network. Using examples of perceptron and convolutional networks as classifiers of images from the MNIST collection, it is observed for ISBE that results are not degraded, not only with SoftMax but also with other activation functions such as Sigmoid, Tanh, or their hard variants HardSigmoid and HardTanh. Moreover, savings in the total number of operations were observed within the forward and backward stages. The article is addressed to all deep learning enthusiasts but primarily to programmers and students interested in the design of deep models. It illustrates in code snippets possible ways to implement the ISBE functionality, and it also formally proves that the SoftMax trick only applies to the class of dilated SoftMax functions with relocations.
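The identity behind the SoftMax trick mentioned in the abstract can be checked numerically. The sketch below is not taken from the article: the function names (`softmax`, `cross_entropy`, `direct_error`, `numerical_gradient`) and the sample values are illustrative assumptions. It only demonstrates the well-known fact that the gradient of CrossEntropy composed with SoftMax, taken with respect to the raw scores, equals the normalized scores minus the targets, so this difference can be passed to the network without evaluating the entropy itself.

```python
import numpy as np


def softmax(scores):
    """Normalize raw scores into probabilities (shifted for numerical stability)."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def cross_entropy(scores, target):
    """Cross entropy between the target distribution and softmax(scores)."""
    return -np.sum(target * np.log(softmax(scores)), axis=-1)


def direct_error(scores, target):
    """Error passed straight to the network: normalized scores minus targets.

    Hypothetical helper for illustration; no entropy is computed here.
    """
    return softmax(scores) - target


def numerical_gradient(scores, target, eps=1e-6):
    """Central-difference gradient of cross_entropy with respect to the scores."""
    grad = np.zeros_like(scores)
    for k in range(scores.size):
        step = np.zeros_like(scores)
        step[k] = eps
        plus = cross_entropy(scores + step, target)
        minus = cross_entropy(scores - step, target)
        grad[k] = (plus - minus) / (2 * eps)
    return grad


# Illustrative values: three class scores and a one-hot target for class 2.
scores = np.array([2.0, -1.0, 0.5])
target = np.array([0.0, 0.0, 1.0])

error = direct_error(scores, target)
grad = numerical_gradient(scores, target)

# The two vectors should agree up to finite-difference error.
print(np.allclose(error, grad, atol=1e-6))
```

The comparison uses finite differences rather than an autograd library so that the check depends only on NumPy. Whether the same shortcut is appropriate for the other activations listed in the abstract is a question the article itself addresses; this sketch covers SoftMax only.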

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/26/1/65/pdf
