Neural-network-based learning systems have attracted a lot of attention recently. The reason is that these systems are getting good at a wide range of tasks, in some cases even better than humans. They owe this to their ability to learn the mapping from inputs to outputs. Such systems do not need to be hard-coded with explicit relations; instead, they learn general properties and features that work well across a wide variety of use cases.

However, neural networks do not perform well once they encounter “numerical values” outside the range they were trained on. The range of numerical data refers to the range of the data points on which a network is trained (to a computer, all data is numerical, whether image, voice, table or language). As a loose analogy, a neural network trained to tell cats from dogs will not be good at classifying a cow, because the network never saw an image of a cow.

This is a drawback of neural networks: they struggle to generalise to data outside the numerical range they encountered during training. In other words, neural networks cannot extrapolate.

This failure to extrapolate suggests that the learnt behaviour of the network is closer to memorization than to general abstraction. Andrew Trask and his co-authors, in their paper Neural Arithmetic Logic Units, put forward a new architecture that encourages systematic numerical extrapolation. In this architecture, numerical quantities are represented as linear activations that are manipulated using simple arithmetic operators such as “addition” or “multiplication”, controlled by learnt gates. In their experiments the authors found that the network generalized substantially better both inside (interpolation) and outside (extrapolation) the range of numerical values it was trained on.
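To make the idea of arithmetic operators controlled by learnt gates concrete, here is a minimal NumPy sketch of a single forward pass, assuming the formulation given in the paper: an additive path, a multiplicative path computed in log space, and a sigmoid gate that mixes the two. The function name `nalu_forward`, the parameter names and the hand-picked weights are illustrative assumptions; in a real model the weights would be learnt by gradient descent.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def nalu_forward(x, W_hat, M_hat, G, eps=1e-7):
    """One forward pass of a NALU-style cell (illustrative sketch).

    x:     (batch, n_in) inputs
    W_hat: (n_out, n_in) unconstrained weights
    M_hat: (n_out, n_in) unconstrained weights
    G:     (n_out, n_in) gate weights
    """
    # tanh * sigmoid biases the effective weights toward -1, 0 or 1.
    W = np.tanh(W_hat) * sigmoid(M_hat)
    a = x @ W.T                                  # additive path: add / subtract
    m = np.exp(np.log(np.abs(x) + eps) @ W.T)    # log-space path: multiply / divide
    g = sigmoid(x @ G.T)                         # gate choosing between the paths
    return g * a + (1.0 - g) * m

# Hand-picked (not learnt) weights that make the effective W close to [1, 1].
W_hat = np.full((1, 2), 20.0)
M_hat = np.full((1, 2), 20.0)
gate_add = np.full((1, 2), 20.0)    # gate near 1 selects the additive path
gate_mul = np.full((1, 2), -20.0)   # gate near 0 selects the multiplicative path

small = np.array([[3.0, 4.0]])
large = np.array([[300.0, 400.0]])  # much larger inputs, same weights
for x in (small, large):
    print(x, nalu_forward(x, W_hat, M_hat, gate_add),
          nalu_forward(x, W_hat, M_hat, gate_mul))
```

Because the operation is encoded in the weights rather than in a saturating non-linearity, the same weights that add or multiply small numbers also work on much larger ones, which is the property the architecture is after.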

As is evident from the image on the left, most non-linear activation functions only learn to represent the values they were trained on. The error, measured as mean squared error (MSE), ramps up as we move outside the training range.

In their training setup the authors used an autoencoder that takes a scalar value as input (for example, the number 3), encodes it in its hidden layers, and then reconstructs the input as a linear combination of the last hidden layer to get back the number 3.

“**Autoencoder** is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an **autoencoder** is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.” (Wikipedia)

In their experiment the authors trained an autoencoder on numbers in the range -5 to +5 and then tried to reconstruct numbers from -20 to +20, most of which lie outside the range of the training data. Most non-linear functions fail to represent numbers outside the range they saw during training. *“The severity of the failure is directly proportional to the degree of non-linearity in the activation function.”*
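A scaled-down version of this experiment can be sketched in plain NumPy. This is not the authors' setup: it assumes a single hidden layer of 8 units, full-batch gradient descent, and hyperparameters (`steps`, `lr`, the seed) chosen only for illustration. It trains the network to reproduce its scalar input on the range -5 to +5, then measures the reconstruction error inside that range and on the rest of -20 to +20, once with a tanh hidden layer and once with a purely linear one.

```python
import numpy as np

def train_identity(act, act_grad, seed=0, steps=5000, lr=0.002, hidden=8):
    """Train a 1 -> hidden -> 1 network to reproduce its input on [-5, 5]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(256, 1))      # training inputs = targets
    W1 = rng.normal(0.0, 0.5, size=(1, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    for _ in range(steps):
        z = x @ W1 + b1
        h = act(z)
        y = h @ W2 + b2                            # linear read-out of the hidden layer
        gy = 2.0 * (y - x) / len(x)                # gradient of the MSE w.r.t. y
        gW2 = h.T @ gy
        gb2 = gy.sum(axis=0)
        gz = (gy @ W2.T) * act_grad(z)             # backpropagate through the activation
        gW1 = x.T @ gz
        gb1 = gz.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return lambda t: act(t @ W1 + b1) @ W2 + b2

def mse(f, x):
    return float(np.mean((f(x) - x) ** 2))

# Evaluation points inside and outside the training range.
x_in = np.linspace(-5.0, 5.0, 201).reshape(-1, 1)
x_out = np.concatenate([np.linspace(-20.0, -5.0, 100),
                        np.linspace(5.0, 20.0, 100)]).reshape(-1, 1)

activations = {
    "tanh": (np.tanh, lambda z: 1.0 - np.tanh(z) ** 2),
    "linear": (lambda z: z, np.ones_like),
}
results = {}
for name, (act, grad) in activations.items():
    f = train_identity(act, grad)
    results[name] = (mse(f, x_in), mse(f, x_out))
    print(f"{name:>6}: MSE inside = {results[name][0]:.4g}, "
          f"MSE outside = {results[name][1]:.4g}")
```

The tanh hidden units are bounded, so the output of that network cannot keep growing with the input, and its error rises once the inputs leave the training range. The linear network has no such limit and serves as a baseline for comparison.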