What is a Neural Network

A short description of building a simple neural network model for the sine function.

Intro

As briefly introduced in the previous post, a neural network is made up of thousands or millions (or more) of perceptrons. A whole neural network can be thought of as a function approximator: even though we don't know the true function, we can find a function that approximates it exactly or very closely.

We cannot see exactly what work a neural network does, or how it does it behind the scenes, but with enough training it outputs predictions. Because of this, a neural network (or the hidden layers between input and output) is sometimes called a black box.

A neural network model starts out blank, and as it iterates over the training data it updates itself into a better model.

Every time it outputs prediction values, it compares them with the true y values and computes a loss. With this loss and the backpropagation technique, the model updates the weights and biases in its hidden layers to reduce the loss of the next prediction.

Backpropagation is a way of computing how much each hidden layer contributes to the predicted values. For example, suppose we have a model with two hidden layers, where the first layer contributes 40% of the work toward the output while the second contributes 60%. When we update the weights of the two layers, we update them in that same ratio. We don't want to punish two workers equally when only one of them was the main cause of a problem.
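To make this concrete, here is a minimal NumPy sketch of one training step for a tiny one-hidden-layer network (an illustration under simplified assumptions, not TensorFlow's internals):

import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-10, 10, 100).reshape(-1, 1)
ys = np.sin(xs)

W1, b1 = rng.normal(scale=0.1, size=(1, 50)), np.zeros(50)
W2, b2 = rng.normal(scale=0.1, size=(50, 1)), np.zeros(1)
lr = 0.01

# Forward pass: hidden layer with ReLU, then a linear output layer
h = np.maximum(0, xs @ W1 + b1)
pred = h @ W2 + b2
loss = np.mean((pred - ys) ** 2)          # mean squared error

# Backward pass: the chain rule splits the error across the layers
grad_pred = 2 * (pred - ys) / len(xs)     # dLoss/dPred
grad_W2 = h.T @ grad_pred
grad_b2 = grad_pred.sum(axis=0)
grad_h = grad_pred @ W2.T
grad_h[h <= 0] = 0                        # ReLU passes no gradient where it was off
grad_W1 = xs.T @ grad_h
grad_b1 = grad_h.sum(axis=0)

# Each weight moves in proportion to its own contribution to the loss
W1 -= lr * grad_W1; b1 -= lr * grad_b1
W2 -= lr * grad_W2; b2 -= lr * grad_b2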

One good thing about a neural network is that, given a substantial amount of data, we can find an approximate function that works almost (or exactly) like the true function. However, this is also a drawback: a neural network needs a lot of data. If the amount is small, it works poorly, and other supervised or unsupervised learning algorithms will often do much better instead.

Neural networks come in many different forms, such as convolutional neural networks, recurrent neural networks, LSTM networks, GANs, and many more. Here we will explore how to construct a basic neural network to find an approximate function for $f(x) = \sin(x)$.

Code

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

To build a model, we first define a model function to pass into a TensorFlow Estimator.

def sin_model(features, labels, mode):

    # Input layer: the 'x' feature passed in by the input function
    input_layer = features['x']

    # Two hidden layers with ReLU activations
    dense1 = tf.layers.dense(input_layer, units=50, activation=tf.nn.relu)

    dense2 = tf.layers.dense(dense1, units=100, activation=tf.nn.relu)

    # Output layer: a single linear unit for the regression value
    predictions = tf.layers.dense(dense2, units=1)

    # In PREDICT mode, return the predictions only; no loss is needed
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = tf.losses.mean_squared_error(labels, predictions)

    # In TRAIN mode, minimize the loss with plain gradient descent
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())

        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # In EVAL mode, report mean squared error as the evaluation metric
    eval_metric_ops = {'Mean Squared Error': tf.metrics.mean_squared_error(labels=labels, predictions=predictions)}

    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)

The arguments of the model function must be named exactly as above, or the Estimator will throw an exception. For example, if 'features' is instead named 'feature', it will throw a 'model_fn () must include features argument' exception.
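As a quick hypothetical illustration, a model function with a misspelled argument fails as soon as the Estimator is constructed:

# Hypothetical: 'features' misspelled as 'feature'
def bad_model(feature, labels, mode):
    pass

# tf.estimator.Estimator(model_fn=bad_model)
# => ValueError: model_fn () must include features argument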

In this model, we use only two hidden layers, of sizes 50 and 100, with biases and ReLU activation functions.

You can modify these layers and hyperparameters to get a better approximation, or to approximate a different function.

x = np.linspace(-10, 10, 100).reshape(-1, 1)
y = np.sin(x)

For the x values, I reshaped them to (-1, 1) to make them two-dimensional, since that is the minimum number of dimensions TensorFlow expects here. The first dimension is the total number of samples, and the second holds each sample's value. The same goes for y's shape.
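A quick sanity check of the shapes:

print(x.shape)  # (100, 1) -- 100 samples, each holding one value
print(y.shape)  # (100, 1)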

We can save our model's checkpoints in the 'model' sub-directory.

model = tf.estimator.Estimator(model_fn=sin_model, model_dir='./model/')

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x':x},
    y=y,
    batch_size=25,
    num_epochs=None,
    shuffle=True
)

model.train(input_fn=train_input_fn, steps=30000)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': './model/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001C4DAF08E10>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./model/model.ckpt.
INFO:tensorflow:loss = 0.0010722036, step = 29501 (0.134 sec)
INFO:tensorflow:global_step/sec: 833.096
INFO:tensorflow:loss = 0.0013768066, step = 29601 (0.120 sec)
INFO:tensorflow:global_step/sec: 676.952
INFO:tensorflow:loss = 0.0023964667, step = 29701 (0.148 sec)
INFO:tensorflow:global_step/sec: 756.899
INFO:tensorflow:loss = 0.0010435522, step = 29801 (0.132 sec)
INFO:tensorflow:global_step/sec: 856.409
INFO:tensorflow:loss = 0.0026557748, step = 29901 (0.117 sec)
INFO:tensorflow:Saving checkpoints for 30000 into ./model/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0019341598.

<tensorflow.python.estimator.estimator.Estimator at 0x1c4daf08a90>

Now that the model is trained, let's check how much error (mean squared error) we get with the same x and y values.

eval_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x':x},
    y=y,
    shuffle=False,
    num_epochs=1
)

model.evaluate(eval_fn)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-12-28-03:31:33
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./model/model.ckpt-30000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-12-28-03:31:33
INFO:tensorflow:Saving dict for global step 30000: Mean Squared Error = 0.0044636456, global_step = 30000, loss = 0.0044636456
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 30000: ./model/model.ckpt-30000

Our MSE value is 0.0044636456, as shown in the evaluation log above. This value can differ from run to run depending on the parameters.

Now to predict with the same x values, do the following.

pred_fn = tf.estimator.inputs.numpy_input_fn(
    x={'x':x},
    y=None,
    shuffle=False,
    num_epochs=1
)
pred = list(model.predict(pred_fn))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from ./model/model.ckpt-30000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

The first five prediction values:

pred[:5]
[array([0.37037055]),
 array([0.2057652]),
 array([0.04115985]),
 array([-0.1234455]),
 array([-0.28805085])]

These are the graphs of the predicted values after 10000, 20000, and 30000 iterations.
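A plot like these can be drawn with the matplotlib import from above (a sketch; the exact styling of the figures is assumed):

preds = np.array(pred).reshape(-1)   # list of 1-element arrays -> flat array
plt.plot(x, y, label='true sin(x)')
plt.plot(x, preds, label='model prediction')
plt.legend()
plt.show()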

Though I stopped training after 30000 iterations, you can train longer with different parameters to get an even closer approximation of the sine function.

Usually, if the target function is not as complex as, say, image classification, a model with two hidden layers (with or without biases) and ReLU activations is enough to find a good approximator.

When we work with a neural network, we encounter the terms step, batch size, and epoch.

One epoch means one iteration over the whole set of training samples. A step is one iteration over a single batch of samples. For example, if we have 1000 samples and the batch size is 100, then 5 epochs take 50 steps. It can be thought of as

$\# Steps = \frac{Total\ Samples}{Batch\ Size} \times Epochs$

Above we used a batch size of 25 and 30000 steps, while num_epochs was None. When None is passed for the number of epochs, the input function iterates over the data as many times as needed to reach the 30000 steps.
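With this post's numbers, the same arithmetic works out as follows:

total_samples = 100                             # len(x)
batch_size = 25
steps = 30000

steps_per_epoch = total_samples // batch_size   # 4 steps cover all samples once
epochs_run = steps // steps_per_epoch           # 7500 full passes in 30000 steps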

Some concepts, such as backpropagation, activation functions, and computation graphs, were not fully explained in this post; they will be covered in later posts.

Thank you all for reading, and let me know if there are any typos or errors.
