\n",
"The code writes a feedforward neural network from scratch in raw Python in just a few dozen lines of code! Following the online book, we will use the code to try to classify handwritten digits taken from the MNIST data set (a very common test data set)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start by loading the MNIST data using mnist_loader.py. This will load three subsets of the data: the training set, the validation set, and the test set. You will need to grab the file mnist.pkl.gz first."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mnist_loader\n",
"import pickle\n",
"training_data, validation_data, test_data = mnist_loader.load_data_wrapper()\n",
"print('length of training data: ',len(training_data))\n",
"print('length of validation data: ',len(validation_data))\n",
"print('length of test data: ',len(test_data))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each data set is returned as a list of tuples, with the first element of the tuple being the image data, and the second being the truth value. For the training data, the truth is encoded in a 10-element vector, with all entries 0 except for the entry for the correct digit, which is set to 1. For the validation and test sets, the truth value is just an integer giving the true digit.\n",
"\n",
"For use with the neural network, the input image array, which is a 28x28 image, is unwrapped into a single vector of length 784; the simple neural network just treats the image as a series of unconnected intensity values (we can do better than that later!). To display the images, you can use numpy.reshape to reshape the images to 28x28.\n",
"\n",
"Looking at the code below, make sure you understand how the training and test data are stored.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"print('contents of one training object',training_data[0][0].shape,training_data[0][1].shape)\n",
"print(training_data[0][1])\n",
"print('contents of one test object',test_data[0][0].shape,type(test_data[0][1]))\n",
"print(test_data[0][1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at some of the training data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig,ax=plt.subplots(10,10,figsize=(12,12))\n",
"plt.subplots_adjust(wspace=0.001,hspace=0.001)\n",
"for i in range(100) :\n",
"    ax[i//10,i%10].imshow(training_data[i][0].reshape(28,28))\n",
"    ax[i//10,i%10].xaxis.set_visible(False)\n",
"    ax[i//10,i%10].yaxis.set_visible(False)\n",
"    ax[i//10,i%10].axis('equal')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, now we are going to start to work through the neural network code. This is implemented as a Class, i.e., object-oriented programming. It turns out you can't define the different methods of the Class in separate cells of the Jupyter notebook, so we'll work through one method at a time, then put them all together in a single cell below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The initial declaration of the class includes an __init__ method, which defines the arguments with which you will instantiate an object. Here the required input is a list of integers, giving the number of neurons in each layer of the network. The first layer is the input layer, and the final layer is the output layer. Given this input, the __init__ method will set the attributes num_layers, sizes, biases, and weights. Since the first layer is the input, it doesn't have weights and biases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"network.py\n",
"~~~~~~~~~~\n",
"\n",
"A module to implement the stochastic gradient descent learning\n",
"algorithm for a feedforward neural network. Gradients are calculated\n",
"using backpropagation. Note that I have focused on making the code\n",
"simple, easily readable, and easily modifiable. It is not optimized,\n",
"and omits many desirable features.\n",
"\"\"\"\n",
"\n",
"#### Libraries\n",
"# Standard library\n",
"import random\n",
"\n",
"# Third-party libraries\n",
"import numpy as np\n",
"\n",
"class Network(object):\n",
"\n",
"    def __init__(self, sizes):\n",
"        \"\"\"The list ``sizes`` contains the number of neurons in the\n",
"        respective layers of the network. For example, if the list\n",
"        was [2, 3, 1] then it would be a three-layer network, with the\n",
"        first layer containing 2 neurons, the second layer 3 neurons,\n",
"        and the third layer 1 neuron. The biases and weights for the\n",
"        network are initialized randomly, using a Gaussian\n",
"        distribution with mean 0, and variance 1. Note that the first\n",
"        layer is assumed to be an input layer, and by convention we\n",
"        won't set any biases for those neurons, since biases are only\n",
"        ever used in computing the outputs from later layers.\"\"\"\n",
"        self.num_layers = len(sizes)\n",
"        self.sizes = sizes\n",
"        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]\n",
"        self.weights = [np.random.randn(y, x)\n",
"                        for x, y in zip(sizes[:-1], sizes[1:])]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although we haven't finished (hardly started) the class, we can use what we have so far to instantiate a network object and look at the attributes. Try to instantiate an object with some list of the number of neurons in each layer. Then see what the values of num_layers and sizes are. Predict what the dimensions of biases and weights will be, then check to see if you were right."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"net=Network([...]) # choose some desired network architecture with at least 3 layers\n",
"print(net.num_layers)\n",
"print(net.sizes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The __init__ method will populate the initial bias and weight arrays with random numbers. Remember, each neuron in the current layer takes an input from every neuron in the previous layer, so it needs one weight per neuron in the previous layer. There is also a bias value for each neuron in the current layer.\n",
"\n",
"Given your input architecture, what do you think the dimensions of these arrays will be?\n",
" ** ANSWER HERE: **"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check to see if you were right. If not, make sure you understand why not. You should also understand why I'm starting with layer 2."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"i=2\n",
"for s, b, w in zip(net.sizes[1:], net.biases, net.weights) :\n",
"    print('layer {:d}, size: {:d} '.format(i,s))\n",
"    print('bias shape: ', b.shape)\n",
"    print('weight shape: ',w.shape)\n",
"    i+=1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many total parameters are there in your model?\n",
"** ANSWER HERE: **"
]
},
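{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough check on the counting, here is a minimal sketch. It is not part of network.py, and the helper name `count_parameters` is made up for illustration. A layer of `y` neurons fed by `x` neurons holds a `(y, x)` weight matrix and a `(y, 1)` bias vector. For the `[2, 3, 1]` example from the docstring above, that gives `(2*3 + 3) + (3*1 + 1) = 13` parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def count_parameters(sizes):\n",
"    '''Count the weights and biases of a fully connected network.\n",
"\n",
"    Hypothetical helper for illustration: ``sizes`` is the same list of\n",
"    layer sizes that is passed to Network(sizes).\n",
"    '''\n",
"    total = 0\n",
"    for x, y in zip(sizes[:-1], sizes[1:]):\n",
"        total += x * y  # weight matrix of shape (y, x)\n",
"        total += y      # bias vector of shape (y, 1)\n",
"    return total\n",
"\n",
"print(count_parameters([2, 3, 1]))"
]
},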
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, now we are going to define the method that runs a single input object through all of the layers of the network to produce an output. Remember each layer of the network performs a matrix operation on the output of the previous layer, multiplying it by the weights of the layer and adding the biases of the layer. It then runs this value through the activation function: we are going to use the sigmoid function, which we will code up below.\n",
"\n",
"For each layer, $i$, you want to calculate:\n",
"$$out_i = \\sigma( weight_i \\cdot out_{i-1} + bias_i)$$\n",
"where $\\sigma$ represents the sigmoid function. Can you code this up by just adding one line to the function below, assuming the sigmoid function is called sigmoid()?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"    def feedforward(self, a):\n",
"        \"\"\"Return the output of the network if ``a`` is input.\"\"\"\n",
"        for b, w in zip(self.biases, self.weights):\n",
"            a = # implement the output for each layer here !\n",
"        return a\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have the feedforward function, then you can make a prediction for the output given an input! But, of course, to get a good result, we need to train the network, replacing the random weights and biases with values learned from the training set.\n",
"\n",
"We will do this using the stochastic gradient descent (SGD) method, discussed (briefly) in the lecture video. This calculates the change in the loss function for a change in each parameter for each object, but only uses a random subset of the data for each update of the parameters, with the subset size given by the input parameter mini_batch_size. The routine also requires the number of passes to make through the training data (epochs), and the learning rate (eta). Of course, you will have to provide the routine with the training data! Optionally, it will take a test_data keyword that gives the test data; if supplied, the routine evaluates the test data after each epoch and reports the number of successful classifications out of the total.\n",
"\n",
"The following cell implements the SGD, along with two routines that it requires: a backpropagation routine and an update_mini_batch() routine that will update the weights and biases at each step.\n",
"\n",
"You don't have to program anything here, but try to look through the code and, at the least, recognize that there aren't that many lines: the algorithm is not very complicated! But these routines are the heart of the learning of the network.\n",
"\n",
"Make sure you understand the parameters that you need to supply to the routine."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"    def SGD(self, training_data, epochs, mini_batch_size, eta,\n",
"            test_data=None):\n",
"        \"\"\"Train the neural network using mini-batch stochastic\n",
"        gradient descent. The ``training_data`` is a list of tuples\n",
"        ``(x, y)`` representing the training inputs and the desired\n",
"        outputs. The other non-optional parameters are\n",
"        self-explanatory. If ``test_data`` is provided then the\n",
"        network will be evaluated against the test data after each\n",
"        epoch, and partial progress printed out. This is useful for\n",
"        tracking progress, but slows things down substantially.\"\"\"\n",
"        if test_data: n_test = len(test_data)\n",
"        n = len(training_data)\n",
"        for j in range(epochs):\n",
"            random.shuffle(training_data)\n",
"            mini_batches = [\n",
"                training_data[k:k+mini_batch_size]\n",
"                for k in range(0, n, mini_batch_size)]\n",
"            for mini_batch in mini_batches:\n",
"                self.update_mini_batch(mini_batch, eta)\n",
"            if test_data:\n",
"                print( \"Epoch {0}: {1} / {2}\".format(\n",
"                    j, self.evaluate(test_data), n_test))\n",
"            else:\n",
"                print(\"Epoch {0} complete\".format(j))\n",
"\n",
"    def update_mini_batch(self, mini_batch, eta):\n",
"        \"\"\"Update the network's weights and biases by applying\n",
"        gradient descent using backpropagation to a single mini batch.\n",
"        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``\n",
"        is the learning rate.\"\"\"\n",
"        nabla_b = [np.zeros(b.shape) for b in self.biases]\n",
"        nabla_w = [np.zeros(w.shape) for w in self.weights]\n",
"        for x, y in mini_batch:\n",
"            delta_nabla_b, delta_nabla_w = self.backprop(x, y)\n",
"            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]\n",
"            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]\n",
"        self.weights = [w-(eta/len(mini_batch))*nw\n",
"                        for w, nw in zip(self.weights, nabla_w)]\n",
"        self.biases = [b-(eta/len(mini_batch))*nb\n",
"                       for b, nb in zip(self.biases, nabla_b)]\n",
"\n",
"    def backprop(self, x, y):\n",
"        \"\"\"Return a tuple ``(nabla_b, nabla_w)`` representing the\n",
"        gradient for the cost function C_x. ``nabla_b`` and\n",
"        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar\n",
"        to ``self.biases`` and ``self.weights``.\"\"\"\n",
"        nabla_b = [np.zeros(b.shape) for b in self.biases]\n",
"        nabla_w = [np.zeros(w.shape) for w in self.weights]\n",
"        # feedforward\n",
"        activation = x\n",
"        activations = [x] # list to store all the activations, layer by layer\n",
"        zs = [] # list to store all the z vectors, layer by layer\n",
"        for b, w in zip(self.biases, self.weights):\n",
"            z = np.dot(w, activation)+b\n",
"            zs.append(z)\n",
"            activation = sigmoid(z)\n",
"            activations.append(activation)\n",
"        # backward pass\n",
"        delta = self.cost_derivative(activations[-1], y) * \\\n",
"            sigmoid_prime(zs[-1])\n",
"        nabla_b[-1] = delta\n",
"        nabla_w[-1] = np.dot(delta, activations[-2].transpose())\n",
"        # Note that the variable l in the loop below is used a little\n",
"        # differently to the notation in Chapter 2 of the book. Here,\n",
"        # l = 1 means the last layer of neurons, l = 2 is the\n",
"        # second-last layer, and so on. It's a renumbering of the\n",
"        # scheme in the book, used here to take advantage of the fact\n",
"        # that Python can use negative indices in lists.\n",
"        for l in range(2, self.num_layers):\n",
"            z = zs[-l]\n",
"            sp = sigmoid_prime(z)\n",
"            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp\n",
"            nabla_b[-l] = delta\n",
"            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())\n",
"        return (nabla_b, nabla_w)\n",
"\n",
"    def cost_derivative(self, output_activations, y):\n",
"        r\"\"\"Return the vector of partial derivatives \\partial C_x /\n",
"        \\partial a for the output activations.\"\"\"\n",
"        return (output_activations-y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we have an evaluate() routine which accepts test_data and returns the number of successful classifications."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"    def evaluate(self, test_data):\n",
"        \"\"\"Return the number of test inputs for which the neural\n",
"        network outputs the correct result. Note that the neural\n",
"        network's output is assumed to be the index of whichever\n",
"        neuron in the final layer has the highest activation.\"\"\"\n",
"        test_results = [(np.argmax(self.feedforward(x)), y)\n",
"                        for (x, y) in test_data]\n",
"        return sum(int(x == y) for (x, y) in test_results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we put it all together, we need to supply the external sigmoid() routine and a routine to return the derivative of the sigmoid. 
Remember the sigmoid function is defined as:\n",
"$$\\sigma(z) = \\frac{1}{1+\\exp(-z)}$$\n",
"Supply the code to return this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#### Miscellaneous functions\n",
"def sigmoid(z):\n",
"    \"\"\"The sigmoid function.\"\"\"\n",
"    return #enter equation for sigmoid here\n",
"\n",
"def sigmoid_prime(z):\n",
"    \"\"\"Derivative of the sigmoid function.\"\"\"\n",
"    return sigmoid(z)*(1-sigmoid(z))\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, I've taken all of the class functions from above and put them into a single cell below so that we can instantiate an object and have access to all of the attributes and methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"network.py\n",
"~~~~~~~~~~\n",
"\n",
"A module to implement the stochastic gradient descent learning\n",
"algorithm for a feedforward neural network. Gradients are calculated\n",
"using backpropagation. Note that I have focused on making the code\n",
"simple, easily readable, and easily modifiable. It is not optimized,\n",
"and omits many desirable features.\n",
"\"\"\"\n",
"\n",
"#### Libraries\n",
"# Standard library\n",
"import random\n",
"\n",
"# Third-party libraries\n",
"import numpy as np\n",
"\n",
"class Network(object):\n",
"\n",
"    def __init__(self, sizes):\n",
"        \"\"\"The list ``sizes`` contains the number of neurons in the\n",
"        respective layers of the network. For example, if the list\n",
"        was [2, 3, 1] then it would be a three-layer network, with the\n",
"        first layer containing 2 neurons, the second layer 3 neurons,\n",
"        and the third layer 1 neuron. The biases and weights for the\n",
"        network are initialized randomly, using a Gaussian\n",
"        distribution with mean 0, and variance 1. Note that the first\n",
"        layer is assumed to be an input layer, and by convention we\n",
"        won't set any biases for those neurons, since biases are only\n",
"        ever used in computing the outputs from later layers.\"\"\"\n",
"        self.num_layers = len(sizes)\n",
"        self.sizes = sizes\n",
"        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]\n",
"        self.weights = [np.random.randn(y, x)\n",
"                        for x, y in zip(sizes[:-1], sizes[1:])]\n",
"\n",
"    def feedforward(self, a):\n",
"        \"\"\"Return the output of the network if ``a`` is input.\"\"\"\n",
"        for b, w in zip(self.biases, self.weights):\n",
"            a = sigmoid(np.dot(w, a)+b)\n",
"        return a\n",
"\n",
"    def SGD(self, training_data, epochs, mini_batch_size, eta,\n",
"            test_data=None):\n",
"        \"\"\"Train the neural network using mini-batch stochastic\n",
"        gradient descent. The ``training_data`` is a list of tuples\n",
"        ``(x, y)`` representing the training inputs and the desired\n",
"        outputs. The other non-optional parameters are\n",
"        self-explanatory. If ``test_data`` is provided then the\n",
"        network will be evaluated against the test data after each\n",
"        epoch, and partial progress printed out. This is useful for\n",
"        tracking progress, but slows things down substantially.\"\"\"\n",
"        if test_data: n_test = len(test_data)\n",
"        n = len(training_data)\n",
"        for j in range(epochs):\n",
"            random.shuffle(training_data)\n",
"            mini_batches = [\n",
"                training_data[k:k+mini_batch_size]\n",
"                for k in range(0, n, mini_batch_size)]\n",
"            for mini_batch in mini_batches:\n",
"                self.update_mini_batch(mini_batch, eta)\n",
"            if test_data:\n",
"                print( \"Epoch {0}: {1} / {2}\".format(\n",
"                    j, self.evaluate(test_data), n_test))\n",
"            else:\n",
"                print(\"Epoch {0} complete\".format(j))\n",
"\n",
"    def update_mini_batch(self, mini_batch, eta):\n",
"        \"\"\"Update the network's weights and biases by applying\n",
"        gradient descent using backpropagation to a single mini batch.\n",
"        The ``mini_batch`` is a list of tuples ``(x, y)``, and ``eta``\n",
"        is the learning rate.\"\"\"\n",
"        nabla_b = [np.zeros(b.shape) for b in self.biases]\n",
"        nabla_w = [np.zeros(w.shape) for w in self.weights]\n",
"        for x, y in mini_batch:\n",
"            delta_nabla_b, delta_nabla_w = self.backprop(x, y)\n",
"            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]\n",
"            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]\n",
"        self.weights = [w-(eta/len(mini_batch))*nw\n",
"                        for w, nw in zip(self.weights, nabla_w)]\n",
"        self.biases = [b-(eta/len(mini_batch))*nb\n",
"                       for b, nb in zip(self.biases, nabla_b)]\n",
"\n",
"    def backprop(self, x, y):\n",
"        \"\"\"Return a tuple ``(nabla_b, nabla_w)`` representing the\n",
"        gradient for the cost function C_x. ``nabla_b`` and\n",
"        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar\n",
"        to ``self.biases`` and ``self.weights``.\"\"\"\n",
"        nabla_b = [np.zeros(b.shape) for b in self.biases]\n",
"        nabla_w = [np.zeros(w.shape) for w in self.weights]\n",
"        # feedforward\n",
"        activation = x\n",
"        activations = [x] # list to store all the activations, layer by layer\n",
"        zs = [] # list to store all the z vectors, layer by layer\n",
"        for b, w in zip(self.biases, self.weights):\n",
"            z = np.dot(w, activation)+b\n",
"            zs.append(z)\n",
"            activation = sigmoid(z)\n",
"            activations.append(activation)\n",
"        # backward pass\n",
"        delta = self.cost_derivative(activations[-1], y) * \\\n",
"            sigmoid_prime(zs[-1])\n",
"        nabla_b[-1] = delta\n",
"        nabla_w[-1] = np.dot(delta, activations[-2].transpose())\n",
"        # Note that the variable l in the loop below is used a little\n",
"        # differently to the notation in Chapter 2 of the book. Here,\n",
"        # l = 1 means the last layer of neurons, l = 2 is the\n",
"        # second-last layer, and so on. It's a renumbering of the\n",
"        # scheme in the book, used here to take advantage of the fact\n",
"        # that Python can use negative indices in lists.\n",
"        for l in range(2, self.num_layers):\n",
"            z = zs[-l]\n",
"            sp = sigmoid_prime(z)\n",
"            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp\n",
"            nabla_b[-l] = delta\n",
"            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())\n",
"        return (nabla_b, nabla_w)\n",
"\n",
"    def evaluate(self, test_data):\n",
"        \"\"\"Return the number of test inputs for which the neural\n",
"        network outputs the correct result. Note that the neural\n",
"        network's output is assumed to be the index of whichever\n",
"        neuron in the final layer has the highest activation.\"\"\"\n",
"        test_results = [(np.argmax(self.feedforward(x)), y)\n",
"                        for (x, y) in test_data]\n",
"        return sum(int(x == y) for (x, y) in test_results)\n",
"\n",
"    def cost_derivative(self, output_activations, y):\n",
"        r\"\"\"Return the vector of partial derivatives \\partial C_x /\n",
"        \\partial a for the output activations.\"\"\"\n",
"        return (output_activations-y)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, let's try to classify some handwritten digits! First instantiate a network. Remember, the first layer is the input layer, which will be a vector of length 784 (28x28). Let's define a simple 3-layer network with this input layer, a hidden layer with 30 neurons, and an output layer of 10 neurons (since we want the output to give a maximum value for the true digit, just like the truth vectors we supplied for the training data).\n",
"\n",
"Instantiate a network object with this architecture. What will the values of the sizes and num_layers attributes be, and what will the dimensions of the weight and bias arrays be? What is the total number of parameters?\n",
"\n",
" ** ANSWER HERE: **"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"net = Network([ ... ]) # enter the values for the desired network architecture"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check the attributes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"i=2\n",
"print('num_layers: ',net.num_layers)\n",
"print('sizes: ',net.sizes)\n",
"for s, b, w in zip(net.sizes[1:], net.biases, net.weights) :\n",
"    print('layer {:d}, size: {:d} '.format(i,s))\n",
"    print('bias shape: ', b.shape)\n",
"    print('weight shape: ',w.shape)\n",
"    i+=1\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, let's use the network -- **before training** -- and see how well it does. The evaluate() method will return the number of successful classifications from the test data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"successful = net.evaluate(test_data)\n",
"print('{:d} successful out of {:d} test_data'.format(successful,len(test_data)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How well would you expect to do with just random guesses at which digit?\n",
"How does the success rate compare with your expectation?\n",
"**ANSWER HERE: **"
]
},
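{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a rough sense of the random-guess baseline, the sketch below simulates guessing a digit uniformly at random. The labels are synthetic, not MNIST, and all names are illustrative. With 10 equally likely classes the expected success rate is 1/10; an untrained network may land somewhat above or below that, since its random weights can favour particular output neurons."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable\n",
"n = 10000\n",
"labels = rng.integers(0, 10, size=n)   # synthetic 'true' digits (not MNIST)\n",
"guesses = rng.integers(0, 10, size=n)  # uniform random guesses\n",
"accuracy = np.mean(labels == guesses)  # fraction of guesses that match\n",
"print('random-guess accuracy: {:.3f}'.format(accuracy))"
]
},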
{
"cell_type": "markdown",
"metadata": {},
"source": [
"OK, let's train the network using the SGD() method! Call it, supplying the training data, the number of epochs (start with something small like 5, so it doesn't run too long), a mini_batch_size (perhaps 10), and a learning rate (start with 3.0). Supply the test data with the test_data keyword, so you can see how it does after each epoch."
]
},
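{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the roles of epochs and mini_batch_size concrete, here is a small sketch of how one epoch is cut into mini-batches. It mirrors the slicing used in SGD() above, but on a toy list, and `make_mini_batches` is a hypothetical helper rather than a method of Network. Each mini-batch triggers one update of the weights and biases, so one epoch performs `ceil(n / mini_batch_size)` updates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"def make_mini_batches(data, mini_batch_size):\n",
"    '''Shuffle a copy of ``data`` and cut it into consecutive mini-batches.\n",
"\n",
"    Note: SGD() shuffles training_data in place; this sketch copies instead.\n",
"    '''\n",
"    shuffled = list(data)\n",
"    random.shuffle(shuffled)\n",
"    return [shuffled[k:k + mini_batch_size]\n",
"            for k in range(0, len(shuffled), mini_batch_size)]\n",
"\n",
"batches = make_mini_batches(range(25), 10)\n",
"print([len(b) for b in batches])  # the last batch is smaller if n is not a multiple"
]
},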
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"net.SGD( ... ) # supply the inputs for the SGD() method above"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What do you think about how well it did? \n",
"** ANSWER HERE: **"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Experiment with different learning rates. How do things change with learning rate? Remember that the weights and biases are stored in the object, so if you want to start from scratch (random biases and weights), you'll have to instantiate the object again. You can also play with different architectures if you want!"
]
},
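{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about the learning rate, here is a toy sketch that is independent of the network: plain gradient descent on `f(w) = w**2`, whose gradient is `2*w`. The function name `descend` and the chosen values of eta are illustrative assumptions. A small eta creeps toward the minimum at `w = 0`, a moderate eta gets there quickly, and a too-large eta overshoots and diverges. The MNIST loss surface is far more complicated, so the useful range of eta will differ, but the qualitative behaviour is worth keeping in mind."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def descend(eta, w=1.0, steps=20):\n",
"    '''Run ``steps`` gradient-descent updates on the toy loss f(w) = w**2.'''\n",
"    for _ in range(steps):\n",
"        grad = 2.0 * w      # derivative of w**2\n",
"        w = w - eta * grad  # same form of update as in update_mini_batch\n",
"    return w\n",
"\n",
"for eta in (0.01, 0.4, 1.1):\n",
"    print('eta = {:4.2f} -> w after 20 steps: {:.3e}'.format(eta, descend(eta)))"
]
},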
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"net = Network([....]) # enter the values for the desired network architecture\n",
"net.SGD( ... )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Discuss.\n",
"** ANSWER HERE: **"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Are you curious about what test_data the network failed on? Let's run individual objects through the trained network using the feedforward() method, check the network answer against the true value, and display the data that the network failed on!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# loop over 100 objects\n",
"for i in range(100) :\n",
"    out = net.feedforward(test_data[i][0]) # use net.feedforward() to process test_data[i][0]\n",
"    best = out.argmax() # take the index of the maximum output (from the output layer of 10) as the best estimate\n",
"    if best != test_data[i][1] : # if the network estimate is NOT the true value, display the image\n",
"        plt.figure()\n",
"        plt.imshow(test_data[i][0].reshape(28,28))\n",
"        print(best,test_data[i][1])\n"
]
},
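{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comparison in the loop above relies on decoding the 10-element output vector with argmax. As a small illustration on made-up numbers (the output vector below is invented, not produced by the network), the predicted digit is the index of the largest activation, and the 10-element truth vectors used for the training set decode the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Invented output activations, shaped (10, 1) like the network's output layer.\n",
"out = np.array([[0.02], [0.01], [0.10], [0.05], [0.03],\n",
"                [0.01], [0.04], [0.91], [0.02], [0.06]])\n",
"predicted = int(np.argmax(out))  # index of the largest activation\n",
"\n",
"# Truth vector for the digit 7, encoded as in the training set.\n",
"truth = np.zeros((10, 1))\n",
"truth[7] = 1.0\n",
"true_digit = int(np.argmax(truth))\n",
"\n",
"print(predicted, true_digit, predicted == true_digit)"
]
},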
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Are you motivated to try to make your network work better?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations, you've just trained and run a neural network, and hopefully demystified the process a little. Note that this code was designed to demonstrate that there's nothing magical, or even overly complicated, about a neural network. Of course, much more efficient code can be written, and we'll look later at some canned packages for neural networks that are much faster and have more features.\n",
"\n",
"Note also that we didn't use the validation set here, or show any learning curves, or work to tune the hyperparameters other than any trial and error you might have done above ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}