{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Neural network using Keras" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do a neural network with Keras, which provides a general framework, and interfaces with a backend (TensorFlow) that allows efficient calculation, and use of a GPU." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

MNIST digit recognition with Keras

\n", "

\n", "Start by getting the MNIST data again through the Keras interface.\n", "Here we explicitly reshape the input data and normalize it, and encode the training labels. Keras also has an interface to the MNIST data: it returns the images and integers giving the true values. We can use the keras to_categorical function to convert the true values into \"one-hot\" vectors.\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#import data\n", "from tensorflow.keras.datasets import mnist\n", "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n", "\n", "# shape input data into 1D and scale\n", "shape=train_images.shape\n", "print('training data shape: ',shape)\n", "train_images = train_images.reshape((shape[0],shape[1]*shape[2]))\n", "train_images = train_images.astype('float32') / 255\n", "\n", "# shape test data input 1D and scale\n", "shape=test_images.shape\n", "test_images = test_images.reshape((shape[0],shape[1]*shape[2]))\n", "test_images = test_images.astype('float32') / 255\n", "\n", "# convert labels into vectors\n", "from tensorflow.keras.utils import to_categorical\n", "train_labels= to_categorical(train_labels)\n", "print('First label: ', train_labels[0])\n", "test_labels = to_categorical(test_labels)\n", "\n", "# number of input nodes\n", "ninput = shape[1]*shape[2]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will define the Keras model and the parameters we will use to train it. We will start by instantiating a model using models.Sequential(), then add layers using the add() method. This takes as a first argument the number of neurons. You can specify the activation function with the activation() keyword. \n", "
\n", "For the first layer, you have to specify the input shape, but for all subsequent layers, it will inherit the input shape from the previous layer.\n", "
\n", "After adding the layer, use the compile() method to specify an optimizer with the optimizer() keyword, a loss function with the loss=() keyword, and, optionally, an accuracy metric.\n", "
\n", "Note that Keras supports a layers.Normalization layer that you can use as the first layer to normalize the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras import models\n", "from tensorflow.keras import layers\n", "from tensorflow.keras import optimizers" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#instantiate a model\n", "net = models.Sequential()\n", "\n", "# add layers\n", "net.add(layers.Normalization(input_shape=(ninput,)))\n", "net.add(layers.Dense(30,activation='sigmoid'))\n", "\n", "# how many neurons does the last layer need to have?\n", "net.add(layers.Dense( ... , activation='sigmoid')) # fill in number of neurons for last layer\n", "\n", "# print out a summary of the architecture\n", "net.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You now specify the optimizer and loss function using the compile() method. You can find information about optimers and loss functions . Here are a couple of examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net.compile(optimizer=optimizers.SGD(learning_rate=30),loss='categorical_crossentropy',metrics=['accuracy'])\n", "net.compile(optimizer=optimizers.Adam(),loss='categorical_crossentropy',metrics=['accuracy'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we train the model with the input data using the fit() method, specifying number of epochs with the epochs= keyword and batch size with batch_size= keyword. We can also include validation data with the validation_data= keyword, which takes a tuple of (validatation_input,validation_labels)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "history = net.fit(train_images,train_labels,epochs=10,batch_size=128, \n", " validation_data=(test_images,test_labels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now inspect the learning curves. The fit() method returns a dictionary that has keys 'loss' and 'val_loss', among others, that are arrays with length of the number of training steps, so you can plot loss against training steps for both training data and validation data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('history keys: ', history.history.keys())\n", "loss=history.history['loss']\n", "acc=history.history['accuracy']\n", "val_loss=history.history['val_loss']\n", "val_acc=history.history['val_accuracy']\n", "epochs=range(len(acc))\n", "\n", "# plot loss function for training data and loss function for validation data vs training epoch\n", "plt.plot(..., ...,label='loss')\n", "plt.plot(..., ...,label='validation loss')\n", "plt.legend()\n", "plt.show()\n", "\n", "# plot accuracy for training data and accuracy for validation data vs training epoch\n", "plt.plot(..., ...,label='accuracy')\n", "plt.plot(..., ...,label='validation accuracy')\n", "plt.legend()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adjust number of steps until you are no longer improving things, or it is taking too long. How many steps and what accuracy do you achieve?\n", "
ANSWER HERE: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Experiment with using different optimizers ( e.g., stochastic gradient descent using optimmizers.SGD() with learning rate specified with the learning_rate= keyword different loss functions, different activation functions, and/or different network architecture. What differences do you see?\n", "
ANSWER HERE: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we wanted to evaluate the model on a new set of data, we can use the evaluate method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net.evaluate(test_images,test_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, if you wanted to get values for some data set with unknown labels, you would use the predict() method:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "out=net.predict(test_images)\n", "print(out)\n", "print(out.argmax(axis=1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect some of the test images and see how well the network does." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Doing a fit using a neural network

\n", "

\n", "OK, now let's try a test case where we want out network to determine the relation between some input values and an output value, i.e. a fit.\n", "

\n", "Start by defining some arbitrary function and creating some input and output data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def func(x,y,z) :\n", " return # make up some function here\n", "\n", "# input data\n", "x=np.random.uniform(size=10000)\n", "y=np.random.uniform(size=10000)\n", "z=np.random.uniform(size=10000)\n", "\n", "# create labels\n", "vals=func(x,y,z)\n", "\n", "# split into training and validation sets\n", "train_input = np.vstack([x,y,z]).T[0:9000]\n", "train_labels = vals[0:9000]\n", "\n", "validation_input = np.vstack([x,y,z]).T[9000:]\n", "validation_labels = vals[9000:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create Keras model as above, but this time, the final layer should just use a linear activation function. For my function, I found that a relu activation function for the hidden layer performed much better than a sigmoid activation function, but you can and should experiment!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net = models.Sequential()\n", "net.add(layers.Normalization(input_shape=(train_input.shape[1],)))\n", "net.add(layers.Dense(30,activation='relu'))\n", "\n", "# what is output layer for a fit that returns a single value\n", "net.add(layers.Dense( ... , activation='linear'))\n", "sgd=optimizers.SGD(learning_rate=3)\n", "net.compile(optimizer='ADAM',loss='mse',metrics=['mse'])\n", "net.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the model with the fit() method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "history = net.fit(train_input,train_labels,epochs=100,batch_size=128, \n", " validation_data=(validation_input,validation_labels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the learning curves. The fit() method returns a dictionary that has keys 'loss' and 'val_loss', among others, that are arrays with length of the number of training steps, so you can plot loss against training steps for both training data and validation data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(history.history.keys())\n", "loss=history.history['loss']\n", "acc=history.history['mse']\n", "val_loss=history.history['val_loss']\n", "val_acc=history.history['val_mse']\n", "epochs=range(len(acc))\n", "plt.plot(epochs,loss,label='loss')\n", "plt.plot(epochs,val_loss,label='validation loss')\n", "plt.legend()\n", "plt.show()\n", "plt.plot(epochs,acc,label='mse')\n", "plt.plot(epochs,val_acc,label='validation mse')\n", "plt.legend()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate and plot results of validation set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net.evaluate(validation_input,validation_labels)\n", "out=net.predict(validation_input)\n", "print(validation_labels.shape,out.shape)\n", "plt.scatter(validation_labels,out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How well did you do?\n", "
ANSWER HERE: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Astronomy example : a stellar evolution emulator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, here's a final real astronomy application. Let's say we want to create a stellar evolution emulator, that will return effective temperature and luminosity given a stellar mass, age, and composition. We will train the emulator using somem isochrone data, which I have saved into a file zall.dat \n", "

\n", "( Note : I couldn't get good results from this experiment, can you? )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from astropy.io import ascii\n", "data=ascii.read('zall.dat')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot log(Te) vs log(L), color coded by metallicity" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(data['logTe'],data['logL/Lo'],c=data['Z'],s=2)\n", "plt.xlim(4.5,3.0)\n", "plt.ylim(-5,5)\n", "plt.xlabel('log Te')\n", "plt.ylabel('log L/Lsun')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, let's shuffle the data to create training, validation, and test data sets" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mass=data['M_ini']\n", "age=data['log(age/yr)']\n", "Z=data['Z']\n", "teff=data['logTe']\n", "lum=data['logL/Lo']\n", "\n", "input = np.vstack([mass,age,Z]).T\n", "output = np.vstack([teff,lum]).T\n", "\n", "inds=np.arange(len(data))\n", "np.random.shuffle(inds)\n", "\n", "# split data set 80/10/10 into training, validation, and test data sets\n", "ntrain= int(0.8*len(inds))\n", "nvalidation = int(0.1*len(inds))\n", "train_input = input[inds[0:ntrain]]\n", "train_labels = output[inds[0:ntrain]]\n", "validation_input = input[inds[ntrain:ntrain+nvalidation]]\n", "validation_labels = output[inds[ntrain:ntrain+nvalidation]]\n", "test_input = input[inds[ntrain+nvalidation:]]\n", "test_labels = output[inds[ntrain+nvalidation:]]\n", "\n", "print(train_input.shape,train_labels.shape)\n", "print(validation_input.shape,validation_labels.shape)\n", "print(test_input.shape,test_labels.shape)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up and compile the Keras model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net = models.Sequential()\n", "\n", "# add desired layers here\n", "net.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compile, giving optimizer and loss function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net.compile(....)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "history = net.fit(...)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the learning curves." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(history.history.keys())\n", "loss=history.history['loss']\n", "acc=history.history['mse']\n", "val_loss=history.history['val_loss']\n", "val_acc=history.history['val_mse']\n", "epochs=range(len(acc))\n", "\n", "# plots\n", "plt.plot(epochs,loss,label='loss')\n", "plt.plot(epochs,val_loss,label='validation loss')\n", "plt.legend()\n", "plt.show()\n", "plt.plot(epochs,acc,label='mse')\n", "plt.plot(epochs,val_acc,label='validation mse')\n", "plt.ylim(0.,0.2)\n", "plt.legend()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate the model using the test data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net.evaluate(test_input,test_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compare the results with the test data labels" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "out=net.predict(train_input)\n", "\n", "#plot training data\n", "plt.scatter(train_labels[:,0],train_labels[:,1],c=train_input[:,2],s=2)\n", "plt.xlim(4.5,3.0)\n", "plt.ylim(-5,5)\n", "plt.xlabel('log Te')\n", "plt.ylabel('log L/Lsun')\n", "\n", "#plot neural net results\n", "plt.figure()\n", "plt.scatter(out[:,0],out[:,1],c=train_input[:,2],s=2)\n", "plt.xlim(4.5,3.0)\n", "plt.ylim(-5,5)\n", "plt.xlabel('log Te')\n", "plt.ylabel('log L/Lsun')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How well did you do?\n", "
ANSWER HERE: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }