Dimensionality reduction: Principal component analysis"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(100, 10)\n",
"[[0.28203457 0.17743954 0.75061475 0.80683474 0.99050514 0.41261768\n",
" 0.37201809 0.77641296 0.34080354 0.93075733]\n",
" [0.85841275 0.42899403 0.75087107 0.75454287 0.10312387 0.90255291\n",
" 0.50525237 0.82645747 0.3200496 0.89552323]\n",
" [0.38920168 0.01083765 0.90538198 0.09128668 0.31931364 0.95006197\n",
" 0.95060715 0.57343789 0.63183721 0.44844552]]\n"
]
}
],
"source": [
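"np.random.seed(42)\n",
"# 100 samples of 3 independent standard-normal variables\n",
"X = np.random.normal(size=(100, 3))\n",
"# Random 3x10 mixing matrix: each of the 10 output columns is a linear combination of the 3 variables\n",
"R = np.random.random([3, 10])\n",
"X = np.dot(X, R)\n",
"print(X.shape)\n",
"# Save the data set used in the rest of this notebook\n",
"np.savetxt('ndim.txt', X)\n",
"print(R)"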
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start by loading a multi-dimensional data set from the file `ndim.txt` using `numpy.loadtxt()`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(100, 10)\n"
]
}
],
"source": [
"X = np.loadtxt('ndim.txt')\n",
"print(X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data set has 100 data points, which correspond to the first index (the rows of `X`). The second index gives the number of characteristics (features) per data point, i.e. the dimensionality of the data. How many dimensions does this data have?\n",
" ANSWER HERE: "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Investigate this data set by making some plots (for example, scatter plots of one dimension against another). Is it a good candidate for dimensionality reduction? Why or why not?\n",
" ANSWER HERE: \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"