{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/NeuromatchAcademy/course-content/blob/main/tutorials/W1D2_ModelFitting/instructor/W1D2_Tutorial4.ipynb\" target=\"_blank\"><img alt=\"Open In Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"/></a>   <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/tutorials/W1D2_ModelFitting/instructor/W1D2_Tutorial4.ipynb\" target=\"_blank\"><img alt=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial 4: Multiple linear regression and polynomial regression\n",
    "\n",
    "**Week 1, Day 2: Model Fitting**\n",
    "\n",
    "**By Neuromatch Academy**\n",
    "\n",
    "**Content creators**: Pierre-Étienne Fiquet, Anqi Wu, Alex Hyafil with help from Byron Galbraith, Ella Batty\n",
    "\n",
    "**Content reviewers**: Lina Teichmann, Saeed Salehi, Patrick Mineault, Michael Waskom\n",
    "\n",
    "**Production editors:** Spiros Chavlis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Tutorial Objectives\n",
    "\n",
    "*Estimated timing of tutorial: 35 minutes*\n",
    "\n",
    "This is Tutorial 4 of a series on fitting models to data. We start with simple linear regression, using least squares optimization (Tutorial 1) and Maximum Likelihood Estimation (Tutorial 2). We will use bootstrapping to build confidence intervals around the inferred linear model parameters (Tutorial 3). We'll finish our exploration of regression models by generalizing to multiple linear regression and polynomial regression (Tutorial 4). We end by learning how to choose between these various models. We discuss the bias-variance trade-off (Tutorial 5) and Cross Validation for model selection (Tutorial 6).\n",
    "\n",
    "In this tutorial, we will generalize the regression model to incorporate multiple features.\n",
    "- Learn how to structure inputs for regression using the 'Design Matrix'\n",
    "- Generalize the MSE for multiple features using the ordinary least squares estimator\n",
    "- Visualize data and model fit in multiple dimensions\n",
    "- Fit polynomial regression models of different complexity\n",
    "- Plot and evaluate the polynomial regression fits\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown\n",
    "from IPython.display import IFrame\n",
    "from ipywidgets import widgets\n",
    "out = widgets.Output()\n",
    "with out:\n",
    "    print(f\"If you want to download the slides: https://osf.io/download/2mkq4/\")\n",
    "    display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/2mkq4/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
    "display(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Install and import feedback gadget\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Install and import feedback gadget\n",
    "\n",
    "!pip3 install vibecheck datatops --quiet\n",
    "\n",
    "from vibecheck import DatatopsContentReviewContainer\n",
    "def content_review(notebook_section: str):\n",
    "    return DatatopsContentReviewContainer(\n",
    "        \"\",  # No text prompt\n",
    "        notebook_section,\n",
    "        {\n",
    "            \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
    "            \"name\": \"neuromatch_cn\",\n",
    "            \"user_key\": \"y1x3mpx5\",\n",
    "        },\n",
    "    ).render()\n",
    "\n",
    "\n",
    "feedback_prefix = \"W1D2_T4\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Imports\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Figure Settings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Figure Settings\n",
    "import logging\n",
    "logging.getLogger('matplotlib.font_manager').disabled = True\n",
    "\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Plotting Functions\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Plotting Functions\n",
    "\n",
    "def evaluate_fits(order_list, mse_list):\n",
    "  \"\"\" Compare the quality of multiple polynomial fits\n",
    "  by plotting their MSE values.\n",
    "\n",
    "  Args:\n",
    "    order_list (list): list of the order of polynomials to be compared\n",
    "    mse_list (list): list of the MSE values for the corresponding polynomial fit\n",
    "  \"\"\"\n",
    "  fig, ax = plt.subplots()\n",
    "  ax.bar(order_list, mse_list)\n",
    "  ax.set(title='Comparing Polynomial Fits', xlabel='Polynomial order', ylabel='MSE')\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def plot_fitted_polynomials(x, y, theta_hat):\n",
    "  \"\"\" Plot polynomials of different orders\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (n_samples)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "    theta_hat (dict): polynomial regression weights for different orders\n",
    "  \"\"\"\n",
    "\n",
    "  x_grid = np.linspace(x.min() - .5, x.max() + .5)\n",
    "\n",
    "  plt.figure()\n",
    "\n",
    "  for order in range(0, max_order + 1):\n",
    "    X_design = make_design_matrix(x_grid, order)\n",
    "    plt.plot(x_grid, X_design @ theta_hat[order]);\n",
    "\n",
    "  plt.ylabel('y')\n",
    "  plt.xlabel('x')\n",
    "  plt.plot(x, y, 'C0.');\n",
    "  plt.legend([f'order {o}' for o in range(max_order + 1)], loc=1)\n",
    "  plt.title('polynomial fits')\n",
    "  plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 1: Multiple Linear Regression\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 8 min*\n",
    "\n",
    "This video covers linear regression with multiple inputs (more than 1D) and polynomial regression.\n",
    "\n",
    "<details>\n",
    "<summary> <font color='blue'>Click here for text recap of video </font></summary>\n",
    "\n",
    "Now that we have considered the univariate case and how to produce confidence intervals for our estimator, we turn to the general linear regression case, where we can have more than one regressor, or feature, in our input.\n",
    "\n",
    "Recall that our original univariate linear model was given as\n",
    "\n",
    "\\begin{equation}\n",
    "y = \\theta x + \\epsilon\n",
    "\\end{equation}\n",
    "\n",
    "where $\\theta$ is the slope and $\\epsilon$ some noise. We can easily extend this to the multivariate scenario by adding another parameter for each additional feature\n",
    "\n",
    "\\begin{equation}\n",
    "y = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + ... +\\theta_d x_d + \\epsilon\n",
    "\\end{equation}\n",
    "\n",
    "where $\\theta_0$ is the intercept and $d$ is the number of features (it is also the dimensionality of our input).\n",
    "\n",
    "We can condense this succinctly using vector notation for a single data point\n",
    "\n",
    "\\begin{equation}\n",
    "y_i = \\boldsymbol{\\theta}^{\\top}\\mathbf{x}_i + \\epsilon\n",
    "\\end{equation}\n",
    "\n",
    "and fully in matrix form\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathbf{y} = \\mathbf{X}\\boldsymbol{\\theta} + \\mathbf{\\epsilon}\n",
    "\\end{equation}\n",
    "\n",
    "where $\\mathbf{y}$ is a vector of measurements, $\\mathbf{X}$ is a matrix containing the feature values (columns) for each input sample (rows), and $\\boldsymbol{\\theta}$ is our parameter vector.\n",
    "\n",
    "This matrix $\\mathbf{X}$ is often referred to as the \"[design matrix](https://en.wikipedia.org/wiki/Design_matrix)\".\n",
    "\n",
    "We want to find an optimal vector of paramters $\\boldsymbol{\\hat\\theta}$. Recall our analytic solution to minimizing MSE for a single regressor:\n",
    "\n",
    "\\begin{equation}\n",
    "\\hat\\theta = \\frac{\\sum_{i=1}^N x_i y_i}{\\sum_{i=1}^N x_i^2}.\n",
    "\\end{equation}\n",
    "\n",
    "The same holds true for the multiple regressor case, only now expressed in matrix form\n",
    "\n",
    "\\begin{equation}\n",
    "\\boldsymbol{\\hat\\theta} = (\\mathbf{X}^\\top\\mathbf{X})^{-1}\\mathbf{X}^\\top\\mathbf{y}.\n",
    "\\end{equation}\n",
    "\n",
    "This is called the [ordinary least squares](https://en.wikipedia.org/wiki/Ordinary_least_squares) (OLS) estimator.\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 1: Multiple Linear Regression and Polynomial Regression\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Multiple Linear Regression and Polynomial Regression\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'd4nfTki6Ejc'), ('Bilibili', 'BV11Z4y1u7cf')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Multiple_Linear_Regression_and_Polynomial_Regression_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "For this tutorial we will focus on the two-dimensional case ($d=2$), which allows us to easily visualize our results. As an example, think of a situation where a scientist records the spiking response of a retinal ganglion cell to patterns of light signals that vary in contrast and in orientation. Then contrast and orientation values can be used as features / regressors to predict the cells response.\n",
    "\n",
    "In this case our model can be written for a single data point as:\n",
    "\n",
    "\\begin{equation}\n",
    "y = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\epsilon\n",
    "\\end{equation}\n",
    "\n",
    "or for multiple data points in matrix form where\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathbf{X} =\n",
    "\\begin{bmatrix}\n",
    "1 & x_{1,1} & x_{1,2} \\\\\n",
    "1 & x_{2,1} & x_{2,2} \\\\\n",
    "\\vdots & \\vdots & \\vdots \\\\\n",
    "1 & x_{n,1} & x_{n,2}\n",
    "\\end{bmatrix},\n",
    "\\boldsymbol{\\theta} =\n",
    "\\begin{bmatrix}\n",
    "\\theta_0 \\\\\n",
    "\\theta_1 \\\\\n",
    "\\theta_2 \\\\\n",
    "\\end{bmatrix}\n",
    "\\end{equation}\n",
    "\n",
    "When we refer to $x_{i, j}$, we mean that it is the i-th data point and the j-th feature of that data point.\n",
    "\n",
    "For our actual exploration dataset we shall set $\\boldsymbol{\\theta}=[0, -2, -3]$ and draw $N=40$ noisy samples from $x \\in [-2,2)$. Note that setting the value of $\\theta_0 = 0$ effectively ignores the offset term."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to simulate some data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to simulate some data\n",
    "\n",
    "# Set random seed for reproducibility\n",
    "np.random.seed(1234)\n",
    "\n",
    "# Set parameters\n",
    "theta = [0, -2, -3]\n",
    "n_samples = 40\n",
    "\n",
    "# Draw x and calculate y\n",
    "n_regressors = len(theta)\n",
    "x0 = np.ones((n_samples, 1))\n",
    "x1 = np.random.uniform(-2, 2, (n_samples, 1))\n",
    "x2 = np.random.uniform(-2, 2, (n_samples, 1))\n",
    "X = np.hstack((x0, x1, x2))\n",
    "noise = np.random.randn(n_samples)\n",
    "y = X @ theta + noise\n",
    "\n",
    "\n",
    "ax = plt.subplot(projection='3d')\n",
    "ax.plot(X[:,1], X[:,2], y, '.')\n",
    "\n",
    "ax.set(\n",
    "    xlabel='$x_1$',\n",
    "    ylabel='$x_2$',\n",
    "    zlabel='y'\n",
    ")\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Coding Exercise 1: Ordinary Least Squares Estimator\n",
    "\n",
    "In this exercise you will implement the OLS approach to estimating $\\boldsymbol{\\hat\\theta}$ from the design matrix $\\mathbf{X}$ and measurement vector $\\mathbf{y}$. You can use the `@` symbol for matrix multiplication, `.T` for transpose, and `np.linalg.inv` for matrix inversion.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "def ordinary_least_squares(X, y):\n",
    "  \"\"\"Ordinary least squares estimator for linear regression.\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): design matrix of shape (n_samples, n_regressors)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "\n",
    "  Returns:\n",
    "    ndarray: estimated parameter values of shape (n_regressors)\n",
    "  \"\"\"\n",
    "  ######################################################################\n",
    "  ## TODO for students: solve for the optimal parameter vector using OLS\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student exercise: solve for theta_hat vector using OLS\")\n",
    "  ######################################################################\n",
    "\n",
    "  # Compute theta_hat using OLS\n",
    "  theta_hat = ...\n",
    "\n",
    "  return theta_hat\n",
    "\n",
    "\n",
    "theta_hat = ordinary_least_squares(X, y)\n",
    "print(theta_hat)\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "After filling in this function, you should see that $\\boldsymbol{\\hat\\theta} = $\n",
    "```[ 0.13861386, -2.09395731, -3.16370742]```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "\n",
    "def ordinary_least_squares(X, y):\n",
    "  \"\"\"Ordinary least squares estimator for linear regression.\n",
    "\n",
    "  Args:\n",
    "    X (ndarray): design matrix of shape (n_samples, n_regressors)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "\n",
    "  Returns:\n",
    "    ndarray: estimated parameter values of shape (n_regressors)\n",
    "  \"\"\"\n",
    "\n",
    "  # Compute theta_hat using OLS\n",
    "  theta_hat = np.linalg.inv(X.T @ X) @ X.T @ y\n",
    "\n",
    "  return theta_hat\n",
    "\n",
    "\n",
    "theta_hat = ordinary_least_squares(X, y)\n",
    "print(theta_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Ordinary_Least_Squares_Estimator_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now that we have our $\\boldsymbol{\\hat\\theta}$, we can obtain $\\hat{\\mathbf{y}}$ and thus our mean squared error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Compute predicted data\n",
    "theta_hat = ordinary_least_squares(X, y)\n",
    "y_hat = X @ theta_hat\n",
    "\n",
    "# Compute MSE\n",
    "print(f\"MSE = {np.mean((y - y_hat)**2):.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Finally, the following code will plot a geometric visualization of the data points (blue) and fitted plane."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to visualize data and predicted plane\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to visualize data and predicted plane\n",
    "\n",
    "theta_hat = ordinary_least_squares(X, y)\n",
    "xx, yy = np.mgrid[-2:2:50j, -2:2:50j]\n",
    "y_hat_grid = np.array([xx.flatten(), yy.flatten()]).T @ theta_hat[1:]\n",
    "y_hat_grid = y_hat_grid.reshape((50, 50))\n",
    "\n",
    "ax = plt.subplot(projection='3d')\n",
    "ax.plot(X[:, 1], X[:, 2], y, '.')\n",
    "ax.plot_surface(xx, yy, y_hat_grid, linewidth=0, alpha=0.5, color='C1',\n",
    "                cmap=plt.get_cmap('coolwarm'))\n",
    "\n",
    "for i in range(len(X)):\n",
    "  ax.plot((X[i, 1], X[i, 1]),\n",
    "          (X[i, 2], X[i, 2]),\n",
    "          (y[i], y_hat[i]),\n",
    "          'g-', alpha=.5)\n",
    "\n",
    "ax.set(\n",
    "    xlabel='$x_1$',\n",
    "    ylabel='$x_2$',\n",
    "    zlabel='y'\n",
    ")\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 2:  Polynomial Regression\n",
    "\n",
    "\n",
    "So far today, you learned how to predict outputs from inputs by fitting a linear regression model. We can now model all sorts of relationships, including in neuroscience!\n",
    "\n",
    "One potential problem with this approach is the simplicity of the model. Linear regression, as the name implies, can only capture a linear relationship between the inputs and outputs. Put another way, the predicted outputs are only a weighted sum of the inputs. What if there are more complicated computations happening? Luckily, many more complex models exist (and you will encounter many more over the next 3 weeks). One model that is still very simple to fit and understand, but captures more complex relationships, is **polynomial regression**, an extension of linear regression.\n",
    "\n",
    "<details>\n",
    "<summary> <font color='blue'>Click here for text recap of relevant part of video </font></summary>\n",
    "\n",
    "Since polynomial regression is an extension of linear regression, everything you learned so far will come in handy now! The goal is the same: we want to predict the dependent variable $y$ given the input values $x$. The key change is the type of relationship between inputs and outputs that the model can capture.\n",
    "\n",
    "Linear regression models predict the outputs as a weighted sum of the inputs:\n",
    "\n",
    "\\begin{equation}\n",
    "y = \\theta_0 + \\theta x + \\epsilon\n",
    "\\end{equation}\n",
    "\n",
    "With polynomial regression, we model the outputs as a polynomial equation based on the inputs. For example, we can model the outputs as:\n",
    "\n",
    "\\begin{equation}\n",
    "y = \\theta_0 + \\theta_1 x + \\theta_2 x^2 + \\theta_3 x^3 + \\epsilon\n",
    "\\end{equation}\n",
    "\n",
    "We can change how complex a polynomial is fit by changing the order of the polynomial. The order of a polynomial refers to the highest power in the polynomial. The equation above is a third order polynomial because the highest value x is raised to is 3. We could add another term ($+ \\theta_4 x^4$) to model an order 4 polynomial and so on.\n",
    "</details>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "First, we will simulate some data to practice fitting polynomial regression models. We will generate random inputs $x$ and then compute y according to $y = x^2 - x - 2 $, with some extra noise both in the input and the output to make the model fitting exercise closer to a real life situation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to simulate some data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to simulate some data\n",
    "\n",
    "# setting a fixed seed to our random number generator ensures we will always\n",
    "# get the same psuedorandom number sequence\n",
    "np.random.seed(121)\n",
    "n_samples = 30\n",
    "x = np.random.uniform(-2, 2.5, n_samples)  # inputs uniformly sampled from [-2, 2.5)\n",
    "y =  x**2 - x - 2   # computing the outputs\n",
    "\n",
    "output_noise = 1/8 * np.random.randn(n_samples)\n",
    "y += output_noise  # adding some output noise\n",
    "\n",
    "input_noise = 1/2 * np.random.randn(n_samples)\n",
    "x += input_noise  # adding some input noise\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(x, y)  # produces a scatter plot\n",
    "ax.set(xlabel='x', ylabel='y');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.1: Design matrix for polynomial regression\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 16 min*\n",
    "\n",
    "Now we have the basic idea of polynomial regression and some noisy data, let's begin! The key difference between fitting a linear regression model and a polynomial regression model lies in how we structure the input variables.\n",
    "\n",
    "Let's go back to one feature for each data point. For linear regression, we used $\\mathbf{X} = \\mathbf{x}$ as the input data, where $\\mathbf{x}$ is a vector where each element is the input for a single data point. To add a constant bias (a y-intercept in a 2-D plot), we use $\\mathbf{X} = \\big[ \\boldsymbol 1, \\mathbf{x} \\big]$, where $\\boldsymbol 1$ is a column of ones.  When fitting, we learn a weight for each column of this matrix. So we learn a weight that multiples with column 1 - in this case that column is all ones so we gain the bias parameter ($+ \\theta_0$).\n",
    "\n",
    "This matrix $\\mathbf{X}$ that we use for our inputs is known as a **design matrix**. We want to create our design matrix so we learn weights for $\\mathbf{x}^2, \\mathbf{x}^3,$ etc. Thus, we want to build our design matrix $X$ for polynomial regression of order $k$ as:\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathbf{X} = \\big[ \\boldsymbol 1 , \\mathbf{x}^1, \\mathbf{x}^2 , \\ldots , \\mathbf{x}^k \\big],\n",
    "\\end{equation}\n",
    "\n",
    "where $\\boldsymbol{1}$ is the vector the same length as $\\mathbf{x}$ consisting of of all ones, and $\\mathbf{x}^p$ is the vector $\\mathbf{x}$ with all elements raised to the power $p$. Note that $\\boldsymbol{1} = \\mathbf{x}^0$ and $\\mathbf{x}^1 = \\mathbf{x}$.\n",
    "\n",
    "If we have inputs with more than one feature, we can use a similar design matrix but include all features raised to each power. Imagine that we have two features per data point: $\\mathbf{x}_m$ is a vector of one feature per data point and  $\\mathbf{x}_n$ is another.  Our design matrix for a polynomial regression would be:\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathbf{X} = \\big[ \\boldsymbol 1 , \\mathbf{x}_m^1, \\mathbf{x}_n^1, \\mathbf{x}_m^2 , \\mathbf{x}_n^2\\ldots , \\mathbf{x}_m^k , \\mathbf{x}_n^k \\big],\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 2.1: Structure design matrix\n",
    "\n",
    "Create a function (`make_design_matrix`) that structures the design matrix given the input data and the order of the polynomial you wish to fit. We will print part of this design matrix for our data and order 5."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "def make_design_matrix(x, order):\n",
    "  \"\"\"Create the design matrix of inputs for use in polynomial regression\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (n_samples)\n",
    "    order (scalar): polynomial regression order\n",
    "\n",
    "  Returns:\n",
    "    ndarray: design matrix for polynomial regression of shape (samples, order+1)\n",
    "  \"\"\"\n",
    "  ########################################################################\n",
    "  ## TODO for students: create the design matrix ##\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student exercise: create the design matrix\")\n",
    "  ########################################################################\n",
    "\n",
    "  # Broadcast to shape (n x 1) so dimensions work\n",
    "  if x.ndim == 1:\n",
    "    x = x[:, None]\n",
    "\n",
    "  #if x has more than one feature, we don't want multiple columns of ones so we assign\n",
    "  # x^0 here\n",
    "  design_matrix = np.ones((x.shape[0], 1))\n",
    "\n",
    "  # Loop through rest of degrees and stack columns (hint: np.hstack)\n",
    "  for degree in range(1, order + 1):\n",
    "      design_matrix = ...\n",
    "\n",
    "  return design_matrix\n",
    "\n",
    "\n",
    "order = 5\n",
    "X_design = make_design_matrix(x, order)\n",
    "\n",
    "print(X_design[0:2, 0:2])\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "You should see that the printed section of this design matrix is\n",
    "\n",
    "```\n",
    "[[ 1.         -1.51194917]\n",
    " [ 1.         -0.35259945]]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "\n",
    "def make_design_matrix(x, order):\n",
    "  \"\"\"Create the design matrix of inputs for use in polynomial regression\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (samples,)\n",
    "    order (scalar): polynomial regression order\n",
    "\n",
    "  Returns:\n",
    "    ndarray: design matrix for polynomial regression of shape (samples, order+1)\n",
    "  \"\"\"\n",
    "\n",
    "  # Broadcast to shape (n x 1) so dimensions work\n",
    "  if x.ndim == 1:\n",
    "    x = x[:, None]\n",
    "\n",
    "  #if x has more than one feature, we don't want multiple columns of ones so we assign\n",
    "  # x^0 here\n",
    "  design_matrix = np.ones((x.shape[0], 1))\n",
    "\n",
    "  # Loop through rest of degrees and stack columns (hint: np.hstack)\n",
    "  for degree in range(1, order + 1):\n",
    "      design_matrix = np.hstack((design_matrix, x**degree))\n",
    "\n",
    "  return design_matrix\n",
    "\n",
    "\n",
    "order = 5\n",
    "X_design = make_design_matrix(x, order)\n",
    "\n",
    "print(X_design[0:2, 0:2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.2: Fitting polynomial regression models\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 24 min*\n",
    "\n",
    "Now that we have the inputs structured correctly in our design matrix, fitting a polynomial regression is the same as fitting a linear regression model! All of the polynomial structure we need to learn is contained in how the inputs are structured in the design matrix. We can use the same least squares solution we computed in previous exercises."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 2.2: Fitting polynomial regression models with different orders\n",
    "\n",
    "Here, we will fit polynomial regression models to find the regression coefficients ($\\theta_0, \\theta_1, \\theta_2,$ ...) by solving the least squares problem. Create a function `solve_poly_reg` that loops over different order polynomials (up to `max_order`), fits that model, and saves out the weights for each. You may invoke the `ordinary_least_squares` function.\n",
    "\n",
    "We will then qualitatively inspect the quality of our fits for each order by plotting the fitted polynomials on top of the data. In order to see smooth curves, we evaluate the fitted polynomials on a grid of $x$ values (ranging between the largest and smallest of the inputs present in the dataset)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "def solve_poly_reg(x, y, max_order):\n",
    "  \"\"\"Fit a polynomial regression model for each order 0 through max_order.\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (n_samples)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "    max_order (scalar): max order for polynomial fits\n",
    "\n",
    "  Returns:\n",
    "    dict: fitted weights for each polynomial model (dict key is order)\n",
    "  \"\"\"\n",
    "\n",
    "  # Create a dictionary with polynomial order as keys,\n",
    "  # and np array of theta_hat (weights) as the values\n",
    "  theta_hats = {}\n",
    "\n",
    "  # Loop over polynomial orders from 0 through max_order\n",
    "  for order in range(max_order + 1):\n",
    "\n",
    "    ##################################################################################\n",
    "    ## TODO for students: Create design matrix and fit polynomial model for this order\n",
    "    # Fill out function and remove\n",
    "    raise NotImplementedError(\"Student exercise: fit a polynomial model\")\n",
    "    ##################################################################################\n",
    "\n",
    "    # Create design matrix\n",
    "    X_design = ...\n",
    "\n",
    "    # Fit polynomial model\n",
    "    this_theta = ...\n",
    "\n",
    "    theta_hats[order] = this_theta\n",
    "\n",
    "  return theta_hats\n",
    "\n",
    "\n",
    "max_order = 5\n",
    "theta_hats = solve_poly_reg(x, y, max_order)\n",
    "\n",
    "# Visualize\n",
    "plot_fitted_polynomials(x, y, theta_hats)\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "\n",
    "def solve_poly_reg(x, y, max_order):\n",
    "  \"\"\"Fit a polynomial regression model for each order 0 through max_order.\n",
    "\n",
    "  Args:\n",
    "    x (ndarray): input vector of shape (n_samples)\n",
    "    y (ndarray): vector of measurements of shape (n_samples)\n",
    "    max_order (scalar): max order for polynomial fits\n",
    "\n",
    "  Returns:\n",
    "    dict: fitted weights for each polynomial model (dict key is order)\n",
    "  \"\"\"\n",
    "\n",
    "  # Create a dictionary with polynomial order as keys,\n",
    "  # and np array of theta_hat (weights) as the values\n",
    "  theta_hats = {}\n",
    "\n",
    "  # Loop over polynomial orders from 0 through max_order\n",
    "  for order in range(max_order + 1):\n",
    "\n",
    "    # Create design matrix\n",
    "    X_design = make_design_matrix(x, order)\n",
    "\n",
    "    # Fit polynomial model\n",
    "    this_theta = ordinary_least_squares(X_design, y)\n",
    "\n",
    "    theta_hats[order] = this_theta\n",
    "\n",
    "  return theta_hats\n",
    "\n",
    "\n",
    "max_order = 5\n",
    "theta_hats = solve_poly_reg(x, y, max_order)\n",
    "\n",
    "# Visualize\n",
    "with plt.xkcd():\n",
    "  plot_fitted_polynomials(x, y, theta_hats)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.3: Evaluating fit quality\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 29 min*\n",
    "\n",
    "As with linear regression, we can compute mean squared error (MSE) to get a sense of how well the model fits the data.\n",
    "\n",
    "We compute MSE as:\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathrm{MSE} = \\frac 1 N ||\\mathbf{y} - \\hat{\\mathbf{y}}||^2 = \\frac 1 N \\sum_{i=1}^N (y_i - \\hat y_i)^2\n",
    "\\end{equation}\n",
    "\n",
    "where the predicted values for each model are given by $ \\hat{\\mathbf{y}} = \\mathbf{X}\\boldsymbol{\\hat\\theta}$.\n",
    "\n",
    "*Which model (i.e. which polynomial order) do you think will have the best MSE?*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 2.3: Compute MSE and compare models\n",
    "\n",
    "We will compare the MSE for different polynomial orders with a bar plot."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "mse_list = []\n",
    "order_list = list(range(max_order + 1))\n",
    "\n",
    "for order in order_list:\n",
    "\n",
    "  X_design = make_design_matrix(x, order)\n",
    "\n",
    "  ########################################################################\n",
    "  ## TODO for students\n",
    "  # Fill out function and remove\n",
    "  raise NotImplementedError(\"Student exercise: compute MSE\")\n",
    "  ########################################################################\n",
    "\n",
    "  # Get prediction for the polynomial regression model of this order\n",
    "  y_hat = ...\n",
    "\n",
    "  # Compute the residuals\n",
    "  residuals = ...\n",
    "\n",
    "  # Compute the MSE\n",
    "  mse = ...\n",
    "\n",
    "  mse_list.append(mse)\n",
    "\n",
    "\n",
    "# Visualize MSE of fits\n",
    "evaluate_fits(order_list, mse_list)\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "\n",
    "mse_list = []\n",
    "order_list = list(range(max_order + 1))\n",
    "\n",
    "for order in order_list:\n",
    "\n",
    "  X_design = make_design_matrix(x, order)\n",
    "\n",
    "  # Get prediction for the polynomial regression model of this order\n",
    "  y_hat = X_design @ theta_hats[order]\n",
    "\n",
    "  # Compute the residuals\n",
    "  residuals = y - y_hat\n",
    "\n",
    "  # Compute the MSE\n",
    "  mse = np.mean(residuals ** 2)\n",
    "\n",
    "  mse_list.append(mse)\n",
    "\n",
    "\n",
    "# Visualize MSE of fits\n",
    "with plt.xkcd():\n",
    "  evaluate_fits(order_list, mse_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Summary\n",
    "\n",
    "*Estimated timing of tutorial: 35 minutes*\n",
    "\n",
    "* Linear regression generalizes naturally to multiple dimensions\n",
    "* Linear algebra affords us the mathematical tools to reason and solve such problems beyond the two dimensional case\n",
    "\n",
    "* To change from a linear regression model to a polynomial regression model, we only have to change how the input data is structured\n",
    "\n",
    "* We can choose the complexity of the model by changing the order of the polynomial model fit\n",
    "\n",
    "* Higher order polynomial models tend to have lower MSE on the data they're fit with\n",
    "\n",
    "**Note**: In practice, multidimensional least squares problems can be solved very efficiently (thanks to numerical routines such as LAPACK).\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Notation\n",
    "\n",
    "\\begin{align}\n",
    "x &\\quad \\text{input, independent variable}\\\\\n",
    "y &\\quad \\text{response measurement, dependent variable}\\\\\n",
    "\\epsilon &\\quad \\text{measurement error, noise contribution}\\\\\n",
    "\\theta &\\quad \\text{slope parameter}\\\\\n",
    "\\hat{\\theta} &\\quad \\text{estimated slope parameter}\\\\\n",
    "\\mathbf{x} &\\quad \\text{vector of inputs where each element is a different data point}\\\\\n",
    "\\mathbf{X} &\\quad \\text{design matrix}\\\\\n",
    "\\mathbf{y} &\\quad \\text{vector of measurements}\\\\\n",
    "\\mathbf{\\hat y} &\\quad \\text{vector of estimated measurements}\\\\\n",
    "\\boldsymbol{\\theta} &\\quad \\text{vector of parameters}\\\\\n",
    "\\boldsymbol{\\hat\\theta} &\\quad \\text{vector of estimated parameters}\\\\\n",
    "d &\\quad \\text{dimensionality of input}\\\\\n",
    "N &\\quad \\text{number of samples}\\\\\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "**Suggested readings**\n",
    "\n",
    "[Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares](http://vmls-book.stanford.edu/) by Stephen Boyd and Lieven Vandenberghe"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "W1D2_Tutorial4",
   "provenance": [],
   "toc_visible": true
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}