{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/NeuromatchAcademy/course-content/blob/main/tutorials/W1D3_GeneralizedLinearModels/instructor/W1D3_Tutorial2.ipynb\" target=\"_blank\"><img alt=\"Open In Colab\" src=\"https://colab.research.google.com/assets/colab-badge.svg\"/></a>   <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/tutorials/W1D3_GeneralizedLinearModels/instructor/W1D3_Tutorial2.ipynb\" target=\"_blank\"><img alt=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial 2: Classifiers and regularizers\n",
    "\n",
    "**Week 1, Day 3: Generalized Linear Models**\n",
    "\n",
    "**By Neuromatch Academy**\n",
    "\n",
    "**Content creators:** Pierre-Etienne H. Fiquet, Ari Benjamin, Jakob Macke\n",
    "\n",
    "**Content reviewers:** Davide Valeriani, Alish Dipani, Michael Waskom, Ella Batty\n",
    "\n",
    "**Production editors:** Spiros Chavlis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Tutorial Objectives\n",
    "\n",
    "*Estimated timing of tutorial: 1 hour, 35 minutes*\n",
    "\n",
    "This is part 2 of a 2-part series about Generalized Linear Models (GLMs), which are a fundamental framework for supervised learning. In part 1, we learned about and implemented GLMs. In this tutorial, we'll implement logistic regression, a special case of GLMs used to model binary outcomes.\n",
    "Oftentimes the variable you would like to predict takes only one of two possible values. Left or right? Awake or asleep? Car or bus? In this tutorial, we will decode a mouse's left/right decisions from spike train data. Our objectives are to:\n",
    "1.\tLearn about logistic regression, how it is derived within the GLM theory, and how it is implemented in scikit-learn\n",
    "2.\tApply logistic regression to decode choies from neural responses\n",
    "3.\tLearn about regularization, including the different approaches and the influence of hyperparameters\n",
    "\n",
    "<br>\n",
    "\n",
    "We would like to acknowledge [Steinmetz _et al._, 2019](https://www.nature.com/articles/s41586-019-1787-x) for sharing their data, a subset of which is used here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown\n",
    "from IPython.display import IFrame\n",
    "from ipywidgets import widgets\n",
    "out = widgets.Output()\n",
    "with out:\n",
    "    print(f\"If you want to download the slides: https://osf.io/download/upyjz/\")\n",
    "    display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/upyjz/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
    "display(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Install and import feedback gadget\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Install and import feedback gadget\n",
    "\n",
    "!pip3 install vibecheck datatops --quiet\n",
    "\n",
    "from vibecheck import DatatopsContentReviewContainer\n",
    "def content_review(notebook_section: str):\n",
    "    return DatatopsContentReviewContainer(\n",
    "        \"\",  # No text prompt\n",
    "        notebook_section,\n",
    "        {\n",
    "            \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
    "            \"name\": \"neuromatch_cn\",\n",
    "            \"user_key\": \"y1x3mpx5\",\n",
    "        },\n",
    "    ).render()\n",
    "\n",
    "\n",
    "feedback_prefix = \"W1D3_T2\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Imports\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import cross_val_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Figure settings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Figure settings\n",
    "import logging\n",
    "logging.getLogger('matplotlib.font_manager').disabled = True\n",
    "\n",
    "import ipywidgets as widgets\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Plotting Functions\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Plotting Functions\n",
    "\n",
    "def plot_weights(models, sharey=True):\n",
    "  \"\"\"Draw a stem plot of weights for each model in models dict.\"\"\"\n",
    "  n = len(models)\n",
    "  f = plt.figure(figsize=(10, 2.5 * n))\n",
    "  axs = f.subplots(n, sharex=True, sharey=sharey)\n",
    "  axs = np.atleast_1d(axs)\n",
    "\n",
    "  for ax, (title, model) in zip(axs, models.items()):\n",
    "\n",
    "    ax.margins(x=.02)\n",
    "    stem = ax.stem(model.coef_.squeeze())\n",
    "    stem[0].set_marker(\".\")\n",
    "    stem[0].set_color(\".2\")\n",
    "    stem[1].set_linewidths(.5)\n",
    "    stem[1].set_color(\".2\")\n",
    "    stem[2].set_visible(False)\n",
    "    ax.axhline(0, color=\"C3\", lw=3)\n",
    "    ax.set(ylabel=\"Weight\", title=title)\n",
    "  ax.set(xlabel=\"Neuron (a.k.a. feature)\")\n",
    "  f.tight_layout()\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def plot_function(f, name, var, points=(-10, 10)):\n",
    "    \"\"\"Evaluate f() on linear space between points and plot.\n",
    "\n",
    "    Args:\n",
    "      f (callable): function that maps scalar -> scalar\n",
    "      name (string): Function name for axis labels\n",
    "      var (string): Variable name for axis labels.\n",
    "      points (tuple): Args for np.linspace to create eval grid.\n",
    "    \"\"\"\n",
    "    x = np.linspace(*points)\n",
    "    ax = plt.figure().subplots()\n",
    "    ax.plot(x, f(x))\n",
    "    ax.set(\n",
    "      xlabel=f'${var}$',\n",
    "      ylabel=f'${name}({var})$'\n",
    "    )\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "def plot_model_selection(C_values, accuracies):\n",
    "  \"\"\"Plot the accuracy curve over log-spaced C values.\"\"\"\n",
    "  ax = plt.figure().subplots()\n",
    "  ax.set_xscale(\"log\")\n",
    "  ax.plot(C_values, accuracies, marker=\"o\")\n",
    "  best_C = C_values[np.argmax(accuracies)]\n",
    "  ax.set(\n",
    "      xticks=C_values,\n",
    "      xlabel=\"C\",\n",
    "      ylabel=\"Cross-validated accuracy\",\n",
    "      title=f\"Best C: {best_C:1g} ({np.max(accuracies):.2%})\",\n",
    "  )\n",
    "  plt.show()\n",
    "\n",
    "\n",
    "def plot_non_zero_coefs(C_values, non_zero_l1, n_voxels):\n",
    "  \"\"\"Plot the accuracy curve over log-spaced C values.\"\"\"\n",
    "  ax = plt.figure().subplots()\n",
    "  ax.set_xscale(\"log\")\n",
    "  ax.plot(C_values, non_zero_l1, marker=\"o\")\n",
    "  ax.set(\n",
    "    xticks=C_values,\n",
    "    xlabel=\"C\",\n",
    "    ylabel=\"Number of non-zero coefficients\",\n",
    "  )\n",
    "  ax.axhline(n_voxels, color=\".1\", linestyle=\":\")\n",
    "  ax.annotate(\"Total\\n# Neurons\", (C_values[0], n_voxels * .98), va=\"top\")\n",
    "  plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Data retrieval and loading\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "#@title Data retrieval and loading\n",
    "import os\n",
    "import requests\n",
    "import hashlib\n",
    "\n",
    "url = \"https://osf.io/r9gh8/download\"\n",
    "fname = \"W1D4_steinmetz_data.npz\"\n",
    "expected_md5 = \"d19716354fed0981267456b80db07ea8\"\n",
    "\n",
    "if not os.path.isfile(fname):\n",
    "  try:\n",
    "    r = requests.get(url)\n",
    "  except requests.ConnectionError:\n",
    "    print(\"!!! Failed to download data !!!\")\n",
    "  else:\n",
    "    if r.status_code != requests.codes.ok:\n",
    "      print(\"!!! Failed to download data !!!\")\n",
    "    elif hashlib.md5(r.content).hexdigest() != expected_md5:\n",
    "      print(\"!!! Data download appears corrupted !!!\")\n",
    "    else:\n",
    "      with open(fname, \"wb\") as fid:\n",
    "        fid.write(r.content)\n",
    "\n",
    "def load_steinmetz_data(data_fname=fname):\n",
    "\n",
    "  with np.load(data_fname) as dobj:\n",
    "    data = dict(**dobj)\n",
    "\n",
    "  return data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "\n",
    "# Section 1: Logistic regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 1: Logistic regression\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Logistic regression\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'qfXFrUnLU0o'), ('Bilibili', 'BV1P54y1q7Qn')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Logistic_regression_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Logistic Regression is a binary classification model. It is a GLM with a *logistic* link function and a *Bernoulli* (i.e. coinflip) noise model.\n",
    "\n",
    "Like in the last notebook, logistic regression invokes a standard procedure:\n",
    "\n",
    "1.   Define a *model* of how inputs relate to outputs.\n",
    "2.   Adjust the parameters to maximize (log) probability of your data given your model\n",
    "\n",
    "## Section 1.1: The logistic regression model\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 8 min*\n",
    "\n",
    "<details>\n",
    "<summary> <font color='blue'>Click here for text recap of relevant part of video </font></summary>\n",
    "\n",
    "The fundamental input/output equation of logistic regression is:\n",
    "\n",
    "\\begin{equation}\n",
    "\\hat{y} \\equiv p(y=1|x,\\theta) = \\sigma(\\theta^\\top x)\n",
    "\\end{equation}\n",
    "\n",
    "Note that we interpret the output of logistic regression, $\\hat{y}$, as the **probability that y = 1** given inputs $x$ and parameters $\\theta$.\n",
    "\n",
    "Here $\\sigma(\\cdot)$ is a \"squashing\" function called the **sigmoid function** or **logistic function**. Its output is in the range $0 \\leq y \\leq 1$. It looks like this:\n",
    "\n",
    "\\begin{equation}\n",
    "\\sigma(z) = \\frac{1}{1 + \\textrm{exp}(-z)}\n",
    "\\end{equation}\n",
    "\n",
    "Recall that $z = \\theta^\\top x$. The parameters decide whether $\\theta^\\top x$ will be very negative, in which case $\\sigma(\\theta^\\top x)\\approx 0$, or very positive, meaning  $\\sigma(\\theta^\\top x)\\approx 1$.\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 1.1: Implement the sigmoid function\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cellView": "both",
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "def sigmoid(z):\n",
    "  \"\"\"Return the logistic transform of z.\"\"\"\n",
    "  ##############################################################################\n",
    "  # TODO for students: Fill in the missing code (...) and remove the error\n",
    "  raise NotImplementedError(\"Student exercise: implement the sigmoid function\")\n",
    "  ##############################################################################\n",
    "\n",
    "  sigmoid = ...\n",
    "\n",
    "  return sigmoid\n",
    "\n",
    "\n",
    "# Visualize\n",
    "plot_function(sigmoid, \"\\sigma\", \"z\", (-10, 10))\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "def sigmoid(z):\n",
    "  \"\"\"Return the logistic transform of z.\"\"\"\n",
    "\n",
    "  sigmoid = 1 / (1 + np.exp(-z))\n",
    "\n",
    "  return sigmoid\n",
    "\n",
    "\n",
    "# Visualize\n",
    "with plt.xkcd():\n",
    "  plot_function(sigmoid, \"\\sigma\", \"z\", (-10, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Implement_the_sigmoid_function_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 1.2: Using scikit-learn\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 13 min*\n",
    "\n",
    "Unlike the previous notebook, we're not going to write the code that implements all of the Logistic Regression model itself. Instead, we're going to use the implementation in [scikit-learn](https://scikit-learn.org/stable/), a very popular library for Machine Learning.\n",
    "\n",
    "The goal of this next section is to introduce `scikit-learn` classifiers and understand how to apply it to real neural data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Section 2: Decoding neural data with logistic regression"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.1: Setting up the data\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 15 min*\n",
    "\n",
    "In this notebook we'll use the Steinmetz dataset that you have seen previously. Recall that this dataset includes recordings of neurons as mice perform a decision task.\n",
    "\n",
    "Mice had the task of turning a wheel to indicate whether they perceived a Gabor stimulus to the left, to the right, or not at all. Neuropixel probes measured spikes across the cortex. Check out the following task schematic below from the BiorXiv preprint.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute to see schematic\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute to see schematic\n",
    "import IPython\n",
    "IPython.display.Image(\"http://kordinglab.com/images/others/steinmetz-task.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Today we're going to **decode the decision from neural data** using Logistic Regression. We will only consider trials where the mouse chose \"Left\" or \"Right\" and ignore NoGo trials.\n",
    "\n",
    "### Data format\n",
    "\n",
    "In the hidden `Data retrieval and loading` cell, there is a function that loads the data:\n",
    "\n",
    "- `spikes`: an array of normalized spike rates with shape `(n_trials, n_neurons)`\n",
    "- `choices`: a vector of 0s and 1s, indicating the animal's behavioural response, with length `n_trials`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "data = load_steinmetz_data()\n",
    "for key, val in data.items():\n",
    "  print(key, val.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "As with the GLMs you've seen in the previous tutorial (Linear and Poisson Regression), we will need two data structures:\n",
    "\n",
    "- an `X` matrix with shape `(n_samples, n_features)`\n",
    "- a `y` vector with length `n_samples`.\n",
    "\n",
    "In the previous notebook, `y` corresponded to the neural data, and `X` corresponded to something about the experiment. Here, we are going to invert those relationships. That's what makes this a *decoding* model: we are going to predict behaviour (`y`) from the neural responses (`X`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "y = data[\"choices\"]\n",
    "X = data[\"spikes\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.2: Fitting the model\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 25 min*\n",
    "\n",
    "Using a Logistic Regression model within `scikit-learn` is very simple."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# Define the model\n",
    "log_reg = LogisticRegression(penalty=None)\n",
    "\n",
    "# Fit it to data\n",
    "log_reg.fit(X, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "There's two steps here:\n",
    "\n",
    "- We *initialized* the model with a hyperparameter, telling it what penalty to use (we'll focus on this in the second part of the notebook)\n",
    "- We *fit* the model by passing it the `X` and `y` objects.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.3: Classifying the training data\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 27 min*\n",
    "\n",
    "Fitting the model performs maximum likelihood optimization, learning a set of *feature weights*. We can use those learned weights to *classify* new data, or predict the labels for each sample:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "y_pred = log_reg.predict(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.4: Evaluating the model\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 30 min*\n",
    "\n",
    "Now, we need to evaluate the predictions of the model. We will use an *accuracy* score for this purpose. The accuracy of a classifier is determined by the proportion of correct trials, where the predicted label matches the true label, out of the total number of trials."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 2.4: Classifier accuracy\n",
    "\n",
    "For the first exercise, implement a function to evaluate a classifier using the accuracy score. Use it to get the accuracy of the classifier on the *training* data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "def compute_accuracy(X, y, model):\n",
    "  \"\"\"Compute accuracy of classifier predictions.\n",
    "\n",
    "  Args:\n",
    "    X (2D array): Data matrix\n",
    "    y (1D array): Label vector\n",
    "    model (sklearn estimator): Classifier with trained weights.\n",
    "\n",
    "  Returns:\n",
    "    accuracy (float): Proportion of correct predictions.\n",
    "  \"\"\"\n",
    "  #############################################################################\n",
    "  # TODO Complete the function, then remove the next line to test it\n",
    "  raise NotImplementedError(\"Implement the compute_accuracy function\")\n",
    "  #############################################################################\n",
    "\n",
    "  y_pred = model.predict(X)\n",
    "\n",
    "  accuracy = ...\n",
    "\n",
    "  return accuracy\n",
    "\n",
    "\n",
    "# Compute train accuracy\n",
    "train_accuracy = compute_accuracy(X, y, log_reg)\n",
    "print(f\"Accuracy on the training data: {train_accuracy:.2%}\")\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "You should see\n",
    "```\n",
    "Accuracy on the training data: 100.00%\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "\n",
    "def compute_accuracy(X, y, model):\n",
    "  \"\"\"Compute accuracy of classifier predictions.\n",
    "\n",
    "  Args:\n",
    "    X (2D array): Data matrix\n",
    "    y (1D array): Label vector\n",
    "    model (sklearn estimator): Classifier with trained weights.\n",
    "\n",
    "  Returns:\n",
    "    accuracy (float): Proportion of correct predictions.\n",
    "  \"\"\"\n",
    "\n",
    "  y_pred = model.predict(X)\n",
    "\n",
    "  accuracy = (y == y_pred).sum() / len(y)\n",
    "\n",
    "  return accuracy\n",
    "\n",
    "\n",
    "# Compute train accuracy\n",
    "train_accuracy = compute_accuracy(X, y, log_reg)\n",
    "print(f\"Accuracy on the training data: {train_accuracy:.2%}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Classifier_accuracy_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 2.5: Cross-validating the classifier\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 40 min*\n",
    "\n",
    "Classification accuracy on the training data is 100%! That might sound impressive, but you should recall from yesterday the concept of *overfitting*: the classifier may have learned something idiosyncratic about the training data. If that's the case, it won't have really learned the underlying data->decision function, and thus won't generalize well to new data.\n",
    "\n",
    "To check this, we can evaluate the *cross-validated* accuracy.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute to see schematic\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute to see schematic\n",
    "import IPython\n",
    "IPython.display.Image(\"http://kordinglab.com/images/others/justCV-01.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Cross-validating using `scikit-learn` helper functions\n",
    "\n",
    "Yesterday, we asked you to write your own functions for implementing cross-validation. In practice, this won't be necessary, because `scikit-learn` offers a number of [helpful functions](https://scikit-learn.org/stable/model_selection.html) that will do this for you. For example, you can cross-validate a classifier using `cross_val_score`.\n",
    "\n",
    "`cross_val_score` takes a `sklearn` model like `LogisticRegression`, as well as your `X` and `y` data. It then retrains your model on test/train splits of `X` and `y`, and returns the test accuracy on each of the test sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "accuracies = cross_val_score(LogisticRegression(penalty=None), X, y, cv=8)  # k=8 cross validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Run to plot out these `k=8` accuracy scores.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Run to plot out these `k=8` accuracy scores.\n",
    "f, ax = plt.subplots(figsize=(8, 3))\n",
    "ax.boxplot(accuracies, vert=False, widths=.7)\n",
    "ax.scatter(accuracies, np.ones(8))\n",
    "ax.set(\n",
    "  xlabel=\"Accuracy\",\n",
    "  yticks=[],\n",
    "  title=f\"Average test accuracy: {accuracies.mean():.2%}\"\n",
    ")\n",
    "ax.spines[\"left\"].set_visible(False)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "The lower cross-validated accuracy compared to the training accuracy (100%) suggests that the model is being *overfit*. Is this surprising? Think about the shape of the $X$ matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "print(X.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "The model has almost three times as many features as samples. This is a situation where overfitting is very likely (almost guaranteed).\n",
    "\n",
    "**Link to neuroscience**: Neuro data commonly has more features than samples. Having more neurons than independent trials is one example. In fMRI data, there are commonly more measured voxels than independent trials.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Why more features than samples leads to overfitting\n",
    "\n",
    "In brief, the variance of model estimation increases when there are more features than samples. That is, you would get a very different model every time you get new data and run `.fit()`. This is very related to the *bias/variance tradeoff* you learned about on day 1.\n",
    "\n",
    "Why does this happen? Here's a tiny example to get your intuition going. Imagine trying to find a best-fit line in 2D when you only have 1 datapoint. There are simply an infinite number of lines that pass through that point. This is the situation we find ourselves in with more features than samples.\n",
    "\n",
    "### What we can do about it\n",
    "As you learned on day 1, you can decrease model variance if you don't mind increasing its bias. Here, we will increase bias by assuming that the correct parameters are all small. In our 2D example, this is like preferring the horizontal line to all others. This is one example of *regularization*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "-----\n",
    "\n",
    "# Section 3: Regularization\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 50 min*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Video 2: Regularization\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 2: Regularization\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'b2IaUCZ91bo'), ('Bilibili', 'BV1Tg4y1i773')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Regularization_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "\n",
    "<details>\n",
    "<summary> <font color='blue'>Click here for text recap of video </font></summary>\n",
    "\n",
    "Regularization forces a model to learn a set solutions you *a priori* believe to be more correct, which reduces overfitting because it doesn't have as much flexibility to fit idiosyncracies in the training data. This adds model bias, but it's a good bias because you know (maybe) that parameters should be small or mostly 0.\n",
    "\n",
    "In a GLM, a common form of regularization is to *shrink* the classifier weights. In a linear model, you can see its effect by plotting the weights. We've defined a helper function, `plot_weights`, that we'll use extensively in this section.\n",
    "\n",
    "</details>\n",
    "\n",
    "Here is what the weights look like for a Logistic Regression model with no regularization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "log_reg = LogisticRegression(penalty=None).fit(X, y)\n",
    "plot_weights({\"No regularization\": log_reg})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "It's important to understand this plot. Each dot visualizes a value in our parameter vector $\\theta$. (It's the same style of plot as the one showing $\\theta$ in the video). Since each feature is the time-averaged response of a neuron, each dot shows how the model uses each neuron to estimate a decision.\n",
    "\n",
    "Note the scale of the y-axis. Some neurons have values of about $20$, whereas others scale to $-20$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.1: $L_2$ regularization\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 53 min*\n",
    "\n",
    "Regularization comes in different flavors. A very common one uses an $L_2$ or \"ridge\" penalty. This changes the objective function to\n",
    "\n",
    "\\begin{equation}\n",
    "-\\log\\mathcal{L}'(\\theta | X, y)= -\\log\\mathcal{L}(\\theta | X, y) +\\frac\\beta2\\sum_i\\theta_i^2,\n",
    "\\end{equation}\n",
    "\n",
    "where $\\beta$ is a *hyperparameter* that sets the *strength* of the regularization.\n",
    "\n",
    "You can use regularization in `scikit-learn` by changing the `penalty`, and you can set the strength of the regularization with the `C` hyperparameter ($C = \\frac{1}{\\beta}$, so this sets the *inverse* regularization).\n",
    "\n",
    "Let's compare the unregularized classifier weights with the classifier weights when we use the default `C = 1`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "log_reg_l2 = LogisticRegression(penalty=\"l2\", C=1).fit(X, y)\n",
    "\n",
    "# now show the two models\n",
    "models = {\n",
    "  \"No regularization\": log_reg,\n",
    "  \"$L_2$ (C = 1)\": log_reg_l2,\n",
    "}\n",
    "plot_weights(models)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Using the same scale for the two y axes, it's almost impossible to see the $L_2$ weights. Let's allow the y axis scales to adjust to each set of weights:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "plot_weights(models, sharey=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "\n",
    "Now you can see that the weights have the same basic pattern, but the regularized weights are an order-of-magnitude smaller.\n",
    "\n",
    "### Interactive Demo 3.1: The effect of varying C on parameter size\n",
    "\n",
    "We can use this same approach to see how the weights depend on the *strength* of the regularization:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute this cell to enable the widget!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute this cell to enable the widget!\n",
    "\n",
    "# Precompute the models so the widget is responsive\n",
    "log_C_steps = 1, 11, 1\n",
    "penalized_models = {}\n",
    "for log_C in np.arange(*log_C_steps, dtype=int):\n",
    "  m = LogisticRegression(\"l2\", C=10 ** log_C, max_iter=5000)\n",
    "  penalized_models[log_C] = m.fit(X, y)\n",
    "\n",
    "@widgets.interact\n",
    "def plot_observed(log_C = widgets.IntSlider(value=1, min=1, max=10, step=1)):\n",
    "  models = {\n",
    "    \"No regularization\": log_reg,\n",
    "    f\"$L_2$ (C = $10^{{{log_C}}}$)\": penalized_models[log_C]\n",
    "  }\n",
    "  plot_weights(models)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Recall from above that $C=\\frac1\\beta$ so larger $C$ is less regularization. The top panel corresponds to $C \\rightarrow \\infty$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.2: $L_1$ regularization\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 1 hr, 3 min*\n",
    "\n",
    "$L_2$ is not the only option for regularization. There is also the $L_1$, or \"Lasso\" penalty. This changes the objective function to\n",
    "\n",
    "\\begin{equation}\n",
    "-\\log\\mathcal{L}'(\\theta | X, y) = -\\log\\mathcal{L}(\\theta | X, y) +\\frac\\beta2\\sum_i|\\theta_i|\n",
    "\\end{equation}\n",
    "\n",
    "In practice, using the summed absolute values of the weights causes *sparsity*: instead of just getting smaller, some of the weights will get forced to $0$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "log_reg_l1 = LogisticRegression(penalty=\"l1\", C=1, solver=\"saga\", max_iter=5000)\n",
    "log_reg_l1.fit(X, y)\n",
    "models = {\n",
    "  \"$L_2$ (C = 1)\": log_reg_l2,\n",
    "  \"$L_1$ (C = 1)\": log_reg_l1,\n",
    "}\n",
    "plot_weights(models)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Note: You'll notice that we added two additional parameters: `solver=\"saga\"` and `max_iter=5000`. The `LogisticRegression` class can use several different optimization algorithms (\"solvers\"), and not all of them support the $L_1$ penalty. At a certain point, the solver will give up if it hasn't found a minimum value. The `max_iter` parameter tells it to make more attempts; otherwise, we'd see an ugly warning about \"convergence\"."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.3: The key difference between $L_1$ and $L_2$ regularization: sparsity\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 1 hr, 10 min*\n",
    "\n",
    "When should you use $L_1$ vs. $L_2$ regularization? Both penalties shrink parameters, and both will help reduce overfitting. However, the models they lead to are different.\n",
    "\n",
    "In particular, the $L_1$ penalty encourages *sparse* solutions in which most parameters are 0. Let's unpack the notion of sparsity.\n",
    "\n",
    "A \"dense\" vector has mostly nonzero elements:\n",
    "$\\begin{bmatrix}\n",
    "  0.1 \\\\ -0.6\\\\-9.1\\\\0.07\n",
    "\\end{bmatrix}$.\n",
    "A \"sparse\" vector has mostly zero elements:\n",
    "$\\begin{bmatrix}\n",
    "  0 \\\\ -0.7\\\\ 0\\\\0\n",
    "\\end{bmatrix}$.\n",
    "\n",
    "The same is true of matrices:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute to plot a dense and a sparse matrix\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute to plot a dense and a sparse matrix\n",
    "np.random.seed(50)\n",
    "n = 5\n",
    "M = np.random.random((n, n))\n",
    "M_sparse = np.random.choice([0, 1], size=(n, n), p=[0.8, 0.2])\n",
    "\n",
    "fig, axs = plt.subplots(1, 2, sharey=True, figsize=(10, 5))\n",
    "\n",
    "axs[0].imshow(M)\n",
    "axs[1].imshow(M_sparse)\n",
    "axs[0].axis('off')\n",
    "axs[1].axis('off')\n",
    "axs[0].set_title(\"A dense matrix\", fontsize=15)\n",
    "axs[1].set_title(\"A sparse matrix\", fontsize=15)\n",
    "text_kws = dict(ha=\"center\", va=\"center\")\n",
    "for i in range(n):\n",
    "  for j in range(n):\n",
    "    iter_parts = axs, [M, M_sparse], [\"{:.1f}\", \"{:d}\"]\n",
    "    for ax, mat, fmt in zip(*iter_parts):\n",
    "      val = mat[i, j]\n",
    "      color = \".1\" if val > .7 else \"w\"\n",
    "      ax.text(j, i, fmt.format(val), c=color, **text_kws)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 3.3: The effect of $L_1$ regularization on parameter sparsity\n",
    "\n",
    "Please complete the following function to fit a regularized `LogisticRegression` model and return **the number of coefficients in the parameter vector that are equal to 0**.\n",
    "\n",
    "Don't forget to check out the [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "def count_non_zero_coefs(X, y, C_values):\n",
    "  \"\"\"Fit models with different L1 penalty values and count non-zero coefficients.\n",
    "\n",
    "  Args:\n",
    "    X (2D array): Data matrix\n",
    "    y (1D array): Label vector\n",
    "    C_values (1D array): List of hyperparameter values\n",
    "\n",
    "  Returns:\n",
    "    non_zero_coefs (list): number of coefficients in each model that are nonzero\n",
    "\n",
    "  \"\"\"\n",
    "  #############################################################################\n",
    "  # TODO Complete the function and remove the error\n",
    "  raise NotImplementedError(\"Implement the count_non_zero_coefs function\")\n",
    "  #############################################################################\n",
    "\n",
    "  non_zero_coefs = []\n",
    "  for C in C_values:\n",
    "\n",
    "    # Initialize and fit the model\n",
    "    # (Hint, you may need to set max_iter)\n",
    "    model = ...\n",
    "    ...\n",
    "\n",
    "    # Get the coefs of the fit model (in sklearn, we can do this using model.coef_)\n",
    "    coefs = ...\n",
    "\n",
    "    # Count the number of non-zero elements in coefs\n",
    "    non_zero = ...\n",
    "    non_zero_coefs.append(non_zero)\n",
    "\n",
    "  return non_zero_coefs\n",
    "\n",
    "\n",
    "# Use log-spaced values for C\n",
    "C_values = np.logspace(-4, 4, 5)\n",
    "\n",
    "# Count non zero coefficients\n",
    "non_zero_l1 = count_non_zero_coefs(X, y, C_values)\n",
    "\n",
    "# Visualize\n",
    "plot_non_zero_coefs(C_values, non_zero_l1, n_voxels=X.shape[1])\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "\n",
    "def count_non_zero_coefs(X, y, C_values):\n",
    "  \"\"\"Fit models with different L1 penalty values and count non-zero coefficients.\n",
    "\n",
    "  Args:\n",
    "    X (2D array): Data matrix\n",
    "    y (1D array): Label vector\n",
    "    C_values (1D array): List of hyperparameter values\n",
    "\n",
    "  Returns:\n",
    "    non_zero_coefs (list): number of coefficients in each model that are nonzero\n",
    "\n",
    "  \"\"\"\n",
    "  non_zero_coefs = []\n",
    "  for C in C_values:\n",
    "\n",
    "    # Initialize and fit the model\n",
    "    # (Hint, you may need to set max_iter)\n",
    "    model = LogisticRegression(penalty=\"l1\", C=C, solver=\"saga\", max_iter=5000)\n",
    "    model.fit(X,y)\n",
    "\n",
    "    # Get the coefs of the fit model (in sklearn, we can do this using model.coef_)\n",
    "    coefs = model.coef_\n",
    "\n",
    "    # Count the number of non-zero elements in coefs\n",
    "    non_zero = np.sum(coefs != 0)\n",
    "    non_zero_coefs.append(non_zero)\n",
    "\n",
    "  return non_zero_coefs\n",
    "\n",
    "\n",
    "# Use log-spaced values for C\n",
    "C_values = np.logspace(-4, 4, 5)\n",
    "\n",
    "# Count non zero coefficients\n",
    "non_zero_l1 = count_non_zero_coefs(X, y, C_values)\n",
    "\n",
    "# Visualize\n",
    "with plt.xkcd():\n",
    "  plot_non_zero_coefs(C_values, non_zero_l1, n_voxels=X.shape[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Effect_of_L1_on_sparsity_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Smaller $C$ (bigger $\\beta$) leads to sparser solutions.\n",
    "\n",
    "**Link to neuroscience**: When is it OK to assume that the parameter vector is sparse? Whenever it is true that most features don't affect the outcome. One use-case might be decoding low-level visual features from whole-brain fMRI: we may expect only voxels in V1 and thalamus should be used in the prediction.\n",
    "\n",
    "**WARNING**: be careful when interpreting $\\theta$. Never interpret the nonzero coefficients as *evidence* that only those voxels/neurons/features carry information about the outcome. This is a product of our regularization scheme, and thus *our prior assumption that the solution is sparse*. Other regularization types or models may find very distributed relationships across the brain. Never use a model as evidence for a phenomena when that phenomena is encoded in the assumptions of the model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Section 3.4: Choosing the regularization penalty\n",
    "\n",
    "*Estimated timing to here from start of tutorial: 1 hr, 25 min*\n",
    "\n",
    "In the examples above, we just picked arbitrary numbers for the strength of regularization. How do you know what value of the hyperparameter to use?\n",
    "\n",
    "The answer is the same as when you want to know whether you have learned good parameter values: use cross-validation. The best hyperparameter will be the one that allows the model to generalize best to unseen data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "### Coding Exercise 3.4: Model selection\n",
    "\n",
    "In the final exercise, we will use cross-validation to evaluate a set of models, each with a different $L_2$ penalty. Your `model_selection` function should have a for-loop that gets the mean cross-validated accuracy for each penalty value (use the `cross_val_score` function that we introduced above, with 8 folds)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "```python\n",
    "def model_selection(X, y, C_values):\n",
    "  \"\"\"Compute CV accuracy for each C value.\n",
    "\n",
    "  Args:\n",
    "    X (2D array): Data matrix\n",
    "    y (1D array): Label vector\n",
    "    C_values (1D array): Array of hyperparameter values\n",
    "\n",
    "  Returns:\n",
    "    accuracies (1D array): CV accuracy with each value of C\n",
    "\n",
    "  \"\"\"\n",
    "  #############################################################################\n",
    "  # TODO Complete the function and remove the error\n",
    "  raise NotImplementedError(\"Implement the model_selection function\")\n",
    "  #############################################################################\n",
    "\n",
    "  accuracies = []\n",
    "  for C in C_values:\n",
    "\n",
    "    # Initialize and fit the model\n",
    "    # (Hint, you may need to set max_iter)\n",
    "    model = ...\n",
    "\n",
    "    # Get the accuracy for each test split using cross-validation\n",
    "    accs = ...\n",
    "\n",
    "    # Store the average test accuracy for this value of C\n",
    "    accuracies.append(...)\n",
    "\n",
    "  return accuracies\n",
    "\n",
    "\n",
    "# Use log-spaced values for C\n",
    "C_values = np.logspace(-4, 4, 9)\n",
    "\n",
    "# Compute accuracies\n",
    "accuracies = model_selection(X, y, C_values)\n",
    "\n",
    "# Visualize\n",
    "plot_model_selection(C_values, accuracies)\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# to_remove solution\n",
    "\n",
    "def model_selection(X, y, C_values):\n",
    "  \"\"\"Compute CV accuracy for each C value.\n",
    "\n",
    "  Args:\n",
    "    X (2D array): Data matrix\n",
    "    y (1D array): Label vector\n",
    "    C_values (1D array): Array of hyperparameter values.\n",
    "\n",
    "  Returns:\n",
    "    accuracies (1D array): CV accuracy with each value of C.\n",
    "\n",
    "  \"\"\"\n",
    "  accuracies = []\n",
    "  for C in C_values:\n",
    "\n",
    "    # Initialize and fit the model\n",
    "    # (Hint, you may need to set max_iter)\n",
    "    model = LogisticRegression(penalty=\"l2\", C=C, max_iter=5000)\n",
    "\n",
    "    # Get the accuracy for each test split using cross-validation\n",
    "    accs = cross_val_score(model, X, y, cv=8)\n",
    "\n",
    "    # Store the average test accuracy for this value of C\n",
    "    accuracies.append(accs.mean())\n",
    "\n",
    "  return accuracies\n",
    "\n",
    "\n",
    "# Use log-spaced values for C\n",
    "C_values = np.logspace(-4, 4, 9)\n",
    "\n",
    "# Compute accuracies\n",
    "accuracies = model_selection(X, y, C_values)\n",
    "\n",
    "# Visualize\n",
    "with plt.xkcd():\n",
    "  plot_model_selection(C_values, accuracies)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Model_selection_Exercise\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "This plot suggests that the right value of $C$ does matter — up to a point. Remember that C is the *inverse* regularization. The plot shows that models where the regularization was too strong (small C values) performed very poorly. For $C > 10^{-2}$, the differences are marginal, but the best performance was obtained with an intermediate value ($C \\approx 10^1$)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Summary\n",
    "\n",
    "*Estimated timing of tutorial: 1 hour, 35 minutes*\n",
    "\n",
    "In this notebook, we learned about Logistic Regression, a fundamental algorithm for *classification*. We applied the algorithm to a *neural decoding* problem: we tried to predict an animal's behavioural choice from its neural activity. We saw again how important it is to use *cross-validation* to evaluate complex models that are at risk for *overfitting*, and we learned how *regularization* can be used to fit models that generalize better. Finally, we learned about some of the different options for regularization, and we saw how cross-validation can be useful for *model selection*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Notation\n",
    "\n",
    "\\begin{align}\n",
    "x &\\quad \\text{input}\\\\\n",
    "y &\\quad \\text{measurement, response}\\\\\n",
    "\\theta &\\quad \\text{parameter}\\\\\n",
    "\\sigma(z) &\\quad \\text{logistic function}\\\\\n",
    "C &\\quad \\text{inverse regularization strength parameter}\\\\\n",
    "\\beta &\\quad \\text{regularization strength parameter}\\\\\n",
    "\\hat{y} &\\quad \\text{estimated output}\\\\\n",
    "\\mathcal{L}(\\theta| y_i, x_i) &\\quad \\text{likelihood of that parameter } \\theta \\text{ producing response } y_i \\text{ from input } x_i\\\\\n",
    "L_1 &\\quad \\text{Lasso regularization}\\\\\n",
    "L_2 &\\quad \\text{ridge regularization}\\\\\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "---\n",
    "# Bonus\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Bonus Section 1: The Logistic Regression model in full\n",
    "\n",
    "The fundamental input/output equation of logistic regression is:\n",
    "\n",
    "\\begin{equation}\n",
    "p(y_i = 1 |x_i, \\theta) = \\sigma(\\theta^\\top x_i)\n",
    "\\end{equation}\n",
    "\n",
    "**The logistic link function**\n",
    "\n",
    "You've seen $\\theta^T x_i$ before, but the $\\sigma$ is new. It's the *sigmoidal* or *logistic* link function that \"squashes\" $\\theta^\\top x_i$ to keep it between $0$ and $1$:\n",
    "\n",
    "\\begin{equation}\n",
    "\\sigma(z) = \\frac{1}{1 + \\textrm{exp}(-z)}\n",
    "\\end{equation}\n",
    "\n",
    "**The Bernoulli likelihood**\n",
    "\n",
    "You might have noticed that the output of the sigmoid, $\\hat{y}$ is not a binary value (0 or 1), even though the true data $y$ is! Instead, we interpret the value of $\\hat{y}$ as the *probability that y = 1*:\n",
    "\n",
    "\\begin{equation}\n",
    "\\hat{y_i} \\equiv p(y_i=1|x_i,\\theta) = \\frac{1}{{1 + \\textrm{exp}(-\\theta^\\top x_i)}}\n",
    "\\end{equation}\n",
    "\n",
    "To get the likelihood of the parameters, we need to define *the probability of seeing $y$ given $\\hat{y}$*. In logistic regression, we do this using the Bernoulli distribution:\n",
    "\n",
    "\\begin{equation}\n",
    "P(y_i\\ |\\ \\hat{y}_i) = \\hat{y}_i^{y_i}(1 - \\hat{y}_i)^{(1 - y_i)}\n",
    "\\end{equation}\n",
    "\n",
    "So plugging in the regression model:\n",
    "\n",
    "\\begin{equation}\n",
    "P(y_i\\ |\\ \\theta, x_i) = \\sigma(\\theta^\\top x_i)^{y_i}\\left(1 - \\sigma(\\theta^\\top x_i)\\right)^{(1 - y_i)}.\n",
    "\\end{equation}\n",
    "\n",
    "This expression effectively measures how good our parameters $\\theta$ are. We can also write it as the likelihood of the parameters given the data:\n",
    "\n",
    "\\begin{equation}\n",
    "\\mathcal{L}(\\theta\\ |\\ y_i, x_i) = P(y_i\\ |\\ \\theta, x_i),\n",
    "\\end{equation}\n",
    "\n",
    "and then use this as a target of optimization, considering all of the trials independently:\n",
    "\n",
    "\\begin{equation}\n",
    "\\log\\mathcal{L}(\\theta | X, y) = \\sum_{i=1}^Ny_i\\log\\left(\\sigma(\\theta^\\top x_i)\\right)\\ +\\ (1-y_i)\\log\\left(1 - \\sigma(\\theta^\\top x_i)\\right).\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Bonus Section 2: More detail about model selection\n",
    "\n",
    "In the final exercise, we used all of the data to choose the hyperparameters. That means we don't have any fresh data left over to evaluate the performance of the selected model. In practice, you would want to have two *nested* layers of cross-validation, where the final evaluation is performed on data that played no role in selecting or training the model.\n",
    "\n",
    "Indeed, the proper method for splitting your data to choose hyperparameters can get confusing. Here's a guide that the authors of this notebook developed while writing a tutorial on using machine learning for neural decoding [arxiv:1708.00909](https://arxiv.org/abs/1708.00909)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Execute to see schematic\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Execute to see schematic\n",
    "import IPython\n",
    "IPython.display.Image(\"http://kordinglab.com/images/others/CV-01.png\")"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "W1D3_Tutorial2",
   "provenance": [],
   "toc_visible": true
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}