{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "id": "view-in-github" }, "source": [ "\"Open   \"Open" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 2: Convolutional Neural Networks\n", "\n", "**Week 1, Day 5: Deep Learning**\n", "\n", "**By Neuromatch Academy**\n", "\n", "**Content creators**: Jorge A. Menendez, Carsen Stringer\n", "\n", "**Content reviewers**: Roozbeh Farhoodi, Madineh Sarvestani, Kshitij Dwivedi, Spiros Chavlis, Ella Batty, Michael Waskom\n", "\n", "**Production editors:** Spiros Chavlis" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Tutorial Objectives\n", "\n", "*Estimated timing of tutorial: 40 minutes*\n", "\n", "In this short tutorial, we'll go through an introduction to 2D convolutions and apply a convolutional network to an image to prepare for creating normative models in Tutorial 3.\n", "\n", "In this tutorial, we will\n", "* Understand the basics of 2D convolution\n", "* Build a convolutional layer using PyTorch\n", "* Visualize and analyze its outputs\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @markdown\n", "from IPython.display import IFrame\n", "from ipywidgets import widgets\n", "out = widgets.Output()\n", "with out:\n", " print(f\"If you want to download the slides: https://osf.io/download/s59jy/\")\n", " display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/s59jy/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install and import feedback gadget\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"neuromatch_cn\",\n", " \"user_key\": \"y1x3mpx5\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W1D5_T2\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "both", "execution": {} }, "outputs": [], "source": [ "# Imports\n", "import os\n", "import numpy as np\n", "import torch\n", "from torch import nn\n", "from torch import optim\n", "from matplotlib import pyplot as plt\n", "import matplotlib as mpl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure settings\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Figure settings\n", "import logging\n", "logging.getLogger('matplotlib.font_manager').disabled = True\n", "\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## Plotting Functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Plotting Functions\n", "\n", "def show_stimulus(img, ax=None, show=False):\n", " \"\"\"Visualize a stimulus\"\"\"\n", " if ax is None:\n", " ax = plt.gca()\n", " ax.imshow(img+0.5, cmap=mpl.cm.binary)\n", " ax.set_xticks([])\n", " ax.set_yticks([])\n", " ax.spines['left'].set_visible(False)\n", " ax.spines['bottom'].set_visible(False)\n", " if show:\n", " plt.show()\n", "\n", "\n", "def plot_weights(weights, channels=[0]):\n", " \"\"\" plot convolutional channel weights\n", " Args:\n", " weights: weights of convolutional filters (conv_channels x K x K)\n", " channels: which conv channels to plot\n", " \"\"\"\n", " wmax = torch.abs(weights).max()\n", " fig, axs = plt.subplots(1, len(channels), figsize=(12, 2.5))\n", " for i, channel in enumerate(channels):\n", " im = axs[i].imshow(weights[channel, 0], vmin=-wmax, vmax=wmax, cmap='bwr')\n", " axs[i].set_title(f'channel {channel}')\n", "\n", " cb_ax = fig.add_axes([1, 0.1, 0.05, 0.8])\n", " plt.colorbar(im, ax=cb_ax)\n", " cb_ax.axis('off')\n", " plt.show()\n", "\n", "\n", "def plot_example_activations(stimuli, act, channels=[0]):\n", " \"\"\" plot activations act and corresponding stimulus\n", " Args:\n", " stimuli: stimulus input to convolutional layer (n x h x w) or (h x w)\n", " act: activations of convolutional layer (n_bins x conv_channels x n_bins)\n", " channels: which conv channels to plot\n", " \"\"\"\n", " if stimuli.ndim>2:\n", " n_stimuli = stimuli.shape[0]\n", " else:\n", " stimuli = stimuli.unsqueeze(0)\n", " n_stimuli = 1\n", "\n", " fig, axs = plt.subplots(n_stimuli, 1 + len(channels), figsize=(12, 12))\n", "\n", " # plot stimulus\n", " for i in range(n_stimuli):\n", " show_stimulus(stimuli[i].squeeze(), ax=axs[i, 0])\n", " axs[i, 0].set_title('stimulus')\n", "\n", " # plot example activations\n", " for k, (channel, ax) in enumerate(zip(channels, axs[i][1:])):\n", " im = ax.imshow(act[i, channel], vmin=-3, vmax=3, cmap='bwr')\n", " ax.set_xlabel('x-pos')\n", " ax.set_ylabel('y-pos')\n", " ax.set_title(f'channel {channel}')\n", "\n", " cb_ax = fig.add_axes([1.05, 0.8, 0.01, 0.1])\n", " plt.colorbar(im, cax=cb_ax)\n", " cb_ax.set_title('activation\\n strength')\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper Functions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Helper Functions\n", "\n", "def load_data_split(data_name):\n", " \"\"\"Load mouse V1 data from Stringer et al. (2019)\n", "\n", " Data from study reported in this preprint:\n", " https://www.biorxiv.org/content/10.1101/679324v2.abstract\n", "\n", " These data comprise time-averaged responses of ~20,000 neurons\n", " to ~4,000 stimulus gratings of different orientations, recorded\n", " through Calcium imaginge. The responses have been normalized by\n", " spontaneous levels of activity and then z-scored over stimuli, so\n", " expect negative numbers. 
The repsonses were split into train and\n", " test and then each set were averaged in bins of 6 degrees.\n", "\n", " This function returns the relevant data (neural responses and\n", " stimulus orientations) in a torch.Tensor of data type torch.float32\n", " in order to match the default data type for nn.Parameters in\n", " Google Colab.\n", "\n", " It will hold out some of the trials when averaging to allow us to have test\n", " tuning curves.\n", "\n", " Args:\n", " data_name (str): filename to load\n", "\n", " Returns:\n", " resp_train (torch.Tensor): n_stimuli x n_neurons matrix of neural responses,\n", " each row contains the responses of each neuron to a given stimulus.\n", " As mentioned above, neural \"response\" is actually an average over\n", " responses to stimuli with similar angles falling within specified bins.\n", " resp_test (torch.Tensor): n_stimuli x n_neurons matrix of neural responses,\n", " each row contains the responses of each neuron to a given stimulus.\n", " As mentioned above, neural \"response\" is actually an average over\n", " responses to stimuli with similar angles falling within specified bins\n", " stimuli: (torch.Tensor): n_stimuli x 1 column vector with orientation\n", " of each stimulus, in degrees. This is actually the mean orientation\n", " of all stimuli in each bin.\n", "\n", " \"\"\"\n", " with np.load(data_name) as dobj:\n", " data = dict(**dobj)\n", " resp_train = data['resp_train']\n", " resp_test = data['resp_test']\n", " stimuli = data['stimuli']\n", "\n", " # Return as torch.Tensor\n", " resp_train_tensor = torch.tensor(resp_train, dtype=torch.float32)\n", " resp_test_tensor = torch.tensor(resp_test, dtype=torch.float32)\n", " stimuli_tensor = torch.tensor(stimuli, dtype=torch.float32)\n", "\n", " return resp_train_tensor, resp_test_tensor, stimuli_tensor\n", "\n", "\n", "def filters(out_channels=6, K=7):\n", " \"\"\" make example filters, some center-surround and gabors\n", " Returns:\n", " filters: out_channels x K x K\n", " \"\"\"\n", " grid = np.linspace(-K/2, K/2, K).astype(np.float32)\n", " xx,yy = np.meshgrid(grid, grid, indexing='ij')\n", "\n", " # create center-surround filters\n", " sigma = 1.1\n", " gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*sigma**2))\n", " wide_gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*(sigma*2)**2))\n", " center_surround = gaussian - 0.5 * wide_gaussian\n", "\n", " # create gabor filters\n", " thetas = np.linspace(0, 180, out_channels-2+1)[:-1] * np.pi/180\n", " gabors = np.zeros((len(thetas), K, K), np.float32)\n", " lam = 10\n", " phi = np.pi/2\n", " gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*(sigma*0.4)**2))\n", " for i,theta in enumerate(thetas):\n", " x = xx*np.cos(theta) + yy*np.sin(theta)\n", " gabors[i] = gaussian * np.cos(2*np.pi*x/lam + phi)\n", "\n", " filters = np.concatenate((center_surround[np.newaxis,:,:],\n", " -1*center_surround[np.newaxis,:,:],\n", " gabors),\n", " axis=0)\n", " filters /= np.abs(filters).max(axis=(1,2))[:,np.newaxis,np.newaxis]\n", " filters -= filters.mean(axis=(1,2))[:,np.newaxis,np.newaxis]\n", " # convert to torch\n", " filters = torch.from_numpy(filters)\n", " # add channel axis\n", " filters = filters.unsqueeze(1)\n", "\n", " return filters\n", "\n", "\n", "def grating(angle, sf=1 / 28, res=0.1, patch=False):\n", " \"\"\"Generate oriented grating stimulus\n", "\n", " Args:\n", " angle (float): orientation of grating (angle from vertical), in degrees\n", " sf (float): controls spatial frequency of the grating\n", " res (float): resolution of image. 
Smaller values will make the image\n", " smaller in terms of pixels. res=1.0 corresponds to 640 x 480 pixels.\n", " patch (boolean): set to True to make the grating a localized\n", " patch on the left side of the image. If False, then the\n", " grating occupies the full image.\n", "\n", " Returns:\n", " torch.Tensor: (res * 480) x (res * 640) pixel oriented grating image\n", "\n", " \"\"\"\n", "\n", " angle = np.deg2rad(angle) # transform to radians\n", "\n", " wpix, hpix = 640, 480 # width and height of image in pixels for res=1.0\n", "\n", " xx, yy = np.meshgrid(sf * np.arange(0, wpix * res) / res, sf * np.arange(0, hpix * res) / res)\n", "\n", " if patch:\n", " gratings = np.cos(xx * np.cos(angle + .1) + yy * np.sin(angle + .1)) # phase shift to make it better fit within patch\n", " gratings[gratings < 0] = 0\n", " gratings[gratings > 0] = 1\n", " xcent = gratings.shape[1] * .75\n", " ycent = gratings.shape[0] / 2\n", " xxc, yyc = np.meshgrid(np.arange(0, gratings.shape[1]), np.arange(0, gratings.shape[0]))\n", " icirc = ((xxc - xcent) ** 2 + (yyc - ycent) ** 2) ** 0.5 < wpix / 3 / 2 * res\n", " gratings[~icirc] = 0.5\n", "\n", " else:\n", " gratings = np.cos(xx * np.cos(angle) + yy * np.sin(angle))\n", " gratings[gratings < 0] = 0\n", " gratings[gratings > 0] = 1\n", "\n", " gratings -= 0.5\n", "\n", " # Return torch tensor\n", " return torch.tensor(gratings, dtype=torch.float32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data retrieval and loading\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Data retrieval and loading\n", "\n", "import hashlib\n", "import requests\n", "\n", "fname = \"W3D4_stringer_oribinned6_split.npz\"\n", "url = \"https://osf.io/p3aeb/download\"\n", "expected_md5 = \"b3f7245c6221234a676b71a1f43c3bb5\"\n", "\n", "if not os.path.isfile(fname):\n", " try:\n", " r = requests.get(url)\n", " except requests.ConnectionError:\n", " print(\"!!! Failed to download data !!!\")\n", " else:\n", " if r.status_code != requests.codes.ok:\n", " print(\"!!! Failed to download data !!!\")\n", " elif hashlib.md5(r.content).hexdigest() != expected_md5:\n", " print(\"!!! Data download appears corrupted !!!\")\n", " else:\n", " with open(fname, \"wb\") as fid:\n", " fid.write(r.content)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 1: Introduction to 2D convolutions\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.1: What is a 2D convolution?" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "A 2D convolution is an integral of the product of a filter $f$ and an input image $I$ computed at various positions as the filter is slid across the input. 
The output of the convolution operation at position $(x,y)$ can be written as follows, where the filter $f$ is size $(K, K)$:\n", "\n", "\\begin{equation}\n", "C(x,y) = \\sum_{k_x=-K/2}^{K/2} \\sum_{k_y=-K/2}^{K/2} f(k_x,k_y) I(x+k_x,y+k_y)\n", "\\end{equation}\n", "\n", "This **convolutional filter** is often called a **kernel**.\n", "\n", "Here is an illustration of a 2D convolution from this [article](https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-convolution-neural-networks-e3f054dd5daa):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to view convolution gif\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to view convolution gif\n", "\n", "from IPython.display import Image\n", "Image(url='https://miro.medium.com/max/700/1*5BwZUqAqFFP5f3wKYQ6wJg.gif')" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.2: 2D convolutions in deep learning\n", "\n", "*Estimated timing to here from start of tutorial: 6 min*\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video 1: 2D Convolutions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "remove-input" ] }, "outputs": [], "source": [ "# @title Video 1: 2D Convolutions\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'zgO9rHYbDxE'), ('Bilibili', 'BV1jw411d7Kg')]\n", "tab_contents = display_videos(video_ids, W=730, H=410)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_2D_Convolutions_Video\")" ] }, { "cell_type": 
"markdown", "metadata": { "execution": {} }, "source": [ "This video covers convolutions and how to implement them in Pytorch.\n", "\n", "
\n", " Click here for text recap of video \n", "\n", "Recall Aude Oliva’s discussion of convolutions in the [intro](https://www.youtube.com/watch?v=IZvcy0Myb3M). Convolutional neural networks with several layers revolutionized the deep learning field, and in particular AlexNet, depicted here, was the first deep neural network to excel on the ImageNet classification task. The first layer in the network takes as input an image, runs convolutional filters on the image, rectifies the output, them downsamples the output (the pooling layer). The next layers repeat this process, and then at the end fully connected linear layers are attached which output a label for the image.\n", "\n", "The main advantages of convolutional layers over fully connected layers are the reduction in parameters through weight-sharing, which we will get into shortly, and also the fact that the units have local receptive fields. These local receptive fields allow the network to pool over units in spatial proximity and helps the network learn translation invariant representations.\n", "\n", "A convolution is the integral of the product of two functions, one of which is a stimulus and the other which is a filter. This integral is computed at all positions by sliding the filter weights across the stimulus. If you want to perform a convolution and get the same output as the input you need to pad the input by half the filter size on each side. This is called “same” padding. Another parameter of this convolution computation is the stride -- how often the convolution is computed along the stimulus dimension. In this case, we use a stride of 1, but we can increase the stride and in turn have fewer units.\n", "\n", "All the units of this filter are called a single output **channel**. A convolutional layer often consists of multiple output channels each with their own filter weights. We call the number of output convolutional channels $C_{out}$.\n", "\n", "We will implement this convolutional layer in pytorch. We will create a convolutional layer `ConvolutionalLayer` which takes as input a stimulus, which in our case are the gratings images. The convolutional layer is initialized with a few different parameters - first is the # of input channels $C_{in}$, which is 1 in our case. Next the number of convolutional channels which we’ll call $C_{out}$ which we can set to 6. Then the size of the filter $K$ which we set by default to 7. There’s also an optional `filters` input which we use to initialize the convolutional weights. We set them as the weights of the conv layer we just created, and set the bias terms for the conv layer to zero.\n", "\n", "We declare an `nn.conv2d` variable to be an attribute of the class `ConvolutionalLayer` called `conv`. For this convolutional layer we set the padding to half the filter size and the stride to 1 to get the same size output as the input.\n", "\n", "What are some example filters we might use? One is a center-surround filter. It is positive in the middle and negative in the surround. Another is a gabor filter, which has a positive section next to a negative section. Look at the responses of these filters given the image. Both of these filter types are inspired by neurons recorded in the brain.\n", "\n", "In fact, convolutional neural networks are inspired by the brain. In the retina there are a variety of cell types which we can think of as filters and each of these cells tile the entire visual space. The picture here shows the part of visual space each one of these cells responds to. 
In barrel cortex we have a similar situation where each whisker’s activation corresponds to a single cortical column of activity and the functions computed in each of these columns are similar.\n", "\n", "</details>
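\n", "\n", "To make the convolution sum from Section 1.1 (and the 'same' padding and stride of 1 described above) concrete, here is a minimal NumPy sketch. It is illustrative only: the `conv2d_same` helper and the small edge filter are our own assumptions rather than part of the tutorial code, which uses `nn.Conv2d` below.\n", "\n", "```python\n", "import numpy as np\n", "\n", "def conv2d_same(image, kernel):\n", "    K = kernel.shape[0]          # assume a square K x K kernel with K odd\n", "    pad = K // 2                 # 'same' padding: half the filter size on each side\n", "    padded = np.pad(image, pad)  # zero-pad the image\n", "    out = np.zeros_like(image, dtype=float)\n", "    for y in range(image.shape[0]):\n", "        for x in range(image.shape[1]):\n", "            patch = padded[y:y + K, x:x + K]    # local receptive field at (y, x)\n", "            out[y, x] = np.sum(kernel * patch)  # weighted sum over the patch\n", "    return out\n", "\n", "rng = np.random.default_rng(0)\n", "img = rng.standard_normal((8, 8))\n", "edge_filter = np.array([[1., 0., -1.]] * 3)  # 3 x 3 filter that responds to vertical edges\n", "print(conv2d_same(img, edge_filter).shape)   # (8, 8): same size as the input (stride 1)\n", "```\n", "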
" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Object recognition was essentially an unsolved problem in machine learning until the [advent](https://en.wikipedia.org/wiki/AlexNet) of techniques for effectively training *deep* convolutional neural networks. See Bonus Section 1 for more information on why we use CNNs to model the visual system.\n", "\n", "Convolutional neural networks consist of 2D convolutional layers, ReLU non-linearities, 2D pooling layers, and at the output, a fully connected layer. We will see an example network with all these components in tutorial 3.\n", "\n", "A 2D convolutional layer has multiple output channels. Each output **channel** is the result of a 2D convolutional filter applied to the input. In the gif below, the input is in blue, the filter is in gray, and the output is in green. The number of units in the output channel depends on the *stride* you set. In the gif below, the stride is 1 because the input image is sampled at each position, a stride of 2 would mean skipping over input positions. In most applications, especially with small filter sizes, a stride of 1 is used.\n", "\n", "(*Technical note*: if filter size *K* is odd and you set the *pad=K//2* and *stride=1* (as is shown below), you get a **channel** of units that is the same size as the input. See a more detailed explanation of strides and pads [here](https://theano-pymc.readthedocs.io/en/latest/tutorial/conv_arithmetic.html) if interested)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to view convolution gif\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to view convolution gif\n", "\n", "from IPython.display import Image\n", "Image(url='https://miro.medium.com/max/790/1*1okwhewf5KCtIPaFib4XaA.gif')" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 1.3: 2D convolutions in Pytorch\n", "\n", "*Estimated timing to here from start of tutorial: 18 min*\n", "\n", "In Tutorial 1, fully connected linear layers were used to decode stimuli from neural activity. Convolutional layers are different from their fully connected counterparts in two ways:\n", " * In a fully connected layer, each unit computes a weighted sum over all the input units. In a convolutional layer, on the other hand, each unit computes a weighted sum over only a small patch of the input, referred to as the unit's **receptive field**. When the input is an image, the receptive field can be thought of as a local patch of pixels.\n", " * In a fully connected layer, each unit uses its own independent set of weights to compute the weighted sum. In a convolutional layer, all the units (within the same channel) **share the same weights**. This set of shared weights is called the **convolutional filter or kernel**. The result of this computation is a convolution, where each unit has computed the same weighted sum over a different part of the input. This reduces the number of parameters in the network substantially.\n", "\n", "

\n", " \n", "

\n", "\n", "\n", "We will compute the difference in the number of weights for a fully connected layer versus a convolutional layer in the Think exercise below.\n", "\n", "First, let's visualize the stimuli in the dataset from tutorial 1. During the neural recordings from [Stringer _et al._, 2021](https://doi.org/10.1016/j.cell.2021.03.042), mice were presented oriented gratings:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to plot example stimuli\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to plot example stimuli\n", "\n", "orientations = np.linspace(-90, 90, 5)\n", "\n", "h_ = 3\n", "n_col = len(orientations)\n", "h, w = grating(0).shape # height and width of stimulus\n", "\n", "fig, axs = plt.subplots(1, n_col, figsize=(h_ * n_col, h_))\n", "for i, ori in enumerate(orientations):\n", " stimulus = grating(ori)\n", " axs[i].set_title(f'{ori: .0f}$^o$')\n", " show_stimulus(stimulus, axs[i])\n", "fig.suptitle(f'stimulus size: {h} x {w}')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now let's implement 2D convolutional operations. We will use multiple convolutional channels and implement this operation efficiently using pytorch. A *layer* of convolutional channels can be implemented with one line of code using the PyTorch class `nn.Conv2d()`, which requires the following arguments for initialization (see full documentation [here](https://pytorch.org/docs/master/generated/torch.nn.Conv2d.html)):\n", " * $C^{in}$: the number of input channels\n", " * $C^{out}$: the number of output channels (number of different convolutional filters)\n", " * $K$: the size of the $C^{out}$ different convolutional filters\n", "\n", "When you run the network, you can input a stimulus of arbitrary size $(H^{in}, W^{in})$, but it needs to be shaped as a 4D input $(N, C^{in}, H^{in}, W^{in})$, where $N$ is the number of images. In our case, $C^{in}=1$ because there is only one color channel (our images are grayscale, but often $C^{in}=3$ in image processing)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "class ConvolutionalLayer(nn.Module):\n", " \"\"\"Deep network with one convolutional layer\n", " Attributes: conv (nn.Conv2d): convolutional layer\n", " \"\"\"\n", " def __init__(self, c_in=1, c_out=6, K=7, filters=None):\n", " \"\"\"Initialize layer\n", "\n", " Args:\n", " c_in: number of input stimulus channels\n", " c_out: number of output convolutional channels\n", " K: size of each convolutional filter\n", " filters: (optional) initialize the convolutional weights\n", "\n", " \"\"\"\n", " super().__init__()\n", " self.conv = nn.Conv2d(c_in, c_out, kernel_size=K,\n", " padding=K//2, stride=1)\n", " if filters is not None:\n", " self.conv.weight = nn.Parameter(filters)\n", " self.conv.bias = nn.Parameter(torch.zeros((c_out,), dtype=torch.float32))\n", "\n", " def forward(self, s):\n", " \"\"\"Run stimulus through convolutional layer\n", "\n", " Args:\n", " s (torch.Tensor): n_stimuli x c_in x h x w tensor with stimuli\n", "\n", " Returns:\n", " (torch.Tensor): n_stimuli x c_out x h x w tensor with convolutional layer unit activations.\n", "\n", " \"\"\"\n", " a = self.conv(s) # output of convolutional layer\n", "\n", " return a" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "See that `ConvolutionalLayer` takes as input `filters`. We have predesigned some filters that you can use by calling the `filters` function below. These are similar to filters we think are implemented in biological circuits such as the retina and the visual cortex. Some of them are **center-surround** filters and some of them are **gabor** filters. Check out this [website](http://www.cns.nyu.edu/~david/courses/perception/lecturenotes/ganglion/ganglion.html) for more details on center-surround filters, and this [website](https://en.wikipedia.org/wiki/Gabor_filter) for more details on gabor filters, if you're interested.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " Execute this cell to create and visualize filters\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @markdown Execute this cell to create and visualize filters\n", "\n", "example_filters = filters(out_channels=6, K=7)\n", "\n", "plt.figure(figsize=(8,4))\n", "plt.subplot(1,2,1)\n", "plt.imshow(example_filters[0,0], vmin=-1, vmax=1, cmap='bwr')\n", "plt.title('center-surround filter')\n", "plt.axis('off')\n", "plt.subplot(1,2,2)\n", "plt.imshow(example_filters[4,0], vmin=-1, vmax=1, cmap='bwr')\n", "plt.title('gabor filter')\n", "plt.axis('off')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Coding Exercise 1.3: 2D convolution in PyTorch\n", "\n", "We will now run the convolutional layer on our stimulus. We will use gratings stimuli made using the function `grating`, which returns a stimulus which is 48 x 64.\n", "\n", "Reminder, `nn.Conv2d` takes in a tensor of size $(N, C^{in}, H^{in}, W^{in}$) where $N$ is the number of stimuli, $C^{in}$ is the number of input channels, and $(H^{in}, W^{in})$ is the size of the stimulus. We will need to add these first two dimensions to our stimulus, then input it to the convolutional layer.\n", "\n", "We will plot the outputs of the convolution. 
`convout` is a tensor of size $(N, C^{out}, H^{in}, W^{in})$ where $N$ is the number of examples and $C^{out}$ are the number of convolutional channels. It is the same size as the input because we used a stride of 1 and padding that is half the kernel size.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "# Stimulus parameters\n", "in_channels = 1 # how many input channels in our images\n", "h = 48 # height of images\n", "w = 64 # width of images\n", "\n", "# Convolution layer parameters\n", "K = 7 # filter size\n", "out_channels = 6 # how many convolutional channels to have in our layer\n", "example_filters = filters(out_channels, K) # create filters to use\n", "\n", "convout = np.zeros(0) # assign convolutional activations to convout\n", "\n", "################################################################################\n", "## TODO for students: create convolutional layer in pytorch\n", "# Complete and uncomment\n", "raise NotImplementedError(\"Student exercise: create convolutional layer\")\n", "################################################################################\n", "\n", "# Initialize conv layer and add weights from function filters\n", "# you need to specify :\n", "# * the number of input channels c_in\n", "# * the number of output channels c_out\n", "# * the filter size K\n", "convLayer = ConvolutionalLayer(..., filters=example_filters)\n", "\n", "# Create stimuli (H_in, W_in)\n", "orientations = [-90, -45, 0, 45, 90]\n", "stimuli = torch.zeros((len(orientations), in_channels, h, w), dtype=torch.float32)\n", "for i,ori in enumerate(orientations):\n", " stimuli[i, 0] = grating(ori)\n", "\n", "convout = convLayer(...)\n", "convout = convout.detach() # detach gradients\n", "\n", "plot_example_activations(stimuli, convout, channels=np.arange(0, out_channels))\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "# Stimulus parameters\n", "in_channels = 1 # how many input channels in our images\n", "h = 48 # height of images\n", "w = 64 # width of images\n", "\n", "# Convolution layer parameters\n", "K = 7 # filter size\n", "out_channels = 6 # how many convolutional channels to have in our layer\n", "example_filters = filters(out_channels, K) # create filters to use\n", "\n", "convout = np.zeros(0) # assign convolutional activations to convout\n", "\n", "# Initialize conv layer and add weights from function filters\n", "# you need to specify :\n", "# * the number of input channels c_in\n", "# * the number of output channels c_out\n", "# * the filter size K\n", "convLayer = ConvolutionalLayer(c_in=in_channels, c_out=out_channels, K=K, filters=example_filters)\n", "\n", "# Create stimuli (H_in, W_in)\n", "orientations = [-90, -45, 0, 45, 90]\n", "stimuli = torch.zeros((len(orientations), in_channels, h, w), dtype=torch.float32)\n", "for i,ori in enumerate(orientations):\n", " stimuli[i, 0] = grating(ori)\n", "\n", "convout = convLayer(stimuli)\n", "convout = convout.detach() # detach gradients\n", "\n", "with plt.xkcd():\n", " plot_example_activations(stimuli, convout, channels=np.arange(0, out_channels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", 
"content_review(f\"{feedback_prefix}_2D_Convolution_in_pytorch_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Think! 1.3: Output and weight shapes in a convolutional layer\n", "\n", "Let's think about the shape of the weights and outputs of `convLayer`:\n", " - How many convolutional activations are there in each channel, and why is it this size?\n", " - How many weights does `convLayer` have?\n", " - How many weights would it have if it were a fully connected layer?\n", "\n", "Additionally, let's think about why the activations look the way they do. It seems like for all channels the activations are only non-zero for edges of the gratings (where the grating goes from white-to-black and from black-to-white).\n", " - Channel 0 and 1 seem to respond to every edge regardless of the orientation, but their signs are different. What type of filter might produce these types of responses?\n", " - Channels 2-5 seem to respond differently depending on the orientation of the stimulus. What type of filter might produce these types of responses?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove explanation\n", "\n", "\"\"\"\n", "1. There are H * W activations in each channel because the stride in the\n", "convolutional layer is set to 1, and the padding is set to K//2.\n", "\n", "2. The convLayer has K * K * C_out weights and C_out bias terms.\n", "\n", "3. A fully connected layer would have (H * W) * C_out weights and C_out bias terms.\n", "\n", "4. A center-surround filter will respond to a change in luminance (black-to-white\n", "or white-to-black) regardless of orientation.\n", "\n", "5. A gabor filter of different orientations will respond this way. See the\n", "exercise below for more explanation.\n", "\"\"\";" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Output_and_weight_shapes_conv_layer_Discussion\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Please see Bonus Section 2 to visualize the convolutional filter weights. See the Bonus Tutorial to use CNNs as encoding models of neurons (by fitting directly to neural responses)." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "*Estimated timing of tutorial: 40 minutes*\n", "\n", "In this notebook, we built a 2D convolutional layer which is meant to represent the responses of neurons in the mouse visual cortex, or the responses of neurons\n", "which are inputs to the mouse visual cortex.\n", "\n", "In Tutorial 3, we will add to this 2D convolutional layer a fully-connected layer and train this model to predict whether an orientation is left or right. We will see if the convolutional filters it learns are similar to mouse visual cortex. See Section 3 in the Bonus Tutorial to fit convolutional neural networks directly to neural activity." 
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Bonus" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Bonus Section 1: Why CNN's?\n", "\n", "CNN models are particularly [well-suited](https://www.nature.com/articles/nn.4244) to modeling the visual system for a number of reasons:\n", "\n", "1. **Distributed computation**: like any other neural network, CNN's use distributed representations to compute -- much like the brain seems to do. Such models, therefore, provide us with a vocabulary with which to talk and think about such distributed representations. Because we know the exact function the model is built to perform (e.g., orientation discrimination), we can analyze its internal representations with respect to this function and begin to interpret why the representations look the way they do. Most importantly, we can then use these insights to analyze the structure of neural representations observed in recorded population activity. We can qualitatively and quantitatively compare the representations we see in the model and in a given population of real neurons to hopefully tease out the computations it performs.\n", "\n", "2. **Hierarchical architecture**: like in any other deep learning architecture, each layer of a deep CNN comprises a non-linear transformation of the previous layer. Thus, there is a natural hierarchy whereby layers closer to the network output represent increasingly more abstract information about the input image. For example, in a network trained to do object recognition, the early layers might represent information about edges in the image, whereas later layers closer to the output might represent various object categories. This resembles the [hierarchical structure of the visual system](https://pubmed.ncbi.nlm.nih.gov/1822724/), where [lower-level areas](https://www.jneurosci.org/content/25/46/10577.short) (e.g., retina, V1) represent visual features of the sensory input and [higher-level areas](https://www.sciencedirect.com/science/article/pii/S089662731200092X) (e.g., V4, IT) represent properties of objects in the visual scene. We can then naturally use a single CNN to model multiple visual areas, using early CNN layers to model lower-level visual areas and late CNN layers to model higher-level visual areas.\n", "\n", " Relative to fully connected networks, CNN's, in fact, have further hierarchical structure built-in through the max pooling layers. Recall that each output of a convolution + pooling block is the result of processing a local patch of the inputs to that block. If we stack such blocks in a sequence, then the outputs of each block will be sensitive to increasingly larger regions of the initial raw input to the network: an output from the first block is sensitive to a single patch of these inputs, corresponding to its \"receptive field\"; an output from the second block is sensitive to a patch of outputs from the first block, which together are sensitive to a larger patch of raw inputs comprising the union of their receptive fields. Receptive fields thus get larger for deeper layers (see [here](http://colah.github.io/posts/2014-07-Conv-Nets-Modular/) for a nice visual depiction of this). This resembles primate visual systems, where neurons in higher-level visual areas respond to stimuli in wider regions of the visual field than neurons in lower-level visual areas.\n", "\n", "3. 
**Convolutional layers**: through the weight sharing constraint, the outputs of each channel of a convolutional layer process different parts of the input image in exactly the same way. This architectural constraint effectively builds into the network the assumption that objects in the world typically look the same regardless of where they are in space. This is useful for modeling the visual system for two (largely separate) reasons:\n", " * Firstly, this assumption is generally valid in mammalian visual systems, since mammals tend to view the same object from many perspectives. Two neurons at a similar hierarchy in the visual system with different receptive fields could thus end up receiving statistically similar synaptic inputs, so that the synaptic weights developed over time may end up being similar as well.\n", " * Secondly, this architecture significantly improves object recognition ability. Object recognition was essentially an unsolved problem in machine learning until the [advent](https://en.wikipedia.org/wiki/AlexNet) of techniques for effectively training *deep* convolutional neural networks. Fully connected networks on their own can't achieve object recognition abilities anywhere close to human levels, making them bad models of human object recognition. Indeed, it is generally the case that [the better a neural network model is at object recognition, the closer the match between its representations and those observed in the brain](https://www.pnas.org/content/111/23/8619.short). That said, it is worth noting that our much simpler orientation discrimination task (in Tutorial 3) can be solved by relatively simple networks." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Bonus Section 2: Understanding activations from weight" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Bonus Coding Exercise 2: Visualizing convolutional filter weights\n", "\n", "Why do the activations look the way they do? Let's look at the weights (`convLayer.conv.weight.detach()`) of the convolutional filters and try to interpret them." 
] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {} }, "source": [ "```python\n", "################################################################################\n", "## TODO for students: get weights\n", "# Complete and uncomment\n", "raise NotImplementedError(\"Student exercise: get weights\")\n", "################################################################################\n", "\n", "# get weights of conv layer in convLayer\n", "weights = ...\n", "print(weights.shape) # can you identify what each of the dimensions are?\n", "\n", "plot_weights(weights, channels=np.arange(0, out_channels))\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove solution\n", "\n", "# get weights of conv layer in convLayer\n", "weights = convLayer.conv.weight.detach()\n", "print(weights.shape) # can you identify what each of the dimensions are?\n", "\n", "with plt.xkcd():\n", " plot_weights(weights, channels=np.arange(0, out_channels))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Visualizing_convolutional_filter_weights_Bonus_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "In the function `filters` we pre-made center-surround filters and Gabor filters of various orientations. [Gabor filters](https://en.wikipedia.org/wiki/Gabor_filter) have a positive region next to a negative region and the orientation of these regions of the filter determine the orientation of edges to which they respond.\n", "\n", "In the visual cortex, Hubel and Wiesel discovered simple cells, which would correspond to a unit in the channels 2-5 above with Gabor filters. There are also some neurons with activity that resemble center-surround filters, which would correspond to the first two convolutional channels above.\n", "\n", "There were additional cells discovered by Hubel and Wiesel - complex cells - that respond to an oriented grating regardless of where the bars are exactly (note that the responses we see are specific to where the bars are). These cells therefore have some level of translation invariance. This is something that convolutional neural networks try to replicate -- e.g., a grating is still oriented horizontally even if it moves slightly, and a cat is still a cat even if it's in a different position in the image." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Bonus Think! 2: Complex cell\n", "\n", "- How might you create a complex cell and have responses that are translation invariant?\n", "- How might you create a cell that responds to multiple orientations?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove explanation\n", "\n", "\"\"\"\n", "1. A complex cell combines two rectified Gabor filters of the same orientation but with\n", "different phases. This is something a neural network can reproduce with a\n", "RELU layer and combinations of convolutional channels. Additionally neural networks\n", "employ a strategy called pooling layers, which combine units in a small region and\n", "take the maximum value of their activations as the output. 
We will see these\n", "layers in the next tutorial, where we will try to decode orientation directly from\n", "images.\n", "\n", "2. A cell that responds to multiple orientations would have to be a sum of multiple\n", "convolutional channels. In the next bonus section, we will hook up this convolutional\n", "layer to a fully connected layer.\n", "\"\"\";" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Submit your feedback\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {}, "tags": [ "hide-input" ] }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Complex_cell_Bonus_Discussion\")" ] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W1D5_Tutorial2", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.17" } }, "nbformat": 4, "nbformat_minor": 0 }