{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"id": "view-in-github"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial 2: Convolutional Neural Networks\n",
"\n",
"**Week 1, Day 5: Deep Learning**\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"**Content creators**: Jorge A. Menendez, Carsen Stringer\n",
"\n",
"**Content reviewers**: Roozbeh Farhoodi, Madineh Sarvestani, Kshitij Dwivedi, Spiros Chavlis, Ella Batty, Michael Waskom\n",
"\n",
"**Production editors:** Spiros Chavlis"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Tutorial Objectives\n",
"\n",
"*Estimated timing of tutorial: 40 minutes*\n",
"\n",
"In this short tutorial, we'll go through an introduction to 2D convolutions and apply a convolutional network to an image to prepare for creating normative models in Tutorial 3.\n",
"\n",
"In this tutorial, we will\n",
"* Understand the basics of 2D convolution\n",
"* Build a convolutional layer using PyTorch\n",
"* Visualize and analyze its outputs\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from IPython.display import IFrame\n",
"from ipywidgets import widgets\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: https://osf.io/download/s59jy/\")\n",
" display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/s59jy/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and import feedback gadget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install and import feedback gadget\n",
"\n",
"!pip3 install vibecheck datatops --quiet\n",
"\n",
"from vibecheck import DatatopsContentReviewContainer\n",
"def content_review(notebook_section: str):\n",
" return DatatopsContentReviewContainer(\n",
" \"\", # No text prompt\n",
" notebook_section,\n",
" {\n",
" \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
" \"name\": \"neuromatch_cn\",\n",
" \"user_key\": \"y1x3mpx5\",\n",
" },\n",
" ).render()\n",
"\n",
"\n",
"feedback_prefix = \"W1D5_T2\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"execution": {}
},
"outputs": [],
"source": [
"# Imports\n",
"import os\n",
"import numpy as np\n",
"import torch\n",
"from torch import nn\n",
"from torch import optim\n",
"from matplotlib import pyplot as plt\n",
"import matplotlib as mpl"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure settings\n",
"import logging\n",
"logging.getLogger('matplotlib.font_manager').disabled = True\n",
"\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina'\n",
"plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting Functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Plotting Functions\n",
"\n",
"def show_stimulus(img, ax=None, show=False):\n",
" \"\"\"Visualize a stimulus\"\"\"\n",
" if ax is None:\n",
" ax = plt.gca()\n",
" ax.imshow(img+0.5, cmap=mpl.cm.binary)\n",
" ax.set_xticks([])\n",
" ax.set_yticks([])\n",
" ax.spines['left'].set_visible(False)\n",
" ax.spines['bottom'].set_visible(False)\n",
" if show:\n",
" plt.show()\n",
"\n",
"\n",
"def plot_weights(weights, channels=[0]):\n",
" \"\"\" plot convolutional channel weights\n",
" Args:\n",
" weights: weights of convolutional filters (conv_channels x K x K)\n",
" channels: which conv channels to plot\n",
" \"\"\"\n",
" wmax = torch.abs(weights).max()\n",
" fig, axs = plt.subplots(1, len(channels), figsize=(12, 2.5))\n",
" for i, channel in enumerate(channels):\n",
" im = axs[i].imshow(weights[channel, 0], vmin=-wmax, vmax=wmax, cmap='bwr')\n",
" axs[i].set_title(f'channel {channel}')\n",
"\n",
" cb_ax = fig.add_axes([1, 0.1, 0.05, 0.8])\n",
" plt.colorbar(im, ax=cb_ax)\n",
" cb_ax.axis('off')\n",
" plt.show()\n",
"\n",
"\n",
"def plot_example_activations(stimuli, act, channels=[0]):\n",
" \"\"\" plot activations act and corresponding stimulus\n",
" Args:\n",
" stimuli: stimulus input to convolutional layer (n x h x w) or (h x w)\n",
" act: activations of convolutional layer (n_bins x conv_channels x n_bins)\n",
" channels: which conv channels to plot\n",
" \"\"\"\n",
" if stimuli.ndim>2:\n",
" n_stimuli = stimuli.shape[0]\n",
" else:\n",
" stimuli = stimuli.unsqueeze(0)\n",
" n_stimuli = 1\n",
"\n",
" fig, axs = plt.subplots(n_stimuli, 1 + len(channels), figsize=(12, 12))\n",
"\n",
" # plot stimulus\n",
" for i in range(n_stimuli):\n",
" show_stimulus(stimuli[i].squeeze(), ax=axs[i, 0])\n",
" axs[i, 0].set_title('stimulus')\n",
"\n",
" # plot example activations\n",
" for k, (channel, ax) in enumerate(zip(channels, axs[i][1:])):\n",
" im = ax.imshow(act[i, channel], vmin=-3, vmax=3, cmap='bwr')\n",
" ax.set_xlabel('x-pos')\n",
" ax.set_ylabel('y-pos')\n",
" ax.set_title(f'channel {channel}')\n",
"\n",
" cb_ax = fig.add_axes([1.05, 0.8, 0.01, 0.1])\n",
" plt.colorbar(im, cax=cb_ax)\n",
" cb_ax.set_title('activation\\n strength')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helper Functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Helper Functions\n",
"\n",
"def load_data_split(data_name):\n",
" \"\"\"Load mouse V1 data from Stringer et al. (2019)\n",
"\n",
" Data from study reported in this preprint:\n",
" https://www.biorxiv.org/content/10.1101/679324v2.abstract\n",
"\n",
" These data comprise time-averaged responses of ~20,000 neurons\n",
" to ~4,000 stimulus gratings of different orientations, recorded\n",
" through Calcium imaginge. The responses have been normalized by\n",
" spontaneous levels of activity and then z-scored over stimuli, so\n",
" expect negative numbers. The repsonses were split into train and\n",
" test and then each set were averaged in bins of 6 degrees.\n",
"\n",
" This function returns the relevant data (neural responses and\n",
" stimulus orientations) in a torch.Tensor of data type torch.float32\n",
" in order to match the default data type for nn.Parameters in\n",
" Google Colab.\n",
"\n",
" It will hold out some of the trials when averaging to allow us to have test\n",
" tuning curves.\n",
"\n",
" Args:\n",
" data_name (str): filename to load\n",
"\n",
" Returns:\n",
" resp_train (torch.Tensor): n_stimuli x n_neurons matrix of neural responses,\n",
" each row contains the responses of each neuron to a given stimulus.\n",
" As mentioned above, neural \"response\" is actually an average over\n",
" responses to stimuli with similar angles falling within specified bins.\n",
" resp_test (torch.Tensor): n_stimuli x n_neurons matrix of neural responses,\n",
" each row contains the responses of each neuron to a given stimulus.\n",
" As mentioned above, neural \"response\" is actually an average over\n",
" responses to stimuli with similar angles falling within specified bins\n",
" stimuli: (torch.Tensor): n_stimuli x 1 column vector with orientation\n",
" of each stimulus, in degrees. This is actually the mean orientation\n",
" of all stimuli in each bin.\n",
"\n",
" \"\"\"\n",
" with np.load(data_name) as dobj:\n",
" data = dict(**dobj)\n",
" resp_train = data['resp_train']\n",
" resp_test = data['resp_test']\n",
" stimuli = data['stimuli']\n",
"\n",
" # Return as torch.Tensor\n",
" resp_train_tensor = torch.tensor(resp_train, dtype=torch.float32)\n",
" resp_test_tensor = torch.tensor(resp_test, dtype=torch.float32)\n",
" stimuli_tensor = torch.tensor(stimuli, dtype=torch.float32)\n",
"\n",
" return resp_train_tensor, resp_test_tensor, stimuli_tensor\n",
"\n",
"\n",
"def filters(out_channels=6, K=7):\n",
" \"\"\" make example filters, some center-surround and gabors\n",
" Returns:\n",
" filters: out_channels x K x K\n",
" \"\"\"\n",
" grid = np.linspace(-K/2, K/2, K).astype(np.float32)\n",
" xx,yy = np.meshgrid(grid, grid, indexing='ij')\n",
"\n",
" # create center-surround filters\n",
" sigma = 1.1\n",
" gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*sigma**2))\n",
" wide_gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*(sigma*2)**2))\n",
" center_surround = gaussian - 0.5 * wide_gaussian\n",
"\n",
" # create gabor filters\n",
" thetas = np.linspace(0, 180, out_channels-2+1)[:-1] * np.pi/180\n",
" gabors = np.zeros((len(thetas), K, K), np.float32)\n",
" lam = 10\n",
" phi = np.pi/2\n",
" gaussian = np.exp(-(xx**2 + yy**2)**0.5/(2*(sigma*0.4)**2))\n",
" for i,theta in enumerate(thetas):\n",
" x = xx*np.cos(theta) + yy*np.sin(theta)\n",
" gabors[i] = gaussian * np.cos(2*np.pi*x/lam + phi)\n",
"\n",
" filters = np.concatenate((center_surround[np.newaxis,:,:],\n",
" -1*center_surround[np.newaxis,:,:],\n",
" gabors),\n",
" axis=0)\n",
" filters /= np.abs(filters).max(axis=(1,2))[:,np.newaxis,np.newaxis]\n",
" filters -= filters.mean(axis=(1,2))[:,np.newaxis,np.newaxis]\n",
" # convert to torch\n",
" filters = torch.from_numpy(filters)\n",
" # add channel axis\n",
" filters = filters.unsqueeze(1)\n",
"\n",
" return filters\n",
"\n",
"\n",
"def grating(angle, sf=1 / 28, res=0.1, patch=False):\n",
" \"\"\"Generate oriented grating stimulus\n",
"\n",
" Args:\n",
" angle (float): orientation of grating (angle from vertical), in degrees\n",
" sf (float): controls spatial frequency of the grating\n",
" res (float): resolution of image. Smaller values will make the image\n",
" smaller in terms of pixels. res=1.0 corresponds to 640 x 480 pixels.\n",
" patch (boolean): set to True to make the grating a localized\n",
" patch on the left side of the image. If False, then the\n",
" grating occupies the full image.\n",
"\n",
" Returns:\n",
" torch.Tensor: (res * 480) x (res * 640) pixel oriented grating image\n",
"\n",
" \"\"\"\n",
"\n",
" angle = np.deg2rad(angle) # transform to radians\n",
"\n",
" wpix, hpix = 640, 480 # width and height of image in pixels for res=1.0\n",
"\n",
" xx, yy = np.meshgrid(sf * np.arange(0, wpix * res) / res, sf * np.arange(0, hpix * res) / res)\n",
"\n",
" if patch:\n",
" gratings = np.cos(xx * np.cos(angle + .1) + yy * np.sin(angle + .1)) # phase shift to make it better fit within patch\n",
" gratings[gratings < 0] = 0\n",
" gratings[gratings > 0] = 1\n",
" xcent = gratings.shape[1] * .75\n",
" ycent = gratings.shape[0] / 2\n",
" xxc, yyc = np.meshgrid(np.arange(0, gratings.shape[1]), np.arange(0, gratings.shape[0]))\n",
" icirc = ((xxc - xcent) ** 2 + (yyc - ycent) ** 2) ** 0.5 < wpix / 3 / 2 * res\n",
" gratings[~icirc] = 0.5\n",
"\n",
" else:\n",
" gratings = np.cos(xx * np.cos(angle) + yy * np.sin(angle))\n",
" gratings[gratings < 0] = 0\n",
" gratings[gratings > 0] = 1\n",
"\n",
" gratings -= 0.5\n",
"\n",
" # Return torch tensor\n",
" return torch.tensor(gratings, dtype=torch.float32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data retrieval and loading\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Data retrieval and loading\n",
"\n",
"import hashlib\n",
"import requests\n",
"\n",
"fname = \"W3D4_stringer_oribinned6_split.npz\"\n",
"url = \"https://osf.io/p3aeb/download\"\n",
"expected_md5 = \"b3f7245c6221234a676b71a1f43c3bb5\"\n",
"\n",
"if not os.path.isfile(fname):\n",
" try:\n",
" r = requests.get(url)\n",
" except requests.ConnectionError:\n",
" print(\"!!! Failed to download data !!!\")\n",
" else:\n",
" if r.status_code != requests.codes.ok:\n",
" print(\"!!! Failed to download data !!!\")\n",
" elif hashlib.md5(r.content).hexdigest() != expected_md5:\n",
" print(\"!!! Data download appears corrupted !!!\")\n",
" else:\n",
" with open(fname, \"wb\") as fid:\n",
" fid.write(r.content)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 1: Introduction to 2D convolutions\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.1: What is a 2D convolution?"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"A 2D convolution is an integral of the product of a filter $f$ and an input image $I$ computed at various positions as the filter is slid across the input. The output of the convolution operation at position $(x,y)$ can be written as follows, where the filter $f$ is size $(K, K)$:\n",
"\n",
"\\begin{equation}\n",
"C(x,y) = \\sum_{k_x=-K/2}^{K/2} \\sum_{k_y=-K/2}^{K/2} f(k_x,k_y) I(x+k_x,y+k_y)\n",
"\\end{equation}\n",
"\n",
"This **convolutional filter** is often called a **kernel**.\n",
"\n",
"Here is an illustration of a 2D convolution from this [article](https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-convolution-neural-networks-e3f054dd5daa):"
]
},
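{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"To make the formula concrete, the cell below is a minimal sketch (not part of the original tutorial code) that evaluates the sum directly at one output position and checks it against PyTorch's `conv2d`, which computes exactly this kind of sliding weighted sum (a cross-correlation, i.e., no filter flip). The filter size `K = 3` and the random inputs are assumptions made only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Illustrative sketch: evaluate the convolution sum above by hand at one\n",
"# position and compare with PyTorch. K = 3 and the random inputs are\n",
"# assumptions made just for this example.\n",
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"K = 3                      # filter size (example assumption)\n",
"I = torch.randn(8, 8)      # toy input image\n",
"f = torch.randn(K, K)      # toy filter (kernel)\n",
"\n",
"# direct evaluation of the sum at one interior output position (x, y)\n",
"x, y = 4, 5\n",
"C_xy = sum(f[kx, ky] * I[x + kx - K // 2, y + ky - K // 2]\n",
"           for kx in range(K) for ky in range(K))\n",
"\n",
"# same computation with conv2d; inputs are (batch, channels, height, width)\n",
"out = F.conv2d(I[None, None], f[None, None], padding=K // 2)\n",
"print(torch.isclose(C_xy, out[0, 0, x, y]))  # tensor(True)"
]
},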
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Execute this cell to view convolution gif\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @markdown Execute this cell to view convolution gif\n",
"\n",
"from IPython.display import Image\n",
"Image(url='https://miro.medium.com/max/700/1*5BwZUqAqFFP5f3wKYQ6wJg.gif')"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.2: 2D convolutions in deep learning\n",
"\n",
"*Estimated timing to here from start of tutorial: 6 min*\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Video 1: 2D Convolutions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 1: 2D Convolutions\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'zgO9rHYbDxE'), ('Bilibili', 'BV1jw411d7Kg')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_2D_Convolutions_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"This video covers convolutions and how to implement them in Pytorch.\n",
"\n",
"
Click here for text recap of video
\n",
"\n",
"Recall Aude Oliva’s discussion of convolutions in the [intro](https://www.youtube.com/watch?v=IZvcy0Myb3M). Convolutional neural networks with several layers revolutionized the deep learning field, and in particular AlexNet, depicted here, was the first deep neural network to excel on the ImageNet classification task. The first layer in the network takes as input an image, runs convolutional filters on the image, rectifies the output, them downsamples the output (the pooling layer). The next layers repeat this process, and then at the end fully connected linear layers are attached which output a label for the image.\n",
"\n",
"The main advantages of convolutional layers over fully connected layers are the reduction in parameters through weight-sharing, which we will get into shortly, and also the fact that the units have local receptive fields. These local receptive fields allow the network to pool over units in spatial proximity and helps the network learn translation invariant representations.\n",
"\n",
"A convolution is the integral of the product of two functions, one of which is a stimulus and the other which is a filter. This integral is computed at all positions by sliding the filter weights across the stimulus. If you want to perform a convolution and get the same output as the input you need to pad the input by half the filter size on each side. This is called “same” padding. Another parameter of this convolution computation is the stride -- how often the convolution is computed along the stimulus dimension. In this case, we use a stride of 1, but we can increase the stride and in turn have fewer units.\n",
"\n",
"All the units of this filter are called a single output **channel**. A convolutional layer often consists of multiple output channels each with their own filter weights. We call the number of output convolutional channels $C_{out}$.\n",
"\n",
"We will implement this convolutional layer in pytorch. We will create a convolutional layer `ConvolutionalLayer` which takes as input a stimulus, which in our case are the gratings images. The convolutional layer is initialized with a few different parameters - first is the # of input channels $C_{in}$, which is 1 in our case. Next the number of convolutional channels which we’ll call $C_{out}$ which we can set to 6. Then the size of the filter $K$ which we set by default to 7. There’s also an optional `filters` input which we use to initialize the convolutional weights. We set them as the weights of the conv layer we just created, and set the bias terms for the conv layer to zero.\n",
"\n",
"We declare an `nn.conv2d` variable to be an attribute of the class `ConvolutionalLayer` called `conv`. For this convolutional layer we set the padding to half the filter size and the stride to 1 to get the same size output as the input.\n",
"\n",
"What are some example filters we might use? One is a center-surround filter. It is positive in the middle and negative in the surround. Another is a gabor filter, which has a positive section next to a negative section. Look at the responses of these filters given the image. Both of these filter types are inspired by neurons recorded in the brain.\n",
"\n",
"In fact, convolutional neural networks are inspired by the brain. In the retina there are a variety of cell types which we can think of as filters and each of these cells tile the entire visual space. The picture here shows the part of visual space each one of these cells responds to. In barrel cortex we have a similar situation where each whisker’s activation corresponds to a single cortical column of activity and the functions computed in each of these columns are similar.\n",
"\n",
"
\n",
" \n",
"