Publish Date: 2021-11-26

Word Count: 1.9k

Read Count:

D2L 05 深度学习计算

在本章中，我们开始深入探索深度学习计算的关键组件，即模型构建、参数访问与初始化、设计自定义层和块、将模型读写到磁盘，以及利用GPU实现显著的加速。这些知识将使你从基础用户变为高级用户。虽然本章不介绍任何新的模型或数据集，但后面的高级模型章节在很大程度上依赖于本章的知识

层和块

为了实现这些复杂的网络，我们引入了神经网络块的概念。块可以描述单个层、由多个层组成的组件或整个模型本身

自定义块

简要总结一下每个块必须提供的基本功能：

将输入数据作为其正向传播函数的参数，并通过正向传播函数来生成输出
计算其输出关于输入的梯度，可通过其反向传播函数进行访问。通常这是自动发生的
存储和访问正向传播计算所需的参数，并能根据需要初始化模型参数

下面写一个简单的线性模块

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))
    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)

除了自定义带参数的层外，还可以自定义不带参数的模块，这样可以更轻松地加入到其他模块当中，例如可以将损失函数写为一个模块加入到 nn.Sequential 中

顺序块

nn.Sequential 提供了两个功能：

将块逐个追加到列表中，在之后能够通过 index 访问，一般的模块不能通过 index 访问其子模块，而是通过属性的方式访问子模块
在正向传播时，将输入按顺序传递给块组成链条

教材中的简易实现

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for block in args:
            # 这里，`block`是`Module`子类的一个实例。我们把它保存在'Module'类的成员变量
            # `_modules` 中。`block`的类型是OrderedDict。
            self._modules[block] = block

    def forward(self, X):
        # OrderedDict保证了按照成员添加的顺序遍历它们
        for block in self._modules.values():
            X = block(X)
        return X

在最后的练习中有提问：如果把 self._modules 换为普通的 python 列表有什么问题？回答：使用 self._modules 将会把这些子模块注册到模块中，这是 nn.Module 的重要属性之一。当在其他模块使用该 MySequential 时，这些子模块同样也会被识别，并递归地注册到该模块中。而普通的 python list 类无法做到

嵌套块

可以直接通过 add_module(key, module) 方法加入子模块，所有添加子模块的本质都是向 self._module OrderedDict 中注册新的 name: module pair

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # 在这里嵌套
        net.add_module(f'block {i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

在正向传播函数中执行代码

可以在 forward() 函数中任意执行你希望的过程。在反向传播的过程中只会更新模块中的参数，如下面的 self.rand_weight 就不会被更新

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # 不计算梯度的随机权重参数。因此其在训练期间保持不变。
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        # 使用创建的常量参数以及`relu`和`dot`函数。
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # 复用全连接层。这相当于两个全连接层共享参数。
        X = self.linear(X)
        return X.sum()

参数管理

在本节中，我们将介绍以下内容：

访问参数，用于调试、诊断和可视化
参数初始化
在不同模型组件间共享参数

参数访问

通过 state_dict() 查看网络层参数的 OrderedDict

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))
net(X)

print(net[2].state_dict())
# OrderedDict([('weight', tensor([[-0.1280,  0.3443,  0.0572, -0.1925,  0.3084, -0.2585,  0.1038,  0.0608]])), ('bias', tensor([0.1917]))])

通过类属性访问参数，并进一步访问该参数的值

print(net[2].bias)
print(net[2].bias.data)
# tensor([-0.1373], requires_grad=True)
# tensor([-0.1373])

通过 named_parameters() 也可以访问所有参数的名字和参数，返回的是一个生成器，通过循环可遍历所有参数，以 (name, param) 元组形式为返回值

参数初始化

默认情况下，PyTorch会根据一个范围均匀地初始化权重和偏置矩阵，这个范围是根据输入和输出维度计算出的。PyTorch的 nn.init 模块提供了多种预置初始化方法，其本质是对 module.weight 进行赋值

def xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)
def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

net[0].apply(xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)

参数绑定

有时我们希望在多个层间共享参数。让我们看看如何优雅地做这件事

# 我们需要给共享层一个名称，以便可以引用它的参数。
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))
net(X)

在反向传播时，这两个隐藏层的梯度将会加在一起

读写文件

现在是时候学习如何加载和存储权重向量和整个模型

加载和保存张量

可以直接调用 load 和 save 函数，以读写各种数据类型，包括 Pytorch 模型（模型包含 state_dict 即参数）

import torch
from torch import nn
from torch.nn import functional as F

x = torch.arange(4)
torch.save(x, 'x-file')

x2 = torch.load('x-file')

# 可以存储各种的数据类型
y = torch.zeros(4)
torch.save([x, y],'x-files')

x2, y2 = torch.load('x-files')

加载和保存模型参数

仅仅保存参数，并将保存的参数加载到新建的模型中

# 假设定义了一个模型
class MLP(nn.Module):
    def __init__(self):
        pass

    def forward(self, x):
        pass

net = MLP()

torch.save(net.state_dict(), 'mlp.params')

# 新建模型并加载参数
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))

GPU

我们可以指定用于存储和计算的设备，如CPU和GPU。默认情况下，张量是在内存中创建的，然后使用CPU计算它，pytorch 中与 GPU 相关的包为 torch.cuda

import torch
from torch import nn

# 获得 GPU 设备，返回值为字符串 'cpu' or 'cuda' or f'cuda:{idx}'
torch.device('cpu') 
torch.device('cuda')	# equals torch.device('cuda:0')
torch.device('cuda:1')

torch.cuda.device_count()	# GPU 数量
torch.cuda.is_available()	# GPU 是否可用

torch.version.cuda			# cuda 版本

查看张量所在设备，并更改所在设备

x = torch.tensor([1, 2, 3])
print(x.device)	# cpu

# 将 x 移到 GPU 上
x = x.to('cuda')	# equals x = x.cuda()

# net 是一个 Module，将其参数移动到 GPU 上
net.to(x.device)	# net.cuda()
# 查看网络的参数是否在 GPU 上
print(next(net.parameters()).device)

注意：

计算需要在同一个设备上进行，即模型和数据需要在同一个 GPU 或者 CPU 上进行计算
to(device) 对于张量不是一个 inplace 操作，而对于模型来说是一个 inplace 操作，也就是说对于张量需要使用 x = x.to() 来使改变真正生效

Declan

https://declk.github.io/archives/53554dbc.html

All articles in this blog are used except for special statements CC BY 4.0 reprint polocy. If reproduced, please indicate source Declan !

Dive into deep learning

D2L 06 卷积神经网络

2021-11-27 课程 Dive into deep learning

Dive into deep learning

D2L 04 多层感知机

2021-11-24 课程 Dive into deep learning

Dive into deep learning

D2L 05 深度学习计算

D2L 05 深度学习计算

层和块

自定义块

顺序块

嵌套块

在正向传播函数中执行代码

参数管理

参数访问

参数初始化

参数绑定

读写文件

加载和保存张量

加载和保存模型参数

GPU

你的赏识是我前进的动力