====================================================================
I just went and watched the Backpropagation lecture; this assignment definitely would have been much easier if I had watched the lecture before doing it.
20.01.19
====================================================================
The assignment consists of a Linear Classifier and a Two Layer Neural Network. For the Linear Classifier, you first implement the SVM loss and the Softmax loss, each in a naive (loop-based) version and a vectorized version, and implement sampling a batch for SGD. Then you implement the train step where the model learns its weights and the predict step that classifies data with the trained model, and finally the part that selects the model's hyperparameters.
def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples. When you implement the regularization over W, please DO NOT
    multiply the regularization term by 1/2 (no coefficient).

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
                #######################################################################
                # TODO:                                                               #
                # Compute the gradient of the loss function and store it dW. (part 1) #
                # Rather than first computing the loss and then computing the         #
                # derivative, it is simple to compute the derivative at the same time #
                # that the loss is being computed.                                    #
                #######################################################################
                # Replace "pass" statement with your code
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
                #######################################################################
                #                         END OF YOUR CODE                            #
                #######################################################################

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train

    # Add regularization to the loss.
    loss += reg * torch.sum(W * W)

    #############################################################################
    # TODO:                                                                     #
    # Compute the gradient of the loss function and store it in dW. (part 2)    #
    #############################################################################
    # Replace "pass" statement with your code
    dW /= num_train
    dW += 2 * W * reg
    #############################################################################
    #                            END OF YOUR CODE                               #
    #############################################################################
    return loss, dW
First, the naive implementation of the SVM loss: you just loop over the training examples and accumulate the loss and dW as you go.
It is exactly what the slides show, so it shouldn't be too hard.
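For reference, what the loops above compute is the per-example multiclass hinge loss with delta = 1 and its gradient (here w_j = W[:, j]; the full loss then averages L_i over the batch and adds reg * sum(W * W)):

L_i = \sum_{j \neq y_i} \max\left(0,\; w_j^{\top} x_i - w_{y_i}^{\top} x_i + 1\right)

\frac{\partial L_i}{\partial w_j} = \mathbb{1}\left[w_j^{\top} x_i - w_{y_i}^{\top} x_i + 1 > 0\right] x_i \quad (j \neq y_i),
\qquad
\frac{\partial L_i}{\partial w_{y_i}} = -\sum_{j \neq y_i} \mathbb{1}\left[w_j^{\top} x_i - w_{y_i}^{\top} x_i + 1 > 0\right] x_i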
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss function, vectorized implementation. When you implement
    the regularization over W, please DO NOT multiply the regularization term by
    1/2 (no coefficient). The inputs and outputs are the same as svm_loss_naive.

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as torch scalar
    - gradient of loss with respect to weights W; a tensor of same shape as W
    """
    loss = 0.0
    dW = torch.zeros_like(W)  # initialize the gradient as zero

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the structured SVM loss, storing the    #
    # result in loss.                                                           #
    #############################################################################
    # Replace "pass" statement with your code
    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.mm(W)
    correct_class_score = scores[range(num_train), y].view(-1, 1)
    margin = scores - correct_class_score + 1
    loss = margin.clamp(min=0).sum() - num_train
    loss /= num_train
    loss += reg * torch.sum(W * W)
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################

    #############################################################################
    # TODO:                                                                     #
    # Implement a vectorized version of the gradient for the structured SVM     #
    # loss, storing the result in dW.                                           #
    #                                                                           #
    # Hint: Instead of computing the gradient from scratch, it may be easier    #
    # to reuse some of the intermediate values that you used to compute the     #
    # loss.                                                                     #
    #############################################################################
    # Replace "pass" statement with your code
    margin_mask = (margin > 0).type(X.dtype)
    margin_mask[range(num_train), y] -= margin_mask.sum(axis=1)
    dW = X.T.mm(margin_mask)
    dW /= num_train
    dW += 2 * reg * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    return loss, dW
Next comes the vectorized version of the SVM loss. Vectorization relies on features like broadcasting,
and this is always the part I find hardest; it also ate up a lot of my time.
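To make the mask trick above a bit more concrete, here is a minimal standalone sketch (made-up numbers, not part of the assignment code) showing how the (N, 1) column of correct scores broadcasts against the (N, C) score matrix, and how the correct-class column ends up holding minus the number of violating classes:

# Tiny example: 2 examples, 3 classes.
import torch

scores = torch.tensor([[3.0, 1.0, 2.5],
                       [0.5, 2.0, 1.0]])       # (N, C), i.e. X.mm(W)
y = torch.tensor([0, 1])                       # correct classes
N = scores.shape[0]

correct = scores[range(N), y].view(-1, 1)      # (N, 1), broadcasts across the C columns
margin = scores - correct + 1                  # (N, C); the correct-class entry is exactly 1

mask = (margin > 0).float()                    # 1 wherever a class contributes to the loss
mask[range(N), y] -= mask.sum(dim=1)           # correct column becomes -(number of violators)
print(margin)
print(mask)                                    # this is what gets multiplied by X.T for dW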
def sample_batch(X, y, num_train, batch_size):
    """
    Sample batch_size elements from the training data and their
    corresponding labels to use in this round of gradient descent.
    """
    X_batch = None
    y_batch = None
    #########################################################################
    # TODO: Store the data in X_batch and their corresponding labels in     #
    # y_batch; after sampling, X_batch should have shape (batch_size, dim)  #
    # and y_batch should have shape (batch_size,)                           #
    #                                                                       #
    # Hint: Use torch.randint to generate indices.                          #
    #########################################################################
    # Replace "pass" statement with your code
    batch_indices = torch.randint(0, num_train, (batch_size,))
    X_batch = X[batch_indices]
    y_batch = y[batch_indices]
    #########################################################################
    #                       END OF YOUR CODE                                #
    #########################################################################
    return X_batch, y_batch
Next is the sample_batch function; this one can be implemented simply with fancy indexing.
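If fancy indexing is new to you, this is a tiny self-contained sketch (toy data, illustrative only) of what indexing with an index tensor does:

import torch

X = torch.arange(12.).view(6, 2)           # pretend dataset: 6 examples, 2 features
y = torch.tensor([0, 1, 0, 1, 0, 1])

idx = torch.randint(0, 6, (3,))            # 3 random indices in [0, 6)
X_batch, y_batch = X[idx], y[idx]          # indexing with an index tensor selects those rows
print(X_batch.shape, y_batch.shape)        # torch.Size([3, 2]) torch.Size([3])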
def train_linear_classifier(loss_func, W, X, y, learning_rate=1e-3,
                            reg=1e-5, num_iters=100, batch_size=200,
                            verbose=False):
    """
    Train this linear classifier using stochastic gradient descent.

    Inputs:
    - loss_func: loss function to use when training. It should take W, X, y
      and reg as input, and output a tuple of (loss, dW)
    - W: A PyTorch tensor of shape (D, C) giving the initial weights of the
      classifier. If W is None then it will be initialized here.
    - X: A PyTorch tensor of shape (N, D) containing training data; there are N
      training samples each of dimension D.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c
      means that X[i] has label 0 <= c < C for C classes.
    - learning_rate: (float) learning rate for optimization.
    - reg: (float) regularization strength.
    - num_iters: (integer) number of steps to take when optimizing
    - batch_size: (integer) number of training examples to use at each step.
    - verbose: (boolean) If true, print progress during optimization.

    Returns: A tuple of:
    - W: The final value of the weight matrix at the end of optimization
    - loss_history: A list of Python scalars giving the values of the loss at each
      training iteration.
    """
    # assume y takes values 0...K-1 where K is number of classes
    num_train, dim = X.shape
    if W is None:
        # lazily initialize W
        num_classes = torch.max(y) + 1
        W = 0.000001 * torch.randn(dim, num_classes, device=X.device, dtype=X.dtype)
    else:
        num_classes = W.shape[1]

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        # TODO: implement sample_batch function
        X_batch, y_batch = sample_batch(X, y, num_train, batch_size)

        # evaluate loss and gradient
        loss, grad = loss_func(W, X_batch, y_batch, reg)
        loss_history.append(loss.item())

        # perform parameter update
        #########################################################################
        # TODO:                                                                 #
        # Update the weights using the gradient and the learning rate.          #
        #########################################################################
        # Replace "pass" statement with your code
        W -= learning_rate * grad
        #########################################################################
        #                       END OF YOUR CODE                                #
        #########################################################################

        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return W, loss_history
With the loss functions and the batch sampling function done, we now implement training, which just means
performing stochastic gradient descent. You should be able to do this without much trouble.
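The update inside the loop is plain SGD on the sampled minibatch; written as an equation (alpha is the learning_rate argument and the gradient is the grad returned by loss_func on that batch):

W \leftarrow W - \alpha \, \frac{\partial L}{\partial W}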
def predict_linear_classifier(W, X):
    """
    Use the trained weights of this linear classifier to predict labels for
    data points.

    Inputs:
    - W: A PyTorch tensor of shape (D, C), containing weights of a model
    - X: A PyTorch tensor of shape (N, D) containing training data; there are N
      training samples each of dimension D.

    Returns:
    - y_pred: PyTorch int64 tensor of shape (N,) giving predicted labels for each
      element of X. Each element of y_pred should be between 0 and C - 1.
    """
    y_pred = torch.zeros(X.shape[0], dtype=torch.int64)
    ###########################################################################
    # TODO:                                                                   #
    # Implement this method. Store the predicted labels in y_pred.            #
    ###########################################################################
    # Replace "pass" statement with your code
    y_pred = X.matmul(W).argmax(axis=1)
    ###########################################################################
    #                           END OF YOUR CODE                              #
    ###########################################################################
    return y_pred
Then we implement predict, which classifies data with the trained model; this is also easy using the matmul
method.
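As a tiny illustration (made-up numbers, not part of the assignment), prediction is just scores = XW followed by an argmax over the class dimension:

import torch

W = torch.tensor([[ 0.2, -0.1,  0.4],
                  [-0.3,  0.5,  0.1]])     # (D, C) = (2, 3)
X = torch.tensor([[1.0, 2.0],
                  [3.0, 0.5]])             # (N, D) = (2, 2)

scores = X.matmul(W)                        # (N, C): one score per class per example
y_pred = scores.argmax(dim=1)               # pick the highest-scoring class
print(scores)
print(y_pred)                               # int64 tensor of shape (N,)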
def svm_get_search_params():
    """
    Return candidate hyperparameters for the SVM model. You should provide
    at least two candidates for each, and the total number of grid search
    combinations should be less than 25.

    Returns:
    - learning_rates: learning rate candidates, e.g. [1e-3, 1e-2, ...]
    - regularization_strengths: regularization strengths candidates
      e.g. [1e0, 1e1, ...]
    """
    learning_rates = []
    regularization_strengths = []
    ###########################################################################
    # TODO: add your own hyper parameter lists.                               #
    ###########################################################################
    # Replace "pass" statement with your code
    learning_rates = [42e-5, 44e-5]
    regularization_strengths = [7e-3, 9e-3]
    ###########################################################################
    #                           END OF YOUR CODE                              #
    ###########################################################################
    return learning_rates, regularization_strengths
Now we get to selecting the best hyperparameters. The function above returns the candidate
hyperparameters you want to try. The total number of learning_rate and regularization_strength
combinations is supposed to be at least 5 and less than 25, but when producing the final results even 5
combinations felt like too much, so I edited the Jupyter notebook code myself and ran only 4.
def test_one_param_set(cls, data_dict, lr, reg, num_iters=2000):
    """
    Train a single LinearClassifier instance and return the learned instance
    with train/val accuracy.

    Inputs:
    - cls (LinearClassifier): a newly-created LinearClassifier instance.
      Train/Validation should perform over this instance
    - data_dict (dict): a dictionary that includes
      ['X_train', 'y_train', 'X_val', 'y_val']
      as the keys for training a classifier
    - lr (float): learning rate parameter for training a SVM instance.
    - reg (float): a regularization weight for training a SVM instance.
    - num_iters (int, optional): a number of iterations to train

    Returns:
    - cls (LinearClassifier): a trained LinearClassifier instance with
      (['X_train', 'y_train'], lr, reg) for num_iters times.
    - train_acc (float): training accuracy of the svm_model
    - val_acc (float): validation accuracy of the svm_model
    """
    train_acc = 0.0  # The accuracy is simply the fraction of data points
    val_acc = 0.0    # that are correctly classified.
    ###########################################################################
    # TODO:                                                                   #
    # Write code that trains a linear SVM on the training set and computes    #
    # its accuracy on the training and validation sets.                       #
    #                                                                         #
    # Hint: Once you are confident that your validation code works, you       #
    # should rerun the validation code with the final value for num_iters.    #
    # Before that, please test with small num_iters first                     #
    ###########################################################################
    # Feel free to uncomment this, at the very beginning,
    # and don't forget to remove this line before submitting your final version
    # num_iters = 100

    # Replace "pass" statement with your code
    for i in range(num_iters):
        cls.train(data_dict['X_train'], data_dict['y_train'], lr, reg)
    train_acc = (data_dict['y_train'] == cls.predict(data_dict['X_train'])).float().mean().item()
    val_acc = (data_dict['y_val'] == cls.predict(data_dict['X_val'])).float().mean().item()
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
    return cls, train_acc, val_acc
Once you have picked the hyperparameter candidates above, you write the part that evaluates each combination:
train the model for num_iters and measure the training and validation accuracy.
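For context, the notebook roughly loops over every (lr, reg) pair and keeps the combination with the best validation accuracy. The following is only a sketch of that flow under my assumptions; the LinearSVM class and data_dict come from the assignment notebook and are not defined here:

# Rough grid-search sketch (assumed names: LinearSVM and data_dict are
# provided by the assignment notebook, not defined in this post).
results = {}
learning_rates, regularization_strengths = svm_get_search_params()
for lr in learning_rates:
    for reg in regularization_strengths:
        cls = LinearSVM()
        cls, train_acc, val_acc = test_one_param_set(cls, data_dict, lr, reg)
        results[(lr, reg)] = (train_acc, val_acc)

# pick the pair with the highest validation accuracy
best_lr, best_reg = max(results, key=lambda k: results[k][1])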
def softmax_loss_naive(W, X, y, reg):
    """
    Softmax loss function, naive implementation (with loops). When you implement
    the regularization over W, please DO NOT multiply the regularization term by
    1/2 (no coefficient).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A PyTorch tensor of shape (D, C) containing weights.
    - X: A PyTorch tensor of shape (N, D) containing a minibatch of data.
    - y: A PyTorch tensor of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; a tensor of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = torch.zeros_like(W)
    #############################################################################
    # TODO: Compute the softmax loss and its gradient using explicit loops.     #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability (Check Numeric Stability #
    # in http://cs231n.github.io/linear-classify/). Plus, don't forget the      #
    # regularization!                                                           #
    #############################################################################
    # Replace "pass" statement with your code
    num_classes = W.shape[1]
    num_train = X.shape[0]
    for i in range(num_train):
        scores = W.t().mv(X[i])
        scores -= scores.max()
        scores = torch.exp(scores)
        scores_prob = scores / scores.sum()
        loss -= torch.log(scores_prob[y[i]])
        for j in range(num_classes):
            dW[:, j] += (scores_prob[j] - (1 if j == y[i] else 0)) * X[i]
    loss /= num_train
    loss += reg * torch.sum(W * W)
    dW /= num_train
    dW += 2 * reg * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    return loss, dW
From here we implement the Softmax loss. The implementation follows the lecture, but notice the
'scores -= scores.max()' line in the middle of the code: if you leave it out, the loss turns into NaN later on, so make sure to include it.
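Here is a quick standalone demo of why that line matters: shifting all scores by a constant does not change the softmax probabilities, but it keeps exp from overflowing.

import torch

s = torch.tensor([1000.0, 999.0, 998.0])

naive = torch.exp(s) / torch.exp(s).sum()              # exp(1000) overflows to inf, so inf/inf = nan
stable = torch.exp(s - s.max()) / torch.exp(s - s.max()).sum()

print(naive)    # tensor([nan, nan, nan])
print(stable)   # tensor([0.6652, 0.2447, 0.0900])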
def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized version. When you implement the
    regularization over W, please DO NOT multiply the regularization term by 1/2
    (no coefficient).

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = torch.zeros_like(W)
    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability (Check Numeric Stability #
    # in http://cs231n.github.io/linear-classify/). Don't forget the            #
    # regularization!                                                           #
    #############################################################################
    # Replace "pass" statement with your code
    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.matmul(W)
    scores -= scores.max(axis=1).values.view(-1, 1)
    scores = torch.exp(scores)
    scores_prob = scores / scores.sum(axis=1).view(-1, 1)
    offset = scores_prob[range(num_train), y]
    loss -= torch.log(offset).sum()
    scores_prob[range(num_train), y] -= 1
    dW = X.T.matmul(scores_prob)
    loss /= num_train
    loss += reg * torch.sum(W * W)
    dW /= num_train
    dW += 2 * reg * W
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    return loss, dW
Again we take the naive implementation above and vectorize it.
Broadcasting never quite forms a picture in my head no matter how many times I do it, so this really ate up a lot of time.
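If it helps, here is a tiny shape walk-through (random toy tensors, illustrative only) of the broadcasting and indexing used above:

import torch

N, D, C = 2, 4, 3
X = torch.randn(N, D)
W = torch.randn(D, C)
y = torch.tensor([2, 0])

scores = X.matmul(W)                                      # (N, C)
scores = scores - scores.max(dim=1).values.view(-1, 1)    # (N, C) - (N, 1) broadcasts per row
prob = torch.exp(scores)
prob = prob / prob.sum(dim=1).view(-1, 1)                 # each row now sums to 1

prob[range(N), y] -= 1                                    # dscores: subtract 1 at the correct class
dW = X.T.matmul(prob)                                     # (D, N) @ (N, C) -> (D, C), same shape as W
print(prob.shape, dW.shape)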
def softmax_get_search_params():
    """
    Return candidate hyperparameters for the Softmax model. You should provide
    at least two candidates for each, and the total number of grid search
    combinations should be less than 25.

    Returns:
    - learning_rates: learning rate candidates, e.g. [1e-3, 1e-2, ...]
    - regularization_strengths: regularization strengths candidates
      e.g. [1e0, 1e1, ...]
    """
    learning_rates = []
    regularization_strengths = []
    ###########################################################################
    # TODO: Add your own hyper parameter lists. This should be similar to the #
    # hyperparameters that you used for the SVM, but you may need to select   #
    # different hyperparameters to achieve good performance with the softmax  #
    # classifier.                                                             #
    ###########################################################################
    # Replace "pass" statement with your code
    learning_rates = [1e-3, 7e-4]
    regularization_strengths = [7e-3, 5e-3]
    ###########################################################################
    #                           END OF YOUR CODE                              #
    ###########################################################################
    return learning_rates, regularization_strengths
Finally, implement the function that returns the hyperparameter candidates for the Softmax classifier, and that's it.
Working through it, this assignment really felt huge; it seemed about twice as long as the previous one, haha.
I originally planned to cover everything in one post, but I'll continue with the Neural Network part in part 2.