
Deep Learning for Computer Vision

EECS 498-007 / 598-005 Assignment #2-2

The Two Layer Neural Network part starts with implementing the forward pass.

 

def nn_forward_pass(params, X):
    """
    The first stage of our neural network implementation: Run the forward pass
    of the network to compute the hidden layer features and classification
    scores. The network architecture should be:

    FC layer -> ReLU (hidden) -> FC layer (scores)

    As practice, you are NOT allowed to use torch.relu or torch.nn ops
    for this exercise (you can use them from A3 onward).

    Inputs:
    - params: a dictionary of PyTorch Tensors that stores the weights of a model.
      It should have the following keys, with shapes:
          W1: First layer weights; has shape (D, H)
          b1: First layer biases; has shape (H,)
          W2: Second layer weights; has shape (H, C)
          b2: Second layer biases; has shape (C,)
    - X: Input data of shape (N, D). Each X[i] is a training sample.

    Returns a tuple of:
    - scores: Tensor of shape (N, C) giving the classification scores for X
    - hidden: Tensor of shape (N, H) giving the hidden layer representation
      for each input value (after the ReLU).
    """
    # Unpack variables from the params dictionary
    W1, b1 = params['W1'], params['b1']
    W2, b2 = params['W2'], params['b2']
    N, D = X.shape

    # Compute the forward pass
    hidden = None
    scores = None
    ############################################################################
    # TODO: Perform the forward pass, computing the class scores for the input.#
    # Store the result in the scores variable, which should be a tensor of     #
    # shape (N, C).                                                            #
    ############################################################################
    # Replace "pass" statement with your code
    hidden = X.matmul(W1) + b1
    hidden = hidden.clamp(min=0)
    scores = hidden.matmul(W2) + b2
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return scores, hidden

 

The forward pass is simple: just two matrix multiplications. For the ReLU, I used the clamp method here so that every value below 0 becomes 0.
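As a quick sanity check (my own snippet, not part of the assignment code), clamp(min=0) gives the same result as torch.relu; the assignment only forbids using torch.relu inside the solution itself.

import torch

x = torch.randn(4, 5)                 # small random batch, just for the check
relu_clamp = x.clamp(min=0)           # what the forward pass above uses
relu_builtin = torch.relu(x)          # forbidden inside the solution, fine out here
print(torch.allclose(relu_clamp, relu_builtin))  # True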

def nn_forward_backward(params, X, y=None, reg=0.0):
    """
    Compute the loss and gradients for a two layer fully connected neural
    network. When you implement loss and gradient, please don't forget to
    scale the losses/gradients by the batch size.

    Inputs: First two parameters (params, X) are same as nn_forward_pass
    - params: a dictionary of PyTorch Tensors that stores the weights of a model.
      It should have the following keys, with shapes:
          W1: First layer weights; has shape (D, H)
          b1: First layer biases; has shape (H,)
          W2: Second layer weights; has shape (H, C)
          b2: Second layer biases; has shape (C,)
    - X: Input data of shape (N, D). Each X[i] is a training sample.
    - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
      an integer in the range 0 <= y[i] < C. This parameter is optional; if it
      is not passed then we only return scores, and if it is passed then we
      instead return the loss and gradients.
    - reg: Regularization strength.

    Returns:
    If y is None, return a tensor scores of shape (N, C) where scores[i, c] is
    the score for class c on input X[i].

    If y is not None, instead return a tuple of:
    - loss: Loss (data loss and regularization loss) for this batch of training
      samples.
    - grads: Dictionary mapping parameter names to gradients of those parameters
      with respect to the loss function; has the same keys as self.params.
    """
    # Unpack variables from the params dictionary
    W1, b1 = params['W1'], params['b1']
    W2, b2 = params['W2'], params['b2']
    N, D = X.shape

    scores, h1 = nn_forward_pass(params, X)
    # If the targets are not given then jump out, we're done
    if y is None:
      return scores

    # Compute the loss
    loss = None
    ############################################################################
    # TODO: Compute the loss, based on the results from nn_forward_pass.       #
    # This should include both the data loss and L2 regularization for W1 and  #
    # W2. Store the result in the variable loss, which should be a scalar. Use #
    # the Softmax classifier loss. When you implement the regularization over W,#
    # please DO NOT multiply the regularization term by 1/2 (no coefficient).  #
    # If you are not careful here, it is easy to run into numeric instability  #
    # (Check Numeric Stability in http://cs231n.github.io/linear-classify/).   #
    ############################################################################
    # Replace "pass" statement with your code
    num_train = X.shape[0]
    
    scores -= scores.max(axis=1).values.view(-1,1)
    scores = scores.exp()
    probs = scores/scores.sum(axis=1).view(-1,1)
    valid_probs = probs[range(num_train), y]
    loss = (-valid_probs.log()).sum()
    loss /= num_train
    loss += reg * (torch.sum(W1 * W1) + torch.sum(W2 * W2))
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    # Backward pass: compute gradients
    grads = {}
    ###########################################################################
    # TODO: Compute the backward pass, computing the derivatives of the       #
    # weights and biases. Store the results in the grads dictionary.          #
    # For example, grads['W1'] should store the gradient on W1, and be a      #
    # tensor of same size                                                     #
    ###########################################################################
    # Replace "pass" statement with your code
    probs[range(num_train), y] -=1
    dscores = probs / num_train
    
    grads['b2'] = dscores.sum(axis=0)
    
    grads['W2'] = h1.T.matmul(dscores)
    grads['W2'] += reg * 2 * W2
    
    dh1 = dscores.matmul(W2.T)
    dh1 = dh1 * (h1 > 0)
    
    grads['b1'] = dh1.sum(axis=0)
    
    grads['W1'] = X.T.matmul(dh1)
    grads['W1'] += reg * 2 * W1
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return loss, grads

Having done the forward pass, now we implement the backward pass. The loss can be computed the same way as in the earlier softmax exercise, but implementing the gradients was much harder. The lecture said that if this part seems too difficult you should watch the backpropagation lecture first, but I tried it without watching, so the equations did not come to mind easily; I probably would have finished faster if I had at least followed along with the lecture slides.
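For reference, the gradients in the code above follow from the chain rule on the softmax loss. A sketch of the derivation (my own summary, written in the same notation as the code: λ = reg, N = num_train, S = scores, H = h1, P = probs):

\[
L = -\frac{1}{N}\sum_{i=1}^{N}\log P_{i,y_i} + \lambda\big(\lVert W_1\rVert^2 + \lVert W_2\rVert^2\big),
\qquad
P_{i,c} = \frac{e^{S_{i,c}}}{\sum_j e^{S_{i,j}}}
\]
\[
\frac{\partial L}{\partial S_{i,c}} = \frac{P_{i,c} - \mathbf{1}[c = y_i]}{N}
\quad\text{(\texttt{dscores} in the code)}
\]
\[
\frac{\partial L}{\partial W_2} = H^{\top}\frac{\partial L}{\partial S} + 2\lambda W_2,
\qquad
\frac{\partial L}{\partial b_2} = \sum_i \frac{\partial L}{\partial S_i}
\]
\[
\frac{\partial L}{\partial H} = \frac{\partial L}{\partial S}\,W_2^{\top} \odot \mathbf{1}[H > 0],
\qquad
\frac{\partial L}{\partial W_1} = X^{\top}\frac{\partial L}{\partial H} + 2\lambda W_1,
\qquad
\frac{\partial L}{\partial b_1} = \sum_i \frac{\partial L}{\partial H_i}
\]

The elementwise mask with 1[H > 0] is the ReLU backward step: gradients only flow where the hidden activation was positive, which is exactly what the line dh1 = dh1 * (h1 > 0) implements.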

 

def nn_train(params, loss_func, pred_func, X, y, X_val, y_val,
            learning_rate=1e-3, learning_rate_decay=0.95,
            reg=5e-6, num_iters=100,
            batch_size=200, verbose=False):
  """
  Train this neural network using stochastic gradient descent.

  Inputs:
  - params: a dictionary of PyTorch Tensors that stores the weights of a model.
    It should have the following keys, with shapes:
        W1: First layer weights; has shape (D, H)
        b1: First layer biases; has shape (H,)
        W2: Second layer weights; has shape (H, C)
        b2: Second layer biases; has shape (C,)
  - loss_func: a loss function that computes the loss and the gradients.
    It takes as input:
    - params: Same as input to nn_train
    - X_batch: A minibatch of inputs of shape (B, D)
    - y_batch: Ground-truth labels for X_batch
    - reg: Same as input to nn_train
    And it returns a tuple of:
      - loss: Scalar giving the loss on the minibatch
      - grads: Dictionary mapping parameter names to gradients of the loss with
        respect to the corresponding parameter.
  - pred_func: prediction function that predicts labels from params and input data.
  - X: A PyTorch tensor of shape (N, D) giving training data.
  - y: A PyTorch tensor of shape (N,) giving training labels; y[i] = c means that
    X[i] has label c, where 0 <= c < C.
  - X_val: A PyTorch tensor of shape (N_val, D) giving validation data.
  - y_val: A PyTorch tensor of shape (N_val,) giving validation labels.
  - learning_rate: Scalar giving learning rate for optimization.
  - learning_rate_decay: Scalar giving factor used to decay the learning rate
    after each epoch.
  - reg: Scalar giving regularization strength.
  - num_iters: Number of steps to take when optimizing.
  - batch_size: Number of training examples to use per step.
  - verbose: boolean; if true print progress during optimization.

  Returns: A dictionary giving statistics about the training process
  """
  num_train = X.shape[0]
  iterations_per_epoch = max(num_train // batch_size, 1)

  # Use SGD to optimize the parameters in self.model
  loss_history = []
  train_acc_history = []
  val_acc_history = []

  for it in range(num_iters):
    X_batch, y_batch = sample_batch(X, y, num_train, batch_size)

    # Compute loss and gradients using the current minibatch
    loss, grads = loss_func(params, X_batch, y=y_batch, reg=reg)
    loss_history.append(loss.item())

    #########################################################################
    # TODO: Use the gradients in the grads dictionary to update the         #
    # parameters of the network (stored in the dictionary self.params)      #
    # using stochastic gradient descent. You'll need to use the gradients   #
    # stored in the grads dictionary defined above.                         #
    #########################################################################
    # Replace "pass" statement with your code
    for param in params.keys():
        params[param] -= learning_rate * grads[param]
    #########################################################################
    #                             END OF YOUR CODE                          #
    #########################################################################

    if verbose and it % 100 == 0:
      print('iteration %d / %d: loss %f' % (it, num_iters, loss.item()))

    # Every epoch, check train and val accuracy and decay learning rate.
    if it % iterations_per_epoch == 0:
      # Check accuracy
      y_train_pred = pred_func(params, loss_func, X_batch)
      train_acc = (y_train_pred == y_batch).float().mean().item()
      y_val_pred = pred_func(params, loss_func, X_val)
      val_acc = (y_val_pred == y_val).float().mean().item()
      train_acc_history.append(train_acc)
      val_acc_history.append(val_acc)

      # Decay learning rate
      learning_rate *= learning_rate_decay

  return {
    'loss_history': loss_history,
    'train_acc_history': train_acc_history,
    'val_acc_history': val_acc_history,
  }

This is the part that trains the network: for each parameter in the dictionary, take one gradient-descent step per iteration.
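The update itself is plain SGD on every entry of the params dictionary. A minimal self-contained sketch of the same pattern on dummy tensors (the names and sizes here are made up purely for illustration):

import torch

# hypothetical toy parameters and gradients, only to illustrate the update rule
params = {'W1': torch.randn(3, 4), 'b1': torch.zeros(4)}
grads  = {'W1': torch.randn(3, 4), 'b1': torch.randn(4)}
learning_rate = 1e-3

for name in params.keys():
    params[name] -= learning_rate * grads[name]   # one SGD step per parameter

learning_rate *= 0.95   # nn_train decays the rate like this once per epoch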

 

def nn_predict(params, loss_func, X):
  """
  Use the trained weights of this two-layer network to predict labels for
  data points. For each data point we predict scores for each of the C
  classes, and assign each data point to the class with the highest score.

  Inputs:
  - params: a dictionary of PyTorch Tensors that stores the weights of a model.
    It should have the following keys, with shapes:
        W1: First layer weights; has shape (D, H)
        b1: First layer biases; has shape (H,)
        W2: Second layer weights; has shape (H, C)
        b2: Second layer biases; has shape (C,)
  - loss_func: a loss function that computes the loss and the gradients
  - X: A PyTorch tensor of shape (N, D) giving N D-dimensional data points to
    classify.

  Returns:
  - y_pred: A PyTorch tensor of shape (N,) giving predicted labels for each of
    the elements of X. For all i, y_pred[i] = c means that X[i] is predicted
    to have class c, where 0 <= c < C.
  """
  y_pred = None

  ###########################################################################
  # TODO: Implement this function; it should be VERY simple!                #
  ###########################################################################
  # Replace "pass" statement with your code
  scores = loss_func(params, X)
  y_pred = scores.max(axis=1).indices
  ###########################################################################
  #                              END OF YOUR CODE                           #
  ###########################################################################

  return y_pred

Predicting with the trained model is also easy using the previously implemented nn_forward_backward: called without labels it simply returns the scores, and each sample is assigned the class with the highest score.
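A small check of that last line (my own snippet, not assignment code): taking .max(...).indices over the class dimension is the same as argmax, which is all nn_predict needs.

import torch

scores = torch.randn(5, 10)                       # pretend N=5 samples, C=10 classes
y_pred = scores.max(dim=1).indices                # what nn_predict does
print(torch.equal(y_pred, scores.argmax(dim=1)))  # True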

 

def nn_get_search_params():
  """
  Return candidate hyperparameters for a TwoLayerNet model.
  You should provide at least two candidates for each, and the total number of
  grid search combinations should be less than 256. Otherwise it will take
  too much time to train on all of the hyperparameter combinations.

  Returns:
  - learning_rates: learning rate candidates, e.g. [1e-3, 1e-2, ...]
  - hidden_sizes: hidden value sizes, e.g. [8, 16, ...]
  - regularization_strengths: regularization strengths candidates
                              e.g. [1e0, 1e1, ...]
  - learning_rate_decays: learning rate decay candidates
                              e.g. [1.0, 0.95, ...]
  """
  learning_rates = []
  hidden_sizes = []
  regularization_strengths = []
  learning_rate_decays = []
  ###########################################################################
  # TODO: Add your own hyper parameter lists. This should be similar to the #
  # hyperparameters that you used for the SVM, but you may need to select   #
  # different hyperparameters to achieve good performance with the softmax  #
  # classifier.                                                             #
  ###########################################################################
  # Replace "pass" statement with your code
  learning_rates = [1, 1.5]
  hidden_sizes = [ 2048, 4096, 5120]
  regularization_strengths = [3e-6, 1e-6, 7e-7]
  learning_rate_decays = [95e-2, 97e-2]
  ###########################################################################
  #                           END OF YOUR CODE                              #
  ###########################################################################

  return learning_rates, hidden_sizes, regularization_strengths, learning_rate_decays

This function returns the hyperparameter candidates to test. Here, too, the number of combinations was supposed to be at least 5 and less than 25, but since testing with the neural net was much faster than with the linear classifier, I just tweaked the Jupyter notebook code and tested more combinations than that.
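Before launching the search, it is easy to check how large the grid actually is. A quick snippet (my own, using the candidate lists above):

from itertools import product

learning_rates = [1, 1.5]
hidden_sizes = [2048, 4096, 5120]
regularization_strengths = [3e-6, 1e-6, 7e-7]
learning_rate_decays = [0.95, 0.97]

# every combination the four nested loops in find_best_net will visit
combos = list(product(learning_rates, hidden_sizes,
                      regularization_strengths, learning_rate_decays))
print(len(combos))  # 2 * 3 * 3 * 2 = 36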

 

def find_best_net(data_dict, get_param_set_fn):
  """
  Tune hyperparameters using the validation set.
  Store your best trained TwoLayerNet model in best_net, the return value of the
  ".train()" operation in best_stat, and the validation accuracy of the best
  trained model in best_val_acc. Your hyperparameters should be received from
  nn_get_search_params.

  Inputs:
  - data_dict (dict): a dictionary that includes
                      ['X_train', 'y_train', 'X_val', 'y_val']
                      as the keys for training a classifier
  - get_param_set_fn (function): A function that provides the hyperparameters
                                 (e.g., nn_get_search_params)
                                 that gives (learning_rates, hidden_sizes,
                                 regularization_strengths, learning_rate_decays)
                                 You should get hyperparameters from
                                 get_param_set_fn.

  Returns:
  - best_net (instance): a TwoLayerNet instance trained with
                         (['X_train', 'y_train'], batch_size, learning_rate,
                         learning_rate_decay, reg)
                         for num_iters iterations.
  - best_stat (dict): return value of "best_net.train()" operation
  - best_val_acc (float): validation accuracy of the best_net
  """

  best_net = None
  best_stat = None
  best_val_acc = 0.0

  #############################################################################
  # TODO: Tune hyperparameters using the validation set. Store your best      #
  # trained model in best_net.                                                #
  #                                                                           #
  # To help debug your network, it may help to use visualizations similar to  #
  # the ones we used above; these visualizations will have significant        #
  # qualitative differences from the ones we saw above for the poorly tuned   #
  # network.                                                                  #
  #                                                                           #
  # Tweaking hyperparameters by hand can be fun, but you might find it useful #
  # to write code to sweep through possible combinations of hyperparameters   #
  # automatically like we did on the previous exercises.                      #
  #############################################################################
  # Replace "pass" statement with your code
  learning_rates, hidden_sizes, regularization_strengths, learning_rate_decays = get_param_set_fn()

  for lr in learning_rates:
      for hs in hidden_sizes:
          for rs in regularization_strengths:
              for lrd in learning_rate_decays:
                  model = TwoLayerNet(3 * 32 * 32, hs, 10,
                                      device=data_dict['X_train'].device,
                                      dtype=data_dict['X_train'].dtype)

                  stats = model.train(data_dict['X_train'], data_dict['y_train'],
                                      data_dict['X_val'], data_dict['y_val'],
                                      num_iters=3000, batch_size=1000,
                                      learning_rate=lr, learning_rate_decay=lrd,
                                      reg=rs, verbose=False)
                  
                  if stats['val_acc_history'][-1] > best_val_acc:
                      print(lr, hs, rs, lrd)
                      best_net = model
                      best_stat = stats
                      best_val_acc = stats['val_acc_history'][-1]
                  
  #############################################################################
  #                               END OF YOUR CODE                            #
  #############################################################################

  return best_net, best_stat, best_val_acc

Finally, the part that picks the best hyperparameters from the candidates. The implementation itself is easy, but I suddenly hit an error here for no apparent reason and nearly lost my mind. Fortunately it worked again after waiting a few minutes; maybe it was a Colab issue?

 

 

 

This is the last part of Assignment 2. The content was solid, but it really took a long time. I hope the next assignment is shorter than this one.
