Before we start...

This project is based on the work in "Let’s Build a Fashion-MNIST CNN, PyTorch Style".
I chose it for two reasons: it has a good design for training a model with different hyper-parameter combinations in one pass, and it uses TensorBoard for data visualization.
After setting up the environment and reproducing the experiment, I made several improvements:
  1. Redesigned the CNN module, improving the training accuracy from 0.87 to 0.99.
  2. Added a test stage for the Fashion-MNIST CNN.
  3. Redesigned the overall code structure to make it more readable.
  4. Added Python library verification.
  5. Added more code comments.

Introduction

This project is based on PyTorch, which was introduced by Facebook in 2017. The dataset is Fashion-MNIST, a dataset of Zalando's article images. It has a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image with a label from one of 10 classes. It is intended as a drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms.
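Fashion-MNIST ships with torchvision, so loading it takes only a few lines. Below is a minimal sketch, assuming torchvision is installed and a local ./data directory is acceptable as the download location:
import torchvision
import torchvision.transforms as transforms

# Download the 60,000-image training split and the 10,000-image test split
# as 28x28 grayscale tensors (values scaled to [0, 1] by ToTensor).
train_set = torchvision.datasets.FashionMNIST(
    root='./data', train=True, download=True,
    transform=transforms.ToTensor()
)
test_set = torchvision.datasets.FashionMNIST(
    root='./data', train=False, download=True,
    transform=transforms.ToTensor()
)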
 
In this project, I designed a new CNN model, trained it over a hyper-parameter grid (3 parameters, 8 combinations), found the best hyper-parameter set (lr=0.001, batch_size=100, shuffle=True, epochs=35, train_accuracy=0.9902), then did a test run based on this set and got a test accuracy of 0.9101. To get this result, I also optimized the CNN model, changed the output activation function from linear to log_softmax, and tried different numbers of neurons and kernel sizes; details can be found below.

Code design

There are four main parts of the code: CNN design, training, testing, and data analysis/visualization.

CNN construction

The CNN extends PyTorch's nn.Module. The layer structure is defined in __init__ and the data flow in the forward function, which keeps the whole process quite flexible.
class CNN(nn.Module):
 

Training

There is a parameter input area where you can list all the parameter values you want to try.
params = OrderedDict(
    lr = [.01, .001],
    batch_size = [100, 1000],
    shuffle = [True, False]
)
epochs = 35
There are two layers in the running part.
First, the Run layer decides which hyper-parameter combinations will be used and runs them one by one.
Second, the Epoch layer runs all the batches in one epoch with a specific hyper-parameter combination; the sketch after the function stubs below shows how the two layers fit together.
def begin_run(self, run, network, loader):
def end_run(self):
 
def begin_epoch(self):
def end_epoch(self):
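Put together, the two layers look roughly like the sketch below. RunManager stands for the class these four stubs belong to (the name follows the referenced article), and RunBuilder, CNN, train_set, params, and epochs come from the rest of the code; the exact wiring is an illustration, not the project's verbatim training loop.
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader

m = RunManager()
for run in RunBuilder.get_runs(params):                  # Run layer: one pass per combination
    network = CNN()
    loader = DataLoader(train_set, batch_size=run.batch_size, shuffle=run.shuffle)
    optimizer = optim.Adam(network.parameters(), lr=run.lr)

    m.begin_run(run, network, loader)                    # open a TensorBoard writer for this run
    for epoch in range(epochs):                          # Epoch layer: all batches, one combination
        m.begin_epoch()
        for images, labels in loader:
            preds = network(images)                      # forward pass
            loss = F.cross_entropy(preds, labels)        # loss (see "CNN Model design" below)
            optimizer.zero_grad()
            loss.backward()                              # backward pass
            optimizer.step()                             # update the weights
        m.end_epoch()                                    # log epoch loss/accuracy
    m.end_run()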

Testing

This part is pretty straightforward: load images from the test dataset, run them through the trained model, and calculate the prediction accuracy.
def run_test(model, loader):
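A minimal sketch of what run_test could look like, assuming the loader yields (images, labels) batches; the accuracy bookkeeping and the returned prediction tensor (used later for the confusion matrix) are assumptions based on the description above.
import torch

@torch.no_grad()
def run_test(model, loader):
    # Run the trained model over the test loader and report its accuracy.
    model.eval()
    num_correct = 0
    all_preds = []
    for images, labels in loader:
        preds = model(images)                                    # forward pass only, no gradients
        num_correct += preds.argmax(dim=1).eq(labels).sum().item()
        all_preds.append(preds)
    print(f'test accuracy: {num_correct / len(loader.dataset):.4f}')
    return torch.cat(all_preds)                                  # e.g. test_preds for scikit-plot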
 

Data Analysis/Visualization

This part integrates with TensorBoard for visualization (ngrok is used to map the TensorBoard port to a public URL).
It also uses pandas, matplotlib, and scikit-plot for data analysis.
# Pandas
pd.DataFrame.from_dict(
    self.run_data,
    orient='columns',
).to_csv(f'{fileName}.csv')

with open(f'{fileName}.json', 'w', encoding='utf-8') as f:
    json.dump(self.run_data, f, ensure_ascii=False, indent=4)

# TensorBoard
self.tb = SummaryWriter(comment=f'-{run}')
get_ipython().system_raw(
    'tensorboard --logdir {} --host 127.0.0.1 --port 8080 &'
    .format(LOG_DIR)
)

# scikit-plot
skplt.metrics.plot_confusion_matrix(test_set.targets, test_preds.argmax(dim=1), normalize=True)
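The ngrok mapping itself is not shown above; one common way to do it in a notebook looks roughly like this (an assumption based on the port used in the TensorBoard command, not the project's exact commands):
# Assumes the ngrok binary has already been downloaded to the working directory.
get_ipython().system_raw('./ngrok http 8080 &')                 # tunnel the local TensorBoard port
# ngrok's local API (port 4040) reports the public URL of the tunnel.
get_ipython().system_raw('curl -s http://localhost:4040/api/tunnels > ngrok_tunnels.json')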

CNN Model design

As mentioned above, the CNN model is built on PyTorch's nn.Module, so it is very easy to modify. For this project, I used a 7-layer CNN (6 hidden layers): conv1, pool1, conv2, pool2, fc1, fc2, out. I did not design a huge network here because of the long training time, since I do not have a good GPU card.
 
The input image is 28x28x1. conv1 uses 20 filters of size 5x5, and pool1 is a 2x2 max pool with stride 2; conv2 uses 50 filters with a 5x5 kernel, and pool2 is again a 2x2 max pool with stride 2. With no padding, the feature map shrinks from 28x28 to 24x24 after conv1, 12x12 after pool1, 8x8 after conv2, and 4x4 after pool2, so the flattened input to fc1 is 50x4x4 = 800. fc1 outputs 500 features, and the final output is 10 values, one per class.
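A minimal sketch of this architecture is below, assuming the layer sizes described above; the ReLU activations on the hidden layers are an assumption (only the output activation is specified), and the real code's layer names and widths may differ slightly.
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        # conv1/pool1: 28x28x1 -> 24x24x20 -> 12x12x20
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5)
        # conv2/pool2: 12x12x20 -> 8x8x50 -> 4x4x50
        self.conv2 = nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5)
        # fully connected layers: 50*4*4 = 800 -> 500 -> 10 classes
        self.fc1 = nn.Linear(50 * 4 * 4, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), kernel_size=2, stride=2)
        x = F.max_pool2d(F.relu(self.conv2(x)), kernel_size=2, stride=2)
        x = x.reshape(x.size(0), -1)           # flatten to (batch, 800)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)         # log_softmax output (see Optimization below)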
 
For the back-propagation part, I chose cross_entropy as the loss function and Adam (Adaptive Moment Estimation) as the optimizer, with the learning rate as a variable taking the values 0.01 and 0.001.

Optimization

There are different ways to optimize a model, such as:
  1. The model itself: different designs, layers, neurons, and activation functions.
  2. The backward pass: different gradient descent algorithms, loss functions, and learning rates.
  3. The input format: different ordering (shuffle), batch sizes, and transforms.
  4. The number of epochs.

Hyper-parameters

All hyper-parameters are stored in an OrderedDict, which gives 8 different combinations. You can add more parameters, or more values to the existing ones, and the RunBuilder class generates the specific runs automatically (see the sketch after the parameter descriptions below).
 
# put all hyper params into an OrderedDict, easily expandable
params = OrderedDict(
    lr = [.01, .001],
    batch_size = [100, 1000],
    shuffle = [True, False]
)
epochs = 35
 
lr: Learning Rate. We want to try 0.01 and 0.001 for our models.
batch_size: Batch Size to speed up the training process. We’ll use 100 and 1000.
shuffle: Shuffle toggle; whether we shuffle the training data before each epoch.
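For reference, the RunBuilder idea from the original article can be sketched roughly as follows; it turns the OrderedDict into a list of named runs, one per combination:
from collections import OrderedDict, namedtuple
from itertools import product

class RunBuilder:
    @staticmethod
    def get_runs(params):
        # Build one namedtuple per combination of parameter values.
        Run = namedtuple('Run', params.keys())
        return [Run(*values) for values in product(*params.values())]

# 2 learning rates x 2 batch sizes x 2 shuffle settings = 8 runs,
# each accessible as run.lr, run.batch_size, run.shuffle.
runs = RunBuilder.get_runs(params)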
 
From the above combinations, I got the training results below:

(lr=0.001, batch_size=100, shuffle=True) has the best accuracy: 0.9906.
(lr=0.01, batch_size=100, shuffle=False) has the worst accuracy: 0.8997.
However, even the worst run is still much better than the previous CNN module, whose best combination (lr=0.01, batch_size=1000, shuffle=False) only reached 0.8720.

Based on (lr=0.001, batch_size=100, shuffle=True), I got a test accuracy of 0.9101:
35 epochs, train accuracy 0.9920, test accuracy 90.23%.

Above is the normalized confusion matrix for this run; the worst case is class 6 (shirt) recognition, at only 0.68.
 

Model Structure

I tried different layers, kernel sizes, neuron counts, dropout, and activation functions.
 
The current layers, kernel sizes, and neuron counts give the best performance.
 
FC2 = 400: 15 epochs, train accuracy 0.9673, test accuracy 89.23%.
 
 
Adding a dropout layer between fc1 and fc2 made the performance worse:

35 epochs, train accuracy 0.9707, test accuracy 90.03%.

 
Changing the output activation function from linear to log_softmax improved the performance dramatically, probably because softmax is better suited to a K-class classification problem.
 

Other Optimization

 
I also tried different numbers of epochs; it turns out 35 gives the best result, while 36 starts to show some loss, which may indicate overfitting.

36 epochs: train accuracy 0.9900, test accuracy 90.44%.

20 epochs: train accuracy 0.97, test accuracy 0.89.
 
 
I only used Adam as the optimizer; other optimizers such as SGD may give different results.
 
optimizer = optim.Adam(network.parameters(), lr=run.lr)
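Swapping in a different optimizer would only take one line; for example (a hypothetical variation, not something tested in this project):
# Hypothetical alternative: plain SGD with momentum instead of Adam.
optimizer = optim.SGD(network.parameters(), lr=run.lr, momentum=0.9)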
 

Data Analysis/Visualization

All run data are stored under the directory "run/".
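To compare runs outside of TensorBoard, the saved results can be loaded back with pandas; a quick sketch, where the file name and the 'accuracy' column are assumptions about how run_data is laid out:
import pandas as pd

# Load the per-run results saved by the pandas snippet above and rank the
# hyper-parameter combinations by accuracy (file and column names assumed).
df = pd.read_csv('results.csv')
print(df.sort_values('accuracy', ascending=False).head())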
 
The normalized confusion matrix above, created by scikit-plot, is very straightforward to read.
 
TensorBoard is a very convenient visualization tool for getting insight into training and can help greatly with the hyper-parameter tuning process. We can easily spot which hyper-parameter combination performs best and then use it for the real training run.
 
Below are some pictures created by TensorBoard automatically.

 

Conclusion

PyTorch as a machine learning framework is flexible, powerful, and expressive, and it is very friendly to beginners. This whole project gave me an idea of how to optimize a machine learning model. With a series of tweaks, I finally got the best result with train_accuracy=0.9902 and test accuracy 0.9101 (the numbers vary a little between runs). There should be other ways to improve it too, such as a better CNN design (a larger network should improve performance) or a different optimizer. Another gain is that I started using TensorBoard as a visualization tool; it is worth spending more time digging into.

References

  1. Build a Fashion-MNIST CNN, PyTorch Style: https://towardsdatascience.com/build-a-fashion-mnist-cnn-pytorch-style-efb297e22582
  2. Fashion-MNIST (Zalando Research): https://github.com/zalandoresearch/fashion-mnist
  3. PyTorch documentation: https://pytorch.org/docs/stable/index.html
  4. TensorBoard: https://www.tensorflow.org/tensorboard