Competition link — https://www.kaggle.com/c/cassava-leaf-disease-classification

This was my first Kaggle competition. Here I have created an overview of my solution. I will go briefly through the problem statement, the dataset, my approach, as well as some other approaches I liked. I have linked my kernels at the end of this post. My best performing submission achieved 0.895 accuracy on both the public and private leaderboards.

As the second-largest provider of carbohydrates in Africa, cassava is a key food security crop grown by smallholder farmers because it can withstand harsh conditions. At least 80% of household farms in Sub-Saharan Africa…

Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

**ArXiv link — https://arxiv.org/abs/1812.01187**

In this paper —

- the authors examine a collection of training procedure and model architecture refinements and empirically evaluate their impact on the final model accuracy via ablation study.
- these tricks introduce minor modifications to the model architecture, data preprocessing, loss function and learning rate schedule that lead to improved accuracy with barely any change in computational complexity.
- their empirical evaluation shows that several tricks lead to significant accuracy improvement and that combining them can further boost the model accuracy consistently through empirical…
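One of the training refinements the paper evaluates is a cosine learning-rate schedule, where the rate decays smoothly from its initial value to near zero. A minimal sketch (the function name and the toy numbers are my own, for illustration):

```python
import math

def cosine_lr(step, total_steps, base_lr):
    """Cosine-decay schedule: base_lr at step 0, decaying smoothly to 0."""
    return 0.5 * (1 + math.cos(math.pi * step / total_steps)) * base_lr

print(cosine_lr(0, 100, 0.1))    # 0.1 at the start of training
print(cosine_lr(100, 100, 0.1))  # ~0.0 at the end
```

Unlike step decay, the rate changes slowly at the start and end, which the paper reports helps final accuracy.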

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

**ArXiv link — https://arxiv.org/abs/2005.12872**

In this paper —

- the authors present DETR — an object detection method based on transformers and bipartite matching loss that views the problem as a direct set prediction problem.
- their method removes the need for hand-designed components that require prior knowledge about the task like non-maximum suppression and anchor generation.
- the source code is made available at — https://github.com/facebookresearch/detr
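The bipartite matching at the heart of DETR can be solved with the Hungarian algorithm, available in SciPy as `linear_sum_assignment`. Below is a toy sketch: the cost here is just L1 distance between box centers (an assumption for illustration — DETR's actual matching cost also includes classification and generalized-IoU terms):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Two predicted box centers and two ground-truth box centers (toy values).
pred_centers = np.array([[0.9, 0.9], [0.1, 0.1]])
gt_centers   = np.array([[0.0, 0.0], [1.0, 1.0]])

# Pairwise L1 cost between every prediction and every ground truth.
cost = np.abs(pred_centers[:, None, :] - gt_centers[None, :, :]).sum(-1)

# Hungarian matching: each prediction is assigned to exactly one target.
row, col = linear_sum_assignment(cost)
print(list(zip(row, col)))  # [(0, 1), (1, 0)]
```

Because every prediction is matched one-to-one against a ground-truth object (or "no object"), duplicates are penalized during training and no non-maximum suppression is needed at inference.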

Stefan Hinterstoisser, Olivier Pauly, Hauke Heibel, Martina Marek, Martin Bokeloh

**ArXiv link — https://arxiv.org/abs/1902.09967**

Data plays an important role in the performance of a machine learning model. Availability of domain-specific datasets is a problem, and often data needs to be collected and labelled manually — which is time-consuming, expensive and error prone. An inexpensive solution is to generate training data synthetically. This allows generating an infinite amount of labelled training images with large and controlled variations.

A huge number of prior works have suggested techniques for synthesizing training data and a major challenge faced by all of them…

**ArXiv link — https://arxiv.org/abs/1911.01702**

The paper focuses on the problem of document layout analysis. Parsing a document’s rendering into a machine-readable hierarchical structure is a major part of many applications. Generating such a hierarchical structure is a challenging task due to variations in the entities (lists can be ordered as well as unordered), variations in the structure of a document (one column, two columns, etc.), and the fact that entities can be arbitrarily nested (a list in a table cell).

In this paper, the authors —

- introduce an end-to-end system for parsing structure of documents including all text elements, figures, tables and…

Feature Extraction is an important technique in Computer Vision widely used for tasks like:

- Object recognition
- Image alignment and stitching (to create a panorama)
- 3D stereo reconstruction
- Navigation for robots/self-driving cars
- and more…

Features are parts or patterns of an object in an image that help to identify it. For example — a square has 4 corners and 4 edges; they can be called features of the square, and they help us humans identify that it’s a square. Features include properties like corners, edges, regions of interest, ridges, etc.
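The intuition that corners respond to intensity change in both directions, while edges respond in only one, can be shown with a tiny NumPy sketch (a crude stand-in for a real detector like Harris corners; the image and scoring function here are toy assumptions):

```python
import numpy as np

# Toy 8x8 image: a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

# Image gradients along rows (iy) and columns (ix).
iy, ix = np.gradient(img)

# Crude corner score: large only where BOTH gradients are non-zero.
score = (ix ** 2) * (iy ** 2)

print(score[2, 2] > 0)  # corner of the square: True
print(score[2, 4])      # middle of the top edge: only iy fires -> 0.0
```

Real detectors (Harris, SIFT, ORB, …) build on the same idea but use a smoothed structure tensor so the response is stable under noise and rotation.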

As shown in the image below the yellow points show the…

You might have spent countless hours tuning your hyperparameters and observing the performance metrics and run-time of your machine learning model, only to find that when you want to go back to a previous iteration, you just can’t get the hyperparameters or some other configuration right to recreate the results. As a researcher, it’s important to log these hyperparameters and observations somewhere so that the same results can be reproduced if needed. Logging them manually is both tedious and error prone, which can set your progress back by days. It’s also hard to make sense of such logs and recreate experiments over a long period.
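Even without a dedicated tool, the core idea is simple: persist the hyperparameters and metrics of every run somewhere machine-readable. A minimal sketch (the function name, file layout and toy values are my own assumptions, not a real experiment-tracking API):

```python
import json
import time
from pathlib import Path

def log_run(hparams, metrics, log_dir="runs"):
    """Write one experiment record (hyperparameters + results) to a JSON file."""
    Path(log_dir).mkdir(exist_ok=True)
    record = {"timestamp": time.time(), "hparams": hparams, "metrics": metrics}
    path = Path(log_dir) / f"run_{int(record['timestamp'] * 1000)}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Later, reload any run to recreate the exact configuration.
p = log_run({"lr": 3e-4, "batch_size": 64}, {"val_acc": 0.895})
print(json.loads(p.read_text())["hparams"]["batch_size"])  # 64
```

Dedicated experiment trackers add exactly this on top of richer features: searchable logs, comparisons across runs, and artifact storage.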

Say…

“Eigen” is a German word which means “own”, “proper” or “characteristic”.

Let’s have a look at what Wikipedia has to say about Eigenvectors and Eigenvalues:

If T is a linear transformation from a vector space V over a field F into itself and v is a vector in V that is not the zero vector, then v is an eigenvector of T if T(v) is a scalar multiple of v. This condition can be written as the equation

T(v) = λv

where λ is a scalar in the field F, known as the eigenvalue, characteristic value…
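The definition T(v) = λv can be checked numerically with NumPy’s `linalg.eig` (the matrix below is my own toy example):

```python
import numpy as np

# This matrix scales the direction (1, 1) by 3 and (1, -1) by 1.
T = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(T)
print(sorted(eigvals))  # [1.0, 3.0]

# Verify T(v) = λ v for the first eigenvector:
v = eigvecs[:, 0]
print(np.allclose(T @ v, eigvals[0] * v))  # True
```

Applying T to an eigenvector only stretches it by the eigenvalue — the direction is unchanged, which is exactly what “characteristic” direction means here.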

The goal of this post is to serve as an introduction to the basic concepts involved in a convolutional neural network. **This post is focused on the final goal of implementing an MNIST handwritten digit classifier, so everything is explained with that in mind — convolution layers, max pooling layers, ReLU activation function, fully connected layers, dropout layers, cross entropy loss function, etc.**

This post is a part of a 2-part series introducing convolutional neural networks (CNNs).

**Part 1 — Basic concepts revolving around CNNs**

**Part 2 — Pytorch Implementation of a CNN to classify MNIST handwritten digits**
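Three of the building blocks named above — convolution, ReLU and max pooling — can be sketched in plain NumPy (a toy illustration, not the PyTorch implementation from Part 2; the input and filter values are my own assumptions):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D convolution (really cross-correlation, as in most
    deep learning libraries): slide the kernel and take dot products."""
    h, w = kernel.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * kernel).sum()
    return out

def relu(x):
    """Zero out negative activations."""
    return np.maximum(0, x)

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling: keep the strongest response."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
feat = max_pool(relu(conv2d(img, np.array([[1.0, -1.0]]))), 2)
print(feat.shape)  # (2, 1)
```

A CNN stacks many such conv → ReLU → pool stages, followed by fully connected layers and a cross-entropy loss over the 10 digit classes.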

…

The goal of this post is to implement a CNN to classify MNIST handwritten digit images using PyTorch.

This post is a part of a 2-part series introducing convolutional neural networks (CNNs).

**Part 1 — Basic concepts revolving around CNNs**

**Part 2 — Pytorch Implementation of a CNN to classify MNIST handwritten digits**

This post does not explain the workings of concepts like **convolution layers, max pooling layers, fully connected layers, dropout layers**, etc. in detail. Read Part 1 if you are not familiar with them.

You can find the code here — https://github.com/iamkrut/MNIST_handwriting_classification

Machine Learning Engineer | Computer Vision | iamkrut.github.io