src: Kaggle competition page header — (

Competition link —

This was my first Kaggle competition. Here I have created an overview of my solution. I will go briefly through the problem statement, the dataset, my approach, and some other approaches I liked. I have linked my kernels at the end of this post. My best-performing submission achieved an accuracy of 0.895 on both the public and private leaderboards.

Problem Statement

Image by HeungSoon (src —

Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li

ArXiv link —


In this paper —

  • the authors examine a collection of training procedure and model architecture refinements and empirically evaluate their impact on the final model accuracy via ablation study.
  • these tricks introduce minor modifications to the model architecture, data preprocessing, loss function and learning rate schedule that lead to improved accuracy with barely any change in the computational complexity.
  • their empirical evaluation shows that several tricks lead to significant accuracy improvement and combining them together can further boost the model accuracy consistently through empirical…

(src —

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko

ArXiv link —


In this paper —

  • the authors present DETR — an object detection method based on transformers and bipartite matching loss that views the problem as a direct set prediction problem.
  • their method removes the need for hand-designed components that require prior knowledge about the task like non-maximum suppression and anchor generation.
  • the source code is made available at —

DETR Architecture

Stefan Hinterstoisser, Olivier Pauly, Hauke Heibel, Martina Marek, Martin Bokeloh

ArXiv link —

Research Problem

Data plays an important role in the performance of a machine learning model. Domain-specific datasets are often unavailable, so data has to be collected and labelled manually, which is time-consuming, expensive and error-prone. An inexpensive alternative is to generate training data synthetically. This allows generating an essentially unlimited number of labelled training images with large and controlled variations.

Prior Works

A huge number of prior works have suggested techniques for synthesizing training data and a major challenge faced by all of them…

ArXiv link —

Research problem

The paper focuses on the problem of document layout analysis. Parsing a document’s rendering into a machine-readable hierarchical structure is a major part of many applications. Generating such a hierarchical structure is a challenging task because of variations in the entities (lists can be ordered or unordered), variations in the structure of a document (one column, two columns, etc.), and the fact that entities can be arbitrarily nested (a list in a table cell).


In this paper, the authors -

  • introduce an end-to-end system for parsing structure of documents including all text elements, figures, tables and…

A brief write-up giving an overview of traditional and deep learning techniques for feature extraction

Feature Extraction is an important technique in Computer Vision widely used for tasks like:

  • Object recognition
  • Image alignment and stitching (to create a panorama)
  • 3D stereo reconstruction
  • Navigation for robots/self-driving cars
  • and more…

What are features?

Features are parts or patterns of an object in an image that help to identify it. For example, a square has 4 corners and 4 edges; these can be called features of the square, and they help us humans identify it as a square. Features include properties like corners, edges, regions of interest points, ridges, etc.

As shown in the image below the yellow points show the…

Image by Luka from Pexels

You might have spent countless hours tuning your hyperparameters and observing the performance metrics and run-time of your machine learning model, only to find that when you want to go back to a previous iteration, you just can’t get the hyperparameters or some other configuration right to recreate the results. As a researcher, it’s important to log these hyperparameters and observations somewhere so that you can recreate the same results if needed. Logging them manually is both tedious and prone to errors, which can set your progress back by days. It’s also hard to understand your logs and recreate experiments over a long period.
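As a rough illustration of the idea, here is a minimal sketch that appends each run's hyperparameters and metrics to a JSON Lines file and reads them back. The function names (`log_run`, `load_runs`, `best_run`), the file name and the example values are all hypothetical and chosen only for this sketch; it is not a replacement for a dedicated experiment-tracking tool.

```python
import json
import tempfile
import time
from pathlib import Path


def log_run(log_path, hyperparams, metrics):
    """Append one experiment record as a single JSON line."""
    record = {
        "timestamp": time.time(),
        "hyperparams": hyperparams,
        "metrics": metrics,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_runs(log_path):
    """Read every logged run back into a list of dicts."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs, metric):
    """Return the run with the highest value of the given metric."""
    return max(runs, key=lambda run: run["metrics"][metric])


# Hypothetical runs with made-up numbers, written to a temporary directory.
log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_file, {"lr": 0.01, "batch_size": 32}, {"val_accuracy": 0.91})
log_run(log_file, {"lr": 0.001, "batch_size": 64}, {"val_accuracy": 0.94})

best = best_run(load_runs(log_file), "val_accuracy")
print(best["hyperparams"])
```

Because every run is one self-contained line, the file can be appended to safely across sessions and filtered later to find the exact configuration behind a given result.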


Image by joiom -

“Eigen” — Word’s origin

“Eigen” is a German word which means “own”, “proper” or “characteristic”.

What are Eigenvectors and Eigenvalues?

Let’s have a look at what Wikipedia has to say about Eigenvectors and Eigenvalues:

If T is a linear transformation from a vector space V over a field F into itself and v is a vector in V that is not the zero vector, then v is an eigenvector of T if T(v) is a scalar multiple of v. This condition can be written as the equation
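In symbols, the condition in this definition is usually written T(v) = λv, where the scalar λ is the eigenvalue associated with v. The sketch below checks that condition numerically for a small matrix. The helper names (`mat_vec`, `is_eigenvector`) and the example matrix are illustrative assumptions, not taken from the original post.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def is_eigenvector(matrix, vector, tol=1e-9):
    """Return (True, lam) if matrix @ vector == lam * vector, else (False, None)."""
    if all(abs(x) <= tol for x in vector):
        return False, None  # the zero vector is excluded by definition
    image = mat_vec(matrix, vector)
    # Estimate lam from the component of the vector with the largest magnitude.
    k = max(range(len(vector)), key=lambda i: abs(vector[i]))
    lam = image[k] / vector[k]
    ok = all(abs(y - lam * x) <= tol for x, y in zip(vector, image))
    return ok, (lam if ok else None)


# Example matrix chosen for illustration: its eigenvalues are 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]

print(is_eigenvector(A, [1.0, 1.0]))   # (True, 3.0)
print(is_eigenvector(A, [1.0, -1.0]))  # (True, 1.0)
print(is_eigenvector(A, [1.0, 0.0]))   # (False, None)
```

The vector (1, 0) fails the check because A maps it to (2, 1), which points in a different direction and so is not a scalar multiple of the input.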

The goal of this post is to serve as an introduction to the basic concepts involved in a convolutional neural network. The post works towards the final goal of implementing an MNIST handwritten digit classifier, so everything is explained with that in mind: convolution layers, max pooling layers, the ReLU activation function, fully connected layers, dropout layers, the cross-entropy loss function, etc.

This post is part of a two-part series introducing convolutional neural networks (CNNs).

Part 1: Basic concepts revolving around CNNs

Part 2: PyTorch implementation of a CNN to classify MNIST handwritten digits

1. Convolution Layer
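
To make the operation concrete, here is a minimal pure-Python sketch of what a single convolution filter computes, assuming no padding and a stride of 1. The function name `conv2d_valid`, the toy image and the kernel are assumptions made for illustration. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked 2x2 kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is 3x3 because a 2x2 kernel fits into a 4x4 image at 3 positions along each axis, and the only non-zero column is where the kernel straddles the edge. In a trained CNN the kernel values are learned rather than hand-picked.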

The goal of this post is to implement a CNN to classify MNIST handwritten digit images using PyTorch.

This post is part of a two-part series introducing convolutional neural networks (CNNs).

Part 1: Basic concepts revolving around CNNs

Part 2: PyTorch implementation of a CNN to classify MNIST handwritten digits

This post does not explain in detail how concepts like convolution layers, max pooling layers, fully connected layers and dropout layers work. Read Part 1 if you are not familiar with them.

You can find the code here —

The MNIST database…

Krut Patel

Machine Learning Engineer | Computer Vision |
