From Encodings to Embeddings
In this article, we will talk about two fundamental concepts in data representation and machine learning: encoding and embedding. Part of the content is taken from one of my lectures in the CS246 Mining Massive Data Sets (MMDS) course at Stanford University. I hope you find it useful.
Introduction
All Machine Learning (ML) methods work with input feature vectors, and almost all of them require the input features to be numerical. From an ML perspective, there are four types of features:
Numerical (continuous or discrete): numerical data can be either continuous or discrete. Continuous data can assume any value within a range, whereas discrete data takes distinct values. An example of a continuous numerical variable is `height`, and an example of a discrete numerical variable is `age`.
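To make the numerical case concrete, here is a minimal sketch of how such features can go into a feature vector as they are, with no encoding step. The records, their values, and the helper name `to_feature_vector` are hypothetical and only serve as an illustration.

```python
# Hypothetical records: the field names follow the examples above,
# the values are made up for illustration.
people = [
    {"height": 1.72, "age": 34},  # height is continuous, age is discrete
    {"height": 1.65, "age": 27},
    {"height": 1.80, "age": 45},
]


def to_feature_vector(record):
    """Copy the numerical features into a list of floats (no encoding needed)."""
    return [float(record["height"]), float(record["age"])]


# One row per record, one column per feature.
feature_matrix = [to_feature_vector(p) for p in people]

for row in feature_matrix:
    print(row)
```

Both columns are already numbers, so an ML method could consume `feature_matrix` directly. In practice you would often rescale them first, since `height` and `age` live on very different ranges.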