Getting Started

A simple explanation and implementation of gradient descent

Image created by me

Let’s say we have a fictional dataset of pairs of variables, a mother and her daughter’s heights:

Hands-on Tutorials

Understanding the math and implementation of Word2Vec

Image by author

Man is to king as woman is to what?

How can you train a machine learning algorithm to predict the right answer, “queen”?

In this article, I’m going to talk about a natural language processing model, Word2Vec, that can be used to make such predictions.

In the Word2Vec model, every word of the vocabulary is represented by a vector. The length of this vector is a hyperparameter that we humans set. Let’s use 100 in this article.
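To make the "man is to king as woman is to what?" question concrete, here is a minimal sketch of how an analogy can be answered with vector arithmetic. The vectors are hand-picked, 3-dimensional toy values that I made up for illustration (not trained Word2Vec embeddings, and not the 100 dimensions used in the article), and the `analogy` helper is a hypothetical name.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' by finding the word whose
    vector is closest to vec(b) - vec(a) + vec(c)."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Exclude the three query words so they cannot be returned as the answer.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(embeddings[w], target))

# Hypothetical toy vectors, chosen by hand so the analogy works out.
toy_embeddings = {
    "man":   [1.0, 0.0, 0.1],
    "woman": [1.0, 1.0, 0.1],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.2, 0.0],
}

answer = analogy(toy_embeddings, "man", "king", "woman")
print(answer)
```

With real embeddings the vectors would be learned from text rather than written by hand, but the lookup step would look much the same.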

Since every word is a vector of 100 numbers, you can think of every word as a point in a…

Hands-on Tutorials

What makes a GLM a GLM?

GLMs (image by author)

Generalized linear models are a group of models with some common attributes. These common attributes are:

  1. The distribution of the response variable (i.e. the label), given an input x, is a member of the exponential family of distributions.
  2. The natural parameter of the exponential family distribution is a linear combination of theta (i.e. the model parameters) and input data.
  3. At prediction time, the output of the model for a given x is the expected value of the distribution for that x.

If a model has these 3 characteristics, it is a generalized linear model.
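The three attributes above can be sketched in a few lines of code. This is only an illustration under my own assumptions: the function names are hypothetical, and I picked three familiar exponential family members (Bernoulli, Poisson, and a Gaussian with unit variance) as examples. In each case the natural parameter is the same linear combination of theta and x, and the prediction is the expected value of the distribution.

```python
import math

def natural_parameter(theta, x):
    """Attribute 2: the natural parameter is linear in theta and x."""
    return sum(t * xi for t, xi in zip(theta, x))

def predict_bernoulli(theta, x):
    """Bernoulli response: E[y | x] is the sigmoid of the natural parameter
    (this is logistic regression)."""
    eta = natural_parameter(theta, x)
    return 1.0 / (1.0 + math.exp(-eta))

def predict_poisson(theta, x):
    """Poisson response: E[y | x] is exp of the natural parameter
    (this is Poisson regression)."""
    return math.exp(natural_parameter(theta, x))

def predict_gaussian(theta, x):
    """Gaussian response with unit variance: E[y | x] equals the natural
    parameter itself (this is ordinary linear regression)."""
    return natural_parameter(theta, x)

# Made-up parameters and input, for illustration only.
theta = [0.5, -0.25]
x = [1.0, 2.0]
print(predict_bernoulli(theta, x), predict_poisson(theta, x), predict_gaussian(theta, x))
```

Only the last step differs between the three models: the function that maps the natural parameter to the mean of the chosen distribution.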

Before diving in further, this article…

A midway review of Stanford machine learning class CS229i

If you’re thinking about Stanford’s AI Professional Certificate program, I hope you might find this article helpful.

This meme pretty much sums up my reaction after opening the first problem set.

Two panels from KC Green’s “On Fire.” Photo: KC Green

Why? Well, you see, sh*t’s hard.

For those who don’t know, CS229i is a machine learning class at Stanford. It is part of Stanford’s Artificial Intelligence Professional Program.

In the first problem set, we were asked to derive the mean and variance of the exponential family’s probability density function (PDF). We were also asked to compute the Hessian matrix of the loss function derived from an exponential family distribution.


Reason behind least squares

Image by author

A little intro to linear regression first:

Linear regression is about finding a linear model that best fits a given dataset.

For example, in a simple linear regression with one input variable (i.e. one feature), the linear model is a line with formula y = mx + b, where m is the slope and b the y-intercept.

The linear model of best fit is one that minimizes the sum of squared errors.
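As a rough sketch of what "minimizes the sum of squared errors" means for the line y = mx + b, the snippet below fits m and b with the standard closed-form formulas for simple linear regression and then computes the sum of squared errors. The data points and the function names `fit_line` and `sum_squared_errors` are made up for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = m * x + b for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point of means.
    b = mean_y - m * mean_x
    return m, b

def sum_squared_errors(xs, ys, m, b):
    """Sum over all points of (observed - predicted) squared."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

m, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, m, b)

# Nudging the slope or the intercept away from the fit increases the error.
print(best, sum_squared_errors(xs, ys, m + 0.1, b), sum_squared_errors(xs, ys, m, b + 0.1))
```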

As shown in the image below, error is the difference between observed and predicted value.

Machine Learning, Programming

A case for machine learning in predicting speed and responsiveness

Image by author

As an engineer on the front-end infrastructure team of a large tech company, I deal with web application performance on a daily basis.

Performance is vital, yet throughout my career, I have seen teams punt on it until they can no longer get away with it.

I know performance means many things. In this article, I’m using it to refer to responsiveness, or speed of an application when users interact with it.

I think performance gets punted on for two main reasons. One, performance work is hard. Two, it has little effect on customer value until it becomes poor. …

Hands-on Tutorials, Getting Started

The What, Why, and How of Hyperparameter Tuning

Image source

Hyperparameter tuning is an important part of developing a machine learning model.

In this article, I illustrate the importance of hyperparameter tuning by comparing the predictive power of logistic regression models with various hyperparameter values.

First things first.

What are hyperparameters? — The what

Parameter vs. hyperparameter

  • Parameters are estimated from the dataset. They are part of the model equation. The equation below is a logistic regression model. Theta is the vector containing the parameters of the model.
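To illustrate the distinction, here is a minimal sketch of a logistic regression fitted with plain gradient descent. It assumes the usual form of the model, where the predicted probability is the sigmoid of the dot product of theta and x. Theta is estimated from the data, while `learning_rate` and `num_iterations` are values I set by hand, which is what makes them hyperparameters in this sketch. The data, the function names, and the choice of gradient descent are my own assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """Logistic regression model: probability that the label is 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def fit_logistic(features, labels, learning_rate=0.1, num_iterations=5000):
    """Estimate theta by batch gradient descent on the average log loss.
    learning_rate and num_iterations are hyperparameters: chosen by us,
    not learned from the data."""
    theta = [0.0] * len(features[0])
    n = len(features)
    for _ in range(num_iterations):
        gradient = [0.0] * len(theta)
        for x, y in zip(features, labels):
            error = predict_proba(theta, x) - y
            for j, xj in enumerate(x):
                gradient[j] += error * xj / n
        theta = [t - learning_rate * g for t, g in zip(theta, gradient)]
    return theta

# Made-up data: the first feature is a constant 1.0 acting as the intercept.
features = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
labels = [0, 0, 1, 1]

theta = fit_logistic(features, labels)
print(theta)
```

Running the same function with a different `learning_rate` or `num_iterations` yields a different theta, which is why the choice of hyperparameter values can affect a model's predictive power.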

How machine learning answers the question

A walkthrough of Logistic Regression and Naive Bayes.

Image source

The year was 1912, and the mighty Titanic set sail on her maiden voyage. Jack, a “20 year old” “third class” “male” passenger, won a hand of poker and his ticket to the land of the free. In the last hour of April 14th, Titanic struck an iceberg, and its fate was sealed. Will Jack survive this wreckage?

(Yes, I know he died in the movie, but if he were a real person, would he have survived?)

This is a binary classification problem because we’re trying to predict one of two outcomes…

Getting Started

A beginner’s guide to multiclass classification

Image by author

In my previous article, I talked about binary classification with logistic regression.

We had a list of students’ exam scores and GPAs, along with whether they were admitted to their town’s magnet school.

A beginner’s guide to spam classification

Image by author

Below is an email caught by Gmail’s spam filter. How did the spam filter decide this was spam?

Lily Chen

Machine learning enthusiast
