Let’s say we have a fictional dataset of pairs of variables, a mother and her daughter’s heights:

*Man is to king as woman is to what?*

How can you train a machine learning algorithm to predict the right answer, “queen”?

In this article, I’m going to talk about a natural language processing model, Word2Vec, that can be used to make such predictions.

In the Word2Vec model, every word of the vocabulary is represented by a vector. The length of this vector is a hyperparameter that we, the humans, set. Let’s use 100 in this article.
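To make the "one vector per word" idea concrete, here is a minimal sketch in Python with NumPy. The four-word vocabulary, the random values, and the helper name `vector_for` are assumptions made up for illustration; in a trained Word2Vec model the values would be learned from text rather than drawn at random.

```python
import numpy as np

EMBEDDING_DIM = 100  # the vector length chosen in this article

# Hypothetical toy vocabulary; a real one would be built from a text corpus.
vocabulary = ["man", "king", "woman", "queen"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

# One row per word. The values are random placeholders,
# standing in for the numbers that training would produce.
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(size=(len(vocabulary), EMBEDDING_DIM))


def vector_for(word):
    """Look up the vector that represents a word."""
    return embeddings[word_to_index[word]]


print(vector_for("king").shape)  # (100,)
```

Storing the vectors as rows of a single matrix keeps the lookup a simple index operation, which is one common way to lay out an embedding table.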

Since every word is a vector of 100 numbers, you can think of every word as a point in a…

Generalized linear models are a group of models with some common attributes. These common attributes are:

- The distribution of the response variable (i.e. the label), given an input x, is a member of the exponential family of distributions.
- The natural parameter of the exponential family distribution is a linear combination of theta (i.e. the model parameters) and input data.
- At prediction time, the output of the model for a given x is the expected value of the distribution for that x.

If a model has these 3 characteristics, it is a generalized linear model.
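As a rough sketch of how these three characteristics fit together, the Python snippet below computes a prediction for three familiar exponential family members. The function names, `theta`, and `x` are made up for illustration, and the Gaussian case assumes unit variance so that the mean equals the natural parameter.

```python
import numpy as np


def gaussian_mean(eta):
    """Mean of a unit-variance Gaussian: gives linear regression."""
    return eta


def bernoulli_mean(eta):
    """Mean of a Bernoulli: the sigmoid, which gives logistic regression."""
    return 1.0 / (1.0 + np.exp(-eta))


def poisson_mean(eta):
    """Mean of a Poisson: the exponential, which gives Poisson regression."""
    return np.exp(eta)


def glm_predict(theta, x, mean_function):
    """Return the expected value of y given x under a GLM."""
    eta = np.dot(theta, x)      # natural parameter: linear in theta and x
    return mean_function(eta)   # prediction: the mean of the distribution


theta = np.array([0.5, -0.25])  # made-up model parameters
x = np.array([2.0, 4.0])        # made-up input, so eta = 0.0

print(glm_predict(theta, x, gaussian_mean))   # 0.0
print(glm_predict(theta, x, bernoulli_mean))  # 0.5
print(glm_predict(theta, x, poisson_mean))    # 1.0
```

Only the mean function changes between the three models; the linear step that produces the natural parameter is shared.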

Before diving in further, this article…

If you’re thinking about Stanford’s AI Professional Certificate program, I hope you might find this article helpful.

This meme pretty much sums up my reaction after opening the first problem set.

Why? Well, you see, sh*t’s hard.

For those who don’t know, CS229i is a machine learning class at Stanford. It is part of Stanford’s Artificial Intelligence Professional Program.

In the first problem set, we were asked to derive the mean and variance of the exponential family’s probability density function (PDF). We were also asked to compute the Hessian matrix of the loss function derived from an exponential family distribution.

Last…

A little intro to linear regression first:

**Linear regression is about finding a linear model that best fits a given dataset.**

For example, in a simple linear regression with one input variable (i.e. one feature), the linear model is a line with formula `y = mx + b`, where `m` is the slope and `b` the y-intercept.

**The linear model of best fit is one that minimizes the sum of squared errors.**

As shown in the image below, error is the difference between observed and predicted value.
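In code, the same idea can be sketched with NumPy. The four data points and the helper name `sum_squared_errors` are invented for illustration; `np.polyfit` with `deg=1` is one convenient way to get the least-squares line, not necessarily the method used later in the article.

```python
import numpy as np

# Made-up toy data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])


def sum_squared_errors(m, b):
    """Sum of squared differences between observed and predicted values."""
    predicted = m * x + b
    errors = y - predicted
    return np.sum(errors ** 2)


# A degree-1 least-squares fit returns the slope first, then the intercept.
m, b = np.polyfit(x, y, deg=1)
sse = sum_squared_errors(m, b)

print(m, b, sse)  # approximately 1.9, 0.0 and 0.7
```

Nudging `m` or `b` away from the fitted values, for example `sum_squared_errors(m + 0.1, b)`, gives a larger sum, which is what "best fit" means here.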

As an engineer on the front-end infrastructure team of a large tech company, I deal with web application performance on a daily basis.

Performance is vital, yet throughout my career, I have seen teams punt on it until they can no longer get away with it.

I know performance means many things. In this article, I’m using it to refer to responsiveness, or speed of an application when users interact with it.

I think performance gets punted on for two main reasons. One, performance work is hard. Two, it has little effect on customer value until it becomes poor. …

Hyperparameter tuning is an important part of developing a machine learning model.

In this article, I illustrate the importance of hyperparameter tuning by comparing the predictive power of **logistic regression** models with various hyperparameter values.

First things first.

- Parameters are estimated from the dataset. They are part of the model equation. The equation below is a logistic regression model. Theta is the vector containing the parameters of the model.
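The distinction can be sketched in a few lines of Python. In this toy example, `theta` is estimated from the data, while `learning_rate` and `num_iterations` are values we pick by hand. The data, the function names, and the choice of gradient descent are assumptions for illustration, and these are not necessarily the hyperparameters the article goes on to compare.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, learning_rate, num_iterations):
    """Estimate theta by batch gradient descent on the average log loss."""
    theta = np.zeros(X.shape[1])  # parameters: learned from the data
    for _ in range(num_iterations):
        predictions = sigmoid(X @ theta)
        gradient = X.T @ (predictions - y) / len(y)
        theta -= learning_rate * gradient
    return theta


# Made-up toy data; the first column of 1s acts as the intercept term.
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Hyperparameters: chosen by us before training, not estimated from the data.
theta = fit_logistic_regression(X, y, learning_rate=0.1, num_iterations=1000)
probabilities = sigmoid(X @ theta)

print(theta)
print(probabilities)
```

Rerunning the fit with a different `learning_rate` or `num_iterations` yields a different `theta`, which is why these choices can change a model's predictive power.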

A walkthrough of Logistic Regression and Naive Bayes.

The year was 1912, and the mighty Titanic set sail on her maiden voyage. Jack, a *“20-year-old” “third-class” “male”* passenger, won a hand of poker and his ticket to the land of the free. In the last hour of April 14th, Titanic struck an iceberg, and its fate was sealed. Will Jack survive this wreckage?

(Yes, I know he died in the movie, but if he were a real person, would he have survived?)

This is a binary classification problem because we’re trying to predict one of two outcomes…

In my previous article, I talked about binary classification with logistic regression.

We had a list of students’ exam scores and GPAs, along with whether they were admitted to their town’s magnet school.

Below is an email caught by Gmail’s spam filter. How did the spam filter decide this was spam?

Machine learning enthusiast