Let’s say we have a fictional dataset of pairs of variables, a mother and her daughter’s heights:
A little intro to linear regression first:
Linear regression is about finding a linear model that best fit a given dataset.
For example, in a simple linear regression with one input variable (i.e. one feature), the linear model is a line with formula
y = mx + b , where
m is the slope and
b the y-intercept.
The linear model of best fit is one that minimizes the sum of squared errors.
As shown in the image below, error is the difference between observed and predicted value.
As an engineer on the front-end infrastructure team of a large tech company, I deal with web application performance on a daily basis.
Performance is vital, yet throughout my career, I see teams punt on it until they can no longer get away with it.
I know performance means many things. In this article, I’m using it to refer to responsiveness, or speed of an application when users interact with it.
I think performance gets punted on because of two main reasons. One, performance work is hard. Two, it has little effect on customer value until it becomes poor. …
Hyperparameter tuning is an important part of developing a machine learning model.
In this article, I illustrate the importance of hyperparameter tuning by comparing the predictive power of logistic regression models with various hyperparameter values.
First thing’s first.
A walkthrough of Logistic Regression and Naive Bayes.
The year was 1912, and the mighty Titanic set sail on her maiden voyage. Jack, a “20 year old” “third class” “male” passenger, won a hand of poker and his ticket to the land of the free. In the last hour of April 14th, Titanic struck an iceberg, and its fate was sealed. Will Jack survive this wreckage?
(Yes, I know he died in the movie, but if he were a real person, would he have survived?)
This is a binary classification problem because we’re trying to predict one of two outcomes…
In my previous article, I talked about binary classification with logistic regression.
We had a list of students’ exam scores and GPAs, along with whether they were admitted to their town’s magnet school.
Below is an email caught by Gmail’s spam filter. How did the spam filter decide this was a spam?
Let’s say you have a dataset where each data point is comprised of a middle school GPA, an entrance exam score, and whether that student is admitted to her town’s magnet high school.
Given a new pair of
(GPA, exam score) from Sarah, how can you predict whether Sarah will be admitted? Sarah’s GPA is 4.3 and her exam score is 79.
Supposed your manager asks you to “improve the front-end performance of our React app.” Where do you start?
Although most of the items on this list now feel like second nature, the me from three or four years ago would have found them extremely helpful.
From a first principle perspective, you can think of performance as boiling down to two factors: memory and CPU. The lower the usage, the better.
As an engineer, I’m at least 2 steps removed from the end-users of my work. I rarely interact directly with customers or listen to their problems and what they want.
As an engineer, I also rarely stop to think about what makes a good design and why good design is important. As long as the app works and does its job, why should design matter?
After all, in our line of work, if a piece of code doesn’t work as expected, it’s the engineer’s problem to figure out why. …
Machine learning enthusiast