Data preprocessing

It's commonly said that preprocessing data (i.e., cleaning and formatting it) takes up a large share of your working time as an ML engineer.

To learn these tools, I originally spent some money on Udemy's "Machine Learning A-Z" course. If you've got a few bucks to spare, I recommend it! In keeping with this site's goal of sharing only free YouTube resources, I've chosen videos that cover the subjects I found most useful down the road.

Imputation
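
As a rough illustration of the idea (my own minimal sketch, not taken from any course), mean imputation fills each missing value with the average of the values that are present. The function name `impute_mean` and the sample `ages` column are made up for this example, and missing entries are assumed to be represented as `None`.

```python
from statistics import mean


def impute_mean(column):
    """Return a copy of `column` with None entries replaced by the mean
    of the observed (non-None) values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in column]


# Hypothetical column with two missing entries.
ages = [22, None, 35, 41, None]
imputed_ages = impute_mean(ages)
print(imputed_ages)
```

In practice you would more likely reach for a library routine such as scikit-learn's `SimpleImputer`, which also supports median and most-frequent strategies.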

One-Hot Encoding
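
To make the idea concrete, here is a small sketch (again my own, with a hypothetical `one_hot_encode` helper and a made-up `colors` column): each distinct category gets its own 0/1 column, and every row has exactly one 1. I'm assuming the categories are ordered alphabetically, which is a choice for this example rather than a requirement.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row is a 0/1 list with a
    single 1 in the position of that value's category."""
    categories = sorted(set(values))  # alphabetical order, chosen for this sketch
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)
for row in encoded:
    print(row)
```

pandas (`get_dummies`) and scikit-learn (`OneHotEncoder`) do the same job on real datasets.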

Feature Scaling

This one was covered in Ng's course in the first section.
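
For reference, here is a minimal sketch of two common scaling schemes: standardization (zero mean, unit variance) and min-max scaling (squeezing values into the range 0 to 1). The helper names and the `sizes` column are hypothetical, and I'm assuming the population standard deviation is used, which is one convention among several.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def min_max_scale(column):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical feature column.
sizes = [50.0, 100.0, 150.0]
standardized = standardize(sizes)
scaled = min_max_scale(sizes)
print(standardized)
print(scaled)
```

Either way, the point is to put features on comparable scales so that no single one dominates just because of its units.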

There is much more to this subject, and I've only scratched the surface. One additional thing I wish I had understood better at this stage is dimensionality reduction as a way to lower your overall feature count (and thus mitigate the curse of dimensionality).

If you can spare the time, I'd advise learning something like PCA. Reducing multidimensional data to two dimensions and plotting the result is a great way to 'wow' your audience and get a better understanding of your data.
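
Here is a minimal sketch of what that reduction can look like, assuming NumPy is available. It centers the data and uses the singular value decomposition to find the directions of greatest variance; the function name `pca_project` and the random five-feature dataset are made up for illustration.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top `n_components` principal
    components. Returns (projected points, variance along each component)."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing
    # singular value (i.e., by decreasing variance captured).
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return X_centered @ components.T, explained_variance


# Hypothetical dataset: 100 samples, 5 features, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]

points_2d, variance = pca_project(X)
print(points_2d.shape)
print(variance)
```

The two columns of `points_2d` are what you would hand to a scatter plot. scikit-learn's `PCA` class wraps the same idea with more options.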

At this point, I'd also recommend watching a couple of videos on NumPy, pandas, and scikit-learn.
