I was looking for a good course on ML Engineering, and after poking around decided the GCP ML Engineering Course would be most useful for my purposes.

The Reasons Behind My Choice

  1. The course was designed to provide a pathway to certification as well as day-to-day usage
  2. GCP complements and rounds out my existing AWS skills and experience
  3. The foundational ML knowledge needed to use the tools effectively is taught (e.g. the decisions driving the trade-offs between recall and precision)
  4. Google has a long history of ML leadership

My Thoughts So Far

I am about 20% in, on module four. There are 20 modules total, and each module has between 1 and 4 sections, each with a lab. After doing the first few modules, here is what I am thinking.

Google does a great job with documentation

The clarity of their documentation is remarkable, and this is reflected on every level: word choice, syntax, task breakdown, linked references, etc. I used to work for an educational publisher, so I understand all the choices that go into helping people learn.

Most of the material is presented as a video lesson, followed by a step-by-step walkthrough in the lab. Thus far the labs seem to be mostly about walking you through the console and the ML capabilities within GCP. “Here’s BigQuery. Did you know you can use SQL to build a model right in BigQuery? Here’s how.” etc.
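To give a flavor of what that BigQuery lesson covers: BigQuery ML lets you train a model with a `CREATE MODEL` statement right in the SQL console. Here is a rough sketch of what that statement looks like (the dataset, table, and column names below are made up for illustration, not from the course):

```python
# A sketch of a BigQuery ML training statement. In the lab you would paste
# SQL like this straight into the BigQuery console; here it's just a string.
# All dataset/table/column names are hypothetical.
create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.fare_model`
OPTIONS (
  model_type = 'linear_reg',          -- what kind of model to train
  input_label_cols = ['fare_amount']  -- the column to predict
) AS
SELECT trip_distance, passenger_count, fare_amount
FROM `my_dataset.taxi_trips`
WHERE fare_amount IS NOT NULL
"""

print(create_model_sql)
```

Once trained, you query the model with `ML.PREDICT` the same way you'd query a table, which is what makes the "SQL all the way down" pitch so appealing.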

The Conceptual Lessons

Also excellent. They make good use of graphics to explain the fundamentals, and use analogies to clarify the concepts. For example: a very clear graphic showing what a confusion matrix is, how to calculate metrics from it, an example of calculating them in practice, and then a walkthrough of two business use cases where you would choose to prioritize recall over precision, or vice versa.
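The metric calculations the course walks through boil down to a couple of ratios over the confusion matrix cells. A minimal sketch, with made-up counts (not the course's example numbers):

```python
# Precision and recall from a 2x2 confusion matrix.
# The counts below are invented for illustration.
tp, fp = 80, 10   # true positives, false positives
fn, tn = 20, 890  # false negatives, true negatives

# Of everything the model flagged positive, how much was actually positive?
precision = tp / (tp + fp)

# Of all actual positives, how many did the model catch?
recall = tp / (tp + fn)

print(f"precision={precision:.3f} recall={recall:.3f}")
```

The business trade-off falls out of the denominators: if false positives are costly (say, blocking legitimate transactions), you prioritize precision; if false negatives are costly (say, missing a disease screening), you prioritize recall.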

I also thought the conceptual breakdown of the component parts of a neural net and the “whys” behind each step in training one was excellent. For example, they explain why activation functions are needed¹, some of the most common activation functions, and where and why they might be used.
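The core of the “why” is easy to demonstrate in a few lines. Two common activation functions, plus a quick check of what goes wrong without them (a toy sketch, not from the course materials):

```python
import math

# Two of the most common activation functions.
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Why activations are needed: composing linear layers without one
# just produces another linear function, so depth buys you nothing.
def linear_layer(x, w, b):
    return w * x + b

# Stack two linear "layers": w1=3, b1=1, then w2=0.5, b2=-2.
y = linear_layer(linear_layer(2.0, 3.0, 1.0), 0.5, -2.0)

# The stack collapses to a single linear layer: (w2*w1)*x + (w2*b1 + b2).
assert y == (0.5 * 3.0) * 2.0 + (0.5 * 1.0 - 2.0)

print(relu(-1.5), sigmoid(0.0))  # 0.0 0.5
```

Inserting `relu` or `sigmoid` between the layers breaks that collapse, which is what lets a deep network learn non-linear decision boundaries.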

The Labs

As mentioned above, the labs often seem to be a bit of a guided tour of the interface. There is a lot of hand-holding. On the plus side, they give you a certain number of free credits each month, so you can get through the course without ever having to purchase usage credit.

When you are ready to do the lab, they spin up a temporary environment for you, and you complete the lab there. However, if you don’t name something exactly as specified in the documentation, or if you make a naming error and go back to correct it, you may not get credit for completing the lab. So the way they “check” your progress is finicky. (I don’t know if this matters in the end for getting “credit”.)

In the AutoML classification model training lab I just completed, you’re given pre-stored data (somewhere between 1,000 and 10,000 rows) and instructions on how to start training a model. But the one hour you are granted per lab was not enough time to actually train the model; by the time the clock ran out, the model was maybe 10% into training. Also, the results of each training epoch were not output to a graph on each pass, so you can’t see how your training is progressing.

My desire is always to pursue a task to completion. But instead of allowing for this, they provide a shortcut: you call an endpoint for a model that has already been trained (on the same data) to make a prediction. So they are aware this is not enough time. It’s probably more cost-effective not to waste energy training the same model over and over again. But why not do something like have you take a pre-trained model and continue its training on a new, smaller dataset?
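The idea I'm suggesting is basically fine-tuning: start from weights that are already close to right, and run a few cheap training passes over the small new dataset. A toy pure-Python sketch of that workflow (this is my illustration of the concept, not a GCP or AutoML API):

```python
import random

# One gradient-descent step for the toy model y_hat = w * x,
# with squared-error loss (y_hat - y)^2.
def sgd_step(w, x, y, lr=0.05):
    grad = 2 * (w * x - y) * x
    return w - lr * grad

random.seed(0)

# A small "new" dataset generated from the true relationship y = 3x.
true_w = 3.0
small_new_data = [(x, true_w * x) for x in (random.uniform(-1, 1) for _ in range(20))]

# Pretend w = 2.8 came from a model someone else already trained.
# Fine-tuning: a few passes over the small dataset, instead of starting from 0.
w = 2.8
for x, y in small_new_data * 10:
    w = sgd_step(w, x, y)

print(round(w, 2))  # pulls w toward the true value of 3.0
```

Starting near the answer is what makes the small dataset and short time budget sufficient, which is why this would fit nicely inside a one-hour lab window.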

Overall

The deeper I get into the course, the more psyched I am to be taking it. I am getting the foundational ML knowledge I wanted, and access to some really great tools to bring projects to life.

My plan is to crank through the modules at a pace of at least one per week, so I should be ready to take the certification exam in a few months.

Yay

  1. Turns out they are the key to making neural nets work. Without them, stacked layers collapse into a single linear function; you want a non-linear output, which is more akin to how a human brain responds.