How To Have a Successful ML Term Project

Hint: It's not easy. Many have tried and failed, but here's what I think might help.

1. Have a source for lots of good data

Yeah, it'd be cool to build a playlist creator, but where is the data going to come from? Are you going to have to create all the playlists yourself? If so, you won't have much good data to work with.

In contrast, if you're doing something like handwritten digit recognition, a classic Optical Character Recognition (OCR) task, there's a ton of labeled data available (the best-known example is the MNIST dataset). Many ML libraries even ship a loader for it, or for a similar digits dataset.
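To show how little work a ready-made labeled dataset takes, here is a minimal sketch that assumes scikit-learn is installed. It uses `load_digits`, the small handwritten-digits dataset bundled with scikit-learn (note that this is an 8x8-pixel digits set, not MNIST itself), so nothing has to be downloaded.

```python
from sklearn.datasets import load_digits

# Small 8x8 handwritten-digit dataset that ships with scikit-learn.
# (Assumption for this sketch: it stands in for MNIST, which is larger.)
digits = load_digits()

X = digits.data    # one row per image, 64 pixel intensities per row
y = digits.target  # the digit (0-9) each image shows

print("features:", X.shape)
print("labels:", y.shape)
print("classes:", sorted(set(y.tolist())))
```

Two lines of loading code and you already have features and labels, which is exactly what the playlist idea is missing.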

Use an API

There are many Application Programming Interfaces (APIs) readily available online. Basically, they're ways that companies expose data for people to use and access. Usually you'll have to get what's called an API key to be authorized to access the data, but once you do that, you should be able to access lots and lots of good data. Some APIs you may consider include Yelp, Twitter, and Spotify.
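Here is a rough sketch of what calling an API with a key tends to look like, using only the Python standard library. The endpoint `https://api.example.com/v1/search`, the `term` and `limit` parameters, the `EXAMPLE_API_KEY` variable, and the bearer-token header are all placeholders assumed for illustration. Every real API documents its own URL, parameters, and authentication scheme.

```python
import json
import os
import urllib.parse
import urllib.request

# Hypothetical endpoint, used only for illustration; substitute the real
# base URL from the documentation of the API you pick.
BASE_URL = "https://api.example.com/v1/search"


def build_request(base_url, params, api_key):
    """Build a GET request that carries the API key in a header."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    # Assumption: bearer-token auth. Some APIs use a different header or a
    # query parameter instead, so check the docs.
    headers = {"Authorization": "Bearer " + api_key}
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Keep the key out of your source code, e.g. in an environment variable.
api_key = os.environ.get("EXAMPLE_API_KEY", "demo-key")
request = build_request(BASE_URL, {"term": "coffee", "limit": 5}, api_key)
print(request.full_url)
```

Once you have pointed `build_request` at a real endpoint, `fetch_json(request)` would return the parsed response, which you can then save to disk so you aren't hitting the API every time you retrain.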

2. Clearly define what problem you're going to solve

As you may be able to tell, machine learning problems are very structured. There's no AI that's going to take over the world (yet). Tasks are very specific and, as such, must be explicitly defined. Before you start, make sure to define at least the following:

  • What problem do I want to solve? (what function do I want to approximate?)

  • X's (what features am I feeding into my ML model?)

  • Y's (what result am I trying to predict?)

  • Do I have the data to do this? If so, what kind of preprocessing do I need to do?

  • How will this be useful in my project? Do I need to use ML here?
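One way to force yourself through the checklist above is to write the X's and Y's down as code before touching any model. The sketch below uses a made-up problem (predicting a restaurant's rating) with made-up records; the field names `num_reviews`, `price_level`, `is_chain`, and `rating` are hypothetical and only there to show the shape of the exercise.

```python
# Made-up toy records, for illustration only (field names are hypothetical).
records = [
    {"num_reviews": 120, "price_level": 2, "is_chain": 0, "rating": 4.5},
    {"num_reviews": 15, "price_level": 1, "is_chain": 1, "rating": 3.0},
    {"num_reviews": 300, "price_level": 3, "is_chain": 0, "rating": 4.0},
    {"num_reviews": 42, "price_level": 2, "is_chain": 1, "rating": 3.5},
]

# X's: the features fed into the model.
feature_names = ["num_reviews", "price_level", "is_chain"]
# Y: the result the model should predict.
target_name = "rating"

X = [[row[name] for name in feature_names] for row in records]
y = [row[target_name] for row in records]

# Basic sanity checks: one target per example, same features everywhere.
assert len(X) == len(y)
assert all(len(row) == len(feature_names) for row in X)

print("examples:", len(X), "features per example:", len(feature_names))
```

If you can't fill in `feature_names` and `target_name` for your own project, that's usually a sign the problem isn't defined tightly enough yet.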

3. Design a pipeline, using ML modules first

Build out your ML pipeline (described at a high level here) first, and then use a module like sklearn to do the machine learning. If you later want to implement your own algorithm, replace that part with your approach, bit by bit.
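As a sketch of the "library first, your own code later" idea, the example below assumes scikit-learn and NumPy are installed and uses the bundled digits dataset. It builds the same pipeline twice: once with sklearn's `LogisticRegression`, and once with `MyNearestCentroid`, a small hand-written classifier invented for this illustration. Only the model step changes; scaling, splitting, and scoring stay put.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class MyNearestCentroid(ClassifierMixin, BaseEstimator):
    """Hand-written classifier: predict the class whose mean is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One mean feature vector (centroid) per class.
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        # Squared distance from every sample to every class centroid.
        diffs = X[:, None, :] - self.centroids_[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


def make_pipeline_with(model):
    """Same preprocessing every time; only the final model is swapped."""
    return Pipeline([("scale", StandardScaler()), ("model", model)])


X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: get the whole pipeline working with a library model.
library_pipe = make_pipeline_with(LogisticRegression(max_iter=1000))
library_acc = library_pipe.fit(X_train, y_train).score(X_test, y_test)

# Step 2: swap in your own implementation without touching the rest.
custom_pipe = make_pipeline_with(MyNearestCentroid())
custom_acc = custom_pipe.fit(X_train, y_train).score(X_test, y_test)

print("library model accuracy:", round(library_acc, 3))
print("custom model accuracy:", round(custom_acc, 3))
```

Having the library number first gives you a baseline, so when your own implementation scores lower you know whether to suspect a bug or just a weaker algorithm.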

Even if you don't want to implement the full algorithm, implementing a supporting piece, like cross-validation for choosing hyperparameters, is a great way to learn about a machine learning technique. Pick and choose what you feel like implementing, and build out the rest of your project!
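To give a feel for what implementing cross-validation yourself involves, here is a minimal NumPy-only sketch of k-fold cross-validation used to pick a hyperparameter. The model (ridge regression, with regularization strength `alpha`), the synthetic data, and the candidate `alphas` are all assumptions chosen to keep the example short; the same loop works for any model and hyperparameter.

```python
import numpy as np


def kfold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and split them into n_folds chunks."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression weights (no intercept term)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def cv_mse(X, y, alpha, n_folds=5):
    """Average validation mean squared error over the folds."""
    folds = kfold_indices(len(X), n_folds)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_ridge(X[train_idx], y[train_idx], alpha)
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errors))


# Synthetic regression data, generated only for this illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w + rng.normal(scale=0.5, size=200)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {alpha: cv_mse(X, y, alpha) for alpha in alphas}
best_alpha = min(scores, key=scores.get)

for alpha in alphas:
    print("alpha =", alpha, "cv mse =", round(scores[alpha], 4))
print("best alpha:", best_alpha)
```

Every candidate is scored on the same folds, so the comparison is fair, and the held-out test set (if you have one) is never touched while tuning.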
