Final Project


Due Dates

Mid-Semester Check-In: Between lecture 5 and lecture 7

Final Submission: November 20, 2024


Project Description

The final project is worth 40% of your grade and is meant to be the culmination of the knowledge and techniques you acquire during the semester. We will work on it throughout the course, with the mid-semester check-in helping to guide you along. We recommend working in a group, but completing it alone is also fine this semester. See the project PDF for details on the exact requirements.

Each week in this course we introduce a new machine learning concept. You are expected to work on your project bit by bit each week: take what we covered in lecture and in that week’s assignment and apply it to your project. That way, your project should be mostly done by the end of the semester.


Helpful Milestones

Here is a suggested schedule we encourage you to follow. It mirrors what we will learn each week.

Week 1: Find a group to work with

Week 2: Find a data set and define the question you want to answer with it. Some good resources to explore are Kaggle, Data.gov, the UCI Machine Learning Repository, and Data Hub.

Week 3: Begin to explore and manipulate your data set. If you find that your data set is not suitable for your objective, we recommend finding another one now. Start to draft your hypothesis and question.

Week 4: Continue to clean and visualize your data and perform any necessary feature engineering. Finalize your hypothesis and question.

Week 5: Create some visualizations.

Week 6: Begin to think about which models you may want to use. Be sure to consider whether your problem is a regression problem or a classification problem. Complete the mid-semester check-in any time between now and lecture 8.

Week 7: Continue with model construction. Consider whether you should be using cross-validation in your project.

Week 8: Finish up model construction. Implement any other validation techniques if applicable. Revisit your visualizations and add any that may be beneficial. This would be a great time to stop by office hours if you are having trouble or to make sure you’re on the right track!

Week 9: Complete any final touches and get last-minute questions answered. Make sure your write-up reflects your thought process throughout the project. Interpret your results and relate them back to your original problem statement/hypothesis.

Week 10: Turn in your final submission.
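
For the Week 7 milestone, the sketch below may help illustrate what cross-validation does: the data is split into k folds, and each fold takes a turn as the held-out test set while the model is fit on the rest. Everything here is an assumption made for illustration and not part of the project requirements: the helper names (k_fold_indices, cross_validate, fit_mean, predict_mean), the made-up data, and the trivial "predict the training mean" baseline that stands in for a real model. It uses only the Python standard library.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, predict, k=5):
    """Return the mean squared error measured on each held-out fold."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit only on the training rows, then score on the held-out rows.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in fold]
        scores.append(mean(errors))
    return scores


# Hypothetical baseline "model": always predict the mean training target.
def fit_mean(train_x, train_y):
    return mean(train_y)


def predict_mean(model, x):
    return model


# Made-up regression data, used only to exercise the code.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]

fold_scores = cross_validate(xs, ys, fit_mean, predict_mean, k=5)
print("Per-fold MSE:", fold_scores)
print("Average MSE:", mean(fold_scores))
```

Averaging the per-fold errors gives a steadier estimate of how a model performs on unseen data than a single train/test split does. Mean squared error suits a regression problem; for a classification problem you would swap in a metric such as accuracy. In practice you will probably use a library routine instead, for example cross_val_score in scikit-learn.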


Past Projects

Here are some past projects that are good examples. These are meant for inspiration; please do not copy any of their code.

Predicting Heart Failure

Predicting Used Car Prices


Academic Integrity

Many of the data sets you find online also have a data science project associated with them. Do not copy any of these projects or others you find online. Our instructors have caught copied projects in the past, and the penalty for plagiarism is an unsatisfactory (U) grade. Refer to the Cornell University Code of Academic Integrity: http://cuinfo.cornell.edu/aic.cfm