Data Science

A Student Review of Harvard CS109A: Introduction to Data Science

I had the opportunity to take this module over the pandemic summer of 2020, and overall I really enjoyed it. It's a solid module with a broad yet thorough overview of the data science landscape. It's advisable to be fluent in Python at a basic to intermediate level before you start, because that lets you focus on understanding and absorbing the technical aspects of the course, and there is a lot to absorb!

The CS109A course is a cornerstone module of the Data Science programme at Harvard. It is taken by students completing the Master's in Data Science through the School of Engineering and Applied Sciences (SEAS, as AC 209a) and by those completing the MA ALM in Data Science through the Harvard Extension School. It is also taken as an elective by students from other Harvard schools, such as the Business School.

When Does It Run?

The course runs three times a year: summer, fall and spring. The summer session is compressed into a shorter period and tends to be filled with a lot of students from the Extension School, with a mix of undergraduates from Harvard College as well. The fall and spring semesters draw from a wider pool of students across Harvard's schools.

What Does It Cover?

I think you would benefit from taking a basic statistics course before starting this one (there are several through the school, and quite a few people had completed STATS109 beforehand). The first PSET (Homework 0) is an opportunity to self-filter: you can drop the course if it's too much for you and get a 100% refund in that first week. You'll be asked to complete work that is about 50% calculus/statistics/linear algebra and 50% Python/coding focussed. If you are British, think A-level Maths/Further Maths; if you are American, I'd say first-year college-level maths.

Ideally, take this course after you have done the statistics and coding elements of the data science programme, as otherwise it will be a stretch if it's all new. Even so, I thoroughly enjoyed it. In hindsight I would have found it easier had I taken my own advice, but going in without that preparation simply accelerated my learning and meant I had to absorb a lot quickly. It's a great “summary” course of data science and pulls together the practical elements of what a data science role entails (but you will not have time to go through each one in depth!). Key themes covered:

  • Web scraping, e.g. Beautiful Soup et al
  • Python libraries – Pandas, NumPy et al
  • kNN, linear and logistic regression, neural networks et al
  • Principal Component Analysis
  • Multiple regression
  • Regularization – LASSO, Ridge
  • Bootstrapping
  • Decision trees and more

Teaching & Faculty

Solid. Even for the summer course, the teaching support, sessions and office hours were thorough and supportive. If you put in the time, you will get a lot out of both the course and the teaching staff. The course lead when I took it was from the Statistics department. He is extremely knowledgeable, and I also liked how he incorporated current events from a data perspective during the course. For example, we had a session on building models and accounting for bias, where we looked at themes like race and the criminal justice system and explored how existing models were built and how we would approach them ourselves. Given that all of this had to be done virtually, the staff did well to create that discussion-led experience in both small and large group settings.

Workload

You will have psets to complete weekly, and they can be quite lengthy. Again, you have to judge how much pressure this will put on you personally. If you are already a practising data scientist familiar with models and working in Python, you should find this quite easy to follow and complete quickly. If you are starting from scratch, you will need to put in more effort: not just lecture time, but also attending sessions and possibly office hours.

Is It Worth It?

In a nutshell, yes. If you are taking the course to obtain a certificate or degree, then absolutely, no question. If you are new to data science, as I mentioned, it's a great summary course. If you are further ahead, you may not find it as beneficial, unless you are a data scientist with a narrow focus or are used to working only in a language like R. Otherwise, given the quality of teaching and resources you would be exposed to, you'd be hard pressed not to get some value from a core programme like this. Check out the course content and material here: CS109A Course.

About the author


May is a student of data science and recently completed a tech entrepreneurship MBA. She also runs a tech podcast about emerging markets for Cornell University and is fascinated by the world of Big Data and sharing insights.
