Over the next few months I will be struggling with a career path shift (I really prefer to say that it’s a career fork). Currently I’m working as a devops engineer, but I’m pivoting to a data science / machine learning developer position. I have always been interested in a wide range of aspects of computer science, being more of a generalist than a specialist (even though the market seems to favour specialists in a concrete level of the stack, and I have specialized in distributed systems and web reliability). At the moment, data science seems an emerging field with a lot of impact and new challenges that really attract me. I have been presented with the opportunity of being part of a new business intelligence team at my current company, and I’m excited about the possibility of learning new things and applying my analytical mindset from this new position.

Despite being involved in some small projects related to machine learning, such as automatic text classification and other kinds of predictive modelling, I’m new to this field, so I will be working hard to get up to speed with its body of knowledge. I have always been interested in the subject, and it’s a good excuse to go back to studying. I have never stopped learning, but not in a structured way.

This blog entry is an action plan: a sort of guide for my next months of study, which will surely be reevaluated along the way. My objective is to become an advanced beginner (Dreyfus has been here) in the machine learning and data science fields.

October

I have been taking Udacity’s free course Intro to Machine Learning. I have really enjoyed the topics of the course, despite not having finished it (76% completed, starting the PCA lesson). It starts from the bottom. It is not extremely technical or formal about the mathematical foundations of the algorithms presented. Even so, it gives a good overview of the machine learning landscape. I strongly recommend it to newcomers to the field. My first objective this month is to finish it. I estimate that finishing the course will take 10 hours.

Another Udacity course that I will take this month is Intro to Data Analysis. It seems a much shorter course than Intro to Machine Learning, with an introductory aim. Python is my main language, so the pandas and numpy lessons will be useful, as well as the data science introduction. I estimate it will take 25 hours.

One of my biggest weaknesses is my lack of statistical knowledge and intuition. I have purchased a physical copy of OpenIntro Statistics (it’s available as a PDF for free) and I will be reading and studying the book, doing the exercises to refresh the knowledge acquired during my computer science degree. The book has 8 chapters (introduction to data, probability, distributions of random variables, foundations of inference, inference for numerical data, inference for categorical data, introduction to linear regression, multiple and logistic regression). During this month I want to finish the first 2 chapters, with their respective exercises. This cannot be just a quick skim; for me it requires careful study. I estimate it at 8 hours.

Also, I want to read the book Clean Code, and I’m close to finishing The Art of Data Science.

If my estimates are correct (and obviously they will not be; it’s just a challenge, and I will review the time actually spent at the end of the month), I will be studying for 43 hours (not including reading time). I’m starting to write this post on October 17th, so that means a challenging 3 hours a day. Obviously a uniform pace like that is impossible; I work full time while doing all this self-study. I have 5 entire days to dedicate to study (weekends and a holiday), so I will try to dedicate 7 hours on each of these days and 1 hour on normal days. It seems like a crazy enterprise to me. If your dreams don’t scare you, they aren’t big enough.
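The arithmetic behind that plan can be checked with a few lines of Go. This is only a sketch: the 15 remaining days assume that every day from October 17th through the 31st counts, and all the names in it are illustrative.

```go
package main

import "fmt"

// Estimated study hours for October, taken from the estimates above.
const (
	introMLHours       = 10 // finishing Intro to Machine Learning
	dataAnalysisHours  = 25 // Intro to Data Analysis
	openIntroStatHours = 8  // first two parts of OpenIntro Statistics
)

// plannedHours sums the three estimates.
func plannedHours() int {
	return introMLHours + dataAnalysisHours + openIntroStatHours
}

// availableHours returns the hours the schedule provides: fullDays at
// hoursFull each, and the remaining days at hoursNormal each.
func availableHours(totalDays, fullDays, hoursFull, hoursNormal int) int {
	return fullDays*hoursFull + (totalDays-fullDays)*hoursNormal
}

func main() {
	// Assumption: October 17th through 31st inclusive, i.e. 15 days.
	const daysLeft = 15

	planned := plannedHours()
	available := availableHours(daysLeft, 5, 7, 1)

	fmt.Printf("planned: %d h (%.1f h/day if spread evenly)\n",
		planned, float64(planned)/float64(daysLeft))
	fmt.Printf("available: %d h, margin: %d h\n", available, available-planned)
}
```

Under that assumption, 5 full days at 7 hours plus 10 normal days at 1 hour give 45 hours against the 43 planned, a margin of only 2 hours.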

November

In the second month I will stay focused on building a good foundation in statistics. I will study the remaining chapters of OpenIntro Statistics. Moreover, I want to take Udacity’s course Intro to Inferential Statistics.

I have been interested in the Go language for a few months. I want to learn the basics of Go by doing the tutorial A Tour of Go. A good way to combine learning Go with my main objective of increasing my competence in machine learning could be implementing some basic machine learning algorithms. A basic linear regression and decision trees could be a good starting point (learn by doing!).
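As an idea of what that first exercise might look like, here is a minimal sketch of simple (one-variable) linear regression fitted by ordinary least squares, using only the standard library. The function name fitLine and the sample data are made up for illustration; this is one possible starting point, not a finished implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine fits y = intercept + slope*x by ordinary least squares.
// It returns an error if the slices differ in length, hold fewer than
// two points, or if all x values are identical.
func fitLine(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need two or more (x, y) pairs of equal length")
	}
	n := float64(len(xs))

	// Means of x and y.
	var sumX, sumY float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
	}
	meanX, meanY := sumX/n, sumY/n

	// Sum of cross-deviations (sxy) and of squared x deviations (sxx).
	var sxy, sxx float64
	for i := range xs {
		dx := xs[i] - meanX
		sxy += dx * (ys[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0, 0, errors.New("all x values are identical; slope is undefined")
	}

	slope = sxy / sxx
	intercept = meanY - slope*meanX
	return slope, intercept, nil
}

func main() {
	// Made-up data lying exactly on y = 1 + 2x.
	xs := []float64{0, 1, 2, 3}
	ys := []float64{1, 3, 5, 7}

	slope, intercept, err := fitLine(xs, ys)
	if err != nil {
		fmt.Println("fit failed:", err)
		return
	}
	fmt.Printf("slope=%.2f intercept=%.2f\n", slope, intercept)
}
```

The closed-form solution keeps the example short; a version with several input variables would need matrix operations or gradient descent instead.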

I want to read the book Data Science for Business.

December

During December I want to start Coursera’s Machine Learning course by Andrew Ng. Maybe it overlaps a little with the other online courses, but a lot of people recommend it, so I think it is a good ending to my introductory months.

Also, I want to continue implementing some basic machine learning algorithms in Go. During the Coursera lessons, I will challenge myself to code some of the algorithms as they are covered in Andrew Ng’s course.

The book to read this month will be Machine Learning for Hackers.

Future work

At the end of the year I will review my progress and feelings after this study sprint. In January I will start working full time as a machine learning developer in the new data science team at my current workplace, so I will be challenged with a lot of practical data analysis tasks and the development of production-ready predictive models.

On the self-study side, I need to keep gaining confidence with the theoretical foundations of machine learning, as well as expanding my practical knowledge in the data science field. I want to take the Stanford CS229 course and Udacity’s Machine Learning course 262 by Georgia Tech. Furthermore, I want to start studying neural nets (the hype around deep learning is not to be ignored). Another important objective for the future is to start participating in Kaggle competitions and building a public portfolio.

But the most important current and future task is to build good study habits, compatible with a happy social life, work I’m passionate about and healthy leisure, so that I can continue this knowledge marathon after this initial stretch.

References