Data Science

I have been thinking of focusing a bit more on Data Analysis, which I am very interested in. So I started a Data Science programme made up of 9 courses provided by IBM. Once I have finished all the courses, I will get an IBM Data Science Professional Certificate. I have finished two of the courses so far, and the level of difficulty increases from one to the next.

I would like to summarise what I have learned so far.

Week 1 provides fundamental knowledge about what data science is and what a data scientist does.

A Data Scientist is someone who finds solutions to problems by analysing big or small data using appropriate tools, and then tells stories to communicate her findings to the relevant stakeholders. Data Science means using scientific methods and algorithms to extract knowledge and insights from structured and unstructured data in various forms.

It sounds like a very interesting role, one which requires a combination of statistics, programming, creativity, analysis, and social and communication skills.

Week 2 introduces the open source tools you could use to analyse data. Notebooks are collaborative web-based environments for data exploration and visualisation, which makes them the perfect toolbox for data science. The tools covered include Jupyter Notebook, Apache Zeppelin Notebook, RStudio IDE and IBM Watson Notebook.

They are all very similar, with some slight differences. Jupyter Notebook, formerly known as IPython Notebook, is a platform that can use several programming languages including Python, R and Scala. But you cannot run multiple cells in more than one language within the same notebook. Apart from the default language, Python, if you want to run another language you have to install additional kernels. Even so, it's very popular.
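To make the "one language per notebook" point a bit more concrete, here is a minimal sketch of what a Jupyter notebook file roughly looks like inside. It is a simplified picture of the version 4 notebook format built as a plain Python dictionary, not something taken from the course. The helper names `make_notebook` and `notebook_language` are my own, made up for illustration. The thing to notice is that the kernel is recorded once in the notebook's metadata, and the individual cells carry no language of their own.

```python
import json


def make_notebook(kernel_name, language, sources):
    """Build a simplified, version-4-style notebook as a plain dict (hypothetical helper)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        # The kernel (and therefore the language) is stored once, for the whole notebook.
        "metadata": {
            "kernelspec": {
                "name": kernel_name,
                "display_name": kernel_name,
                "language": language,
            }
        },
        # Each code cell only holds its source and outputs, with no language field.
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": src,
            }
            for src in sources
        ],
    }


def notebook_language(nb):
    """Return the language recorded in the notebook-level kernelspec (hypothetical helper)."""
    return nb["metadata"]["kernelspec"]["language"]


nb = make_notebook("python3", "python", ["x = 1", "print(x + 1)"])
print(json.dumps(nb["metadata"], indent=2))
print(notebook_language(nb))
```

Switching to R or Scala means pointing the whole notebook at a different kernel, which is why those extra kernels have to be installed first.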

One of the big selling points of Apache Zeppelin Notebook is that you can use multiple languages across cells. It supports Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown and Shell. This is one of the key differences compared to Jupyter Notebook.
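In Zeppelin, each cell (called a paragraph) can start with a directive such as `%pyspark`, `%sql`, `%md` or `%sh` that says which interpreter should run it. The sketch below is only my own toy illustration of that idea in Python, not how Zeppelin itself is implemented. The function name `split_paragraph` and the choice of `spark` as the fallback interpreter are assumptions (in a real Zeppelin setup the default interpreter is configurable).

```python
import re


def split_paragraph(text, default="spark"):
    """Split a Zeppelin-style paragraph into (interpreter, body).

    Hypothetical helper: if the paragraph starts with a %directive, that name
    is used as the interpreter; otherwise the assumed default is returned.
    """
    match = re.match(r"%(\S+)\s*(.*)", text.lstrip(), re.DOTALL)
    if match is None:
        return default, text
    return match.group(1), match.group(2)


# One notebook, four paragraphs, each handled by a different interpreter.
paragraphs = [
    "%pyspark\nprint('hello')",
    "%sql\nSELECT 1",
    "%md\n## Notes",
    "val x = 1",
]

for paragraph in paragraphs:
    interpreter, body = split_paragraph(paragraph)
    print(interpreter, "->", repr(body))
```

Because the interpreter is chosen per paragraph instead of per notebook, a single Zeppelin note can move from Python to SQL to Markdown without switching anything else.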

RStudio IDE is designed primarily for the R language.

The above is just my understanding from the course. If you want to know a bit more about other notebooks and the differences between them, here is a link providing much more detailed information. Happy reading.



