Finding the Most Desirable Data Science Skills – an Analysis of Data Scientist Job Postings

Over the last few years, data scientist has commonly been referred to as the sexiest job of the 21st century. Indeed, as a job which utilises state-of-the-art algorithms straight out of research papers, and is often associated with futuristic inventions such as self-driving cars and human-like chatbots, what is not to like about being a data…

COVID-19 Literature Review using NLP – Part #4 Summarising Articles

In the last few blogs I have shown how the NLP techniques of Topic Modelling and Word2Vec can be used in literature review by grouping relevant articles together and narrowing down the number of articles that are relevant and need to be analysed. In this final blog of the series, I will show how to…

COVID-19 Literature Review using NLP – Part #3 K-mean clustering of word vectors

In the last blog I showed how to create word vectors using Python gensim's Word2Vec implementation with a large amount of text as input, as well as the powerful features of similarities and analogies in the word vector space. In this blog, I will show how these word vectors can be used to further…

COVID-19 Literature Review using NLP – Part #2 TF-IDF and Word Vectors

In the last blog I showed how to use topic modelling to separate a large number of articles into different topics. In this blog, I am going to talk about how to go one step further, and use word vectors and clustering to track down the articles that you really need. Recall from the…

COVID-19 Literature Review using Natural Language Process – Part #1 Topic Modelling

Ask any PhD student and they will tell you that literature review is a daunting task. Sifting through mountains of papers to find the specific information that you need is no easy feat, especially when there is often conflicting information and, worse, abstracts that look promising but offer nothing of interest in the full text. There…

Step by step guide to Data Science with Python Pandas and scikit-learn #4: Interpreting and Improving a Model

Data science and machine learning have become buzzwords that have taken the world by storm. They say that data scientist is one of the sexiest jobs of the 21st century, and machine learning courses can be found easily online, focusing on fancy machine learning models such as random forests, support vector machines and neural networks.

Step by step guide to Data Science with Python Pandas and scikit-learn #3: Applying Machine Learning Pipeline

This is part 3 of a 4-part tutorial which provides a step-by-step guide to addressing a data science problem with machine learning, using the complete data science pipeline with Python pandas and scikit-learn as tools, to analyse and convert data into useful information that can be used by a business. In this third…

Step by step guide to machine learning with Python Pandas and scikit-learn #1: Understanding, Loading and Cleaning of Data

Data science and machine learning have become buzzwords that have taken the world by storm. They say that data scientist is one of the sexiest jobs of the 21st century, and machine learning courses can be found easily online, focusing on fancy machine learning models such as random forests, support vector machines and neural networks.
