Diving deep with Titanic (no pun intended)

05 Jan 2017

This repo contains a Jupyter notebook with a qualitative analysis and visualization of the data available on the Titanic disaster.

We try to answer basic questions such as:

1.) Who were the passengers on the Titanic? (age, gender, class, etc.)

2.) What deck were the passengers on and how does that relate to their class?

3.) Where did the passengers come from?

4.) Who was alone and who was with family?

Descriptive Analysis

Age Distribution

[Figure: age histogram]

[Figure: age KDE plot]
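The age KDE plot smooths the histogram into a continuous curve. The notebook's plotting code isn't shown here, so the following is only a dependency-free sketch of a Gaussian kernel density estimate. It makes three assumptions:

- rows are read with `csv.DictReader` from a Kaggle-style `train.csv`;
- the file has an `Age` column that is blank when the age is unknown;
- the bandwidth comes from Silverman's rule of thumb, which may differ from the default of whatever plotting library the notebook used.

```python
import math
import statistics


def parse_ages(rows):
    """Collect ages as floats, skipping rows whose Age field is blank (missing)."""
    return [float(r["Age"]) for r in rows if r.get("Age", "").strip()]


def silverman_bandwidth(values):
    """Normal-reference rule of thumb: h = 1.06 * sample std * n ** (-1/5)."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)


def gaussian_kde(values, bandwidth=None):
    """Return a function x -> estimated density, built from one Gaussian kernel per value."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    h = bandwidth if bandwidth is not None else silverman_bandwidth(values)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps of width h centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density
```

For example, `gaussian_kde(parse_ages(rows))` returns a function that can be evaluated over a grid such as `range(0, 81)` and plotted as the age curve.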

Cabin Population

[Figure: cabin population]
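Cabin population is presumably based on the deck letter in each passenger's cabin number. Here is a minimal sketch. It assumes a Kaggle-style `Cabin` column holding values like `C85` (blank when unknown) and a `Pclass` column. Treating the first character as the deck is also an assumption; it means multi-cabin strings are labelled by their first character. Counting deck and class together speaks to how deck relates to class.

```python
from collections import Counter


def deck_of(cabin):
    """Return the deck letter of a Cabin string (assumed to be its first character), or None."""
    cabin = (cabin or "").strip()
    return cabin[0].upper() if cabin else None


def deck_counts(rows):
    """Count passengers per deck, ignoring rows with no recorded cabin."""
    decks = (deck_of(r.get("Cabin")) for r in rows)
    return Counter(d for d in decks if d is not None)


def deck_class_counts(rows):
    """Count (deck, Pclass) pairs to see how decks relate to passenger class."""
    pairs = ((deck_of(r.get("Cabin")), r["Pclass"]) for r in rows)
    return Counter(p for p in pairs if p[0] is not None)
```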

Class

[Figure: gender breakdown by class]

[Figure: gender breakdown by class, children included]

[Figure: gender ratio in each class]
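The "children included" breakdown needs a rule for who counts as a child, and the post doesn't state one. The sketch below makes these assumptions:

- an age cut-off of 16, held in the hypothetical `CHILD_AGE` constant;
- Kaggle-style `Age`, `Sex` and `Pclass` columns.

Passengers with no recorded age keep their sex label.

```python
from collections import Counter

CHILD_AGE = 16  # hypothetical cut-off; the notebook's actual threshold is not stated


def person_category(row, child_age=CHILD_AGE):
    """Label a passenger 'child' if their recorded age is below child_age, else their sex."""
    age = row.get("Age", "").strip()
    if age and float(age) < child_age:
        return "child"
    return row["Sex"]


def category_by_class(rows, child_age=CHILD_AGE):
    """Count (Pclass, category) pairs, e.g. ("3", "child") or ("1", "female")."""
    return Counter((r["Pclass"], person_category(r, child_age)) for r in rows)
```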

How many people were alone and how many were with family?

[Figure: alone vs. with family, bar chart]
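One common way to define "alone" with this dataset is to add the sibling/spouse count to the parent/child count and check for zero. This is an assumption about the notebook's method, using the Kaggle column names `SibSp` and `Parch`. Note that these columns do not record companions such as friends or nannies.

```python
from collections import Counter


def is_alone(row):
    """True if the passenger had no siblings/spouses (SibSp) and no parents/children (Parch) aboard."""
    return int(row["SibSp"]) + int(row["Parch"]) == 0


def alone_counts(rows):
    """Count passengers travelling alone versus with family."""
    return Counter("Alone" if is_alone(r) else "With family" for r in rows)
```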

Where did people board from?

[Figure: boarding point]
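In the Kaggle version of the data, the `Embarked` column uses one-letter codes for the three ports: C for Cherbourg, Q for Queenstown and S for Southampton. Here is a minimal counting sketch. It assumes that column and treats blank or unrecognised values as unknown.

```python
from collections import Counter

# One-letter embarkation codes used in the Kaggle Titanic data.
PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def boarding_counts(rows):
    """Count passengers per port of embarkation; blank or unrecognised codes become 'Unknown'."""
    return Counter(PORTS.get(r.get("Embarked", "").strip(), "Unknown") for r in rows)
```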

Then we dive a little deeper: what factors saved someone from sinking? (No pun intended.)

Overall Survival

[Figure: overall survival]

Effect of Class on Survival

[Figure: effect of class on survival]
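A survival rate for a group is the average of the 0/1 `Survived` flag within that group. The helper below is a hypothetical, library-free stand-in for a group-by-and-mean computation. It assumes Kaggle-style columns stored as strings.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Return {group: fraction survived}, where group = key(row) and Survived is '0' or '1'."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for r in rows:
        group = key(r)
        totals[group] += 1
        survivors[group] += int(r["Survived"])
    return {g: survivors[g] / totals[g] for g in totals}
```

For example, `survival_rate_by(rows, key=lambda r: r["Pclass"])` gives per-class rates. A constant key such as `lambda r: "all"` gives the overall survival rate.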

How did passengers who were alone fare?

[Figure: family survival]

Does the picture become more nuanced once gender and age are considered?

[Figure: survival by gender and family status]

[Figure: survival by age and family status]

How does survival change with age for different genders?

[Figure: effect of age and gender on survival]

Effect of Class and Gender on Survival

[Figure: survival by gender and class]
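Class and gender together are easiest to read as a two-way table of survival rates. This is a small sketch that assumes Kaggle-style `Pclass`, `Sex` and `Survived` columns stored as strings. `survival_table` and `format_table` are hypothetical helper names.

```python
from collections import defaultdict


def survival_table(rows, row_key="Pclass", col_key="Sex"):
    """Cross-tabulate survival: {(row value, column value): fraction survived}."""
    counts = defaultdict(lambda: [0, 0])  # [survivors, total] per cell
    for r in rows:
        cell = counts[(r[row_key], r[col_key])]
        cell[0] += int(r["Survived"])
        cell[1] += 1
    return {k: s / n for k, (s, n) in counts.items()}


def format_table(table, row_key="Pclass"):
    """Render the cross-tab as tab-separated text; empty cells are shown as '-'."""
    row_vals = sorted({k[0] for k in table})
    col_vals = sorted({k[1] for k in table})
    lines = ["\t".join([row_key] + col_vals)]
    for rv in row_vals:
        cells = [f"{table[(rv, cv)]:.2f}" if (rv, cv) in table else "-" for cv in col_vals]
        lines.append("\t".join([rv] + cells))
    return "\n".join(lines)
```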

Finally, several predictive models are run on the data to identify the most informative features and to tune the models for better performance. These are the accuracies obtained with each model (using cross-validation):

  1. Logistic Regression: 81.01%

  2. kNN: 75.98%

  3. Naive Bayes: 76.54%

  4. Random Forest: 78.77%

  5. SVM: 79.33%

  6. Decision Tree: 81.01%

  7. BernoulliRBM: 81.01%
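The post says cross-validation was used but doesn't show the setup. The sketch below shows what k-fold cross-validation computes, without any ML library:

1. shuffle the rows and split them into k folds;
2. train on k - 1 folds and score on the held-out fold;
3. repeat for each fold and average the scores.

`fit_majority_by_sex` is a hypothetical toy baseline standing in for the real classifiers listed above; it is not one of them.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of nearly equal size."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, k=5, seed=0):
    """Mean accuracy over k folds.

    `fit(X_train, y_train)` must return a function that predicts the label of one sample.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority_by_sex(X_train, y_train):
    """Hypothetical toy model: predict the most common training label for the sample's sex."""
    by_sex = {}
    for x, label in zip(X_train, y_train):
        by_sex.setdefault(x["Sex"], []).append(label)
    overall = max(set(y_train), key=y_train.count)
    majority = {sex: max(set(labels), key=labels.count) for sex, labels in by_sex.items()}
    return lambda x: majority.get(x["Sex"], overall)
```

Swapping `fit_majority_by_sex` for a function that trains a real model and returns its predictor gives a cross-validated accuracy for that model. Fixing `seed` keeps the folds identical across models, so their scores are compared on the same splits.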

Done for now.