This repo contains a Jupyter notebook with qualitative data analysis and visualization of the data available from the Titanic disaster.
We first try to answer some basic questions:
1.) Who were the passengers on the Titanic? (age, gender, class, etc.)
2.) What deck were the passengers on and how does that relate to their class?
3.) Where did the passengers come from?
4.) Who was alone and who was with family?
Descriptive Analysis
Age Distribution
Cabin Population
Class
How many people were alone and how many were with family?
Where did people board from?
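
The "alone vs. with family" and deck questions above can be sketched with pandas as follows. This is a minimal sketch, not the notebook's actual code: it assumes the column names of the Kaggle Titanic CSV (`SibSp`, `Parch`, `Cabin`), and the rows are made-up toy values, not real passenger records. The `Alone` and `Deck` column names are hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT real Titanic records.
# Column names follow the Kaggle Titanic CSV, which is an assumption here.
df = pd.DataFrame({
    "SibSp": [1, 0, 0, 3],               # siblings/spouses aboard
    "Parch": [0, 0, 2, 1],               # parents/children aboard
    "Cabin": ["C85", None, "E46", None], # cabin code, often missing
})

# A passenger counts as alone when they have no relatives aboard at all.
relatives = df["SibSp"] + df["Parch"]
df["Alone"] = relatives.map(lambda n: "Alone" if n == 0 else "With family")

# Take the deck as the first letter of the cabin code; missing cabins stay missing.
df["Deck"] = df["Cabin"].str[0]

alone_counts = df["Alone"].value_counts()
print(alone_counts)
print(df["Deck"].value_counts())
```

Treating the first letter of `Cabin` as the deck is a common convention for this dataset, but many cabin values are missing, so any deck analysis covers only part of the passengers.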
Then we try to dive a little deeper - what factors saved someone from sinking? (No pun intended)
Overall Survival
Effect of Class on Survival
How do people who were alone fare in this?
Does it get nuanced with gender and age?
How does Survival change with age for different genders?
Class and Gender on Survival
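
One possible way to look at class and gender together is a grouped survival rate. The sketch below assumes the Kaggle column names `Survived`, `Pclass` and `Sex`, and uses made-up toy rows, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Toy rows for illustration only; NOT real Titanic records.
# Column names (Survived, Pclass, Sex) are assumed from the Kaggle CSV.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1],
    "Pclass":   [1, 1, 3, 3, 1, 3],
    "Sex":      ["female", "male", "female", "male", "female", "male"],
})

# The mean of a 0/1 column is the survival rate within each group.
# unstack("Sex") turns the result into a class-by-gender table.
survival_rate = (
    df.groupby(["Pclass", "Sex"])["Survived"]
      .mean()
      .unstack("Sex")
)
print(survival_rate)
```

The same pattern should extend to the other questions in this list, for example by grouping on a binned age column instead of `Pclass`.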
Finally, several predictive algorithms are run on the data to identify the most informative features and to tune the models for better performance. The accuracy obtained with each model is listed below (cross-validation was employed):
- Logistic Regression Accuracy = 81.01
- kNN Accuracy = 75.98
- Naive Bayes Accuracy = 76.54
- Random Forest Accuracy = 78.77
- SVM Accuracy = 79.33
- Decision Tree Accuracy = 81.01
- BernoulliRBM Accuracy = 81.01
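
For readers unfamiliar with cross-validated accuracy, here is a minimal scikit-learn sketch of how such a comparison might be run. It is not the notebook's code: the data is synthetic (generated by `make_classification`), the 5-fold setting and model hyperparameters are assumptions, and only two of the models are shown, so it will not reproduce the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a prepared feature matrix; this is NOT the Titanic data.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Two of the model families listed above, with assumed default-style settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation; the fold count is an assumption, not from the notebook.
    fold_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    # Report the mean fold accuracy on a 0-100 scale, like the list above.
    scores[name] = fold_acc.mean() * 100
    print(f"{name} Accuracy = {scores[name]:.2f}")
```

Averaging over folds gives a steadier estimate than a single train/test split, which matters on a dataset as small as this one.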
Done for now.