Practical Exam: IT 441 - Data Science

ID: 18IT038

Name: Dishank Jani

Task-1:
Dataset description using the Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Achieve the highest classification accuracy possible by applying the following methods:
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection
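The encoding step listed above converts categorical attributes into numbers that a classifier can use. A common choice is one-hot encoding, which creates one 0/1 indicator column per category. The sketch below is a minimal, standard-library-only illustration, not Orange's internal implementation; the function `one_hot_encode` and the column names `gender` and `jaundice` are hypothetical, and it assumes missing values have already been imputed.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category.

    Hypothetical helper for illustration; assumes the column has no missing values.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per category, e.g. "gender=f".
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sample rows; the column names are assumptions.
sample = [
    {"gender": "m", "jaundice": "no"},
    {"gender": "f", "jaundice": "yes"},
    {"gender": "f", "jaundice": "no"},
]
encoded = one_hot_encode(sample, "gender")
print(encoded[0])  # {'jaundice': 'no', 'gender=f': 0, 'gender=m': 1}
```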

Orange is a widely used open-source data mining tool that makes data preprocessing easy through a visual, widget-based workflow.

Here I used the Autistic Spectrum Disorder (ASD) Screening Dataset for Adults, which contains 704 instances and 21 attributes.

This is a binary classification problem: each instance must be classified into one of two categories, Yes (1) or No (0). The attributes are a mix of numeric and categorical values.

The dataset is intended for ASD screening, i.e., detecting whether or not a person has ASD.

Here is the visual workflow I created in Orange: one branch applies preprocessing and the other does not.

Here, Class/ASD is the target variable, which is set using the Select Columns widget.

First, I used the Impute widget to fill in missing (null) values, replacing each one with the average value for numeric attributes or the most frequent value for categorical attributes.
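Mean/most-frequent imputation, as described above, can be sketched in plain Python. This is an illustrative approximation of the idea, not the Impute widget's actual code; the helper `impute` and the column names `age` and `ethnicity` in the sample are hypothetical, and missing values are assumed to be represented as `None`.

```python
from collections import Counter
from statistics import mean


def impute(rows, numeric_cols, categorical_cols):
    """Fill None values with the column mean (numeric) or most frequent value (categorical).

    Hypothetical helper for illustration only.
    """
    fill = {}
    for col in numeric_cols:
        # Average over the non-missing values only.
        fill[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        # Most frequent non-missing value.
        counts = Counter(r[col] for r in rows if r[col] is not None)
        fill[col] = counts.most_common(1)[0][0]
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in row.items()}
        for row in rows
    ]


# Hypothetical sample with one missing age and one missing ethnicity.
sample = [
    {"age": 26, "ethnicity": "White-European"},
    {"age": None, "ethnicity": "Asian"},
    {"age": 30, "ethnicity": None},
    {"age": 34, "ethnicity": "White-European"},
]
filled = impute(sample, numeric_cols=["age"], categorical_cols=["ethnicity"])
print(filled[1]["age"], filled[2]["ethnicity"])
```

The fill values are computed once from the observed data before any row is modified, so an imputed value never influences another imputed value.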

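Next, I normalized the data so that all features lie on a comparable scale; otherwise, features with large ranges can dominate distance-based models such as KNN.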
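Two common normalization options are standardization (z-scores with mean 0 and standard deviation 1) and min-max scaling into [0, 1]. The sketch below shows both on a single numeric column using only the standard library; the function names and the sample ages are hypothetical, and which option the Orange workflow used is not stated above.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: nothing to scale
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly into the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - low) / (high - low) for v in values]


ages = [20, 30, 40]  # hypothetical ages
print(min_max(ages))  # [0.0, 0.5, 1.0]
print(standardize(ages))
```

In practice the scaling parameters (mean and standard deviation, or min and max) should be computed on the training set and then reused for the test set.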

I then selected the most relevant features to improve accuracy.
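One way to decide which features are relevant is to score each feature against the target and keep the top-ranked ones. The sketch below ranks categorical features by information gain; this is one possible scoring method chosen for illustration, not necessarily the one used in the workflow above, and the helper names, column names and sample values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data.
labels = ["YES", "YES", "NO", "NO"]
columns = {
    "A1_Score": [1, 1, 0, 0],          # perfectly separates the classes
    "gender": ["m", "f", "m", "f"],    # carries no information here
}
ranking = rank_features(columns, labels)
print(ranking)
```

Features at the bottom of the ranking are candidates for removal; how many to keep is a choice that is best checked against test-set performance.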

Once preprocessing is finished, the preprocessed data is saved.

It is important to split the dataset into a training set (80%) and a test set (20%); for this I used the Data Sampler widget.
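An 80/20 split like the one described above amounts to shuffling the instances and cutting the list. The minimal sketch below uses a fixed seed so the split is reproducible; `split_train_test` is a hypothetical helper and the integers stand in for the 704 instances.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(704))  # stand-ins for the 704 instances
train, test = split_train_test(rows)
print(len(train), len(test))  # 563 141
```

With only 704 instances, a stratified split (sampling each class separately) would keep the Yes/No ratio the same in both sets; this sketch does a plain random split.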

For binary classification I used k-nearest neighbors (KNN) and Random Forest.
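KNN classifies a new instance by majority vote among the k closest training instances, which is why the normalization step matters for it: distances are computed directly on feature values. The sketch below is a minimal from-scratch KNN with Euclidean distance, for illustration only; `knn_predict` and the toy points are hypothetical. A Random Forest is too long to sketch here; in Python it is commonly used via scikit-learn's `RandomForestClassifier`.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical, already normalized 2-feature training data.
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
train_y = ["NO", "NO", "NO", "YES", "YES", "YES"]
print(knn_predict(train_X, train_y, (0.15, 0.1)))  # NO
print(knn_predict(train_X, train_y, (0.95, 0.9)))  # YES
```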

For KNN, precision was 0.765 without preprocessing and increased to 0.979 with preprocessing.

A similar increase was observed for Random Forest, whose precision rose from 0.991 to 1.0 with preprocessing.

The confusion matrices show a similar difference.
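Precision and accuracy are both derived from the confusion matrix: precision is TP / (TP + FP), the fraction of predicted positives that are truly positive, while accuracy is (TP + TN) divided by the total number of instances. The sketch below computes both for a binary problem; the helper names and the toy predictions are hypothetical, and "YES" is assumed to be the positive class.

```python
def confusion_matrix(actual, predicted, positive="YES"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def precision(cm):
    """TP / (TP + FP); defined as 0.0 when nothing is predicted positive."""
    predicted_pos = cm["TP"] + cm["FP"]
    return cm["TP"] / predicted_pos if predicted_pos else 0.0


def accuracy(cm):
    """(TP + TN) / total number of instances."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


# Hypothetical labels and predictions.
actual = ["YES", "YES", "NO", "NO", "YES", "NO"]
predicted = ["YES", "NO", "NO", "YES", "YES", "NO"]
cm = confusion_matrix(actual, predicted)
print(cm)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
print(precision(cm), accuracy(cm))
```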

Task-2:
Generate a dashboard of the preprocessed dataset from Task-1.
Extract as many insights as possible from the data by plotting a bar chart, box plot, pie chart and stacked plot in a Power BI dashboard.

In the visualization above, I plotted different features against the target variable Class/ASD.

I used a pie chart, a stacked bar chart and a stacked column chart.
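Behind these charts are simple aggregations: a stacked bar or column chart shows the count of rows for each (feature value, class) pair, and a pie chart shows each category's share of the total. The sketch below computes both with the standard library, as an illustration of what the Power BI visuals summarize rather than how Power BI computes them; the helper names and the sample `gender` and `asd` values are hypothetical.

```python
from collections import Counter


def crosstab(feature_values, target_values):
    """Count rows for each (feature value, target value) pair, as a stacked bar chart shows."""
    return Counter(zip(feature_values, target_values))


def shares(values):
    """Fraction of rows per category, as a pie chart shows."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


# Hypothetical sample values.
gender = ["m", "f", "f", "m", "f"]
asd = ["NO", "YES", "NO", "NO", "YES"]
print(crosstab(gender, asd))
print(shares(asd))  # {'NO': 0.6, 'YES': 0.4}
```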

The box plot can only be added after signing in to Power BI.
