Practical Exam : IT 441-Data Science

Dishank Jani
Nov 18, 2021

ID : 18IT038

Name : Dishank Jani

Task-1:
Dataset description using the Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Obtain the maximum possible classification accuracy by applying the following methods.
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection

Orange is a popular tool that makes data preprocessing easy and efficient.

Here I have used the Autistic Spectrum Disorder (ASD) Screening Dataset for Adult, which contains 704 instances and 21 attributes.

This is a binary classification problem: each instance is classified into one of two categories, Yes (1) or No (0), and the data is continuous.

The dataset is mainly used for ASD detection, that is, predicting whether or not a person has ASD.

Here is the workflow I created, where one branch uses preprocessing and the other does not.

Visual Design

Here Class/ASD is our target variable, which can be set through the Select Columns widget.

Selection of Target Variable

First, I used the Impute widget to fill in the null values with the average or most frequent value of each column.

Impute missing values
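To make the imputation step concrete, here is a minimal plain-Python sketch of the idea: numeric gaps are filled with the column mean and categorical gaps with the most frequent value. The function name `impute_column` and the toy columns are assumptions made for illustration; this is not Orange's own implementation and the values are not taken from the ASD dataset.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Fill None entries with the mean (numeric) or the most frequent value (categorical)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Toy columns, made up for illustration only.
ages = [25, None, 35, 30]
ethnicity = ["Asian", "White", None, "White"]

print(impute_column(ages))       # numeric column: gap filled with the mean
print(impute_column(ethnicity))  # categorical column: gap filled with the mode
```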

Next comes normalization, which scales the features to a comparable range so that features with large values do not dominate the others.

Similarly, I selected the relevant features to improve accuracy.

Normalization and Selection of Relevant Features
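As a rough illustration of what normalization does, the sketch below rescales a column to the interval [0, 1] (min-max scaling). The post does not state which normalization option was chosen in Orange, so min-max scaling is an assumption here, and `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up ages, not taken from the ASD dataset.
ages = [20, 30, 40, 60]
scaled = min_max_scale(ages)
print(scaled)
```

After scaling, every feature lies in the same range, which matters for distance-based models such as kNN.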

Once the preprocessing is finished, the model is saved.

It is important to divide the dataset into training and testing sets (80% and 20%); for that we use the Data Sampler widget.

For binary classification I have used kNN and Random Forest.
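The 80/20 split can be sketched in plain Python as a shuffle followed by a cut. The helper name `train_test_split`, the fixed seed, and the use of row indices in place of real records are assumptions for illustration; Orange's sampler may select rows differently.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into a training and a testing part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 704 row indices stand in for the 704 instances of the dataset.
rows = list(range(704))
train, test = train_test_split(rows)
print(len(train), len(test))
```

With 704 instances this gives 563 rows for training and 141 for testing.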
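For readers unfamiliar with kNN, the following is a minimal sketch of the idea: a new point gets the majority class of its k nearest training points. The function `knn_predict`, the choice of k = 3, Euclidean distance, and the toy points are all assumptions for illustration; the settings used in the Orange widget are not given in this post.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy, already-normalized 2-feature points (made up), labelled 0 = No and 1 = Yes.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = [0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.9, 0.9)))
```

Because the prediction depends entirely on distances, a feature with a much larger numeric range would dominate the vote if the data were left unscaled.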

Comparison of Model accuracy with and without Preprocessing

For kNN: without preprocessing I got a precision of 0.765, and with preprocessing it increased to 0.979.

A similar increase is observed for Random Forest, whose precision increased from 0.991 to 1 with preprocessing.

A similar difference is also visible in the confusion matrices.

Confusion Matrix Comparison for the kNN Model With and Without Preprocessing
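Since the comparison above reports precision, it may help to see how precision and accuracy are read off a binary confusion matrix. The counts below are hypothetical and are not the values from the screenshots; `precision` and `accuracy` are illustrative helper names.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 30, 10, 5, 55

print(precision(tp, fp))
print(accuracy(tp, fp, fn, tn))
```

The two metrics answer different questions, so a model can score well on one and less well on the other.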

Task-2:
Generate a dashboard of the preprocessed dataset from Task-1.
Find the maximum data insights by plotting a bar chart, box plot, pie plot, and stack plot using Power BI dashboard visualization.

Data Visualization using Power BI

For the visualization above, I have plotted different features against the target variable Class/ASD.

I have used a pie plot, a stacked bar chart, and a stacked column plot for visualization.

The box plot can be used once we have signed up.
