Structured Data Classification

Classification of structured data

At work, most of us create structured data, and its organised nature makes it straightforward to classify. This structured data is often the most sensitive, so it depends most heavily on classification for its protection.

What is structured data?

Structured data is data that has been classified and that, because it is highly organised, is typically hosted on platforms that segregate critical data, such as SharePoint, Documentum or SAP. Because these platforms store vast amounts of data, they use data classification to help ensure that each item is stored in the appropriate section, with permissions that match its level of sensitivity. As a result, only those with the right permissions can access your valuable data.
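To make the link between a classification label and a permission concrete, here is a minimal sketch. The sensitivity levels and the can_access helper are hypothetical names chosen for illustration; they are not taken from SharePoint, Documentum or SAP.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification labels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


def can_access(user_clearance: Sensitivity, record_label: Sensitivity) -> bool:
    """Allow access only when the user's clearance covers the record's label."""
    return user_clearance >= record_label


# A user cleared for INTERNAL data can read PUBLIC and INTERNAL records only.
print(can_access(Sensitivity.INTERNAL, Sensitivity.PUBLIC))
print(can_access(Sensitivity.INTERNAL, Sensitivity.CONFIDENTIAL))
```

Real platforms usually combine labels like these with groups and roles, so treat this as an illustration of the idea rather than a description of any product.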

Defining unstructured data

Unstructured data is data that, because it isn’t classified, is much harder to order, segregate and track. As a result, unstructured data is much harder to protect and control.

Structured data Hands-On


Welcome to Structured Data Classification (75 Min)

File Name: Structured_test

Step 1:- Import the libraries.

import pandas as pd

import numpy as np

Step 2:- Load the weather data from the CSV file.

weather = pd.read_csv('weather.csv', sep=',')

Step 3:- Inspect the shape, column names, summary statistics and first rows.

data_size=weather.shape

print(data_size)

weather_col_names = list(weather.columns)

print(weather_col_names)

print(weather.describe())

print(weather.head(3))
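Before moving on, it can also help to count missing values per column, since later steps impute them. The sketch below uses a small made-up frame in place of weather.csv; the column names mirror the ones used in this exercise, but the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the weather data (values are illustrative only).
toy = pd.DataFrame({
    "MinTemp": [13.4, 7.4, np.nan, 9.2],
    "RainToday": ["No", "Yes", "No", None],
})

# Number of missing entries in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Column types show which columns are numeric and which are text.
print(toy.dtypes)
```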

Step 4:- Select the target column.

weather_target = weather['RainTomorrow']

print(weather_target)
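A quick look at the class balance of the target helps when reading the accuracy scores later on. This is a minimal sketch on an invented series, not on the real RainTomorrow column.

```python
import pandas as pd

# Invented labels standing in for the RainTomorrow column.
toy_target = pd.Series(["No", "No", "Yes", "No"], name="RainTomorrow")

# Share of each class; a model that always predicts the majority class
# would reach an accuracy equal to the largest share.
class_share = toy_target.value_counts(normalize=True)
print(class_share)
```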

Step 5:- Drop the date and the target to get the feature columns.

cols_to_drop = ['Date', 'RainTomorrow']

weather_feature = weather.drop(cols_to_drop,axis = 1)

print(weather_feature.head(5))

Step 6:- List the categorical (text) columns.

weather_categorical = weather.select_dtypes(include=[object])

print(weather_categorical.head(15))

Step 7:- Convert the Yes/No column to a boolean.

yes_no_cols = ["RainToday"]

weather_feature[yes_no_cols] = weather_feature[yes_no_cols] == 'Yes'

print(weather_feature.head(5))
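One detail of the comparison in this step is that a missing RainToday value becomes False, just like 'No'. The sketch below shows this on invented data, together with one possible alternative (a dictionary mapping) that keeps missing values as NaN so they can be imputed later. The alternative is an assumption about what you might want, not part of the original exercise.

```python
import numpy as np
import pandas as pd

# Invented data with one missing entry.
toy = pd.DataFrame({"RainToday": ["Yes", "No", np.nan]})

# Same approach as the step above: the missing entry compares as False.
by_comparison = toy["RainToday"] == "Yes"
print(by_comparison.tolist())

# Alternative: map to 1.0/0.0 and leave the missing entry as NaN.
by_mapping = toy["RainToday"].map({"Yes": 1.0, "No": 0.0})
print(by_mapping.tolist())
```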

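Step 8:- One-hot encode the categorical columns and convert to a float matrix.

weather_dumm = pd.get_dummies(weather_feature, columns=["Location", "WindGustDir", "WindDir9am", "WindDir3pm"], prefix=["Location", "WindGustDir", "WindDir9am", "WindDir3pm"])

# np.float was removed in NumPy 1.24; the built-in float is equivalent here.
weather_matrix = weather_dumm.values.astype(float)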
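To see what one-hot encoding does, here is a small sketch on an invented two-column frame. Each distinct category becomes its own indicator column, named with the prefix followed by the category.

```python
import pandas as pd

# Invented data: one categorical column and one numeric column.
toy = pd.DataFrame({
    "WindGustDir": ["N", "SE", "N"],
    "MinTemp": [13.4, 7.4, 9.2],
})

# One indicator column per category; numeric columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["WindGustDir"], prefix=["WindGustDir"])
print(list(encoded.columns))

# Convert everything (including the indicator columns) to floats.
matrix = encoded.to_numpy(dtype=float)
print(matrix)
```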

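Step 9:- Replace missing values with the column mean.

from sklearn.impute import SimpleImputer

# The verbose argument was removed from SimpleImputer in scikit-learn 1.3, so it is omitted here.
imp = SimpleImputer(missing_values=np.nan, strategy='mean', fill_value=None, copy=True)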

weather_matrix=imp.fit_transform(weather_matrix)
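As a small illustration of mean imputation, the sketch below fills the gaps in an invented 3x2 array. Each NaN is replaced by the mean of the non-missing values in its column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Invented matrix with one missing value in each column.
toy = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer.fit_transform(toy)

# Column means learned from the non-missing entries.
print(imputer.statistics_)
print(filled)
```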

Step 10:- Standardize the features.

from sklearn.preprocessing import StandardScaler

#Standardize the data by removing the mean and scaling to unit variance

scaler = StandardScaler()

#Fit to data, then transform it.

weather_matrix = scaler.fit_transform(weather_matrix)
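The step above fits the scaler on the whole matrix. A common variation, which is not part of the original steps, is to fit the scaler on the training rows only and reuse it on the test rows, so that test data does not influence the scaling. A minimal sketch on invented numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented single-feature data, already split into train and test rows.
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0]])

# Learn mean and standard deviation from the training rows only.
scaler = StandardScaler().fit(train)

train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)  # reuses the training mean and std

print(train_scaled.ravel())
print(test_scaled.ravel())
```

After scaling, the training column has mean 0 and unit variance, while the test value is expressed relative to the training statistics.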

Step 11:- Split the data into training and test sets.

from sklearn.model_selection import train_test_split

seed=5000

train_data,test_data, train_label, test_label = train_test_split(weather_matrix,weather_target,test_size=0.1,random_state = seed)
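If the two classes are unbalanced, passing the labels to the stratify argument of train_test_split keeps the class proportions the same in both parts. This is an optional variation, sketched here on invented data rather than on the weather matrix.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented data: 20 rows, 15 labelled "No" and 5 labelled "Yes".
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array(["No"] * 15 + ["Yes"] * 5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5000, stratify=y
)

# The 75/25 class ratio is preserved in both parts.
print((y_train == "Yes").sum(), len(y_train))
print((y_test == "Yes").sum(), len(y_test))
```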

Step 12:- Train a linear SVM classifier, score it on the test set and save the score.

from sklearn.svm import SVC

classifier = SVC(kernel="linear", C=0.025, random_state=seed)

classifier = classifier.fit(train_data, train_label)

weather_predicted_target = classifier.predict(test_data)

score = classifier.score(test_data, test_label)

print('SVM Classifier : ', score)

with open('output.txt', 'w') as file:
    file.write(str(np.mean(score)))
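The score above is plain accuracy. A confusion matrix shows where the errors fall, which can be more informative when one class is rare. The sketch below uses synthetic data from make_classification as a stand-in for the weather matrix, so the numbers it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for the weather features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = SVC(kernel="linear", C=0.025, random_state=0).fit(X_train, y_train)
predicted = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, predicted)
accuracy = accuracy_score(y_test, predicted)

print(matrix)
print(accuracy)
```

The diagonal of the matrix counts correct predictions, so dividing its sum by the total number of test rows gives the same accuracy that classifier.score reports.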

Step 13:- Train a random forest classifier, score it on the test set and save the score.

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=10, random_state=seed)

classifier = classifier.fit(train_data, train_label)

weather_predicted_target = classifier.predict(test_data)

score = classifier.score(test_data, test_label)

print('Random Forest Classifier : ', score)

with open('output1.txt', 'w') as file:
    file.write(str(np.mean(score)))
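A fitted random forest also exposes feature_importances_, which gives a rough ranking of how much each column contributed to the splits. The sketch below uses synthetic data in place of the weather matrix; the forest settings match the step above, but the resulting ranking is purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 12 features, 4 of which carry signal.
X, y = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=0
)

forest = RandomForestClassifier(
    max_depth=5, n_estimators=10, max_features=10, random_state=0
).fit(X, y)

# One importance value per feature; the values sum to 1.
importances = forest.feature_importances_

# Feature indices ordered from most to least important.
ranked = importances.argsort()[::-1]
for index in ranked[:3]:
    print(index, round(importances[index], 3))
```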
