{
"cells": [
{
"cell_type": "markdown",
"id": "8d4bf69d",
"metadata": {},
"source": [
"# Voltage Quality Classification Model\n",
"The quality of power supplied to end-use equipment depends on the quality of the voltage supplied by the utility. Voltage is considered to be of good quality when it has its rated value at the rated frequency, with no distortion from a pure sine wave. The common voltage quality issues are:\n",
"- Voltage Sag\n",
"- Voltage Swell\n",
"- Voltage Flicker\n",
"- Voltage Harmonics\n",
"- Voltage Interruption\n",
"\n",
"Voltage quality issues must be classified so that the corresponding controller can be activated to mitigate each issue using a compensating device. The training data is generated by functions that simulate the power quality issues listed above."
]
},
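{
"cell_type": "markdown",
"id": "sketch-waveforms",
"metadata": {},
"source": [
"As a rough illustration of how such training data could be simulated, the sketch below builds one sine waveform and distorts it for several of the issues listed above. This is not the generator used to produce `Voltage Quality.csv`: the function name `one_cycle`, the peak value, and every scaling factor are assumptions chosen for illustration. The 34 samples per waveform mirror the range of the `Sample` column (1 to 34); treating that as one full cycle is itself an assumption. Flicker is left out because it is an amplitude modulation that spans several cycles.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"SAMPLES = 34    # mirrors the Sample column range (1 to 34)\n",
"PEAK = 325.0    # assumed peak of a 230 V RMS supply, not taken from the dataset\n",
"\n",
"def one_cycle(kind='Normal'):\n",
"    # Hypothetical generator: one sine cycle, distorted according to kind\n",
"    theta = 2 * np.pi * np.arange(1, SAMPLES + 1) / SAMPLES\n",
"    base = PEAK * np.sin(theta)\n",
"    if kind == 'Sag':            # reduced amplitude\n",
"        return 0.6 * base\n",
"    if kind == 'Swell':          # increased amplitude\n",
"        return 1.4 * base\n",
"    if kind == 'Interruption':   # voltage almost lost\n",
"        return 0.05 * base\n",
"    if kind == 'Harmonics':      # add a third-harmonic component\n",
"        return base + 0.2 * PEAK * np.sin(3 * theta)\n",
"    return base                  # 'Normal'\n",
"\n",
"# Print the peak magnitude of each simulated waveform\n",
"for kind in ['Normal', 'Sag', 'Swell', 'Interruption', 'Harmonics']:\n",
"    print(kind, round(float(np.abs(one_cycle(kind)).max()), 2))\n",
"```"
]
},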
{
"cell_type": "markdown",
"id": "8ba94acd",
"metadata": {},
"source": [
"# Installing the required packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4f7c0ea6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: xgboost in /opt/anaconda3/lib/python3.8/site-packages (1.4.2)\r\n",
"Requirement already satisfied: scipy in /opt/anaconda3/lib/python3.8/site-packages (from xgboost) (1.6.2)\r\n",
"Requirement already satisfied: numpy in /opt/anaconda3/lib/python3.8/site-packages (from xgboost) (1.20.1)\r\n"
]
}
],
"source": [
"!pip install xgboost"
]
},
{
"cell_type": "markdown",
"id": "18fc265e",
"metadata": {},
"source": [
"# Importing the required libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c8212aea",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sn\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import rcParams\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.svm import SVC\n",
"from sklearn.metrics import classification_report, confusion_matrix, accuracy_score\n",
"from xgboost import XGBClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"%matplotlib inline\n",
"\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "markdown",
"id": "0cf4bd32",
"metadata": {},
"source": [
"# Loading the data into the dataframe"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "be3e6804",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>56.46</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>111.20</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>162.57</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>209.00</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>249.09</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"0 1 56.46 Normal\n",
"1 2 111.20 Normal\n",
"2 3 162.57 Normal\n",
"3 4 209.00 Normal\n",
"4 5 249.09 Normal"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.read_csv(\"Voltage Quality.csv\")\n",
"test = pd.read_csv(\"Voltage Quality Test.csv\")\n",
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "959d88dc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>56.46</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>111.20</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>162.57</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>209.00</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>249.09</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"0 1 56.46 Normal\n",
"1 2 111.20 Normal\n",
"2 3 162.57 Normal\n",
"3 4 209.00 Normal\n",
"4 5 249.09 Normal"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.head()"
]
},
{
"cell_type": "markdown",
"id": "865fe385",
"metadata": {},
"source": [
"# Total number of rows and columns"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "34b1b141",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3366, 3)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.shape # 3366 rows and 3 columns"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c6fcc035",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1020, 3)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.shape"
]
},
{
"cell_type": "markdown",
"id": "91147657",
"metadata": {},
"source": [
"# Checking the type of data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a2612df2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 3366 entries, 0 to 3365\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Sample 3366 non-null int64 \n",
" 1 Voltage 3366 non-null float64\n",
" 2 Problem 3366 non-null object \n",
"dtypes: float64(1), int64(1), object(1)\n",
"memory usage: 79.0+ KB\n"
]
}
],
"source": [
"data.info()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4bda0fc9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1020 entries, 0 to 1019\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Sample 1020 non-null int64 \n",
" 1 Voltage 1020 non-null float64\n",
" 2 Problem 1020 non-null object \n",
"dtypes: float64(1), int64(1), object(1)\n",
"memory usage: 24.0+ KB\n"
]
}
],
"source": [
"test.info()"
]
},
{
"cell_type": "markdown",
"id": "593b3e7c",
"metadata": {},
"source": [
"# Checking for missing values"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a62ca453",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sample 0\n",
"Voltage 0\n",
"Problem 0\n",
"dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.isnull().sum() # No missing values"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ddf6406b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sample 0\n",
"Voltage 0\n",
"Problem 0\n",
"dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.isnull().sum()"
]
},
{
"cell_type": "markdown",
"id": "451f6cda",
"metadata": {},
"source": [
"# Checking for duplicates"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "48f717bc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"510"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.duplicated().sum() # 510 duplicates"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f1aa7d57",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"136"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.duplicated().sum()"
]
},
{
"cell_type": "markdown",
"id": "2f7bc812",
"metadata": {},
"source": [
"# Finding the unique values in problem column"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "66c66225",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Normal', 'Sag', 'Swell', 'Flicker', 'Interruption', 'Harmonics'],\n",
" dtype=object)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.Problem.unique() # 6 unique values"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e0974cd2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Normal', 'Sag', 'Swell', 'Flicker', 'Interruption', 'Harmonics'],\n",
" dtype=object)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.Problem.unique()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "99109e7f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>56.46</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>111.20</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>162.57</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>209.00</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>249.09</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"0 1 56.46 Normal\n",
"1 2 111.20 Normal\n",
"2 3 162.57 Normal\n",
"3 4 209.00 Normal\n",
"4 5 249.09 Normal"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "markdown",
"id": "1410da43",
"metadata": {},
"source": [
"# Analysing statistical data"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d35d8713",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>3366.000000</td>\n",
" <td>3366.000000</td>\n",
" <td>3366</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Harmonics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>578</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>17.500000</td>\n",
" <td>0.000226</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>9.812166</td>\n",
" <td>229.620084</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>-585.480000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>9.000000</td>\n",
" <td>-208.957500</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.500000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>26.000000</td>\n",
" <td>209.255000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>34.000000</td>\n",
" <td>585.480000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"count 3366.000000 3366.000000 3366\n",
"unique NaN NaN 6\n",
"top NaN NaN Harmonics\n",
"freq NaN NaN 578\n",
"mean 17.500000 0.000226 NaN\n",
"std 9.812166 229.620084 NaN\n",
"min 1.000000 -585.480000 NaN\n",
"25% 9.000000 -208.957500 NaN\n",
"50% 17.500000 0.000000 NaN\n",
"75% 26.000000 209.255000 NaN\n",
"max 34.000000 585.480000 NaN"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe(include=\"all\")"
]
},
{
"cell_type": "markdown",
"id": "553a108e",
"metadata": {},
"source": [
"# Finding outliers"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "51229178",
"metadata": {},
"outputs": [],
"source": [
"# Find outliers using the 1.5 * IQR rule\n",
"def findoutliers(column):\n",
"    outliers = []\n",
"    Q1 = column.quantile(.25)\n",
"    Q3 = column.quantile(.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_limit = Q1 - (1.5 * IQR)\n",
"    upper_limit = Q3 + (1.5 * IQR)\n",
"    # Collect every value that falls outside [lower_limit, upper_limit]\n",
"    for out1 in column:\n",
"        if out1 > upper_limit or out1 < lower_limit:\n",
"            outliers.append(out1)\n",
"\n",
"    return np.array(outliers)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "619f5bc4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(findoutliers(data.Voltage)) # No Voltage values are flagged as outliers"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "43cfccaa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(findoutliers(data.Sample)) # No Sample values are flagged as outliers"
]
},
{
"cell_type": "markdown",
"id": "5b9013f7",
"metadata": {},
"source": [
"# Data visualisation"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "f9b1a6f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Distribution plot of Voltage')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 504x360 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Voltage\n",
"fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 5))\n",
"sn.boxplot(y=data.Voltage, ax=ax1)\n",
"ax1.set_ylabel(data.Voltage.name)  # call the method instead of overwriting it\n",
"ax1.set_title('Box plot of {}'.format(data.Voltage.name))\n",
"sn.histplot(data.Voltage, kde=True, stat='density', ax=ax2)\n",
"ax2.set_title('Distribution plot of {}'.format(data.Voltage.name))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "784eada8",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Harmonics 578\n",
"Swell 578\n",
"Sag 578\n",
"Normal 578\n",
"Flicker 544\n",
"Interruption 510\n",
"Name: Problem, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Problem\n",
"data.Problem.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "012d7672",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7ffee1309b50>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1440x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pl = sn.catplot(x='Problem', data=data, aspect=4, kind='count')\n",
"pl.set_xticklabels()"
]
},
{
"cell_type": "markdown",
"id": "07cb4c79",
"metadata": {},
"source": [
"# Identifying the independent and dependent variables"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "d18b49b6",
"metadata": {},
"outputs": [],
"source": [
"X = data.iloc[:,:-1] # Independent variables: Sample and Voltage\n",
"y = data.Problem # Dependent variable: the class label"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "a22d8277",
"metadata": {},
"outputs": [],
"source": [
"x_test = test.iloc[:,:-1]\n",
"y_test = test.Problem "
]
},
{
"cell_type": "markdown",
"id": "57dfd617",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "markdown",
"id": "50e0afdc",
"metadata": {},
"source": [
"# Defining the model"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "56e08b15",
"metadata": {},
"outputs": [],
"source": [
"model_dt = DecisionTreeClassifier()"
]
},
{
"cell_type": "markdown",
"id": "0626248b",
"metadata": {},
"source": [
"# Training the model"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "91e5d0f2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier()"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_dt.fit(X,y)"
]
},
{
"cell_type": "markdown",
"id": "7a358b81",
"metadata": {},
"source": [
"# Testing the model"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "5ef648e8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.95"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict_dt = model_dt.predict(x_test)\n",
"as_dt = accuracy_score(y_test,y_predict_dt)\n",
"as_dt"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "894932ae",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" Flicker 0.92 1.00 0.96 170\n",
" Harmonics 0.91 0.85 0.88 170\n",
"Interruption 1.00 1.00 1.00 170\n",
" Normal 0.95 1.00 0.97 170\n",
" Sag 0.98 0.92 0.95 170\n",
" Swell 0.95 0.92 0.93 170\n",
"\n",
" accuracy 0.95 1020\n",
" macro avg 0.95 0.95 0.95 1020\n",
"weighted avg 0.95 0.95 0.95 1020\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,y_predict_dt))"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "d9a53648",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[170, 0, 0, 0, 0, 0],\n",
" [ 7, 145, 0, 5, 4, 9],\n",
" [ 0, 0, 170, 0, 0, 0],\n",
" [ 0, 0, 0, 170, 0, 0],\n",
" [ 3, 10, 0, 0, 157, 0],\n",
" [ 4, 5, 0, 4, 0, 157]])"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confusion_matrix(y_test,y_predict_dt)"
]
},
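{
"cell_type": "markdown",
"id": "sketch-labelled-cm",
"metadata": {},
"source": [
"`confusion_matrix` returns a bare array whose rows are the actual classes and whose columns are the predicted classes, both in sorted label order. A labelled table can be easier to read. The sketch below is one possible way to build it; the `y_true` and `y_pred` lists are made-up values for illustration, not results from this notebook.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"# Made-up labels for illustration only\n",
"y_true = ['Sag', 'Swell', 'Sag', 'Normal', 'Swell', 'Normal']\n",
"y_pred = ['Sag', 'Sag', 'Sag', 'Normal', 'Swell', 'Normal']\n",
"labels = ['Normal', 'Sag', 'Swell']\n",
"\n",
"# Passing labels fixes the row and column order explicitly\n",
"cm = pd.DataFrame(confusion_matrix(y_true, y_pred, labels=labels),\n",
"                  index=labels, columns=labels)\n",
"cm.index.name = 'actual'\n",
"cm.columns.name = 'predicted'\n",
"print(cm)\n",
"```\n",
"\n",
"With the notebook's own variables, the same pattern would take `y_test`, a prediction array, and the sorted class names."
]
},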
{
"cell_type": "markdown",
"id": "03be07e5",
"metadata": {},
"source": [
"# XGBoost"
]
},
{
"cell_type": "markdown",
"id": "10b72622",
"metadata": {},
"source": [
"# Defining the model"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "d9b1d0d6",
"metadata": {},
"outputs": [],
"source": [
"model_xgb = XGBClassifier(n_estimators=300)"
]
},
{
"cell_type": "markdown",
"id": "473fe685",
"metadata": {},
"source": [
"# Training the model"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "10d8cbdc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[21:26:24] WARNING: /opt/concourse/worker/volumes/live/7a2b9f41-3287-451b-6691-43e9a6c0910f/volume/xgboost-split_1619728204606/work/src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softprob' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n"
]
},
{
"data": {
"text/plain": [
"XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
" colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
" importance_type='gain', interaction_constraints='',\n",
" learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
" min_child_weight=1, missing=nan, monotone_constraints='()',\n",
" n_estimators=300, n_jobs=16, num_parallel_tree=1,\n",
" objective='multi:softprob', random_state=0, reg_alpha=0,\n",
" reg_lambda=1, scale_pos_weight=None, subsample=1,\n",
" tree_method='exact', validate_parameters=1, verbosity=None)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_xgb.fit(X,y)"
]
},
{
"cell_type": "markdown",
"id": "8ffd720a",
"metadata": {},
"source": [
"# Testing the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab0a0c92",
"metadata": {},
"outputs": [],
"source": [
"# Predict with the XGBoost model (model_xgb), not the decision tree\n",
"y_predict_xgb = model_xgb.predict(x_test)\n",
"as_xgb = accuracy_score(y_test,y_predict_xgb)\n",
"as_xgb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f98f8111",
"metadata": {},
"outputs": [],
"source": [
"print(classification_report(y_test,y_predict_xgb))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebdecc0c",
"metadata": {},
"outputs": [],
"source": [
"confusion_matrix(y_test,y_predict_xgb)"
]
},
{
"cell_type": "markdown",
"id": "f5dc8e32",
"metadata": {},
"source": [
"# Random Forest"
]
},
{
"cell_type": "markdown",
"id": "d8d6a4f0",
"metadata": {},
"source": [
"# Defining the model"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "8a51ef49",
"metadata": {},
"outputs": [],
"source": [
"model_rf = RandomForestClassifier()"
]
},
{
"cell_type": "markdown",
"id": "11bf5e15",
"metadata": {},
"source": [
"# Training the model"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "8c568b29",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RandomForestClassifier()"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_rf.fit(X,y)"
]
},
{
"cell_type": "markdown",
"id": "bb844013",
"metadata": {},
"source": [
"# Testing the model"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "654580cb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9774509803921568"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict_rf = model_rf.predict(x_test)\n",
"as_rf = accuracy_score(y_test,y_predict_rf)\n",
"as_rf"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "afa6cdb1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" Flicker 0.97 1.00 0.99 170\n",
" Harmonics 0.94 0.92 0.93 170\n",
"Interruption 1.00 1.00 1.00 170\n",
" Normal 1.00 1.00 1.00 170\n",
" Sag 0.98 0.96 0.97 170\n",
" Swell 0.97 0.98 0.97 170\n",
"\n",
" accuracy 0.98 1020\n",
" macro avg 0.98 0.98 0.98 1020\n",
"weighted avg 0.98 0.98 0.98 1020\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,y_predict_rf))"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "cba6c72b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[170, 0, 0, 0, 0, 0],\n",
" [ 5, 157, 0, 0, 3, 5],\n",
" [ 0, 0, 170, 0, 0, 0],\n",
" [ 0, 0, 0, 170, 0, 0],\n",
" [ 0, 6, 0, 0, 164, 0],\n",
" [ 0, 4, 0, 0, 0, 166]])"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confusion_matrix(y_test,y_predict_rf)"
]
},
{
"cell_type": "markdown",
"id": "80d38c0b",
"metadata": {},
"source": [
"# Model Evaluation"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "aa4afe29",
"metadata": {},
"outputs": [],
"source": [
"Accuracy_Score = [as_dt,as_rf,as_xgb]\n",
"Models = ['Decision Tree', 'Random Forest','XGBoost']"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "bae685ee",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sn.barplot(x=Accuracy_Score, y=Models, color=\"r\")\n",
"plt.xlabel('Accuracy Score')\n",
"plt.title('Accuracy Score')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "e59d07ca",
"metadata": {},
"source": [
"Random Forest appears to have the highest accuracy score of the three models, so we proceed with the Random Forest algorithm."
]
},
{
"cell_type": "markdown",
"id": "f20c8195",
"metadata": {},
"source": [
"# Hyperparameter tuning (using grid search)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "e513bf46",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 5 folds for each of 120 candidates, totalling 600 fits\n"
]
},
{
"data": {
"text/plain": [
"GridSearchCV(cv=5, estimator=RandomForestClassifier(),\n",
" param_grid={'max_depth': [10, 15, 14, 13, 12],\n",
" 'n_estimators': [150, 160, 170, 180, 190, 200],\n",
" 'random_state': [4, 5, 6, 7]},\n",
" verbose=1)"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"parameters = {'max_depth':[10,15,14,13,12],\n",
" 'random_state': [4,5,6,7],\n",
" 'n_estimators':[150,160,170,180,190,200]}\n",
"\n",
"grid = GridSearchCV(model_rf,parameters,cv=5,verbose=1)\n",
"grid.fit(X,y)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "10a4aca4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8773607700142415"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grid.best_score_"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "94bb569e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'max_depth': 13, 'n_estimators': 190, 'random_state': 5}"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grid.best_params_"
]
},
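{
"cell_type": "markdown",
"id": "sketch-best-estimator",
"metadata": {},
"source": [
"With the default `refit=True`, `GridSearchCV` retrains the best parameter combination on the full training data and exposes it as `best_estimator_`, so the winning model can be used directly. The sketch below shows this on synthetic stand-in data from `make_classification`; the data, the small grid, and the names `X_demo`, `y_demo`, and `search` are assumptions for illustration. It also fixes `random_state` on the estimator instead of searching over it, which is a design choice rather than something this notebook does.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"# Synthetic stand-in data (assumption), not the voltage dataset\n",
"X_demo, y_demo = make_classification(n_samples=200, n_features=4,\n",
"                                     n_informative=3, n_redundant=0,\n",
"                                     n_classes=3, random_state=0)\n",
"\n",
"param_grid = {'max_depth': [3, 5], 'n_estimators': [20, 40]}\n",
"search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)\n",
"search.fit(X_demo, y_demo)\n",
"\n",
"# best_estimator_ is already refitted on all of X_demo with the best parameters\n",
"best_model = search.best_estimator_\n",
"print(search.best_params_)\n",
"print(best_model.predict(X_demo[:5]))\n",
"```"
]
},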
{
"cell_type": "markdown",
"id": "71b111fc",
"metadata": {},
"source": [
"# Training Random Forest with best parameters"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "0ee81152",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RandomForestClassifier(max_depth=13, n_estimators=190, random_state=5)"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_rf = RandomForestClassifier(n_estimators=190,max_depth=13,random_state=5)\n",
"model_rf.fit(X,y)"
]
},
{
"cell_type": "markdown",
"id": "77dd9b12",
"metadata": {},
"source": [
"# Testing Random Forest with best parameters"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "ed03903a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9892156862745098"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = model_rf.predict(x_test)\n",
"accuracy_score(y_test,y_predict)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "96b24b5a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" Flicker 1.00 1.00 1.00 170\n",
" Harmonics 0.94 1.00 0.97 170\n",
"Interruption 1.00 1.00 1.00 170\n",
" Normal 1.00 1.00 1.00 170\n",
" Sag 1.00 0.96 0.98 170\n",
" Swell 1.00 0.98 0.99 170\n",
"\n",
" accuracy 0.99 1020\n",
" macro avg 0.99 0.99 0.99 1020\n",
"weighted avg 0.99 0.99 0.99 1020\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,y_predict))"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "9b77ef9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[170, 0, 0, 0, 0, 0],\n",
" [ 0, 170, 0, 0, 0, 0],\n",
" [ 0, 0, 170, 0, 0, 0],\n",
" [ 0, 0, 0, 170, 0, 0],\n",
" [ 0, 7, 0, 0, 163, 0],\n",
" [ 0, 4, 0, 0, 0, 166]])"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confusion_matrix(y_test,y_predict)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "596a373a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}