{
"cells": [
{
"cell_type": "markdown",
"id": "8d4bf69d",
"metadata": {},
"source": [
"# Voltage Quality Classification Model\n",
"The quality of the power supplied to end-use equipment depends on the quality of the voltage supplied by the utility. Voltage is considered to be of good quality if it has its rated value at the rated frequency, with no distortion from a pure sine wave. The common voltage quality issues are:\n",
"- Voltage Sag\n",
"- Voltage Swell\n",
"- Voltage Flicker\n",
"- Voltage Harmonics\n",
"- Voltage Interruption\n",
"\n",
"Classifying the voltage quality issue is necessary in order to activate the corresponding controller, which mitigates the issue using a compensating device. The training data is generated by functions that simulate the power quality issues listed above."
]
},
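{
"cell_type": "markdown",
"id": "sketch-signal-md",
"metadata": {},
"source": [
"The generator functions for the training data are not shown in this notebook. A minimal sketch of how such signals might be simulated is given below. The peak value, the angular step, the record length and the function names `simulate_normal`, `simulate_sag` and `simulate_swell` are all assumptions chosen for illustration, not the original implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-signal-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Assumed parameters (not taken from the original generator functions)\n",
"V_PEAK = 325.0    # approximate peak of a 230 V RMS supply\n",
"STEP_DEG = 10     # assumed angular step between consecutive samples\n",
"N_SAMPLES = 34    # samples per record, matching the range of the Sample column\n",
"\n",
"def simulate_normal():\n",
"    # One record of an undistorted sine wave\n",
"    angles = np.deg2rad(STEP_DEG * np.arange(1, N_SAMPLES + 1))\n",
"    return V_PEAK * np.sin(angles)\n",
"\n",
"def simulate_sag(depth=0.3):\n",
"    # Hypothetical sag model: amplitude reduced by the fraction `depth`\n",
"    return (1.0 - depth) * simulate_normal()\n",
"\n",
"def simulate_swell(rise=0.3):\n",
"    # Hypothetical swell model: amplitude increased by the fraction `rise`\n",
"    return (1.0 + rise) * simulate_normal()\n",
"\n",
"normal = simulate_normal()\n",
"sag = simulate_sag()\n",
"print(normal[:3].round(2), sag[:3].round(2))"
]
},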
{
"cell_type": "markdown",
"id": "8ba94acd",
"metadata": {},
"source": [
"# Installing the required packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4f7c0ea6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: xgboost in /opt/anaconda3/lib/python3.8/site-packages (1.4.2)\r\n",
"Requirement already satisfied: scipy in /opt/anaconda3/lib/python3.8/site-packages (from xgboost) (1.6.2)\r\n",
"Requirement already satisfied: numpy in /opt/anaconda3/lib/python3.8/site-packages (from xgboost) (1.20.1)\r\n"
]
}
],
"source": [
"!pip install xgboost"
]
},
{
"cell_type": "markdown",
"id": "18fc265e",
"metadata": {},
"source": [
"# Importing the required libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c8212aea",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sn\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib import rcParams\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.svm import SVC\n",
"from sklearn.metrics import classification_report, confusion_matrix, accuracy_score\n",
"from xgboost import XGBClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"%matplotlib inline\n",
"\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "markdown",
"id": "0cf4bd32",
"metadata": {},
"source": [
"# Loading the data into the dataframe"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "be3e6804",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>56.46</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>111.20</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>162.57</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>209.00</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>249.09</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"0 1 56.46 Normal\n",
"1 2 111.20 Normal\n",
"2 3 162.57 Normal\n",
"3 4 209.00 Normal\n",
"4 5 249.09 Normal"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data = pd.read_csv(\"Voltage Quality.csv\")\n",
"test = pd.read_csv(\"Voltage Quality Test.csv\")\n",
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "959d88dc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>56.46</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>111.20</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>162.57</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>209.00</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>249.09</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"0 1 56.46 Normal\n",
"1 2 111.20 Normal\n",
"2 3 162.57 Normal\n",
"3 4 209.00 Normal\n",
"4 5 249.09 Normal"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.head()"
]
},
{
"cell_type": "markdown",
"id": "865fe385",
"metadata": {},
"source": [
"# Total number of rows and columns"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "34b1b141",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3366, 3)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.shape # 3366 rows and 3 columns"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "c6fcc035",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1020, 3)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.shape"
]
},
{
"cell_type": "markdown",
"id": "91147657",
"metadata": {},
"source": [
"# Checking the type of data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a2612df2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 3366 entries, 0 to 3365\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Sample 3366 non-null int64 \n",
" 1 Voltage 3366 non-null float64\n",
" 2 Problem 3366 non-null object \n",
"dtypes: float64(1), int64(1), object(1)\n",
"memory usage: 79.0+ KB\n"
]
}
],
"source": [
"data.info()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4bda0fc9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1020 entries, 0 to 1019\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Sample 1020 non-null int64 \n",
" 1 Voltage 1020 non-null float64\n",
" 2 Problem 1020 non-null object \n",
"dtypes: float64(1), int64(1), object(1)\n",
"memory usage: 24.0+ KB\n"
]
}
],
"source": [
"test.info()"
]
},
{
"cell_type": "markdown",
"id": "593b3e7c",
"metadata": {},
"source": [
"# Checking for missing values"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a62ca453",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sample 0\n",
"Voltage 0\n",
"Problem 0\n",
"dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.isnull().sum() # No missing values"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "ddf6406b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sample 0\n",
"Voltage 0\n",
"Problem 0\n",
"dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.isnull().sum()"
]
},
{
"cell_type": "markdown",
"id": "451f6cda",
"metadata": {},
"source": [
"# Checking for duplicates"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "48f717bc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"510"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.duplicated().sum() # 510 duplicates"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f1aa7d57",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"136"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.duplicated().sum()"
]
},
{
"cell_type": "markdown",
"id": "2f7bc812",
"metadata": {},
"source": [
"# Finding the unique values in problem column"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "66c66225",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Normal', 'Sag', 'Swell', 'Flicker', 'Interruption', 'Harmonics'],\n",
" dtype=object)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.Problem.unique() # 6 unique values"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e0974cd2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Normal', 'Sag', 'Swell', 'Flicker', 'Interruption', 'Harmonics'],\n",
" dtype=object)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test.Problem.unique()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "99109e7f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>56.46</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>111.20</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>162.57</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>209.00</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>249.09</td>\n",
" <td>Normal</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"0 1 56.46 Normal\n",
"1 2 111.20 Normal\n",
"2 3 162.57 Normal\n",
"3 4 209.00 Normal\n",
"4 5 249.09 Normal"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "markdown",
"id": "1410da43",
"metadata": {},
"source": [
"# Analysing statistical data"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "d35d8713",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sample</th>\n",
" <th>Voltage</th>\n",
" <th>Problem</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>3366.000000</td>\n",
" <td>3366.000000</td>\n",
" <td>3366</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Harmonics</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>578</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>17.500000</td>\n",
" <td>0.000226</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>9.812166</td>\n",
" <td>229.620084</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>-585.480000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>9.000000</td>\n",
" <td>-208.957500</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.500000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>26.000000</td>\n",
" <td>209.255000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>34.000000</td>\n",
" <td>585.480000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sample Voltage Problem\n",
"count 3366.000000 3366.000000 3366\n",
"unique NaN NaN 6\n",
"top NaN NaN Harmonics\n",
"freq NaN NaN 578\n",
"mean 17.500000 0.000226 NaN\n",
"std 9.812166 229.620084 NaN\n",
"min 1.000000 -585.480000 NaN\n",
"25% 9.000000 -208.957500 NaN\n",
"50% 17.500000 0.000000 NaN\n",
"75% 26.000000 209.255000 NaN\n",
"max 34.000000 585.480000 NaN"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.describe(include=\"all\")"
]
},
{
"cell_type": "markdown",
"id": "553a108e",
"metadata": {},
"source": [
"# Finding outliers"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "51229178",
"metadata": {},
"outputs": [],
"source": [
"# Function to find the outliers\n",
"def findoutliers(column):\n",
"    outliers = []\n",
"    Q1 = column.quantile(.25)\n",
"    Q3 = column.quantile(.75)\n",
"    IQR = Q3 - Q1\n",
"    lower_limit = Q1 - (1.5 * IQR)\n",
"    upper_limit = Q3 + (1.5 * IQR)\n",
"    for out1 in column:\n",
"        if out1 > upper_limit or out1 < lower_limit:\n",
"            outliers.append(out1)\n",
"\n",
"    return np.array(outliers)"
]
},
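{
"cell_type": "markdown",
"id": "sketch-iqr-md",
"metadata": {},
"source": [
"The same 1.5 * IQR rule can also be written without an explicit loop by using a boolean mask. The following is a sketch only; `find_outliers_vectorized` and the `demo` series are hypothetical names introduced for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-iqr-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def find_outliers_vectorized(column):\n",
"    # Quartiles and interquartile range of the column\n",
"    q1, q3 = column.quantile([0.25, 0.75])\n",
"    iqr = q3 - q1\n",
"    # True where a value lies outside the 1.5 * IQR fences\n",
"    mask = (column < q1 - 1.5 * iqr) | (column > q3 + 1.5 * iqr)\n",
"    return column[mask].to_numpy()\n",
"\n",
"# Made-up series with one obviously extreme value\n",
"demo = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])\n",
"find_outliers_vectorized(demo)"
]
},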
{
"cell_type": "code",
"execution_count": 18,
"id": "619f5bc4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(findoutliers(data.Voltage)) # None of the Voltage values are outliers"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "43cfccaa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(findoutliers(data.Sample)) # None of the Sample values are outliers"
]
},
{
"cell_type": "markdown",
"id": "5b9013f7",
"metadata": {},
"source": [
"# Data visualisation"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "f9b1a6f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Distribution plot of Voltage')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "<base64 PNG data omitted: box plot and distribution plot of Voltage>",
"text/plain": [
"<Figure size 504x360 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Voltage\n",
"fig, (ax1,ax2) = plt.subplots(1,2,figsize=(7,5))\n",
"sn.boxplot(data.Voltage, orient='v',ax=ax1)\n",
"ax1.set_ylabel(data.Voltage.name)  # call the setter; assigning to it would overwrite the method\n",
"ax1.set_title('Box plot of {}'.format(data.Voltage.name))\n",
"sn.distplot(data.Voltage,ax=ax2) \n",
"ax2.set_title('Distribution plot of {}'.format(data.Voltage.name))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "784eada8",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Harmonics 578\n",
"Swell 578\n",
"Sag 578\n",
"Normal 578\n",
"Flicker 544\n",
"Interruption 510\n",
"Name: Problem, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Problem\n",
"data.Problem.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "012d7672",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7ffee1309b50>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "<base64 PNG data omitted: count plot of the Problem classes>",
"text/plain": [
"<Figure size 1440x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pl = sn.catplot(x='Problem', data=data, aspect=4, kind='count')  # catplot is the current name of factorplot\n",
"pl.set_xticklabels()"
]
},
{
"cell_type": "markdown",
"id": "07cb4c79",
"metadata": {},
"source": [
"# Identifying the independent and dependent variables"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "d18b49b6",
"metadata": {},
"outputs": [],
"source": [
"X = data.iloc[:,:-1] # Independent variables: Sample and Voltage\n",
"y = data.Problem # Dependent variable: the class label"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "a22d8277",
"metadata": {},
"outputs": [],
"source": [
"x_test = test.iloc[:,:-1]\n",
"y_test = test.Problem "
]
},
{
"cell_type": "markdown",
"id": "57dfd617",
"metadata": {},
"source": [
"# Decision Tree"
]
},
{
"cell_type": "markdown",
"id": "50e0afdc",
"metadata": {},
"source": [
"# Defining the model"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "56e08b15",
"metadata": {},
"outputs": [],
"source": [
"model_dt = DecisionTreeClassifier()"
]
},
{
"cell_type": "markdown",
"id": "0626248b",
"metadata": {},
"source": [
"# Training the model"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "91e5d0f2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier()"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_dt.fit(X,y)"
]
},
{
"cell_type": "markdown",
"id": "7a358b81",
"metadata": {},
"source": [
"# Testing the model"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "5ef648e8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.95"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict_dt = model_dt.predict(x_test)\n",
"as_dt = accuracy_score(y_test,y_predict_dt)\n",
"as_dt"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "894932ae",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" Flicker 0.92 1.00 0.96 170\n",
" Harmonics 0.91 0.85 0.88 170\n",
"Interruption 1.00 1.00 1.00 170\n",
" Normal 0.95 1.00 0.97 170\n",
" Sag 0.98 0.92 0.95 170\n",
" Swell 0.95 0.92 0.93 170\n",
"\n",
" accuracy 0.95 1020\n",
" macro avg 0.95 0.95 0.95 1020\n",
"weighted avg 0.95 0.95 0.95 1020\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,y_predict_dt))"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "d9a53648",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[170, 0, 0, 0, 0, 0],\n",
" [ 7, 145, 0, 5, 4, 9],\n",
" [ 0, 0, 170, 0, 0, 0],\n",
" [ 0, 0, 0, 170, 0, 0],\n",
" [ 3, 10, 0, 0, 157, 0],\n",
" [ 4, 5, 0, 4, 0, 157]])"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confusion_matrix(y_test,y_predict_dt)"
]
},
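{
"cell_type": "markdown",
"id": "sketch-cm-md",
"metadata": {},
"source": [
"`confusion_matrix` returns a bare array whose rows (actual class) and columns (predicted class) follow the sorted order of the labels, which here is Flicker, Harmonics, Interruption, Normal, Sag, Swell. A small sketch of wrapping such an array in a labelled DataFrame follows; the toy `y_true` and `y_pred` lists are made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-cm-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.metrics import confusion_matrix\n",
"\n",
"# Made-up labels standing in for y_test and a model's predictions\n",
"y_true = ['Sag', 'Swell', 'Normal', 'Sag', 'Normal', 'Swell']\n",
"y_pred = ['Sag', 'Normal', 'Normal', 'Sag', 'Normal', 'Swell']\n",
"\n",
"# Fix the label order explicitly so rows and columns can be named\n",
"labels = sorted(set(y_true))\n",
"cm = confusion_matrix(y_true, y_pred, labels=labels)\n",
"\n",
"cm_df = pd.DataFrame(cm, index=labels, columns=labels)\n",
"cm_df.index.name = 'actual'\n",
"cm_df.columns.name = 'predicted'\n",
"cm_df"
]
},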
{
"cell_type": "markdown",
"id": "03be07e5",
"metadata": {},
"source": [
"# XGBoost"
]
},
{
"cell_type": "markdown",
"id": "10b72622",
"metadata": {},
"source": [
"# Defining the model"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "d9b1d0d6",
"metadata": {},
"outputs": [],
"source": [
"model_xgb = XGBClassifier(n_estimators=300)"
]
},
{
"cell_type": "markdown",
"id": "473fe685",
"metadata": {},
"source": [
"# Training the model"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "10d8cbdc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[21:26:24] WARNING: /opt/concourse/worker/volumes/live/7a2b9f41-3287-451b-6691-43e9a6c0910f/volume/xgboost-split_1619728204606/work/src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'multi:softprob' was changed from 'merror' to 'mlogloss'. Explicitly set eval_metric if you'd like to restore the old behavior.\n"
]
},
{
"data": {
"text/plain": [
"XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
" colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
" importance_type='gain', interaction_constraints='',\n",
" learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
" min_child_weight=1, missing=nan, monotone_constraints='()',\n",
" n_estimators=300, n_jobs=16, num_parallel_tree=1,\n",
" objective='multi:softprob', random_state=0, reg_alpha=0,\n",
" reg_lambda=1, scale_pos_weight=None, subsample=1,\n",
" tree_method='exact', validate_parameters=1, verbosity=None)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_xgb.fit(X,y)"
]
},
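{
"cell_type": "markdown",
"id": "sketch-labels-md",
"metadata": {},
"source": [
"The call above passes string class labels directly to `XGBClassifier.fit`, which ran with the XGBoost 1.4.2 installation shown earlier. Some newer XGBoost releases expect integer labels from 0 to n_classes - 1 instead. If that applies to your version, one option is to encode the labels with scikit-learn's `LabelEncoder`, as in this sketch on synthetic data (`X_demo` and `y_demo` are hypothetical)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-labels-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import LabelEncoder\n",
"from xgboost import XGBClassifier\n",
"\n",
"# Synthetic features and string labels, for illustration only\n",
"rng = np.random.default_rng(0)\n",
"X_demo = rng.normal(size=(60, 2))\n",
"y_demo = np.array(['Normal', 'Sag', 'Swell'] * 20)\n",
"\n",
"# Map class names to integers 0..n_classes-1\n",
"encoder = LabelEncoder()\n",
"y_encoded = encoder.fit_transform(y_demo)\n",
"\n",
"model = XGBClassifier(n_estimators=10)\n",
"model.fit(X_demo, y_encoded)\n",
"\n",
"# Map integer predictions back to class names\n",
"pred = encoder.inverse_transform(model.predict(X_demo))\n",
"print(pred[:5])"
]
},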
{
"cell_type": "markdown",
"id": "8ffd720a",
"metadata": {},
"source": [
"# Testing the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab0a0c92",
"metadata": {},
"outputs": [],
"source": [
"# Predict with the XGBoost model; calling model_dt here would only reproduce the decision-tree results\n",
"y_predict_xgb = model_xgb.predict(x_test)\n",
"as_xgb = accuracy_score(y_test,y_predict_xgb)\n",
"as_xgb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f98f8111",
"metadata": {},
"outputs": [],
"source": [
"print(classification_report(y_test,y_predict_xgb))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebdecc0c",
"metadata": {},
"outputs": [],
"source": [
"confusion_matrix(y_test,y_predict_xgb)"
]
},
{
"cell_type": "markdown",
"id": "f5dc8e32",
"metadata": {},
"source": [
"# Random Forest"
]
},
{
"cell_type": "markdown",
"id": "d8d6a4f0",
"metadata": {},
"source": [
"# Defining the model"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "8a51ef49",
"metadata": {},
"outputs": [],
"source": [
"model_rf = RandomForestClassifier()"
]
},
{
"cell_type": "markdown",
"id": "11bf5e15",
"metadata": {},
"source": [
"# Training the model"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "8c568b29",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RandomForestClassifier()"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_rf.fit(X,y)"
]
},
{
"cell_type": "markdown",
"id": "bb844013",
"metadata": {},
"source": [
"# Testing the model"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "654580cb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9774509803921568"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict_rf = model_rf.predict(x_test)\n",
"as_rf = accuracy_score(y_test,y_predict_rf)\n",
"as_rf"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "afa6cdb1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" Flicker 0.97 1.00 0.99 170\n",
" Harmonics 0.94 0.92 0.93 170\n",
"Interruption 1.00 1.00 1.00 170\n",
" Normal 1.00 1.00 1.00 170\n",
" Sag 0.98 0.96 0.97 170\n",
" Swell 0.97 0.98 0.97 170\n",
"\n",
" accuracy 0.98 1020\n",
" macro avg 0.98 0.98 0.98 1020\n",
"weighted avg 0.98 0.98 0.98 1020\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,y_predict_rf))"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "cba6c72b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[170, 0, 0, 0, 0, 0],\n",
" [ 5, 157, 0, 0, 3, 5],\n",
" [ 0, 0, 170, 0, 0, 0],\n",
" [ 0, 0, 0, 170, 0, 0],\n",
" [ 0, 6, 0, 0, 164, 0],\n",
" [ 0, 4, 0, 0, 0, 166]])"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confusion_matrix(y_test,y_predict_rf)"
]
},
{
"cell_type": "markdown",
"id": "80d38c0b",
"metadata": {},
"source": [
"# Model Evaluation"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "aa4afe29",
"metadata": {},
"outputs": [],
"source": [
"Accuracy_Score = [as_dt,as_rf,as_xgb]\n",
"Models = ['Decision Tree', 'Random Forest','XG Boost']"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "bae685ee",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sn.barplot(x=Accuracy_Score, y=Models, color=\"r\")\n",
"plt.xlabel('Accuracy Score')\n",
"plt.title('Accuracy Score')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "e59d07ca",
"metadata": {},
"source": [
"Random Forest achieves the highest accuracy score in this comparison (about 0.977 on the test set), so we proceed with the Random Forest algorithm."
]
},
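{
"cell_type": "markdown",
"id": "c4a71e02",
"metadata": {},
"source": [
"Accuracy from a single train/test split can rank closely matched models in the wrong order. As a hedged sketch, the cell below repeats this kind of comparison with 5-fold cross-validation. It is self-contained: synthetic data from `make_classification` stands in for the notebook's features, and every name ending in `_demo` is hypothetical rather than part of the original workflow. Other classifiers could be added to the dictionary in the same way."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4a71e03",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"# Assumption: synthetic 6-class data standing in for the notebook's X and y\n",
"X_demo, y_demo = make_classification(n_samples=600, n_features=20,\n",
"                                     n_informative=10, n_classes=6,\n",
"                                     random_state=0)\n",
"\n",
"models_demo = {\n",
"    'Decision Tree': DecisionTreeClassifier(random_state=0),\n",
"    'Random Forest': RandomForestClassifier(n_estimators=50, random_state=0),\n",
"}\n",
"\n",
"# Mean and spread of accuracy over 5 folds for each candidate model\n",
"for name_demo, model_demo in models_demo.items():\n",
"    scores_demo = cross_val_score(model_demo, X_demo, y_demo, cv=5)\n",
"    print(f'{name_demo}: mean={scores_demo.mean():.3f}, std={scores_demo.std():.3f}')"
]
},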
{
"cell_type": "markdown",
"id": "f20c8195",
"metadata": {},
"source": [
"# Hyperparameter tuning (using grid search)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "e513bf46",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting 5 folds for each of 120 candidates, totalling 600 fits\n"
]
},
{
"data": {
"text/plain": [
"GridSearchCV(cv=5, estimator=RandomForestClassifier(),\n",
" param_grid={'max_depth': [10, 15, 14, 13, 12],\n",
" 'n_estimators': [150, 160, 170, 180, 190, 200],\n",
" 'random_state': [4, 5, 6, 7]},\n",
" verbose=1)"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"parameters = {'max_depth':[10,15,14,13,12],\n",
" 'random_state': [4,5,6,7],\n",
" 'n_estimators':[150,160,170,180,190,200]}\n",
"\n",
"grid = GridSearchCV(model_rf,parameters,cv=5,verbose=1)\n",
"grid.fit(X,y)"
]
},
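{
"cell_type": "markdown",
"id": "d7b23f10",
"metadata": {},
"source": [
"`random_state` only seeds the forest's randomness and is not a model hyperparameter, so searching over it mostly picks a favourable seed and multiplies the work (120 candidates above instead of 30). A minimal sketch of one alternative follows, assuming synthetic data and hypothetical `_demo` names: the seed is fixed on the estimator and only `max_depth` and `n_estimators` are searched. Because `GridSearchCV` refits the best configuration by default (`refit=True`), the tuned model is then available as `best_estimator_`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7b23f11",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import GridSearchCV\n",
"\n",
"# Assumption: synthetic 6-class data standing in for the notebook's X and y\n",
"X_demo, y_demo = make_classification(n_samples=300, n_features=20,\n",
"                                     n_informative=10, n_classes=6,\n",
"                                     random_state=0)\n",
"\n",
"# The seed is fixed on the estimator; the grid holds real hyperparameters only\n",
"param_grid_demo = {'max_depth': [10, 13], 'n_estimators': [50, 100]}\n",
"grid_demo = GridSearchCV(RandomForestClassifier(random_state=0),\n",
"                         param_grid_demo, cv=3)\n",
"grid_demo.fit(X_demo, y_demo)\n",
"\n",
"print(grid_demo.best_params_, round(grid_demo.best_score_, 3))\n",
"\n",
"# With the default refit=True, the best model is already trained on all the data\n",
"best_model_demo = grid_demo.best_estimator_"
]
},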
{
"cell_type": "code",
"execution_count": 43,
"id": "10a4aca4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8773607700142415"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grid.best_score_"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "94bb569e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'max_depth': 13, 'n_estimators': 190, 'random_state': 5}"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grid.best_params_"
]
},
{
"cell_type": "markdown",
"id": "71b111fc",
"metadata": {},
"source": [
"# Training Random Forest with best parameters"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "0ee81152",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RandomForestClassifier(max_depth=13, n_estimators=190, random_state=5)"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_rf = RandomForestClassifier(n_estimators=190,max_depth=13,random_state=5)\n",
"model_rf.fit(X,y)"
]
},
{
"cell_type": "markdown",
"id": "77dd9b12",
"metadata": {},
"source": [
"# Testing Random Forest with best parameters"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "ed03903a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9892156862745098"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_predict = model_rf.predict(x_test)\n",
"accuracy_score(y_test,y_predict)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "96b24b5a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" Flicker 1.00 1.00 1.00 170\n",
" Harmonics 0.94 1.00 0.97 170\n",
"Interruption 1.00 1.00 1.00 170\n",
" Normal 1.00 1.00 1.00 170\n",
" Sag 1.00 0.96 0.98 170\n",
" Swell 1.00 0.98 0.99 170\n",
"\n",
" accuracy 0.99 1020\n",
" macro avg 0.99 0.99 0.99 1020\n",
"weighted avg 0.99 0.99 0.99 1020\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,y_predict))"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "9b77ef9f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[170, 0, 0, 0, 0, 0],\n",
" [ 0, 170, 0, 0, 0, 0],\n",
" [ 0, 0, 170, 0, 0, 0],\n",
" [ 0, 0, 0, 170, 0, 0],\n",
" [ 0, 7, 0, 0, 163, 0],\n",
" [ 0, 4, 0, 0, 0, 166]])"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confusion_matrix(y_test,y_predict)"
]
},
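{
"cell_type": "markdown",
"id": "e9c05a20",
"metadata": {},
"source": [
"The matrix above is easier to read with class names attached. The sketch below copies the printed values into a labeled DataFrame and recomputes per-class precision and recall. It assumes that rows (true classes) and columns (predicted classes) follow the sorted label order shown in the classification report, which is scikit-learn's default. Names ending in `_demo` are hypothetical. Under that assumption, the only errors are 7 sags and 4 swells predicted as harmonics."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9c05a21",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Assumption: label order matches the classification report (sorted class names)\n",
"labels_demo = ['Flicker', 'Harmonics', 'Interruption', 'Normal', 'Sag', 'Swell']\n",
"\n",
"# Values copied from the confusion matrix printed above\n",
"cm_demo = np.array([[170,   0,   0,   0,   0,   0],\n",
"                    [  0, 170,   0,   0,   0,   0],\n",
"                    [  0,   0, 170,   0,   0,   0],\n",
"                    [  0,   0,   0, 170,   0,   0],\n",
"                    [  0,   7,   0,   0, 163,   0],\n",
"                    [  0,   4,   0,   0,   0, 166]])\n",
"\n",
"# Rows are true classes, columns are predicted classes\n",
"cm_df_demo = pd.DataFrame(cm_demo, index=labels_demo, columns=labels_demo)\n",
"print(cm_df_demo)\n",
"\n",
"# Recall divides the diagonal by row sums; precision divides it by column sums\n",
"recall_demo = np.diag(cm_demo) / cm_demo.sum(axis=1)\n",
"precision_demo = np.diag(cm_demo) / cm_demo.sum(axis=0)\n",
"print(pd.DataFrame({'precision': precision_demo, 'recall': recall_demo},\n",
"                   index=labels_demo).round(2))"
]
},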
{
"cell_type": "code",
"execution_count": null,
"id": "596a373a",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}