Machine Learning Data
Now we will apply some ML algorithms to light curves from three classes:
0: Confirmed Exoplanets
1: Eclipsing Binaries
2: Non-eclipsed
The data for classes (1) and (2) will be downloaded from the CoRoT Public Archive and converted into CSV files, just as we did for (0): Confirmed Exoplanets in 01 - Manipulating fits files.
sktime
[ ]:
import pandas as pd
import numpy as np
import os
!pip install control
from tools import *
Preprocessing data
Creating matrix of features (CoRoT targets with confirmed exoplanets)
[ ]:
# DATA_DIR = 'C:/Users/guisa/Google Drive/01 - Iniciação Científica/02 - Datasets/csv_files'
# DATA_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/csv_files'
DATA_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/resampled_files'
[ ]:
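series_list = []
for root_dir_path, sub_dirs, files in os.walk(DATA_DIR):
    for file_name in files:
        # Skip non-data files
        if file_name in ('desktop.ini', 'csv_files.rar'):
            continue
        # File path
        path = os.path.join(root_dir_path, file_name)
        # Reading data
        # print(path)
        data = pd.read_csv(path)
        # Keep the WHITEFLUX column as one time series
        series_list.append(data.WHITEFLUX)
# Nested layout expected by sktime: one row per light curve, each cell holds a pd.Series
# (rows are collected in a list first because DataFrame.append was removed in pandas 2.0)
X = pd.DataFrame({'time_series': pd.Series(series_list)})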
[ ]:
X.columns = ['time_series']
X.head()
[ ]:
X.iloc[0, 0]
[ ]:
X.shape
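The matrix built by the loop uses sktime's "nested" layout: one row per light curve and a single object column whose cells each hold a whole `pd.Series`. A minimal sketch with three synthetic curves (toy values, not CoRoT data) makes that structure visible:

```python
import numpy as np
import pandas as pd

# Three toy "light curves" of equal length (synthetic values, not CoRoT data)
rng = np.random.default_rng(0)
curves = [pd.Series(1000.0 + rng.normal(0, 5, size=8)) for _ in range(3)]

# Nested layout: one row per curve, each cell of 'time_series' holds a pd.Series
X_toy = pd.DataFrame({'time_series': pd.Series(curves)})

print(X_toy.shape)                      # (3, 1): one column, one row per curve
print(type(X_toy.iloc[0, 0]).__name__)  # the cell itself is a Series
```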
Labeling matrix of features
0: confirmed_exoplanets
1: eclipsing_binaries
2: none
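Only class 0 is loaded so far, so every label is 0. Once the eclipsing-binary and non-eclipsed curves are added, one way to keep this encoding explicit is a name-to-code dictionary. In the sketch below the per-class counts are hypothetical placeholders, and it assumes the curves are appended class by class in the same order:

```python
import numpy as np
import pandas as pd

# Mapping mirroring the encoding listed above
LABEL_CODES = {'confirmed_exoplanets': 0, 'eclipsing_binaries': 1, 'none': 2}

# Hypothetical number of light curves loaded per class, in loading order
n_per_class = {'confirmed_exoplanets': 3, 'eclipsing_binaries': 2, 'none': 4}

# One integer label per light curve, in the same order the curves were appended
y_toy = pd.Series(np.concatenate([
    np.full(n, LABEL_CODES[name], dtype=int) for name, n in n_per_class.items()
]))
print(y_toy.value_counts().sort_index())
```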
[ ]:
labels = np.zeros(len(X), dtype='int')  # one label per light curve (row)
labels
[ ]:
y = pd.Series(labels)
y.head()
[ ]:
y.shape
Creating the dataset from X and y
[ ]:
# Creating the dataset: a copy of X plus a label column
df = X.copy()
# Adding labels
df['label'] = y
df.head()
How many labels do we have?
[ ]:
labels, counts = np.unique(y, return_counts=True)
print('Labels =', labels, '\nCounts =', counts)
Machine Learning - sktime
https://github.com/alan-turing-institute/sktime/tree/v0.4.3
https://github.com/alan-turing-institute/sktime/blob/main/sktime/classification/compose/__init__.py
Preliminaries
[ ]:
# !pip install sktime[all_extras]
Splitting the dataset into the Training set and Test set
[ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(24, 1) (24,) (9, 1) (9,)
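With a single class, a plain random split is harmless. Once several classes with unequal counts are present, passing `stratify=y` to `train_test_split` keeps the class proportions the same in both subsets, which matters with only a few dozen light curves. A sketch on toy labels (the 20/8/4 counts are illustrative, not the real dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy, imbalanced labels: 20 of class 0, 8 of class 1, 4 of class 2 (illustrative only)
y_toy = np.array([0] * 20 + [1] * 8 + [2] * 4)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, shuffle=True, random_state=42, stratify=y_toy
)
print(np.bincount(y_tr), np.bincount(y_te))  # [15 6 3] [5 2 1]
```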
Time Series Classification
[ ]:
from sktime.classification.all import TimeSeriesForestClassifier
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
TimeSeriesForestClassifier()
[ ]:
from sklearn.metrics import accuracy_score
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)
1.0
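Because every label is currently 0, any classifier that always predicts the majority class reaches an accuracy of 1.0, so this score says nothing yet about the model. A majority-class baseline from scikit-learn makes that explicit; the features below are synthetic stand-ins:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X_toy = rng.normal(size=(33, 10))  # synthetic features, 33 samples like the real set
y_toy = np.zeros(33, dtype=int)    # single class, as in the current labels

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_toy[:24], y_toy[:24])
baseline_acc = accuracy_score(y_toy[24:], baseline.predict(X_toy[24:]))
print(baseline_acc)  # 1.0
```

Once classes 1 and 2 are added, comparing the classifier against this baseline (or using `balanced_accuracy_score`) gives a more informative reading.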
Feature extraction
[ ]:
import warnings
warnings.filterwarnings('ignore', 'statsmodels.tsa.ar_model.AR', FutureWarning)
[ ]:
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
transformer = TSFreshFeatureExtractor(default_fc_parameters="minimal")
extracted_features = transformer.fit_transform(X_train)
extracted_features.head()
Feature Extraction: 100%|██████████| 5/5 [00:00<00:00, 18.29it/s]
| | time_series__sum_values | time_series__median | time_series__mean | time_series__length | time_series__standard_deviation | time_series__variance | time_series__root_mean_square | time_series__maximum | time_series__minimum |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.124862e+09 | 141218.330072 | 141186.836848 | 15050.0 | 282.683706 | 79910.077814 | 141187.119841 | 142021.360654 | 138999.945791 |
| 4 | 6.136100e+08 | 40704.273331 | 40771.427763 | 15050.0 | 560.265206 | 313897.101078 | 40775.277055 | 44921.695568 | 39239.685767 |
| 16 | 6.258849e+08 | 41653.547836 | 41587.034197 | 15050.0 | 364.389989 | 132780.064141 | 41588.630578 | 42472.076073 | 40477.653465 |
| 5 | 4.512183e+09 | 299686.539820 | 299812.826420 | 15050.0 | 613.298072 | 376134.525086 | 299813.453701 | 302311.608084 | 297465.565232 |
| 13 | 5.857504e+08 | 38822.561668 | 38920.292034 | 15050.0 | 553.820584 | 306717.239608 | 38924.232160 | 40531.501716 | 37855.665849 |
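The `"minimal"` setting yields the nine per-series statistics in the columns shown. The NumPy sketch below spells out what each column measures for one series; it is a stand-in, not tsfresh itself, and it assumes population (ddof=0) standard deviation and variance, which is consistent with the table (282.68² ≈ 79910):

```python
import numpy as np
import pandas as pd

def minimal_features(x):
    """Per-series statistics mirroring tsfresh's 'minimal' column names (sketch)."""
    x = np.asarray(x, dtype=float)
    return {
        'sum_values': x.sum(),
        'median': np.median(x),
        'mean': x.mean(),
        'length': float(len(x)),
        'standard_deviation': x.std(),  # population std (ddof=0)
        'variance': x.var(),            # population variance (ddof=0)
        'root_mean_square': np.sqrt(np.mean(x ** 2)),
        'maximum': x.max(),
        'minimum': x.min(),
    }

flux_toy = pd.Series([10.0, 12.0, 11.0, 13.0, 9.0])  # synthetic flux values
feats = minimal_features(flux_toy)
print(feats)
```

A useful sanity check on real output: with ddof=0, `root_mean_square**2 == mean**2 + variance`.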
[ ]:
# If the result is 1, every time series in the dataset has the same length
extracted_features.time_series__length.nunique()
1
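The same equal-length check can be done directly on the nested column, without running tsfresh. A sketch on a toy frame where one curve is deliberately shorter:

```python
import pandas as pd

# Toy nested frame: two curves of length 4 and one of length 3 (synthetic)
X_toy = pd.DataFrame({'time_series': pd.Series([
    pd.Series([1.0, 2.0, 3.0, 4.0]),
    pd.Series([5.0, 6.0, 7.0, 8.0]),
    pd.Series([9.0, 10.0, 11.0]),
])})

# Length of each stored series; a single unique value means equal lengths
lengths = X_toy['time_series'].map(len)
print(lengths.tolist(), lengths.nunique() == 1)
```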
Time Series Classification with Feature Extraction
[ ]:
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
classifier = make_pipeline(
TSFreshFeatureExtractor(show_warnings=False), RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
Feature Extraction: 100%|██████████| 5/5 [07:48<00:00, 93.60s/it]
Feature Extraction: 100%|██████████| 5/5 [02:50<00:00, 34.17s/it]
1.0
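For comparison, the same extract-then-classify idea can be written with scikit-learn alone, using `FunctionTransformer` to turn each row of a 2-D flux array into a few summary statistics. This is a sketch on synthetic data: the four features are a hand-picked stand-in, not tsfresh's set, and the two toy classes differ only in their noise level.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def summary_stats(X):
    """Map each light curve (one row) to per-row mean, std, min and max."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X.mean(axis=1), X.std(axis=1), X.min(axis=1), X.max(axis=1)])

rng = np.random.default_rng(0)
# Synthetic data: class 0 has low scatter, class 1 has high scatter
X_toy = np.vstack([rng.normal(100, 1, size=(20, 200)), rng.normal(100, 10, size=(20, 200))])
y_toy = np.array([0] * 20 + [1] * 20)

pipe = make_pipeline(FunctionTransformer(summary_stats), RandomForestClassifier(random_state=0))
pipe.fit(X_toy[::2], y_toy[::2])              # even rows for training
score = pipe.score(X_toy[1::2], y_toy[1::2])  # odd rows for testing
print(score)
```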
scikit-learn
[ ]:
import pandas as pd
import numpy as np
import os
!pip install control
from tools import *
Preprocessing data
Creating matrix of features (CoRoT targets with confirmed exoplanets)
[ ]:
# DATA_DIR = 'C:/Users/guisa/Google Drive/01 - Iniciação Científica/02 - Datasets/csv_files'
# DATA_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/csv_files'
DATA_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/resampled_files'
[ ]:
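rows = []
for root_dir_path, sub_dirs, files in os.walk(DATA_DIR):
    for file_name in files:
        # Skip non-data files
        if file_name in ('desktop.ini', 'csv_files.rar'):
            continue
        # File path
        path = os.path.join(root_dir_path, file_name)
        # Reading data
        # print(path)
        data = pd.read_csv(path)
        # Each light curve becomes one row of the feature matrix
        rows.append(data.WHITEFLUX.to_numpy())
# One row per light curve, one column per flux sample
# (rows are collected in a list first because DataFrame.append was removed in pandas 2.0)
X = pd.DataFrame(rows)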
[ ]:
X.head()
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | ... | 15010 | 15011 | 15012 | 15013 | 15014 | 15015 | 15016 | 15017 | 15018 | 15019 | 15020 | 15021 | 15022 | 15023 | 15024 | 15025 | 15026 | 15027 | 15028 | 15029 | 15030 | 15031 | 15032 | 15033 | 15034 | 15035 | 15036 | 15037 | 15038 | 15039 | 15040 | 15041 | 15042 | 15043 | 15044 | 15045 | 15046 | 15047 | 15048 | 15049 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.411572e+05 | 1.412424e+05 | 1.411326e+05 | 1.413731e+05 | 1.412133e+05 | 1.413579e+05 | 1.411959e+05 | 1.412497e+05 | 1.413829e+05 | 1.414096e+05 | 1.412703e+05 | 1.412327e+05 | 1.411492e+05 | 1.412873e+05 | 1.412144e+05 | 1.411442e+05 | 1.412001e+05 | 1.412225e+05 | 1.413235e+05 | 1.412076e+05 | 1.411856e+05 | 1.411647e+05 | 1.412128e+05 | 1.413366e+05 | 1.412635e+05 | 1.413088e+05 | 1.413307e+05 | 1.411438e+05 | 1.412988e+05 | 1.411502e+05 | 1.411364e+05 | 1.414083e+05 | 1.410876e+05 | 1.411729e+05 | 1.412924e+05 | 1.413673e+05 | 1.412330e+05 | 1.411206e+05 | 1.413310e+05 | 1.411721e+05 | ... | 1.411859e+05 | 1.411025e+05 | 1.410474e+05 | 1.411354e+05 | 1.411319e+05 | 1.412317e+05 | 1.411159e+05 | 1.412205e+05 | 1.411621e+05 | 1.411025e+05 | 1.412063e+05 | 1.411235e+05 | 1.411927e+05 | 1.412480e+05 | 1.412074e+05 | 1.411396e+05 | 1.411576e+05 | 1.410457e+05 | 1.411399e+05 | 1.408885e+05 | 1.409958e+05 | 1.411260e+05 | 1.411951e+05 | 1.411575e+05 | 1.411057e+05 | 1.411124e+05 | 1.412933e+05 | 1.411076e+05 | 1.410541e+05 | 1.411741e+05 | 1.409121e+05 | 1.409989e+05 | 1.412270e+05 | 1.411838e+05 | 1.413012e+05 | 1.411474e+05 | 1.411009e+05 | 1.413782e+05 | 1.412287e+05 | 1.413100e+05 |
| 1 | 2.605181e+04 | 2.611330e+04 | 2.601663e+04 | 2.614152e+04 | 2.587125e+04 | 2.587146e+04 | 2.602901e+04 | 2.604010e+04 | 2.611140e+04 | 2.607349e+04 | 2.612061e+04 | 2.601961e+04 | 2.608381e+04 | 2.615512e+04 | 2.605209e+04 | 2.615393e+04 | 2.595893e+04 | 2.613356e+04 | 2.608572e+04 | 2.604287e+04 | 2.609076e+04 | 2.603981e+04 | 2.604673e+04 | 2.605018e+04 | 2.605711e+04 | 2.603637e+04 | 2.602297e+04 | 2.610288e+04 | 2.599291e+04 | 2.599706e+04 | 2.596696e+04 | 2.610334e+04 | 2.616852e+04 | 2.615221e+04 | 2.600709e+04 | 2.604340e+04 | 2.602915e+04 | 2.623798e+04 | 2.596439e+04 | 2.610146e+04 | ... | 2.621469e+04 | 2.632878e+04 | 2.624308e+04 | 2.622704e+04 | 2.618278e+04 | 2.624397e+04 | 2.632099e+04 | 2.627644e+04 | 2.625360e+04 | 2.632872e+04 | 2.625611e+04 | 2.633979e+04 | 2.629440e+04 | 2.627620e+04 | 2.635345e+04 | 2.628232e+04 | 2.634446e+04 | 2.636739e+04 | 2.625771e+04 | 2.648313e+04 | 2.638753e+04 | 2.626968e+04 | 2.623826e+04 | 2.630487e+04 | 2.624348e+04 | 2.638560e+04 | 2.620561e+04 | 2.630678e+04 | 2.627786e+04 | 2.616603e+04 | 2.631367e+04 | 2.620188e+04 | 2.618542e+04 | 2.624512e+04 | 2.625785e+04 | 2.642315e+04 | 2.621238e+04 | 2.636027e+04 | 2.629231e+04 | 2.618336e+04 |
| 2 | 1.298393e+06 | 1.299550e+06 | 1.299725e+06 | 1.299612e+06 | 1.299747e+06 | 1.299215e+06 | 1.299576e+06 | 1.299769e+06 | 1.299262e+06 | 1.299409e+06 | 1.299280e+06 | 1.299889e+06 | 1.299150e+06 | 1.299826e+06 | 1.298902e+06 | 1.299552e+06 | 1.299346e+06 | 1.298708e+06 | 1.299628e+06 | 1.299107e+06 | 1.299239e+06 | 1.299363e+06 | 1.299605e+06 | 1.299160e+06 | 1.299955e+06 | 1.299210e+06 | 1.299477e+06 | 1.299130e+06 | 1.299318e+06 | 1.298997e+06 | 1.299127e+06 | 1.299335e+06 | 1.299339e+06 | 1.299389e+06 | 1.299585e+06 | 1.299507e+06 | 1.298837e+06 | 1.299754e+06 | 1.298997e+06 | 1.300436e+06 | ... | 1.295584e+06 | 1.296259e+06 | 1.295880e+06 | 1.296397e+06 | 1.295613e+06 | 1.295237e+06 | 1.295789e+06 | 1.295417e+06 | 1.295453e+06 | 1.295508e+06 | 1.295937e+06 | 1.294957e+06 | 1.295125e+06 | 1.294599e+06 | 1.294709e+06 | 1.295073e+06 | 1.295429e+06 | 1.295154e+06 | 1.295264e+06 | 1.295769e+06 | 1.295695e+06 | 1.295337e+06 | 1.295557e+06 | 1.295314e+06 | 1.295710e+06 | 1.295153e+06 | 1.295031e+06 | 1.295029e+06 | 1.295460e+06 | 1.295186e+06 | 1.294849e+06 | 1.295283e+06 | 1.294897e+06 | 1.294750e+06 | 1.294881e+06 | 1.294939e+06 | 1.295167e+06 | 1.295158e+06 | 1.295069e+06 | 1.294454e+06 |
| 3 | 1.125213e+05 | 1.127580e+05 | 1.129430e+05 | 1.125623e+05 | 1.127893e+05 | 1.125752e+05 | 1.127852e+05 | 1.126351e+05 | 1.126462e+05 | 1.126747e+05 | 1.128206e+05 | 1.126230e+05 | 1.127497e+05 | 1.127325e+05 | 1.127567e+05 | 1.127885e+05 | 1.126766e+05 | 1.127609e+05 | 1.125398e+05 | 1.127966e+05 | 1.126471e+05 | 1.126480e+05 | 1.128501e+05 | 1.128040e+05 | 1.127078e+05 | 1.128669e+05 | 1.126771e+05 | 1.127147e+05 | 1.127916e+05 | 1.126816e+05 | 1.127761e+05 | 1.126781e+05 | 1.127678e+05 | 1.127868e+05 | 1.125911e+05 | 1.127481e+05 | 1.127409e+05 | 1.126717e+05 | 1.126739e+05 | 1.125557e+05 | ... | 1.124980e+05 | 1.125445e+05 | 1.125411e+05 | 1.125625e+05 | 1.124583e+05 | 1.125062e+05 | 1.123094e+05 | 1.126122e+05 | 1.126024e+05 | 1.123122e+05 | 1.125150e+05 | 1.124099e+05 | 1.124539e+05 | 1.123920e+05 | 1.124828e+05 | 1.124690e+05 | 1.125900e+05 | 1.125468e+05 | 1.123983e+05 | 1.125112e+05 | 1.124692e+05 | 1.124070e+05 | 1.125379e+05 | 1.124257e+05 | 1.125522e+05 | 1.124184e+05 | 1.125103e+05 | 1.123842e+05 | 1.126233e+05 | 1.123789e+05 | 1.123924e+05 | 1.123847e+05 | 1.125097e+05 | 1.125469e+05 | 1.124996e+05 | 1.125005e+05 | 1.124207e+05 | 1.124281e+05 | 1.124471e+05 | 1.123491e+05 |
| 4 | 4.064368e+04 | 4.024597e+04 | 4.043663e+04 | 4.031514e+04 | 4.029457e+04 | 4.023700e+04 | 4.029928e+04 | 4.044337e+04 | 4.056287e+04 | 4.024665e+04 | 4.043309e+04 | 4.036776e+04 | 4.041253e+04 | 4.034370e+04 | 4.067833e+04 | 4.021538e+04 | 4.045781e+04 | 4.032538e+04 | 4.037169e+04 | 4.034290e+04 | 4.028767e+04 | 4.024320e+04 | 4.023320e+04 | 4.024396e+04 | 4.034883e+04 | 4.034518e+04 | 4.028057e+04 | 4.029106e+04 | 4.048474e+04 | 4.029657e+04 | 4.035592e+04 | 4.021696e+04 | 4.020960e+04 | 4.021597e+04 | 4.026372e+04 | 4.027219e+04 | 4.024717e+04 | 4.023869e+04 | 4.038073e+04 | 4.031858e+04 | ... | 4.300994e+04 | 4.303990e+04 | 4.306673e+04 | 4.301858e+04 | 4.267300e+04 | 4.247712e+04 | 4.262169e+04 | 4.277094e+04 | 4.303920e+04 | 4.267297e+04 | 4.267512e+04 | 4.263271e+04 | 4.274761e+04 | 4.247607e+04 | 4.258747e+04 | 4.217027e+04 | 4.206883e+04 | 4.236280e+04 | 4.214163e+04 | 4.226997e+04 | 4.200494e+04 | 4.193790e+04 | 4.209874e+04 | 4.221375e+04 | 4.254759e+04 | 4.279462e+04 | 4.288303e+04 | 4.289924e+04 | 4.287694e+04 | 4.282583e+04 | 4.266600e+04 | 4.277455e+04 | 4.272497e+04 | 4.275998e+04 | 4.296628e+04 | 4.300132e+04 | 4.278209e+04 | 4.265919e+04 | 4.269699e+04 | 4.274217e+04 |
5 rows × 15050 columns
[ ]:
X.shape
(33, 15050)
Labeling matrix of features
0: confirmed_exoplanets
1: eclipsing_binaries
2: none
[ ]:
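labels = np.zeros(len(X), dtype='int')  # one label per light curve (row)
labels
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
[ ]:
y = pd.Series(labels)
y.head()
0 0
1 0
2 0
3 0
4 0
dtype: int64
[ ]:
y.shape
(33,)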
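`DataFrame.size` counts every cell (rows × columns), so for the 33 × 15050 flux matrix `np.zeros(X.size)` would create 496650 labels rather than one per light curve; `len(X)` (or `X.shape[0]`) gives the number of rows. A quick check on a toy frame:

```python
import numpy as np
import pandas as pd

X_toy = pd.DataFrame(np.ones((3, 4)))  # 3 toy light curves, 4 samples each

print(X_toy.size)  # 12: rows * columns
print(len(X_toy))  # 3: rows only
labels_toy = np.zeros(len(X_toy), dtype=int)  # one label per light curve
```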
Creating the dataset from X and y
[ ]:
# Creating a pd.DataFrame from the X data
df = pd.DataFrame(X)
# Adding labels
df['label'] = y
df.sample(5)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | ... | 15011 | 15012 | 15013 | 15014 | 15015 | 15016 | 15017 | 15018 | 15019 | 15020 | 15021 | 15022 | 15023 | 15024 | 15025 | 15026 | 15027 | 15028 | 15029 | 15030 | 15031 | 15032 | 15033 | 15034 | 15035 | 15036 | 15037 | 15038 | 15039 | 15040 | 15041 | 15042 | 15043 | 15044 | 15045 | 15046 | 15047 | 15048 | 15049 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 30819.826000 | 30766.330239 | 30751.556425 | 30749.078411 | 30735.472031 | 30705.364719 | 30673.455704 | 30662.058557 | 30682.811774 | 30726.006400 | 30766.044428 | 30779.528462 | 30762.480430 | 30733.670298 | 30721.967523 | 30747.701120 | 30810.995926 | 30892.582773 | 30963.134137 | 30994.637790 | 30971.285524 | 30899.489245 | 30812.158646 | 30757.956592 | 30773.076489 | 30850.996851 | 30937.516699 | 30965.531315 | 30909.779524 | 30815.873666 | 30770.754493 | 30830.580119 | 30965.570660 | 31074.862790 | 31066.062397 | 30933.718685 | 30766.639144 | 30675.022829 | 30700.963010 | 30791.397016 | ... | 31257.034885 | 31264.336563 | 31300.065245 | 31342.684879 | 31375.525377 | 31386.046775 | 31362.258516 | 31301.359103 | 31225.968013 | 31185.069625 | 31225.200043 | 31348.360808 | 31494.844160 | 31574.995403 | 31532.203536 | 31387.986425 | 31230.251847 | 31151.297131 | 31183.599756 | 31282.844881 | 31368.501950 | 31386.865644 | 31347.153080 | 31306.651124 | 31321.815477 | 31405.234907 | 31518.720306 | 31602.739344 | 31617.101315 | 31564.063024 | 31481.233700 | 31413.127641 | 31382.205800 | 31377.323280 | 31364.368751 | 31310.075393 | 31203.441067 | 31062.404104 | 30923.186993 | 0 |
| 0 | 141157.216020 | 141242.434636 | 141132.564812 | 141373.143346 | 141213.262888 | 141357.927056 | 141195.854576 | 141249.723060 | 141382.882136 | 141409.646127 | 141270.347310 | 141232.742194 | 141149.203020 | 141287.347367 | 141214.443625 | 141144.196655 | 141200.096238 | 141222.533249 | 141323.471805 | 141207.648435 | 141185.552578 | 141164.668742 | 141212.764561 | 141336.627107 | 141263.521595 | 141308.810296 | 141330.700205 | 141143.809870 | 141298.757841 | 141150.205593 | 141136.397407 | 141408.260213 | 141087.606264 | 141172.925450 | 141292.442324 | 141367.329433 | 141232.994283 | 141120.579153 | 141330.990802 | 141172.077723 | ... | 141102.476944 | 141047.400969 | 141135.385006 | 141131.922378 | 141231.656466 | 141115.855797 | 141220.542768 | 141162.097924 | 141102.469348 | 141206.299004 | 141123.514021 | 141192.723871 | 141248.047431 | 141207.440681 | 141139.602971 | 141157.554097 | 141045.650680 | 141139.907409 | 140888.479585 | 140995.832130 | 141126.000693 | 141195.104186 | 141157.523028 | 141105.662748 | 141112.441533 | 141293.285098 | 141107.636577 | 141054.134890 | 141174.120525 | 140912.054405 | 140998.875288 | 141227.024526 | 141183.770462 | 141301.178908 | 141147.423729 | 141100.943892 | 141378.202818 | 141228.656766 | 141309.960163 | 0 |
| 28 | 62789.448650 | 63084.529078 | 62888.248116 | 62879.160690 | 62902.299203 | 62856.785176 | 62838.438216 | 62890.327459 | 62978.468824 | 62915.586396 | 63030.280126 | 62972.033937 | 63002.387872 | 62917.744686 | 62882.950698 | 62963.445695 | 63047.080991 | 62936.457158 | 62918.843671 | 62903.944131 | 62860.251827 | 62981.319394 | 62955.393306 | 62950.430855 | 62790.031253 | 62864.886726 | 62973.441523 | 62832.014189 | 62840.955762 | 62894.206974 | 62961.589149 | 63016.263609 | 62849.370593 | 62860.779191 | 62823.774972 | 62836.445243 | 62939.162851 | 63026.291359 | 63070.556424 | 62920.886694 | ... | 62681.452641 | 62818.463320 | 62658.489891 | 62751.509989 | 62673.658088 | 62645.568598 | 62681.461059 | 62508.688815 | 62410.438476 | 62357.747106 | 62424.788206 | 62287.583036 | 62463.003606 | 62461.858904 | 62661.955794 | 62695.748268 | 62763.846757 | 62617.561068 | 62713.372410 | 62667.105538 | 62616.660149 | 62616.715432 | 62717.656437 | 62704.556538 | 62715.951555 | 62749.903309 | 62744.565477 | 62690.301100 | 62640.328884 | 62562.900647 | 62681.503019 | 62595.074849 | 62785.115156 | 62726.137782 | 62662.004908 | 62639.195962 | 62595.452252 | 62665.525918 | 62690.144817 | 0 |
| 16 | 41432.330660 | 41693.673194 | 41615.060995 | 41733.206360 | 41407.073111 | 41575.178851 | 41481.167264 | 41461.696150 | 41574.132506 | 41545.980086 | 41589.135266 | 41631.806337 | 41678.861443 | 41666.674866 | 41573.258211 | 41595.650027 | 41704.488771 | 41548.087189 | 41514.108672 | 41592.132381 | 41587.728893 | 41642.447618 | 41577.163200 | 41561.930921 | 41516.422593 | 41608.456849 | 41626.788574 | 41504.738696 | 41597.093006 | 41628.753414 | 41551.683274 | 41666.509823 | 41582.059959 | 41581.388399 | 41583.151091 | 41794.523488 | 41602.794663 | 41608.601705 | 41609.603589 | 41571.278191 | ... | 40881.435582 | 40843.288013 | 40754.713626 | 40845.671971 | 40768.896734 | 40795.521882 | 40733.957968 | 40801.082108 | 40758.474369 | 40777.197609 | 40654.588433 | 40805.613933 | 40743.934771 | 40834.311881 | 40829.452126 | 40898.833956 | 40790.762090 | 40842.380033 | 40691.121858 | 40828.179011 | 40902.506017 | 40846.943964 | 40717.051200 | 40755.964071 | 40761.241218 | 40749.390892 | 40764.210918 | 40783.309633 | 40688.716724 | 40899.485940 | 40846.857493 | 40836.554608 | 40791.558598 | 40873.087750 | 40697.934377 | 40749.979366 | 40747.427558 | 40981.845986 | 40791.714704 | 0 |
| 8 | 75697.102041 | 75521.999008 | 75698.416356 | 75705.738417 | 75615.635709 | 75589.288889 | 75636.802773 | 75605.037888 | 75643.661031 | 75676.159731 | 75686.112903 | 75529.686928 | 75619.574181 | 75598.209878 | 75671.834220 | 75653.166464 | 75668.853037 | 75677.115840 | 75667.335173 | 75869.542965 | 75494.598934 | 75647.836979 | 75566.587210 | 75589.594379 | 75699.678912 | 75702.470303 | 75733.356330 | 75581.316242 | 75790.350131 | 75540.179441 | 75675.170180 | 75585.319226 | 75586.911065 | 75641.780776 | 75606.990303 | 75701.682365 | 75516.024677 | 75547.293254 | 75563.844719 | 75755.697044 | ... | 76268.041648 | 76281.419232 | 76284.440650 | 76394.796219 | 76213.197332 | 76266.147764 | 76103.233510 | 76216.735548 | 76239.222373 | 76242.488306 | 76374.396560 | 76234.833050 | 76231.972588 | 76326.670377 | 76266.439862 | 76203.261758 | 76211.216236 | 76228.558971 | 76245.008354 | 76174.797288 | 76321.157824 | 76245.700392 | 76255.793168 | 76386.641211 | 76193.530045 | 76262.074742 | 76154.024313 | 76303.203262 | 76377.507464 | 76282.547285 | 76156.168117 | 76283.238530 | 76317.845976 | 76357.921520 | 76262.509002 | 76403.950151 | 76152.819830 | 76197.209589 | 76313.579812 | 0 |
5 rows × 15051 columns
How many labels do we have?
[ ]:
labels, counts = np.unique(df['label'], return_counts=True)
print('Labels =', labels, '\nCounts =', counts)
Labels = [0]
Counts = [33]
Machine Learning
Preliminaries
Splitting the dataset into the Training set and Test set
[ ]:
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
[ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=True, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(24, 15050) (24,) (9, 15050) (9,)
Time Series Classification
[ ]:
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
# from sklearn import svm
# classifier = svm.SVC()
classifier.fit(X_train, y_train)
KNeighborsClassifier()
[ ]:
y_pred = classifier.predict(X_test)
[ ]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)
[[9]]
1.0
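With `metric='minkowski'` and `p=2`, KNN uses the Euclidean distance on raw flux values. Since the curves sit at very different flux levels (from about 2.6 × 10⁴ to 1.3 × 10⁶ in the table above), that distance is dominated by each star's overall brightness rather than by the shape of its light curve. One possible remedy, assumed here and not part of the original pipeline, is to divide each curve by its median before fitting. A toy illustration:

```python
import numpy as np

# Two toy curves with the same relative shape but different brightness,
# and a third, faint curve with a different shape (synthetic values)
shape = np.array([1.00, 1.00, 0.99, 1.00, 1.00])
bright = 1.3e6 * shape
faint = 2.6e4 * shape
faint_other = 2.6e4 * np.array([1.00, 1.02, 1.00, 0.98, 1.00])

def euclidean(a, b):
    # Minkowski distance with p=2
    return np.sqrt(np.sum((a - b) ** 2))

def median_normalize(curve):
    # Divide by the median so every curve is centred near 1.0
    return curve / np.median(curve)

raw = (euclidean(bright, faint), euclidean(faint, faint_other))
norm = (euclidean(median_normalize(bright), median_normalize(faint)),
        euclidean(median_normalize(faint), median_normalize(faint_other)))
print(raw, norm)
```

On raw flux the same-shape pair is far apart and the different-shape pair is close; after median normalization the ordering flips.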
Decision Trees - 0.57
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
Feature: Periodograms
[ ]:
import pandas as pd
FEATURES_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/features'
PERIODOGRAMS_DIR = FEATURES_DIR + '/feature_periodograms.csv'
data = pd.read_csv(PERIODOGRAMS_DIR)
data.sample(5)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | ... | 7487 | 7488 | 7489 | 7490 | 7491 | 7492 | 7493 | 7494 | 7495 | 7496 | 7497 | 7498 | 7499 | 7500 | 7501 | 7502 | 7503 | 7504 | 7505 | 7506 | 7507 | 7508 | 7509 | 7510 | 7511 | 7512 | 7513 | 7514 | 7515 | 7516 | 7517 | 7518 | 7519 | 7520 | 7521 | 7522 | 7523 | 7524 | 7525 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 127 | 1.771419e-21 | 3.118507e+10 | 1.792958e+10 | 1.571999e+10 | 1.392014e+10 | 3.661464e+10 | 1.002512e+10 | 1.852784e+10 | 1.907169e+10 | 1.278961e+10 | 2.216953e+09 | 9.314578e+09 | 6.515043e+09 | 3.546007e+09 | 4.409342e+09 | 9.162630e+09 | 8.302242e+09 | 8.718788e+09 | 6.957977e+09 | 5.611064e+09 | 1.877066e+09 | 2.836627e+09 | 2.936736e+09 | 3.912005e+09 | 3.473252e+09 | 4.388687e+09 | 6.221741e+09 | 7.087409e+09 | 5.035225e+09 | 4.960782e+09 | 4.799346e+09 | 2.262591e+09 | 2.778995e+09 | 4.723886e+09 | 4.334889e+09 | 1.247480e+09 | 1.892225e+09 | 1.962624e+09 | 1.284465e+09 | 9.996783e+08 | ... | 1.383741e+03 | 1.383737e+03 | 1.383732e+03 | 1.383728e+03 | 1.383724e+03 | 1.383720e+03 | 1.383716e+03 | 1.383712e+03 | 1.383708e+03 | 1.383705e+03 | 1.383701e+03 | 1.383698e+03 | 1.383695e+03 | 1.383692e+03 | 1.383689e+03 | 1.383686e+03 | 1.383683e+03 | 1.383681e+03 | 1.383678e+03 | 1.383676e+03 | 1.383674e+03 | 1.383671e+03 | 1.383670e+03 | 1.383668e+03 | 1.383666e+03 | 1.383664e+03 | 1.383663e+03 | 1.383661e+03 | 1.383660e+03 | 1.383659e+03 | 1.383658e+03 | 1.383657e+03 | 1.383656e+03 | 1.383656e+03 | 1.383655e+03 | 1.383655e+03 | 1.383654e+03 | 1.383654e+03 | 6.918270e+02 | 1 |
| 92 | 3.003987e-22 | 2.193934e+10 | 3.476167e+10 | 5.756229e+09 | 3.338234e+09 | 1.596978e+09 | 3.901232e+09 | 3.096381e+09 | 1.385702e+08 | 2.057867e+08 | 1.498203e+09 | 2.778013e+08 | 1.972097e+09 | 1.002296e+08 | 4.203961e+08 | 5.003830e+08 | 9.213325e+08 | 3.639927e+07 | 1.532101e+08 | 7.530989e+08 | 5.727591e+07 | 3.229936e+08 | 1.240524e+08 | 1.682591e+08 | 4.352538e+08 | 1.146596e+08 | 2.886079e+07 | 1.750333e+07 | 2.888187e+07 | 1.230348e+08 | 5.430957e+06 | 1.471160e+08 | 1.814644e+08 | 1.891106e+07 | 1.950586e+08 | 1.176811e+08 | 3.555871e+07 | 9.392426e+07 | 7.405261e+09 | 2.248148e+08 | ... | 6.895641e+00 | 6.895618e+00 | 6.895596e+00 | 6.895575e+00 | 6.895554e+00 | 6.895534e+00 | 6.895515e+00 | 6.895496e+00 | 6.895477e+00 | 6.895460e+00 | 6.895443e+00 | 6.895426e+00 | 6.895410e+00 | 6.895395e+00 | 6.895380e+00 | 6.895366e+00 | 6.895352e+00 | 6.895340e+00 | 6.895327e+00 | 6.895315e+00 | 6.895304e+00 | 6.895294e+00 | 6.895284e+00 | 6.895275e+00 | 6.895266e+00 | 6.895258e+00 | 6.895250e+00 | 6.895243e+00 | 6.895237e+00 | 6.895231e+00 | 6.895226e+00 | 6.895222e+00 | 6.895218e+00 | 6.895215e+00 | 6.895212e+00 | 6.895210e+00 | 6.895208e+00 | 6.895207e+00 | 3.447603e+00 | 1 |
| 112 | 6.085895e-23 | 2.508199e+10 | 2.671433e+10 | 3.458125e+10 | 2.183046e+09 | 2.913235e+10 | 1.529532e+10 | 2.041126e+10 | 9.492592e+09 | 4.173585e+09 | 8.286883e+09 | 3.807406e+09 | 1.621156e+09 | 3.544895e+09 | 1.688447e+09 | 2.456768e+09 | 9.215532e+08 | 3.701348e+09 | 7.629576e+08 | 8.599160e+08 | 9.432845e+07 | 2.130174e+09 | 8.154774e+08 | 1.670346e+09 | 4.049332e+08 | 2.506906e+08 | 1.076488e+09 | 7.620423e+07 | 3.262416e+08 | 4.253027e+08 | 1.258279e+08 | 4.688591e+08 | 1.020505e+09 | 2.253089e+07 | 1.977092e+08 | 1.187127e+09 | 1.304308e+08 | 7.328006e+08 | 1.594333e+09 | 3.496810e+08 | ... | 1.450557e+03 | 1.450552e+03 | 1.450547e+03 | 1.450543e+03 | 1.450538e+03 | 1.450534e+03 | 1.450530e+03 | 1.450526e+03 | 1.450522e+03 | 1.450519e+03 | 1.450515e+03 | 1.450511e+03 | 1.450508e+03 | 1.450505e+03 | 1.450502e+03 | 1.450499e+03 | 1.450496e+03 | 1.450493e+03 | 1.450491e+03 | 1.450488e+03 | 1.450486e+03 | 1.450484e+03 | 1.450482e+03 | 1.450480e+03 | 1.450478e+03 | 1.450476e+03 | 1.450474e+03 | 1.450473e+03 | 1.450472e+03 | 1.450470e+03 | 1.450469e+03 | 1.450468e+03 | 1.450468e+03 | 1.450467e+03 | 1.450466e+03 | 1.450466e+03 | 1.450466e+03 | 1.450465e+03 | 7.252327e+02 | 1 |
| 18 | 7.520660e-23 | 6.582516e+10 | 1.176575e+09 | 5.726639e+10 | 5.566852e+09 | 2.512593e+10 | 6.645201e+09 | 2.633803e+09 | 2.373441e+09 | 2.738804e+09 | 1.108361e+10 | 1.011122e+09 | 1.796283e+10 | 4.548535e+09 | 4.153863e+09 | 5.986967e+09 | 7.591259e+08 | 4.204495e+09 | 2.445218e+09 | 2.999276e+09 | 2.288641e+08 | 5.048715e+09 | 3.212317e+09 | 1.767702e+08 | 3.379057e+09 | 5.470107e+08 | 1.534445e+06 | 8.561774e+07 | 2.678881e+08 | 5.831388e+08 | 2.250089e+09 | 2.945777e+09 | 2.354936e+09 | 4.394168e+08 | 5.295828e+08 | 1.999850e+09 | 3.491288e+08 | 9.152734e+07 | 6.722281e+08 | 2.420303e+09 | ... | 1.157560e+07 | 4.970938e+06 | 1.044088e+07 | 2.076864e+07 | 5.652504e+06 | 3.240631e+06 | 1.397898e+07 | 1.008912e+06 | 2.327728e+07 | 2.909226e+07 | 3.869044e+06 | 3.251341e+06 | 2.399123e+07 | 5.366716e+07 | 1.863748e+07 | 4.281250e+06 | 1.950633e+07 | 3.989715e+07 | 3.906992e+07 | 1.556306e+07 | 1.263990e+07 | 2.418054e+07 | 1.361411e+07 | 8.309645e+06 | 2.176511e+07 | 8.594606e+06 | 5.784685e+04 | 8.229419e+04 | 2.710502e+07 | 4.228367e+07 | 1.220436e+07 | 1.449948e+06 | 1.991303e+07 | 1.383334e+06 | 3.488514e+06 | 2.693400e+07 | 6.823640e+06 | 5.278711e+06 | 4.999322e+06 | 0 |
| 35 | 2.005767e-23 | 2.483386e+11 | 3.034385e+11 | 5.397407e+09 | 9.714590e+10 | 3.074372e+09 | 4.607364e+10 | 6.113658e+09 | 2.487312e+10 | 3.281864e+09 | 3.171297e+09 | 4.322904e+08 | 7.593255e+08 | 5.041733e+09 | 2.014204e+10 | 8.716218e+07 | 1.438071e+10 | 2.324841e+09 | 5.329362e+09 | 2.122016e+09 | 2.336579e+08 | 3.276469e+08 | 3.614434e+08 | 4.081245e+09 | 2.247445e+09 | 5.308821e+08 | 7.717831e+09 | 7.687563e+08 | 3.279740e+09 | 2.631919e+08 | 6.609370e+09 | 2.438674e+09 | 1.032339e+09 | 6.609401e+09 | 2.769244e+09 | 2.386537e+09 | 2.067472e+10 | 2.984367e+10 | 7.234662e+10 | 5.054561e+10 | ... | 3.646149e+06 | 3.104552e+06 | 6.063856e+07 | 7.379961e+06 | 4.232041e+06 | 8.361054e+06 | 1.008574e+07 | 2.675340e+07 | 2.353878e+06 | 3.933722e+07 | 1.294616e+07 | 6.858054e+07 | 1.564118e+07 | 1.564992e+07 | 1.929105e+07 | 1.895119e+07 | 2.445280e+06 | 1.220851e+07 | 1.957260e+07 | 3.177603e+07 | 9.153447e+06 | 3.882828e+07 | 6.555127e+07 | 3.676038e+06 | 5.286998e+07 | 2.373915e+07 | 3.191280e+06 | 1.308186e+08 | 1.445847e+07 | 2.031333e+07 | 2.910264e+07 | 3.628487e+07 | 8.872919e+07 | 1.077391e+07 | 5.264475e+06 | 2.564387e+07 | 1.219609e+07 | 2.248579e+06 | 4.098014e+06 | 1 |
5 rows × 7527 columns
[ ]:
X = data.iloc[:, :-1].values # Matrix of features (Independent variable), X: numpy.ndarray
y = data.iloc[:, -1].values # Dependent variable vector, y: numpy.ndarray
Preprocessing
1. Normalization
[ ]:
from sklearn import preprocessing
normalized_data = preprocessing.normalize(X)
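By default `preprocessing.normalize` rescales each row (here, each periodogram) to unit L2 norm; it does not standardize feature columns. A check on a toy matrix:

```python
import numpy as np
from sklearn import preprocessing

X_toy = np.array([[3.0, 4.0], [1.0, 0.0], [6.0, 8.0]])
X_norm = preprocessing.normalize(X_toy)  # default norm='l2', applied per row
manual = X_toy / np.linalg.norm(X_toy, axis=1, keepdims=True)
print(X_norm)  # rows become [0.6, 0.8], [1.0, 0.0], [0.6, 0.8]
```

If per-feature (column-wise) scaling is what is intended, `StandardScaler` is the usual alternative.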
2. PCA - Dimensionality Reduction
[ ]:
import numpy as np
from sklearn.decomposition import PCA
# What is the minimum value of `n_components` that keeps 95% of the variance?
pca = PCA()
pca.fit(normalized_data)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1
print("The minimum value is:", d)
The minimum value is: 49
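`np.argmax` on a boolean array returns the index of the first `True`, so adding 1 turns that index into a number of components. A worked example with toy explained-variance ratios:

```python
import numpy as np

# Toy explained-variance ratios, sorted in decreasing order as PCA returns them
ratios = np.array([0.60, 0.30, 0.07, 0.03])
cumsum = np.cumsum(ratios)              # approx. [0.60, 0.90, 0.97, 1.00]
d_toy = np.argmax(cumsum >= 0.95) + 1   # first index reaching 95%, as a count
print(d_toy)  # 3
```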
[ ]:
pca = PCA(n_components=d)
pca.fit(normalized_data)
X_reduced = pca.transform(normalized_data)
A better alternative: set `n_components` to a float between 0.0 and 1.0, indicating the fraction of variance you want to preserve.
[ ]:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(normalized_data)
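Both routes should select the same number of components. A sketch on synthetic data with decreasing column variance, fixing `svd_solver='full'` on both fits so the explained-variance ratios are computed identically:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data whose columns have decreasing variance
X_toy = rng.normal(size=(60, 20)) * np.linspace(10, 0.1, 20)

# Manual route: full PCA, then count components up to 95% cumulative variance
full = PCA(svd_solver='full').fit(X_toy)
d_manual = np.argmax(np.cumsum(full.explained_variance_ratio_) >= 0.95) + 1

# Automatic route: a float n_components is read as a variance fraction
auto = PCA(n_components=0.95, svd_solver='full').fit(X_toy)
print(d_manual, auto.n_components_)
```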
Splitting the dataset into the Training set and Test set
[ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.25, shuffle=True, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(98, 49) (98,) (33, 49) (33,)
Train model
[ ]:
from sklearn.tree import DecisionTreeClassifier
class_trees = DecisionTreeClassifier(random_state=42)
class_trees.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
max_depth=None, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=42, splitter='best')
Results
[ ]:
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay
labels_formatted = ['confirmed exoplanets', 'eclipsing binaries']
# plot_confusion_matrix was removed in scikit-learn 1.2; from_estimator (scikit-learn >= 1.0) replaces it
disp = ConfusionMatrixDisplay.from_estimator(class_trees, X_test, y_test,
                                             display_labels=labels_formatted,
                                             cmap=plt.cm.Blues,
                                             normalize='true')
disp.ax_.set_title('Decision Trees Classifier - Confusion matrix')
plt.show()
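With `normalize='true'`, each row of the confusion matrix is divided by the number of true samples in that class, so the diagonal holds the per-class recall. A toy check using `confusion_matrix` directly:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels: 4 samples of class 0, 2 of class 1
y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0])

cm_norm = confusion_matrix(y_true, y_pred, normalize='true')  # each row sums to 1
print(cm_norm)  # [[0.75 0.25]
                #  [0.5  0.5 ]]
```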
XGBoost - 0.585
https://xgboost.readthedocs.io/en/latest/parameter.html
Feature: Periodograms
[ ]:
import pandas as pd
FEATURES_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/features'
PERIODOGRAMS_DIR = FEATURES_DIR + '/feature_periodograms.csv'
data = pd.read_csv(PERIODOGRAMS_DIR)
data.sample(5)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | ... | 7487 | 7488 | 7489 | 7490 | 7491 | 7492 | 7493 | 7494 | 7495 | 7496 | 7497 | 7498 | 7499 | 7500 | 7501 | 7502 | 7503 | 7504 | 7505 | 7506 | 7507 | 7508 | 7509 | 7510 | 7511 | 7512 | 7513 | 7514 | 7515 | 7516 | 7517 | 7518 | 7519 | 7520 | 7521 | 7522 | 7523 | 7524 | 7525 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 97 | 6.567484e-24 | 1.178041e+10 | 5.298207e+09 | 9.132480e+09 | 1.002822e+10 | 3.540825e+09 | 4.228655e+09 | 4.095593e+09 | 2.244863e+08 | 1.012111e+09 | 2.164379e+09 | 1.789251e+09 | 1.365437e+09 | 1.876186e+09 | 2.027531e+09 | 7.245684e+12 | 1.720589e+09 | 7.179984e+08 | 1.050979e+09 | 9.693505e+08 | 1.354755e+09 | 1.007977e+09 | 3.747895e+08 | 2.814626e+08 | 4.937656e+08 | 2.295634e+09 | 3.026242e+09 | 3.029127e+09 | 1.860285e+09 | 2.587133e+09 | 1.779195e+13 | 8.697306e+08 | 1.296085e+09 | 1.402465e+08 | 1.454082e+09 | 4.639711e+08 | 1.589614e+08 | 1.291053e+09 | 1.517971e+08 | 7.411420e+08 | ... | 1.076525e+03 | 1.076522e+03 | 1.076519e+03 | 1.076515e+03 | 1.076512e+03 | 1.076509e+03 | 1.076506e+03 | 1.076503e+03 | 1.076500e+03 | 1.076497e+03 | 1.076495e+03 | 1.076492e+03 | 1.076489e+03 | 1.076487e+03 | 1.076485e+03 | 1.076483e+03 | 1.076480e+03 | 1.076478e+03 | 1076.476488 | 1.076475e+03 | 1.076473e+03 | 1.076471e+03 | 1076.469734 | 1.076468e+03 | 1.076467e+03 | 1.076466e+03 | 1.076464e+03 | 1.076463e+03 | 1.076462e+03 | 1.076462e+03 | 1.076461e+03 | 1.076460e+03 | 1.076459e+03 | 1076.458898 | 1.076458e+03 | 1.076458e+03 | 1.076458e+03 | 1.076458e+03 | 5.382289e+02 | 1 |
| 59 | 5.748580e-20 | 6.211685e+10 | 2.216964e+09 | 1.435994e+10 | 1.719760e+10 | 8.704939e+09 | 4.907214e+09 | 2.450727e+09 | 6.121095e+09 | 4.615305e+08 | 4.269709e+08 | 2.070437e+09 | 1.608734e+08 | 1.740199e+09 | 2.628103e+08 | 1.164062e+09 | 3.439496e+09 | 3.440799e+09 | 3.752920e+09 | 2.090612e+09 | 1.269435e+09 | 2.729621e+09 | 3.853905e+09 | 8.723574e+09 | 2.646283e+09 | 8.650304e+09 | 6.046799e+09 | 1.536936e+10 | 1.372588e+10 | 3.027827e+10 | 2.291537e+10 | 1.746399e+10 | 2.597201e+11 | 6.334982e+12 | 1.774388e+12 | 2.501299e+11 | 4.467781e+10 | 8.605329e+10 | 3.695356e+10 | 3.505263e+10 | ... | 2.414816e+04 | 2.414808e+04 | 2.414800e+04 | 2.414793e+04 | 2.414785e+04 | 2.414778e+04 | 2.414772e+04 | 2.414765e+04 | 2.414759e+04 | 2.414752e+04 | 2.414746e+04 | 2.414741e+04 | 2.414735e+04 | 2.414730e+04 | 2.414724e+04 | 2.414719e+04 | 2.414715e+04 | 2.414710e+04 | 24147.059038 | 2.414702e+04 | 2.414698e+04 | 2.414694e+04 | 24146.907523 | 2.414687e+04 | 2.414684e+04 | 2.414682e+04 | 2.414679e+04 | 2.414677e+04 | 2.414674e+04 | 2.414672e+04 | 2.414671e+04 | 2.414669e+04 | 2.414668e+04 | 24146.664472 | 2.414666e+04 | 2.414665e+04 | 2.414664e+04 | 2.414664e+04 | 1.207332e+04 | 1 |
| 24 | 1.752484e-21 | 9.756491e+10 | 3.995481e+10 | 1.105439e+10 | 3.336733e+09 | 3.161758e+09 | 3.782595e+09 | 5.017278e+09 | 1.508046e+09 | 1.459446e+09 | 5.922758e+08 | 4.391054e+08 | 1.723186e+09 | 1.404142e+09 | 9.070158e+08 | 4.035657e+08 | 8.472543e+08 | 2.859164e+08 | 2.233725e+08 | 4.385475e+08 | 2.653935e+07 | 1.300230e+09 | 1.120066e+09 | 1.688150e+08 | 1.741311e+07 | 5.441521e+08 | 2.447330e+07 | 6.426762e+07 | 2.427007e+08 | 6.739195e+07 | 6.132567e+08 | 1.442627e+08 | 9.345792e+07 | 1.235102e+08 | 4.746666e+08 | 1.192272e+08 | 2.053329e+08 | 2.421255e+08 | 9.038041e+07 | 4.684413e+07 | ... | 5.069784e+06 | 1.481839e+06 | 6.506158e+06 | 4.203149e+06 | 1.004154e+07 | 7.254977e+06 | 1.675987e+07 | 1.362856e+07 | 3.276551e+06 | 3.198822e+07 | 3.470285e+06 | 2.147650e+07 | 1.048925e+07 | 2.111648e+06 | 7.221949e+06 | 1.317321e+07 | 2.028878e+07 | 2.733158e+06 | 373819.128620 | 1.328582e+07 | 9.991568e+06 | 4.730944e+06 | 474384.659373 | 5.020889e+06 | 9.238025e+06 | 2.269643e+07 | 2.639189e+07 | 1.203043e+07 | 6.267373e+06 | 4.170246e+06 | 4.640980e+06 | 6.198973e+06 | 1.228420e+07 | 223599.331136 | 4.635344e+06 | 2.317602e+07 | 4.720558e+06 | 1.783050e+07 | 7.135303e+07 | 0 |
| 71 | 4.444052e-21 | 1.225255e+11 | 8.684239e+11 | 3.046022e+11 | 3.763060e+11 | 3.998026e+10 | 2.059423e+07 | 1.908615e+09 | 1.245052e+10 | 1.664964e+10 | 1.750149e+09 | 1.974743e+09 | 1.564496e+10 | 9.473577e+09 | 4.174117e+09 | 3.589926e+08 | 3.528037e+08 | 4.494764e+08 | 1.807520e+09 | 1.496267e+09 | 9.872100e+07 | 1.281638e+09 | 9.822630e+07 | 2.448930e+09 | 1.786816e+07 | 8.014748e+08 | 9.356294e+08 | 8.695086e+08 | 1.502131e+09 | 8.623401e+08 | 2.380475e+09 | 6.490083e+08 | 1.295033e+09 | 5.574792e+08 | 7.684658e+08 | 2.706459e+09 | 1.572336e+08 | 8.620298e+08 | 2.541334e+08 | 5.012295e+07 | ... | 3.042985e+04 | 3.042975e+04 | 3.042965e+04 | 3.042956e+04 | 3.042947e+04 | 3.042938e+04 | 3.042929e+04 | 3.042921e+04 | 3.042913e+04 | 3.042905e+04 | 3.042897e+04 | 3.042890e+04 | 3.042883e+04 | 3.042876e+04 | 3.042870e+04 | 3.042864e+04 | 3.042858e+04 | 3.042852e+04 | 30428.464346 | 3.042841e+04 | 3.042836e+04 | 3.042832e+04 | 30428.273418 | 3.042823e+04 | 3.042819e+04 | 3.042816e+04 | 3.042812e+04 | 3.042809e+04 | 3.042807e+04 | 3.042804e+04 | 3.042802e+04 | 3.042800e+04 | 3.042798e+04 | 30427.967140 | 3.042796e+04 | 3.042795e+04 | 3.042794e+04 | 3.042794e+04 | 1.521397e+04 | 1 |
| 32 | 4.534147e-21 | 4.810687e+11 | 7.156296e+11 | 7.907086e+10 | 1.827524e+11 | 3.066684e+10 | 2.281262e+10 | 2.472757e+10 | 2.943501e+10 | 3.255523e+10 | 3.103233e+10 | 2.130254e+10 | 3.009069e+10 | 7.761600e+09 | 1.505235e+10 | 6.974905e+09 | 1.466913e+10 | 5.582512e+08 | 1.547837e+10 | 8.222439e+09 | 8.399665e+08 | 1.454228e+09 | 3.245127e+09 | 4.154653e+09 | 1.935284e+09 | 5.700948e+09 | 3.176125e+09 | 3.062322e+09 | 3.942368e+08 | 1.405007e+09 | 5.606020e+08 | 8.074106e+08 | 8.842706e+08 | 1.022849e+08 | 2.209409e+09 | 3.976962e+09 | 3.228091e+08 | 1.296049e+09 | 1.387544e+09 | 3.794390e+08 | ... | 2.085999e+05 | 2.085992e+05 | 2.085985e+05 | 2.085979e+05 | 2.085973e+05 | 2.085966e+05 | 2.085961e+05 | 2.085955e+05 | 2.085949e+05 | 2.085944e+05 | 2.085939e+05 | 2.085934e+05 | 2.085929e+05 | 2.085924e+05 | 2.085920e+05 | 2.085916e+05 | 2.085911e+05 | 2.085908e+05 | 208590.383540 | 2.085900e+05 | 2.085897e+05 | 2.085894e+05 | 208589.074708 | 2.085888e+05 | 2.085885e+05 | 2.085883e+05 | 2.085881e+05 | 2.085878e+05 | 2.085877e+05 | 2.085875e+05 | 2.085873e+05 | 2.085872e+05 | 2.085871e+05 | 208586.975144 | 2.085869e+05 | 2.085868e+05 | 2.085868e+05 | 2.085868e+05 | 1.042934e+05 | 0 |
5 rows × 7527 columns
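The sampled rows look like one-sided power spectra. Column 0 (the zero-frequency bin) is essentially zero, and in most rows the last column is about half of its neighbour. Both patterns are consistent with `scipy.signal.periodogram` using its default mean-removal detrending. This section does not show how `feature_periodograms.csv` was actually built, so the following is only a sketch under that assumption, run on a synthetic lightcurve (hypothetical values, not CoRoT data):

```python
import numpy as np
from scipy.signal import periodogram

# Synthetic stand-in for a resampled WHITEFLUX series (hypothetical values)
n = 200
t = np.arange(n)
flux = 1000.0 + 5.0 * np.sin(2 * np.pi * 10 * t / n)  # 10 full cycles over the series

# Defaults: fs=1.0, detrend='constant' (subtracts the mean), one-sided density spectrum
freqs, power = periodogram(flux)

print(len(power))        # 101, i.e. n // 2 + 1 frequency bins
print(np.argmax(power))  # 10, the bin of the injected 10-cycle signal
print(power[0])          # DC bin is ~0 because the mean was removed
```

For an even-length series, scipy doubles every one-sided bin except the first and the last (Nyquist) one. This is why a flat spectrum ends with a value about half of its neighbours.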
[ ]:
X = data.iloc[:, :-1].values # Matrix of features (Independent variable), X: numpy.ndarray
y = data.iloc[:, -1].values # Dependent variable vector, y: numpy.ndarray
Preprocessing
1. Normalization
[ ]:
from sklearn import preprocessing
# Scale each sample (row) to unit L2 norm; this is not per-feature standardization
normalized_data = preprocessing.normalize(X)
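`preprocessing.normalize` works per sample. With its default `norm='l2'`, it divides every row by its Euclidean length, so two lightcurves whose periodograms differ only by an overall scale factor end up identical. A toy check with made-up numbers:

```python
import numpy as np
from sklearn import preprocessing

# Two toy "periodograms" with the same shape but a 1000x difference in scale
X_toy = np.array([[3.0, 4.0, 0.0],
                  [3000.0, 4000.0, 0.0]])

X_norm = preprocessing.normalize(X_toy)  # norm='l2', applied row by row by default

# Dividing each row by its Euclidean length gives the same result by hand
X_manual = X_toy / np.linalg.norm(X_toy, axis=1, keepdims=True)
print(X_norm)  # both rows become [0.6, 0.8, 0.0]
```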
2. PCA - Dimensionality Reduction
[ ]:
import numpy as np
from sklearn.decomposition import PCA
# What is the minimum value of `n_components` that keeps 95% of the variance in the data?
pca = PCA()
pca.fit(normalized_data)
cumsum = np.cumsum(pca.explained_variance_ratio_)
# Index of the first component where the cumulative ratio reaches 0.95, plus 1
d = np.argmax(cumsum >= 0.95) + 1
print("The minimum value is:", d)
The minimum value is: 49
[ ]:
pca = PCA(n_components=d)
pca.fit(normalized_data)
X_reduced = pca.transform(normalized_data)
A better alternative: pass a float between 0.0 and 1.0 as `n_components`; PCA then keeps as many components as needed to preserve that fraction of the variance.
[ ]:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(normalized_data)
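The `argmax` trick works because `cumsum >= 0.95` is a boolean array and `np.argmax` returns the index of its first `True`. Adding 1 turns that index into a component count. Passing a float as `n_components` makes scikit-learn keep the smallest number of components whose cumulative ratio is greater than the threshold, so the two approaches differ only if the cumulative ratio lands exactly on 0.95. The sketch below checks this on toy ratios and on synthetic data (not the CoRoT features):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy explained-variance ratios (hypothetical): the cumulative sum first reaches 0.95 at the 4th component
ratios = np.array([0.5, 0.3, 0.12, 0.08])
cumsum_toy = np.cumsum(ratios)                # approximately [0.5, 0.8, 0.92, 1.0]
d_toy = np.argmax(cumsum_toy >= 0.95) + 1     # first True is at index 3, so d_toy = 4

# Synthetic data with 3 latent directions plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
X_syn = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(60, 10))

# Manual selection from a full PCA vs. letting PCA pick from a variance fraction
full = PCA().fit(X_syn)
d_syn = np.argmax(np.cumsum(full.explained_variance_ratio_) >= 0.95) + 1
pca_frac = PCA(n_components=0.95).fit(X_syn)
print(d_toy, d_syn, pca_frac.n_components_)
```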
Splitting the dataset into the Training set and Test set
[ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.25, shuffle=True, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(98, 49) (98,) (33, 49) (33,)
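In the cells above, PCA is fitted on all rows before `train_test_split`, so the test rows influence the projection the model is trained on. The sketch below is a minimal leakage-free version. It assumes scikit-learn's `Pipeline` and uses `make_classification` as a hypothetical stand-in for the periodogram matrix. The pipeline fits the normalizer and PCA on the training split only, and `stratify=y` also keeps the class ratio equal in both splits. `LogisticRegression` is only a placeholder model:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Synthetic stand-in: 131 samples, 300 features, two classes (hypothetical sizes)
X_syn, y_syn = make_classification(n_samples=131, n_features=300, n_informative=20,
                                   random_state=42)

# Split first, so nothing below ever sees the test rows during fitting
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.25, shuffle=True,
                                          stratify=y_syn, random_state=42)

model = make_pipeline(
    Normalizer(),                       # row-wise L2 scaling, same as preprocessing.normalize
    PCA(n_components=0.95),             # fitted on X_tr only inside Pipeline.fit
    LogisticRegression(max_iter=1000),  # placeholder classifier
)
model.fit(X_tr, y_tr)
print("PCA components kept:", model.named_steps["pca"].n_components_)
print("Test accuracy:", model.score(X_te, y_te))
```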
Train model
[ ]:
import xgboost as xgb
class_xgb = xgb.XGBClassifier(learning_rate=0.3, max_depth=6, verbosity=0, random_state=42)
class_xgb.fit(X_train, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
learning_rate=0.3, max_delta_step=0, max_depth=6,
min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
nthread=None, objective='binary:logistic', random_state=42,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=0)
Results
[ ]:
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay
labels_formatted = ['confirmed exoplanets', 'eclipsing binaries']
# plot_confusion_matrix was removed in scikit-learn 1.2; from_estimator (>= 1.0) replaces it
disp = ConfusionMatrixDisplay.from_estimator(class_xgb, X_test, y_test,
                                             display_labels=labels_formatted,
                                             cmap=plt.cm.Blues,
                                             normalize='true')  # each true-class row sums to 1
disp.ax_.set_title('XGBoost Classifier - Confusion matrix')
plt.show()
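With `normalize='true'`, each row of the matrix is divided by the number of test samples in that true class, so the diagonal reads as per-class recall. A toy check with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical): 3 true class-0 samples, 2 true class-1 samples
y_true = np.array([0, 0, 0, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1])

counts = confusion_matrix(y_true, y_pred)                   # [[2, 1], [0, 2]]
rates = confusion_matrix(y_true, y_pred, normalize='true')  # each row divided by its sum
print(rates)  # [[2/3, 1/3], [0, 1]] as floats
```

With only 33 test samples in the notebook's split, each row fraction rests on a small number of lightcurves.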
Gaussian Mixture Models
https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html
Feature: Periodograms
[ ]:
import pandas as pd
FEATURES_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/features'
PERIODOGRAMS_DIR = FEATURES_DIR + '/feature_periodograms.csv'
data = pd.read_csv(PERIODOGRAMS_DIR)
data.sample(5)
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | ... | 7487 | 7488 | 7489 | 7490 | 7491 | 7492 | 7493 | 7494 | 7495 | 7496 | 7497 | 7498 | 7499 | 7500 | 7501 | 7502 | 7503 | 7504 | 7505 | 7506 | 7507 | 7508 | 7509 | 7510 | 7511 | 7512 | 7513 | 7514 | 7515 | 7516 | 7517 | 7518 | 7519 | 7520 | 7521 | 7522 | 7523 | 7524 | 7525 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | 3.099730e-21 | 1.450455e+11 | 5.851971e+10 | 1.371264e+10 | 6.332105e+10 | 7.659462e+10 | 2.038631e+10 | 1.701385e+10 | 3.411757e+10 | 1.877369e+10 | 8.258293e+10 | 4.761234e+09 | 1.889632e+10 | 4.737847e+10 | 6.443619e+10 | 2.390949e+10 | 3.833547e+10 | 4.000678e+10 | 1.699051e+10 | 1.877011e+10 | 2.347108e+10 | 1.919521e+10 | 1.872913e+10 | 5.364253e+10 | 9.076523e+09 | 4.618874e+10 | 2.637520e+10 | 1.318005e+10 | 3.798276e+10 | 2.442435e+10 | 4.000654e+10 | 4.623934e+10 | 3.615225e+10 | 2.993127e+10 | 3.303417e+10 | 3.022440e+10 | 2.011551e+10 | 3.900373e+10 | 4.138960e+10 | 2.368466e+10 | ... | 9970.968395 | 9970.935809 | 9970.904092 | 9970.873244 | 9970.843265 | 9970.814155 | 9970.785914 | 9970.758543 | 9970.732040 | 9970.706406 | 9970.681642 | 9970.657746 | 9970.634719 | 9970.612562 | 9970.591273 | 9970.570853 | 9970.551303 | 9970.532621 | 9970.514808 | 9970.497864 | 9970.481790 | 9970.466584 | 9970.452247 | 9970.438779 | 9970.426180 | 9970.414449 | 9970.403588 | 9970.393596 | 9970.384472 | 9970.376218 | 9970.368832 | 9970.362315 | 9970.356668 | 9970.351889 | 9970.347979 | 9970.344938 | 9970.342765 | 9970.341462 | 4985.170514 | 1 |
| 104 | 2.544691e-21 | 9.069646e+10 | 6.643117e+10 | 5.117582e+10 | 3.413736e+10 | 4.787929e+10 | 1.866267e+10 | 1.943006e+10 | 2.532991e+09 | 9.193865e+09 | 2.942133e+08 | 7.123645e+09 | 1.065354e+10 | 2.174687e+10 | 2.695494e+11 | 2.432953e+11 | 1.574357e+10 | 3.968706e+09 | 2.984037e+09 | 4.355211e+09 | 4.921942e+09 | 6.969427e+09 | 9.030065e+09 | 6.355329e+09 | 9.656856e+09 | 4.718208e+09 | 3.562666e+09 | 1.698194e+09 | 8.258263e+09 | 7.824319e+11 | 8.895930e+08 | 1.499607e+09 | 1.739245e+10 | 2.566922e+09 | 9.158103e+09 | 6.630252e+10 | 1.937514e+10 | 1.056860e+11 | 1.433715e+10 | 1.082412e+10 | ... | 2453.104169 | 2453.096152 | 2453.088348 | 2453.080759 | 2453.073384 | 2453.066222 | 2453.059274 | 2453.052540 | 2453.046019 | 2453.039713 | 2453.033620 | 2453.027741 | 2453.022076 | 2453.016625 | 2453.011387 | 2453.006364 | 2453.001554 | 2452.996957 | 2452.992575 | 2452.988406 | 2452.984452 | 2452.980711 | 2452.977183 | 2452.973870 | 2452.970770 | 2452.967884 | 2452.965212 | 2452.962754 | 2452.960509 | 2452.958478 | 2452.956661 | 2452.955058 | 2452.953669 | 2452.952493 | 2452.951531 | 2452.950783 | 2452.950248 | 2452.949928 | 1226.474910 | 1 |
| 83 | 7.221651e-21 | 4.505717e+08 | 7.566718e+06 | 9.076482e+08 | 5.055431e+08 | 5.201719e+08 | 3.460490e+08 | 2.954526e+08 | 3.242125e+08 | 4.557197e+07 | 3.167933e+07 | 2.440220e+07 | 1.403096e+08 | 9.453829e+07 | 1.654833e+07 | 1.208286e+06 | 3.350352e+07 | 5.316386e+07 | 3.449430e+07 | 1.229629e+08 | 1.553346e+08 | 2.783925e+08 | 8.946271e+07 | 7.675415e+06 | 1.505378e+07 | 6.678298e+05 | 1.965005e+08 | 4.079288e+08 | 9.018535e+07 | 3.032997e+07 | 5.564186e+07 | 1.726893e+08 | 2.075812e+08 | 7.274091e+08 | 1.861569e+08 | 2.016792e+08 | 4.417020e+08 | 2.089950e+08 | 3.106678e+08 | 9.919087e+08 | ... | 121.901107 | 121.900708 | 121.900321 | 121.899943 | 121.899577 | 121.899221 | 121.898876 | 121.898541 | 121.898217 | 121.897904 | 121.897601 | 121.897309 | 121.897027 | 121.896756 | 121.896496 | 121.896247 | 121.896008 | 121.895779 | 121.895561 | 121.895354 | 121.895158 | 121.894972 | 121.894797 | 121.894632 | 121.894478 | 121.894334 | 121.894202 | 121.894079 | 121.893968 | 121.893867 | 121.893777 | 121.893697 | 121.893628 | 121.893570 | 121.893522 | 121.893485 | 121.893458 | 121.893442 | 60.946718 | 1 |
| 126 | 2.472828e-21 | 8.480713e+10 | 3.326409e+10 | 1.004276e+10 | 1.105842e+10 | 9.378111e+09 | 5.750926e+08 | 2.931827e+09 | 7.503412e+09 | 2.734914e+09 | 7.226858e+09 | 8.336502e+08 | 3.357986e+09 | 9.016144e+08 | 1.089214e+09 | 2.594842e+09 | 2.696162e+09 | 1.423190e+09 | 3.079399e+09 | 1.936290e+09 | 4.090007e+06 | 3.107745e+08 | 4.000904e+08 | 3.199724e+07 | 3.106817e+08 | 1.349838e+09 | 1.368688e+07 | 2.230739e+08 | 9.444796e+08 | 5.780070e+08 | 3.091359e+08 | 1.091311e+08 | 7.396224e+07 | 1.655690e+09 | 2.581546e+09 | 8.553230e+06 | 3.868140e+08 | 1.264882e+09 | 1.846532e+08 | 4.150413e+08 | ... | 8732.550481 | 8732.521943 | 8732.494165 | 8732.467148 | 8732.440893 | 8732.415398 | 8732.390665 | 8732.366693 | 8732.343482 | 8732.321032 | 8732.299343 | 8732.278416 | 8732.258249 | 8732.238843 | 8732.220199 | 8732.202315 | 8732.185193 | 8732.168832 | 8732.153231 | 8732.138392 | 8732.124313 | 8732.110996 | 8732.098440 | 8732.086645 | 8732.075610 | 8732.065337 | 8732.055825 | 8732.047073 | 8732.039083 | 8732.031854 | 8732.025386 | 8732.019678 | 8732.014732 | 8732.010546 | 8732.007122 | 8732.004459 | 8732.002556 | 8732.001415 | 4366.000517 | 1 |
| 13 | 1.750600e-22 | 7.424050e+09 | 1.358892e+09 | 3.309422e+08 | 5.295420e+08 | 2.036254e+09 | 8.268143e+08 | 1.861110e+09 | 9.343690e+07 | 1.156138e+08 | 2.263122e+07 | 2.557775e+08 | 4.054398e+07 | 3.574917e+08 | 1.151586e+08 | 1.266685e+08 | 2.280232e+08 | 1.757015e+08 | 6.112853e+07 | 4.058141e+06 | 3.930772e+07 | 2.389294e+08 | 1.018442e+07 | 2.026846e+08 | 8.033815e+07 | 5.084299e+07 | 2.686861e+08 | 2.468639e+07 | 7.259111e+07 | 6.207652e+07 | 1.287359e+06 | 1.152442e+07 | 1.407730e+08 | 1.035752e+08 | 4.983365e+07 | 7.823538e+07 | 3.741193e+06 | 2.443456e+07 | 5.221536e+07 | 5.451975e+07 | ... | 82.035061 | 82.034793 | 82.034532 | 82.034278 | 82.034032 | 82.033792 | 82.033560 | 82.033335 | 82.033117 | 82.032906 | 82.032702 | 82.032505 | 82.032316 | 82.032134 | 82.031958 | 82.031790 | 82.031630 | 82.031476 | 82.031329 | 82.031190 | 82.031058 | 82.030932 | 82.030815 | 82.030704 | 82.030600 | 82.030504 | 82.030414 | 82.030332 | 82.030257 | 82.030189 | 82.030128 | 82.030075 | 82.030028 | 82.029989 | 82.029957 | 82.029932 | 82.029914 | 82.029903 | 41.014950 | 0 |
5 rows × 7527 columns
[ ]:
X = data.iloc[:, :-1].values # Matrix of features (Independent variable), X: numpy.ndarray
y = data.iloc[:, -1].values # Dependent variable vector, y: numpy.ndarray
Preprocessing
1. Normalization
[ ]:
from sklearn import preprocessing
normalized_data = preprocessing.normalize(X)
2. PCA - Dimensionality Reduction
[ ]:
import numpy as np
from sklearn.decomposition import PCA
# What is the minimum value of `n_components` that keeps 95% of the variance in the data?
pca = PCA()
pca.fit(normalized_data)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1
print("The minimum value is:", d)
The minimum value is: 49
[ ]:
pca = PCA(n_components=d)
pca.fit(normalized_data)
X_reduced = pca.transform(normalized_data)
A better alternative: pass a float between 0.0 and 1.0 as `n_components`; PCA then keeps as many components as needed to preserve that fraction of the variance.
[ ]:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(normalized_data)
Splitting the dataset into the Training set and Test set
[ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.25, shuffle=True, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(98, 49) (98,) (33, 49) (33,)
Train model
[ ]:
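# Note: GaussianMixture is an unsupervised density estimator; its fit() ignores y,
# and the 2 mixture components are not guaranteed to line up with labels 0 and 1.
# from sklearn.mixture import GaussianMixture
# classifier = GaussianMixture(n_components=2, random_state=42)
# classifier.fit(X_train, y_train)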
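One common way to use Gaussian mixtures for classification is a generative classifier. It fits a separate `GaussianMixture` on the training rows of each class, then assigns a new sample to the class with the highest log-likelihood plus log prior. The `GMMClassifier` below is a hypothetical helper written for illustration; it is not part of scikit-learn. It is tested on synthetic blobs rather than the periodogram features:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


class GMMClassifier:
    """Hypothetical helper: one GaussianMixture per class, predict by maximum posterior."""

    def __init__(self, n_components=1, covariance_type="full", random_state=None):
        self.n_components = n_components
        self.covariance_type = covariance_type
        self.random_state = random_state

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_, self.log_priors_ = [], []
        for c in self.classes_:
            X_c = X[y == c]
            gmm = GaussianMixture(n_components=self.n_components,
                                  covariance_type=self.covariance_type,
                                  random_state=self.random_state).fit(X_c)
            self.models_.append(gmm)
            self.log_priors_.append(np.log(len(X_c) / len(X)))  # class frequency as prior
        return self

    def predict(self, X):
        # Column k holds log p(x | class k) + log p(class k)
        scores = np.column_stack([m.score_samples(X) + lp
                                  for m, lp in zip(self.models_, self.log_priors_)])
        return self.classes_[np.argmax(scores, axis=1)]


# Two well-separated synthetic classes (hypothetical data)
X_b, y_b = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5]], cluster_std=1.0,
                      random_state=0)
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(X_b, y_b, test_size=0.25, random_state=42)
clf_gmm = GMMClassifier(n_components=1, random_state=42).fit(Xb_tr, yb_tr)
print("Accuracy:", np.mean(clf_gmm.predict(Xb_te) == yb_te))
```

On the real split there are only a few dozen training rows per class for about 49 PCA features, so a full covariance per component is poorly constrained. `covariance_type="diag"` may be a safer starting point, though this has not been tested on this dataset.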
Results
[ ]:
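# import matplotlib.pyplot as plt
# from sklearn.metrics import ConfusionMatrixDisplay
# labels_formatted = ['confirmed exoplanets', 'eclipsing binaries']
# # GaussianMixture is not a classifier, so plot from its predictions instead;
# # the predicted component ids may need to be remapped to the class labels first.
# disp = ConfusionMatrixDisplay.from_predictions(y_test, classifier.predict(X_test),
#                                                display_labels=labels_formatted,
#                                                cmap=plt.cm.Blues,
#                                                normalize='true')
# disp.ax_.set_title('Gaussian Mixture Classifier - Confusion matrix')
# plt.show()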
Lazy Predict
https://lazypredict.readthedocs.io/en/latest/usage.html#classification
Feature: Periodograms
[1]:
import pandas as pd
FEATURES_DIR = '/content/drive/MyDrive/01 - Iniciação Científica/02 - Datasets/features'
PERIODOGRAMS_DIR = FEATURES_DIR + '/feature_periodograms.csv'
data = pd.read_csv(PERIODOGRAMS_DIR)
data.sample(5)
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | ... | 7487 | 7488 | 7489 | 7490 | 7491 | 7492 | 7493 | 7494 | 7495 | 7496 | 7497 | 7498 | 7499 | 7500 | 7501 | 7502 | 7503 | 7504 | 7505 | 7506 | 7507 | 7508 | 7509 | 7510 | 7511 | 7512 | 7513 | 7514 | 7515 | 7516 | 7517 | 7518 | 7519 | 7520 | 7521 | 7522 | 7523 | 7524 | 7525 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 5.244937e-22 | 1.656106e+11 | 4.241954e+10 | 1.295592e+10 | 1.441850e+10 | 7.333357e+09 | 3.654204e+09 | 8.038780e+08 | 2.345564e+09 | 3.428295e+08 | 1.486851e+08 | 4.355384e+07 | 1.922543e+09 | 1.190937e+09 | 1.042278e+09 | 2.177381e+09 | 2.186346e+08 | 2.266618e+08 | 1.933830e+08 | 1.024137e+09 | 6.308557e+07 | 3.175023e+08 | 3.831862e+08 | 1.347591e+08 | 4.991692e+07 | 6.616579e+07 | 3.089654e+08 | 2.338376e+08 | 1.757878e+08 | 1.022475e+08 | 1.619658e+08 | 2.161862e+07 | 7.733964e+07 | 1.881846e+08 | 2.409856e+07 | 1.216204e+08 | 5.106637e+07 | 2.844362e+07 | 1.700110e+08 | 1.287244e+07 | ... | 2.901623e+06 | 9.709432e+06 | 6.766949e+06 | 3.311118e+07 | 3.102628e+06 | 6.014224e+06 | 1.052508e+07 | 1.792224e+06 | 5.597886e+06 | 8.938615e+06 | 2.654949e+06 | 3.223749e+05 | 1.365606e+07 | 5.680955e+06 | 1.315171e+07 | 1.306965e+07 | 1.334836e+06 | 2.123376e+07 | 3.334838e+04 | 4.742393e+06 | 6.822047e+05 | 4.105110e+07 | 9.704217e+06 | 1.313942e+06 | 7.024191e+06 | 3.920979e+05 | 1.611153e+07 | 8.167207e+06 | 1.913767e+07 | 3.936906e+06 | 4.087972e+06 | 2.124753e+07 | 1.382830e+07 | 3.013891e+06 | 1.112224e+07 | 1.568576e+07 | 6.561321e+06 | 2.489063e+06 | 2.142671e+05 | 0 |
| 109 | 5.222241e-23 | 7.189921e+10 | 9.901414e+10 | 1.476100e+10 | 1.398462e+10 | 1.147834e+10 | 6.674100e+09 | 4.094150e+09 | 3.790864e+10 | 4.962443e+09 | 9.110133e+08 | 1.529316e+09 | 2.659406e+08 | 5.565563e+09 | 2.297864e+06 | 2.821536e+09 | 2.618699e+09 | 2.476775e+09 | 2.353264e+08 | 3.537825e+09 | 4.176115e+07 | 1.394874e+08 | 2.349405e+09 | 4.841487e+07 | 4.160097e+09 | 3.921834e+08 | 1.123126e+09 | 2.430828e+08 | 1.808513e+09 | 4.106579e+08 | 5.394205e+07 | 1.265701e+09 | 2.936783e+08 | 4.351451e+06 | 5.507590e+08 | 2.131361e+08 | 1.194628e+09 | 4.435846e+08 | 6.775218e+07 | 6.498562e+07 | ... | 1.676719e+07 | 1.069171e+06 | 1.431339e+07 | 7.417587e+06 | 2.241231e+07 | 4.796347e+06 | 7.506176e+05 | 2.630452e+06 | 9.664950e+05 | 2.038954e+07 | 1.462608e+07 | 2.389343e+07 | 1.720616e+06 | 4.120930e+05 | 3.470017e+06 | 3.228340e+06 | 5.715047e+05 | 1.583645e+07 | 2.447099e+07 | 2.556097e+07 | 4.713815e+06 | 1.216467e+07 | 1.930779e+07 | 1.756057e+07 | 2.729222e+07 | 1.440361e+07 | 1.199978e+07 | 5.090961e+06 | 4.848588e+06 | 1.571179e+07 | 3.844894e+06 | 2.743101e+06 | 4.370159e+05 | 5.439133e+06 | 7.427570e+04 | 1.760637e+06 | 2.320080e+06 | 6.450329e+06 | 5.866100e+06 | 1 |
| 12 | 1.741909e-23 | 7.970192e+09 | 3.650109e+09 | 1.582624e+09 | 3.266343e+10 | 3.377317e+10 | 1.471925e+09 | 1.665996e+08 | 8.887137e+09 | 4.330463e+10 | 2.479880e+09 | 2.478121e+08 | 7.060146e+08 | 1.036437e+10 | 3.542269e+08 | 4.027595e+06 | 3.590214e+08 | 1.353871e+09 | 4.792989e+08 | 5.705030e+08 | 8.400022e+08 | 7.151057e+08 | 2.327222e+08 | 2.219457e+08 | 1.349461e+08 | 2.387996e+09 | 5.975332e+09 | 3.989845e+08 | 1.002950e+08 | 3.872512e+07 | 9.498571e+07 | 2.608256e+08 | 1.725462e+08 | 5.317832e+08 | 1.437279e+08 | 1.145169e+08 | 8.002391e+07 | 2.828761e+08 | 6.701132e+09 | 1.186455e+09 | ... | 1.947367e+03 | 1.947360e+03 | 1.947354e+03 | 1.947348e+03 | 1.947342e+03 | 1.947337e+03 | 1.947331e+03 | 1.947326e+03 | 1.947320e+03 | 1.947315e+03 | 1.947311e+03 | 1.947306e+03 | 1.947301e+03 | 1.947297e+03 | 1.947293e+03 | 1.947289e+03 | 1.947285e+03 | 1.947282e+03 | 1.947278e+03 | 1.947275e+03 | 1.947272e+03 | 1.947269e+03 | 1.947266e+03 | 1.947263e+03 | 1.947261e+03 | 1.947258e+03 | 1.947256e+03 | 1.947254e+03 | 1.947253e+03 | 1.947251e+03 | 1.947250e+03 | 1.947248e+03 | 1.947247e+03 | 1.947246e+03 | 1.947245e+03 | 1.947245e+03 | 1.947244e+03 | 1.947244e+03 | 9.736221e+02 | 0 |
| 107 | 1.537079e-21 | 2.334249e+10 | 1.246708e+11 | 7.693346e+10 | 3.661008e+10 | 9.257208e+08 | 1.661148e+09 | 4.462807e+09 | 2.758899e+09 | 2.238848e+09 | 8.078527e+09 | 3.787129e+09 | 5.939399e+09 | 3.755190e+09 | 3.769742e+09 | 4.052975e+08 | 5.671221e+09 | 6.723360e+09 | 1.641852e+09 | 5.226829e+09 | 6.062483e+08 | 1.519938e+09 | 1.789390e+09 | 2.119600e+07 | 1.049363e+09 | 1.143246e+08 | 3.384769e+08 | 5.382716e+08 | 7.376932e+08 | 8.038340e+08 | 8.351175e+08 | 2.170651e+08 | 6.091711e+06 | 1.795124e+09 | 8.766678e+07 | 6.037906e+07 | 9.266256e+08 | 3.747338e+08 | 1.156053e+09 | 5.333725e+07 | ... | 7.612817e+00 | 7.612792e+00 | 7.612768e+00 | 7.612744e+00 | 7.612721e+00 | 7.612699e+00 | 7.612677e+00 | 7.612656e+00 | 7.612636e+00 | 7.612617e+00 | 7.612598e+00 | 7.612580e+00 | 7.612562e+00 | 7.612545e+00 | 7.612529e+00 | 7.612513e+00 | 7.612498e+00 | 7.612484e+00 | 7.612470e+00 | 7.612457e+00 | 7.612445e+00 | 7.612434e+00 | 7.612423e+00 | 7.612412e+00 | 7.612403e+00 | 7.612394e+00 | 7.612385e+00 | 7.612378e+00 | 7.612371e+00 | 7.612365e+00 | 7.612359e+00 | 7.612354e+00 | 7.612350e+00 | 7.612346e+00 | 7.612343e+00 | 7.612341e+00 | 7.612339e+00 | 7.612338e+00 | 3.806169e+00 | 1 |
| 90 | 1.676619e-22 | 6.268024e+10 | 1.892612e+09 | 4.589268e+09 | 2.553798e+09 | 1.436860e+09 | 1.871399e+09 | 6.763201e+08 | 8.569126e+08 | 4.346068e+08 | 3.224972e+09 | 6.266573e+08 | 5.095192e+08 | 2.118835e+07 | 2.047743e+09 | 1.136897e+10 | 9.023897e+10 | 1.642913e+10 | 1.140840e+10 | 8.220013e+09 | 7.671658e+09 | 3.662531e+09 | 7.649781e+09 | 7.552201e+09 | 9.279831e+09 | 6.586815e+09 | 8.289219e+09 | 1.490018e+10 | 2.078385e+10 | 3.645875e+10 | 4.881829e+10 | 1.412379e+11 | 1.441203e+12 | 9.568005e+11 | 1.428168e+11 | 5.584073e+10 | 2.796076e+10 | 1.304114e+10 | 1.200738e+10 | 8.144469e+09 | ... | 1.037832e+03 | 1.037829e+03 | 1.037825e+03 | 1.037822e+03 | 1.037819e+03 | 1.037816e+03 | 1.037813e+03 | 1.037810e+03 | 1.037807e+03 | 1.037805e+03 | 1.037802e+03 | 1.037800e+03 | 1.037797e+03 | 1.037795e+03 | 1.037793e+03 | 1.037791e+03 | 1.037789e+03 | 1.037787e+03 | 1.037785e+03 | 1.037783e+03 | 1.037781e+03 | 1.037780e+03 | 1.037778e+03 | 1.037777e+03 | 1.037776e+03 | 1.037774e+03 | 1.037773e+03 | 1.037772e+03 | 1.037771e+03 | 1.037770e+03 | 1.037770e+03 | 1.037769e+03 | 1.037768e+03 | 1.037768e+03 | 1.037767e+03 | 1.037767e+03 | 1.037767e+03 | 1.037767e+03 | 5.188833e+02 | 1 |
5 rows × 7527 columns
[2]:
X = data.iloc[:, :-1].values # Matrix of features (Independent variable), X: numpy.ndarray
y = data.iloc[:, -1].values # Dependent variable vector, y: numpy.ndarray
Preprocessing
1. Normalization
[3]:
from sklearn import preprocessing
normalized_data = preprocessing.normalize(X)
2. PCA - Dimensionality Reduction
[4]:
import numpy as np
from sklearn.decomposition import PCA
# What is the minimum value of `n_components` that keeps 95% of the variance in the data?
pca = PCA()
pca.fit(normalized_data)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1
print("The minimum value is:", d)
The minimum value is: 49
[23]:
pca = PCA(n_components=d)
pca.fit(normalized_data)
X_reduced = pca.transform(normalized_data)
A better alternative: pass a float between 0.0 and 1.0 as `n_components`; PCA then keeps as many components as needed to preserve that fraction of the variance.
[25]:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(normalized_data)
Splitting the dataset into the Training set and Test set
[26]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_reduced, y, test_size=0.25, shuffle=True, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(98, 49) (98,) (33, 49) (33,)
Train model
[ ]:
!pip install lazypredict
[18]:
from lazypredict.Supervised import LazyClassifier
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
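`LazyClassifier.fit` trains a long list of classifiers with default settings and tabulates their test scores. Below is a hand-rolled sketch of the same idea with a few scikit-learn models. It is not LazyPredict's implementation: the library also wraps each model in its own preprocessing, so these numbers will not match the table that follows. Synthetic data stands in for the PCA features:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hypothetical stand-in with the same shape as X_reduced (131 x 49)
X_syn, y_syn = make_classification(n_samples=131, n_features=49, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.25, random_state=42)

candidates = {
    "DummyClassifier": DummyClassifier(),  # baseline: predicts the training majority class
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "SVC": SVC(),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}

# Fit each model with default settings and rank by test accuracy
accuracy = pd.Series(
    {name: est.fit(X_tr, y_tr).score(X_te, y_te) for name, est in candidates.items()},
    name="Accuracy",
).sort_values(ascending=False)
print(accuracy)
```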
100%|██████████| 29/29 [00:01<00:00, 24.28it/s]
Results
[19]:
print(models.Accuracy.sort_values(ascending=False))
Model
QuadraticDiscriminantAnalysis 0.70
SVC 0.70
AdaBoostClassifier 0.70
XGBClassifier 0.70
ExtraTreesClassifier 0.70
RandomForestClassifier 0.70
CalibratedClassifierCV 0.70
KNeighborsClassifier 0.67
LGBMClassifier 0.67
BaggingClassifier 0.67
NearestCentroid 0.67
DecisionTreeClassifier 0.64
BernoulliNB 0.64
GaussianNB 0.64
DummyClassifier 0.61
SGDClassifier 0.58
LinearDiscriminantAnalysis 0.58
RidgeClassifier 0.55
ExtraTreeClassifier 0.55
LogisticRegression 0.55
Perceptron 0.52
PassiveAggressiveClassifier 0.52
LinearSVC 0.52
RidgeClassifierCV 0.52
LabelSpreading 0.33
LabelPropagation 0.33
Name: Accuracy, dtype: float64
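The test split has 33 samples, so every accuracy in this table is a multiple of 1/33 (0.70 ≈ 23/33, 0.61 ≈ 20/33). A `DummyClassifier` score of 0.61 is consistent with always predicting the majority class. A rough normal-approximation standard error, √(p(1−p)/n), puts the uncertainty of the top score at about ±0.08. The gap between the best models and the dummy baseline (3 test samples) is therefore only about one standard error. A quick check of that arithmetic:

```python
import math

n_test = 33  # size of the test split printed earlier: (33, 49)

def binomial_se(accuracy, n):
    """Normal-approximation standard error of an accuracy measured on n samples."""
    return math.sqrt(accuracy * (1 - accuracy) / n)

best_correct = round(0.70 * n_test)   # 23 correct predictions
dummy_correct = round(0.61 * n_test)  # 20 correct predictions
se_best = binomial_se(best_correct / n_test, n_test)
print(best_correct, dummy_correct, round(se_best, 3))  # 23 20 0.08
```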