
Scaling Features For Machine Learning

I have a question about how to scale my dataset properly. It consists of:

- a date, which I currently store as seconds
- a value that can be between 1 and 5
- about 240 bool values

Solution 1:

Scaling is in most cases applied to each feature separately, and that is what StandardScaler does. It is therefore completely natural that some zeros stay zero while others are transformed. Look at the following code:

import numpy as np

int_mat = np.array([[0, 0], [0, 1], [0, 2]])

Output:

array([[0, 0],
       [0, 1],
       [0, 2]])

Now we do the scaling:

from sklearn.preprocessing import StandardScaler

ssc = StandardScaler()
int_scaled = ssc.fit_transform(int_mat)              # standardize each column: (x - mean) / std
inverse_scaling = ssc.inverse_transform(int_scaled)  # undo the scaling

int_scaled

array([[ 0.        , -1.22474487],
       [ 0.        ,  0.        ],
       [ 0.        ,  1.22474487]])

As you can see, the first feature (the first column) stays the same: it already has zero mean, and since its variance is zero, StandardScaler leaves it unscaled rather than dividing by zero.
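You can confirm this by checking the per-column statistics (axis=0 computes one value per feature/column):

int_mat.mean(axis=0)  # array([0., 1.])         -> first column already has mean 0
int_mat.std(axis=0)   # array([0., 0.81649658]) -> and zero variance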

The inverse transformation recovers the original matrix, up to floating-point round-off:

inverse_scaling

array([[0.00000000e+00, 1.11022302e-16],
       [0.00000000e+00, 1.00000000e+00],
       [0.00000000e+00, 2.00000000e+00]])
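The 1.11022302e-16 in the second column is just floating-point round-off; a quick check confirms the round trip:

np.allclose(inverse_scaling, int_mat)  # True: equal up to floating-point error

Applied to the dataset from the question, you would typically standardize the date and value columns and leave the 240 booleans untouched, since they are already on a fixed 0/1 scale. Here is a minimal sketch using scikit-learn's ColumnTransformer; the column layout and the random example data are assumptions for illustration:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical data in the question's layout: column 0 is the date in
# seconds, column 1 is the 1-5 value, columns 2+ are the ~240 booleans.
rng = np.random.default_rng(0)
n_rows, n_bools = 100, 240
X = np.column_stack([
    rng.integers(1_500_000_000, 1_700_000_000, size=n_rows),  # date (seconds)
    rng.integers(1, 6, size=n_rows),                          # value in 1..5
    rng.integers(0, 2, size=(n_rows, n_bools)),               # bool features
]).astype(float)

# Standardize only the date and value columns; the booleans pass through.
ct = ColumnTransformer(
    [("scale", StandardScaler(), [0, 1])],
    remainder="passthrough",
)
X_scaled = ct.fit_transform(X)
print(X_scaled.shape)  # (100, 242)

Note that ColumnTransformer puts the transformed columns first and appends the passthrough columns after them, so the scaled date and value end up in the first two columns of the result.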
