Scaling Features For Machine Learning
I have a question about how to scale my dataset properly. It consists of:

- A date, which I currently store as seconds
- A value that can be between 1 and 5
- About 240 boolean values
Solution 1:
Scaling is in most cases applied to each feature separately, and that is what StandardScaler does. It is therefore entirely natural that some zeros stay zero while others are transformed. Look at the following code:
import numpy as np
int_mat = np.array([[0, 0], [0, 1], [0, 2]])
Output
array([[0, 0],
       [0, 1],
       [0, 2]])
Now we apply the scaling:
from sklearn.preprocessing import StandardScaler

ssc = StandardScaler()
int_scaled = ssc.fit_transform(int_mat)              # standardize each column: (x - mean) / std
inverse_scaling = ssc.inverse_transform(int_scaled)  # undo the scaling
int_scaled
array([[ 0.        , -1.22474487],
       [ 0.        ,  0.        ],
       [ 0.        ,  1.22474487]])
As you can see, the first feature (first column) stays the same because it already has zero mean (and zero variance, which StandardScaler handles by leaving the scale factor at 1 instead of dividing by zero).
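You can confirm this from the fitted scaler's attributes (mean_ and scale_ are part of StandardScaler's public API):

ssc.mean_   # array([0.        , 1.        ])
ssc.scale_  # array([1.        , 0.81649658])

The constant first column has zero variance, so its scale is set to 1 and it passes through unchanged.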
Inverse transformation recovers the original matrix, up to floating-point rounding error:
inverse_scaling
array([[0.00000000e+00, 1.11022302e-16],
       [0.00000000e+00, 1.00000000e+00],
       [0.00000000e+00, 2.00000000e+00]])
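A quick check confirms the round trip (values like 1.11e-16 are just floating-point noise):

np.allclose(inverse_scaling, int_mat)  # True

For the mixed dataset in the question, one reasonable approach (a sketch with a hypothetical column layout, not the only correct one) is to standardize only the date and the 1-5 value, and pass the ~240 boolean columns through untouched, for example with sklearn's ColumnTransformer:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical layout: column 0 = date in seconds, column 1 = value (1-5),
# columns 2 onward = the ~240 boolean flags (already 0/1, left as-is).
rng = np.random.default_rng(0)
X = np.hstack([
    rng.integers(1_500_000_000, 1_600_000_000, size=(100, 1)),  # dates in seconds
    rng.integers(1, 6, size=(100, 1)),                          # values 1-5
    rng.integers(0, 2, size=(100, 240)),                        # boolean flags
]).astype(float)

ct = ColumnTransformer(
    [("scale", StandardScaler(), [0, 1])],  # standardize date and value
    remainder="passthrough",                # keep boolean columns unchanged
)
X_scaled = ct.fit_transform(X)

Whether the booleans need scaling at all depends on the model: tree-based models ignore feature scale entirely, while distance-based or gradient-based models usually work fine with 0/1 features left as-is.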