ValueError while fitting Decision Tree Classifier on a dataset



























I have created features X and labels y for the dataset I am working on.



At this point, I want to train a decision tree classifier on it, but I get a ValueError when fitting the classifier on the training data: setting an array element with a sequence.



Below are X, y, and the error details:



X:



(array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-0.00050612, -0.00057967, -0.00035985, ..., 0. ,
0. , 0. ], dtype=float32),
array([ 6.8139506e-08, -2.3837963e-05, -2.4622474e-05, ...,
3.1678758e-06, -2.4535689e-06, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
6.9306935e-07, -6.6020442e-07, 0.0000000e+00], dtype=float32),
array([-7.30260945e-05, -1.18022966e-04, -1.08280736e-04, ...,
8.83421380e-05, 4.97258679e-06, 0.00000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 2.3406714e-05, 3.1186773e-05, 4.9467826e-06, ...,
1.2180173e-07, -9.2944845e-08, 0.0000000e+00], dtype=float32),
array([ 1.1845550e-06, -1.6399191e-06, 2.5565218e-06, ...,
-8.7445065e-09, 5.9859917e-09, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
5.0694009e-08, -3.4546797e-08, 0.0000000e+00], dtype=float32),
array([ 1.5591205e-07, -1.5845627e-07, 1.5362870e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 1.1608539e-05,
8.2463991e-09, 0.0000000e+00], dtype=float32),
array([-3.6192148e-07, -1.4590451e-05, -5.3999561e-06, ...,
-1.9935460e-05, -3.4417746e-05, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.5319534e-07, 2.6521766e-07, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.5055220e-08, 1.2936166e-08, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 1.3387315e-05, 6.0913658e-07, -5.6471418e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 1.7200684e-02, 3.2272514e-02, 3.2961801e-02, ...,
-1.6286784e-06, -8.5592075e-07, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-3.3923173e-11, 2.8026699e-11, 0.0000000e+00], dtype=float32),
array([-0.00103188, -0.00075814, -0.00051426, ..., 0. ,
0. , 0. ], dtype=float32),
array([ 7.6278877e-07, 2.1624428e-05, 1.1150542e-05, ...,
1.8263392e-09, -1.5558380e-09, 0.0000000e+00], dtype=float32),
array([-1.2111740e-07, 6.3130176e-07, -1.8378003e-06, ...,
1.1309878e-05, 5.4562256e-06, 0.0000000e+00], dtype=float32),
array([0.00026949, 0.00028119, 0.00020081, ..., 0.00032586, 0.00046612,
0. ], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-7.8796054e-09, 1.7431153e-08, 0.0000000e+00], dtype=float32),
array([1.42000988e-06, 1.30781755e-05, 2.77493709e-05, ...,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00], dtype=float32),
array([ 2.9161662e-10, -6.3629275e-11, -3.0565092e-10, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 2.2051008e-05, 1.6838792e-05, 3.5639907e-05, ...,
4.5767497e-06, -1.2002213e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.0104826e-10, 1.6824393e-10, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-4.8303300e-06, -1.2008861e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-2.7673337e-07, 2.8604177e-07, 0.0000000e+00], dtype=float32),
array([-0.00066044, -0.0009837 , -0.00090796, ..., -0.00171516,
-0.0017666 , 0. ], dtype=float32),
array([ 3.2218946e-11, -5.5296181e-11, 8.9530647e-11, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-1.3284328e-05, -7.4090644e-07, 7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 4.9886359e-05, 1.4642075e-04, 4.4365996e-04, ...,
6.3584002e-07, -6.2395281e-07, 0.0000000e+00], dtype=float32),
array([-3.2826196e-04, 4.5522624e-03, -8.2306744e-04, ...,
-2.2519816e-07, -6.2417300e-08, 0.0000000e+00], dtype=float32),
array([ 3.1686827e-04, 4.6282235e-04, 1.0160641e-04, ...,
-1.4605960e-05, 6.6572487e-05, 0.0000000e+00], dtype=float32),
array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ...,
-7.1763244e-09, -2.8297892e-08, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-2.5870585e-07, 4.6514080e-07, -9.5607948e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 5.788035e-07, -6.493598e-07, 7.111379e-07, ..., 0.000000e+00,
0.000000e+00, 0.000000e+00], dtype=float32),
array([ 2.5118000e-04, 1.4220485e-03, 3.9536849e-04, ...,
4.5242754e-04, -3.1405249e-05, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([ 1.1985266e-07, 2.1360799e-07, -1.1951373e-06, ...,
-1.3043609e-04, 1.2107374e-06, 0.0000000e+00], dtype=float32),
array([0.0000000e+00, 0.0000000e+00, 0.0000000e+00, ..., 2.5944988e-08,
1.2123945e-07, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32),
array([-2.4280996e-06, -1.2362683e-05, -8.5034850e-07, ...,
-1.0113516e-11, 5.1403621e-12, 0.0000000e+00], dtype=float32),
array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00,
0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 1.3284328e-05, 7.4090644e-07, -7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 2.4700081e-05, 2.9454704e-05, 8.0751715e-06, ...,
1.2746801e-07, -1.6574201e-06, 0.0000000e+00], dtype=float32),
array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11,
4.0220186e-10, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))


y:



('08',
'08',
'06',
'05',
'05',
'04',
'06',
'07',
'01',
'04',
'03',
'07',
'03',
'01',
'03',
'03',
'02',
'02',
'02',
'02',
'05',
'06',
'04',
'08',
'07',
'06',
'04',
'05',
'07',
'02',
'08',
'01',
'08',
'03',
'08',
'02',
'03',
'06',
'04',
'07',
'04',
'07',
'05',
'06',
'08',
'08',
'04',
'05',
'05',
'04',
'06',
'07',
'05',
'07',
'01',
'06',
'02',
'02',
'03',
'03')


Code for the classifier plus the train/test split:



from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)


Error:



---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
1 from sklearn.tree import DecisionTreeClassifier
2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
788 sample_weight=sample_weight,
789 check_input=check_input,
--> 790 X_idx_sorted=X_idx_sorted)
791 return self
792

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
114 random_state = check_random_state(self.random_state)
115 if check_input:
--> 116 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
117 y = check_array(y, ensure_2d=False, dtype=None)
118 if issparse(X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:

ValueError: setting an array element with a sequence.
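The traceback ends in `np.array(array, dtype=dtype, ...)`. One common way to trigger this message at that call is a sequence whose rows have different lengths. That is only an assumption about the data here, since the full arrays are not shown. A minimal sketch with a hypothetical two-row stand-in for X:

```python
import numpy as np

# Hypothetical stand-in for X: two rows of unequal length (5 and 7 values).
ragged_rows = (np.zeros(5, dtype=np.float32), np.zeros(7, dtype=np.float32))

try:
    # Mirrors the np.array(array, dtype=dtype, ...) call shown in the traceback.
    np.array(ragged_rows, dtype=np.float32)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If the rows all had the same length, the same call would return a regular 2-D float32 array instead of raising.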


EDIT1: I converted both X and y into NumPy arrays, but I still get the same error. Details below:



import numpy as np
X = np.asarray(X)
y = np.asarray(y)


X.shape, y.shape


Output:



((60,), (60,))
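A shape of (60,) for X, rather than (60, n_features), suggests the 60 rows were not stacked into a 2-D array, which typically happens when they differ in length. A quick way to check is to count the row lengths. The sketch below uses a hypothetical `X_demo` in place of the real X, and `row_length_counts` is an illustrative helper, not a library function:

```python
from collections import Counter

import numpy as np

# Hypothetical stand-in for X: a tuple of 1-D float32 arrays.
X_demo = (
    np.zeros(5, dtype=np.float32),
    np.zeros(5, dtype=np.float32),
    np.zeros(7, dtype=np.float32),
)

def row_length_counts(rows):
    """Count how many rows have each length."""
    return Counter(len(np.atleast_1d(row)) for row in rows)

length_counts = row_length_counts(X_demo)
# More than one distinct length means the rows cannot form a 2-D array.
is_rectangular = len(length_counts) == 1

print(length_counts, is_rectangular)
```

Running the same helper on the real X would show whether all 60 rows share one length.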





























  • Check out this answer: stackoverflow.com/questions/36115472/…

    – Tyson
    Nov 17 '18 at 13:55






  • There is something wrong with your X or y. First try this and report the result: import numpy as np; X = np.array(X); print(X.shape); y = np.array(y); print(y.shape)

    – Luca Massaron
    Nov 17 '18 at 16:01













  • I was trying exactly that, and this is the outcome after converting both X and y to NumPy arrays: X.shape, y.shape -> ((60,), (60,))

    – Marco G. de Pinto
    Nov 17 '18 at 16:03








  • The problem is X. Now just try: np.array(X).dtype

    – Luca Massaron
    Nov 17 '18 at 16:07






  • Your X is a sequence of strings, that's the problem. You have to check it carefully, because either there is a string in it or some of the arrays you put in it have a different length than the others. I will post an answer for you.

    – Luca Massaron
    Nov 17 '18 at 16:10
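If the arrays do turn out to have different lengths, one possible fix is to zero-pad every row to a common length before fitting. This is a sketch under that assumption, using synthetic data. `pad_to_matrix`, `X_ragged`, and `y_demo` are hypothetical names, and zero-padding is only one option (truncating to the shortest row, or extracting fixed-size features, are alternatives):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the question's data: 60 rows of varying length
# and 60 string labels '01'..'08'.
lengths = rng.integers(20, 30, size=60)
X_ragged = tuple(rng.normal(size=n).astype(np.float32) for n in lengths)
y_demo = np.asarray(["%02d" % (i % 8 + 1) for i in range(60)])

def pad_to_matrix(rows, fill_value=0.0):
    """Right-pad each 1-D row with fill_value to the longest row's length."""
    max_len = max(len(row) for row in rows)
    matrix = np.full((len(rows), max_len), fill_value, dtype=np.float32)
    for i, row in enumerate(rows):
        matrix[i, : len(row)] = row
    return matrix

X_matrix = pad_to_matrix(X_ragged)  # now 2-D: (n_samples, n_features)

X_train, X_test, y_train, y_test = train_test_split(
    X_matrix, y_demo, test_size=0.33, random_state=42
)

dtree = DecisionTreeClassifier(random_state=0)
dtree.fit(X_train, y_train)  # fits without the ValueError
predictions = dtree.predict(X_test)
print(X_matrix.shape, len(predictions))
```

Whether padding is appropriate depends on what each array position means. If positions are not comparable across samples, the tree would be splitting on meaningless columns.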


















array([9.6098862e-05, 1.6449913e-04, 1.1942573e-04, ..., 0.0000000e+00,
0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 1.3284328e-05, 7.4090644e-07, -7.2679302e-07, ...,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype=float32),
array([ 2.4700081e-05, 2.9454704e-05, 8.0751715e-06, ...,
1.2746801e-07, -1.6574201e-06, 0.0000000e+00], dtype=float32),
array([8.4619669e-06, 9.7476968e-06, 2.0182479e-05, ..., 2.1081217e-11,
4.0220186e-10, 0.0000000e+00], dtype=float32),
array([0., 0., 0., ..., 0., 0., 0.], dtype=float32))


y:



('08',
'08',
'06',
'05',
'05',
'04',
'06',
'07',
'01',
'04',
'03',
'07',
'03',
'01',
'03',
'03',
'02',
'02',
'02',
'02',
'05',
'06',
'04',
'08',
'07',
'06',
'04',
'05',
'07',
'02',
'08',
'01',
'08',
'03',
'08',
'02',
'03',
'06',
'04',
'07',
'04',
'07',
'05',
'06',
'08',
'08',
'04',
'05',
'05',
'04',
'06',
'07',
'05',
'07',
'01',
'06',
'02',
'02',
'03',
'03')


Code for the classifier plus the train/test split:



from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)


Error:



---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-70-b6417fbfb8de> in <module>()
1 from sklearn.tree import DecisionTreeClassifier
2 dtree = DecisionTreeClassifier()
----> 3 dtree.fit(X_train, y_train)

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
788 sample_weight=sample_weight,
789 check_input=check_input,
--> 790 X_idx_sorted=X_idx_sorted)
791 return self
792

/usr/local/lib/python3.6/dist-packages/sklearn/tree/tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
114 random_state = check_random_state(self.random_state)
115 if check_input:
--> 116 X = check_array(X, dtype=DTYPE, accept_sparse="csc")
117 y = check_array(y, ensure_2d=False, dtype=None)
118 if issparse(X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:

ValueError: setting an array element with a sequence.


EDIT1: I converted both X and y into NumPy arrays, but I still get the same error. Details below:



import numpy as np
X = np.asarray(X)
y = np.asarray(y)


X.shape, y.shape


Output:



((60,), (60,))






python machine-learning scikit-learn random-forest






edited Nov 17 '18 at 16:05







Marco G. de Pinto

















asked Nov 17 '18 at 13:41









Marco G. de Pinto








  • 1





    check out this answer: stackoverflow.com/questions/36115472/…

    – Tyson
    Nov 17 '18 at 13:55






  • 1





    There is something wrong with your X or y. Try this first and report the result: import numpy as np; X = np.array(X); print(X.shape); y = np.array(y); print(y.shape)

    – Luca Massaron
    Nov 17 '18 at 16:01













  • I was trying exactly that. After converting both X and y to NumPy arrays, this is the outcome: X.shape, y.shape -> ((60,), (60,))

    – Marco G. de Pinto
    Nov 17 '18 at 16:03








  • 1





    The problem is X. Now just try: np.array(X).dtype

    – Luca Massaron
    Nov 17 '18 at 16:07






  • 1





    Your X is a sequence of strings, that's the problem. Check it carefully: either there is a string in it, or some of the arrays you put in it have a different length than the others. I will post an answer for you.

    – Luca Massaron
    Nov 17 '18 at 16:10
















1 Answer

It appears that the problem is your X. Probably one of the arrays in it has a different length from the others. Scikit-learn converts the tuple you have built into a NumPy array when DecisionTreeClassifier processes it, and with rows of unequal length that conversion cannot produce a 2-D numeric matrix. The result is a 1-D array of objects (which is why your X.shape is (60,) rather than two-dimensional), and that is not what the decision tree expects to process.



Just check this code snippet:



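import numpy as np
from numpy import array

# X1: three rows of equal length, so NumPy can stack them into a 3x6 float32 matrix
X1 = (array([-8.1530527e-10,  8.9952795e-10, -9.1185753e-10,
             0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'))

# X2: the second row has one extra element, so the rows no longer line up
X2 = (array([-8.1530527e-10, 8.9952795e-10, -9.1185753e-10,
             0.0000000e+00, 0.0000000e+00, 0.0000000e+00], dtype='float32'),
      array([0., 0., 0., 0., 0., 0., 1], dtype='float32'),
      array([0., 0., 0., 0., 0., 0.], dtype='float32'))

# NumPy 1.24+ raises on ragged input unless dtype=object is requested;
# older versions silently fell back to an object array
print("X1:", np.array(X1).dtype, "\nX2:", np.array(X2, dtype=object).dtype)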


Just adding one more number to the second element of X2 turns the resulting array into an object array (dtype object) instead of a float32 matrix.
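If the rows of X do turn out to have different lengths, one possible way forward is to count the lengths first and then bring every row to a common width. The sketch below is only an illustration: `row_lengths`, `pad_to_matrix` and `X_demo` are hypothetical names, and right-padding with zeros is an assumption (truncating to the shortest row, or dropping the odd rows, may suit your data better).

```python
from collections import Counter

import numpy as np


def row_lengths(rows):
    """Count how many rows have each length (hypothetical helper)."""
    return Counter(len(row) for row in rows)


def pad_to_matrix(rows, fill_value=0.0, dtype=np.float32):
    """Stack 1-D rows into a 2-D array, right-padding shorter rows.

    Zero-padding is an assumption made for this sketch, not something
    the original data is known to require.
    """
    width = max(len(row) for row in rows)
    out = np.full((len(rows), width), fill_value, dtype=dtype)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out


# Made-up stand-in for X: the middle row is one element longer.
X_demo = (
    np.zeros(6, dtype=np.float32),
    np.ones(7, dtype=np.float32),
    np.zeros(6, dtype=np.float32),
)

lengths = row_lengths(X_demo)      # more than one key means ragged rows
X_fixed = pad_to_matrix(X_demo)

print(lengths)
print(X_fixed.shape, X_fixed.dtype)  # (3, 7) float32
```

Once the array is genuinely two-dimensional, it can be passed to train_test_split and dtree.fit in the same way as in the question.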






answered Nov 17 '18 at 16:13









Luca Massaron








  • 1





    Thank you Luca!

    – Marco G. de Pinto
    Nov 17 '18 at 16:21













