Closed
Description
Why is there overlap between dev_idx and t_idx in the following code? There should be none.
```
train_test_split = StratifiedShuffleSplit(labels, n_iter=1, test_size=0.2, random_state=0)
for train_idx, test_idx in train_test_split:
    train_tmp = set(train_idx)
    test_tmp = set(test_idx)
    assert_equal(train_tmp.intersection(test_tmp), set())
    X_train = np.copy(feats[train_idx])
    y_train = np.copy(labels[train_idx])
    trans_train = np.copy(trans[train_idx])
    X_valid = np.copy(feats[test_idx])
    y_valid = np.copy(labels[test_idx])
    trans_valid = np.copy(trans[test_idx])
del feats
del labels
del trans

dev_test_split = StratifiedShuffleSplit(y_valid, n_iter=1, test_size=0.5, random_state=0)
for dev_idx, t_idx in dev_test_split:
    dev_tmp = set(dev_idx)
    t_tmp = set(t_idx)
    assert_equal(dev_tmp.intersection(t_tmp), set())
    X_dev = np.copy(X_valid[dev_idx])
    y_dev = np.copy(y_valid[dev_idx])
    trans_dev = np.copy(trans_valid[dev_idx])
    X_test = np.copy(X_valid[t_idx])
    y_test = np.copy(y_valid[t_idx])
    trans_test = np.copy(trans_valid[t_idx])
del X_valid
del y_valid
del trans_valid
```
The second assert_equal() check fails with the following error:
assert_equal(dev_tmp.intersection(t_tmp), set())
File "/home/xyang45/miniconda2/lib/python2.7/unittest/case.py", line 513, in assertEqual
assertion_func(first, second, msg=msg)
File "/home/xyang45/miniconda2/lib/python2.7/unittest/case.py", line 796, in assertSetEqual
self.fail(self._formatMessage(msg, standardMsg))
File "/home/xyang45/miniconda2/lib/python2.7/unittest/case.py", line 410, in fail
raise self.failureException(msg)
AssertionError: Items in the first set but not the second:
1160
1161
907
1070
1747
2232
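
For comparison, here is a minimal sketch of a stratified split whose two index sets are disjoint by construction. It uses only the standard library; `stratified_split` is a hypothetical helper written for illustration, not part of scikit-learn, and it does not reproduce how `StratifiedShuffleSplit` works internally. It shuffles the indices of each class and slices them once, so no index can land on both sides.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.5, seed=0):
    """Hypothetical helper: return (train_idx, test_idx) as disjoint lists.

    Indices are grouped by class, shuffled within each class, and cut once,
    so every index ends up on exactly one side of the split.
    """
    rng = random.Random(seed)

    # Group the sample indices by their class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Number of samples of this class that go to the test side.
        n_test = int(round(len(members) * test_size))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for y_valid (assumed data, not from the report).
labels = [0] * 6 + [1] * 4
dev_idx, t_idx = stratified_split(labels, test_size=0.5, seed=0)

# Same check as in the report, plus a coverage check.
assert set(dev_idx).isdisjoint(t_idx)
assert sorted(dev_idx + t_idx) == list(range(len(labels)))
```

The returned lists can index NumPy arrays directly, e.g. `X_valid[dev_idx]`, so they could stand in for `dev_idx` and `t_idx` above while the overlap is being investigated.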