imblearn.SMOTE has no transform method, but every step except the last in a pipeline must have one, along with fit.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as imbPipeline
from sklearn.linear_model import LogisticRegression

# This doesn't work with sklearn.pipeline.Pipeline because
# SMOTE doesn't have a .transform() method.
# (It exposes .fit_resample() instead.)
pipe = imbPipeline([
    ('smote', SMOTE()),
    ('clf', LogisticRegression()),  # any final estimator works; this one is just an example
])
The only difference is that make_pipeline generates the step names automatically. Step names matter when you use a pipeline with model selection utilities such as GridSearchCV, because with grid search you have to address parameters of the individual steps of the pipeline:
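A minimal sketch of both spellings (the StandardScaler/SVC steps and the C grid are illustrative, not from the original answer):

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.model_selection import GridSearchCV

# With Pipeline you pick the step names yourself ...
pipe = Pipeline([('scaler', StandardScaler()), ('clf', SVC())])
grid = GridSearchCV(pipe, param_grid={'clf__C': [0.1, 1, 10]})

# ... with make_pipeline the names are generated from the class names
# (here 'standardscaler' and 'svc'), and the grid keys must match them.
auto_pipe = make_pipeline(StandardScaler(), SVC())
auto_grid = GridSearchCV(auto_pipe, param_grid={'svc__C': [0.1, 1, 10]})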
It seems that printing the pipeline directly truncates the output and doesn't show everything. For example, the arguments shuffle, tol, validation_fraction, verbose, and warm_start belong to the SGDClassifier. As you found yourself in the comments, you can avoid the truncation by printing the steps directly via pipeline.steps.
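As a sketch of that pattern (the pipeline contents here are made up, not taken from the question):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

pipeline = make_pipeline(StandardScaler(), SGDClassifier())

# The pipeline repr may be shortened when printed as a whole.
print(pipeline)

# pipeline.steps is a list of (name, estimator) tuples,
# so each step can be printed individually.
for name, step in pipeline.steps:
    print(name, step)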
Cross Validating With Imblearn Pipeline And GridSearchCV
It does not necessarily make sense to include feature selection in a pipeline when your model is a random forest (RF). The max_depth and max_features arguments of the RF already control how many features are used when building the individual trees: a max depth of n means each tree is grown to at most n levels of splits, and each split considers at most max_features candidate features. See https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.

You can simply inspect the trained model for the top-ranked features. When an individual tree is trained, it can be computed how much each feature decreases the weighted impurity in that tree. For a forest, the impurity decrease from each feature is averaged across trees, and the features are ranked by this measure. So you don't actually need to retrain the forest for different feature sets, because the feature importances (already computed in the sklearn model) give you all the information you need.
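As a quick illustration of reading those importances off a trained forest (the synthetic data and hyperparameters below are made up for the example):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data purely for illustration.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)

# feature_importances_ holds the impurity-based importance of each feature,
# averaged over all trees; features can be ranked without retraining.
ranking = np.argsort(rf.feature_importances_)[::-1]
for idx in ranking:
    print(f"feature {idx}: importance {rf.feature_importances_[idx]:.3f}")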