
Feature selection with LinearSVC


Tag : python , By : Dennizzz
Date : November 28 2020, 08:01 AM



scikit-learn - how to force selection of at least a single label in LinearSVC


Tag : python , By : user91848
Date : March 29 2020, 07:55 AM
Thanks to "lejlot": that answer was extremely close to what I wanted. However, I didn't want to override the cases where I already had one or more predictions. This is what I came up with, and it seems to be working. It falls back to the single highest-scoring label from decision_function only for rows where predict returned no labels at all:
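import numpy as np
from sklearn import preprocessing
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# X_train, y_train_text and X_test come from the original question
lb = preprocessing.MultiLabelBinarizer()
Y = lb.fit_transform(y_train_text)

classifier = Pipeline([
    ('vectorizer', CountVectorizer(stop_words="english")),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

classifier.fit(X_train, Y)
predicted = classifier.predict(X_test)

# One row per sample: 1 for the highest-scoring label(s), 0 elsewhere.
# np.sign is used instead of sp.sign (the NumPy aliases in scipy are deprecated).
x = classifier.decision_function(X_test)
predicted_all = np.sign(x - x.max(1).reshape(x.shape[0], 1) + 1e-20)
predicted_all = (predicted_all + 1) / 2
for i in range(len(predicted)):
    # if we never came up with a prediction, use our "forced" single prediction
    if not predicted[i].any():
        predicted[i] = predicted_all[i]
all_labels = lb.inverse_transform(predicted)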
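The same fallback can also be written without the Python loop. This is a minimal NumPy sketch, assuming predicted is a dense 0/1 indicator matrix (as OneVsRestClassifier.predict returns when trained on MultiLabelBinarizer output) and scores is the matching decision_function matrix. The helper name force_at_least_one is hypothetical, not part of scikit-learn.

```python
import numpy as np


def force_at_least_one(predicted, scores):
    """Return a copy of `predicted` in which every all-zero row gets its
    highest-scoring label switched on. Rows that already have at least one
    label are left untouched. (Hypothetical helper, not a scikit-learn API.)"""
    predicted = np.array(predicted, copy=True)
    scores = np.asarray(scores)
    empty_rows = ~predicted.any(axis=1)      # rows with no positive label
    best_label = scores.argmax(axis=1)       # index of the top score in each row
    predicted[empty_rows, best_label[empty_rows]] = 1
    return predicted


predicted = np.array([[0, 0, 0],
                      [1, 0, 1]])
scores = np.array([[-0.9, -0.2, -0.5],
                   [0.4, -1.0, 0.7]])
print(force_at_least_one(predicted, scores))
# [[0 1 0]
#  [1 0 1]]
```

One design difference from the sign trick in the loop above: argmax resolves exact ties to the first label, whereas the sign-based version switches on every tied label.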

Using feature selection with LinearSVC in python


Tag : python , By : Vodkat
Date : March 29 2020, 07:55 AM
Declare the selector the same way you did for NLTKPreprocessor, but place it just above the classifier inside the pipeline.
Declare your pipeline as below:
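from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

# NLTKPreprocessor, identity and classifier are defined in the question's code
model = Pipeline([
    ('preprocessor', NLTKPreprocessor()),
    ('vectorizer', TfidfVectorizer(
        tokenizer=identity, preprocessor=None, ngram_range=(1, 2), min_df=4, lowercase=False
    )),
    # keep only the 10 features with the highest chi-squared scores
    ('selector', SelectKBest(chi2, k=10)),
    ('classifier', classifier),
])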
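For intuition about which columns the selector step keeps, here is a NumPy sketch of chi-squared scoring followed by top-k selection. It follows, as we understand it, the formula behind scikit-learn's chi2: per-class feature totals (observed) compared with the totals expected if the feature were independent of the class. The function names chi2_scores and select_k_best are hypothetical, this is not a replacement for SelectKBest, and features must be non-negative (TF-IDF values are).

```python
import numpy as np


def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature against class labels."""
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    # One-hot class indicator matrix, shape (n_samples, n_classes)
    Y = (np.asarray(y)[:, None] == classes[None, :]).astype(float)
    observed = Y.T @ X                                # per-class feature totals
    class_prob = Y.mean(axis=0)                       # fraction of samples per class
    feature_count = X.sum(axis=0)                     # overall total per feature
    expected = np.outer(class_prob, feature_count)    # totals under independence
    return ((observed - expected) ** 2 / expected).sum(axis=0)


def select_k_best(X, y, k):
    """Return the indices of the k highest-scoring features and the reduced X."""
    scores = chi2_scores(X, y)
    top = np.sort(np.argsort(scores)[::-1][:k])       # keep original column order
    return top, np.asarray(X)[:, top]


X = np.array([[3, 0, 1],
              [4, 0, 1],
              [0, 5, 1],
              [0, 6, 1]])
y = np.array([0, 0, 1, 1])
idx, X_new = select_k_best(X, y, k=2)
print(idx)  # [0 1]
```

The third column is identical for both classes, so its observed totals match the expected ones and it scores 0; that is the kind of feature SelectKBest(chi2, k=...) drops first.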

LinearSVC Feature Selection returns different coef_ in Python


Tag : python , By : Timbo
Date : March 29 2020, 07:55 AM
Since you set dual=False, you should be getting the same coefficients on every run: according to the scikit-learn documentation, LinearSVC's random_state only affects the dual solver. What is your sklearn version?
Run this and check whether you get the same output:
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_features=4, random_state=0)
for i in range(10):
    # penalty="l1" requires dual=False; refitting should give identical coef_
    lsvc = LinearSVC(C=.01, penalty="l1", dual=False).fit(X, y)
    sscores = lsvc.coef_[0]
    print(sscores)
Output:
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
[0.         0.         0.27073732 0.        ]
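With penalty="l1", the zero entries of coef_ are what make LinearSVC usable for feature selection: only columns with non-zero weight are kept. Below is a NumPy sketch of that masking step using the coefficient row printed above. The helper name nonzero_coef_mask and the 1e-5 cut-off are assumptions (the cut-off is chosen to mirror what we believe SelectFromModel uses by default for L1-penalized models); in practice SelectFromModel wraps this idea around a fitted estimator.

```python
import numpy as np


def nonzero_coef_mask(coef, tol=1e-5):
    """Boolean mask of features whose coefficient magnitude exceeds `tol`.

    Accepts a 1-D coefficient vector or a 2-D (n_classes, n_features) coef_
    matrix; for 2-D input the L1 norm across classes is compared to `tol`.
    (Hypothetical helper, not a scikit-learn API.)
    """
    weights = np.abs(np.atleast_2d(coef)).sum(axis=0)
    return weights > tol


coef = np.array([0.0, 0.0, 0.27073732, 0.0])   # coef_[0] from the output above
X = np.arange(12, dtype=float).reshape(3, 4)    # toy data with 4 features
mask = nonzero_coef_mask(coef)
print(mask)          # [False False  True False]
print(X[:, mask])    # keeps only the third column
```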

Error in Feature selection with Recursive feature elimination in random forest model


Tag : r , By : PepeM
Date : March 29 2020, 07:55 AM
sizes refers to the numbers of features you would like to try retaining. It should be a numeric vector, but you passed something else entirely: df[,1:1002].
See the example below, where I simulate a dataset. Once sizes is set correctly, rfe runs and chooses the optimal number of features from the sizes you provide:
library(caret)
library(mlbench)
library(Hmisc)
library(randomForest) 

set.seed(101)
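# stringsAsFactors = TRUE keeps Class a factor (the default changed in R 4.0)
df = data.frame(samples=paste0("Samples",1:99),
                Class=paste0("Class",rep(1:3,33)),
                matrix(rnorm(99*1000),ncol=1000),
                stringsAsFactors = TRUE)

colnames(df)[3:ncol(df)]=paste0("Gene",1:1000)

# make roughly 100 genes informative for Class1 and another 100 for Class2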
df[df$Class=="Class1",3:103] = df[df$Class=="Class1",3:103] + rpois(33*100,1.5)
df[df$Class=="Class2",104:203] = df[df$Class=="Class2",104:203] + rpois(33*100,1.5)

control <- rfeControl(functions=rfFuncs, method="cv", number=2)

# run the RFE algorithm
results <- rfe(df[,3:1002], df[,2], sizes = c(50,100,200), 
               rfeControl=control)
results
Recursive feature selection

Outer resampling method: Cross-Validated (2 fold) 

Resampling performance over subset size:

 Variables Accuracy  Kappa AccuracySD KappaSD Selected
        50   0.9792 0.9688    0.02946 0.04419         
       100   0.9896 0.9844    0.01473 0.02210         
       200   1.0000 1.0000    0.00000 0.00000        *
      1000   1.0000 1.0000    0.00000 0.00000         

The top 5 variables (out of 200):
   Gene94, Gene198, Gene137, Gene136, Gene158

> results$optsize
[1] 200
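To see why sizes must be a numeric vector, it helps to picture what RFE does with it: rank the features by importance, then evaluate nested subsets holding the top s features for each s in sizes, plus the full set (which is why 1000 appears in the table above). The following is a simplified pure-Python sketch of only that subset construction, assuming a fixed importance ranking; caret's rfe additionally refits a model on each subset and scores it by resampling, which is omitted here. The function name nested_subsets is hypothetical.

```python
def nested_subsets(importances, sizes):
    """Map each requested subset size to the names of the top-ranked features.

    importances: dict of feature name -> importance score (higher is better).
    sizes: iterable of positive integers, analogous to caret's `sizes` argument.
    Sizes that are not smaller than the number of features are dropped, and
    the full feature set is always included.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    n = len(ranked)
    wanted = sorted({int(s) for s in sizes if 0 < s < n} | {n})
    return {s: ranked[:s] for s in wanted}


importances = {"Gene1": 0.9, "Gene2": 0.1, "Gene3": 0.5, "Gene4": 0.3}
for size, features in nested_subsets(importances, sizes=[1, 2]).items():
    print(size, features)
# 1 ['Gene1']
# 2 ['Gene1', 'Gene3']
# 4 ['Gene1', 'Gene3', 'Gene4', 'Gene2']
```

Passing a data frame as sizes has no sensible meaning in this scheme, since each entry must be a count of features to retain.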

Difference between feature selection, feature extraction, feature weights


Tag : development , By : sayuki288
Date : March 29 2020, 07:55 AM