Linear Regression / Gradient Descent Python implementation
Tag : python , By : tanminivan
Date : March 29 2020, 07:55 AM
What is happening is that Python first computes the list zip(x,y), and then each iteration of your for loop overwrites (x,y) with the corresponding element of that list. When the loop terminates, (x,y) contains the last element, zip(x,y)[-1]. Use fresh variable names inside the comprehension instead, and note that the gradient of the squared error with respect to theta[0] has no square on it: theta[0] = theta[0] - alpha*1/m*sum([(theta[0]+theta[1]*xi) - yi for (xi,yi) in zip(x,y)])
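The shadowing bug and the corrected update can be sketched as follows (a minimal sketch with made-up toy data; the question's x and y are assumed to be plain Python lists):

```python
# Toy data: y = 2*x exactly, so the fitted slope should approach 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

# The bug: writing `for (x, y) in zip(x, y)` reuses the data names as loop
# variables, so after the loop x and y hold the LAST pair, not the lists.

# The fix: fresh names (xi, yi) inside the comprehension. Note also that the
# gradient of the squared error drops the square: the derivative of
# (h - y)**2 w.r.t. theta[0] is proportional to (h - y), with the constant
# folded into alpha.
theta = [0.0, 0.0]
alpha, m = 0.01, len(x)
for _ in range(1000):
    grad0 = sum([(theta[0] + theta[1] * xi) - yi for (xi, yi) in zip(x, y)]) / m
    grad1 = sum([((theta[0] + theta[1] * xi) - yi) * xi for (xi, yi) in zip(x, y)]) / m
    theta[0] = theta[0] - alpha * grad0
    theta[1] = theta[1] - alpha * grad1

# Mean squared error of the fitted line; it shrinks toward zero on this data.
mse = sum([(theta[0] + theta[1] * xi - yi) ** 2 for (xi, yi) in zip(x, y)]) / m
```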

Issue with gradient descent implementation of linear regression
Date : March 29 2020, 07:55 AM
Try decreasing the value of eta. Gradient descent can diverge if eta is too high.
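A minimal sketch of that failure mode on made-up 1-D data (the eta values here are illustrative, not taken from the question):

```python
import numpy as np

# Minimize f(w) = mean((w*x - y)**2); its gradient is 2*mean((w*x - y)*x).
# For this loss, plain gradient descent converges only while the error
# multiplier |1 - 2*eta*mean(x**2)| stays below 1; past that, every step
# overshoots further and w blows up.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x                      # true slope is 3.0

def run(eta, steps=50):
    w = 0.0
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)
        w -= eta * grad
    return w

w_small = run(eta=0.1)   # small step: converges toward the true slope 3.0
w_large = run(eta=2.0)   # large step: overshoots every iteration, diverges
```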

Coefficient paths for Ridge Regression in scikit-learn
Date : March 29 2020, 07:55 AM
You need as many weights w as you have features (since you learn a single weight per feature), but in your code the weight vector has dimension 774, which is the number of rows in the training dataset; that is why it did not work. Modify the code to the following (to have 4 weights instead) and everything will work: w = np.full((4,), 3, dtype=float) # number of features = 4, namely p1, p2, p3, p4
print X.shape, type(X), y.shape, type(y), w.shape, type(w)
#(774L, 4L) <type 'numpy.ndarray'> (774L,) <type 'numpy.ndarray'> (4L,) <type 'numpy.ndarray'>
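The shape mismatch can be demonstrated in isolation (a sketch with dummy data mirroring the shapes above; the real X and y come from the asker's dataset):

```python
import numpy as np

# X has shape (n_samples, n_features), so the product X @ w only works
# when w has shape (n_features,). One weight per row is the wrong axis.
n_samples, n_features = 774, 4
X = np.ones((n_samples, n_features))

w_bad = np.full((n_samples,), 3, dtype=float)    # one weight per ROW: wrong
w_good = np.full((n_features,), 3, dtype=float)  # one weight per FEATURE

mismatch = False
try:
    X @ w_bad
except ValueError:
    mismatch = True   # shapes (774, 4) and (774,) are not aligned

preds = X @ w_good    # shape (774,), one prediction per sample
```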

Linear Regression updation of bias term and coefficient
Date : March 29 2020, 07:55 AM
You don't need iteration or gradient descent for a simple linear regression with only one (or a few) features; you can just use the normal equation. This doesn't scale, however, when you have many features, because inverting a large matrix is expensive, and it's not uncommon in machine learning to have problems with hundreds (or even thousands) of features.
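The normal equation the answer refers to can be sketched as follows (toy data assumed; np.linalg.solve is used instead of forming the explicit inverse, which is cheaper and more numerically stable):

```python
import numpy as np

# Closed-form least squares: theta = (X^T X)^{-1} X^T y, where X carries
# a leading column of ones so theta[0] is the bias term.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                           # true bias 1, true slope 2

X = np.column_stack([np.ones_like(x), x])   # prepend the bias column
theta = np.linalg.solve(X.T @ X, X.T @ y)   # [bias, coefficient]
```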

Linear regression on 415 files, output just filename, regression coefficient, significance
Date : March 29 2020, 07:55 AM
I am a beginner in R, learning the basics to analyse some biological data. I have 415 .csv files, each a fungal species. Each file has 5 columns (YEAR, FFD, LFD, MEAN, RANGE). First I simulate 5 csv files, with columns that look like yours: for(i in 1:5){
tab=data.frame(
YEAR=1950:2014,
FFD= rpois(65,100),
LFD= rnorm(65,100,10),
RAN= rnbinom(65,mu=100,size=1),
MEAN = runif(65,min=50,max=150)
)
write.csv(tab,paste0("data",i,".csv"))
}
csvfiles = dir(pattern="data[0-9]*.csv$")
library(dplyr)
library(purrr)
library(broom)
csvfiles %>%
map_df(function(i){df = read.csv(i);df$data = i;df}) %>%
group_by(data) %>%
do(tidy(lm(FFD ~ YEAR,data=.))) %>%
filter(term!="(Intercept)")
# A tibble: 5 x 6
# Groups: data [5]
data term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 data1.csv YEAR 0.0228 0.0731 0.311 0.756
2 data2.csv YEAR 0.139 0.0573 2.42 0.0182
3 data3.csv YEAR 0.175 0.0650 2.70 0.00901
4 data4.csv YEAR 0.0478 0.0628 0.762 0.449
5 data5.csv YEAR 0.0204 0.0648 0.315 0.754
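For readers working in Python (the page's tag language) rather than R, the same per-file slope extraction can be sketched with NumPy alone; the filenames and the random FFD column below are simulated stand-ins, not the asker's real data, and with real files you would iterate over glob.glob("data[0-9]*.csv") instead:

```python
import numpy as np

rng = np.random.default_rng(1)

def ols_slope(x, y):
    # Least-squares slope and its standard error for y ~ x,
    # using the textbook formulas (no regression library needed).
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # slope
    a = y.mean() - b * x.mean()                     # intercept
    resid = y - (a + b * x)
    se = np.sqrt(resid @ resid / (len(x) - 2) / (np.var(x) * len(x)))
    return b, se

results = []
for i in range(1, 6):
    year = np.arange(1950, 2015)
    ffd = rng.poisson(100, size=year.size)   # stand-in for the FFD column
    slope, se = ols_slope(year, ffd)
    results.append((f"data{i}.csv", slope, se))   # filename, estimate, std.error
```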

