## Full gradient descent in Keras

Tag : python , By : mgz
Date : November 28 2020, 04:01 AM

This happens for two reasons:
First, when the data is not shuffled, the train/validation split is inappropriate: the validation set comes from a contiguous stretch of the wave that the model never trains on. Second, full gradient descent performs only a single parameter update per epoch, so many more training epochs are required to converge. That is why the model does not match the wave. You can visualise the bad split:
```
import matplotlib.pyplot as plt

# N is the total number of points; hold out the last 20% (unshuffled!)
N = len(x_train)
split_point = int(0.2 * N)
x_val = x_train[-split_point:]
y_val = y_train[-split_point:]
x_train_ = x_train[:-split_point]
y_train_ = y_train[:-split_point]

plt.scatter(x_train_, y_train_, c='g')  # training points (green)
plt.scatter(x_val, y_val, c='r')        # validation points (red)
plt.show()
```
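A minimal sketch of the fix, assuming `x_train` and `y_train` are NumPy arrays and `model` is the compiled Keras model from the question: shuffle before splitting, and give full-batch training many more epochs, since each epoch is a single update.

```
import numpy as np

# shuffle before splitting, so train and validation
# sets are drawn from the whole wave
perm = np.random.permutation(len(x_train))
x_train, y_train = x_train[perm], y_train[perm]

split_point = int(0.2 * len(x_train))
x_val, y_val = x_train[-split_point:], y_train[-split_point:]
x_train_, y_train_ = x_train[:-split_point], y_train[:-split_point]

# full gradient descent: batch_size is the whole training set,
# i.e. one update per epoch, so allow many more epochs
model.fit(x_train_, y_train_,
          batch_size=len(x_train_),
          epochs=5000,
          validation_data=(x_val, y_val))
```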

## What is the difference between Gradient Descent and Newton's Gradient Descent?

Tag : machine-learning , By : nd27182
Date : March 29 2020, 07:55 AM
At a local minimum (or maximum) x, the derivative of the target function f vanishes: f'(x) = 0 (assuming sufficient smoothness of f).
Gradient descent tries to find such a minimum x using only information from the first derivative of f: it simply follows the steepest descent from the current point. This is like rolling a ball down the graph of f until it comes to rest (while neglecting inertia). Newton's method, by contrast, also uses the second derivative: it applies Newton root-finding to the equation f'(x) = 0, taking steps x ← x − f'(x)/f''(x). Each step rescales the gradient by the local curvature, which typically converges much faster near the optimum but requires computing (and, in higher dimensions, inverting) the second derivative.
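A minimal one-dimensional sketch of the two update rules, on a toy quadratic f(x) = (x − 2)² chosen purely for illustration:

```
# toy objective: f(x) = (x - 2)^2, minimum at x = 2
def df(x):   # first derivative f'(x)
    return 2.0 * (x - 2.0)

def d2f(x):  # second derivative f''(x)
    return 2.0

x_gd, x_newton, lr = 10.0, 10.0, 0.1
for _ in range(10):
    x_gd = x_gd - lr * df(x_gd)                         # gradient descent: fixed step along -f'(x)
    x_newton = x_newton - df(x_newton) / d2f(x_newton)  # Newton: step rescaled by curvature

print(x_gd)      # slowly approaches 2
print(x_newton)  # hits 2.0 in one step, since f is exactly quadratic
```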

## Stochastic gradient descent from a gradient descent implementation in R

Tag : r , By : jch
Date : March 29 2020, 07:55 AM
I have a working implementation of multivariable linear regression using gradient descent in R. I'd like to see if I can use what I have to run stochastic gradient descent, but I'm not sure whether my attempt is really inefficient. For example, for each value of α I want to perform 500 SGD iterations and be able to specify the number of randomly picked samples in each iteration, so I can see how the number of samples influences the results. I'm having trouble, though, with the mini-batching, and I want to be able to easily plot the results. , Sticking with what you have now, you can wrap your GD function so that it runs on a random subsample:
```
## all of this is the same

x <- scale(x)
data3 <- cbind(x, y)
colnames(data3) <- c("area_sqft", "bedrooms", "price")
x1 <- rep(1, length(data3$area_sqft))
x <- as.matrix(cbind(x1, x))
y <- as.matrix(y)
L <- length(y)

# despite the name, this returns the gradient of the cost, not the cost itself
cost <- function(x, y, theta) {
  gradient <- (1/L) * (t(x) %*% ((x %*% t(theta)) - y))
}
```
```
GD <- function(x, y, alpha) {
  theta <- matrix(c(0, 0, 0), nrow = 1)
  theta_r <- NULL
  for (i in 1:500) {
    theta <- theta - alpha * cost(x, y, theta)
    theta_r <- rbind(theta_r, theta)
  }
  return(theta_r)
}

# draw a random subsample of n rows, then run GD on it;
# with the default n = nrow(x) this reproduces plain GD exactly
myGoD <- function(x, y, alpha, n = nrow(x)) {
  idx <- sample(nrow(x), n)
  y <- y[idx, , drop = FALSE]
  x <- x[idx, , drop = FALSE]
  GD(x, y, alpha)
}
```
```
all.equal(GD(x, y, 0.001), myGoD(x, y, 0.001))
# [1] TRUE

set.seed(1)
head(myGoD(x, y, 0.001, n = 20), 2)
#          x1        V1       V2
# V1 147.5978  82.54083 29.26000
# V1 295.1282 165.00924 58.48424

set.seed(1)
head(myGoD(x, y, 0.001, n = 40), 2)
#          x1        V1        V2
# V1 290.6041  95.30257  59.66994
# V1 580.9537 190.49142 119.23446
```
Now compare different alphas and sample sizes:
```
alphas <- c(0.001, 0.01, 0.1, 1.0)
ns <- c(47, 40, 30, 20, 10)

par(mfrow = n2mfrow(length(alphas)))
for (i in 1:length(alphas)) {

  # result <- myGoD(x, y, alphas[i]) ## original
  result <- myGoD(x, y, alphas[i], ns[i])

  # red = price
  # blue = sq ft
  # green = bedrooms
  plot(result[, 1], ylim = c(min(result), max(result)), col = "#CC6666",
       ylab = "Value", lwd = 0.35,
       xlab = paste("alpha=", alphas[i]), xaxt = "n")  # suppress auto x-axis title
  lines(result[, 2], type = "b", col = "#0072B2", lwd = 0.35)
  lines(result[, 3], type = "b", col = "#66CC99", lwd = 0.35)
}
```
Or fold the subsampling into GD itself:
```
GD <- function(x, y, alpha, n = nrow(x)) {
  # note: the subsample is drawn once, before the loop; for per-iteration
  # mini-batches, move the sample() call inside the loop
  idx <- sample(nrow(x), n)
  y <- y[idx, , drop = FALSE]
  x <- x[idx, , drop = FALSE]
  theta <- matrix(c(0, 0, 0), nrow = 1)
  theta_r <- NULL

  for (i in 1:500) {
    theta <- theta - alpha * cost(x, y, theta)
    theta_r <- rbind(theta_r, theta)
  }
  return(theta_r)
}
```

## TensorFlow weights increasing when using the full dataset for the gradient descent

Tag : python , By : dyarborough
Date : March 29 2020, 07:55 AM
The problem isn't the optimizer, it's your loss: it should return the mean loss, not the sum. With a sum, the gradient magnitude grows with the number of examples, so switching from mini-batches to the full dataset makes each update larger and the weights blow up. If you're doing an L2 regression, for instance, it should look like this:
```
# squared distance for each individual position of the output matrix,
# shape = (n_examples, example_data_size)
l_value = tf.pow(tf.abs(ground_truth - predict), 2)
# distance per example, shape = (n_examples,)
regression_loss = tf.reduce_sum(l_value, axis=1)
# mean distance over all examples, a scalar
total_regression_loss = tf.reduce_mean(regression_loss)
```

## Keras MNIST Gradient Descent Stuck / Learning Very Slowly

Tag : python , By : Vasiliy
Date : March 29 2020, 07:55 AM
You are using a ReLU activation, which cuts off all activations below 0, together with the default random_uniform initialisation, keras.initializers.RandomUniform(minval=-0.05, maxval=0.05, seed=None). As you can see, the initial values are very close to 0: half of them (-0.05 to 0) don't get activated at all, and the ones that do (0 to 0.05) propagate gradients very slowly.
My guess is that shifting the initialisation into the small positive range in which ReLUs operate should make your model converge quickly.
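A hedged sketch of that suggestion (the layer sizes below are placeholders, not taken from the question): the initialiser draws weights uniformly from a small positive range where every ReLU unit starts active; he_normal is a standard alternative designed for ReLU layers.

```
from tensorflow import keras

model = keras.Sequential([
    # weights start in (0, 0.05), so no ReLU unit is dead at initialisation;
    # kernel_initializer='he_normal' is a common alternative for ReLUs
    keras.layers.Dense(128, activation='relu',
                       kernel_initializer=keras.initializers.RandomUniform(
                           minval=0.0, maxval=0.05),
                       input_shape=(784,)),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```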

## What is gradient descent? Can gradient descent give better results than sklearn's linear regression algorithm?

Tag : python , By : Tim
Date : March 29 2020, 07:55 AM
https://scikit-learn.org/stable/modules/sgd.html
scikit-learn offers two approaches to linear regression. The LinearRegression class uses an ordinary least squares solver from scipy, while SGDRegressor (and SGDClassifier for classification tasks) is an implementation of the gradient descent algorithm. So, to answer your question: if you use SGDRegressor, you are running an implementation of gradient descent behind the scenes.
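A minimal sketch contrasting the two approaches on made-up toy data (nothing here comes from the question); both fit a linear model, one by a closed-form least-squares solve and one by stochastic gradient descent:

```
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

# toy data: y = 3x + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3 * X.ravel() + rng.normal(scale=0.5, size=200)

ols = LinearRegression().fit(X, y)           # closed-form ordinary least squares
sgd = SGDRegressor(max_iter=1000).fit(X, y)  # iterative stochastic gradient descent

print(ols.coef_, sgd.coef_)  # both coefficients should be close to 3
```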