Fetching the website with Jsoup - page view source and Jsoup shows different content
Tag : java , By : ChaseVoid
Date : March 29 2020, 07:55 AM
Short answer: Jsoup cannot execute JavaScript. Long answer: on http://www.yelp.com/search?find_desc=restaurant&find_loc=willowbrook%2C+IL&ns=1#l=p:IL:Willowbrook::&sortby=rating&rpp=40 the sort and filter options live in the URL fragment after the #, which is handled by the page's JavaScript and never sent to the server, so the static HTML that Jsoup fetches differs from what the browser finally renders.
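A quick way to see the difference is to print exactly what Jsoup receives and compare it with the browser's rendered page. A minimal sketch (the user agent and timeout are illustrative choices, not part of the original answer):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;

public class YelpFetchCheck {
    public static void main(String[] args) throws IOException {
        // Everything after '#' is a URL fragment and never reaches the server,
        // so the sort/filter state is only applied by the browser's JavaScript.
        String url = "http://www.yelp.com/search?find_desc=restaurant"
                + "&find_loc=willowbrook%2C+IL&ns=1";
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(10_000)
                .get();
        // This is the server-rendered HTML only, with no JavaScript executed.
        System.out.println(doc.title());
        System.out.println(doc.html().length() + " characters of static HTML");
    }
}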
|
Getting element with no attributes using Jsoup
Date : March 29 2020, 07:55 AM
You can use the Jsoup CSS selector p:not([^]), which excludes any p that has an attribute whose name starts with anything at all, i.e. it keeps only p elements with no attributes.
String html = "<div id=\"intro\">"
+ "<h1 class=\"some class\">"
+ "<p id=\"some_id\">"
+ "Some text 1"
+ "</p>"
+ "<p name=\"some_name\">"
+ "Some text A"
+ "</p>"
+ "<p data>"
+ "Some text B"
+ "</p>"
+"<p>"
+ "Some text 2"
+"</p>"
+"</div> ";
Document doc = Jsoup.parse(html);
Elements els = doc.select("p:not([^])");
for (Element el:els){
System.out.println(el.text());
}
Output:
Some text 2
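If the :not([^]) selector feels cryptic, an equivalent approach (not from the original answer) is to select every p and keep only those whose attribute list is empty; this reuses the doc parsed above:

Elements allPs = doc.select("p");
for (Element el : allPs) {
    // attributes().size() == 0 only for a <p> carrying no attributes at all
    if (el.attributes().size() == 0) {
        System.out.println(el.text());
    }
}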
|
Get Some Attributes with JSoup
Date : November 17 2020, 09:01 AM
The goal is to pull only the card properties from http://db.fowtcg.us/index.php?p=card&code=VS01-003+R after fetching the page with Jsoup. Narrow the selection to the container that holds them:
Elements property = doc.select("div.col-xs-12.col-sm-7.box.card-props");
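A minimal end-to-end sketch built around that selector (the user agent is illustrative, and since the inner markup of the box is not assumed here, the sketch simply dumps its text):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class CardProps {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://db.fowtcg.us/index.php?p=card&code=VS01-003+R")
                .userAgent("Mozilla/5.0")
                .get();
        // Only the box that holds the card properties
        Elements property = doc.select("div.col-xs-12.col-sm-7.box.card-props");
        for (Element box : property) {
            System.out.println(box.text());
        }
    }
}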
|
method filling array with Jsoup not waiting for jsoup to complete website request
Tag : java , By : user179271
Date : March 29 2020, 07:55 AM
Jsoup.connect().get() is a synchronous call, so when it returns it has already connected and retrieved the response. The issue in your code - which you correctly identified as being somehow related to 'waiting for something to finish' - is that you invoke Jsoup.connect().get() inside a separate thread and then do not wait for that thread to complete before using what Jsoup returns. After the call to .start(), execution continues straight into this loop while the Jsoup thread is still running:
for (int i = rawHours.size() - 1; i >= 0; i--) {
hrrrLabels[23 - i] = rawHours.get(i);
}
...
The fix is to join() the 'Jsoup thread' before the loop that reads rawHours:
Thread t = new Thread(new Runnable() {
...
});
t.start();
// wait for the 'Jsoup thread' to complete before continuing
t.join();
for (int i = rawHours.size() - 1; i >= 0; i--) {
hrrrLabels[23 - i] = rawHours.get(i);
}
...
hrrrLabels = new String[24];
final LinkedList<String> rawHours = new LinkedList<>();
final StringBuilder builder = new StringBuilder();
try {
    Document doc = Jsoup.connect("http://mag.ncep.noaa.gov/model-guidance-model-parameter.php?group=Model%20Guidance&model=HRRR&area=CONUS&ps=model").get();
    Elements links = doc.select("tr");
    int superi = 0;
    // rows 22..25 of the page's <tr> list are read for their <td> text
    for (int i = 22; i < 26; i++) {
        Element link = links.get(i);
        Elements lin = link.select("td");
        Element time;
        for (int j = 0; j < lin.size(); j++) {
            time = lin.get(j);
            rawHours.add(time.text());
            builder.append(time.text()).append("\n");
        }
        superi++;
    }
} catch (IOException e) {
    builder.append("Error : ").append(e.getMessage()).append("\n");
}
// copy the raw hours into hrrrLabels in reverse order
for (int i = rawHours.size() - 1; i >= 0; i--) {
    hrrrLabels[23 - i] = rawHours.get(i);
}
String[] splitTime;
String[] hrrrTimes = new String[hrrrLabels.length];
System.out.println("rewtimes, length=" + hrrrLabels.length);
for (int i = 0; i < hrrrLabels.length; i++) {
    splitTime = hrrrLabels[i].split(" ");
    hrrrTimes[i] = splitTime[1].substring(0, 2);
    // log after the value has been computed (logging hrrrTimes[i] beforehand prints null)
    System.out.println("rewtimes, i=" + i + " :" + hrrrTimes[i]);
}
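For completeness, a self-contained sketch of the join pattern described above; the URL and row indices are taken from the code, while everything else (class name, lambda, error handling) is illustrative:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.util.LinkedList;
import java.util.List;

public class HrrrHours {
    public static void main(String[] args) throws InterruptedException {
        final List<String> rawHours = new LinkedList<>();

        Thread t = new Thread(() -> {
            try {
                Document doc = Jsoup.connect(
                        "http://mag.ncep.noaa.gov/model-guidance-model-parameter.php"
                        + "?group=Model%20Guidance&model=HRRR&area=CONUS&ps=model").get();
                Elements rows = doc.select("tr");
                for (int i = 22; i < 26; i++) {
                    for (Element cell : rows.get(i).select("td")) {
                        rawHours.add(cell.text());
                    }
                }
            } catch (Exception e) {
                System.err.println("Error : " + e.getMessage());
            }
        });

        t.start();
        t.join(); // block until the Jsoup thread has finished

        // Only now is rawHours guaranteed to be fully populated
        for (String hour : rawHours) {
            System.out.println(hour);
        }
    }
}

Since Jsoup.connect().get() is synchronous anyway, the extra thread is only worth keeping if the surrounding environment (for example Android's main thread) forbids network calls; otherwise the fetch can simply be done inline and the join disappears.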
|
Problem scraping website using Java Jsoup, website not "scrolling"
Tag : java , By : user183275
Date : March 29 2020, 07:55 AM
I would suggest opening the browser's developer tools and watching the Network tab to find out which URL/endpoint the website uses to fetch new items for the infinite scroll, since Jsoup does not execute JavaScript itself. You can then call that endpoint with Jsoup and parse the results. If that does not work, it is probably better to move to HtmlUnit or Selenium, as both are full-featured web browser APIs that you can control from Java.
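As a sketch of the first approach, the endpoint URL below is a hypothetical placeholder; the real one has to be read from the browser's Network tab:

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import java.io.IOException;

public class InfiniteScrollFetch {
    public static void main(String[] args) throws IOException {
        // Hypothetical paging endpoint discovered via the Network tab
        String endpoint = "https://example.com/items?page=2";

        Connection.Response res = Jsoup.connect(endpoint)
                .ignoreContentType(true) // the endpoint may return JSON instead of HTML
                .userAgent("Mozilla/5.0")
                .execute();

        // If the endpoint returns an HTML fragment, Jsoup.parse(res.body()) works;
        // if it returns JSON, hand res.body() to a JSON library instead.
        System.out.println(res.body());
    }
}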
|