Issue with divs in Scraping with Python and Beautiful Soup

Tag : python-3.x , By : DK.
Date : January 11 2021, 05:14 PM

I'm learning a bit of web scraping and I'm having trouble accessing the element I want to reach. You can perform a find for that particular div from the container:
item_branding_div = container.find('div', {'class': 'item-branding'}) 
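As a rough illustration of the lookup above, here is a minimal Python 3 sketch. The HTML snippet, the `item-container` class, and the brand name are made up for the example; only the `item-branding` class comes from the answer. It assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the "item-branding" class is taken from the answer.
html = """
<div class="item-container">
  <div class="item-info">
    <div class="item-branding"><a><img title="ExampleBrand"></a></div>
    <a class="item-title">Example product</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "item-container"})

# find() searches all descendants of the container, not just direct children.
item_branding_div = container.find("div", {"class": "item-branding"})

# find() returns None when nothing matches, so guard before reading attributes.
brand = None
if item_branding_div is not None and item_branding_div.img is not None:
    brand = item_branding_div.img["title"]

print(brand)
```

Checking for `None` before reading attributes avoids an `AttributeError` on containers that lack the div.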

Issue scraping with Beautiful Soup

Tag : python , By : Antony Briggs
Date : March 29 2020, 07:55 AM
The code below, which sends a browser-like User-agent header, fixes the issue.
The question was why fetching the URL this way returns a GIF, while opening the same URL in a browser shows the website perfectly.
>>> import urllib2
>>> opener = urllib2.build_opener()
>>> opener.addheaders = [('User-agent', 'Mozilla/5.0')]
>>> url = "http://www.weatheronline.co.uk/weather/maps/current?LANG=en&DATE=1354104000&CONT=euro&LAND=UK&KEY=UK&SORT=1&INT=06&TYP=sonne&ART=tabelle&RUBRIK=akt&R=310&CEL=C"
>>> response = opener.open(url)
>>> page = response.read()
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(page)
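The session above is Python 2 (`urllib2`). A possible Python 3 equivalent is sketched below, assuming the same idea of sending a browser-like User-agent; the helper name `build_browser_like_opener` is hypothetical, and the actual network call is left commented out.

```python
import urllib.request

def build_browser_like_opener(user_agent="Mozilla/5.0"):
    """Return an opener that sends the given User-agent with every request."""
    opener = urllib.request.build_opener()
    # Replaces the default "Python-urllib/x.y" header, which some sites treat differently.
    opener.addheaders = [("User-agent", user_agent)]
    return opener

opener = build_browser_like_opener()

# Network access is needed for the next lines, so they are not executed here:
# page = opener.open(url).read()
# soup = BeautifulSoup(page, "html.parser")
```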

Issue in scraping data from a website using beautiful soup

Tag : python-2.7 , By : Myatus
Date : March 29 2020, 07:55 AM
You can write a custom matching function and pass it to findAll(). Change the line that assigns prices to:
prices = soup.findAll('div', class_=match_both)
where match_both is defined as:
def match_both(arg):
    if arg == "listGrid-price" or arg == "listGrid-price-outOfStock":
        return True
    return False
The complete script then becomes:
# -*- coding: cp1252 -*-
import csv
import urllib2
import sys
import time
from bs4 import BeautifulSoup

def match_both(arg):
    if arg == "listGrid-price" or arg == "listGrid-price-outOfStock":
        return True
    return False

def not_review(arg):
    if not arg:
        return arg
    return "Write a review" not in arg

page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.deviceListGridView.xhr.flowtype-NEW.deviceGroupType-Cellphone.paymentType-postpaid.packageType-undefined.html?taxoStyle=SMARTPHONES&showMoreListSize=1000').read()
soup = BeautifulSoup(page)
with open('AT&T_2012-12-28.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(["Date","Month","Day of Week","Device Name","Price"])
    items = soup.findAll('a', {"class": "clickStreamSingleItem"},text=not_review)
    prices = soup.findAll('div', class_=match_both)
    for item, price in zip(items, prices):
        textcontent = u' '.join(price.stripped_strings)
        if textcontent:
            device = unicode(item.string).encode('utf8').replace('™','').replace('®','').strip()
            spamwriter.writerow([time.strftime("%Y-%m-%d"), time.strftime("%B"), time.strftime("%A"), device, textcontent])
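To see the class-matching function in isolation, here is a small Python 3 sketch run against made-up HTML. The two price class names come from the script above; the markup, the prices, and the `listGrid-title` class are assumptions for the example, and `bs4` is assumed to be installed.

```python
from bs4 import BeautifulSoup

PRICE_CLASSES = {"listGrid-price", "listGrid-price-outOfStock"}

def match_both(css_class):
    # Beautiful Soup calls this with a class value (or None if the tag has no class).
    return css_class in PRICE_CLASSES

# Hypothetical markup for illustration only.
html = """
<div class="listGrid-price">$199.99</div>
<div class="listGrid-price-outOfStock">$99.99</div>
<div class="listGrid-title">Some phone</div>
"""

soup = BeautifulSoup(html, "html.parser")
prices = soup.find_all("div", class_=match_both)
price_texts = [" ".join(p.stripped_strings) for p in prices]
print(price_texts)
```

A set membership test does the same job as the chained `or` comparison and is easier to extend with further class names.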

Issue with html tags while scraping data using beautiful soup

Tag : python-2.7 , By : user181706
Date : March 29 2020, 07:55 AM
The page uses a large JavaScript structure to load the prices. You can load just that structure:
scripts = soup.find_all('script')
script = next(s.text for s in scripts if s.string and 'window.rates' in s.string)
datastring = script.split('phones=')[1].split(';window.')[0]
This gives you a JavaScript object literal with unquoted keys, which is not valid JSON:
{sku844082:{name:"Samsung Galaxy SII",image:"/images/m677391_300468.jpg",deliveryTime:"Vorauss. verfügbar ab Anfang Januar",sku1444291:{p:"prod954312",e:"19.90"},sku1444286:{p:"prod954312",e:"19.90"},sku1444283:{p:"prod954312",e:"39.90"},sku1444275:{p:"prod954312",e:"59.90"},sku1104261:{p:"prod954312",e:"99.90"}},sku894279:{name:"BlackBerry Torch 9810",image:"/images/m727477_300464.jpg",deliveryTime:"Lieferbar innerhalb 48 Stunden",sku1444275:{p:"prod1004495",e:"179.90"},sku1104261:{p:"prod1004495",e:"259.90"},sku1444291:{p:"prod1004495",e:"29.90"},sku1444286:{p:"prod1004495",e:"29.90"},sku1444283:{p:"prod1004495",e:"49.90"}},sku864221:{name:"BlackBerry Bold 9900",image:"/images/m707491_300465.jpg",deliveryTime:"Lieferbar innerhalb 48 Stunden",sku1444275:{p:"prod974431",e:"129.90"},sku1104261:{p:"prod974431",e:"169.90"},sku1444291:{p:"prod974431",e:"49.90"},sku1444286:{p:"prod974431",e:"49.90"},sku1444283:{p:"prod974431",e:"89.90"}}
Quoting the keys with a regular expression makes the string parseable by json.loads():
import re
import json

datastring = re.sub(ur'([{,])([a-z]\w*):', ur'\1"\2":', datastring)
data = json.loads(datastring)
>>> from pprint import pprint
>>> pprint(data['sku864221'])
{u'deliveryTime': u'Lieferbar innerhalb 48 Stunden',
 u'image': u'/images/m707491_300465.jpg',
 u'name': u'BlackBerry Bold 9900',
 u'sku1104261': {u'e': u'169.90', u'p': u'prod974431'},
 u'sku1444275': {u'e': u'129.90', u'p': u'prod974431'},
 u'sku1444283': {u'e': u'89.90', u'p': u'prod974431'},
 u'sku1444286': {u'e': u'49.90', u'p': u'prod974431'},
 u'sku1444291': {u'e': u'49.90', u'p': u'prod974431'}}
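The snippet above uses Python 2 syntax (`ur''` literals). A Python 3 sketch of the same key-quoting step is shown below, using a shortened sample built from the data above; the function name `js_object_to_dict` is hypothetical. Note the regex is a heuristic: it would also rewrite text inside a string value that happens to look like `,word:`.

```python
import json
import re

def js_object_to_dict(datastring):
    """Quote bare keys in a JavaScript object literal, then parse it as JSON."""
    # Matches an identifier directly after "{" or "," and followed by ":".
    quoted = re.sub(r'([{,])([a-z]\w*):', r'\1"\2":', datastring)
    return json.loads(quoted)

# Shortened sample taken from the structure shown above.
sample = '{sku864221:{name:"BlackBerry Bold 9900",sku1104261:{p:"prod974431",e:"169.90"}}}'

data = js_object_to_dict(sample)
print(data["sku864221"]["name"])
print(data["sku864221"]["sku1104261"]["e"])
```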

Issue with scraping data using beautiful soup

Tag : python , By : gopal
Date : March 29 2020, 07:55 AM
Rewrote the thing from scratch. There are no comments, but it's pretty self-explanatory. The lambda in the dictionary is for finding attributes that start with a certain string. I referenced this answer for that: https://stackoverflow.com/a/2830550/541208
I had thought that you were using findAll on soup, when you should have been using plan.findAll instead, but then it didn't help anything, so I just rewrote the whole thing.
import urllib2
import sys
from bs4 import BeautifulSoup

page = urllib2.urlopen('http://www.att.com/shop/wireless/plans-new.html#fbid=U-XD_DHOGEp').read()
soup = BeautifulSoup(page)

#find the container for all the plans
tabcontent = soup.find('div', {"id": "smartphonePlans", "class": "tabcontent"})
containers = tabcontent.findAll('div', {"class": "innerContainer"})

for plan in containers:
    planTitle = plan.find("div", {"class": "planTitle"})
    if planTitle:
        title = planTitle.find("a").text
        print title

    voiceBoxes = plan.find("div", {"class": "whiteBox"})
    if voiceBoxes:
        box3 = voiceBoxes.findAll("div", {"class": lambda x: x and x.startswith("boxes_")})
        if box3:
            for box in box3:
                top = box.findAll("p")
                minutes = u" ".join([tag.text for tag in top])
                print "\t", minutes
Output:
AT&T Individual Plans
    450 Minutes $39.99/mo.
    900 Minutes $59.99/mo.
    Unlimited Minutes $69.99/mo.
AT&T Family Plans
    550 Minutes $59.99/mo.
    700 Minutes $69.99/mo.
    1,400 Minutes $89.99/mo.
    2,100 Minutes $109.99/mo.
    Unlimited Minutes $119.99/mo.
AT&T Mobile Share Plans
    1GB $40/mo. + $45/smartphone
    4GB $70/mo. + $40/smartphone
    6GB $90/mo. + $35/smartphone
    10GB $120/mo.
    15GB $160/mo. + $30/smartphone
    20GB $200/mo.
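The prefix-matching lambda is the key piece of the script above. Here is a small Python 3 sketch of just that part, run against made-up markup; the `whiteBox` and `boxes_` names come from the script, while the HTML itself and the `footer` class are assumptions. It assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the structure the script expects.
html = """
<div class="whiteBox">
  <div class="boxes_3"><p>450 Minutes</p><p>$39.99/mo.</p></div>
  <div class="boxes_3"><p>900 Minutes</p><p>$59.99/mo.</p></div>
  <div class="footer"><p>ignored</p></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
white_box = soup.find("div", {"class": "whiteBox"})

# The "c and ..." guard handles tags without a class attribute (c is None).
boxes = white_box.find_all(
    "div", {"class": lambda c: c and c.startswith("boxes_")}
)

lines = [" ".join(p.get_text() for p in box.find_all("p")) for box in boxes]
for line in lines:
    print("\t" + line)
```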

Python Web Scraping (Beautiful Soup, Selenium and PhantomJS): Only scraping part of full page

Tag : python-2.7 , By : geo
Date : March 29 2020, 07:55 AM