How do I scrape data from web pages using BeautifulSoup in Python


How do I scrape data from web pages using BeautifulSoup in Python
Tag : python , By : MJRider
Date : January 11 2021, 05:14 PM

The problem is the line movies = list(name.text): calling list() on a string creates a list whose items are the individual characters of name.text.
Use a list comprehension instead: movies = [name.text for name in names.find_all('a')]:
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.imdb.com/chart/top'
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
movie = soup.find_all(class_='titleColumn')

# open the file once and write one row per title column
with open('TopMovies.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    for names in movie:
        movies = [name.text for name in names.find_all('a')]
        writer.writerow(movies)

print("Successfully inserted")
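To see why the original list(name.text) produced per-character output, note what list() does to a string (a minimal illustration, independent of BeautifulSoup):

```python
title = "The Godfather"

# list() on a string yields one-character items:
print(list(title)[:3])  # ['T', 'h', 'e']

# a comprehension over whole strings keeps each title intact:
titles = ["The Godfather", "12 Angry Men"]
print([t for t in titles])  # ['The Godfather', '12 Angry Men']
```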


xbmc/kodi Python: scrape data using BeautifulSoup

Tag : python , By : Kiltec
Date : March 29 2020, 07:55 AM
I want to edit a Kodi addon that scrapes data with re.compile and convert it to BeautifulSoup4. The original HTML looks like this:
from bs4 import BeautifulSoup

html = """<div id="content">
  <span class="someclass">
    <span class="sec">
      <a class="frame" href="http://somlink.com/section/name-here" title="name here">
         <img src="http://www.somlink.com/thumb/imgsection/thumbnail.jpg" >
    <h3 class="title">
        <a href="http://somlink.com/section/name-here">name here</a>
    <span class="details"><span class="length">Length: 99:99</span>
"""

soup = BeautifulSoup(html, "lxml")
sec = soup.find("span", {"class": "someclass"})
# get a tag with frame class
fr = sec.find("a", {"class": "frame"})

# pull img src and href from the a/frame
url, img = fr["href"], fr.find("img")["src"]

# get h3 with title class and extract the text from the anchor
name =  sec.select("h3.title a")[0].text

# "size" is in the span with the details class
size = sec.select("span.details")[0].text.split(None,1)[-1]

print(url, img, name.strip(), size.strip())
# http://somlink.com/section/name-here http://www.somlink.com/thumb/imgsection/thumbnail.jpg name here 99:99
def secs():
    soup = BeautifulSoup(html, "lxml")
    sections = soup.find_all("span", {"class": "someclass"})
    for sec in sections:
        fr = sec.find("a", {"class": "frame"})
        url, img = fr["href"], fr.find("img")["src"]
        name, size =  sec.select("h3.title a")[0].text, sec.select("span.details")[0].text.split(None,1)[-1]
        yield url, name, img, size
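A self-contained version of the generator approach, using a cleaned-up copy of the sample markup (closing tags added so the snippet parses the same way with the standard-library html.parser):

```python
from bs4 import BeautifulSoup

html = """<div id="content">
  <span class="someclass">
    <a class="frame" href="http://somlink.com/section/name-here" title="name here">
      <img src="http://www.somlink.com/thumb/imgsection/thumbnail.jpg">
    </a>
    <h3 class="title"><a href="http://somlink.com/section/name-here">name here</a></h3>
    <span class="details"><span class="length">Length: 99:99</span></span>
  </span>
</div>"""

def sections(markup):
    soup = BeautifulSoup(markup, "html.parser")
    for sec in soup.find_all("span", {"class": "someclass"}):
        fr = sec.find("a", {"class": "frame"})
        url, img = fr["href"], fr.find("img")["src"]
        name = sec.select("h3.title a")[0].text.strip()
        size = sec.select("span.details")[0].text.split(None, 1)[-1]
        yield url, name, img, size

for url, name, img, size in sections(html):
    print(url, name, size)
```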

Scrape an article with Python 3.4, BeautifulSoup, and Requests

Tag : python , By : gbodunski
Date : March 29 2020, 07:55 AM
The desired data is not actually located inside the element with the status-list class. If you inspect the source, you will find an empty container instead:
<div class="status_bd">
    <div id="statusLists" class="allStatuses no-head"></div>
import json
import re
import requests
from bs4 import BeautifulSoup

url = 'https://xueqiu.com/yaodewang'
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36'
}

r = requests.get(url, headers=headers).content
soup = BeautifulSoup(r, 'lxml')

pattern = re.compile(r"SNB\.data\.statuses = ({.*?});", re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)

data = json.loads(pattern.search(script.text).group(1))
for item in data["statuses"]:
    print(item)  # each item is a dict describing one status; the original snippet truncates here
The best advice: Remember common courtesy and act toward others as you want them to act toward you.
Lighten up! It's the weekend, we're just having a little fun! Industrial Bank is expected to rise next week...
Dot, dot, dot... clicked on this one: translating a degree, diploma, or transcript runs 50 or 100 yuan apiece...
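The key trick above is locating the script tag whose text matches the regex and feeding the captured object to json.loads. A self-contained illustration (the markup here is invented for the demo):

```python
import json
import re
from bs4 import BeautifulSoup

html = '<html><body><script>SNB.data.statuses = {"statuses": [{"text": "hello"}]};</script></body></html>'

pattern = re.compile(r"SNB\.data\.statuses = ({.*?});", re.MULTILINE | re.DOTALL)
soup = BeautifulSoup(html, "html.parser")

# find the <script> whose text matches the assignment, then parse the JSON payload
script = soup.find("script", string=pattern)
data = json.loads(pattern.search(script.text).group(1))
print([s["text"] for s in data["statuses"]])  # ['hello']
```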

Scrape data from a website to a CSV file using Python and BeautifulSoup

Tag : python , By : user169463
Date : March 29 2020, 07:55 AM
This should get you started, and I'll break it down a bit so you can modify and experiment while you learn. I also suggest using Pandas: it's a popular library for data manipulation, and you'll likely be using it in the near future if you aren't already.
I first initialize a results dataframe to store all the data you'll be parsing:
import bs4
import requests
import pandas as pd

results = pd.DataFrame()
my_url = 'https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description=graphics+card&N=-1&isNodeId=1'

response = requests.get(my_url)
html = response.text

soup = bs4.BeautifulSoup(html, 'html.parser')
Container_Main = soup.find_all("div",{"class":"item-container"})
for container in Container_Main:
    ...  # parse each product container; the complete loop body is in the full script below
results.to_csv('path/file.csv', index=False)
import bs4
import requests
import pandas as pd

results = pd.DataFrame()

my_url = 'https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=BESTMATCH&Description=graphics+card&N=-1&isNodeId=1'

response = requests.get(my_url)
html = response.text

soup = bs4.BeautifulSoup(html, 'html.parser')

Container_Main = soup.find_all("div",{"class":"item-container"})
for container in Container_Main:

    item_features = container.find("ul",{"class":"item-features"})

    # if there are no item-fetures, move on to the next container
    if item_features is None:
        continue

    temp_df = pd.DataFrame(index=[0])
    features_list = item_features.find_all('li')
    for feature in features_list:
        split_str = feature.text.split(':')        
        header = split_str[0]
        data = split_str[1].strip()
        temp_df[header] = data

    promo = container.find_all("p",{"class":"item-promo"})[0].text
    temp_df['promo'] = promo

    # DataFrame.append was removed in pandas 2.x; pd.concat is the supported equivalent
    results = pd.concat([results, temp_df], sort=False).reset_index(drop=True)

results.to_csv('path/file.csv', index=False)
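The per-item pattern above builds a one-row frame from the item's features and accumulates it into results. A self-contained sketch of just that accumulation step, with invented feature dicts standing in for the scraped data (note that DataFrame.append was removed in pandas 2.x, so pd.concat is used here):

```python
import pandas as pd

# invented stand-ins for the per-item feature dicts parsed from item-features
items = [
    {"Model": "GTX 1080", "promo": "Free shipping"},
    {"Model": "RX 580", "Memory": "8GB"},
]

results = pd.DataFrame()
for features in items:
    # one-row frame per item; missing columns become NaN when concatenated
    temp_df = pd.DataFrame(features, index=[0])
    results = pd.concat([results, temp_df], sort=False).reset_index(drop=True)

print(results)
```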

Scrape the top 100 job results from Indeed using BeautifulSoup in Python

Tag : python , By : stu73
Date : March 29 2020, 07:55 AM
Fetch the results in batches of 10 by changing the start value in the URL, incrementing it on each pass through the loop:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
results = []
url = 'https://www.indeed.co.in/jobs?q=software+developer&l=Bengaluru,+Karnataka&start={}'
with requests.Session() as s:
    for start in range(0, 50, 10):
        res = s.get(url.format(start))
        soup = bs(res.content, 'lxml')
        titles = [item.text.strip() for item in soup.select('[data-tn-element=jobTitle]')]
        companies = [item.text.strip() for item in soup.select('.company')]
        results.append(list(zip(titles, companies)))
newList = [item for sublist in results for item in sublist]
df = pd.DataFrame(newList, columns=['title', 'company'])
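The last two lines flatten the per-page batches into one list of rows before building the DataFrame; a minimal sketch of that idiom with invented data:

```python
import pandas as pd

# one sub-list of (title, company) tuples per scraped page (invented for the demo)
results = [
    [("Software Developer", "Acme"), ("Backend Engineer", "Globex")],
    [("Data Engineer", "Initech")],
]

# collapse the per-page batches into one flat list of rows
new_list = [item for sublist in results for item in sublist]
df = pd.DataFrame(new_list, columns=["title", "company"])
print(len(df))  # 3
```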

How to scrape data from a URL in Python using BeautifulSoup

Tag : python-3.x , By : itsmegb
Date : March 29 2020, 07:55 AM
Collect the profile links first, then visit each one. Note that the listing URL in the requests.get() call was truncated in the original post:
import requests
from bs4 import BeautifulSoup
import csv

links = []
try:
    for item in range(1, 372):
        print(f"Extraction Page# {item}")
        r = requests.get(...)  # the listing-page URL was truncated in the original post
        if r.status_code == 200:
            soup = BeautifulSoup(r.text, 'html.parser')
            for item in soup.findAll('span', attrs={'class': 'captions-field'}):
                for a in item.findAll('a'):
                    a = a.get('href')
                    if a not in links:
                        links.append(a)
except KeyboardInterrupt:
    print("Good Bye!")

data = []
try:
    for link in links:
        r = requests.get(link)
        if r.status_code == 200:
            soup = BeautifulSoup(r.text, 'html.parser')
            for item in soup.findAll('div', attrs={'class': 'compProfileInfo'}):
                a = [a.text.strip() for a in item.findAll('span')]
                if a[6] == '':
                    a[6] = 'N/A'
                data.append(a)  # collect each profile's fields; the original snippet truncates here
except KeyboardInterrupt:
    print("Good Bye!")

while True:
    try:
        with open('output.csv', 'w+', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(['Name', 'Phone', 'Email', 'Website'])
            # writing of the collected data rows was truncated in the original post
        print("Operation Completed")
        break  # stop retrying once the write succeeds
    except PermissionError:
        print("Please Close The File")
    except KeyboardInterrupt:
        print("Good Bye")
        break
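The retry loop's intent, attempt the CSV write and prompt the user to close the file when Windows raises PermissionError (typically because the file is open in Excel), can be sketched in isolation; the path and rows here are invented for the demo:

```python
import csv
import os
import tempfile

rows = [["Alice", "555-0100", "alice@example.com", "example.com"]]
path = os.path.join(tempfile.gettempdir(), "output_demo.csv")

while True:
    try:
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Name", "Phone", "Email", "Website"])
            writer.writerows(rows)
        print("Operation Completed")
        break  # exit once the write succeeds
    except PermissionError:
        print("Please Close The File")
```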