Python 3: how to scrape search results from a website using CSRF?
Tag : python , By : CHeMoTaCTiC
Date : January 12 2021, 09:11 PM

I would alter the page param in the POST requests inside a loop. Do an initial request to find out the number of pages:
from bs4 import BeautifulSoup as bs
import requests, re, math
import pandas as pd

headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Referer': 'https://www.orias.fr/web/guest/search'
}

params = [['p_p_id', 'intermediaryDetailedSearch_WAR_oriasportlet'],
    ['p_p_lifecycle', '0'],
    ['p_p_state', 'normal'],
    ['p_p_mode', 'view'],
    ['p_p_col_id', 'column-1'],
    ['p_p_col_count', '1'],
    ['_intermediaryDetailedSearch_WAR_oriasportlet_d-16544-p', '1'],
    ['_intermediaryDetailedSearch_WAR_oriasportlet_implicitModel', 'true'],
    ['_intermediaryDetailedSearch_WAR_oriasportlet_spring_render', 'searchResult']]

data = {
  'searchString': '',
  'address': '',
  'zipCodeOrCity': '',
  '_coa': 'on',
  '_aga': 'on',
  '_ma': 'on',
  '_mia': 'on',
  '_euIAS': 'on',
  'mandatorDenomination': '',
  'wantsMandator': 'no',
  '_cobsp': 'on',
  '_mobspl': 'on',
  '_mobsp': 'on',
  '_miobsp': 'on',
  '_bankActivities': '1',
  '_euIOBSP': 'on',
  '_cif': 'on',
  '_alpsi': 'on',
  '_cip': 'on',
  'ifp': 'true',
  '_ifp': 'on',
  'submit': 'Search'
}

p = re.compile(r'(\d+)\s+intermediaries found')

with requests.Session() as s:
    r = s.post('https://www.orias.fr/search', headers=headers, params=params, data=data)
    soup = bs(r.content, 'lxml')
    num_results = int(p.findall(r.text)[0])
    results_per_page = 20
    num_pages = math.ceil(num_results / results_per_page)
    df = pd.read_html(str(soup.select_one('.table')))[0]

    for i in range(2, num_pages + 1):
        params[6][1] = str(i)  # the d-16544-p param holds the page number
        r = s.post('https://www.orias.fr/search', headers=headers, params=params, data=data)
        soup = bs(r.content, 'lxml')
        df_next = pd.read_html(str(soup.select_one('.table')))[0]
        df = pd.concat([df, df_next])

df.drop('Unnamed: 6', axis=1, inplace=True)
df = df.reset_index(drop=True)
print(len(df['Siren Number'].unique()))
# 245
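As a quick sanity check, the page-count arithmetic used above can be verified on its own (245 and 20 are the counts from this particular scrape, used here purely for illustration):

```python
import math

# 245 results at 20 rows per page (illustrative numbers)
num_results = 245
results_per_page = 20
num_pages = math.ceil(num_results / results_per_page)
print(num_pages)  # 13
```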


How to scrape all possible results from a search bar of a website


Tag : development , By : boonchew
Date : March 29 2020, 07:55 AM
Hope this helps. Please scroll down to UPDATE 2.
The website requires at least one search parameter, so you can loop through all items of the Arbejdsområde list, making a request for each of them. Here is an example showing how that could be done in Excel VBA (open the VBE, create a standard module, paste the code, and run Test()):
Option Explicit

Sub Test()

    Dim sResponse As String
    Dim oItems As Object
    Dim vItem
    Dim aData
    Dim sContent As String
    Dim lPage As Long
    Dim i As Long
    Dim j As Long

    ' Retrieve search page HTML content
    XmlHttpRequest "GET", "http://www.advokatnoeglen.dk/", "", "", "", sResponse
    ' Extract work areas items
    ExtractOptions sResponse, "ctl00$ContentPlaceHolder$Search$AreaSelect", oItems
    oItems.Remove oItems.Keys()(0)
    sContent = ""
    ' Process each work area item
    For Each vItem In oItems.Items()
        Debug.Print "Item [" & vItem & "]"
        lPage = 0
        ' Process each results page
        Do
            Debug.Print vbTab & "Page [" & lPage & "]"
            ' Retrieve result page HTML content
            XmlHttpRequest "GET", "http://www.advokatnoeglen.dk/sog.aspx?s=1&t=0&a=" & vItem & "&p=" & lPage, "", "", "", sResponse
            ' Extract result table
            ParseResponse _
                "<table\b[^>]*?id=""ctl00_ContentPlaceHolder_Grid""[^>]*>([\s\S]*?)</table>", _
                sResponse, _
                aData, _
                False
            ' Store parsed table
            sContent = sContent & aData(0)
            Debug.Print vbTab & "Parsed " & Len(sContent)
            lPage = lPage + 1
            DoEvents
        Loop Until InStr(sResponse, "<a class=""next""") = 0
    Next
    ' Extract data from the whole content
    ParseResponse _
        "<tr.*?onclick=""location.href='([^']*)'"">\s*" & _
        "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
        "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
        "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
        "</tr>", _
        sContent, _
        aData, _
        False
    ' Rebuild nested arrays to 2d array for output
    aData = Denestify(aData)
    ' Decode HTML
    For i = 1 To UBound(aData, 1)
        For j = 2 To 4
            aData(i, j) = GetInnerText((aData(i, j)))
        Next
    Next
    ' Output
    With ThisWorkbook.Sheets(1)
        .Cells.Delete
        Output2DArray .Cells(1, 1), aData
        .Columns.AutoFit
        .Rows.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub XmlHttpRequest(sMethod, sUrl, aSetHeaders, sFormData, sRespHeaders, sRespText)

    Dim aHeader

    'With CreateObject("MSXML2.ServerXMLHTTP")
        '.SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
    With CreateObject("MSXML2.XMLHTTP")
        .Open sMethod, sUrl, False
        If IsArray(aSetHeaders) Then
            For Each aHeader In aSetHeaders
                .SetRequestHeader aHeader(0), aHeader(1)
            Next
        End If
        .Send (sFormData)
        sRespHeaders = .GetAllResponseHeaders
        sRespText = .ResponseText
    End With

End Sub

Sub ExtractOptions(sContent As String, ByVal sName As String, oOptions As Object)

    Dim aTmp0
    Dim vItem

    ' Escape RegEx special characters
    For Each vItem In Array("\", "*", "+", "?", "^", "$", ".", "[", "]", "{", "}", "(", ")", "|", "/")
        sName = Replace(sName, vItem, "\" & vItem)
    Next
    ' Extract the whole <select> for parameter
    ParseResponse "<select[^>]* name=""?" & sName & """?[^>]*>[^<]*((?:<option[^>]*>[^<]*</option>[^<]*)+)[^<]*</[^>]*>", sContent, aTmp0, False
    ' Extract each parameter <option>
    ParseResponse "<option[^>]*value=(""[^""]*""|[^\s>]*)[^>]*>([^<]*)</option>", (aTmp0(0)), aTmp0, False
    ' Put each parameter and value into dictionary
    Set oOptions = CreateObject("Scripting.Dictionary")
    For Each vItem In aTmp0
        oOptions(GetInnerText((vItem(1)))) = GetInnerText(Replace(vItem(0), """", ""))
    Next

End Sub

Sub ParseResponse(sPattern, sResponse, aData, Optional bAppend As Boolean = True, Optional bGlobal = True, Optional bMultiLine = True, Optional bIgnoreCase = True)

    Dim oMatch
    Dim aTmp0()
    Dim sSubMatch

    If Not (IsArray(aData) And bAppend) Then aData = Array()
    With CreateObject("VBScript.RegExp")
        .Global = bGlobal
        .MultiLine = bMultiLine
        .IgnoreCase = bIgnoreCase
        .Pattern = sPattern
        For Each oMatch In .Execute(sResponse)
            If oMatch.SubMatches.Count = 1 Then
                PushItem aData, oMatch.SubMatches(0)
            Else
                aTmp0 = Array()
                For Each sSubMatch In oMatch.SubMatches
                    PushItem aTmp0, sSubMatch
                Next
                PushItem aData, aTmp0
            End If
        Next
    End With

End Sub

Sub PushItem(aData, vItem, Optional bAppend As Boolean = True)

    If Not (IsArray(aData) And bAppend) Then aData = Array()
    ReDim Preserve aData(UBound(aData) + 1)
    aData(UBound(aData)) = vItem

End Sub

Function GetInnerText(sText As String) As String

    Static oHtmlfile As Object
    Static oDiv As Object

    If oHtmlfile Is Nothing Then
        Set oHtmlfile = CreateObject("htmlfile")
        oHtmlfile.Open
        Set oDiv = oHtmlfile.createElement("div")
    End If
    oDiv.innerHTML = sText
    GetInnerText = oDiv.innerText

End Function

Function Denestify(aRows)

    Dim aData()
    Dim aItems()
    Dim i As Long
    Dim j As Long

    If UBound(aRows) = -1 Then Exit Function
    ReDim aData(1 To UBound(aRows) + 1, 1 To 1)
    For j = 0 To UBound(aRows)
        If IsArray(aRows(j)) Then
            aItems = aRows(j)
            For i = 0 To UBound(aItems)
                If i + 1 > UBound(aData, 2) Then ReDim Preserve aData(1 To UBound(aRows) + 1, 1 To i + 1)
                aData(j + 1, i + 1) = aItems(i)
            Next
        Else
            aData(j + 1, 1) = aRows(j)
        End If
    Next
    Denestify = aData

End Function

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub
UPDATE 1:

Option Explicit

Sub Test()

    Dim sResponse As String
    Dim oItems As Object
    Dim vKey
    Dim sItem As String
    Dim aTmp
    Dim aData
    Dim lPage As Long
    Dim i As Long
    Dim j As Long

    ' Retrieve search page HTML content
    XmlHttpRequest "GET", "http://www.advokatnoeglen.dk/", "", "", "", sResponse
    ' Extract Retskreds items
    ExtractOptions sResponse, "ctl00$ContentPlaceHolder$Search$CourtSelect", oItems
    oItems.Remove oItems.Keys()(0)
    i = 0
    ' Process each Retskreds item
    For Each vKey In oItems
        sItem = oItems(vKey)
        Debug.Print "Area " & sItem & " " & vKey
        lPage = 0
        ' Process each results page
        Do
            Debug.Print vbTab & "Page " & lPage
            ' Retrieve results page
            XmlHttpRequest "GET", "http://www.advokatnoeglen.dk/sog.aspx?s=1&t=0&c=" & sItem & "&p=" & lPage, "", "", "", sResponse
            ' Extract table
            ParseResponse _
                "<table\b[^>]*?id=""ctl00_ContentPlaceHolder_Grid""[^>]*>([\s\S]*?)</table>", _
                sResponse, _
                aTmp, _
                False
            ' Extract data from the table
            ParseResponse _
                "<tr.*?onclick=""location.href='([^']*)'"">\s*" & _
                "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
                "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
                "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
                "</tr>", _
                aTmp(0), _
                aData, _
                True
            ' Add Retskreds name
            For i = i To UBound(aData)
                aTmp = aData(i)
                PushItem aTmp, vKey
                aData(i) = aTmp
            Next
            Debug.Print vbTab & "Parsed " & UBound(aData)
            lPage = lPage + 1
            DoEvents
        Loop Until InStr(sResponse, "<a class=""next""") = 0
    Next
    ' Retrieve detailed info for each entry
    For i = 0 To UBound(aData)
        aTmp = aData(i)
        ' Retrieve details page
        aTmp(0) = "http://www.advokatnoeglen.dk" & aTmp(0)
        ' Extract details
        XmlHttpRequest "GET", aTmp(0), "", "", "", sResponse
        ParseResponse _
            DecodeUriComponent( _
                "Arbejdsomr%C3%A5der\: [\s\S]*?</h2>[\s\S]*?" & _
                "Beskikkelses%C3%A5r\: ([^<]*)[\s\S]*?" & _
                "F%C3%B8dsels%C3%A5r\: ([^<]*)[\s\S]*?" & _
                "M%C3%B8deret for landsret\: ([^<]*)[\s\S]*?" & _
                "M%C3%B8deret for h%C3%B8jesteret\: ([^<]*)[\s\S]*?" & _
                "E-mail\: [\s\S]*?href='\/email\.aspx\?e\=(.*?)'[\s\S]*?" & _
                "Mobiltlf\.\: ([\d\(\)\-+ ]*?)\s*<"), _
            sResponse, _
            aTmp, _
            True, _
            False
        aTmp(9) = StrReverse(aTmp(9))
        aData(i) = aTmp
        Debug.Print vbTab & "Details " & i
        DoEvents
    Next
    ' Rebuild nested arrays to 2d array for output
    aData = Denestify(aData)
    ' Decode HTML
    For i = 1 To UBound(aData, 1)
        For j = 2 To 4
            aData(i, j) = Trim(Replace(GetInnerText((aData(i, j))), vbCrLf, ""))
        Next
    Next
    ' Output
    With ThisWorkbook.Sheets(1)
        .Cells.Delete
        OutputArray .Cells(1, 1), _
            Array("URL", _
                "Navn", _
                "Firma", _
                DecodeUriComponent("Arbejdsomr%C3%A5der"), _
                DecodeUriComponent("Retskreds"), _
                DecodeUriComponent("Beskikkelses%C3%A5r"), _
                DecodeUriComponent("F%C3%B8dsels%C3%A5r"), _
                DecodeUriComponent("M%C3%B8deret for landsret"), _
                DecodeUriComponent("M%C3%B8deret for h%C3%B8jesteret"), _
                "E-mail", _
                "Mobiltlf." _
            )
        Output2DArray .Cells(2, 1), aData
        .Columns.AutoFit
        .Rows.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub XmlHttpRequest(sMethod, sUrl, aSetHeaders, sFormData, sRespHeaders, sRespText)

    Dim aHeader

    'With CreateObject("MSXML2.ServerXMLHTTP")
        '.SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
    With CreateObject("MSXML2.XMLHTTP")
        .Open sMethod, sUrl, False
        If IsArray(aSetHeaders) Then
            For Each aHeader In aSetHeaders
                .SetRequestHeader aHeader(0), aHeader(1)
            Next
        End If
        .Send (sFormData)
        sRespHeaders = .GetAllResponseHeaders
        sRespText = .ResponseText
    End With

End Sub

Sub ExtractOptions(sContent As String, ByVal sName As String, oOptions As Object)

    Dim aTmp0
    Dim vItem

    ' Escape RegEx special characters
    For Each vItem In Array("\", "*", "+", "?", "^", "$", ".", "[", "]", "{", "}", "(", ")", "|", "/")
        sName = Replace(sName, vItem, "\" & vItem)
    Next
    ' Extract the whole <select> for parameter
    ParseResponse "<select[^>]* name=""?" & sName & """?[^>]*>[^<]*((?:<option[^>]*>[^<]*</option>[^<]*)+)[^<]*</[^>]*>", sContent, aTmp0, False
    ' Extract each parameter <option>
    ParseResponse "<option[^>]*value=(""[^""]*""|[^\s>]*)[^>]*>([^<]*)</option>", (aTmp0(0)), aTmp0, False
    ' Put each parameter and value into dictionary
    Set oOptions = CreateObject("Scripting.Dictionary")
    For Each vItem In aTmp0
        oOptions(GetInnerText((vItem(1)))) = GetInnerText(Replace(vItem(0), """", ""))
    Next

End Sub

Sub ParseResponse(sPattern, sResponse, aData, Optional bAppend As Boolean = True, Optional bNestSubMatches = True, Optional bGlobal = True, Optional bMultiLine = True, Optional bIgnoreCase = True)

    Dim oMatch
    Dim aTmp0()
    Dim sSubMatch

    If Not (IsArray(aData) And bAppend) Then aData = Array()
    With CreateObject("VBScript.RegExp")
        .Global = bGlobal
        .MultiLine = bMultiLine
        .IgnoreCase = bIgnoreCase
        .Pattern = sPattern
        For Each oMatch In .Execute(sResponse)
            If oMatch.SubMatches.Count = 1 Then
                PushItem aData, oMatch.SubMatches(0)
            Else
                If bNestSubMatches Then
                    aTmp0 = Array()
                    For Each sSubMatch In oMatch.SubMatches
                        PushItem aTmp0, sSubMatch
                    Next
                    PushItem aData, aTmp0
                Else
                    For Each sSubMatch In oMatch.SubMatches
                        PushItem aData, sSubMatch
                    Next
                End If
            End If
        Next
    End With

End Sub

Sub PushItem(aData, vItem, Optional bAppend As Boolean = True)

    If Not (IsArray(aData) And bAppend) Then aData = Array()
    ReDim Preserve aData(UBound(aData) + 1)
    aData(UBound(aData)) = vItem

End Sub

Function DecodeUriComponent(sEncoded As String) As String

    Static objHtmlfile As Object

    If objHtmlfile Is Nothing Then
        Set objHtmlfile = CreateObject("htmlfile")
        objHtmlfile.parentWindow.execScript "function decode(s) {return decodeURIComponent(s)}", "jscript"
    End If
    DecodeUriComponent = objHtmlfile.parentWindow.decode(sEncoded)

End Function

Function GetInnerText(sText As String) As String

    Static oHtmlfile As Object
    Static oDiv As Object

    If oHtmlfile Is Nothing Then
        Set oHtmlfile = CreateObject("htmlfile")
        oHtmlfile.Open
        Set oDiv = oHtmlfile.createElement("div")
    End If
    oDiv.innerHTML = sText
    GetInnerText = oDiv.innerText

End Function

Function Denestify(aRows)

    Dim aData()
    Dim aItems()
    Dim i As Long
    Dim j As Long

    If UBound(aRows) = -1 Then Exit Function
    ReDim aData(1 To UBound(aRows) + 1, 1 To 1)
    For j = 0 To UBound(aRows)
        If IsArray(aRows(j)) Then
            aItems = aRows(j)
            For i = 0 To UBound(aItems)
                If i + 1 > UBound(aData, 2) Then ReDim Preserve aData(1 To UBound(aRows) + 1, 1 To i + 1)
                aData(j + 1, i + 1) = aItems(i)
            Next
        Else
            aData(j + 1, 1) = aRows(j)
        End If
    Next
    Denestify = aData

End Function

Sub OutputArray(oDstRng As Range, aCells As Variant, Optional sFormat As String = "@")

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = sFormat
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant, Optional sFormat As String = "@")

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = sFormat
            .Value = aCells
        End With
    End With

End Sub
UPDATE 2:

Option Explicit

Sub Test()

    Dim sResponse As String
    Dim aTmp
    Dim aData
    Dim lPage As Long
    Dim i As Long
    Dim j As Long

    lPage = 0
    ' Process each results page
    Do
        Debug.Print vbTab & "Page " & lPage
        ' Retrieve results page
        XmlHttpRequest "GET", "http://www.advokatnoeglen.dk/sog.aspx?s=1&t=0&firm=%20&p=" & lPage, "", "", "", sResponse
        ' Extract table
        ParseResponse _
            "<table\b[^>]*?id=""ContentPlaceHolder_Grid""[^>]*>([\s\S]*?)</table>", _
            sResponse, _
            aTmp, _
            False
        ' Extract data from the table
        ParseResponse _
            "<tr.*?onclick=""location.href=&#39;(.*?)&#39;"">\s*" & _
            "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
            "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
            "<td[^>]*>\s*([\s\S]*?)\s*</td>\s*" & _
            "</tr>", _
            aTmp(0), _
            aData, _
            True
        Debug.Print vbTab & "Parsed " & (UBound(aData) + 1)
        lPage = lPage + 1
        DoEvents
    Loop Until InStr(sResponse, "<a class=""next""") = 0
    ' Retrieve detailed info for each entry
    For i = 0 To UBound(aData)
        aTmp = aData(i)
        ' Retrieve details page
        aTmp(0) = "http://www.advokatnoeglen.dk" & aTmp(0)
        ' Extract details
        Do
            XmlHttpRequest "GET", aTmp(0), "", "", "", sResponse
            If InStr(sResponse, "<title>Runtime Error</title>") = 0 Then Exit Do
            DoEvents
        Loop
        ParseResponse _
            DecodeUriComponent( _
                "Arbejdsomr%C3%A5der\: [\s\S]*?</h2>[\s\S]*?" & _
                "Beskikkelses%C3%A5r\: ([^<]*)[\s\S]*?" & _
                "(:?F%C3%B8dsels%C3%A5r\: ([^<]*)[\s\S]*?)?" & _
                "M%C3%B8deret for landsret\: ([^<]*)[\s\S]*?" & _
                "M%C3%B8deret for h%C3%B8jesteret\: ([^<]*)[\s\S]*?" & _
                "(:?E-mail [\s\S]*?href='\/email\.aspx\?e\=(.*?)'[\s\S]*?)?" & _
                "Mobiltlf\.\: ([\d\(\)\-+ ]*?)\s*<"), _
            sResponse, _
            aTmp, _
            True, _
            False
        aTmp(8) = StrReverse(aTmp(8))
        aData(i) = aTmp
        Debug.Print vbTab & "Details " & i
        DoEvents
    Next
    ' Rebuild nested arrays to 2d array for output
    aData = Denestify(aData)
    ' Decode HTML
    For i = 1 To UBound(aData, 1)
        For j = 2 To 4
            aData(i, j) = Trim(Replace(GetInnerText((aData(i, j))), vbCrLf, ""))
        Next
    Next
    ' Output
    With ThisWorkbook.Sheets(1)
        .Cells.Delete
        OutputArray .Cells(1, 1), _
            Array("URL", _
                "Navn", _
                "Firma", _
                DecodeUriComponent("Arbejdsomr%C3%A5der"), _
                DecodeUriComponent("Beskikkelses%C3%A5r"), _
                DecodeUriComponent("F%C3%B8dsels%C3%A5r"), _
                DecodeUriComponent("M%C3%B8deret for landsret"), _
                DecodeUriComponent("M%C3%B8deret for h%C3%B8jesteret"), _
                "E-mail", _
                "Mobiltlf." _
            )
        Output2DArray .Cells(2, 1), aData
        .Columns.AutoFit
        .Rows.AutoFit
    End With
    MsgBox "Completed"

End Sub
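For readers who would rather stay in Python, the core idea of the VBA code above (extract the values of the form's `<select>` options, then loop over them making one search request each) can be sketched with BeautifulSoup. The HTML string here is a hand-written stand-in for the real page, not the site's actual markup:

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the search form's <select> element
html = """
<select name="ctl00$ContentPlaceHolder$Search$AreaSelect">
  <option value="">Alle</option>
  <option value="1">Ansættelsesret</option>
  <option value="2">Boligret</option>
</select>
"""
soup = BeautifulSoup(html, 'html.parser')
# Skip the empty "all areas" option, mirroring oItems.Remove in the VBA
options = {o.text: o['value'] for o in soup.find_all('option') if o['value']}
print(options)  # {'Ansættelsesret': '1', 'Boligret': '2'}
```

Each value in `options` would then be substituted into the search URL's `a=` parameter, one request per area.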

Scrapy - use website's search engine to scrape results


Tag : python , By : drbillll
Date : March 29 2020, 07:55 AM
If the search term is not reflected in the URL, it is transmitted to the server as a POST request. This means your Scrapy code also needs to make a POST request in order to submit the desired search term.
The Scrapy request documentation has examples for making a POST request, simulating a form submission:
return [FormRequest(url="http://www.example.com/post/action",
                formdata={'name': 'John Doe', 'age': '27'},
                callback=self.after_post)]
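The `formdata` dictionary is what ends up URL-encoded in the POST body. A stdlib-only illustration of that encoding, using the same example fields:

```python
from urllib.parse import urlencode

# Same fields as the FormRequest example above
payload = urlencode({'name': 'John Doe', 'age': '27'})
print(payload)  # name=John+Doe&age=27
```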

I'm trying to scrape data from the at the races website but the scraper is not returning any results


Tag : python , By : Search Classroom
Date : March 29 2020, 07:55 AM
This will work. Leave a comment if you need the output to be different.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

url = 'http://www.attheraces.com/racecard/Wolverhampton/6-October-2018/1715'

driver = webdriver.Chrome()
driver.get(url)
driver.implicitly_wait(2)
driver.find_element_by_xpath('//*[@id="racecard-tabs-1061960"]/div[1]/div/div[1]/ul/li[2]/a').click()

WebDriverWait(driver, 5).until(expected_conditions.presence_of_element_located((By.XPATH, '//*[@id="tab-racecard-sectional-times"]/div/div[1]/div[1]/div[2]/div/button')))

# method 1
for horse in driver.find_elements_by_class_name('card-item'):
    horseName = horse.find_element_by_class_name('form-link').text
    times = horse.find_elements_by_class_name('sectionals-time')
    times = [time.text for time in times]
    print('{}: {}'.format(horseName, times))

print()

# method 2
for horse in driver.find_elements_by_class_name('card-item'):
    for time in horse.find_elements_by_class_name('sectionals-time'):
        print(time.text)
    print()

driver.close()

Python + BeautifulSoup: Can't seem to scrape the specific data that I want from a website due to the website's formatting


Tag : python-3.x , By : apple
Date : March 29 2020, 07:55 AM
If I understood you correctly, you need the number that comes after the "Dividend / Dividend Yield" phrase. If so, you can do something like this:
...(your code above)...
share_dividend_yield = page_soup.find("table", {"class": "name-value-pair hide-for-960"})

tds = share_dividend_yield.find_all('td')

for i in tds:
    if 'Dividend' in i.text:
        print(i.find_next('td').text)
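A self-contained illustration of the `find_next('td')` pattern; the HTML below is invented to mimic the label/value table layout, not the actual page:

```python
from bs4 import BeautifulSoup

# Invented HTML mimicking a label/value pair table
html = """
<table class="name-value-pair hide-for-960">
  <tr><td>Dividend / Dividend Yield</td><td>1.52 / 2.3%</td></tr>
  <tr><td>P/E Ratio</td><td>18.4</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
for td in soup.find_all('td'):
    if 'Dividend' in td.text:
        # find_next('td') jumps from the label cell to the value cell
        print(td.find_next('td').text)  # 1.52 / 2.3%
```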

Python Selenium: how to scrape results after clicking each record in a table on website


Tag : python , By : Tink
Date : March 29 2020, 07:55 AM
The website has a table containing multiple elements that can be clicked. Try this below.
all_df = pd.DataFrame()  # <====== created overall df
for i in range(table_pd2.shape[0]):
    driver.find_element_by_link_text(table_pd2[0][i]).click()
    driver.switch_to.window(driver.window_handles[1])

    bs = BeautifulSoup(driver.page_source, 'html.parser')
    table = bs.find_all('table', id='xxx')
    table_pd = pd.read_html(str(table))[0]
    all_df = pd.concat([all_df, table_pd])  # <====== appending to overall df
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
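Worth noting: pandas' `DataFrame.append` returned a new frame rather than modifying in place, and was removed in pandas 2.0; `pd.concat` is the supported way to combine frames. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up stand-ins for the per-record tables scraped in the loop
frames = [pd.DataFrame({'record': [i], 'value': [i * 10]}) for i in range(3)]

# Collect frames in a list, then concatenate once at the end
all_df = pd.concat(frames, ignore_index=True)
print(all_df.shape)  # (3, 2)
```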