logo
down
shadow

Scrapy with dynamic captcha


Scrapy with dynamic captcha

Content Index :

Scrapy with dynamic captcha
Tag : python , By : dantino
Date : November 27 2020, 04:01 AM

I wish this help you Here's a complete solution to bypass the specified captcha using anticaptcha and PIL.
Due to the dynamic of this captcha, we need to grab a print screen of the img element containing the captcha. For that we use save_screenshot() and PIL to crop and save from PIL import Image from python_anticaptcha import AnticaptchaClient, ImageToTextTask from selenium import webdriver def get_captcha(): captcha_fn = "captcha.png" element = driver.find_element_by_name("imagen") # element name containing the catcha image location = element.location size = element.size driver.save_screenshot("temp.png") x = location['x'] y = location['y'] w = size['width'] h = size['height'] width = x + w height = y + h im = Image.open('temp.png') im = im.crop((int(x), int(y), int(width), int(height))) im.save(captcha_fn) # request anti-captcha service to decode the captcha api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXX' # api key -> https://anti-captcha.com/ captcha_fp = open(captcha_fn, 'rb') client = AnticaptchaClient(api_key) task = ImageToTextTask(captcha_fp) job = client.createTask(task) job.join() return job.get_captcha_text() start_url = "YOU KNOW THE URL" driver = webdriver.Chrome() driver.get(start_url) captcha = get_captcha() print( captcha )
ifds

Comments
No Comments Right Now !

Boards Message :
You Must Login Or Sign Up to Add Your Comments .

Share : facebook icon twitter icon

How do I set up Scrapy to deal with a captcha


Tag : python , By : user185144
Date : March 29 2020, 07:55 AM
will help you It's a really deep topic with a bunch of solutions. But if you want to apply the logic you've defined in your post you can use scrapy Downloader Middlewares.
Something like:
class CaptchaMiddleware(object):
    max_retries = 5
    def process_response(request, response, spider):
        if not request.meta.get('solve_captcha', False):
            return response  # only solve requests that are marked with meta key
        catpcha = find_catpcha(response)
        if not captcha:  # it might not have captcha at all!
            return response
        solved = solve_captcha(captcha)
        if solved:
            response.meta['catpcha'] = captcha
            response.meta['solved_catpcha'] = solved
            return response
        else:
            # retry page for new captcha
            # prevent endless loop
            if request.meta.get('catpcha_retries', 0) == 5:
                logging.warning('max retries for captcha reached for {}'.format(request.url))
                raise IgnoreRequest 
            request.meta['dont_filter'] = True
            request.meta['captcha_retries'] = request.meta.get('captcha_retries', 0) + 1
            return request
class MySpider(scrapy.Spider):
    def parse(self, response):
        url = ''# url that requires captcha
        yield Request(url, callback=self.parse_captchad, meta={'solve_captcha': True},
                      errback=self.parse_fail)

    def parse_captchad(self, response):
        solved = response['solved']
        # do stuff

    def parse_fail(self, response):
        # failed to retrieve captcha in 5 tries :(
        # do stuff

login from scrapy when website has captcha


Tag : python , By : billputer
Date : March 29 2020, 07:55 AM
hop of those help? You can try first through Optical Character Recognition (OCR) and then with a CAPTCHA solving API. See the chapter 7 of this book: https://www.packtpub.com/big-data-and-business-intelligence/web-scraping-python
There are also on-line services to solve captcha. For example: https://anti-captcha.com/

Scrapy, login with captcha failed


Tag : python , By : jay
Date : March 29 2020, 07:55 AM
This might help you ok, the problem here is that the captcha image needs to receive the cookies from the actual response, and you are using urllib2 to make the captcha request, so Scrapy isn't handling that by default.
Use a scrapy request to check the captcha, something like:
def parse(self, response):
    yield Request(url="http://tinyz.us/securimage/securimage_show.php", callback=self.parse_captcha, meta={'previous_response': response})

def parse_captcha(self, response):
    with open('captcha.png', 'wb') as f:
        f.write(response.body)

    captcha = raw_input("-----> Enter the captcha in manually :")

    return FormRequest.from_response(
        response=response.meta['previous_response'],
        formdata={"login_user": "myusername",
                  "login_password": "mypass",
                  "captcha_code": captcha},
        formxpath="//*[@id='login-form']",
        callback=self.after_login)

Scrapy - simple captcha solving example


Tag : python , By : user179938
Date : March 29 2020, 07:55 AM
Hope that helps Solving the captcha itself is easy using Pillow and Python Tesseract. The hard part was to realize how to handle cookies (PHPSESSID). Here's complete working example for your case (using Python 2):
# -*- coding: utf-8 -*-                                                         
import io                                                                       
import urllib2                                                                  

from PIL import Image                                                           
import pytesseract                                                              
import scrapy                                                                   


class CaptchaSpider(scrapy.Spider):                                             
    name = 'captcha'                                                            

    def start_requests(self):                                                   
        yield scrapy.Request('http://145.100.108.148/login3/',                  
                             cookies={'PHPSESSID': 'xyz'})                      

    def parse(self, response):                                                  
        img_url = response.urljoin(response.xpath('//img/@src').extract_first())

        url_opener = urllib2.build_opener()                                     
        url_opener.addheaders.append(('Cookie', 'PHPSESSID=xyz'))               
        img_bytes = url_opener.open(img_url).read()                             
        img = Image.open(io.BytesIO(img_bytes))                                 

        captcha = pytesseract.image_to_string(img)                              
        print 'Captcha solved:', captcha                                        

        return scrapy.FormRequest.from_response(                                
            response, formdata={'captcha': captcha},                            
            callback=self.after_captcha)                                        

    def after_captcha(self, response):                                          
        print 'Result:', response.body

Trying to response Amazon's Captcha with scrapy, strange behavior on spider generator


Tag : python , By : Anand
Date : March 29 2020, 07:55 AM
seems to work fine Currently, your form request object is never returned to Scrapy for handling.
Replace self.solve_captcha(response, self.parse) by yield from self.solve_captcha(response, self.parse).
Related Posts Related QUESTIONS :
  • Join original np array with resulting np array in a form of dictionary? multidimensional array? etc?
  • Forcing labels on histograms in each individual graph in a figure
  • For an infinite dataset, is the data used in each epoch the same?
  • Is there a more efficent way to extend a string?
  • How to calculate each single element of a numpy array based on conditions
  • How do I change the width of Jupyter notebook's cell's left part?
  • Measure distance between lat/lon coordinates and utm coordinates
  • Installing megam for NLTK on Windows
  • filter dataframe on each value of a samn column have a specific value of another column in Panda\Python
  • Threading with pubsub throwing AssertionError: 'callableObj is not callable' in wxPython
  • Get grouped data from 2 dataframes with condition
  • How can I import all of sklearns regressors
  • How to take all elements except the first k
  • Whats wrong with my iteration list of lists from csv
  • Tensorflow Estimator API save image summary in eval mode
  • How to Pack with PyQt - how to make QFrame/Layout adapt to content
  • How do I get certain Time Range in Python
  • python doubly linked list - insertAfter node
  • Open .h5 file in Python
  • Joining a directory name with a binary file name
  • python, sort list with two arguments in compare function
  • Is it possible to print from Python using non-ANSI colors?
  • Pandas concat historical data using date minus some number of days
  • CV2: Import Error in Python OpenCV
  • Is it possible to do this loop in a one-liner?
  • invalid literal for int() with base 10: - django
  • Why does my code print a value that I have not assigned as yet?
  • the collatz func in automate boring stuff with python
  • How to find all possible combinations of parameters and funtions
  • about backpropagation deep neural network in tensorflow
  • Sort strings in pandas
  • How do access my flask app hosted in docker?
  • Replace the sentence include some text with Python regex
  • Counting the most common element in a 2D List in Python
  • logout a user from the system using a function in python
  • mp4 metadata not found but exists
  • Django: QuerySet with ExpressionWrapper
  • Pandas string search in list of dicts
  • Decryption from RSA encrypted string from sqlite is not the same
  • need of maximum value in int
  • a list of several tuples, how to extract the same of the first two elements in the small tuple in the large tuple
  • Display image of 2D Sinewaves in 3D
  • how to prevent a for loop from overwriting a dictionary?
  • How To Fix: RuntimeError: size mismatch in pyTorch
  • Concatenating two Pandas DataFrames while maintaining index order
  • Why does this not run into an infinite loop?
  • Python Multithreading no current event loop
  • Element Tree - Seaching for specific element value without looping
  • Ignore Nulls in pandas map dictionary
  • How do I get scrap data from web pages using beautifulsoup in python
  • Variable used, golobal or local?
  • I have a regex statement to pull all numbers out of a text file, but it only finds 77 out of the 81 numbers in the file
  • How do I create a dataframe of jobs and companies that includes hyperlinks?
  • Detect if user has clicked the 'maximized' button
  • Does flask_login automatically set the "next" argument?
  • Indents in python 3
  • How to create a pool of threads
  • Pandas giving IndexError on one dataframe but not on another similar dataframe
  • Django Rest Framework - Testing client.login doesn't login user, ret anonymous user
  • Running dag without dag file in airflow
  • shadow
    Privacy Policy - Terms - Contact Us © scrbit.com