Welcome to Scrapy Inline Requests’s documentation!

Contents:

Scrapy Inline Requests


A decorator for writing coroutine-like spider callbacks.

Quickstart

The spider below shows a simple use case of scraping a page and following a few links:

from inline_requests import inline_requests
from scrapy import Spider, Request

class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_url = response.urljoin('?page=%d' % i)
            try:
                # The response for this request is sent back into the
                # generator, instead of going to a separate callback.
                next_resp = yield Request(next_url)
                urls.append(next_resp.url)
            except Exception:
                self.logger.info("Failed request %s", i, exc_info=True)

        yield {'urls': urls}

See the examples/ directory for a more complex spider.
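Assuming the spider above is saved as myspider.py (the filename is illustrative), it can be run directly with Scrapy's runspider command, optionally exporting the scraped items to a file:

$ scrapy runspider myspider.py -o urls.json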

Warning

The generator resumes its execution when a request’s response is processed; this means the generator won’t be resumed after yielding an item or a request with its own callback.
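For example, in this minimal sketch (parse_other is a hypothetical callback method), the second request bypasses the generator wrapper:

@inline_requests
def parse(self, response):
    # Resumed here: the request has neither callback nor errback, so
    # its response is sent back into this generator.
    next_resp = yield Request(response.urljoin('?next'))

    # Not resumed here: the request carries its own callback, so its
    # response is delivered to parse_other rather than sent back.
    yield Request(next_resp.url, callback=self.parse_other)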

Known Issues

  • Middlewares can drop or ignore non-200 status responses, preventing the callback from continuing its execution. This can be worked around with the handle_httpstatus_all flag (see the sketch after this list and the httperror middleware documentation).
  • High concurrency and large responses can cause higher memory usage.
  • This decorator assumes your method has the signature (self, response).
  • Wrapped requests may not be serializable by persistent backends.
  • Unless you know what you are doing, the decorated method must be a spider method and must return a generator instance.
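A minimal sketch of the workaround for the first issue, setting the flag in the request's meta (the URL is illustrative):

@inline_requests
def parse(self, response):
    # handle_httpstatus_all keeps non-200 responses from being dropped
    # by the httperror middleware, so they come back to this generator.
    resp = yield Request(response.urljoin('/status/404'),
                         meta={'handle_httpstatus_all': True})
    yield {'status': resp.status}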

Installation

Stable release

To install Scrapy Inline Requests, run this command in your terminal:

$ pip install scrapy-inline-requests

If you don’t have pip installed, the Python installation guide can walk you through the process.

From sources

The sources for Scrapy Inline Requests can be downloaded from the Github repo.

You can either clone the public repository:

$ git clone git://github.com/rolando/scrapy-inline-requests

Or download the tarball:

$ curl -OL https://github.com/rolando/scrapy-inline-requests/tarball/master

Once you have a copy of the source, you can install it with:

$ pip install -e .

Reference

inline_requests.inline_requests(method_or_func)

A decorator to use coroutine-like spider callbacks.

Example:

from inline_requests import inline_requests
from scrapy import Spider, Request

class MySpider(Spider):

    @inline_requests
    def parse(self, response):
        next_url = response.urljoin('?next')
        try:
            next_resp = yield Request(next_url)
        except Exception:
            self.logger.exception("An error occurred.")
            return
        else:
            yield {"next_url": next_resp.url}

You must conform to the following conventions:

  • The decorated method must be a spider method.
  • The decorated method must use the yield keyword or return a generator.
  • The decorated method must accept response as the first argument.
  • The decorated method should yield Request objects with neither callback nor errback set.

If your requests don’t come back to the generator, try setting the flag to handle all HTTP statuses:

request.meta['handle_httpstatus_all'] = True

History

0.3.1 (2016-07-04)

  • Added a deprecation warning about decorating non-spider functions.
  • Warn if the callback returns requests with callback or errback set. This reverts the compatibility with requests that have callbacks.

0.3.0 (2016-06-24)

  • Backward incompatible change (no longer applies): added more restrictions to the request object (no callback/errback).
  • Cleanup callback/errback attributes before sending back the request to the generator. This fixes an edge case when using request.replace().
  • Simplified example spider.

0.2.0 (2016-06-23)

  • Python 3 support.

0.1.2 (2016-05-22)

  • Scrapy API and documentation updates.

0.1.1 (2013-02-03)

  • Minor tweaks and fixes.

0.1.0 (2012-02-03)

  • First release on PyPI.
