Scrape

From VuzeWiki

Introduction

A scrape, also known as a "tracker scrape", is a request that a BitTorrent client sends to a tracker: the client connects, the two exchange information, and the connection is closed. Every BitTorrent client (such as Azureus) scrapes each tracker that hosts a .torrent loaded into the client. The request makes something like a "wipe" or a "pass" over the tracker, and the tracker then sends information back to the client.

Please note that some trackers don't respond to scrape requests, but you will still be able to download the torrent. The response can indicate whether the tracker is OK or offline, the reason it is offline (unknown host exception, hash missing, etc.), the numbers of peers and seeds, and so on.

Every BitTorrent client scrapes the tracker many times during a download to update the swarm information, so a tracker may be scraped many thousands of times for a single torrent, even if the swarm is not very big. The tracker can usually handle this number of requests. However, more requests than strictly necessary can destabilize the tracker and take it offline.

Why Scrape?

While a torrent is incomplete, Azureus scrapes in order to determine whether or not to send an announcement requesting more peers. Sending a list of peers is usually more bandwidth-consuming than sending a scrape result.

When a torrent is complete, Azureus periodically scrapes in order to determine which torrents are the neediest. Without scraping, Azureus would never know which torrents have no seeds and require assistance.

Scrape Intervals

Azureus calculates the interval to scrape based on:

  • min_request_interval flag sent by the tracker
  • number of clients known to be seeding

Azureus will never knowingly scrape a tracker more than once every 15 minutes. (It could unknowingly scrape more often if the user restarts Azureus more than once within 15 minutes.) If min_request_interval is specified, Azureus won't scrape again for the torrents returned until that many seconds have passed.

Azureus will scrape at least once every 3 hours. Even on torrents with a large number of clients, the dynamics can change dramatically in 3 hours. For example, a new torrent may have thousands of clients, but within 3 hours all of those clients could finish downloading and leave, leaving the torrent with a poor seed:peer ratio.
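
The rules above can be sketched as a small clamping function. This is only an illustration, not Azureus's actual code: the function name, the linear seconds_per_seed scaling, and the choice to let a tracker-supplied min_request_interval override the 3-hour ceiling are all assumptions, since the text does not give the exact formula.

```python
MIN_SCRAPE_INTERVAL = 15 * 60      # never knowingly scrape more often than this
MAX_SCRAPE_INTERVAL = 3 * 60 * 60  # otherwise scrape at least this often


def next_scrape_delay(seeds, min_request_interval=None, seconds_per_seed=10):
    """Return the number of seconds to wait before the next scrape.

    seconds_per_seed is a hypothetical scaling factor: the more clients are
    known to be seeding, the longer the wait, up to the 3-hour ceiling.
    """
    delay = MIN_SCRAPE_INTERVAL + seeds * seconds_per_seed
    # Keep the computed delay between 15 minutes and 3 hours.
    delay = min(max(delay, MIN_SCRAPE_INTERVAL), MAX_SCRAPE_INTERVAL)
    if min_request_interval is not None:
        # Assumption: the tracker's request wins, even beyond 3 hours.
        delay = max(delay, min_request_interval)
    return delay
```

For example, next_scrape_delay(0) gives the 15-minute floor, while a very well seeded torrent is capped at 3 hours unless the tracker asks for a longer wait.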

Tracker Scrape Convention

Aside from the convention described at the BitTorrent Specification Wiki, Azureus supports additional unofficial specs. These specs are used on a variety of trackers and are considered de facto standards.

Scraping

Scraping is the act of sending a scrape request to the tracker.

Multi-Hash Requests

The scrape convention specifies that passing an info_hash as a URL parameter limits the scrape results to information about that hash. When Azureus has multiple torrents from a single tracker, it sends multiple info_hash parameters in its URL query string, hoping that the tracker will send back information on each hash.
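
A minimal sketch of how such a query string could be assembled is shown here. The function name build_scrape_url is hypothetical, and the sketch assumes each info_hash is the raw 20-byte hash, percent-encoded with the standard library's urllib.parse.quote.

```python
from urllib.parse import quote


def build_scrape_url(scrape_url, info_hashes):
    """Append one info_hash parameter per 20-byte hash to the scrape URL."""
    params = []
    for info_hash in info_hashes:
        if len(info_hash) != 20:
            raise ValueError("expected a 20-byte info_hash")
        # Percent-encode the raw bytes; safe="" so that '/' is escaped too.
        params.append("info_hash=" + quote(info_hash, safe=""))
    # Respect a query string that is already present on the scrape URL.
    separator = "&" if "?" in scrape_url else "?"
    return scrape_url + separator + "&".join(params)
```

A tracker that does not support multi-hash requests may answer for only one of the hashes, so a client should not assume that every requested hash appears in the reply.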

Scrape Results

Azureus supports the following dictionaries and keys.

files
  infohash
    complete
    incomplete
    downloaded
    name
failure reason
flags
  min_request_interval

Non-official keys are optional and are explained below.

failure reason

Similar to the failure reason returned in an announce result. The value is a human-readable string explaining why the scrape failed.

min_request_interval

The value for this key is an integer specifying the minimum number of seconds for Azureus to wait before scraping the tracker again.

Please note that flags is a key of the main dictionary, and its value is itself a dictionary. Inside that dictionary is the key/value pair for min_request_interval.
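
The nesting described above can be illustrated with a short sketch that reads an already decoded scrape result. The function name summarize_scrape and the shape of its return value are hypothetical, and the sketch assumes the bencode decoder produced Python dictionaries with bytes keys.

```python
def summarize_scrape(result):
    """Summarize a scrape result that has already been bencode-decoded.

    Assumes dictionary keys and string values are bytes.
    """
    if b"failure reason" in result:
        # The tracker reported why the scrape failed.
        raise RuntimeError(result[b"failure reason"].decode("utf-8", "replace"))

    torrents = {}
    for info_hash, stats in result.get(b"files", {}).items():
        torrents[info_hash] = {
            "seeds": stats.get(b"complete", 0),
            "peers": stats.get(b"incomplete", 0),
            "downloaded": stats.get(b"downloaded"),
            "name": stats.get(b"name"),
        }

    # flags is a key of the main dictionary; min_request_interval lives inside it.
    flags = result.get(b"flags", {})
    return torrents, flags.get(b"min_request_interval")
```

The second return value is None when the tracker did not send min_request_interval, in which case the client falls back to its own interval calculation.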

Examples

Note: Line breaks in the requests and results below are not part of the request or result. They exist only to help with formatting in the wiki.

Single Request

Request:

http://tracker/scrape?info_hash=xxxxxxxxxxxxxxxxxxxx

Reply:

d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e
4:name12:xxxxxxxxxxxxee5:flagsd20:min_request_intervali3600eee

This tells us that the torrent with hash 'xxxxxxxxxxxxxxxxxxxx' has 2 seeders and 4 leechers. The torrent has been downloaded 0 times, and its name is xxxxxxxxxxxx. Another scrape will not occur for at least 3600 seconds (60 minutes).
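
To see how those values come out of the reply, here is a minimal bencode decoder sketch applied to the single-request reply. The function bdecode is written for this illustration and is not taken from Azureus; it keeps keys and strings as bytes and does only basic error checking.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from data at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % pos)


# The reply from the single-request example, joined into one byte string.
info_hash = b"x" * 20
reply = (
    b"d5:filesd20:" + info_hash
    + b"d8:completei2e10:downloadedi0e10:incompletei4e"
    b"4:name12:" + b"x" * 12
    + b"ee5:flagsd20:min_request_intervali3600eee"
)

decoded, end = bdecode(reply)
stats = decoded[b"files"][info_hash]
print(stats[b"complete"], stats[b"incomplete"], stats[b"downloaded"])  # 2 4 0
print(decoded[b"flags"][b"min_request_interval"])                      # 3600
```

The same function decodes the multi-request reply; in that case the files dictionary simply holds one entry per returned hash.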

Multi Request

Request:

http://tracker/scrape?info_hash=xxxxxxxxxxxxxxxxxxxx&info_hash=
yyyyyyyyyyyyyyyyyyyy&info_hash=zzzzzzzzzzzzzzzzzzzz

Reply:

d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei19e10:downloadedi23896e
10:incompletei21e4:name6:Name Xe20:yyyyyyyyyyyyyyyyyyyyd8:completei23e
10:downloadedi24026e10:incompletei21e4:name6:Name Ye20:zzzzzzzzzzzzzzzzzzzz
d8:completei15e10:downloadedi26171e10:incompletei23e4:name6:Name Zee5:flags
d20:min_request_intervali18000eee