Azureus messaging protocol
Azureus Extended Messaging Protocol
At some point in the future, this protocol may be superseded by an Enhanced messaging protocol - but currently, this is the main extended messaging protocol that Azureus supports.
Note: If you choose to support this protocol as well as the LibTorrent extension protocol, it is important that you follow the rules described here. See also the article about other protocol extensions.
Since version 18.104.22.168, Azureus sets a bit in a reserved part of the BT handshake to indicate that it is capable of advanced messaging. If both peers in a connection claim to support it, an additional messaging handshake is performed to negotiate the mutually available messages and to announce the exact client name/version and the ports through which the peers are reachable (which simplifies the peer exchange procedure and reconnecting). Besides the normal BT messages (choke, piece, interested ...), Azureus can send arbitrary information through additional messages, e.g. the ChatMessages from the ChatPlugin or Peer Exchange (PEX) messages.
In the future other messages will be added either directly in the core or through Plugins to provide additional features.
This bit is a work-in-progress!
To indicate that it supports the Azureus Messaging Protocol, a client should set the first (most significant) bit in the first byte of the reserved bytes section of the BT handshake.
If the other client indicates that it also supports the Azureus Messaging Protocol, then both clients should begin communicating over the Azureus Messaging Protocol. Each client should then send an AZ-handshake message to the other to indicate which messages it supports.
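The capability check described above can be sketched as follows. This is a minimal illustration, not Azureus code: the function names are hypothetical, and it assumes the 8-byte reserved field of the standard BitTorrent handshake.

```python
# Assumption: the reserved section is the 8-byte field of the standard
# BitTorrent handshake. All names below are hypothetical.
AZ_MESSAGING_FLAG = 0x80  # most significant bit of the first reserved byte


def set_az_messaging(reserved: bytes) -> bytes:
    """Return a copy of the reserved bytes with the AZ messaging bit set."""
    if len(reserved) != 8:
        raise ValueError("reserved section must be 8 bytes")
    return bytes([reserved[0] | AZ_MESSAGING_FLAG]) + reserved[1:]


def supports_az_messaging(reserved: bytes) -> bool:
    """Check whether a peer's reserved bytes advertise AZ messaging."""
    return len(reserved) == 8 and bool(reserved[0] & AZ_MESSAGING_FLAG)


def should_use_az_messaging(local_reserved: bytes, remote_reserved: bytes) -> bool:
    """Switch to the AZ protocol only if both sides advertise support."""
    return (supports_az_messaging(local_reserved)
            and supports_az_messaging(remote_reserved))
```

Other bits in the reserved section are left untouched, so the flag can coexist with flags set by other extensions.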
Each message in the Azureus protocol is of this format:
- [4 bytes] -> Signed integer indicating how big the message is (in number of bytes). This value does not include the 4 bytes indicating the size of the message.
- [4 bytes] -> Signed integer indicating how big the message type name is.
- [x bytes] -> The message type name - usually a string encoded in ASCII (what should this be?) - the size of which is indicated by the previous part of the header.
- [1 byte] -> The message type version (usually starts from 0x01, increments as the specification for the message type changes). Note: This indicates the version of the message type itself, not of the Azureus messaging protocol (which does not contain version information and cannot be revised). If the padding flag (the lowest of the four flag bits, i.e. the bit with value 0x10) is set, then this indicates that there is some message padding to follow.
So, if the message type version is 1, and there is no padding, the value will be 0000 0001. If the message type version is 1, but padding is enabled, this will be 0001 0001.
The first 4 bits (the high nibble) are currently reserved for flags; the last 4 bits (the low nibble) hold the version number.
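As an illustration of the byte layout above, here is a small sketch that packs and unpacks the version byte. The helper names are hypothetical, and the value of the padding flag (0x10) is inferred from the 0001 0001 example rather than taken from the Azureus source.

```python
from typing import Tuple

# Assumption: padding flag value inferred from the "0001 0001" example.
PADDING_FLAG = 0x10
VERSION_MASK = 0x0F  # low nibble carries the message type version


def pack_version_byte(version: int, padded: bool = False) -> int:
    """Combine a 4-bit version number and the padding flag into one byte."""
    if not 0 <= version <= VERSION_MASK:
        raise ValueError("version must fit in 4 bits")
    return (PADDING_FLAG if padded else 0) | version


def unpack_version_byte(value: int) -> Tuple[int, bool]:
    """Split the byte into (version, padding_enabled)."""
    return value & VERSION_MASK, bool(value & PADDING_FLAG)
```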
Message header padding is usually enabled if the transport stream is encrypted.
If padding is enabled, the following two sections are present in the header, before the message payload:
- [2 bytes] -> Signed short integer indicating the length of the padding section.
- [x bytes] -> Padding section - these bytes can be ignored.
- [y bytes] -> Message payload - the size of which is the number of bytes in the message, minus the number of bytes used in the header.
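The framing described above might be encoded and decoded roughly as shown here. This is a sketch under stated assumptions: integers are taken to be big-endian (as in the standard BitTorrent wire format; this page does not say), the type name is taken to be ASCII, and the function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

PADDING_FLAG = 0x10  # assumption: inferred from the "0001 0001" example


def encode_message(type_name: str, version: int, payload: bytes,
                   padding: Optional[bytes] = None) -> bytes:
    """Frame a payload; pass `padding` to enable the padding sections."""
    name = type_name.encode("ascii")  # assumption: ASCII type names
    body = struct.pack(">i", len(name)) + name
    if padding is None:
        body += bytes([version & 0x0F])
    else:
        body += bytes([(version & 0x0F) | PADDING_FLAG])
        body += struct.pack(">h", len(padding)) + padding
    body += payload
    # The leading length excludes its own 4 bytes.
    return struct.pack(">i", len(body)) + body


def decode_message(data: bytes) -> Tuple[str, int, bytes]:
    """Parse one complete message into (type_name, version, payload)."""
    (total,) = struct.unpack_from(">i", data, 0)
    if total < 0 or len(data) < 4 + total:
        raise ValueError("truncated or invalid message")
    body = data[4:4 + total]
    (name_len,) = struct.unpack_from(">i", body, 0)
    if name_len < 0 or 4 + name_len + 1 > len(body):
        raise ValueError("invalid type name length")
    pos = 4
    name = body[pos:pos + name_len].decode("ascii")
    pos += name_len
    version_byte = body[pos]
    pos += 1
    if version_byte & PADDING_FLAG:
        (pad_len,) = struct.unpack_from(">h", body, pos)
        if pad_len < 0:
            raise ValueError("invalid padding length")
        pos += 2 + pad_len  # padding bytes are skipped
    return name, version_byte & 0x0F, body[pos:]
```

The payload length is never transmitted on its own; the decoder derives it as whatever remains of the message after the header sections have been consumed.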
For the handshake, the data must be a bencoded dictionary of values. Note: I need confirmation that the encoding is UTF-8, since I don't believe Azureus currently demands it
Required dictionary values:
- "identity": 20 bytes representing the peer ID.
- "client": A UTF-8 encoded string representing the name of the client application (e.g. "Azureus", not "Azureus 22.214.171.124")
- "version": A UTF-8 encoded string representing the version of the client application (e.g. "126.96.36.199")
- "messages": An unordered sequence of mappings indicating what message types are supported. The structure of each mapping is as follows:
  - "id": A UTF-8 encoded string representing the name of the message type.
  - "ver": A one byte character representing the version of the message type supported.
Optional dictionary values:
- "tcp_port": An integer indicating the incoming TCP port for data transmission.
- "udp_port": An integer indicating the incoming UDP port for data transmission.
- "udp2_port": An integer indicating the incoming UDP port for non-data transmission (can someone clarify what this is?)
- "handshake_type": An integer indicating what type of handshake is used - 0 (default) means plain communication, 1 means encrypted (this needs clarifying)
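To make the dictionary layout concrete, here is a sketch that builds a handshake payload with a minimal bencoder. The function names, the peer ID, the version string "1.0" and the message name "EXAMPLE_MSG" are all placeholders invented for illustration; only the dictionary keys come from the lists above.

```python
def bencode(value) -> bytes:
    """Minimal bencoder for ints, strings, bytes, lists and dicts."""
    if isinstance(value, bool):
        raise TypeError("booleans are not bencodable")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])  # bencoded dict keys are sorted
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_handshake(peer_id: bytes, client: str, version: str, messages,
                    tcp_port=None, udp_port=None) -> bytes:
    """Build the handshake payload; `messages` is a list of (name, version)."""
    if len(peer_id) != 20:
        raise ValueError("peer ID must be 20 bytes")
    handshake = {
        "identity": peer_id,
        "client": client,
        "version": version,
        # "ver" is a one-byte string, per the description above.
        "messages": [{"id": name, "ver": bytes([ver])} for name, ver in messages],
    }
    if tcp_port is not None:
        handshake["tcp_port"] = tcp_port
    if udp_port is not None:
        handshake["udp_port"] = udp_port
    return bencode(handshake)


# Placeholder values, for illustration only.
payload = build_handshake(b"A" * 20, "Azureus", "1.0",
                          [("EXAMPLE_MSG", 1)], tcp_port=6881)
```

The resulting bytes would form the payload of the handshake message; the optional keys are simply omitted when no value is supplied.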
This section just gives a brief indication of what messages will commonly be transmitted over the protocol - it is not intended to describe what data any message type contains, or how it is encoded.
Since a client which uses the Azureus messaging protocol will no longer communicate via the standard BitTorrent protocol, it is necessary to map standard BitTorrent messages to equivalent Azureus messaging protocol message types. Here are the current message types used (all encoded to UTF-8 strings):
The last one will not be transmitted over the messaging protocol in reality (since the handshake would have occurred before switching over to the Azureus messaging protocol), but it is defined for completeness.
The following messages are defined by Azureus. Apart from the handshake, a client is not required to support these messages in order to talk over the Azureus messaging protocol.
This section will contain other known messages that are sent over the protocol (this does not necessarily mean they are supported by the Azureus client itself).
- The chat plugin
These things need some clarification.
- What should the specified string encoding be that we currently use for message names and so on?
- Is it allowed for an AZ_HANDSHAKE to be submitted again? I think the current implementation allows it, but it may not behave exactly as expected, especially if you are indicating you are dropping support for a particular message type.
- Do the internal Azureus messages use 2 in the version number to indicate they can be padded? I'm not sure I understand this - surely the ability to pad messages should be something which is protocol specific, rather than message specific.