123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663 |
- ======================
- RxRPC NETWORK PROTOCOL
- ======================
- The RxRPC protocol driver provides a reliable two-phase transport on top of UDP
- that can be used to perform RxRPC remote operations. This is done over sockets
- of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and
- receive data, aborts and errors.
- Contents of this document:
- (*) Overview.
- (*) RxRPC protocol summary.
- (*) AF_RXRPC driver model.
- (*) Control messages.
- (*) Socket options.
- (*) Security.
- (*) Example client usage.
- (*) Example server usage.
- ========
- OVERVIEW
- ========
- RxRPC is a two-layer protocol. There is a session layer which provides
- reliable virtual connections using UDP over IPv4 (or IPv6) as the transport
- layer, but implements a real network protocol; and there's the presentation
- layer which renders structured data to binary blobs and back again using XDR
- (as does SunRPC):
- +-------------+
- | Application |
- +-------------+
- | XDR | Presentation
- +-------------+
- | RxRPC | Session
- +-------------+
- | UDP | Transport
- +-------------+
- AF_RXRPC provides:
- (1) Part of an RxRPC facility for both kernel and userspace applications by
- making the session part of it a Linux network protocol (AF_RXRPC).
- (2) A two-phase protocol. The client transmits a blob (the request) and then
- receives a blob (the reply), and the server receives the request and then
- transmits the reply.
- (3) Retention of the reusable bits of the transport system set up for one call
- to speed up subsequent calls.
- (4) A secure protocol, using the Linux kernel's key retention facility to
- manage security on the client end. The server end must of necessity be
- more active in security negotiations.
- AF_RXRPC does not provide XDR marshalling/presentation facilities. That is
- left to the application. AF_RXRPC only deals in blobs. Even the operation ID
- is just the first four bytes of the request blob, and as such is beyond the
- kernel's interest.
- Sockets of AF_RXRPC family are:
- (1) created as type SOCK_DGRAM;
- (2) provided with a protocol of the type of underlying transport they're going
- to use - currently only PF_INET is supported.
- The Andrew File System (AFS) is an example of an application that uses this and
- that has both kernel (filesystem) and userspace (utility) components.
- ======================
- RXRPC PROTOCOL SUMMARY
- ======================
- An overview of the RxRPC protocol:
- (*) RxRPC sits on top of another networking protocol (UDP is the only option
- currently), and uses this to provide network transport. UDP ports, for
- example, provide transport endpoints.
- (*) RxRPC supports multiple virtual "connections" from any given transport
- endpoint, thus allowing the endpoints to be shared, even to the same
- remote endpoint.
- (*) Each connection goes to a particular "service". A connection may not go
- to multiple services. A service may be considered the RxRPC equivalent of
- a port number. AF_RXRPC permits multiple services to share an endpoint.
- (*) Client-originating packets are marked, thus a transport endpoint can be
- shared between client and server connections (connections have a
- direction).
- (*) Up to a billion connections may be supported concurrently between one
- local transport endpoint and one service on one remote endpoint. An RxRPC
- connection is described by seven numbers:
- Local address }
- Local port } Transport (UDP) address
- Remote address }
- Remote port }
- Direction
- Connection ID
- Service ID
- (*) Each RxRPC operation is a "call". A connection may make up to four
- billion calls, but only up to four calls may be in progress on a
- connection at any one time.
- (*) Calls are two-phase and asymmetric: the client sends its request data,
- which the service receives; then the service transmits the reply data
- which the client receives.
- (*) The data blobs are of indefinite size, the end of a phase is marked with a
- flag in the packet. The number of packets of data making up one blob may
- not exceed 4 billion, however, as this would cause the sequence number to
- wrap.
- (*) The first four bytes of the request data are the service operation ID.
- (*) Security is negotiated on a per-connection basis. The connection is
- initiated by the first data packet on it arriving. If security is
- requested, the server then issues a "challenge" and then the client
- replies with a "response". If the response is successful, the security is
- set for the lifetime of that connection, and all subsequent calls made
- upon it use that same security. In the event that the server lets a
- connection lapse before the client, the security will be renegotiated if
- the client uses the connection again.
- (*) Calls use ACK packets to handle reliability. Data packets are also
- explicitly sequenced per call.
- (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs.
- A hard-ACK indicates to the far side that all the data received to a point
- has been received and processed; a soft-ACK indicates that the data has
- been received but may yet be discarded and re-requested. The sender may
- not discard any transmittable packets until they've been hard-ACK'd.
- (*) Reception of a reply data packet implicitly hard-ACK's all the data
- packets that make up the request.
- (*) An call is complete when the request has been sent, the reply has been
- received and the final hard-ACK on the last packet of the reply has
- reached the server.
- (*) An call may be aborted by either end at any time up to its completion.
- =====================
- AF_RXRPC DRIVER MODEL
- =====================
- About the AF_RXRPC driver:
- (*) The AF_RXRPC protocol transparently uses internal sockets of the transport
- protocol to represent transport endpoints.
- (*) AF_RXRPC sockets map onto RxRPC connection bundles. Actual RxRPC
- connections are handled transparently. One client socket may be used to
- make multiple simultaneous calls to the same service. One server socket
- may handle calls from many clients.
- (*) Additional parallel client connections will be initiated to support extra
- concurrent calls, up to a tunable limit.
- (*) Each connection is retained for a certain amount of time [tunable] after
- the last call currently using it has completed in case a new call is made
- that could reuse it.
- (*) Each internal UDP socket is retained [tunable] for a certain amount of
- time [tunable] after the last connection using it discarded, in case a new
- connection is made that could use it.
- (*) A client-side connection is only shared between calls if they have have
- the same key struct describing their security (and assuming the calls
- would otherwise share the connection). Non-secured calls would also be
- able to share connections with each other.
- (*) A server-side connection is shared if the client says it is.
- (*) ACK'ing is handled by the protocol driver automatically, including ping
- replying.
- (*) SO_KEEPALIVE automatically pings the other side to keep the connection
- alive [TODO].
- (*) If an ICMP error is received, all calls affected by that error will be
- aborted with an appropriate network error passed through recvmsg().
- Interaction with the user of the RxRPC socket:
- (*) A socket is made into a server socket by binding an address with a
- non-zero service ID.
- (*) In the client, sending a request is achieved with one or more sendmsgs,
- followed by the reply being received with one or more recvmsgs.
- (*) The first sendmsg for a request to be sent from a client contains a tag to
- be used in all other sendmsgs or recvmsgs associated with that call. The
- tag is carried in the control data.
- (*) connect() is used to supply a default destination address for a client
- socket. This may be overridden by supplying an alternate address to the
- first sendmsg() of a call (struct msghdr::msg_name).
- (*) If connect() is called on an unbound client, a random local port will
- bound before the operation takes place.
- (*) A server socket may also be used to make client calls. To do this, the
- first sendmsg() of the call must specify the target address. The server's
- transport endpoint is used to send the packets.
- (*) Once the application has received the last message associated with a call,
- the tag is guaranteed not to be seen again, and so it can be used to pin
- client resources. A new call can then be initiated with the same tag
- without fear of interference.
- (*) In the server, a request is received with one or more recvmsgs, then the
- the reply is transmitted with one or more sendmsgs, and then the final ACK
- is received with a last recvmsg.
- (*) When sending data for a call, sendmsg is given MSG_MORE if there's more
- data to come on that call.
- (*) When receiving data for a call, recvmsg flags MSG_MORE if there's more
- data to come for that call.
- (*) When receiving data or messages for a call, MSG_EOR is flagged by recvmsg
- to indicate the terminal message for that call.
- (*) A call may be aborted by adding an abort control message to the control
- data. Issuing an abort terminates the kernel's use of that call's tag.
- Any messages waiting in the receive queue for that call will be discarded.
- (*) Aborts, busy notifications and challenge packets are delivered by recvmsg,
- and control data messages will be set to indicate the context. Receiving
- an abort or a busy message terminates the kernel's use of that call's tag.
- (*) The control data part of the msghdr struct is used for a number of things:
- (*) The tag of the intended or affected call.
- (*) Sending or receiving errors, aborts and busy notifications.
- (*) Notifications of incoming calls.
- (*) Sending debug requests and receiving debug replies [TODO].
- (*) When the kernel has received and set up an incoming call, it sends a
- message to server application to let it know there's a new call awaiting
- its acceptance [recvmsg reports a special control message]. The server
- application then uses sendmsg to assign a tag to the new call. Once that
- is done, the first part of the request data will be delivered by recvmsg.
- (*) The server application has to provide the server socket with a keyring of
- secret keys corresponding to the security types it permits. When a secure
- connection is being set up, the kernel looks up the appropriate secret key
- in the keyring and then sends a challenge packet to the client and
- receives a response packet. The kernel then checks the authorisation of
- the packet and either aborts the connection or sets up the security.
- (*) The name of the key a client will use to secure its communications is
- nominated by a socket option.
- Notes on recvmsg:
- (*) If there's a sequence of data messages belonging to a particular call on
- the receive queue, then recvmsg will keep working through them until:
- (a) it meets the end of that call's received data,
- (b) it meets a non-data message,
- (c) it meets a message belonging to a different call, or
- (d) it fills the user buffer.
- If recvmsg is called in blocking mode, it will keep sleeping, awaiting the
- reception of further data, until one of the above four conditions is met.
- (2) MSG_PEEK operates similarly, but will return immediately if it has put any
- data in the buffer rather than sleeping until it can fill the buffer.
- (3) If a data message is only partially consumed in filling a user buffer,
- then the remainder of that message will be left on the front of the queue
- for the next taker. MSG_TRUNC will never be flagged.
- (4) If there is more data to be had on a call (it hasn't copied the last byte
- of the last data message in that phase yet), then MSG_MORE will be
- flagged.
- ================
- CONTROL MESSAGES
- ================
- AF_RXRPC makes use of control messages in sendmsg() and recvmsg() to multiplex
- calls, to invoke certain actions and to report certain conditions. These are:
- MESSAGE ID SRT DATA MEANING
- ======================= === =========== ===============================
- RXRPC_USER_CALL_ID sr- User ID App's call specifier
- RXRPC_ABORT srt Abort code Abort code to issue/received
- RXRPC_ACK -rt n/a Final ACK received
- RXRPC_NET_ERROR -rt error num Network error on call
- RXRPC_BUSY -rt n/a Call rejected (server busy)
- RXRPC_LOCAL_ERROR -rt error num Local error encountered
- RXRPC_NEW_CALL -r- n/a New call received
- RXRPC_ACCEPT s-- n/a Accept new call
- (SRT = usable in Sendmsg / delivered by Recvmsg / Terminal message)
- (*) RXRPC_USER_CALL_ID
- This is used to indicate the application's call ID. It's an unsigned long
- that the app specifies in the client by attaching it to the first data
- message or in the server by passing it in association with an RXRPC_ACCEPT
- message. recvmsg() passes it in conjunction with all messages except
- those of the RXRPC_NEW_CALL message.
- (*) RXRPC_ABORT
- This is can be used by an application to abort a call by passing it to
- sendmsg, or it can be delivered by recvmsg to indicate a remote abort was
- received. Either way, it must be associated with an RXRPC_USER_CALL_ID to
- specify the call affected. If an abort is being sent, then error EBADSLT
- will be returned if there is no call with that user ID.
- (*) RXRPC_ACK
- This is delivered to a server application to indicate that the final ACK
- of a call was received from the client. It will be associated with an
- RXRPC_USER_CALL_ID to indicate the call that's now complete.
- (*) RXRPC_NET_ERROR
- This is delivered to an application to indicate that an ICMP error message
- was encountered in the process of trying to talk to the peer. An
- errno-class integer value will be included in the control message data
- indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
- affected.
- (*) RXRPC_BUSY
- This is delivered to a client application to indicate that a call was
- rejected by the server due to the server being busy. It will be
- associated with an RXRPC_USER_CALL_ID to indicate the rejected call.
- (*) RXRPC_LOCAL_ERROR
- This is delivered to an application to indicate that a local error was
- encountered and that a call has been aborted because of it. An
- errno-class integer value will be included in the control message data
- indicating the problem, and an RXRPC_USER_CALL_ID will indicate the call
- affected.
- (*) RXRPC_NEW_CALL
- This is delivered to indicate to a server application that a new call has
- arrived and is awaiting acceptance. No user ID is associated with this,
- as a user ID must subsequently be assigned by doing an RXRPC_ACCEPT.
- (*) RXRPC_ACCEPT
- This is used by a server application to attempt to accept a call and
- assign it a user ID. It should be associated with an RXRPC_USER_CALL_ID
- to indicate the user ID to be assigned. If there is no call to be
- accepted (it may have timed out, been aborted, etc.), then sendmsg will
- return error ENODATA. If the user ID is already in use by another call,
- then error EBADSLT will be returned.
- ==============
- SOCKET OPTIONS
- ==============
- AF_RXRPC sockets support a few socket options at the SOL_RXRPC level:
- (*) RXRPC_SECURITY_KEY
- This is used to specify the description of the key to be used. The key is
- extracted from the calling process's keyrings with request_key() and
- should be of "rxrpc" type.
- The optval pointer points to the description string, and optlen indicates
- how long the string is, without the NUL terminator.
- (*) RXRPC_SECURITY_KEYRING
- Similar to above but specifies a keyring of server secret keys to use (key
- type "keyring"). See the "Security" section.
- (*) RXRPC_EXCLUSIVE_CONNECTION
- This is used to request that new connections should be used for each call
- made subsequently on this socket. optval should be NULL and optlen 0.
- (*) RXRPC_MIN_SECURITY_LEVEL
- This is used to specify the minimum security level required for calls on
- this socket. optval must point to an int containing one of the following
- values:
- (a) RXRPC_SECURITY_PLAIN
- Encrypted checksum only.
- (b) RXRPC_SECURITY_AUTH
- Encrypted checksum plus packet padded and first eight bytes of packet
- encrypted - which includes the actual packet length.
- (c) RXRPC_SECURITY_ENCRYPTED
- Encrypted checksum plus entire packet padded and encrypted, including
- actual packet length.
- ========
- SECURITY
- ========
- Currently, only the kerberos 4 equivalent protocol has been implemented
- (security index 2 - rxkad). This requires the rxkad module to be loaded and,
- on the client, tickets of the appropriate type to be obtained from the AFS
- kaserver or the kerberos server and installed as "rxrpc" type keys. This is
- normally done using the klog program. An example simple klog program can be
- found at:
- http://people.redhat.com/~dhowells/rxrpc/klog.c
- The payload provided to add_key() on the client should be of the following
- form:
- struct rxrpc_key_sec2_v1 {
- uint16_t security_index; /* 2 */
- uint16_t ticket_length; /* length of ticket[] */
- uint32_t expiry; /* time at which expires */
- uint8_t kvno; /* key version number */
- uint8_t __pad[3];
- uint8_t session_key[8]; /* DES session key */
- uint8_t ticket[0]; /* the encrypted ticket */
- };
- Where the ticket blob is just appended to the above structure.
- For the server, keys of type "rxrpc_s" must be made available to the server.
- They have a description of "<serviceID>:<securityIndex>" (eg: "52:2" for an
- rxkad key for the AFS VL service). When such a key is created, it should be
- given the server's secret key as the instantiation data (see the example
- below).
- add_key("rxrpc_s", "52:2", secret_key, 8, keyring);
- A keyring is passed to the server socket by naming it in a sockopt. The server
- socket then looks the server secret keys up in this keyring when secure
- incoming connections are made. This can be seen in an example program that can
- be found at:
- http://people.redhat.com/~dhowells/rxrpc/listen.c
- ====================
- EXAMPLE CLIENT USAGE
- ====================
- A client would issue an operation by:
- (1) An RxRPC socket is set up by:
- client = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
- Where the third parameter indicates the protocol family of the transport
- socket used - usually IPv4 but it can also be IPv6 [TODO].
- (2) A local address can optionally be bound:
- struct sockaddr_rxrpc srx = {
- .srx_family = AF_RXRPC,
- .srx_service = 0, /* we're a client */
- .transport_type = SOCK_DGRAM, /* type of transport socket */
- .transport.sin_family = AF_INET,
- .transport.sin_port = htons(7000), /* AFS callback */
- .transport.sin_address = 0, /* all local interfaces */
- };
- bind(client, &srx, sizeof(srx));
- This specifies the local UDP port to be used. If not given, a random
- non-privileged port will be used. A UDP port may be shared between
- several unrelated RxRPC sockets. Security is handled on a basis of
- per-RxRPC virtual connection.
- (3) The security is set:
- const char *key = "AFS:cambridge.redhat.com";
- setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));
- This issues a request_key() to get the key representing the security
- context. The minimum security level can be set:
- unsigned int sec = RXRPC_SECURITY_ENCRYPTED;
- setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
- &sec, sizeof(sec));
- (4) The server to be contacted can then be specified (alternatively this can
- be done through sendmsg):
- struct sockaddr_rxrpc srx = {
- .srx_family = AF_RXRPC,
- .srx_service = VL_SERVICE_ID,
- .transport_type = SOCK_DGRAM, /* type of transport socket */
- .transport.sin_family = AF_INET,
- .transport.sin_port = htons(7005), /* AFS volume manager */
- .transport.sin_address = ...,
- };
- connect(client, &srx, sizeof(srx));
- (5) The request data should then be posted to the server socket using a series
- of sendmsg() calls, each with the following control message attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- MSG_MORE should be set in msghdr::msg_flags on all but the last part of
- the request. Multiple requests may be made simultaneously.
- If a call is intended to go to a destination other then the default
- specified through connect(), then msghdr::msg_name should be set on the
- first request message of that call.
- (6) The reply data will then be posted to the server socket for recvmsg() to
- pick up. MSG_MORE will be flagged by recvmsg() if there's more reply data
- for a particular call to be read. MSG_EOR will be set on the terminal
- read for a call.
- All data will be delivered with the following control message attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- If an abort or error occurred, this will be returned in the control data
- buffer instead, and MSG_EOR will be flagged to indicate the end of that
- call.
- ====================
- EXAMPLE SERVER USAGE
- ====================
- A server would be set up to accept operations in the following manner:
- (1) An RxRPC socket is created by:
- server = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
- Where the third parameter indicates the address type of the transport
- socket used - usually IPv4.
- (2) Security is set up if desired by giving the socket a keyring with server
- secret keys in it:
- keyring = add_key("keyring", "AFSkeys", NULL, 0,
- KEY_SPEC_PROCESS_KEYRING);
- const char secret_key[8] = {
- 0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 };
- add_key("rxrpc_s", "52:2", secret_key, 8, keyring);
- setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7);
- The keyring can be manipulated after it has been given to the socket. This
- permits the server to add more keys, replace keys, etc. whilst it is live.
- (2) A local address must then be bound:
- struct sockaddr_rxrpc srx = {
- .srx_family = AF_RXRPC,
- .srx_service = VL_SERVICE_ID, /* RxRPC service ID */
- .transport_type = SOCK_DGRAM, /* type of transport socket */
- .transport.sin_family = AF_INET,
- .transport.sin_port = htons(7000), /* AFS callback */
- .transport.sin_address = 0, /* all local interfaces */
- };
- bind(server, &srx, sizeof(srx));
- (3) The server is then set to listen out for incoming calls:
- listen(server, 100);
- (4) The kernel notifies the server of pending incoming connections by sending
- it a message for each. This is received with recvmsg() on the server
- socket. It has no data, and has a single dataless control message
- attached:
- RXRPC_NEW_CALL
- The address that can be passed back by recvmsg() at this point should be
- ignored since the call for which the message was posted may have gone by
- the time it is accepted - in which case the first call still on the queue
- will be accepted.
- (5) The server then accepts the new call by issuing a sendmsg() with two
- pieces of control data and no actual data:
- RXRPC_ACCEPT - indicate connection acceptance
- RXRPC_USER_CALL_ID - specify user ID for this call
- (6) The first request data packet will then be posted to the server socket for
- recvmsg() to pick up. At that point, the RxRPC address for the call can
- be read from the address fields in the msghdr struct.
- Subsequent request data will be posted to the server socket for recvmsg()
- to collect as it arrives. All but the last piece of the request data will
- be delivered with MSG_MORE flagged.
- All data will be delivered with the following control message attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- (8) The reply data should then be posted to the server socket using a series
- of sendmsg() calls, each with the following control messages attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- MSG_MORE should be set in msghdr::msg_flags on all but the last message
- for a particular call.
- (9) The final ACK from the client will be posted for retrieval by recvmsg()
- when it is received. It will take the form of a dataless message with two
- control messages attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- RXRPC_ACK - indicates final ACK (no data)
- MSG_EOR will be flagged to indicate that this is the final message for
- this call.
- (10) Up to the point the final packet of reply data is sent, the call can be
- aborted by calling sendmsg() with a dataless message with the following
- control messages attached:
- RXRPC_USER_CALL_ID - specifies the user ID for this call
- RXRPC_ABORT - indicates abort code (4 byte data)
- Any packets waiting in the socket's receive queue will be discarded if
- this is issued.
- Note that all the communications for a particular service take place through
- the one server socket, using control messages on sendmsg() and recvmsg() to
- determine the call affected.
|