INTERNALS INTRO The Cutlass internal architecture doc, the code logic, the guts. This doc is intended to be used to document what the internals of libcutlass look like, for the purpose of reminding the original authors if they forget what the internals look like, for facilitating communication between said authors, and to get outsiders who may wish to hack up to speed quickly. Reading this before hacking should help greatly. If you are programming a client for cutlass, rather than working on the underlying library functionality, please read api.txt instead. (Although it won't hurt to read this, it is not necessary, and the code details here should be hidden from you). LOCK RULES The following rules apply only if you are hacking on libcutlass directly. If you're hacking a client for Cutlass, please do not be locking ANY locks directly. Use the API. Handle or hashtable locks must be locked BEFORE connection locks. If you have a connection lock, you must drop it before locking the handle. There are 4 mutexes contained within the cutlass_t handle. These are: handle_mutex - controls handle settings htab_mutex - controls the connection hash table entries del_stack_mutex - Controls access to the stack of connections "To be deleted" fd_set_mutex - Controls access to the set of select()ed across file descriptors As a matter of fact, if you have any locks at all, you must drop them before locking handle or hashtable locks. Priority goes: Handle > Hashtable > Connection > FD_SET > del_stack You make "lock down" this chain, but you should never attempt to "lock up," i.e., obtain a connection lock, and then attempt to obtain a hashtable lock. Deadlocks may result. DELETION RULES If you are deleting a connection, you must pass it to the housekeeping thread for deletion. Call cutlass_connection_dispose(). Do NOT simply call free()! PROCESS FLOW Cutlass currently is architected around a three-thread model. These threads separate after some initialization logic, and sub-threads may spawn, but at any given time, three threads will likely be blocking. One thread is primarily responsible for user input, and waits for user input events. This is the "main thread," and is the central execution flow. This thread is the only thread that can terminate the program. This is the thread that API users will write. One thread is a housekeeping thread, and will periodically sleep, then iterate through all connections, sending outstanding traffic as needed, then sleep again. The last thread is the listening thread, and it spends all of its idle time in a select() call, listening on all the sockets we have open. Subthreads may spawn off in the case of action handlers, because action handlers may block on user input, and we don't want to block our listening thread on that input. All three threads share a central data structure, called a cutlass_t data structure. This handle stores the hash table of all connections, information about this instance of cutlass, the set of file descriptors, and a series of message stacks, that the three threads can leave messages for each other on. The cutlass_t also contains mutexes to protect the message queues from concurrent thread access. Each thread "owns" a certain area of the central structure. The housekeeping thread, for example, owns the hash table of connections, and only it is responsible for adding and deleting connections to this table. Other threads may initialize connection structures, and hand them to the housekeeping thread to add to the hash table, but the other threads should not edit the hash table directly. The reason for this message passing is to prevent deadlock conditions. I originally wrote cutlass to have each thread lock on the hashtable, and lock on each connection, and a lock on the file descriptor sets, but I kept running into deadlock conditions. Specifically, adding to the file descriptor set while the listener was going to be stuck in select() most of the time was a real bear. So now, the listener is responsible for adding a file descriptor to the select() list, and the other threads simple stack up the file descriptors for it. Keeps lock contention minimal. Cutlass uses the following hash tables: Connections - Keyed off of remote public key, and also potentially keyed off of IP address/port/protocol group if we have a direct connection. Keys of the form :::: All values in network byte order. Each connection value is a structure pointer, with the structure containing the following: - A linked list of group structures that the remote client belongs to. - The crypto key material structure, containing both public and session keys - The UTF-8 "Name" associated with the remote client. - A linked list of file structures containing info about in-transit files. We will need to hold some state about what fragments we're currently missing. - A counter for the number of in-transit files on this connection. (Not to exceed some sane value for number of simultaneous file transfers). - A linked list of message structures containing info about message fragments (for long messages). - A counter for the number of fragmented messages currently in transit, not to exceed some sane value. - A permissions/capabilities structure. - A struct timeval with last activity on this connection - A struct timeval with last ping sent on this connection. - The socket we're listening on, if a directly connected connection. - The remote struct sockaddr_in and socklen_t, if we did not initiate the connection. - Current routing info for non-directly connected connections (This is fuzzy, needs detail). Groups - a hashtable of all groups our client is currently part of. Keys are the groupID, a (128 bit?) value. Group values are structure pointers, with the structure containing the following: - A linked list of connections belonging to the group. - the UTF-8 "Name" of the group. ----------------------------------------------------------------------- The following are structures containing the layout of needed linked list nodes (all nodes have a *next pointer). Linked lists should be OK, because each connection can only have a finite, and small, number of any of these at any given time: Files - We will require a structure with information for each file currently in transit. The entry will contain: - The file descriptor for the local file - The SHA-1 hash for the expected file - The total expected file length - The list of gaps currently in the file. This will likely be a sorted red-black tree containing start and end gaps eventually, but right now is a doubly linked list. Message fragments - Messages are limited to some sane value in length, any longer and you need to send a file. 64K, perhaps? ------------------------------------------------------------------------ Structures we need: - Connection structure - Group structure - File structure - Fragmentary message structure - Cutlass packet message structure(s) - Capabilities/permissions structure ----------------------------------------------------------------------- Process flow: Code execution can get kicked off by one of three things. These things are: - User event (button click, keypress, sound input). - Network event (socket read) - Housekeeping thread (periodic timed thread). We're multithreaded, gotta worry about mutexen. Threads each have ownership over different data structures. The housekeeping thread owns the connection hash table, the network event thread owns the set of open file descriptors. In order to keep each other in synch, the Housekeeping will do the following: - Check all connections at regular intervals, if it's been quiescent for > and last ping sent was > send a ping. If it's been quiescent since > , initiate connection shutdown/cleanup. - Check all connection in-transit message frags and files in transit to ensure they're not stuck. Clean up as appropriate. - Sleep for some sane value. Network event thread will select() across the plethora of open sockets we have, and when that select() comes back, it'll read the packet into a buffer. Once we do that, retrieve the corresponding connection structure, overlay a Cutlass packet structure on it, and first thing, check the MAC. If the MAC is good, decrypt, and then check the packet type. Take action based on the packet type (Each one of these should be a separate subroutine, possibly calling architecture-specific modules): - Text packet - If complete message, display and ACK. Otherwise, place into fragmented message list for connection and ACK. - Audio packet - Play and ACK, or if out of sequence, discard and ACK. We may want to only ACK if there's been a sane interval since the last ACK, letting the remote connection know how successful we're being at receiving their audio, i.e. so many lost packets out of so many seq. - File transfer packet - Write and update file structure, and ACK. We may want to wait on the ACKs to so many per timeframe, so we're not overly chatty. If that's the case, housekeeping may need to ACK for us occasionally. - Ping packet. PONG. - Pong packet. Update activity counter in connection structure. (Actually, all of these packets should update activity counters). - ACK packet - Update in-transit values. We may need to make additional changes, such as switching to a lower bandwidth codec if the loss rate is too high, cutting back the "window size", cutting back the send rate, or switching relays. If the "window" is clear, we may want to send more data on a file transfer. - Request to Forward - Either forward or don't, depending on permissions. ACK failure to forward. User input can trigger a whole slew of things. As a matter of fact, I'm going to punt on this until I write the "user experience" doc. ------------------------------------------------------------------------ How connections are born, and how we keep track of them: Unfortunately, UDP doesn't have quite the simple socket structure that TCP has. You can bind() to a port, but there is no equivalent of accept() that allows you to "lock in" a localaddr:localport/remoteaddr:remoteport pairing to a single file descriptor. So, in order to keep process flows sane, we have two options: 1: You can connect() to a remote UDP port. This is great, as you have a 1->1 mapping with sockets and connections, and the remote side port is known. You don't know your own ephemeral port, but you don't have to, as return traffic automatically comes to you. In addition, ICMP_PORT_UNREACHABLE error messages get returned to you. 2: You can bind() to a local port, and when inbound traffic comes in, recvfrom() the traffic and SAVE THE REMOTE SOCKADDR_IN! After you've processed the proper response, sendto() the saved sockaddr_in. This means there is only one socket for many connections, and in addition, our process can't get errors returned via ICMP. Luckily, the one socket->many connection mappings isn't as much of a problem as it might be, as our hashtable structure is exactly what we need in order to cleanly resolve addressing. The ICMP problem is a bit more troubling. So, I propose that every time we get an inbound connection, we try to back-connect() to get an outbound socket to the remote host. This enables us to know more quickly if a host has rebooted or whatnot on us. This won't work through NAT, however. In that case, we're going to just have to issue response packets back through the one great all-listening socket, with the sendto() remote address structure filled in from the hashtable. We're also going to need to more aggressively keep track of retransmission states through some kind of loop. We can't afford to block on key exchange with one guy forever, and the packet that comes back to us through the one great socket might not even be from that guy, it may be someone else. So unfortunately we can't just fork off a thread for each key-echange, and have it set a variable when it's done. It's parallel processing at its bloodiest! Next function up: looping through all existing connections.