Subject: Max Requests per host/IP patch
Moved from #69514. This implements the following:

* A new "max_requests" parameter, defaulting to 3, which limits how many requests an individual host processes at the same time. This required the following changes:
  * A new $event array variable called _HOSTNAME (detailed in another patch; this patch assumes it's already in place).
  * Changes to register/deregister to keep track of hostnames, in much the same way as the descriptor object.
  * Code at the top of _event_handle to skip over events that are past the max request limit until another request has finished.
  * Automatically dropping max_requests to 1 if the host times out once, since it obviously cannot handle the requests it already had.
* Renaming the unused "return_response_pdu" to "send_pdu_priority", and using that throughout SNMP.pm for _existing_ requests. This way, existing requests (for, say, get_tables) are sent immediately through the pipe, and only the receive timers get put into the event list. This makes existing requests immune to the max_request limits (and post-select lag), and ensures that the host is not kept waiting too long for our reply when it asks for more information.
* A new parameter (plus help text) in both SNMP.pm and Net::SNMP::Transport.

This patch, along with the receive buffer patch, fixes both ends of the large request problem. The receive buffer patch fixes the one-to-many _IPs_ problem: if a single client (us, via Net::SNMP) sends many requests to different hosts, those hosts will collectively process the requests and send replies back faster than the one client can process all of the return packets. It's like a 75-core server processing everything and sending it back to a single-core client, which, before that patch, kept overloading itself.

This patch fixes the one-to-many _requests_ problem.
In other words, a single client sends a host many different requests, forcing the host to process all of them at the same time. For the client, sending requests for 20 large tables is easy; actually getting the data is a lot harder. Depending on how smart or dumb the host's SNMP software is, it may try to process all 20 requests at the same time. This results in timeouts, as it never manages to send any one packet in time. (In fact, I end up seeing late packets that get rejected because the msgID has already been thrown away.) The retries don't help at all, because all 20 requests time out at the same time, and the code just resends the same 20 requests within the same time frame. Rinse and repeat until the retry limit is reached, and you end up with an angry server and no data to show for it. This is a problem even when sending to a single host, so it's not just an issue for large multi-host requests.

So, this patch keeps it at a reasonable 3 requests per host. Existing requests still get processed as normal, but new ones are held back until one of the other requests has finished. Yes, 3 is somewhat of an arbitrary limit, but:

1. It's reasonable to assume that most hosts probably can't (and shouldn't) handle more than 3 table pull requests at a time.
2. It's adjustable per host by the user.
3. It could eventually be replaced with an auto-threshold that adjusts the limit according to the response rate of the host, eliminating the arbitrary number.
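To see why capping concurrency helps, here is a toy model of the timeout behaviour described above. The numbers (a host with a fixed amount of table-walking "capacity" per timeout window, split evenly between in-flight requests) are assumptions for illustration, not measurements from the patch:

```python
# Toy starvation model (illustrative numbers, not from the patch):
# the host can do `capacity` units of work per timeout window, shared
# evenly among concurrent requests; a request completes only if its
# share covers `work` units before the timeout fires.
def completes(in_flight, capacity=4.0, work=1.0):
    """True if each of `in_flight` concurrent requests finishes in time."""
    return (capacity / in_flight) >= work

# 20 simultaneous table pulls: every request starves, all 20 time out
# together, and each retry round repeats the exact same overload.
print(completes(20))  # False
# Capped at 3 concurrent requests: each gets enough of the host's
# capacity to answer before the timeout.
print(completes(3))   # True
```

Under this model, retries cannot help at 20-wide concurrency, because each retry round recreates the identical overload; reducing the in-flight count is the only thing that changes the outcome.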
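The register/deregister bookkeeping, the event-skipping gate, and the timeout downgrade described above can be sketched as a small per-host limiter. This is a minimal Python model of the idea, not the actual Perl from the patch; the class and method names (HostDispatcher, register, deregister, on_timeout) are illustrative, not Net::SNMP's API:

```python
# Minimal sketch of the per-host request limiter the patch describes.
from collections import defaultdict, deque

class HostDispatcher:
    """Caps the number of in-flight requests per hostname."""

    def __init__(self, max_requests=3):
        self.default_max = max_requests       # patch default: 3 per host
        self.max_per_host = {}                # per-host override
        self.in_flight = defaultdict(int)     # hostname -> active count
        self.pending = defaultdict(deque)     # hostname -> held-back requests

    def limit(self, host):
        return self.max_per_host.get(host, self.default_max)

    def register(self, host, request):
        """Queue a new request; dispatch it only if the host has a free slot."""
        if self.in_flight[host] < self.limit(host):
            self.in_flight[host] += 1
            return request                    # send immediately
        self.pending[host].append(request)
        return None                           # held until a slot frees

    def deregister(self, host):
        """A request finished; free its slot and release a held request, if any."""
        self.in_flight[host] -= 1
        if self.pending[host] and self.in_flight[host] < self.limit(host):
            self.in_flight[host] += 1
            return self.pending[host].popleft()
        return None

    def on_timeout(self, host):
        """Patch behaviour: one timeout drops the host's limit to 1."""
        self.max_per_host[host] = 1
```

For example, registering five requests for one host dispatches the first three and holds the last two; each deregister then releases one held request, and after a timeout the host drains down to a single in-flight request before anything new is sent. (Existing requests' sends via send_pdu_priority bypass this gate entirely in the patch; only new requests are held.)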