fix(esp_http_server): Dispatch PONG frames to WebSocket handler

PONG frames (opcode 0xA) were not dispatched to the user's WebSocket
handler unless the handler had requested control frames, despite an
existing comment stating they should be. The dispatch condition
`ra->ws_type < HTTPD_WS_TYPE_CLOSE` excluded PONG because 0xA is not
less than CLOSE (0x8).
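
As a rough illustration of the comparison described above, the sketch below
evaluates the old and new dispatch conditions side by side. The opcode values
are the RFC 6455 ones; the enum and function names are hypothetical stand-ins,
not the esp_http_server definitions.

```c
#include <stdbool.h>

/* Opcode values from RFC 6455. The names are illustrative stand-ins,
 * not the real HTTPD_WS_TYPE_* definitions. */
enum ws_type {
    WS_TYPE_CONTINUE = 0x0,
    WS_TYPE_TEXT     = 0x1,
    WS_TYPE_BINARY   = 0x2,
    WS_TYPE_CLOSE    = 0x8,
    WS_TYPE_PING     = 0x9,
    WS_TYPE_PONG     = 0xA
};

/* Condition before this commit: only opcodes below CLOSE pass,
 * unless the handler asked for control frames. */
static bool dispatch_before(enum ws_type type, bool control_frames)
{
    return type < WS_TYPE_CLOSE || control_frames;
}

/* Condition after this commit: PONG is dispatched as well. */
static bool dispatch_after(enum ws_type type, bool control_frames)
{
    return type < WS_TYPE_CLOSE || type == WS_TYPE_PONG || control_frames;
}
```

With `control_frames` false, `dispatch_before(WS_TYPE_PONG, false)` is false
while `dispatch_after(WS_TYPE_PONG, false)` is true; PING and CLOSE are
unaffected by the change.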

This caused a critical secondary bug: when the server sends PING frames
and the client responds with PONG, httpd_ws_recv_frame() is never
called for the PONG, leaving the remaining frame bytes (second_byte
plus 4-byte mask_key) unconsumed in the TCP buffer. On the next
WebSocket read, these orphaned bytes are misinterpreted as a new frame
header, causing either "WS frame is not properly masked" errors or
EAGAIN timeouts with garbage length values, effectively destroying
the connection.
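
The misalignment can be reproduced in isolation with a small simulation. This
is only a sketch under stated assumptions: `byte_stream`, `take`,
`consume_rest` and `header_after_pong` are hypothetical helpers that model the
socket as an in-memory buffer, and only short (under 126 byte) payloads are
handled. It is not the server's actual parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the socket's receive buffer. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} byte_stream;

/* What the parser believes the next frame header says. */
typedef struct {
    uint8_t opcode;
    bool masked;
} next_header;

static uint8_t take(byte_stream *s)
{
    assert(s->pos < s->len);
    return s->data[s->pos++];
}

/* Consume the rest of a frame whose first byte was already read:
 * second byte, optional 4-byte mask key, and a short payload. */
static void consume_rest(byte_stream *s)
{
    uint8_t second = take(s);
    size_t skip = (size_t)(second & 0x7F) + ((second & 0x80) ? 4 : 0);
    assert(s->pos + skip <= s->len);
    s->pos += skip;
}

/* Read a PONG's first byte, optionally consume the remainder, then
 * report how the following two bytes parse as a frame header. */
static next_header header_after_pong(bool consume_pong)
{
    static const uint8_t wire[] = {
        0x8A, 0x80, 0x12, 0x34, 0x56, 0x78, /* masked PONG, empty payload */
        0x81, 0x80, 0x01, 0x02, 0x03, 0x04  /* masked TEXT, empty payload */
    };
    byte_stream s = { wire, sizeof wire, 0 };

    uint8_t first = take(&s);
    assert((first & 0x0F) == 0xA);
    if (consume_pong) {
        consume_rest(&s);
    }

    next_header h;
    h.opcode = take(&s) & 0x0F;
    h.masked = (take(&s) & 0x80) != 0;
    return h;
}
```

When the PONG remainder is consumed, the next header parses as a masked TEXT
frame. When it is skipped, the PONG's second byte (0x80) is read as a
continuation opcode and the first mask-key byte (0x12) as an unmasked length
byte, which corresponds to the "not properly masked" failure described above.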

Add `ra->ws_type == HTTPD_WS_TYPE_PONG` to the dispatch condition so
PONG frames reach the user handler, which is expected to call
httpd_ws_recv_frame() and thereby consume the frame bytes from the socket.

Closes https://github.com/espressif/esp-idf/issues/18227
Peter Backeris
2026-03-05 01:53:11 -05:00
committed by Ashish Sharma
parent 4193d214e3
commit 3d0e26170d
+9 -2
@@ -774,9 +774,16 @@ esp_err_t httpd_req_new(struct httpd_data *hd, struct sock_db *sd)
             ESP_LOGD(TAG, LOG_FMT("Received PONG frame"));
         }
-        /* Call handler if it's a non-control frame (or if handler requests control frames, as well) */
+        /* Call handler if it's a non-control frame, a PONG frame,
+         * or if handler requests control frames as well.
+         * PONG must be dispatched so that:
+         * 1. User code that sends PINGs can track responses (heartbeat)
+         * 2. The PONG frame bytes are consumed from the socket via
+         *    httpd_ws_recv_frame(), preventing TCP stream misalignment */
         if (ret == ESP_OK &&
-            (ra->ws_type < HTTPD_WS_TYPE_CLOSE || sd->ws_control_frames)) {
+            (ra->ws_type < HTTPD_WS_TYPE_CLOSE ||
+             ra->ws_type == HTTPD_WS_TYPE_PONG ||
+             sd->ws_control_frames)) {
             ret = sd->ws_handler(r);
         }