Tuesday, December 24, 2013

Xen VNC and stale TCP connections

The problem: if a VNC client terminates without sending a packet with the FIN or RST flag (for instance, the VNC connection was tunneled through a VPN and the tunnel was closed), the connection on the server side remains in the ESTABLISHED state, and one cannot connect to this VM via VNC again (see "vnc stops working after a while").
The solution to this simple problem is a bit complicated. First of all, TCP keepalive must be turned on for the client sockets accepted by the VNC server. In other words, one must patch qemu-dm, which implements VNC in Xen (Xen Opensource 3.4.2):
diff -u -x '*.o' -x '*.o.d' xen-3.4.2/tools/ioemu-qemu-xen/osdep.c xen-3.4.2_fix/tools/ioemu-qemu-xen/osdep.c
--- xen-3.4.2/tools/ioemu-qemu-xen/osdep.c      2009-11-05 11:44:56.000000000 +0000
+++ xen-3.4.2_fix/tools/ioemu-qemu-xen/osdep.c  2011-12-28 13:43:29.938649747 +0000
@@ -338,4 +338,11 @@
     f = fcntl(fd, F_GETFL);
     fcntl(fd, F_SETFL, f | O_NONBLOCK);
 }
+
+int socket_set_keepalive(int fd) {
+    int optval;
+    
+    optval = 1;
+    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));
+}
 #endif
diff -u -x '*.o' -x '*.o.d' xen-3.4.2/tools/ioemu-qemu-xen/qemu_socket.h xen-3.4.2_fix/tools/ioemu-qemu-xen/qemu_socket.h
--- xen-3.4.2/tools/ioemu-qemu-xen/qemu_socket.h        2009-11-05 11:44:56.000000000 +0000
+++ xen-3.4.2_fix/tools/ioemu-qemu-xen/qemu_socket.h    2011-12-28 13:43:32.788255661 +0000
@@ -41,6 +41,7 @@
 
 /* misc helpers */
 void socket_set_nonblock(int fd);
+int socket_set_keepalive(int fd);
 int send_all(int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
diff -u -x '*.o' -x '*.o.d' xen-3.4.2/tools/ioemu-qemu-xen/vnc.c xen-3.4.2_fix/tools/ioemu-qemu-xen/vnc.c
--- xen-3.4.2/tools/ioemu-qemu-xen/vnc.c        2009-11-05 11:44:56.000000000 +0000
+++ xen-3.4.2_fix/tools/ioemu-qemu-xen/vnc.c    2011-12-28 13:44:59.283974757 +0000
@@ -2389,6 +2389,8 @@
        VNC_DEBUG("New client on socket %d\n", vs->csock);
        dcl->idle = 0;
         socket_set_nonblock(vs->csock);
+    if (socket_set_keepalive(vs->csock) == -1)
+        VNC_DEBUG("Cannot set KEEPALIVE on socket %d\n", vs->csock);
        qemu_set_fd_handler2(vs->csock, NULL, vnc_client_read, NULL, opaque);
        vnc_write(vs, "RFB 003.008\n", 12);
        vnc_flush(vs);
Recompile the Xen tools (make tools, see the Xen README) and replace the installed qemu-dm executable with dist/install/usr/lib64/xen/bin/qemu-dm.
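As a quick sanity check of what the patch does, the sketch below reproduces the patch's socket_set_keepalive() and adds a hypothetical socket_keepalive_enabled() (not part of qemu-dm) that reads the flag back with getsockopt(). It only verifies that SO_KEEPALIVE gets switched on; when and how often probes are sent is decided by the kernel settings, not by this call.

```c
#include <sys/socket.h>

/* Same body as the helper the patch adds to osdep.c:
 * enable TCP keepalive on an already created socket.
 * Returns 0 on success, -1 on error (errno is set). */
int socket_set_keepalive(int fd)
{
    int optval = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));
}

/* Hypothetical helper for testing only: returns 1 if SO_KEEPALIVE
 * is enabled on fd, 0 if it is disabled, -1 if getsockopt() fails. */
int socket_keepalive_enabled(int fd)
{
    int optval = 0;
    socklen_t len = sizeof(optval);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &optval, &len) == -1)
        return -1;
    return optval != 0;
}
```

On a freshly created TCP socket socket_keepalive_enabled() returns 0; after socket_set_keepalive() succeeds it returns 1.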
Configure the Linux kernel's TCP keepalive timers (for example with sysctl -w or in /etc/sysctl.conf):
HDC11:~# sysctl -a | grep ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_intvl = 10
This means: once a connection has been idle for 30 secs, the kernel sends a keepalive probe (an empty ACK segment); if the probe is not acknowledged, the kernel keeps probing every 10 secs, and after 5 unanswered probes in total it sends RST and closes the connection.
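The sysctl values above are system-wide and apply to every socket with SO_KEEPALIVE enabled. Linux also lets a program override them per socket with the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options (see tcp(7)). The sketch below is an alternative to the sysctl approach, not what the patch does; socket_set_keepalive_params() and keepalive_detect_secs() are hypothetical helper names. The second function captures the arithmetic behind the test that follows: a silently vanished peer is detected at most idle + probes * interval seconds after its last packet.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux-specific): enable keepalive on fd and
 * override the net.ipv4.tcp_keepalive_* sysctls for this socket only.
 * Returns 0 on success, -1 on the first failing setsockopt(). */
int socket_set_keepalive_params(int fd, int idle_s, int intvl_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    /* Idle time before the first probe (like tcp_keepalive_time). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == -1)
        return -1;
    /* Time between probes (like tcp_keepalive_intvl). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) == -1)
        return -1;
    /* Unanswered probes before the connection is dropped (like tcp_keepalive_probes). */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == -1)
        return -1;
    return 0;
}

/* Worst-case seconds from the peer's last packet until the kernel
 * gives up on an idle connection: first probe after idle_s, then
 * one probe every intvl_s until `probes` probes went unanswered. */
int keepalive_detect_secs(int idle_s, int intvl_s, int probes)
{
    return idle_s + probes * intvl_s;
}
```

With the values used here, keepalive_detect_secs(30, 10, 5) is 80 secs, which matches the test below: the client's last answer arrives at 07:42:45.7 and the RST goes out at 07:44:05.7.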
Test
HDC11:~# netstat -nap | grep 5901
tcp 0 0 0.0.0.0:5901 0.0.0.0:* LISTEN 28905/qemu-dm
HDC11:~# netstat -nap | grep 5901
tcp 0 0 0.0.0.0:5901 0.0.0.0:* LISTEN 28905/qemu-dm
tcp 0 0 10.10.17.11:5901 10.10.17.216:1334 ESTABLISHED 28905/qemu-dm <---- VNC connection via VPN
HDC11:~# tcpdump -n -i eth2 port 1334
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
07:42:15.119612 IP 10.10.17.11.5901 > 10.10.17.216.1334: . ack 1601431976 win 5840
07:42:15.316704 IP 10.10.17.216.1334 > 10.10.17.11.5901: . ack 1 win 64228
07:42:45.319980 IP 10.10.17.11.5901 > 10.10.17.216.1334: . ack 1 win 5840 <--- keepalive probe after 30 secs of idle time
07:42:45.701610 IP 10.10.17.216.1334 > 10.10.17.11.5901: . ack 1 win 64228 <--- client answers the probe
07:43:15.700307 IP 10.10.17.11.5901 > 10.10.17.216.1334: . ack 1 win 5840 <--- VPN killed, no response anymore
07:43:25.700398 IP 10.10.17.11.5901 > 10.10.17.216.1334: . ack 1 win 5840
07:43:35.700586 IP 10.10.17.11.5901 > 10.10.17.216.1334: . ack 1 win 5840
07:43:45.700639 IP 10.10.17.11.5901 > 10.10.17.216.1334: . ack 1 win 5840
07:43:55.700765 IP 10.10.17.11.5901 > 10.10.17.216.1334: . ack 1 win 5840 <--- 5 unanswered probes, 10 secs apart
07:44:05.700913 IP 10.10.17.11.5901 > 10.10.17.216.1334: R 1:1(0) ack 1 win 5840 <--- still no response, RST
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel
HDC11:~# netstat -nap | grep 5901
tcp 0 0 0.0.0.0:5901 0.0.0.0:* LISTEN 28905/qemu-dm
<--- no ESTABLISHED connections
HDC11:~# telnet localhost 5901 <--- we can connect to this port again
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
RFB 003.008 <--- server responds
^]
telnet> quit
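But we need to go deeper. TCP keepalive works only while the connection is idle, i.e. when there is no unacknowledged data. Imagine the VNC server sends some data and the connection dies silently, so the data is never acknowledged. In this case Linux does not send keepalive probes; instead it retransmits the data with an exponentially growing retransmission timeout (RTO) until net.ipv4.tcp_retries2 retransmissions have failed, and only then closes the connection.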
07:57:36.123013 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1557:1598(41) ack 377 win 5840
07:57:36.317922 IP 10.10.17.216.1421 > 10.10.17.11.5901: P 377:387(10) ack 1598 win 63353 <--- here we lost connection
07:57:36.360862 IP 10.10.17.11.5901 > 10.10.17.216.1421: . ack 387 win 5840
07:58:06.317741 IP 10.10.17.11.5901 > 10.10.17.216.1421: . ack 387 win 5840
07:58:16.321445 IP 10.10.17.11.5901 > 10.10.17.216.1421: . ack 387 win 5840
07:58:16.932260 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840 <--- VNC server (qemu-dm) tries to send some data
07:58:17.621451 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840 <--- retransmissions start, with exponentially growing intervals
07:58:19.001467 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
07:58:21.761500 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
07:58:27.281590 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
07:58:38.321648 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
07:59:00.401919 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
07:59:44.562552 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:01:12.883584 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:03:12.885069 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:05:12.886487 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:07:12.887861 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:09:12.889364 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:11:12.890812 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:13:12.892282 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
08:15:12.893702 IP 10.10.17.11.5901 > 10.10.17.216.1421: P 1598:1617(19) ack 387 win 5840
<--- after 15 retransmissions (tcp_retries2 = 15) the kernel finally gives up; the connection is closed after ~15-20 mins
HDC11:~# netstat -nap | grep 5901
tcp 0 0 0.0.0.0:5901 0.0.0.0:* LISTEN 28905/qemu-dm
Being unable to connect to the VM for 15+ minutes is too long for a client. So configure the kernel again: decrease /proc/sys/net/ipv4/tcp_retries2 (net.ipv4.tcp_retries2) from 15 to 5 retries. The same scenario then ends much sooner:
09:05:45.286580 IP 10.10.17.216.1312 > 10.10.17.11.5901: P 84:94(10) ack 3649 win 63679
09:05:45.320669 IP 10.10.17.11.5901 > 10.10.17.216.1312: . ack 94 win 5840
09:06:15.281030 IP 10.10.17.11.5901 > 10.10.17.216.1312: . ack 94 win 5840
09:06:25.281213 IP 10.10.17.11.5901 > 10.10.17.216.1312: . ack 94 win 5840
09:06:25.961866 IP 10.10.17.11.5901 > 10.10.17.216.1312: P 3649:3668(19) ack 94 win 5840
09:06:26.661261 IP 10.10.17.11.5901 > 10.10.17.216.1312: P 3649:3668(19) ack 94 win 5840
09:06:28.061265 IP 10.10.17.11.5901 > 10.10.17.216.1312: P 3649:3668(19) ack 94 win 5840
09:06:30.861301 IP 10.10.17.11.5901 > 10.10.17.216.1312: P 3649:3668(19) ack 94 win 5840
09:06:36.461346 IP 10.10.17.11.5901 > 10.10.17.216.1312: P 3649:3668(19) ack 94 win 5840
09:06:47.661424 IP 10.10.17.11.5901 > 10.10.17.216.1312: P 3649:3668(19) ack 94 win 5840
<--- last retransmission; the connection is dropped ~60-90 secs after the client went silent
As a result, one can re-establish the VNC connection within 1-2 minutes.
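The timings in both traces fit a simple exponential-backoff model. The sketch below is an approximation under stated assumptions, not the kernel's code: the function name retrans_give_up_secs is hypothetical, the initial RTO (about 0.69-0.70 secs) is read off the traces, and 120 secs is the Linux maximum RTO. It assumes the connection is dropped when the timer expires after the last allowed retransmission. Newer kernels turn tcp_retries2 into a time limit rather than a plain retransmission count, so treat the results as estimates.

```c
/* Simplified model of how long Linux keeps retransmitting unacknowledged
 * data before dropping the connection (an assumption, not kernel code).
 * The RTO starts at rto0_s, doubles after every retransmission and is
 * capped at rto_max_s. With `retries` retransmissions the timer expires
 * retries + 1 times: once before each retransmission and once more
 * before the connection is finally closed. */
double retrans_give_up_secs(double rto0_s, int retries, double rto_max_s)
{
    double rto = rto0_s < rto_max_s ? rto0_s : rto_max_s;
    double total = 0.0;

    for (int i = 0; i <= retries; i++) {
        total += rto;            /* wait for this timer to expire */
        rto *= 2.0;              /* exponential backoff */
        if (rto > rto_max_s)
            rto = rto_max_s;     /* cap at the maximum RTO */
    }
    return total;
}
```

With the first trace's values, retrans_give_up_secs(0.69, 15, 120.0) gives about 1136 secs (roughly 19 minutes) from the first transmission of the unacknowledged data; the backoff doubles up to 88.32 secs and then stays at 120 secs, exactly as the timestamps show. With tcp_retries2 = 5, retrans_give_up_secs(0.70, 5, 120.0) gives about 44 secs; adding the ~40 secs between the client's last packet (09:05:45) and the server's first send (09:06:25) lands inside the observed ~60-90 secs.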

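Note: one can also kill stale connections with the help of netfilter by setting the conntrack timeout net.netfilter.nf_conntrack_tcp_timeout_established to 1-2 hours (the default is 5 days). Once this time elapses the conntrack entry expires, and with a firewall rule that rejects packets of unknown connections with a TCP reset, iptables answers further packets on the stale connection with RST. But one still needs to set a proper net.ipv4.tcp_retries2.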
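Both tcp_retries2 and the keepalive sysctls are system-wide. On Linux 2.6.37 and later there is also a per-socket alternative to lowering net.ipv4.tcp_retries2: the TCP_USER_TIMEOUT socket option (see tcp(7)), which drops the connection once transmitted data has stayed unacknowledged for the given number of milliseconds. This was not part of the fix described here, and the dom0 kernel used with Xen 3.4.2 may be too old to support it; socket_set_user_timeout() is a hypothetical helper that could be called next to socket_set_keepalive() in vnc.c.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Older C library headers may lack the constant; 18 is its value in
 * <linux/tcp.h>. The running kernel must still support it (Linux >= 2.6.37). */
#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18
#endif

/* Hypothetical helper: close the connection if sent data stays
 * unacknowledged for more than timeout_ms milliseconds, independently
 * of net.ipv4.tcp_retries2. A value of 0 restores the system default.
 * Returns 0 on success, -1 on error (errno is set). */
int socket_set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof(timeout_ms));
}
```

For example, socket_set_user_timeout(vs->csock, 60000) would cap the retransmission phase of a VNC client connection at about one minute, without changing the behaviour of any other TCP connection on the host.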
