Buy it, use it, break it, fix it, plug it, play it, burn it, rip it, drag and drop it, zip- unzip it: Performance analysis of our own full blown HTTP server with Netty 4

In previous post Let's do our own full blown HTTP server with Netty 4 you and I were excited by creation of our own web server. So far so good. But how good?
Given ordinary notebook

cat /proc/cpuinfo | grep model\ name
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
cat /proc/meminfo | grep MemTotal
MemTotal:        3956836 kB

(Ok, there are only 4 cores with hyperthreading.)
We have following numbers

build/default/weighttp -n 1000000 -k -c 100 http://localhost:9999/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
progress:  10% done
progress:  20% done
progress:  30% done
progress:  40% done
progress:  50% done
progress:  60% done
progress:  70% done
progress:  80% done
progress:  90% done
progress: 100% done

finished in 20 sec, 195 millisec and 236 microsec, 49516 req/s, 7251 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 149967800 bytes total, 145967800 bytes http, 4000000 bytes data

49516 req/s. Is this good? I'd say very good. We need some baseline to compare to. As such baseline we'll use nginx serving small static file. 'Why serving a file is faster than serving a piece of memory?' you ask. With sendfile turned on nginx after response header sent just instructs kernel to send file to socket, file after first hit is in linux buffer cache, thus kernel sends a page from memory to Ethernet card. Zero-copy. If you are about to send dynamically generated response from memory kernel needs to copy this piece of data from userspace to kernelspace. Thus example below should be fast.
Nginx config

worker_processes  4;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    keepalive_timeout  65;

    server {
        listen       6666;
        server_name  localhost;

        location / {
            root   html;
            index  index.html index.htm;
        }
    }
}

The same test

build/default/weighttp -n 1000000 -k -c 100 http://localhost:6666/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 11 sec, 402 millisec and 913 microsec, 87696 req/s, 72705 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 848950295 bytes total, 236950295 bytes http, 612000000 bytes data

87696 req/s. Very well, thus we are not so far behind.
Lets draw other baseline. Python is considered slow. But it is not as slow as you think. Add to your nginx config

http {
    upstream test {
        server unix:///home/adolgarev/uwsgi.sock;
    }
    ...
    server {
        location /test {
            uwsgi_pass  test;
            include     uwsgi_params;
        }
        ...
    }
}

uwsgi protocol is far more effective than http thus we are using uwsgi_pass instead of proxy_pass.
Start uwsgi with small WSGI test program

cat test.py
def application(env, start_response):
    start_response('200 OK', [('Content-Type','text/html')])
    return ["Hello World"]
uwsgi --socket uwsgi.sock --wsgi-file test.py --master --processes 4 --threads 2

The same test

build/default/weighttp -n 100000 -k -c 100 http://localhost:6666/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 100000 total requests
...
progress: 100% done

finished in 9 sec, 264 millisec and 26 microsec, 10794 req/s, 1844 kbyte/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 17495280 bytes total, 15395280 bytes http, 2100000 bytes data

10794 req/s, I'd say good enough as for python.
Another thing to compare to is GlassFish Server Open Source Edition 4.0 with Servlets 3.1 powered by Grizzly - a competitor of Netty.
Small Servlet

package test;

import java.io.IOException;
import java.io.PrintWriter;
import javax.json.Json;
import javax.json.stream.JsonGenerator;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/test")
public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        resp.setContentType("text/plain");
        PrintWriter writer = resp.getWriter();
        try (JsonGenerator gen = Json.createGenerator(writer)) {
            gen.writeStartObject().write("res", "Ok").writeEnd();
        }
    }

}

And the same test

build/default/weighttp -n 1000000 -k -c 100 http://localhost:8080/WebApplication1/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 41 sec, 745 millisec and 917 microsec, 23954 req/s, 6855 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 293074157 bytes total, 281074157 bytes http, 12000000 bytes data

23954 req/s, somewhere in between. But surprisingly if we remove json dependency it runs hell fast.

@WebServlet("/test")
public class TestServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        resp.setContentType("text/plain");
        try (PrintWriter writer = resp.getWriter()) {
            writer.print("Ok");
        }
    }

}

build/default/weighttp -n 1000000 -k -c 100 http://localhost:8080/WebApplication1/test
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 10 sec, 944 millisec and 231 microsec, 91372 req/s, 25169 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 282074100 bytes total, 280074100 bytes http, 2000000 bytes data

Amazing 91372 req/s, faster than nginx (due to the whole response is small enough to fit into one memory page and even one packet with MTU 1500).
I'm still not satisfied. Profile our server.

Eliminate most obvious hot spots.

build/default/weighttp -n 1000000 -k -c 100 http://localhost:9999/
weighttp - a lightweight and simple webserver benchmarking tool

starting benchmark...
spawning thread #1: 100 concurrent requests, 1000000 total requests
...
progress: 100% done

finished in 9 sec, 298 millisec and 269 microsec, 107546 req/s, 10292 kbyte/s
requests: 1000000 total, 1000000 started, 1000000 done, 1000000 succeeded, 0 failed, 0 errored
status codes: 1000000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 98000000 bytes total, 94000000 bytes http, 4000000 bytes data

107546 req/s, and we can do even better.

Conclusion? Our server is fast. But Servlets are fast too, taking into account asynchronous support (Servlets 3.0) and nonblocking I/O (Servlets 3.1) they are good choice for http. Our server has obvious advantage - one can change any piece of pipeline and even add support for various protocols.

Buy it, use it, break it, fix it, plug it, play it, burn it, rip it, drag and drop it, zip- unzip it

Monday, December 30, 2013

Performance analysis of our own full blown HTTP server with Netty 4

No comments:

Post a Comment