Buy it, use it, break it, fix it, plug it, play it, burn it, rip it, drag and drop it, zip- unzip it: Don't optimize when you are not asked to

Sunday, December 22, 2013

Don't optimize when you are not asked to

The code below as you may guess multiplies two matrices (see Ulrich Drepper, What Every Programmer Should Know About Memory)

for (i = 0;  i < N; ++i)
    for (j = 0; j < N; ++j)
        for (k = 0; k < N; ++k)
            res[i][j] += mul1[i][k] * mul2[k][j];

The next piece of code does the same (as you probably may not guess anymore)

#define SM (CLS / sizeof (double))

for (i = 0; i < N; i += SM)
    for (j = 0; j < N; j += SM)
        for (k = 0; k < N; k += SM)
            for (i2 = 0, rres = &res[i][j],
                rmul1 = &mul1[i][k]; i2 < SM;
                ++i2, rres += N, rmul1 += N)
                for (k2 = 0, rmul2 = &mul2[k][j];
                    k2 < SM; ++k2, rmul2 += N)
                    for (j2 = 0; j2 < SM; ++j2)
                        rres[j2] += rmul1[k2] * rmul2[j2];

where CLS is a linesize of L1d cache

gcc -DCLS=$(getconf LEVEL1_DCACHE_LINESIZE) ...

The speed

	Original	Tuned
Cycles	16,765,297,87	2,895,041,480
Relative	100%	17.3%

But the main difference as author says is

This looks quite scary.

This smacks of premature optimisation and the possibility the user really does not know what they are talking about, not to mention the problem of portability.

See Why not cache lines.

Buy it, use it, break it, fix it, plug it, play it, burn it, rip it, drag and drop it, zip- unzip it

Sunday, December 22, 2013

Don't optimize when you are not asked to

No comments:

Post a Comment