Execution time experiment
Consider this Python program:
# num_add.py
ans = 0
for i in range(123456789):
    ans += 1
print(ans)
It runs in about 13 seconds:
$ time python num_add.py
123456789
real 0m12.896s
user 0m12.868s
sys 0m0.008s
This result is expected; performing more than a hundred million additions takes time. While it ran, my computer was working hard: the fans spun up noticeably.
Now consider this Python program:
# sleep.py
import time
time.sleep(12.8)
print(123456789)
It executes in about the same amount of time and outputs the same thing:
$ time python sleep.py
123456789
real 0m12.847s
user 0m0.026s
sys 0m0.008s
This time, though, my computer was completely silent during the execution, as if nothing was being done. If you look at the source of time.sleep, and at the cross-platform pysleep function it calls, you will see that the underlying mechanism for wasting time is a call to select with no watched file descriptor sets and the desired timeout. From that point, it is up to the operating system to return control once the requested time has elapsed.
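The same mechanism can be reproduced directly from Python. This short sketch (assuming a Unix-like system, where select accepts empty descriptor sets) sleeps by calling select with no file descriptor sets and only a timeout, much like pysleep does:

```python
import select
import time

start = time.monotonic()
# No read, write, or exceptional fd sets to watch -- only a timeout.
# The OS simply returns control once roughly 0.5 s have elapsed,
# without the process consuming CPU time in the meantime.
select.select([], [], [], 0.5)
print(f"slept for {time.monotonic() - start:.2f}s")
```

Running it under the time command shows the same pattern as sleep.py: real is about half a second while user and sys stay near zero.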
The actual CPU time can be seen in the output of the time command. The user and sys lines are the time spent running code in user mode and in kernel mode, respectively. The first program spent virtually all of its wall-clock time executing code, whereas the second spent almost none. While the two scripts behave the same, it could be argued that the second is better behaved since it leaves the CPU free for the OS to perform more important tasks.
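The same distinction can be observed from inside Python itself: time.process_time reports CPU time (user plus system) consumed by the process, while time.perf_counter reports wall-clock time. A sleeping program burns wall time but almost no CPU time:

```python
import time

wall_start = time.perf_counter()   # wall-clock time
cpu_start = time.process_time()    # CPU time (user + sys)

time.sleep(1.0)

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# Expect wall to be about 1 s while cpu stays close to zero,
# mirroring the real vs user/sys split reported by the time command.
print(f"wall: {wall:.2f}s  cpu: {cpu:.4f}s")
```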
We will now test out a similar program in C:
// num_add.c
#include <stdio.h>
int main() {
    int ans = 0;
    for (int i = 0; i < 123456789; i++) {
        ans += 1;
    }
printf("%d\n", ans);
}
This one has a more surprising execution time:
$ gcc num_add.c
$ time ./a.out
123456789
real 0m0.296s
user 0m0.294s
sys 0m0.002s
The computation takes a fraction of a second. At first, I was certain a compiler optimization had skipped the computation and simply printed the obvious result. But inspecting the assembly for this program shows that the loop and the additions are actually performed:
...
40113e: 83 45 fc 01 addl $0x1,-0x4(%rbp)
401142: 83 45 f8 01 addl $0x1,-0x8(%rbp)
401146: 81 7d f8 14 cd 5b 07 cmpl $0x75bcd14,-0x8(%rbp)
40114d: 7e ef jle 40113e <main+0x18>
...
I was aware of overhead in interpreted languages such as Python but never thought it would be this obvious.
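One way to see where that overhead comes from is to look at the bytecode CPython actually interprets. The dis module from the standard library disassembles a function; each pass through the loop dispatches several interpreted instructions (the exact opcode names vary between Python versions), where the compiled C loop above needs only four machine instructions:

```python
import dis

def count():
    ans = 0
    for i in range(123456789):
        ans += 1
    return ans

# Every iteration executes several bytecode instructions
# (FOR_ITER, STORE_FAST, LOAD_FAST, an in-place add, ...),
# each carrying its own dispatch cost inside the interpreter.
dis.dis(count)
```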
This simple experiment shows how much execution time can vary between very similar programs. I would much prefer waiting 0.3 seconds rather than 13 for my computations to complete. At the same time, I would choose writing in a higher-level programming language any day. As with many other things in life, it is a matter of tradeoffs.