Friday, March 25, 2022

Profiling Python Code

Profiling is a way to determine how time is spent in a program. With these statistics, we can discover the “hot spots” of a program and think about ways of improvement. Sometimes, a hot spot in an unexpected location may hint at a bug in the program as well.

In this tutorial, we will see how we can use the profiling facility in Python. Specifically, you will see:

  • How we can compare small code fragments using the timeit module
  • How we can profile an entire program using the cProfile module
  • How we can invoke a profiler inside an existing program
  • What the profiler cannot do

Let’s get started.

Profiling Python Code. Photo by Prashant Saini. Some rights reserved.

Tutorial Overview

This tutorial is in four parts; they are:

  • Profiling small fragments
  • The profile module
  • Using profiler inside code
  • Caveats

Profiling small fragments

When you are asked about different ways of doing the same thing in Python, one perspective is to check which one is more efficient. In Python’s standard library, we have the timeit module that allows us to do some simple profiling.

For example, to concatenate many short strings, we can use the join() function from strings or use the + operator. So how do we know which is faster? Consider the following Python code:
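The original listing is not shown; a minimal sketch of such a loop, appending each number with the + operator:

```python
# Build one long string by repeatedly appending with the + operator
longstr = ""
for x in range(1000):
    longstr += str(x)
print(longstr[:10])
```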

This will produce a long string 012345.... in the variable longstr. An alternative way to write this is:
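A one-line sketch of that alternative, using join() over a list comprehension:

```python
# Build the same string in one statement with str.join()
longstr = "".join([str(x) for x in range(1000)])
print(longstr[:10])
```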

To compare the two, we can do the following at the command line:
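A sketch of how the two versions might be passed to timeit from the shell, with each statement quoted as a separate argument:

```shell
python -m timeit 'longstr=""' 'for x in range(1000): longstr += str(x)'
python -m timeit '"".join([str(x) for x in range(1000)])'
```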

These two commands will produce the following output:

The above commands load the timeit module and pass a single line of code for measurement. In the first case, we have two lines of statements, and they are passed to the timeit module as two separate arguments. By the same rationale, the first command can also be presented as three lines of statements (by breaking the for-loop into two lines), but the indentation of each line needs to be quoted correctly:
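For example, the loop can be split so that its body becomes a third argument; note that the indentation must be kept inside the quotes, since timeit joins the arguments with newlines:

```shell
python -m timeit 'longstr=""' 'for x in range(1000):' '    longstr += str(x)'
```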

The output of timeit is to find the best performance among multiple runs (default to be 5). Each run runs the provided statements a few times (which is dynamically determined). The time is reported as the average to execute the statements once in the best run.

While it is true that the join() function is faster than the + operator for string concatenation, the timing above is not a fair comparison. It is because we use str(x) to make short strings on the fly during the loop. The better way to do it is the following:
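A sketch of the fairer comparison, moving the str(x) calls into a -s setup string so they are not timed:

```shell
python -m timeit -s 'strings = [str(x) for x in range(1000)]' 'longstr=""' 'for s in strings:' '    longstr += s'
python -m timeit -s 'strings = [str(x) for x in range(1000)]' '"".join(strings)'
```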

which produces:

The -s option allows us to provide the “setup” code, which is executed before the profiling and not timed. In the above, we create the list of short strings before we start the loop. Hence the time to create those strings is not measured in the “per loop” timing. From the above, we see that the join() function is two orders of magnitude faster than the + operator. The more common use of the -s option is to import libraries. For example, we can compare the square root function from Python’s math module, from numpy, and using the exponentiation operator ** as follows:
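One way the three-way comparison might be run (the middle line assumes numpy is installed):

```shell
python -m timeit -s 'from math import sqrt' '[sqrt(x) for x in range(1000)]'
python -m timeit -s 'from numpy import sqrt' '[sqrt(x) for x in range(1000)]'
python -m timeit '[x**0.5 for x in range(1000)]'
```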

The above produces the following measurement, where we see that math.sqrt() is fastest while numpy.sqrt() is slowest in this particular example:

If you wonder why numpy is slowest, it is because numpy is optimized for arrays. You will see its exceptional speed in the following alternative:
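A sketch of the array version, applying numpy.sqrt() to a whole array at once rather than element by element (assumes numpy is installed):

```shell
python -m timeit -s 'import numpy as np' -s 'x = np.array(range(1000), dtype=float)' 'np.sqrt(x)'
```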

where the result is:

If you prefer, you can also run timeit in Python code. For example, the following will be similar to the above but give you the raw total timing for each run:
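A minimal sketch using timeit.repeat(), which returns the raw total time of each run rather than a per-loop average:

```python
import timeit

# Five runs (the default for repeat), each executing the statement
# 10000 times; each element of the list is the total time of one run
measurements = timeit.repeat('[x**0.5 for x in range(1000)]', number=10000)
print(measurements)
print("best per-loop time:", min(measurements) / 10000)
```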

In the above, each run executes the statement 10,000 times; the result is as follows. You can see the result of roughly 98 usec per loop in the best run:

The profile module

Focusing on a statement or two for performance is a microscopic perspective. Chances are, we have a long program and want to see what is causing it to run slow. That happens before we can consider alternative statements or algorithms.

A program running slow can generally be due to two reasons: a part is running slow, or a part is running too many times, adding up to take too much time. We call these “performance hogs” the hot spot. Let’s look at an example. Consider the following program that uses a hill-climbing algorithm to find hyperparameters for a perceptron model:
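The original listing is not reproduced here. As a stand-in with the same shape — an objective() function evaluated repeatedly by a hillclimbing() driver — the following dependency-free sketch trains a tiny perceptron from scratch and hill-climbs over its learning rate. All names and constants below are illustrative only:

```python
import random

def make_data(n=200, seed=1):
    """Generate a toy linearly separable dataset with labels in {-1, +1}."""
    random.seed(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
        X.append((x1, x2))
        y.append(1 if x1 + x2 > 0 else -1)
    return X, y

def train_perceptron(X, y, eta, epochs=20):
    """Classic perceptron update rule with learning rate eta."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(X, y):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
            if pred != target:
                w[0] += eta * target * x1
                w[1] += eta * target * x2
                b += eta * target
    return w, b

def objective(X, y, eta):
    """Training-set accuracy of a perceptron trained with this eta."""
    w, b = train_perceptron(X, y, eta)
    correct = sum(
        1 for (x1, x2), t in zip(X, y)
        if (1 if w[0] * x1 + w[1] * x2 + b > 0 else -1) == t
    )
    return correct / len(y)

def hillclimbing(X, y, n_iter=100, step=0.1):
    """Hill climbing over the learning rate hyperparameter."""
    eta = 0.5
    score = objective(X, y, eta)
    for _ in range(n_iter):
        candidate = max(1e-6, eta + random.gauss(0, step))
        cand_score = objective(X, y, candidate)
        if cand_score >= score:   # keep the candidate if it is no worse
            eta, score = candidate, cand_score
    return eta, score

if __name__ == "__main__":
    X, y = make_data()
    eta, score = hillclimbing(X, y)
    print("best eta=%.4f accuracy=%.3f" % (eta, score))
```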

Assume we saved this program in a file; we can run the profiler in the command line as follows:
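Assuming the script was saved as hillclimb.py (the filename here is hypothetical), the invocation would look like:

```shell
python -m cProfile hillclimb.py
```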

and the output will be the following:

The normal output of the program will be printed first, and then the profiler’s statistics will be printed. From the first row, we see that the function objective() in our program has run 101 times, taking a total of 4.89 seconds. But these 4.89 seconds are mostly spent in the functions it called; the total time spent in the function itself is merely 0.001 second. The functions from dependent modules are also profiled. Hence you see a lot of numpy functions above too.

The above output is long and may not be useful to you, as it can be difficult to tell which function is the hot spot. Indeed we can sort the above output. For example, to see which function is called the most number of times, we can sort by ncalls:
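Using the -s option of cProfile to pick the sort key (filename hypothetical, as before the script name is assumed):

```shell
python -m cProfile -s ncalls hillclimb.py
```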

Its output is as follows, which says the get() function from a Python dict is the most used function (but it only consumed 0.03 seconds in total out of the 5.6 seconds the program took to finish):

The other sort options are as follows:

Sort string   Meaning
calls         Call count
cumulative    Cumulative time
cumtime       Cumulative time
file          File name
filename      File name
module        File name
ncalls        Call count
pcalls        Primitive call count
line          Line number
name          Function name
nfl           Name/file/line
stdname       Standard name
time          Internal time
tottime       Internal time

If the program takes some time to finish, it is not reasonable to run the program many times just to find the profiling result in a different sort order. Indeed, we can save the profiler’s statistics for further processing, as follows:
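Using the -o option to write the statistics to a file (both hillclimb.py and hillclimb.stats are hypothetical names):

```shell
python -m cProfile -o hillclimb.stats hillclimb.py
```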

Similar to above, this will run the program. But it will not print the statistics to the screen; instead, it saves them into a file. Afterwards, we can use the pstats module like the following to open up the statistics file, which provides us a prompt to manipulate the data:
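For example, launching the interactive pstats browser on the saved file (hillclimb.stats is a hypothetical filename):

```shell
python -m pstats hillclimb.stats
```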

For example, we can use the sort command to change the sort order and use stats to print what we saw above:
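A sketch of such a session; here the commands are piped in rather than typed at the prompt, and hillclimb.stats is a hypothetical filename:

```shell
printf 'sort ncalls\nstats objective\nquit\n' | python -m pstats hillclimb.stats
```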

You will notice that the stats command above allows us to provide an extra argument. The argument can be a regular expression to search for functions, so that only those matched will be printed. Hence it is a way to provide a search string to filter.

This pstats browser allows us to see more than just the table above. The callers and callees commands show us which function calls which function, how many times it is called, and how much time it spent. Hence we can consider that as a breakdown of the function-level statistics. It is useful if you have a lot of functions that call one another and want to know how the time is spent in different scenarios. For example, this shows that the objective() function is called only by the hillclimbing() function, but the hillclimbing() function calls several other functions:
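A sketch of the corresponding commands, piped in rather than typed interactively (hillclimb.stats is a hypothetical filename):

```shell
printf 'callers objective\ncallees hillclimbing\nquit\n' | python -m pstats hillclimb.stats
```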

Using profiler inside code

The above example assumes you have the complete program saved in a file and profile the entire program. Sometimes, we focus on only a part of the entire program. For example, if we load a large module, it takes time to bootstrap, and we want to ignore this in profiling. In this case, we can invoke the profiler only for certain lines. An example is as follows, modified from the program above:
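A minimal sketch of the idea, using a toy objective() in place of the full program: create a cProfile.Profile object and enable/disable it around only the lines of interest, then print the collected statistics with pstats:

```python
import cProfile
import pstats
import random

def objective(x):
    # Toy objective used only for illustration
    return -(x - 0.3) ** 2

profiler = cProfile.Profile()
profiler.enable()                       # start collecting stats here
best = max(objective(random.random()) for _ in range(1000))
profiler.disable()                      # stop collecting; setup/teardown is excluded

stats = pstats.Stats(profiler)
stats.sort_stats("ncalls").print_stats(5)
```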

It will output the following:


Caveats

Using the profiler with Tensorflow models may not produce what you would expect, especially if you have written your own custom layer or custom function for the model. If you did it correctly, Tensorflow is supposed to build the computation graph before your model is executed, and hence the logic will be changed. The profiler output will therefore not show your custom classes.

Similarly for some advanced modules that involve binary code. The profiler can see that you called some functions and mark them as “built-in” methods, but it cannot go any further into the compiled code.

Below is a short code of the LeNet5 model for the MNIST classification problem. If you try to profile it and print the top 15 rows, you will see that a wrapper is occupying the majority of the time, and nothing can be shown beyond that:

In the result below, TFE_Py_Execute is marked as a “built-in” method, and it consumes 30.1 sec out of the total run time of 39.6 sec. Note that the tottime is the same as the cumtime, meaning that from the profiler’s perspective, it seems all time is spent in this function and it does not call any other functions. This illustrates the limitation of Python’s profiler.

Finally, Python’s profiler gives you only the statistics on time but not memory usage. You may need to look for another library or tool for that purpose.

Further Readings

The standard library modules timeit, cProfile, and pstats have their documentation in Python’s documentation:

The standard library’s profiler is very powerful but not the only one. If you want something more visual, you can try out the Python Call Graph module. It can produce a picture of how functions call each other using the GraphViz tool:

The limitation of not being able to dig into compiled code can be overcome by using a profiler for compiled programs instead of Python’s profiler. My favorite is Valgrind:

but to use it, you may need to recompile your Python interpreter to turn on debugging support.


Summary

In this tutorial, we learned what a profiler is and what it can do. Specifically,

  • We know how to compare small code with the timeit module
  • We saw that Python’s cProfile module can provide detailed statistics on how time is spent
  • We learned to use the pstats module against the output of cProfile to sort or filter the results


