Why pylint rule `W1203` is important

cover

If you have been using pylint tool to check the format of your Python code you may have encounterd the rule W1203: logging-fstring-interpolation but you didn't really understand why it's important.

The warning looks like :

path/to/my/file.py:37:4: W1203: Use lazy % formatting in logging functions (logging-fstring-interpolation)

It appears when you are using the Python logging library. On the pylint documentation about this rule it is said that you shouldn't use f-string formated strings.

The exemple given is :

Problematic code:

import logging
import sys

logging.error(f'Python version: {sys.version}')  # [logging-fstring-interpolation]

Correct code:

import logging
import sys

logging.error('Python version: %s', sys.version)

But why using f-string string in the logging context is a bad idea an can a huge impact on the performance of your application ??

The logging library

The Python logging library is the standard library to implement logging in an application or a script. It replace the traditional print statement. It allow to have an unified log output format and select multiple level of verbosity on the fly.

Using the logging library you display logs (on the stdout or in a file) with multiple levels like : info, warning, error, debug, see more. You can then set the logging level of the logger to one of them.

For example if you decided that you want only usefull logs when running in production, you will set the logging level to error. Only the logs with level error or critical will be printed. But if you are in development environment, you may want to be in the debug logging level to show all possible logs.

What is the problem then ?

Full example

In a production environment, if you have a logs of type debug that call other functions, thoses will interpolated even if it will never be printed.

For exemple you have a simple code :

import logging
import sys
from datetime import datetime

logger = logging.getLogger()
handler = logging.StreamHandler(sys.stdout)
logger.addHandler(handler)

# In production
logger.setLevel("INFO")
# In development
# logger.setLevel("DEBUG")

def handle_incoming_message(message):
  logger.debug(f"[{datetime.now()}] New message received : '{message}'")
  return len(message)

if __name__ == "__main__":
  total = 0
  for i in range(10000000):
    total += handle_incoming_message("my custom message")

  logger.info(f"The total is : {total}")

You call a function handle_incoming_message on each incoming message received by your app. You count the number of letters in the message an return it. Finally you count the total of letters in all messages.

In development environement, you want to have a debug output each time you receive a message (for debuging purpose). But you don't care about this log when running in production.

Runnig this code in development took 116,40s with 10000001 log output

Runnig this code in production took 16,99s with only on log output (the last one)

Can we optimize the script duration in production ?

Changing the line :

logger.debug(f"[{datetime.now()}] New message received : '{message}'")

By :

logger.debug("[%s] New message received : '%s'", datetime.now(), message)

Should produce the exact same output but in less time.

Runnig this code in development took 112,81s with 10000001 log output

Runnig this code in production took 8,54s with only on log output (the last one)

How did we save time ?

How can just changing the format of the log string we can run the same script in half the time ?

In the first solution :

logger.debug(f"[{datetime.now()}] New message received : '{message}'")

Python will firstly call the string format function to render the string, then call the logger.debug

In the second example :

logger.debug("[%s] New message received : '%s'", datetime.now(), message)

We directly call the logger.debug with multiple argument. In the algorithm of the logging function, the string format function will be called with all the argument passed to the function only if the log should be showed to the user (depending of the logging level).

By chosing the second code over the other save us one function call and then time. In the example, the time consuming call is datetime.now() but you have to imagin that this call could may be another more time ungry call in your application !

Conclusion

By using bulting logging library string formating capabilities, you can save a lot of time and make your application or script more fast and responsive.

Thank's pylint for the warning.