Run Cron Jobs in Python API

Deepjyoti Barman @deepjyoti30
Nov 13, 2023 7:08 AM UTC

Scheduling is a useful way to automate tasks, and it is mostly used for housekeeping. Running a task at a regular interval is something that almost all languages support widely. Back when I worked with Golang, I used a very nicely written library called cron that does exactly what it sounds like: it runs cron jobs.

In the recent past, I was building an API where I found the need to schedule a few functions at regular intervals. I decided that I will run a particular function on a daily basis. As the API was written in Python, I had to figure out how to achieve this in Python. I assumed that there was going to be a library similar to cron in Go, so I started looking. Within a few minutes of looking, I came across schedule. As the name suggests, it helps with scheduling tasks in Python.

The problem

My requirement was to run the functions in the background at a certain interval while the rest of the script keeps running on its own (e.g. while the API router is serving requests). So I quickly added the package to my source code and wrote a few lines of code:

import schedule

def say_hello():
    print("Just saying hello!")

if __name__ == "__main__":
    schedule.every().day.at("00:00").do(say_hello)

In the above code, there is a function named say_hello which just prints out a line when run. The schedule library is used to run this function at midnight every day. My understanding was that the schedule library would take care of running the function every day at midnight, so I quickly deployed the changes (to staging, of course :-p).

After a week I decided to check back on the API. To my disbelief, the cron job had not run at all. I scratched my head, thinking that the function (say_hello) was the one that had failed to do its job rather than the library.

I started looking through the docs of the schedule package until I came across a call to run_pending. The library expects run_pending to be called in order to run all the jobs that are due. This means that registering a job with schedule is not enough on its own.

The solution

As the library won't run the jobs on its own, we need to handle that separately. My solution was to call run_pending at a regular interval, but I also wanted to make sure that this would happen in the background and not conflict with anything that's already running.

So I decided to use a thread that calls the run_pending function every 5 seconds. Here's how it's done.

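import logging
from threading import Thread
from time import sleep

import schedule

logger = logging.getLogger(__name__)

def run_pending_scheduled_jobs():
    # Poll forever: run_pending() executes every job that is currently due
    while True:
        logger.debug("Running pending scheduled jobs")
        schedule.run_pending()
        sleep(5)

if __name__ == "__main__":
    # Add the scheduled job first
    # ...
    # Finally, start the background job
    thread = Thread(target=run_pending_scheduled_jobs)
    # Daemon threads do not keep the process alive on exit
    thread.daemon = True
    thread.start()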

In the above code, the run_pending_scheduled_jobs function runs an infinite loop that calls the run_pending function from the schedule library and then sleeps for 5 seconds. The function is run in a thread, which is marked as a daemon before it is started.

PS: It's important to daemonise the thread so that when the API exits, the thread doesn't block the exit flow.
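If the loop ever needs to stop cleanly instead of being cut off at exit, one possible variation is to replace the plain sleep with a threading.Event. This is only a sketch using the standard library: the names stop_event, ticks and poll_until_stopped are made up for illustration, the intervals are shortened so it finishes quickly, and appending to ticks stands in for the call to schedule.run_pending().

```python
import threading
import time

stop_event = threading.Event()
ticks = []


def poll_until_stopped(interval_seconds):
    # Event.wait() returns False on timeout and True once the event is set,
    # so it acts as both the sleep and the exit check.
    while not stop_event.wait(interval_seconds):
        ticks.append(1)  # stand-in for schedule.run_pending()


worker = threading.Thread(target=poll_until_stopped, args=(0.05,), daemon=True)
worker.start()

# ... the API would be serving requests here ...
time.sleep(0.3)

# Ask the loop to finish and wait briefly for the thread to exit.
stop_event.set()
worker.join(timeout=1)
```

The thread is still a daemon, so it will not block the exit even if the stop signal is never sent.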

Why is this not the perfect solution?

This solution works nicely, but it is not perfect: a job scheduled for midnight may not run at exactly midnight. In the above example, the run_pending function is called every 5 seconds, and each call only runs the jobs that are already due, that is, jobs scheduled for the current time or earlier. Because the server can restart at any time, those 5-second checks are not aligned with the clock, so a job that becomes due between two checks has to wait for the next one.

This means that, even though the check runs every 5 seconds, the best case scenario is that the say_hello function runs exactly at midnight (which is good), while the worst case scenario is that it runs about 5 seconds after midnight. Of course, one solution is to call run_pending every second instead, but that is only recommended if the scheduled function really has to run to the dot.

In most cases, a delay of 5 seconds (worst case) for a scheduled function is not a big deal and can be ignored.
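The size of that delay is simple arithmetic, sketched below with a hypothetical helper named polling_delay. It assumes the checks happen at 0, 5, 10, ... seconds after the thread starts and that the jobs themselves take no noticeable time to run; in practice the loop drifts slightly because each pass also takes a little time.

```python
import math


def polling_delay(due_offset, poll_interval):
    """Seconds a job waits between becoming due and the first check that sees it.

    due_offset: seconds from the first check (t = 0) until the job is due.
    poll_interval: seconds between two consecutive checks.
    """
    # Number of intervals until the first check at or after the due time.
    checks_needed = math.ceil(due_offset / poll_interval)
    return checks_needed * poll_interval - due_offset


# Due exactly on a check: no delay.
print(polling_delay(10, 5))
# Due one second after a check: waits for the next one.
print(polling_delay(11, 5))
# Same job with a one-second interval: no delay for whole-second due times.
print(polling_delay(11, 1))
```

Under these assumptions the delay is always shorter than the polling interval, which is why shortening the interval is the only knob for tighter timing with this approach.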
