Run Cron Jobs in Python API
Run scheduled tasks in a Python API to get cron-job-like functionality without obstructing the API flow
Scheduling is very useful for automating certain tasks. It is mostly used for housekeeping, and running a task at a regular interval is something that almost all languages support well. From my days of working with Go, I remember a very nicely written library called cron that does exactly what it sounds like: it runs cron jobs.
Recently, I was building an API where I needed to run a few functions at regular intervals. In particular, I decided to run one function on a daily basis. As the API was written in Python, I had to figure out how to achieve this in Python. I assumed there would be a library similar to cron in Go, so I started looking. Within a few minutes, I came across schedule. As the name suggests, it helps with scheduling tasks in Python.
My requirement was to run the functions in the background at a certain interval while the main program keeps running (e.g., while the API router is serving requests). So I quickly added the package to my project and wrote a few lines of code:
    import schedule

    def say_hello():
        print("Just saying hello!")

    if __name__ == "__main__":
        schedule.every().day.at("00:00").do(say_hello)
In the above code, there is a function named say_hello which just prints a line when run. The schedule library is used to run this function at midnight every day. My understanding was that the schedule library would take care of running the function every day at midnight, so I quickly deployed the changes (to staging, of course :-p).
After a week, I decided to check back on the API. To my disbelief, the cron job had not run at all. I scratched my head, thinking that the function (say_hello) might be the one that had failed to do its job, rather than the library.
I started looking through the docs of the schedule package until I came across one particular line of code. The library expects a function named run_pending to be called in order to run all the pending jobs. This means that just adding the scheduled job is not enough.
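To make the idea concrete, here is a toy model of a polling scheduler. It is only a sketch of the concept and not the schedule library's actual code: the ToyScheduler class, its every method and the explicit now argument are all made up for illustration. It shows that registering a job only records when it is due, and that nothing runs until something calls run_pending.

```python
from datetime import datetime, timedelta


class ToyScheduler:
    """Toy model of a polling scheduler; NOT the schedule library's real code."""

    def __init__(self):
        self.jobs = []  # each entry is [next_run, interval, func]

    def every(self, interval, func, now):
        # Registering only records when the job is next due; nothing runs here.
        self.jobs.append([now + interval, interval, func])

    def run_pending(self, now):
        # Jobs run only when this is called and their due time has arrived.
        for job in self.jobs:
            next_run, interval, func = job
            if next_run <= now:
                func()
                job[0] = now + interval


calls = []
start = datetime(2024, 1, 1, 23, 59, 58)  # arbitrary example time
scheduler = ToyScheduler()
scheduler.every(timedelta(seconds=2), lambda: calls.append("hello"), now=start)

# The job is registered, but nobody has polled yet.
ran_without_polling = len(calls)

scheduler.run_pending(now=start + timedelta(seconds=1))  # polled, not due yet
ran_before_due = len(calls)

scheduler.run_pending(now=start + timedelta(seconds=3))  # polled and due
ran_after_due = len(calls)

print(ran_without_polling, ran_before_due, ran_after_due)  # prints: 0 0 1
```

My first attempt was stuck in the first state forever: the job was registered, but nothing ever polled for it.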
As the library won't run the job on its own, we need to handle that separately. My solution was to call run_pending at an interval, but I also wanted to make sure that this would happen in the background and not conflict with anything that's already running.
So I decided to use a thread to call the run_pending function every 5 seconds. Here's how it's done.
    import logging
    from threading import Thread
    from time import sleep

    import schedule

    logger = logging.getLogger(__name__)

    def run_pending_scheduled_jobs():
        # Poll forever: run whatever is due, then wait 5 seconds
        while True:
            logger.debug("Running pending scheduled jobs")
            schedule.run_pending()
            sleep(5)

    if __name__ == "__main__":
        # Add the scheduled job first
        # ...

        # Finally, start the background job
        thread = Thread(target=run_pending_scheduled_jobs)
        thread.daemon = True
        thread.start()
In the above code, the run_pending_scheduled_jobs function runs an infinite loop which calls the run_pending function from the schedule library and then sleeps for 5 seconds. This function is used as the target of a thread, which is daemonised and then started.
PS: It's important to daemonise the thread so that when the API exits, the thread doesn't block the exit flow.
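If you would rather stop the loop explicitly than rely only on the daemon flag, one possible variation is to use a threading.Event as an interruptible sleep. This is a sketch under my own assumptions: start_background_loop and tick are hypothetical names, and tick stands in for schedule.run_pending so that the snippet runs without any third-party package. Note that this variant waits first and then ticks, which is the opposite order to the loop above.

```python
import threading


def start_background_loop(tick, interval_seconds):
    """Call tick() every interval_seconds on a daemon thread until stopped."""
    stop_event = threading.Event()

    def loop():
        # Event.wait works as an interruptible sleep: it returns True as soon
        # as stop_event is set, which ends the loop.
        while not stop_event.wait(interval_seconds):
            tick()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop_event, thread


ticks = []
# Hypothetical stand-in job; in the API this would be schedule.run_pending
# with an interval of 5 seconds.
stop_event, thread = start_background_loop(lambda: ticks.append(1), 0.01)

# Stand-in for the API doing its work in the main thread.
threading.Event().wait(0.2)

# Ask the loop to stop and wait for the thread to finish.
stop_event.set()
thread.join(timeout=1)
print(len(ticks) > 0, thread.is_alive())
```

The thread is still a daemon, so the exit flow is not blocked even if the stop is never requested. The event just gives the loop a clean way to finish between two polls.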
This solution works nicely, but it is not perfect: a job scheduled for midnight may not run at exactly midnight. In the above example, I am calling the run_pending function every 5 seconds. The server can restart at any time, so the checks are not aligned to the clock, and the run_pending function only runs the scheduled tasks whose run time is the current time or has already passed.
This means that, with a check every 5 seconds, the best-case scenario is that the say_hello function runs exactly at midnight (which is good), while the worst-case scenario is that it runs about 5 seconds after midnight. Of course, one solution is to call run_pending every second, but that is only recommended if the scheduled function really needs to run on the dot.
In most cases, a delay of 5 seconds (worst case) for a scheduled function is not a big deal and can be ignored.
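The arithmetic behind that worst case can be checked with a small calculation. This is a simplified sketch: delay_after_due is a hypothetical helper, times are plain seconds, and it ignores how long the job itself and the other jobs take to run. It computes how long after the due time the first check happens, for every whole-second point at which the loop could have started.

```python
def delay_after_due(first_poll_offset, interval, due):
    """Seconds between the due time and the first poll at or after it.

    Polls happen at first_poll_offset, first_poll_offset + interval, and so on.
    """
    if due <= first_poll_offset:
        return first_poll_offset - due
    # Number of whole intervals needed to reach or pass the due time.
    elapsed = due - first_poll_offset
    polls_needed = -(-elapsed // interval)  # ceiling division
    return first_poll_offset + polls_needed * interval - due


interval = 5
due = 86_400  # hypothetical: midnight, in seconds after some reference point

# Try every whole-second start offset within one polling interval.
delays = [delay_after_due(offset, interval, due) for offset in range(interval)]
print(delays)  # prints: [0, 1, 2, 3, 4]
```

Whatever the start offset, the delay stays below the polling interval, so shrinking the interval is the knob that tightens the worst case.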