Run Cron Jobs in Python API
Run scheduled tasks in a Python API to get cron-job-like functionality without obstructing the API flow
Scheduling is a useful way to automate certain tasks. It is mostly used for housekeeping, and running a task at a regular interval is something that almost all languages support widely. From my days of working with Golang, I remember a very nicely written library called cron that does exactly what it sounds like: it runs cron jobs.
Recently, I was building an API where I needed to run a few functions at regular intervals, starting with one particular function that had to run daily. As the API was written in Python, I had to figure out how to achieve this in Python. I assumed there would be a library similar to cron in Go, so I started looking. Within a few minutes, I came across schedule. As the name suggests, it helps with scheduling tasks in Python.
My requirement was to run the functions in the background at a certain interval while the script carries on with its own work (e.g. the API router is running). So I quickly added the package to my project and wrote a few lines of code:
import schedule


def say_hello():
    print("Just saying hello!")


if __name__ == "__main__":
    schedule.every().day.at("00:00").do(say_hello)
In the above code, there is a function named say_hello which just prints a line when run. The schedule library is used to run this function at midnight every day. My understanding was that the schedule library would take care of running the function every day at midnight, so I quickly deployed the changes (to staging, of course :-p).
After a week I decided to check back on the API. To my disbelief, the cron job had not run at all. I scratched my head, thinking that the function (say_hello) was the one that had failed to do its job rather than the library.
I started looking through the docs of the schedule package until I came across the relevant line of code. The library expects a function named run_pending to be called in order to run all the pending jobs. This means that adding the scheduled job is not enough on its own.
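To make this register-then-poll model concrete, here is a minimal sketch that uses only the standard library. TinyScheduler and its methods are hypothetical stand-ins written for illustration, not the schedule library's API, and the clock is passed in explicitly so the behaviour is easy to follow. The point is only that registering a job records when it is due, and nothing runs until something calls run_pending.

```python
from datetime import datetime, timedelta


class TinyScheduler:
    """Hypothetical stand-in that mimics a register-then-poll scheduler."""

    def __init__(self):
        self.jobs = []  # each entry is [next_run, interval, func]

    def every(self, interval, func, now):
        # Registering only records when the job is next due; it runs nothing.
        self.jobs.append([now + interval, interval, func])

    def run_pending(self, now):
        # Run every job whose due time has passed, then reschedule it.
        for job in self.jobs:
            next_run, interval, func = job
            if next_run <= now:
                func()
                job[0] = now + interval


calls = []
scheduler = TinyScheduler()
start = datetime(2024, 1, 1, 23, 59, 58)
scheduler.every(timedelta(seconds=2), lambda: calls.append("hello"), now=start)

# The job is registered, but without a call to run_pending it never fires.
calls_before_polling = len(calls)

# The first poll at or after the due time runs the job.
scheduler.run_pending(now=start + timedelta(seconds=5))
calls_after_polling = len(calls)
```

This mirrors what happened on staging: the job was registered at startup, but no code ever polled for due jobs.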
As the library won't run the job on its own, we need to handle that separately. My solution was to call run_pending at an interval, but I also wanted to make sure that this would run in the background and not conflict with anything that's already running.
So I decided to use a thread to call the run_pending function every 5 seconds. Here's how it's done.
import logging
from threading import Thread
from time import sleep

import schedule

logger = logging.getLogger(__name__)


def run_pending_scheduled_jobs():
    # Poll forever; run_pending() runs any job that is due.
    while True:
        logger.debug("Running pending scheduled jobs")
        schedule.run_pending()
        sleep(5)


if __name__ == "__main__":
    # Add the scheduled job first
    # ...
    # Finally, start the background job
    thread = Thread(target=run_pending_scheduled_jobs)
    thread.daemon = True
    thread.start()
In the above code, the run_pending_scheduled_jobs function runs an infinite loop which calls the run_pending function from the schedule library and then sleeps for 5 seconds. This function is the target of a thread, which is marked as a daemon and then started.
PS: It's important to daemonise the thread so that when the API exits, the thread doesn't block the exit flow.
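If a clean shutdown matters, one possible variation is to replace sleep with a threading.Event so the loop can be asked to stop. This is a sketch under assumptions rather than part of the setup above: start_poller and its parameters are hypothetical names, and run_pending is passed in as a plain callable so the example runs without the schedule package. In the API it would be schedule.run_pending.

```python
import threading


def start_poller(run_pending, interval_seconds):
    """Call run_pending repeatedly in a daemon thread until stopped.

    Hypothetical helper: returns (thread, stop_event).
    """
    stop_event = threading.Event()

    def loop():
        while not stop_event.is_set():
            run_pending()
            # wait() returns early once stop_event is set, unlike sleep().
            stop_event.wait(interval_seconds)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop_event


ticks = []
first_tick = threading.Event()


def fake_run_pending():
    # Stand-in for schedule.run_pending, used only for this demonstration.
    ticks.append(1)
    first_tick.set()


thread, stop_event = start_poller(fake_run_pending, interval_seconds=0.01)
first_tick.wait(timeout=2)  # wait for at least one poll
stop_event.set()            # ask the loop to finish
thread.join(timeout=2)
```

The thread is still a daemon, so it would not block the exit flow even if nobody set the event.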
This solution works nicely, but it is not perfect: a job scheduled for midnight may not run at exactly midnight. In the above example, I call the run_pending function every 5 seconds. The server can restart at any time, so those 5-second polls are not aligned with the clock, and run_pending only runs jobs whose scheduled time is the current time or has already passed.
This means that, even though the loop polls every 5 seconds, the best case scenario is that the say_hello function runs exactly at midnight (which is good) but the worst case scenario is that it runs about 5 seconds after midnight. Of course, one solution is to poll every second inside run_pending_scheduled_jobs, but that is only recommended if the scheduled function really has to run to the dot.
In most cases, a delay of 5 seconds (worst case) for a scheduled function is not a big deal and can be ignored.
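The worst-case figure follows from simple arithmetic: the job fires on the first poll at or after its due time, so the delay depends on where the polling loop happens to be when midnight arrives. Here is a small sketch of that calculation. delay_after_due is a hypothetical helper, it assumes the loop starts before the due time, and it ignores the time that each poll and the job itself take.

```python
import math
from datetime import datetime, timedelta


def delay_after_due(loop_start, due, interval_seconds):
    """Return how long after `due` the first poll at or after `due` happens.

    Assumes polls occur at loop_start, loop_start + interval, and so on,
    that loop_start is not later than due, and that polls take no time.
    """
    elapsed = (due - loop_start).total_seconds()
    polls_needed = math.ceil(elapsed / interval_seconds)
    first_poll = loop_start + timedelta(seconds=polls_needed * interval_seconds)
    return first_poll - due


midnight = datetime(2024, 1, 2, 0, 0, 0)

# Loop started on a 5-second boundary: a poll lands exactly at midnight.
best = delay_after_due(datetime(2024, 1, 1, 23, 0, 0), midnight, 5)

# Loop started 4 seconds past a boundary: polls land at 23:59:59 and
# 00:00:04, so the job runs 4 seconds late.
late = delay_after_due(datetime(2024, 1, 1, 23, 0, 4), midnight, 5)
```

Under these assumptions the delay is always less than the polling interval, which is why shortening the interval to 1 second tightens the bound.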