Get all tables used in join from SQLAlchemy

Deepjyoti Barman @deepjyoti30
Dec 19, 2020 1:27 PM UTC
Post cover

Recently, I was working on one of my projects and I came across this issue of joining two tables. Now the thing is, I mostly stay away from using SQL and prefer to go with NoSQL databases, however this time, the requirement was to go with PostgreSQL so I ended up setting it up. Since the language I used for building backend services is Python (mostly FastAPI), I went with SQLAlchemy in order to connect to the database and get the proper data.

I actually think that it is a great library to connect to the database. I understand it's a bit of a hassle to set it up at the beginning but that set up actually helps a lot. Since, you set up a proper types etc for each column, it's easier to have proper type checking and avoid errors in the long run.

Anyway, to be honest, I am kind of liking using PostgreSQL and since it is the language a lot of the Python developers use so I thought it will be worth trying it out. Now, I don't want to be hating on MySQL but honestly I feel like it is way too old school.

The problem ?

Well, my problem was I needed two tables. One would be a child table and the parent would keep a reference to the child. However, when I make a query I wanted to join the table and get an unified output.

So, I created the two tables in the following way.

Create the child table:

create table child (cid serial primary key, child_name varchar not null);

Create the parent table:

create table parent (pid serial primary key, parent_name varchar not null, child serial);

Once both the tables are created, I added a reference to the child id in the child column for the parent table.

alter table parent add foreign key (child) references child (cid);

Fetching using a join

Now, let's enter some sample data in the tables and try to fetch them the way we want it.

After entering the data, the tables looked like this

test=# select * from parent;
 pid | parent_name | child 
-----+-------------+-------
   1 | Test Parent |     2
(1 row)
test=# select * from child;
 cid | child_name 
-----+------------
   2 | Test Child
(1 row)

Now, as seen above, the parent with id 1 is referencing a child with id 2. If we want to fetch both the above data together, we would do a join. For example, the tables can be joined in the following way.

test=# select * from parent left join child on parent.child = child.cid;
 pid | parent_name | child | cid | child_name 
-----+-------------+-------+-----+------------
   1 | Test Parent |     2 |   2 | Test Child
(1 row)

When we join the tables using a left join on the child id, we get the above output, which is just perfect. However, this is just the command on the SQL console. We need to do the same command in SQLAlchemy and get the same data so that it will be easier for us.

Joining two tables in SQLAlchemy

I tried joining the tables as pointed by a lot of StackOverflow answers. However, that just didn't work.

Now the answer basically says that we can do a join like in the following way:

Assuming you created a Session object from SQLAlchemy which would be named as db.

db.query(Parent).join(Child, Parent.child == Child.cid)

Anyone with a basic understanding of Python would think that the above result would return a Parent object with the Child attributes included since that is what we understand from the table outputs that SQL provides us.

Like, if we run a basic join command, like in the above part, we get one table with both the child and the parent rows included.

However, this is not the case in SQLAlchemy. They actually work a bit different.

How and Why does SQLAlchemy do that?

So, in SQLAlchemy, when the above statement is run, it returns a Parent object with just the parent attributes. Now, I was expecting that they would provide us an object with both the attributes from Parent and Child but they didn't.

Here's a logical solution as to why they do not do that. If we want an object with both the parent and child attributes, that would mean we would get an object of a Class that has both the Parent class and the Child class. In other words, if we do a join in SQLAlchemy, they would have to create a new Class that would inherit from all the table classes that are present in the join.

This behaviour, even though, it would make developers lives easier, would be pretty inefficient. Creating classes on the fly like that (as far as I understand) would be a total mess and would not be an optimized solution for a library that is used by so many people.

The Solution

What is the solution you may ask. Well, it's not that difficult. Since, we understand the problem now which is that SQLAlchemy cannot just create dynamic classes with columns from more than one table, why don't we pass it the tables we want it to work on? That way it will return us all the tables we need and run a proper join on them as well.

So, in our situation, we need two tables from SQLAlchemy, one being the Parent table and the other the child table. So we can do something like the following:

parent, child = db.query(Parent, Child).join(Child, Parent.child == Child.cid)

The above statement returns two objects, one is a Parent object with all the parent columns and the second a Child object with all the child details. Thus we have all the data that we needed and we can just do whatever we want with this data now.

Discussion