Posted by on October 18, 2017

I’ve written a SQL code generator which automates the mindless, repetitive task of joining a table to the tables to which it has foreign keys.

Generating this code isn’t exactly trivial, so I’ve decomposed the problem. I start with the standard information_schema foreign key column query:

create or replace view generator.foreign_key_columns as
select  
     fk.constraint_name as fk_constraint_name
    ,fk.constraint_schema  
    ,fk.table_name as fk_table_name 
    ,fk.column_name as fk_column_name 
    ,fk.ordinal_position as fk_ordinal_position 
    ,pk.constraint_name as referenced_constraint_name 
    ,pk.table_name as referenced_table_name
    ,pk.constraint_schema as referenced_schema_name 
    ,pk.column_name as referenced_column_name 
    ,pk.ordinal_position as referenced_ordinal_position 
from information_schema.referential_constraints as rc 
inner join information_schema.key_column_usage as fk 
    on fk.constraint_catalog = rc.constraint_catalog  
    and fk.constraint_schema = rc.constraint_schema 
    and fk.constraint_name = rc.constraint_name 
inner join information_schema.key_column_usage as pk 
    on pk.constraint_catalog = rc.unique_constraint_catalog  
    and pk.constraint_schema = rc.unique_constraint_schema 
    and pk.constraint_name = rc.unique_constraint_name 
    and pk.ordinal_position = fk.ordinal_position;

We’re going to want to avoid duplicating the join columns in the select list, so the next step is to identify the non-join columns:

create or replace view generator.non_foreign_key_columns as 
select table_schema
	,table_name
    ,column_name
	from information_schema.columns
	where (table_schema, table_name, column_name) not in (
		select constraint_schema
		, fk_table_name
		, fk_column_name
		from foreign_key_columns
	);

Next we group columns so that there is one row per foreign key, with the join columns structured as a join clause and the non-join columns listed.

create or replace view generator.foreign_key_joins as 
with recursive join_columns as (
  select
    constraint_schema as root_schema
  , fk_table_name as root_table_name
  , fk_table_name as fk_table_alias
  , constraint_schema
  , referenced_table_name
  , referenced_schema_name
  , fk_constraint_name
  , referenced_column_name
  , fk_column_name
  , 0 as level
  from generator.foreign_key_columns fks
  union all
  select 
    join_columns.root_schema
  , join_columns.root_table_name
  , join_columns.fk_constraint_name as fk_table_alias
  , fks.constraint_schema
  , fks.referenced_table_name
  , fks.referenced_schema_name
  , fks.fk_constraint_name
  , fks.referenced_column_name
  , fks.fk_column_name
  , level + 1 as level
  from join_columns
  inner join generator.foreign_key_columns fks 
  	on join_columns.referenced_table_name = fks.fk_table_name
  	and join_columns.referenced_schema_name = fks.constraint_schema
)
select root_schema
  , root_table_name
  , fk_table_alias
  , referenced_table_name
  , referenced_schema_name
  , fk_constraint_name
  , 'left outer join "'|| referenced_schema_name ||'"."'|| referenced_table_name ||'" as "'|| fk_constraint_name ||'"
     on '|| string_agg('"'||fk_table_alias ||'"."'||fk_column_name ||'" = "'|| fk_constraint_name ||'"."'||referenced_column_name ||'"','and') as join_sql
  /* get other columns as a sub-select so they don't affect the cardinality of the view */
     ,(
		select string_agg('"'||fk_constraint_name||'"."'||non_foreign_key_columns.column_name||'" as "'||fk_constraint_name||'_'||non_foreign_key_columns.column_name||'"',chr(10)||',')  
		from generator.non_foreign_key_columns 
		where non_foreign_key_columns.table_schema = referenced_schema_name
		and non_foreign_key_columns.table_name = referenced_table_name
	) as non_fk_columns
from join_columns
group by root_schema
  , root_table_name
  , fk_table_alias
  , referenced_table_name
  , referenced_schema_name
  , fk_constraint_name;

This query is more complex than I like, but it is the best that I can do.

  1. There is a recursive Common Table Expression to find all of the remote ancestors of a table.
  2. The foreign key columns are aggregated into a SQL string after the CTE.
  3. The join is left outer so that the database can optimise the query and prune tables whose columns aren’t referenced.
  4. The tables are aliased in case the same table is referenced more than once. You can remove the table
    aliases if that doesn’t occur in your database.
  5. The column list is sub-select in the select clause so that it doesn’t modify the number of rows returned.
  6. The columns are aliased in case their names aren’t unique, or if the same table is referenced more than once. You can remove the column
    aliases if neither of those apply
  7. If there is a cycle of foreign keys in your database, where A references B which then references A, then the this view might not be what you expect.

With the joins identified creating the views is just another aggregation:

create or replace view generator.ancestor_views as 
select root_schema
root_table_name
,'create or replace view "'||root_schema ||'"."'||root_table_name ||'_av" as 
	select ' || (
		select string_agg('"'||root_table_name||'"."'||root_columns.column_name||'"',chr(10)||',') 
		from generator.non_foreign_key_columns root_columns
		where root_columns.table_schema = root_schema
		and root_columns.table_name = root_table_name
		)
	||','|| string_agg(non_fk_columns,chr(10)||',') || '
	from "'||root_schema ||'"."'||root_table_name ||'" 
	'|| string_agg(join_sql,chr(10)) as view_sql
from generator.foreign_key_joins
group by root_schema
, root_table_name;

I call these views, ancestor views. The _av suffix should be adjusted to taste. Having a special schema with an _av suffix also works well, if you’re not already overloaded with schemas for things like concurrent releases.

Once we have generated these views, we can write SQL at a higher level of abstraction. Our code can be concise and limited to our intent, rather than being cluttered by the joins. Queries like the following can be written without being cluttered by dozens of lines of code joining many tables.

SELECT customer_name, email_address
FROM order_lines_av
WHERE product_recall_flag = true

Give this a try. I hope you like it.

PS – all of the example code is for postgreSQL 9.6. Converting it to other databases that support information_schema (e.g. MS SQL Server) is simple, mostly a matter of converting the string_agg to whatever your database supports. For non-information schema databases (e.g. Oracle) you’ll need to use the proprietary data dictionary (e.g. all_cons_columns ).

Posted in: Technical

Comments

Be the first to comment.

Leave a Reply