Menu

Mention it
Digg
Reddit
Newsvine
DEL.ICIO.US

Coded Silicon Product Proposal

How To: Reverse engineer entity relations in a legacy system


System design documents usually contain entity relation diagrams to illustrate the relationship between database entities. If two tables were suitably related one would expect a relationship indicator between them with the appropriate qualification defining the nature of the relationship, e.g.

  • One to one,
  • One to many,
  • One and only one to zero or one,
  • One and only one to zero or many.

So what if the entity relation diagram is not available? In this case most technologist guess the relationship between tables by:

  • Verification of business requirements,
  • Deduction from existing application code,
  • Logical assumptions derived from the visible database structure.

Usually a technologist, analyst programmer or DBA can derive entity relations; however there may be some relations that need to be tested. If there is any doubt about a relationship between to tables in a legacy database, the data in the database will most likely reflect that relationship. I can only use the words “most likely” because the complete qualification of a relationship is hindered by the lack of data. For example, if TableA has a one to many relationship to TableB, but TableB has no rows, then there is no way to verify the relationship by using the data alone. Alternatively if Table B has data and each row in TableA has an associated row in TableB, we may assume a relationship of “One and only one to one or many” but in the intended relationship may be “One and only one to zero or many”.


Taking into account the above limitation, we will move on to show how we can derive the relationship between to tables in a legacy system with sufficient data.


Step 1 : Select fields for the relation

Select the fields in each table that appear to be related.

Usually one can spot the related fields by name.

Examples of a selection for step 1 are:

  • TableA.ItemId – TableB.ItemId
  • TableA.Field1 – TableB.Field2; TableA.Field2 – TableB.Field6

The second example above shows how two fields from TableA may be related to two fields from TableB.


If the intuitive selection were not possible one would have to take an iterative process, i.e. select the first field in each table and if unsuccessful, return to this step and select the next iteration.

Examples of iterative selections for step 1 are:

  • TableA.Field1 – TableB.Field1
  • TableA.Field1 – TableB.Field2
  • TableA.Field1 – TableB.Field3
  • TableA.Field2 – TableB.Field1
  • TableA.Field2 – TableB.Field2
  • TableA.Field2 – TableB.Field3 etc


Step 2 : Calculate the number of records in five domains

Step 2a: Count the number of rows in the first table, e.g.

    select count(*) from TableA
    

Step 2b: Count the number of rows in the second table, e.g.

    select count(*) from TableB
    

Step 2c: Count the number of rows returned with an inner join, e.g.

    select count(*) from TableA, TableB
    where TableB.Field1 = TableA.Field1
    

Here is an alternative SQL syntax:

    select count(*) from TableA
    inner join TableB on TableB.Field1 = TableA.Field1
    

Step 2d: Count the number of rows returned with an outer join to the second table, e.g.

    select count(*) from TableA, TableB
    where TableB.Field1 *= TableA.Field1
    

Here is an alternative SQL syntax:

    select count(*) from TableA
    left outer join TableB on TableB.Field1 = TableA.Field1
    

Step 2e: Count the number of rows returned with an outer join to the first table, e.g.

    select count(*) from TableA, TableB
    where TableB.Field1 =* TableA.Field1
    

Here is an alternative SQL syntax:

    select count(*) from TableB
    left outer join TableA on TableA.Field1 = TableB.Field1
    

Step 3 : Tabulate the results

Before we apply the algorithm, lets summarize the results.

Step

Description

Variable name

2a

Count the number of rows in the first table

CountA

2b

Count the number of rows in the second table

CountB

2c

Count the number of rows returned with an inner join

CountInnerJoin

2d

Count the number of rows returned with an outer join to the second table

CountAOuterB

2e

Count the number of rows returned with an outer join to the first table

CountBOuterA



Step 4 : Apply the algorithm

Derive the cardinality and optionality. This will be illustrated using diagrams.


For a hypothetical relationship between Table A and Table B:

Hypothetical relationship between Table A and Table B


Place the result of each variable in the designated position:

Place the result of each variable in the designated position


Evaluate the cardinality near Table B:

Evaluate the cardinality near Table B


Evaluate the cardinality near Table A:

Evaluate the cardinality near Table A


Evaluate the optionality near Table B:

Evaluate the optionality near Table B


Evaluate the optionality near Table A:

Evaluate the optionality near Table A


This concludes the theory for derivation of the entity relation between two tables.




Example

Here is an example of the evaluation process using an Oracle database containing tables “Order” and “Customer”.

The following Oracle SQL script was used to obtain the calculations:

    DECLARE
	    CountA INT;
	    CountB INT;
	    CountInnerJoin INT;
	    CountAOuterB INT;
	    CountBOuterA INT;
    BEGIN
	    --Calculate CountA
	    SELECT COUNT(*) INTO CountA 
	        FROM Order;
	    DBMS_OUTPUT.put_line('CountA = ' || CountA);

	    --Calculate CountB
	    SELECT COUNT(*) INTO CountB 
	        FROM Customer;
	    DBMS_OUTPUT.put_line('CountB = ' || CountB);

	    --Calculate CountInnerJoin
	    SELECT COUNT(*) INTO CountInnerJoin 
	        FROM Order A 
	        INNER JOIN Customer B 
	            ON B.CustomerId = A.FK_CustomerId;
	    DBMS_OUTPUT.put_line('CountInnerJoin = ' || CountInnerJoin);

	    --Calculate CountAOuterB
	    SELECT COUNT(*) INTO CountAOuterB 
	        FROM Order A 
	        LEFT OUTER JOIN Customer B 
	            ON B.CustomerId = A.FK_CustomerId;
	    DBMS_OUTPUT.put_line('CountAOuterB = ' || CountAOuterB);

	    --Calculate CountBOuterA
	    SELECT COUNT(*) INTO CountBOuterA 
	        FROM Customer B 
	        LEFT OUTER JOIN Order A 
	            ON A.FK_CustomerId = B.CustomerId;
	    DBMS_OUTPUT.put_line('CountBOuterA = ' || CountBOuterA);
    END;
    

The DBMS Output was captured as:

    CountA = 153768
    CountB = 129697
    CountInnerJoin = 153768
    CountAOuterB = 153768
    CountBOuterA = 283462
    

The calculations are written on the diagram:

The calculations are written on the diagram


Then the evaluation algorithm is applied:

Evaluation step 1


Evaluation step 2


Evaluation step 3


Evaluation step 4


This evaluates to "One and only one customer relates to zero or many orders":

Conclusion


Conclusion

Assuming the database contains sufficient data that represents the intended relationship between two tables, the entity relation can be deduced using the above documented algorithm.

Author: Johan Fourie
Date: 14 July 2006

Status: Considering development


Links
HomeProductsCatsBackFuzzyDuckHyperspaceMonkeySeeDownloadsPricesIdeasContact UsAboutDisclaimer

All rights reserved
The page you are viewing was last updated on 10 Nov 2007
Contact Web master