|
Coded Silicon Product Proposal
How To: Reverse engineer entity relations in a legacy system
System design documents usually contain entity relation diagrams to illustrate the
relationship between database entities. If two tables were suitably related one
would expect a relationship indicator between them with the appropriate qualification
defining the nature of the relationship, e.g.
- One to one,
- One to many,
- One and only one to zero or one,
- One and only one to zero or many.
So what if the entity relation diagram is not available? In this case most technologist
guess the relationship between tables by:
- Verification of business requirements,
- Deduction from existing application code,
- Logical assumptions derived from the visible database structure.
Usually a technologist, analyst programmer or DBA can derive entity relations; however
there may be some relations that need to be tested. If there is any doubt about
a relationship between to tables in a legacy database, the data in the database
will most likely reflect that relationship. I can only use the words “most likely”
because the complete qualification of a relationship is hindered by the lack of
data. For example, if TableA has a one to many relationship to TableB, but TableB
has no rows, then there is no way to verify the relationship by using the data alone.
Alternatively if Table B has data and each row in TableA has an associated row in
TableB, we may assume a relationship of “One and only one to one or many” but in
the intended relationship may be “One and only one to zero or many”.
Taking into account the above limitation, we will move on to show how we can derive
the relationship between to tables in a legacy system with sufficient data.
Step 1 : Select fields for the relation
Select the fields in each table that appear to be related.
Usually one can spot the related fields by name.
Examples of a selection for step 1 are:
- TableA.ItemId – TableB.ItemId
- TableA.Field1 – TableB.Field2; TableA.Field2 – TableB.Field6
The second example above shows how two fields from TableA may be related to two
fields from TableB.
If the intuitive selection were not possible one would have to take an iterative
process, i.e. select the first field in each table and if unsuccessful, return to
this step and select the next iteration.
Examples of iterative selections for step 1 are:
- TableA.Field1 – TableB.Field1
- TableA.Field1 – TableB.Field2
- TableA.Field1 – TableB.Field3
- TableA.Field2 – TableB.Field1
- TableA.Field2 – TableB.Field2
- TableA.Field2 – TableB.Field3 etc
Step 2 : Calculate the number of records in five domains
Step 2a: Count the number of rows in the first table, e.g.
select count(*) from TableA
Step 2b: Count the number of rows in the second table, e.g.
select count(*) from TableB
Step 2c: Count the number of rows returned with an inner join, e.g.
select count(*) from TableA, TableB
where TableB.Field1 = TableA.Field1
Here is an alternative SQL syntax:
select count(*) from TableA
inner join TableB on TableB.Field1 = TableA.Field1
Step 2d: Count the number of rows returned with an outer join to the second table,
e.g.
select count(*) from TableA, TableB
where TableB.Field1 *= TableA.Field1
Here is an alternative SQL syntax:
select count(*) from TableA
left outer join TableB on TableB.Field1 = TableA.Field1
Step 2e: Count the number of rows returned with an outer join to the first table,
e.g.
select count(*) from TableA, TableB
where TableB.Field1 =* TableA.Field1
Here is an alternative SQL syntax:
select count(*) from TableB
left outer join TableA on TableA.Field1 = TableB.Field1
Step 3 : Tabulate the results
Before we apply the algorithm, lets summarize the results.
|
Step
|
Description
|
Variable name
|
|
2a
|
Count the number of rows in the first table
|
CountA
|
|
2b
|
Count the number of rows in the second table
|
CountB
|
|
2c
|
Count the number of rows returned with an inner join
|
CountInnerJoin
|
|
2d
|
Count the number of rows returned with an outer join to the second table
|
CountAOuterB
|
|
2e
|
Count the number of rows returned with an outer join to the first table
|
CountBOuterA
|
Step 4 : Apply the algorithm
Derive the cardinality and optionality. This will be illustrated using diagrams.
For a hypothetical relationship between Table A and Table B:
Place the result of each variable in the designated position:
Evaluate the cardinality near Table B:
Evaluate the cardinality near Table A:
Evaluate the optionality near Table B:
Evaluate the optionality near Table A:
This concludes the theory for derivation of the entity relation between two tables.
Example
Here is an example of the evaluation process using an Oracle database containing
tables “Order” and “Customer”.
The following Oracle SQL script was used to obtain the calculations:
DECLARE
CountA INT;
CountB INT;
CountInnerJoin INT;
CountAOuterB INT;
CountBOuterA INT;
BEGIN
--Calculate CountA
SELECT COUNT(*) INTO CountA
FROM Order;
DBMS_OUTPUT.put_line('CountA = ' || CountA);
--Calculate CountB
SELECT COUNT(*) INTO CountB
FROM Customer;
DBMS_OUTPUT.put_line('CountB = ' || CountB);
--Calculate CountInnerJoin
SELECT COUNT(*) INTO CountInnerJoin
FROM Order A
INNER JOIN Customer B
ON B.CustomerId = A.FK_CustomerId;
DBMS_OUTPUT.put_line('CountInnerJoin = ' || CountInnerJoin);
--Calculate CountAOuterB
SELECT COUNT(*) INTO CountAOuterB
FROM Order A
LEFT OUTER JOIN Customer B
ON B.CustomerId = A.FK_CustomerId;
DBMS_OUTPUT.put_line('CountAOuterB = ' || CountAOuterB);
--Calculate CountBOuterA
SELECT COUNT(*) INTO CountBOuterA
FROM Customer B
LEFT OUTER JOIN Order A
ON A.FK_CustomerId = B.CustomerId;
DBMS_OUTPUT.put_line('CountBOuterA = ' || CountBOuterA);
END;
The DBMS Output was captured as:
CountA = 153768
CountB = 129697
CountInnerJoin = 153768
CountAOuterB = 153768
CountBOuterA = 283462
The calculations are written on the diagram:
Then the evaluation algorithm is applied:
This evaluates to "One and only one customer relates to zero or many orders":
Conclusion
Assuming the database contains sufficient data that represents the intended relationship
between two tables, the entity relation can be deduced using the above documented
algorithm.
Author: Johan Fourie
Date: 14 July 2006
Status: Considering development
|