## Sunday, January 13, 2008

### Managing Hierarchical Data (Tree) in Relational Database with JPA

You can't fit an elephant into a matchbox. If hierarchical data model is an elephant then relational database is sure not bigger than a matchbox. I guess if you read this you'd agree that keeping a tree in relational database requires a lot of work. There are at least 2 distinct methods to represent hierarchies (or trees) with relational data: the adjacency list model and the nested set model. I believe that latter is a poor candidate to use with ORM-based technologies due to its set-oriented nature. The adjacency list model is more intuitive to understand and is a good fit for ORM technology. Using both we can introduce some optimization tricks to make our life easier. Below I present an implementation using JPA.

We define a JPA entity Division in accordance with the adjacency list model:

@Entity(name = "Division")
@Table(name = "DIVISION")
@EntityListeners( { HierarchyListener.class } )
public class Division implements IHierarchyElement {
...
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "global_id_gen")
@SequenceGenerator(name = "global_id_gen", sequenceName = "GLOBAL_ID_GEN")
@Column(name = "ID")
public Long getId() {
return id;
}

@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "PARENT_ID")
public Division getParent() {
return parent;
}

@ManyToOne(fetch = FetchType.EAGER)
@JoinColumn(name = "ORG_ID", nullable = false)
public Organization getOrganization() {
return organization;
}

@Basic(optional = false)
@Column(name = "LEVEL", nullable = false)
public Short getLevel() {
return level;
}

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "TOP_ID")
public Division getTop() {
}

public void setLevel(Short theLevel) {
level = theLevel;
}

public void setTop(IHierarchyElement theTop) {
top = (Division) theTop;
}
...
}


for the following table (PostgreSQL flavor):
CREATE TABLE division (
id INTEGER DEFAULT nextval('global_id_gen') PRIMARY KEY,

name VARCHAR(128) NOT NULL,
org_id INTEGER NOT NULL,
parent_id INTEGER,
top_id INTEGER NOT NULL,
level SMALLINT NOT NULL DEFAULT 0,

UNIQUE (name, org_id),

CONSTRAINT div_org_fk FOREIGN KEY(org_id)
REFERENCES organization(id),
CONSTRAINT div_parent_fk FOREIGN KEY(parent_id)
REFERENCES division(id),
CONSTRAINT div_top_fk FOREIGN KEY(top_id)
REFERENCES division(id),

CHECK ( (parent_id IS NULL AND level = 0 AND top_id = id) OR
(parent_id IS NOT NULL AND level > 0 AND top_id <> id)
)
)

The hierarchical data consists of divisions per organization. Organization may have multiple top divisions. That's why there is an ORG_ID that points to organization this division belongs to. Thus, organization entity serves as a top of the tree of divisions. Table DIVISION uses PARENT_ID column to reference its parent division - so PARENT_ID is nullable (in case of top division whose parent is an organization).

Last piece to the puzzle is the purpose of columns LEVEL and TOP_ID. They are our optimization additions to adjacency list model. LEVEL is an integer that stores level of division in hierarchy: top division (with no parent division and parent organization) has level 0; its children have level 1 and so on. TOP_ID always references top-most division in the hierarchy for its division. If division is top-most (its parent is an organization) then it references itself.

What kind of optimization do we get with LEVEL and TOP_ID? They are not necessary for adjacency list model as PARENT_ID is sufficient. But having more information about tree stored with the data we have more flexibility and easier job in manipulating or reporting on hierarchical data. Thus, I can extract whole sub-tree for any top-most division or answer questions about hierarchy depth using LEVEL with no expensive cursors. It could be that you come up with optimization of your own depending on specific needs.

Back to entity definition in Java. The interface IHierarchyElement is all any hierarchical entity needs to implement and support hierarchical data:

public interface IHierarchyElement {

IHierarchyElement getParent();

Short getLevel();

void setLevel(Short level);

IHierarchyElement getTop();

void setTop(IHierarchyElement top);
}


And this is all to it to have an entity support hierarchical model in the relational database. Now we can use one of standard JPA features - callback lifecycle listener class (lookup its usage in the annotations to Division entity above) to maintain data for adjacency list model (for illustration purpose code below is stripped of all error handling):
public class HierarchyListener {

@PrePersist
@PreUpdate
public void setLevelAndTop(IHierarchyElement entity) {

final IHierarchyElement parent = entity.getParent();

// set level
if (parent == null) {
entity.setLevel((short) 0);
} else {
entity.setLevel((short) (parent.getLevel() + 1));
}

// set top
if (parent == null) {
entity.setTop(entity);
} else {
entity.setTop(parent.getTop());
}
}
}


This pattern is a combination of specialized interface IHierarchyElement and a callback HierarchyListener class. Using them our goal is to implicitly maintain hierarchical data for JPA entities. When a hierarchical data is shared between many entities in domain model we can claim that this pattern factors out hierarchical aspect of data to focus on business-related semantic in the domain model.