Sunday, January 13, 2008

Managing Hierarchical Data (Tree) in Relational Database with JPA

You can't fit an elephant into a matchbox. If the hierarchical data model is an elephant, then a relational database is surely no bigger than a matchbox. I guess if you are reading this you'd agree that keeping a tree in a relational database requires a lot of work. There are at least two distinct methods to represent hierarchies (or trees) with relational data: the adjacency list model and the nested set model. I believe the latter is a poor candidate for ORM-based technologies due to its set-oriented nature. The adjacency list model is more intuitive and is a good fit for ORM technology. Combining the adjacency list model with ORM, we can introduce some optimization tricks to make our life easier. Below I present an implementation using JPA.

We define a JPA entity Division in accordance with the adjacency list model:

@Entity(name = "Division")
@Table(name = "DIVISION")
@EntityListeners( { HierarchyListener.class } )
public class Division implements IHierarchyElement {
    ...
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "global_id_gen")
    @SequenceGenerator(name = "global_id_gen", sequenceName = "GLOBAL_ID_GEN")
    @Column(name = "ID")
    public Long getId() {
        return id;
    }

    // Adjacency list link: null for a top division
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "PARENT_ID")
    public Division getParent() {
        return parent;
    }

    // Owning organization: the root of the tree of divisions
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "ORG_ID", nullable = false)
    public Organization getOrganization() {
        return organization;
    }

    // Optimization column: depth in the hierarchy, 0 for a top division
    @Basic(optional = false)
    @Column(name = "LEVEL", nullable = false)
    public Short getLevel() {
        return level;
    }

    // Optimization column: top-most division of this division's hierarchy
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "TOP_ID")
    public Division getTop() {
        return top;
    }

    public void setLevel(Short theLevel) {
        level = theLevel;
    }

    public void setTop(IHierarchyElement theTop) {
        top = (Division) theTop;
    }
    ...
}

for the following table (PostgreSQL flavor):
CREATE TABLE division (
  id INTEGER DEFAULT nextval('global_id_gen') PRIMARY KEY,

  name VARCHAR(128) NOT NULL,
  org_id INTEGER NOT NULL,
  parent_id INTEGER,
  top_id INTEGER NOT NULL,
  level SMALLINT NOT NULL DEFAULT 0,

  UNIQUE (name, org_id),

  CONSTRAINT div_org_fk FOREIGN KEY(org_id)
    REFERENCES organization(id),
  CONSTRAINT div_parent_fk FOREIGN KEY(parent_id)
    REFERENCES division(id),
  CONSTRAINT div_top_fk FOREIGN KEY(top_id)
    REFERENCES division(id),

  CHECK ( (parent_id IS NULL AND level = 0 AND top_id = id) OR
     (parent_id IS NOT NULL AND level > 0 AND top_id <> id)
   )
);

The hierarchical data consists of divisions per organization. An organization may have multiple top divisions, which is why ORG_ID points to the organization a division belongs to. The organization entity thus serves as the root of its tree of divisions. The DIVISION table uses the PARENT_ID column to reference the parent division, so PARENT_ID is nullable (it is NULL for a top division, whose parent is the organization).

The last piece of the puzzle is the purpose of the LEVEL and TOP_ID columns. They are our optimization additions to the adjacency list model. LEVEL is an integer that stores the depth of a division in the hierarchy: a top division (no parent division; its parent is the organization) has level 0, its children have level 1, and so on. TOP_ID always references the top-most division of the hierarchy the division belongs to. A top-most division references itself.

What do we gain from LEVEL and TOP_ID? The adjacency list model does not need them, since PARENT_ID is sufficient. But storing more information about the tree with the data makes manipulating and reporting on hierarchical data more flexible and easier. For example, I can extract the whole sub-tree of any top-most division with a single filter on TOP_ID, or answer questions about hierarchy depth using LEVEL, with no expensive cursors. You may come up with optimizations of your own depending on your specific needs.
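
To make the benefit concrete, here is a minimal in-memory sketch of the two lookups mentioned above. It does not touch a database: DivisionRow and HierarchyQueries are hypothetical names standing in for rows of the DIVISION table, and the SQL in the comments is only an assumption about how the same filter might be written against the table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one DIVISION row (not part of the original code).
class DivisionRow {
    final long id;
    final Long parentId; // null for a top division
    final long topId;    // equals id for a top division
    final int level;     // 0 for a top division
    final String name;

    DivisionRow(long id, Long parentId, long topId, int level, String name) {
        this.id = id;
        this.parentId = parentId;
        this.topId = topId;
        this.level = level;
        this.name = name;
    }
}

class HierarchyQueries {

    // Whole sub-tree of a top division: a flat filter, no recursion over parentId.
    // Roughly: SELECT * FROM division WHERE top_id = ?
    static List<DivisionRow> subTree(List<DivisionRow> rows, long topId) {
        List<DivisionRow> result = new ArrayList<>();
        for (DivisionRow row : rows) {
            if (row.topId == topId) {
                result.add(row);
            }
        }
        return result;
    }

    // Depth of the hierarchy under a top division, or -1 if it has no rows.
    // Roughly: SELECT MAX(level) FROM division WHERE top_id = ?
    static int depth(List<DivisionRow> rows, long topId) {
        int max = -1;
        for (DivisionRow row : rows) {
            if (row.topId == topId && row.level > max) {
                max = row.level;
            }
        }
        return max;
    }
}
```

With PARENT_ID alone, both questions would need a walk down the tree, one level at a time.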

Back to the entity definition in Java. IHierarchyElement is the only interface a hierarchical entity needs to implement to support hierarchical data:

public interface IHierarchyElement {

  IHierarchyElement getParent();

  Short getLevel();

  void setLevel(Short level);

  IHierarchyElement getTop();

  void setTop(IHierarchyElement top);
}

And this is all there is to having an entity support the hierarchical model in a relational database. Now we can use one of the standard JPA features, a lifecycle callback listener class (see its usage in the annotations on the Division entity above), to maintain the data for the adjacency list model (for illustration purposes the code below is stripped of all error handling):
public class HierarchyListener {

    @PrePersist
    @PreUpdate
    public void setLevelAndTop(IHierarchyElement entity) {

      final IHierarchyElement parent = entity.getParent();

      // set level
      if (parent == null) {
          entity.setLevel((short) 0);
      } else {
          entity.setLevel((short) (parent.getLevel() + 1));
      }

      // set top
      if (parent == null) {
          entity.setTop(entity);
      } else {
          entity.setTop(parent.getTop());
      }
    }
}
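
The rule the listener applies can be tried out without a JPA provider. The following is a minimal plain-Java sketch: SimpleNode and LevelAndTop are hypothetical names, and the rule is invoked by hand where the provider would invoke the callback.

```java
// Hypothetical plain-Java node; stands in for an entity implementing IHierarchyElement.
class SimpleNode {
    private final SimpleNode parent; // null for a top node
    private short level;
    private SimpleNode top;

    SimpleNode(SimpleNode parent) {
        this.parent = parent;
    }

    SimpleNode getParent() { return parent; }

    short getLevel() { return level; }

    void setLevel(short level) { this.level = level; }

    SimpleNode getTop() { return top; }

    void setTop(SimpleNode top) { this.top = top; }
}

class LevelAndTop {

    // Same rule as HierarchyListener.setLevelAndTop, called explicitly here.
    static void apply(SimpleNode node) {
        SimpleNode parent = node.getParent();
        if (parent == null) {
            // top node: level 0, references itself
            node.setLevel((short) 0);
            node.setTop(node);
        } else {
            // child node: one level below its parent, same top as its parent
            node.setLevel((short) (parent.getLevel() + 1));
            node.setTop(parent.getTop());
        }
    }
}
```

Note that the rule reads the parent's level and top, so a parent has to be processed before its children. It also updates only the node it is applied to: if a node is moved under a different parent, its descendants keep their old values until they are processed again.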

This pattern is a combination of a specialized interface, IHierarchyElement, and a callback class, HierarchyListener. Together they maintain hierarchical data for JPA entities implicitly. When several entities in the domain model are hierarchical, this pattern factors out the hierarchical aspect of the data so that the domain model can focus on business-related semantics.

Managing OneToMany relationship in JPA

The now-classic book Java Persistence with Hibernate addresses the OneToMany relationship in detail. However, I still struggled to find a way to maintain the list of children from the parent entity. The problem appeared when removing children from the list.

Example:
@Entity
@Table(name = "PARENT")
public class Parent implements Serializable {
    ...
    @OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
               mappedBy = "parent")
    public Set<Child> getChildren() {
        return children;
    }
    ...
}

@Entity
@Table(name = "CHILD")
public class Child implements Serializable {
    ...
    @ManyToOne
    @JoinColumn(name = "PARENTID", nullable = true)
    public Parent getParent() {
        return parent;
    }
    ...
}
The problem: I would like to maintain children by manipulating the set of children held by the parent entity while it is in the detached state. After new children are added and/or some existing children are removed, the parent entity is updated (merged), and the set of children is updated with it.

Note that a child entity may simply be unassigned from its parent and continue to exist with no parent (nullable = true): removing a child from the children list does not remove the child entity from the persistence context. This case is more general than the one where a child is deleted when unassigned from its parent.

Also note that the child class should implement equals() and hashCode() based on values that uniquely identify each instance. Of course, this is simply JPA best practice.
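
As an illustration of that last point, here is a minimal sketch of value-based equality. The fields of the original Child are elided, so KeyedChild and its identifying value "code" are assumptions made for this example only.

```java
// Hypothetical child whose identity is an assumed natural key, "code".
class KeyedChild {
    private final String code;   // identifying value, assumed never null
    private String description;  // ordinary state, not part of identity

    KeyedChild(String code, String description) {
        this.code = code;
        this.description = description;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof KeyedChild)) {
            return false;
        }
        return code.equals(((KeyedChild) other).code);
    }

    @Override
    public int hashCode() {
        // must agree with equals(): based on the same identifying value
        return code.hashCode();
    }
}
```

This matters here because a detached child and its managed counterpart are different Java objects. With the default identity-based equals(), removing a child from a Set using another instance that represents the same row would silently fail.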

The following classic solution by the book handles adding new children flawlessly:
@Entity
@Table(name = "PARENT")
public class Parent implements Serializable {
    ...
    @OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
               mappedBy = "parent")
    public Set<Child> getChildren() {
        return children;
    }

    // Keeps both sides of the relationship in sync
    public void addChild(Child theChild) {
        theChild.setParent(this);
        children.add(theChild);
    }
    ...
}

public class ParentManager {
    ...
    @Transactional(propagation = Propagation.REQUIRED, readOnly = false)
    public void updateParent(Parent parent) {
        parentDAO.update(parent);
    }
    ...
}

But if I remove children from the list, this has no effect on the database: the merge operation on the parent entity (update) will happily restore the removed children. You can search various JPA forums on this topic, e.g.
http://jira.jboss.org/jira/browse/EJBTHREE-941
http://forum.java.sun.com/thread.jspa?threadID=5145294&tstart=210
http://forums.oracle.com/forums/thread.jspa?messageID=1707487

The implementation I propose is not confined to the JPA entity classes: it requires coding at the higher level of the persistence context (where EntityManager is used). This is usually handled at the session bean or business manager level (depending on whether I use EJB3 or a POJO framework):

@Entity
@Table(name = "PARENT")
public class Parent implements Serializable {
    ...
    @OneToMany(cascade = { CascadeType.PERSIST, CascadeType.MERGE },
               mappedBy = "parent")
    public Set<Child> getChildren() {
        return children;
    }

    // Not persisted: remembers children removed while the parent was detached
    @Transient
    public Set<Child> getUnassignedChildren() {
        return unassignedChildren;
    }

    public void addChild(Child theChild) {
        theChild.setParent(this);
        children.add(theChild);
    }

    public void unassignChild(Child theChild) {
        if (children.remove(theChild)) {
            theChild.setParent(null);
            unassignedChildren.add(theChild);
        }
    }
    ...
}
And the updated ParentManager:
public class ParentManager {
    ...
    @Transactional(propagation = Propagation.REQUIRED, readOnly = false)
    public void updateParent(Parent parent) {
        // Persist the cleared parent reference of each unassigned child explicitly:
        // they are no longer in the children set, so the parent's cascade does not reach them
        if (!parent.getUnassignedChildren().isEmpty()) {
            for (Child child : parent.getUnassignedChildren()) {
                childManager.update(child);
            }
        }
        parentDAO.update(parent);
    }
    ...
}
Thus, any change to the set of children made through addChild() and unassignChild() is guaranteed to propagate to the database via a single update of the parent entity - problem solved.
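
The bookkeeping done by addChild() and unassignChild() can be checked in isolation. Below is a minimal plain-Java sketch with no JPA annotations and no database; DemoParent and DemoChild are hypothetical stand-ins for the entities above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Child entity.
class DemoChild {
    private final String name;
    private DemoParent parent;

    DemoChild(String name) { this.name = name; }

    String getName() { return name; }

    DemoParent getParent() { return parent; }

    void setParent(DemoParent parent) { this.parent = parent; }
}

// Hypothetical stand-in for the Parent entity.
class DemoParent {
    private final Set<DemoChild> children = new HashSet<>();
    // children removed since the last update; a manager would update these explicitly
    private final Set<DemoChild> unassignedChildren = new HashSet<>();

    Set<DemoChild> getChildren() { return children; }

    Set<DemoChild> getUnassignedChildren() { return unassignedChildren; }

    void addChild(DemoChild child) {
        child.setParent(this);
        children.add(child);
    }

    void unassignChild(DemoChild child) {
        // only children that actually belonged to this parent are recorded
        if (children.remove(child)) {
            child.setParent(null);
            unassignedChildren.add(child);
        }
    }
}
```

After unassignChild() the child is gone from the children set, its parent reference is null, and it is recorded in the unassigned set, which is exactly what the manager iterates over before merging the parent.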

Friday, January 11, 2008

Java Persistence API (JPA) Acceptance

JPA is arguably one of the best Java enterprise technologies introduced in the last several years. The evolution of entity EJBs is hardly a happy path for other technologies to follow. But the fact of the matter is that it resulted in JPA, which at least partially redeems the previous failures and half-failures. Of course, Gavin King should be credited, at least partially, as a savior of J2EE persistence technology for his widely accepted Hibernate. But Sun had the guts to discard its own ideas and re-position EJB3 and JPA as a best-of-breed industry standard.

At the same time, hardly anyone will dispute that JPA acceptance is not where it should be for such an advanced technology. There are a couple of reasons for this:

1. JPA presents a large shift for EJB persistence. Both programmers who used entity beans before and programmers who discarded entity beans as impractical and stuck to JDBC-based APIs do not jump on JPA for the same reason: it presents a steep learning curve (sorry for the mathematical nonsense) for both camps.

2. JPA is largely based on ORM technologies such as Hibernate, which have a very strong following of their own. It would be hard to convince a Hibernate developer to switch to JPA just because it's a new standard. Hibernate 3 is more flexible and feature-rich. After all, with the introduction of JPA, Hibernate takes the place of legacy code, and legacy code, as we all know, is here to stay.

Thus, adoption of JPA is probably slower than it deserves, but that should not scare away people who are evaluating it for new projects or considering an upgrade of now-legacy ORM implementations. JPA will eventually take the place of JDBC and ORM technologies by making the JDBC stack as invisible tomorrow as TCP/IP is to JDBC clients today.