Monday, February 8, 2010

Is HyperGraphDB an Object-Oriented Database?

Back in the 90s, the presumed "killer" of RDBMSs was the ODBMS. Today it is NoSQL. Why RDBMSs should be prey to be killed, and why any other approach should be a voracious predator rather than a gentle companion, has never been clear to me. Industry fads always look a bit ridiculous in retrospect, but fortunately the technical advances that fuel them at the beginning often follow their own independent paths, eventually contributing their fair share to our beloved profession. Strangely, OO databases are now being categorized as NoSQL. So the old and new predators join forces in a cooperative onslaught. Why not? Whatever it takes to get the crowd's attention. Since HyperGraphDB was announced as a graph database, it fits the NoSQL bill, which is good for promotion. But we've received "criticism" in the past that it is actually more of an OO database than a graph database, so why not simply call it that?

Well, for starters, objects in memory form a graph, so at a certain level of abstraction we are talking about essentially the same thing. But more interestingly, does HyperGraphDB fit the accepted definition of what constitutes an object-oriented database? According to ODBMS.ORG:

"An object database management system (ODBMS, also referred to as object-oriented database management system or OODBMS), is a database management system (DBMS) that supports the modelling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects."

The Object-Oriented Database System Manifesto, first published in 1989, is still the main reference for the core features of an OO database. So let's examine (admittedly, a bit crudely) that paper's defining list and see how it applies to HyperGraphDB:
  1. Complex Objects built from simpler ones by applying constructors to them. HyperGraphDB has that - type constructors are fundamental to representing complex values.
  2. Object Identity: an object has an existence which is independent of its value. This means one can change the value while preserving the identity. The authors note "that identity-based models are the norm in imperative programming languages: each object manipulated in a program has an identity and can be updated. This identity either comes from the name of a variable or from a physical location in memory. But the concept is quite new in pure relational systems, where relations are value-based." HyperGraphDB has identity at its very basis: the atom handle is like a memory location in a universal addressing space. Atom identity in HyperGraphDB is in fact more fundamental than anything else.
  3. Encapsulation, which in a database context is taken to mean "that an object encapsulates both program and data". This is supported via Java. When storing Java objects, the current implementation does not store the program part (the bytecode of a class's methods) because there's no real need for it. Naturally, this wouldn't be hard to achieve with a different set of type constructors that do store the program. In fact, this is something that we plan to do with Seco.
  4. Types and Classes - the system should offer some form of data structuring mechanism, be it classes or types. Thus the classical notion of database schema will be replaced by that of a set of classes or a set of types. The distinction between types & classes comes into play mostly at the Java level (HyperGraphDB's "host language" at the moment). Nevertheless, HyperGraphDB's types cover both the notion of a class as a factory of objects with a well-defined extent and of a type as a semantic notion obeying certain composition rules. The core notion of substitutability is expressed with HGSubsumes links. More extensive checking and enforcement is something left for actual type & type constructor implementations.
  5. Class or Type Hierarchies, with various forms of inheritance being distinguished by the authors: substitution, inclusion, constraint and specialisation. HyperGraphDB has a type hierarchy with multiple inheritance via multiple HGSubsumes links between types, but it doesn't make such fine-grained distinctions between the different kinds. Such distinctions are left open to the application. When mapping Java classes to HyperGraphDB types, the HGSubsumes link created between a class and its parent corresponds to "specialisation inheritance". An HGSubsumes link between a class and an implemented interface may correspond to any or all of the other kinds.
  6. Overriding, overloading and late-binding are notions at the programming language level that usually apply to operations rather than data and as such are supported only to the extent that HyperGraphDB is being used from an OO language (Java). At the data level, we note that an object property is always fully stored, regardless of its declared type. For instance, if a bean has a property of declared type A, but the actual value is of a subclass B, B will be used as the stored type instead of A. So overriding is supported. In addition, HyperGraphDB supports properties with the same name but different types within a single record: one could have a property "x" of type int and a property "x" of type String within the same complex type. So, overloading is supported as well!
  7. Computational Completeness is required, but the authors "are not advocating here that designers of object-oriented database systems design new programming languages: computational completeness can be introduced through a reasonable connection to existing programming languages"... which HyperGraphDB does, again via the JVM.
  8. Extensibility is required in the following sense: there is a means to define new types and there is no distinction in usage between system defined and user defined types. HyperGraphDB's type system is open and extensible from the very high-level type-constructor-constructors...down to the primitive types which could be replaced as well. So this requirement is met with applause.
  9. Persistence should be orthogonal, i.e., each object, independent of its type, is allowed to become persistent as such (i.e., without explicit translation). It should also be implicit: the user should not have to explicitly move or copy data to make it persistent. Yep, check-mark, we've got it.
  10. Secondary storage management with "clear independence between the logical and the physical level of the system". Check-mark here too.
  11. Concurrency - yes.
  12. Recovery - yes, thanks to the very reliable BerkeleyDB.
  13. Ad Hoc Query Facility which lets you express non-trivial queries concisely, is efficient, and is application independent. HyperGraphDB meets that requirement, but not with flying colors at this point. More mature DBs have better querying capabilities and we hope to get there soon.
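
To make points 2, 4 and 5 a bit more concrete, here is a toy, in-memory sketch in plain Java. It is not HyperGraphDB code: the class AtomStoreSketch and its methods add, replace, get, addSubsumes and isSubsumedBy are hypothetical names made up for this illustration, a UUID stands in for an atom handle, and type names are plain strings. It only shows the two ideas: a handle keeps its identity when the value behind it changes, and a type can sit under several more general types through separate subsumption links.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.UUID;

// Toy model only: none of these names are HyperGraphDB classes or methods.
class AtomStoreSketch {
    // Handle -> current value. The handle is the identity; the value may change.
    private final Map<UUID, Object> atoms = new HashMap<>();
    // General type name -> type names it directly subsumes (assumed representation).
    private final Map<String, Set<String>> subsumes = new HashMap<>();

    // Store a value and hand back a fresh, stable handle for it.
    public UUID add(Object value) {
        UUID handle = UUID.randomUUID();
        atoms.put(handle, value);
        return handle;
    }

    // Change the value behind an existing handle; the handle itself is unchanged.
    public void replace(UUID handle, Object newValue) {
        if (!atoms.containsKey(handle)) {
            throw new NoSuchElementException("Unknown handle: " + handle);
        }
        atoms.put(handle, newValue);
    }

    public Object get(UUID handle) {
        return atoms.get(handle);
    }

    // Record that 'general' subsumes 'specific'. A type may have many such parents.
    public void addSubsumes(String general, String specific) {
        subsumes.computeIfAbsent(general, k -> new HashSet<>()).add(specific);
    }

    // True if 'specific' is reachable from 'general' by following subsumption links.
    public boolean isSubsumedBy(String specific, String general) {
        if (specific.equals(general)) {
            return true;
        }
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        pending.push(general);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!visited.add(current)) {
                continue; // already explored
            }
            for (String child : subsumes.getOrDefault(current, Collections.<String>emptySet())) {
                if (child.equals(specific)) {
                    return true;
                }
                pending.push(child);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AtomStoreSketch store = new AtomStoreSketch();

        // Identity: same handle before and after the value is replaced.
        UUID handle = store.add("first value");
        store.replace(handle, "second value");
        System.out.println(store.get(handle));

        // Multiple inheritance: Circle has two parents, UnitCircle inherits both.
        store.addSubsumes("Shape", "Circle");
        store.addSubsumes("Serializable", "Circle");
        store.addSubsumes("Circle", "UnitCircle");
        System.out.println(store.isSubsumedBy("UnitCircle", "Shape"));
        System.out.println(store.isSubsumedBy("UnitCircle", "Serializable"));
        System.out.println(store.isSubsumedBy("Shape", "Circle"));
    }
}
```

Note that the sketch treats all subsumption links alike, which mirrors the point above that finer distinctions between kinds of inheritance are left to the application.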
In conclusion, HyperGraphDB is a full-fledged OO database according to the most official definition.

Cheers,
Boris

PS: Perhaps the most prominent OO database in the Java world these days is db4o. I haven't used it, but skimming through tutorials and docs, I don't see what it can do that HyperGraphDB can't. Their querying options might be better (the native queries are quite an advanced concept), and the optimizer might be more advanced, but besides that, I challenge readers to tell us what HGDB is missing as a competitor in the object-oriented database space.

2 comments:

  1. Whatever happened to JavaSpaces [http://java.sun.com/developer/technicalArticles/tools/JavaSpaces/]? I always thought this was the most genius technology. Create an object that you want, and set the fields that you want as wildcards to null. The JavaSpace repository returns all objects that match that query object's criteria (based on its fields). Very simple... I suppose MongoDB's query mechanism is analogous and more powerful. Still, go JavaSpaces.

  2. It probably didn't take off for the same reason many other Sun Java initiatives failed: sound architecture, but bad implementation. I remember it, and I think GigaSpaces is now doing the same thing and is quite popular. Query by example is indeed a very nice way to retrieve objects, but it has to be complemented with another powerful (and complex) mechanism, and JavaSpaces didn't do that. Hibernate has it, db4o has it, but they both have means to do more advanced queries with predictable results.
