Minerva Persistency


Overview

The Persistency Architecture of Minerva is the same as the existing Athena Persistency Architecture, but without its C++-specific limitations. Persistency to Root files will be implemented via a JDO API built on top of Java RootIO. This makes it easy to switch persistency technologies. Persistifiable objects may be required to implement a PersistenceCapable (or similar) interface; unlike in C++, this serves just as a label and the user is not required to implement any persistency-related functionality. The OID format and the Cataloguing/Bookkeeping (based on a relational DB) should be defined in a language-independent way, ensuring interoperability and exchangeability of data between the C++ and Java worlds.
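
For illustration, a labelled object could look like the following minimal sketch (the interface and class names are only assumptions, the real interface would be fixed by the chosen implementation); the important point is that no persistency code appears in the user class:

    // A hypothetical marker interface: no methods, it only labels the class as persistifiable.
    interface PersistenceCapable extends java.io.Serializable {}

    // An ordinary data class (the name is illustrative) becomes persistifiable just by
    // declaring the label; no converter code or persistency-specific methods are needed.
    public class RawHit implements PersistenceCapable {
        private int    channel;
        private double energy;

        public RawHit(int channel, double energy) {
            this.channel = channel;
            this.energy  = energy;
        }

        public int    getChannel() { return channel; }
        public double getEnergy()  { return energy;  }
    }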

All Converters are based on standard technologies like the Java Streaming API or JDO. They are generic, i.e. the same Converter handles whatever objects it manages (it is, however, possible to customise the transient-persistent mapping, usually via XML files). Each Converter Service is just a kind of Algorithm. Converters can be used not only for the Persistency Service, but also to change the representation of Data or to communicate Data between Processes.
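
A minimal sketch of such a generic Converter is shown below, assuming plain Java serialization as the persistency technology (the class name is illustrative). One instance handles objects of any type, and the produced byte representation can equally well go to a file, a socket or another Process:

    import java.io.*;

    // One generic converter per technology: the Java streaming machinery discovers the
    // object structure itself, so no per-class converter code is needed.
    public class StreamingConverter {

        // Convert any (Serializable) object into its persistent representation.
        public byte[] toPersistent(Object transientObject) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream    oos = new ObjectOutputStream(bos);
            oos.writeObject(transientObject);
            oos.close();
            return bos.toByteArray();
        }

        // Convert a persistent representation back into a transient object.
        public Object toTransient(byte[] persistentForm) throws IOException, ClassNotFoundException {
            ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(persistentForm));
            return ois.readObject();
        }
    }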


Problems with the Athena Persistency

  1. Root is needed to interpret Root persistency and StoreGate is needed to interpret DataObjects. TObjects and DataObjects can't exist outside Root and StoreGate.
  2. Both RootSerialisation and RootCnv have approximately one converter per Object (often created semi-automatically, but greatly expanding the complexity of the system). It is highly non-trivial to find the proper converter in a polymorphic environment without a realistic Reflection facility.
  3. Only Objects manageable by both Root and StoreGate (DataObjects) can be processed. The restrictions are defined not by policy, but by implementation deficiencies.
  4. There are many intermediate components (converters, the Root dictionary, ...) complicating the processing. In theory, everything is automatic; in practice, such a system is very hard to maintain and debug.
  5. No other IO implementation is available for the storage, for example XML or a Relational DB. This means that we are locked into a proprietary solution.
  6. Root Objects are different from DataObjects. We may not like the idea of accessing Root Objects directly, but that is what people will do in reality, as they are much richer than DataObjects.
  7. The problem with DataLinks and Buckets is artificial: it doesn't exist at the level of Root Objects; it is created by RootCnv and then resolved by StoreGate.
  8. All interfaces are defined in a language-specific way.
  9. Converters can only be used to perform simple IO. It is not possible to chain converters, and converters can't be used for communication between processes (which is a generalised IO operation).
  10. StoreGate doesn't have a clearly defined mission; it mixes several tasks.
  11. To implement its functionality, StoreGate often violates the agreed coding rules (use of macros, ...).

Minerva Solution

Most of the mentioned problems can be easily solved by choosing the right technology. The proposed solution maintains file compatibility with the existing solution (and with the RootIO file format itself) and implements the same Architecture.

The key component of the proposed solution is the replacement of the C++ Conversion Service with a Java Streaming Service. Other services, provided poorly or not at all in the current Design, can be satisfied using out-of-the-box packages like InfoBus and the Java Transaction Service (JTA+JTS).

Java streams use a concept similar to C++ iostreams. The user just feeds her objects (data) into a stream or reads them from a stream. There are, however, three important differences between Java streams and C++ iostreams:

  1. Java streams do not use a special syntax, while C++ iostreams are mostly accessed using the << and >> operators. Java streams are represented by ordinary objects, so the syntax of Java streaming looks similar to the syntax of C IO (the semantics is, however, more C++-like). The advantage of the standard syntax is that a Java stream can be manipulated like any other object.
  2. Java streams can be nested (chained, piped). So, for example, when a user just wants to write to a file, she does:
    
    FileOutputStream fos = new FileOutputStream("myfile");
    fos.write("blabla".getBytes());
    
    For compressing that file on the fly, she has to add:
    
    FileOutputStream fos = new FileOutputStream("myfile");
    GZIPOutputStream gos = new GZIPOutputStream(fos);
    gos.write("blabla".getBytes());
    
    And to write out Objects (instead of just text), it's enough to do (using the standard Java serialization file format):
    
    FileOutputStream fos = new FileOutputStream("myfile");
    GZIPOutputStream gos = new GZIPOutputStream(fos);
    ObjectOutputStream oos = new ObjectOutputStream(gos);
    oos.writeObject(myObject);
    
    One can easily imagine a similar stream using the RootIO file format:
    
    FileOutputStream fos = new FileOutputStream("myfile");
    GZIPOutputStream gos = new GZIPOutputStream(fos);
    RootOutputStream ros = new RootOutputStream(gos);
    ros.writeObject(myObject);
    
  3. Java streams can be used not only to read/write files, but also to send data over the network, to make connections between processes, or to talk to hardware (a sketch of the second use follows below).
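
As a sketch of the last point (the host, port and class name are illustrative assumptions), the same writeObject() call used for files sends an object to another process over a socket:

    import java.io.*;
    import java.net.Socket;

    // Sending an object to another process: only the start of the stream chain changes,
    // the object-writing code stays exactly the same as for a file.
    public class ObjectSender {
        public static void send(Serializable event) throws IOException {
            Socket socket = new Socket("localhost", 9000);   // assumed receiving process
            ObjectOutputStream oos = new ObjectOutputStream(socket.getOutputStream());
            oos.writeObject(event);                          // same call as for a file
            oos.close();
            socket.close();
        }
    }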

The implementation of the proposal requires building two (or more, if needed) Java Root streams to access RootIO files from Java in the standard Java way. There are two missing pieces for the implementation of Java Root streams:

  1. Writing part: The library for reading (all) RootIO files already exists; it has been written by Tony Johnson. Tony is currently working on improving its performance and doesn't have time, in the near future, to implement the writing part. It should, however, be quite straightforward, as the structure of the files is already known, Tony is willing to help with his advice, and writing is generally easier than reading (as we know what we want to write and it is not necessary to support old versions).
  2. Conformance to the Java streaming IO: The existing Java RootIO reader doesn't implement the standard Java streaming API (a possible skeleton of a conforming stream follows below).
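
A purely illustrative skeleton of such a stream is given below; the class and its body are assumptions, not existing Java RootIO code. The point is that by extending the standard stream classes, a Root stream plugs into any chain of Java streams (file, gzip, socket, ...):

    import java.io.*;

    // Hypothetical skeleton of the missing writing part, conforming to the standard
    // Java streaming API by wrapping an arbitrary underlying OutputStream.
    public class RootOutputStream extends FilterOutputStream {

        public RootOutputStream(OutputStream out) throws IOException {
            super(out);
            writeFileHeader();
        }

        // The object would be laid out here according to the RootIO format
        // (streamer information, key records, compression, ...).
        public void writeObject(Object obj) throws IOException {
            throw new UnsupportedOperationException("RootIO writing not implemented yet");
        }

        private void writeFileHeader() throws IOException {
            // The RootIO file header ("root" magic, version, directory record, ...) would go here.
        }
    }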

This solution will further evolve into a full-featured OODB solution using existing Java Data Objects (JDO) implementations. This will be possible thanks to an already existing implementation of JDO based on the Java Streaming API. While there is still a bit of work to be done (like the definition of OIDs), this seems to be the most direct way to implement a true OODB based on RootIO files.
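
A minimal sketch of how application code would then look, using the standard javax.jdo interfaces (the property values and class names are illustrative and depend on the chosen JDO implementation):

    import java.util.Properties;
    import javax.jdo.*;

    public class PersistEvent {
        public static void store(Object event) {
            Properties props = new Properties();
            props.setProperty("javax.jdo.PersistenceManagerFactoryClass",
                              "some.vendor.PersistenceManagerFactoryImpl");          // assumed value
            props.setProperty("javax.jdo.option.ConnectionURL", "root:myfile.root"); // assumed value

            PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);
            PersistenceManager pm = pmf.getPersistenceManager();
            Transaction tx = pm.currentTransaction();
            tx.begin();
            pm.makePersistent(event);            // the object becomes persistent transparently
            Object oid = pm.getObjectId(event);  // its OID can be kept for later navigation
            tx.commit();
            pm.close();
        }
    }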

The advantages of the proposed solution are obvious, but it is nevertheless useful to summarise at least the most important ones:

  1. All converters are generic; there is only one converter per technology, so all the machinery with "automatic" creation of converters disappears.
  2. There are other standard streams which can be used immediately (Java files, XML files, proprietary files, ...).
  3. There are other JDO implementations which can be used immediately.
  4. It satisfies all Atlas+DB architectural principles; the design is different because the C++ design is dictated by C++ faults. In particular, Transient-Persistent separation is automatically available without any need for complex Atlas-specific code (StoreGate, DataLinks, ...).
  5. Highly desirable interoperability between Java and C++ via RootIO is easily possible.

All existing functionality is available, namely:

  1. Symbolic Links: Irrelevant thanks to Reflection.
  2. Const-access: A much richer access policy is available using InfoBusPolicyHelper.
  3. Data Buckets: see Analysis of Reachability Requirements.
  4. Data Links: see Analysis of Reachability Requirements.
  5. Keys: A much richer interface is available via InfoBus Names and Flavors.
  6. History Objects: Should be independent of persistency.

There are certainly other alternatives for implementing Java RootIO:

  1. Using C++ RootIO directly via JNI is possible (maybe even easier). It would, however, be absurd to carry the whole of Root where it is not needed. We would have to create Root dictionaries and parse C++ with Cint to be able to use RootIO from Java!
  2. We could implement JDO directly on top of RootIO (in Java or even in C++). This may have some performance advantages (not so many layers), but we would lose the possibility of using Root files in a simple way as streaming objects. It would also violate the Atlas Event DB Architecture.



Analysis of Reachability Requirements

We have to satisfy the following main services (Data Mining is provided on top):

  1. Reading/Writing Objects from/to files.
  2. Local and Global Navigability and Addressability:

For classification purposes, it is useful to define several kinds of objects (an illustrative sketch follows the list):

  1. A Second-class Object with Non-data-store identity doesn't keep its identity in the persistent state. It is generally the responsibility of some First-class Object to restore its identity.
  2. A First-class Object with Data-store identity keeps its identity in the persistent state within its Storage Unit. Its identity is restored by the DB/Streaming mechanism.
  3. A First-class Object with Application (primary key) identity keeps its identity in the persistent state across Storage Units. Its global identity is managed by an external service.
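
An illustrative Java sketch of the three kinds follows (all class names are hypothetical); in JDO the actual choice is expressed in the metadata rather than in the Java code itself:

    // 1. Second-class Object: no identity of its own, always reached through its owner,
    //    which is responsible for restoring it.
    class Point3D implements java.io.Serializable {
        double x, y, z;
    }

    // 2. First-class Object with Data-store identity: the OID is assigned and restored
    //    by the DB/Streaming mechanism; the class carries no key field.
    class Cluster implements java.io.Serializable {
        Point3D position;
        double  energy;
    }

    // 3. First-class Object with Application (primary key) identity: the key is part of
    //    the object's state and its global identity is managed by an external service.
    class Event implements java.io.Serializable {
        long runNumber;     // primary-key fields
        long eventNumber;
    }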

                             Local Navigability   Global Navigability   Local Addressability   Global Addressability
  C++ Streaming (TObjects)
  TES (DataObjects)
  Java Streaming (any)
  JDO (any)

Related Objects which don't belong to the same EDO (and are for that reason not read in at the same time) are represented as Proxies. A Proxy is created automatically when a reference to its original Object has to be written. When this reference is requested (for the first time), it initiates the location and reading of the original Object. The whole EDO is read at the same time, so a Proxy is called only once per EDO (all following references are normal references).
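
A minimal sketch of how such a Proxy can be realised with standard Java dynamic proxies (the Resolver interface and all names are assumptions): the first method call triggers the location and reading of the original Object, later calls go to it directly:

    import java.lang.reflect.*;

    public class LazyProxy implements InvocationHandler {

        // Assumed service that locates and reads the original Object (its whole EDO) by OID.
        public interface Resolver { Object resolve(Object oid); }

        private final Object   oid;       // language-independent OID of the original Object
        private final Resolver resolver;
        private Object         target;    // the original Object, once read in

        private LazyProxy(Object oid, Resolver resolver) {
            this.oid = oid;
            this.resolver = resolver;
        }

        public static Object create(Class iface, Object oid, Resolver resolver) {
            return Proxy.newProxyInstance(iface.getClassLoader(),
                                          new Class[] { iface },
                                          new LazyProxy(oid, resolver));
        }

        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (target == null) {
                target = resolver.resolve(oid);   // read on first access only
            }
            return method.invoke(target, args);
        }
    }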

Q:
  1. Proxies should share OIDs.
  2. How to update all proxies of the same EDO, once the first Object is read in?

ToDo:
  1. x

J.Hrivnac, Mar'03