Minerva Persistency
The Persistency Architecture
of Minerva is the same as the existing Athena persistency Architecture,
except for its C++-specific limitations. Persistency to Root files will be implemented
via the JDO API built on top of Java RootIO. This makes it possible to
switch persistency technologies easily. Persistifiable objects may be
required to implement the PersistenceCapable (or a similar) Interface; unlike in
C++, this serves just as a label and the user is not required to implement any
persistency-related functionality. The OID format and Cataloguing/Bookkeeping
(based on a relational DB) should be defined in a language-independent way, ensuring
interoperability and exchangeability of data between the C++ and Java worlds.
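As a rough illustration of the "interface as a label" idea, the sketch below uses the standard java.io.Serializable marker interface as a stand-in for PersistenceCapable. The Track class and its fields are hypothetical. The class contains no persistency code of its own, yet the standard library can stream it and restore it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class MarkerInterfaceSketch {

    // Hypothetical data class: the interface is only a label,
    // the user implements no read/write methods.
    static class Track implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final double pt;

        Track(int id, double pt) {
            this.id = id;
            this.pt = pt;
        }
    }

    // Serialize an object into memory and restore a copy from the bytes.
    static Object roundTrip(Serializable original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Track copy = (Track) roundTrip(new Track(7, 42.5));
        System.out.println(copy.id + " " + copy.pt);
    }
}
```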
All Converters are based on standard technologies like Java Streaming
API or JDO. They are generic, i.e. they are the same for whatever objects
they manage (it is, however, possible to customise transient-persistent mapping,
usually using XML files). Each Converter Service is just a kind of Algorithm.
Converters can be used not only for the Persistency Service but also to change
representation of Data or to communicate Data between Processes.
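To make "generic" more concrete: the sketch below is a single converter that handles any object by inspecting it with Reflection, instead of relying on one hand-written or generated converter per class. It is an illustration only. GenericTextConverter, its key=value text output and the Hit class are hypothetical and are not part of Minerva or of any standard API.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

class GenericTextConverter {

    // Convert any object to "ClassName{field=value, ...}" using Reflection.
    // Only fields declared directly in the object's class are visited.
    static String toText(Object obj) {
        Class<?> type = obj.getClass();
        StringJoiner out = new StringJoiner(", ", type.getSimpleName() + "{", "}");
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Skip static fields and fields marked transient (not persistent).
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue;
            }
            field.setAccessible(true);
            try {
                out.add(field.getName() + "=" + field.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out.toString();
    }

    // Hypothetical data class with no conversion code of its own.
    static class Hit {
        private final int channel = 12;
        private final double energy = 3.5;
        private transient String cache = "not persistent";
    }

    public static void main(String[] args) {
        System.out.println(toText(new Hit()));
    }
}
```

The same toText method works unchanged for any other class, which is the sense in which such a converter is generic. A custom transient-persistent mapping would be an extra layer on top of it.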
- Support for Objects unsupported by underlying technology ?
- Navigation into Second-Class Objects ?
- Root is needed for interpretation of Root persistency,
StoreGate is needed for interpretation of DataObjects.
TObjects and DataObjects can't exist outside Root and StoreGate.
- Both RootSerialisation and RootCnv have approximately one converter
per Object (often created semi-automatically, but heavily increasing
the complexity of the system). It is highly non-trivial to find the proper
converter in a polymorphic environment without a realistic Reflection facility.
- Only Objects manageable by both Root and StoreGate
(DataObjects) can be processed. Restrictions are defined
not by the policy, but by the implementation deficiencies.
- There are many intermediate components (converters, Root dictionary,
...) complicating the processing. In theory, everything is automatic; in
practice, such a system is very hard to maintain and debug.
- No other IO implementation is available for the storage,
for example XML or a Relational DB. This implies that we are locked
into a proprietary solution.
- Root Objects are different from DataObjects. We may not like the idea of
accessing Root Objects directly, but that is what people will do in reality,
as they are much richer than DataObjects.
- Problems with DataLinks and Buckets are artificial: they don't exist on the level
of Root Objects; they are created by RootCnv and then resolved by StoreGate.
- All interfaces are defined in a language-specific way.
- Converters can only be used to perform simple IO. It is not possible to chain
converters, and converters can't be used for communication between processes
(which is a generalized IO operation).
- StoreGate doesn't have a clearly defined mission. Its tasks are
(among others):
- Transient Database (searching, keys,...) - do we need it ?
- InfoBus (sharing objects between Algorithms).
- Fixing C++ problems (DataLinks, Buckets, Garbage Collection...).
- To implement the functionality, StoreGate often violates agreed coding
rules (use of macros,...).
Minerva Solution
Most of the problems mentioned can be easily solved by choosing the right
technology. The proposed solution maintains file compatibility with
the existing solution (and with the RootIO file format itself) and it implements
the same Architecture.
The key component of the proposed solution is the replacement of the C++ Conversion Service with the
Java Streaming Service. Other services, performed poorly or not at all in the current
Design, can be provided by out-of-the-box packages such as InfoBus and the Java Transaction Service.
Java streams use a concept similar to C++ iostreams. The user just feeds her objects
(data) into a stream or reads them from a stream. There are, however, three
important differences between Java streams and C++ iostreams:
- Java streams do not use special syntax, while C++ iostreams are
mostly accessed using the << and >> operators. Java streams are represented
by ordinary objects, so the syntax of Java streaming looks similar to the syntax
of C-IO (the semantics is, however, more C++-like). The advantage of the
standard syntax is the possibility to manipulate a Java stream as any other object.
- Java streams can be nested (chained, piped). So, for example when
a user wants to just write to a file, she does:
FileOutputStream fos = new FileOutputStream("myfile");
For compressing that file on the fly, she has to add:
FileOutputStream fos = new FileOutputStream("myfile");
GZIPOutputStream gos = new GZIPOutputStream(fos);
And to write out Objects (instead of just text), it's enough to do
(using standard Java serialization file format):
FileOutputStream fos = new FileOutputStream("myfile");
GZIPOutputStream gos = new GZIPOutputStream(fos);
ObjectOutputStream oos = new ObjectOutputStream(gos);
One can easily imagine a similar stream using the RootIO file format:
FileOutputStream fos = new FileOutputStream("myfile");
GZIPOutputStream gos = new GZIPOutputStream(fos);
RootOutputStream ros = new RootOutputStream(gos);
- Java streams can be used not only to read/write files, but also to send data over
the network, to connect processes, or to talk to hardware.
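The chaining described above can be tried out with the standard library alone. The sketch below writes an object through the ObjectOutputStream / GZIPOutputStream / FileOutputStream chain and reads it back through the mirrored input chain. The class name, the temporary file and the payload are arbitrary choices for the example. RootOutputStream is not used, since it does not exist yet.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ChainedStreamsSketch {

    // Write: object -> Java serialization -> gzip -> file.
    static void write(Path file, Object payload) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(file.toFile())))) {
            oos.writeObject(payload);
        }
    }

    // Read: file -> gunzip -> Java serialization -> object.
    static Object read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new GZIPInputStream(new FileInputStream(file.toFile())))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("myfile", ".ser.gz");
        try {
            List<String> names = new ArrayList<>(Arrays.asList("electron", "muon"));
            write(file, names);
            System.out.println(read(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Closing the outermost stream closes the whole chain, so the gzip trailer is written before the file is closed. Swapping the file streams for socket or pipe streams would leave write and read otherwise unchanged.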
Implementing the proposal requires building two (or more, if needed)
Java Root streams to access RootIO files from Java in the standard Java way.
Two pieces are missing for the implementation of Java Root streams:
- Writing part: The library for reading (all) RootIO files already exists;
it has been written by Tony Johnson. Tony is currently working on improving
its performance and does not have time, in the near future, to implement
the writing part. It should, however, be quite straightforward: the structure
of the files is already known, Tony is willing to help with his advice, and
writing is generally easier than reading (as we know what we want to write
and it is not necessary to support old versions).
- Conformance to the Java streaming IO: The existing Java RootIO reader
doesn't implement the standard Java streaming API.
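Conforming to the streaming API mostly means extending the standard java.io base classes, so that a new stream can be chained like any other. The sketch below shows the shape such a class could take. RecordOutputStream and its length-prefixed record layout are purely hypothetical placeholders and have nothing to do with the real RootIO file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class RecordStreamSketch {

    // Hypothetical format-specific stream: each record is written as a
    // 4-byte big-endian length followed by the payload bytes.
    static class RecordOutputStream extends FilterOutputStream {
        RecordOutputStream(OutputStream out) {
            super(out);
        }

        void writeRecord(byte[] payload) throws IOException {
            int n = payload.length;
            out.write(n >>> 24);
            out.write(n >>> 16);
            out.write(n >>> 8);
            out.write(n);
            out.write(payload, 0, n);
        }
    }

    // Chain the custom stream on top of gzip and return the compressed bytes.
    static byte[] writeCompressed(String... records) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (RecordOutputStream ros = new RecordOutputStream(new GZIPOutputStream(sink))) {
            for (String record : records) {
                ros.writeRecord(record.getBytes(StandardCharsets.UTF_8));
            }
        }
        return sink.toByteArray();
    }

    // Read the first record back using standard streams only.
    static String readFirst(byte[] compressed) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new String(payload, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFirst(writeCompressed("event-1", "event-2")));
    }
}
```

Because the class extends FilterOutputStream, it accepts any OutputStream as its target, which is what makes the chaining with GZIPOutputStream possible without extra code.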
This solution will further evolve into a full-featured OODB solution using existing
Java Data Objects (JDO) implementations. This will be possible thanks to an already existing
implementation of JDO based on the Java Streaming API.
While there is still a bit of work to be done (like the definition of OIDs),
this seems to be the most direct way to implement a true OODB based
on RootIO files.
The advantages of the proposed solution are obvious, but it's nevertheless
useful to summarize at least the most important ones:
- All converters are generic: there is only one converter per technology,
so all the machinery for the "automatic" creation of converters disappears.
- There are other standard streams, which can be immediately used
(Java files, XML files, proprietary files,...).
- There are other JDOs, which can be immediately used.
- It satisfies all Atlas+DB architectural principles; the design is different
as the C++ design is dictated by C++ faults. In particular, Transient-Persistent
separation is automatically available without any need for complex Atlas-specific
code (StoreGate, DataLinks, ...).
- Highly desirable interoperability between Java and C++ via RootIO is easily possible.
All existing functionality is available, namely:
- Symbolic Links: Irrelevant thanks to Reflection.
- Const-access: Much richer access policy is available using InfoBusPolicyHelper.
- Data Buckets: see Analysis of Reachability Requirements.
- Data Links: see Analysis of Reachability Requirements.
- Keys: Much richer interface is available by InfoBus Names and Flavors.
- History Objects: Should be independent of persistency.
There are certainly other alternatives for implementing Java RootIO:
- Using C++ RootIO directly through JNI is possible (maybe even easier).
It would, however, be absurd to carry the whole of Root where it is not needed.
We would have to create Root dictionaries and parse C++ with Cint to be able
to use RootIO in Java!
- We could implement JDO directly on top
of RootIO (in Java or even in C++). This may have some performance advantages
(fewer layers), but we would lose the possibility of using Root files
in a simple way as streaming objects. Also, it violates the Atlas Event DB Architecture.
We have to satisfy the following main services (Data Mining is provided on top):
- Reading/Writing Objects from/to files.
- Local and Global Navigability and Addressability:
- local - within unit of storage (file)
- global - between units of storage
- navigability - navigation from one (existing) transient object to another
- addressability - lookup of objects using their identifiers
For the classification purposes, it is useful to define several kinds of objects:
- Second-class Object with Non-data-store identity doesn't
keep its identity in persistent state. It is generally the responsibility
of some First-class Object to restore its identity.
- First-class Object with Data-store identity
keeps its identity in persistent state within its Storage Unit. Its identity is
restored by the DB/Streaming mechanism.
- First-class Object with Application (primary key) identity keeps
its identity in persistent state across Storage Units. Its global
identity is managed by an external service.
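For Java streaming, identity within a Storage Unit can be observed directly: within one ObjectOutputStream, an object referenced from two places is written once and restored as a single object. A minimal sketch, with hypothetical Vertex and Track classes and with one in-memory stream standing in for one Storage Unit:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class IdentitySketch {

    // Hypothetical event data: two tracks may point to the same vertex.
    static class Vertex implements Serializable {
        private static final long serialVersionUID = 1L;
        final double z;
        Vertex(double z) { this.z = z; }
    }

    static class Track implements Serializable {
        private static final long serialVersionUID = 1L;
        final Vertex origin;
        Track(Vertex origin) { this.origin = origin; }
    }

    // One call = one stream = one "storage unit" in this sketch.
    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Both tracks go into one stream: the shared vertex is restored once.
    static boolean sharedWithinOneStream() {
        Vertex v = new Vertex(1.5);
        Track[] copy = (Track[]) fromBytes(toBytes(new Track[] { new Track(v), new Track(v) }));
        return copy[0].origin == copy[1].origin;
    }

    // Each track goes into its own stream: the link between them is lost.
    static boolean sharedAcrossTwoStreams() {
        Vertex v = new Vertex(1.5);
        Track a = (Track) fromBytes(toBytes(new Track(v)));
        Track b = (Track) fromBytes(toBytes(new Track(v)));
        return a.origin == b.origin;
    }

    public static void main(String[] args) {
        System.out.println("within one stream:  " + sharedWithinOneStream());
        System.out.println("across two streams: " + sharedAcrossTwoStreams());
    }
}
```

The first check holds and the second does not. This is why identity across Storage Units needs something beyond the stream itself, such as an OID managed by an external service.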
                         | Local Navigability | Global Navigability | Local Addressability | Global Addressability |
C++ Streaming (TObjects) |                    |                     |                      |                       |
TES (DataObjects)        |                    |                     |                      |                       |
Java Streaming (any)     |                    |                     |                      |                       |
JDO (any)                |                    |                     |                      |                       |
Related Objects which don't belong to the same EDO (and are for that reason
not read in at the same time) are represented as Proxies. Proxies are created
automatically when a reference to their original Object has to be written.
When this reference is requested (for the first time), it initiates the location
and reading of the original Object. The whole EDO is read at the same time,
so a Proxy is called only once per EDO (following references are normal references).
- Proxies should share OIDs.
- How to update all proxies of the same EDO, once first Object is read in ?
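The Proxy behaviour described above resembles ordinary lazy loading. A minimal sketch follows, assuming a hypothetical LazyProxy class, a plain string as OID and a loader function standing in for whatever locates and reads the EDO. None of these names come from Minerva.

```java
import java.util.function.Function;

class LazyProxy<T> {
    private final String oid;                  // hypothetical OID, format not defined here
    private final Function<String, T> loader;  // locates and reads the original Object
    private T target;                          // stays null until the first access
    private boolean loaded;

    LazyProxy(String oid, Function<String, T> loader) {
        this.oid = oid;
        this.loader = loader;
    }

    String oid() {
        return oid;
    }

    // Resolve the reference; the loader runs on the first call only.
    synchronized T get() {
        if (!loaded) {
            target = loader.apply(oid);
            loaded = true;
        }
        return target;
    }

    public static void main(String[] args) {
        int[] reads = {0};
        LazyProxy<String> proxy = new LazyProxy<>("file1#42", oid -> {
            reads[0]++;
            return "object read for " + oid;
        });
        proxy.get();
        proxy.get();
        System.out.println(proxy.get() + ", reads=" + reads[0]);
    }
}
```

The sketch covers a single proxy only. It does not address how several proxies pointing into the same EDO would share OIDs or be updated together.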
J.Hrivnac, Mar'03