Kaivo

Kaivo is an information system framework that currently has only a partial prototype implementation, which I developed for my master's thesis. Kaivo supports an architecture for information systems in which the data is always kept in a single data model: RDF. This central idea implies or supports the following aspects of Kaivo's architecture:

A problem in modern information systems is that databases use the relational model, application developers use object-oriented languages, and users perceive the data on a conceptual level that does not necessarily resemble either the relational or the object-oriented data model. There are thus three separate representations of the data. This causes confusion between the system's users and developers and also makes the system laborious to implement and maintain.

The relational model separates the physical and logical structure of the data, so the data can be queried on the logical level while the database system optimizes the query execution on the physical level. Objects in an object-oriented language provide a more natural structure for the data from a developer's perspective, but the logical and physical structure of the objects are the same, which means that queries on objects are hard to optimize.

The good aspects of the relational model and of objects in object-oriented languages need to be combined. Currently this is done with object-relational mapping (ORM) frameworks, which do not really solve the problem but make it manageable.

I propose that using RDF as the only data model solves the object-relational impedance mismatch described above. RDF separates the logical and physical structure of the data just like the relational model does. Yet RDF is close to the object model of object-oriented programming languages, so handling RDF in such languages is close to managing the ordinary objects of that language. I postulate that RDF is also close to the intuitive conceptual model of data. Ordinary computer users may thus understand data in RDF better than objects or relations.
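To illustrate how close RDF can feel to ordinary objects, here is a minimal Ruby sketch. It is not Kaivo's code: it assumes a toy in-memory store of `[subject, predicate, object]` triples, abbreviated prefixes instead of full URIs, and a hypothetical `Resource` wrapper that maps method calls to FOAF properties, so that `alice.name` reads the `foaf:name` triple much like an attribute of a Ruby object.

```ruby
# Toy in-memory triple store; a hypothetical illustration, not Kaivo's storage.
class TripleStore
  def initialize
    @triples = []
  end

  # Adds one [subject, predicate, object] triple; returns self for chaining.
  def add(subject, predicate, object)
    @triples << [subject, predicate, object]
    self
  end

  # All objects for the given subject and predicate.
  def objects(subject, predicate)
    @triples.select { |s, p, _o| s == subject && p == predicate }.map(&:last)
  end

  def resource(uri)
    Resource.new(self, uri)
  end
end

# Wraps a subject URI so that its properties read like object attributes.
class Resource
  PREFIX = "foaf:" # assumption: every attribute maps to a FOAF property

  def initialize(store, uri)
    @store = store
    @uri = uri
  end

  # alice.name looks up the foaf:name triples of alice; unknown names raise
  # NoMethodError just like missing methods on a normal object.
  def method_missing(name, *args)
    return super unless args.empty?
    values = @store.objects(@uri, PREFIX + name.to_s)
    return super if values.empty?
    values.size == 1 ? values.first : values
  end

  def respond_to_missing?(name, include_private = false)
    !@store.objects(@uri, PREFIX + name.to_s).empty? || super
  end
end

store = TripleStore.new
store.add("ex:alice", "foaf:name", "Alice")
     .add("ex:alice", "foaf:knows", "ex:bob")
     .add("ex:alice", "foaf:knows", "ex:carol")

alice = store.resource("ex:alice")
puts alice.name # prints Alice
p alice.knows   # prints ["ex:bob", "ex:carol"]
```

The data stays in triples, so it could still be queried and indexed independently of this object view; the wrapper only changes how a program reads it.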

As the data can be assumed to be in RDF, there is no need for imperative code that moves the data into and out of the user interface components. This code is replaced by declarative bindings that define which part of the data is shown and which user interface components are used to show it.
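A declarative binding of this kind can be sketched in Ruby as plain data plus one generic renderer. The binding format (`type`, `columns`) and the function names are hypothetical and only illustrate the idea; they are not Kaivo's actual binding syntax.

```ruby
# Example data as [subject, predicate, object] triples (prefixes abbreviated).
TRIPLES = [
  ["ex:alice", "rdf:type", "foaf:Person"],
  ["ex:alice", "foaf:name", "Alice"],
  ["ex:alice", "foaf:mbox", "mailto:alice@example.org"],
  ["ex:bob", "rdf:type", "foaf:Person"],
  ["ex:bob", "foaf:name", "Bob"]
].freeze

# The binding is plain data: which resources a table shows and which property
# fills each column. The format is hypothetical, not Kaivo's actual syntax.
PEOPLE_TABLE = {
  type: "foaf:Person",
  columns: { "Name" => "foaf:name", "E-mail" => "foaf:mbox" }
}.freeze

# First object for subject/predicate, or nil when the property is missing.
def value_of(triples, subject, predicate)
  triple = triples.find { |s, p, _o| s == subject && p == predicate }
  triple && triple[2]
end

# One generic renderer serves every binding, so no per-view code is written
# for moving data into the table.
def render_table(triples, view)
  subjects = triples.select { |_s, p, o| p == "rdf:type" && o == view[:type] }
                    .map(&:first)
  rows = subjects.map do |subject|
    view[:columns].values.map { |prop| value_of(triples, subject, prop) || "" }
  end
  [view[:columns].keys, *rows]
end

render_table(TRIPLES, PEOPLE_TABLE).each { |row| puts row.join(" | ") }
```

Adding a column is a one-line change to `PEOPLE_TABLE`; the renderer stays untouched. A fuller binding would also name the component used to edit each value, which this sketch leaves out.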

Components can even be built to understand the semantics of the data by using standard ontologies. For example, a calendar component can automatically figure out which dates are related to certain entities and show the entities on a timeline, while also providing appropriate labels for them, without any more specific configuration. This is possible because the calendar understands the semantics of some common ontologies, such as RDF Calendar and RDF Schema, which can be used to describe entities of any application domain.
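The calendar idea can be sketched in Ruby under a few stated assumptions: prefixed strings stand in for full URIs, dates are plain ISO strings rather than typed literals, and the only date property the component knows is RDF Calendar's `ical:dtstart`. A domain-specific property (the hypothetical `ex:dueDate`) is picked up because it is declared an `rdfs:subPropertyOf` that property, and labels come from `rdfs:label`.

```ruby
require "date"

# Abbreviated stand-ins for the full RDF Calendar and RDF Schema URIs.
DTSTART = "ical:dtstart"
LABEL = "rdfs:label"
SUB_PROPERTY = "rdfs:subPropertyOf"

# Properties treated as dates: ical:dtstart plus every property declared,
# directly or transitively, as an rdfs:subPropertyOf it.
def date_properties(triples)
  props = [DTSTART]
  loop do
    more = triples.select { |_s, p, o| p == SUB_PROPERTY && props.include?(o) }
                  .map(&:first).uniq - props
    break if more.empty?
    props.concat(more)
  end
  props
end

# rdfs:label of a resource, falling back to its URI.
def label_for(triples, subject)
  triple = triples.find { |s, p, _o| s == subject && p == LABEL }
  triple ? triple[2] : subject
end

# [date, label] pairs sorted by date, with no domain-specific configuration.
def timeline(triples)
  props = date_properties(triples)
  triples.select { |_s, p, _o| props.include?(p) }
         .map { |s, _p, o| [Date.parse(o), label_for(triples, s)] }
         .sort_by(&:first)
end

TRIPLES = [
  ["ex:dueDate", SUB_PROPERTY, DTSTART], # domain property mapped to RDF Calendar
  ["ex:deadline", LABEL, "Project deadline"],
  ["ex:deadline", "ex:dueDate", "2007-05-31"],
  ["ex:review", LABEL, "Code review"],
  ["ex:review", DTSTART, "2007-04-02"],
  ["ex:meeting", DTSTART, "2007-03-15"] # no label: the URI is shown instead
].freeze

timeline(TRIPLES).each { |date, label| puts "#{date}  #{label}" }
```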

This allows a two-tier architecture in which the user interface and automation can operate directly on the data, while the database system enforces the constraints independently. In a three-tier architecture, a programmatic interface encapsulates the logical data model, so only the modifications and queries that the interface explicitly provides are available.

Declarative constraints are easier to develop and maintain, because the system can decide when the constraints need to be checked and how to check them most efficiently. Each constraint is expressed by a single definition, rather than by code scattered throughout the middle tier.
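A minimal Ruby sketch of "one definition per constraint, checked by the store" follows. The `Constraint` struct and `ConstrainedStore` class are hypothetical; the check is a Ruby lambda here, whereas a truly declarative system would use a constraint language that the store can analyse and optimize. The store decides which constraints to run by looking at the predicate being changed, so application code never calls the checks itself.

```ruby
# A constraint is declared once: a name, the property whose changes can break
# it, and a check over the data. Hypothetical format, not Kaivo's.
Constraint = Struct.new(:name, :property, :check)

class ConstrainedStore
  attr_reader :triples

  def initialize(constraints)
    @triples = []
    # The store, not the application code, decides which constraints a
    # change can affect: here simply by indexing them on their property.
    @by_property = constraints.group_by(&:property)
  end

  # Adds a triple only if every constraint watching its predicate still holds.
  def add(subject, predicate, object)
    candidate = @triples + [[subject, predicate, object]]
    broken = @by_property.fetch(predicate, []).reject do |c|
      c.check.call(candidate, subject)
    end
    unless broken.empty?
      raise ArgumentError, "violates: #{broken.map(&:name).join(', ')}"
    end
    @triples = candidate
    self
  end
end

# Example constraint: a resource has at most one ex:birthDate.
ONE_BIRTH_DATE = Constraint.new(
  "at most one ex:birthDate",
  "ex:birthDate",
  ->(triples, subject) { triples.count { |s, p, _o| s == subject && p == "ex:birthDate" } <= 1 }
)

store = ConstrainedStore.new([ONE_BIRTH_DATE])
store.add("ex:alice", "ex:birthDate", "1980-01-01")
     .add("ex:alice", "foaf:name", "Alice") # no constraint watches foaf:name
begin
  store.add("ex:alice", "ex:birthDate", "1981-02-02")
rescue ArgumentError => e
  puts e.message # prints: violates: at most one ex:birthDate
end
```

Because the store may start with no constraints at all, this also fits the optional-schema workflow: data can be entered first and `Constraint` definitions added later.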

Because the constraints are expressed on the same logical structure of the data that is shown to end users, domain experts may understand them directly and even maintain them.

Sometimes it is sensible to let only certain automated procedures manipulate a part of the data. This is the case when there are only a few well-understood operations that need to be executed on the data and many constraints that must be followed when the data is manipulated. In this case, direct low-level operations on that data can be denied. But this is not the usual case, so it should not generally be the only way to access the data.

As RDF is a semistructured data model, all of the constraints, which essentially constitute the schema definition, are optional. As the user interface is bound to the structure of the data, the developer can start by defining some user interface views, enter some data through them and only then add constraints if they are needed. This is essentially the same as setting up the columns of a spreadsheet and entering data into them. The developer can just as well start by importing some data and then add user interface views and constraints as needed. Or he can start by defining an ontology, which is the RDF term for the data schema, or choose among existing published ontologies, and only then add user interface views and data.

When there is data or there are constraints, the user interface is easy to define, as the system can suggest possible views (for example, possible columns of a table) according to the data structure, which is defined by the data itself or by the constraints on it. Likewise, if there is data or there are user interface views, the system can assist in defining the constraints, as the data structure can be seen from the data or from the user interface views. Obviously, when there are user interface views, entering data through them gives the data the right basic structure, as the views serve as a simple schema definition.
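Suggesting table columns from the data structure could look roughly like this Ruby sketch. The `ex:` vocabulary and the `suggest_columns` function are made up for illustration, and ranking by frequency is just one plausible heuristic: properties actually used by instances of a class come first, most common first, followed by properties that only the schema declares for that class via `rdfs:domain`.

```ruby
# Example data; the ex: vocabulary is made up for this sketch.
TRIPLES = [
  ["ex:alice", "rdf:type", "ex:Person"],
  ["ex:alice", "ex:name", "Alice"],
  ["ex:alice", "ex:email", "alice@example.org"],
  ["ex:bob", "rdf:type", "ex:Person"],
  ["ex:bob", "ex:name", "Bob"],
  ["ex:bob", "ex:phone", "555-0100"],
  ["ex:carol", "rdf:type", "ex:Person"],
  ["ex:carol", "ex:name", "Carol"],
  ["ex:carol", "ex:email", "carol@example.org"],
  # A schema statement with no data yet: ex:homepage applies to ex:Person.
  ["ex:homepage", "rdfs:domain", "ex:Person"]
].freeze

# Suggests table columns for instances of type: properties seen in the data,
# most common first, followed by properties declared for the type in the schema.
def suggest_columns(triples, type)
  instances = triples.select { |_s, p, o| p == "rdf:type" && o == type }.map(&:first)
  counts = Hash.new(0)
  triples.each do |s, p, _o|
    counts[p] += 1 if instances.include?(s) && p != "rdf:type"
  end
  used = counts.sort_by { |prop, n| [-n, prop] }.map(&:first)
  declared = triples.select { |_s, p, o| p == "rdfs:domain" && o == type }.map(&:first)
  used + (declared - used)
end

p suggest_columns(TRIPLES, "ex:Person")
# prints ["ex:name", "ex:email", "ex:phone", "ex:homepage"]
```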

Ontology mappings can be queried and organized like ordinary data.
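As a minimal Ruby sketch of treating mappings as data, assume the mappings are written with `owl:equivalentProperty` (one common OWL choice; which mapping vocabulary Kaivo would use is not stated here). The mapping triple sits in the same list as the data, so listing mappings is an ordinary query, and a query for one property can follow the mappings to values stored under another vocabulary. The function names are hypothetical.

```ruby
EQUIV = "owl:equivalentProperty"

# Data from two vocabularies plus a mapping between them, all in one list.
TRIPLES = [
  ["ex:fullName", EQUIV, "foaf:name"],
  ["ex:alice", "foaf:name", "Alice"],
  ["ex:bob", "ex:fullName", "Bob"]
].freeze

# The mappings are ordinary data, so listing them is an ordinary query.
def mappings(triples)
  triples.select { |_s, p, _o| p == EQUIV }
end

# prop plus every property reachable through equivalence mappings
# (followed in both directions, transitively).
def equivalents(triples, prop)
  found = [prop]
  queue = [prop]
  until queue.empty?
    current = queue.shift
    mappings(triples).each do |s, _p, o|
      other = if s == current then o elsif o == current then s end
      next if other.nil? || found.include?(other)
      found << other
      queue << other
    end
  end
  found
end

# [subject, value] pairs for prop, regardless of which vocabulary was used.
def property_values(triples, prop)
  props = equivalents(triples, prop)
  triples.select { |_s, p, _o| props.include?(p) }.map { |s, _p, o| [s, o] }
end

p mappings(TRIPLES)
p property_values(TRIPLES, "foaf:name")
# prints [["ex:alice", "Alice"], ["ex:bob", "Bob"]]
```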

Using standard ontologies provides an organized way of reducing system integration effort in the future.

The difference between a three-tier architecture and Kaivo's architecture

The goal of Kaivo

Kaivo tries to provide an information system framework that lets the developer add complexity only when it is needed. Handling simple information should be as simple as handling it in a spreadsheet, but it should be possible to gradually add complexity as the information and the functional requirements grow more complex. Spreadsheets make simple information simple to handle, but do not support complex, distributed and secure information management. SQL databases scale to complexity but make simple systems unnecessarily laborious to implement.

A big problem in the software industry is that it is hard to specify systems when there is no system that could be commented on. Agile software processes try to cope with this problem by frequently releasing small sets of functionality, so that the future users of the system can see whether their requirements are being met. Kaivo tries to make it as easy as possible to provide visible working functionality that shows the basic structure of the data that is going to be handled in the system. Development can start straight from the user interface views, and example data can be entered and experimented with right away without any code, just like in a spreadsheet.

Ultimately, the difference between developing and using an information system should be blurred. As business requirements change constantly, information systems should be as agile as possible. The amount of code should be minimized so that the system's users could gradually learn more and more of the system and make bigger and bigger changes when needed. There is a huge leap between using and developing modern information systems, but it does not have to be so. Of course it takes discipline to develop complex systems even if the technical details were trivial, but this does not mean that developing them should be as technically laborious as it is today.

Kaivo does not try to solve the complexity that comes from automation that operates over diverse data silos. It only removes unnecessary complexity from storing, viewing and manipulating the data. Complex data and complex computations are complex, and this complexity cannot be avoided. In modern software development environments there are just too many unnecessary details, which make understanding the real complexity even harder.

The master's thesis

Kaivo is the subject of my master's thesis, which can be found here:

Reducing the implementation effort of information systems by using a single RDF like data model

15-minute presentation

Here are the slides of a short presentation that I gave concerning my master's thesis:

master's thesis presentation

Data Bound User Interface Components for RDF Data

This is an article that I wrote for the XML Finland 2007 seminar. It describes the user interface paradigm of Kaivo.

Data Bound User Interface Components for RDF Data

XML Finland 2007 slides

These are the slides for my presentation at the XML Finland 2007 seminar. They are in Finnish, but there are lots of beautiful hand-crafted pictures :)

XML Finland 2007 slides PDF

XML Finland 2007 slides ODP (with the fancy "appearance" effects)

Screen shots

Screen shots of the prototype are collected on this page:

screen shots of Kaivo

Screen casts

Managing structured information with RDF and Kaivo

Discussion forum

To discuss Kaivo, post to the kaivo group on groups.google.com (I'm open to suggestions for a better place for discussions):

Kaivo group

Other similar or related systems

OpenRecord

Dabble DB

Haystack

DBin

Tabulator

SEEQ

The source code

The Kaivo source can be downloaded here:

kaivo_0.1.zip

This code is only a prototype and is not intended to be a public development project!

I have only run it on two of my Debian Linux boxes. It depends on at least the following packages, with their corresponding Ruby bindings:

To run the database and the GUI, unzip the source package and run:

             ruby gui.rb foo

where “foo” is the name of your new database. Kaivo creates a directory named “foo”, creates a Berkeley DB environment in it and opens a GUI for the new database. The same command is later used to open the GUI again for the same database.

About the author

My contact information and CV can be found here.

Recently I have been hanging around on the Semantic Web Interest Group's IRC channel #swig on irc.freenode.net with the nick “juvi”.