Thursday, May 21, 2009

[GSoC09] DB tryouts (Derby & BDB)

Premise
In this entry, I'll describe my experience in installing, configuring and applying of both database management systems, including Apache Derby and Oracle (SleepyCat) Berkeley DB.

Background Info
Derby
Derby is one of Apache's many successful entities known in the community of Java developers. It is a Java-based, embedded, JDBC driven SQL database. It also includes a client and Network Database server for client/server implementations.
Berkeley DB (BDB)
SleepyCat's Berkeley DB is Oracle's attempt at a fully embedded non-SQL database, including no ad hoc queries, but many quick indexing remedies for querying records. It's targeted for application environments that want to quickly implement an object-relational database management system with very little footprint and less dependent on DB administration. It has everything that you would expect of a database system and promises optimal solutions to retrieval and transactional operations.

Installation & Configuration
*Both platforms require some form of JRE ~v1.5 or above.
Derby
The retrieval and installation process of Derby was easy, simply download the tar/zip file, extract it to the location you want, then update your environment variables to include DERBY_HOME, and add a path to the executables (DERBY_HOME/bin). more
BDB
The download and installation process of BDB wasn't difficult either. It only required an additional environment variable (JE_HOME) pointing to the path of the downloaded package. more

Comparison
Derby
Derby provides an internal interactive SQL scripting tool, called 'ij' used to connect to an existing database, or create a new database. Coding in Derby is very SQL scripting heavy and it requires a planned out schema. Loading and connecting to Derby DB is simple, you just load the appropriate embedded or client driver (depending on your application). You can also pass ad hoc queries, basic SQL statements for creating, updating, or deleting (all fundamental CRUD). Derby is a great RDBMS for settings that require ad hoc queries, and for optimal access to SQL databases.
BDB
BDB is a whole different approach, in that it implements a Java DB without the need of SQL. This object-relational database approach offers more direct interactions of object based models. Actually BDB is stored by object values, rather than by traditional object reference which models similar benefits from EJB persistance APIs. Programming in BDB is quite intuitive and requires that you have strong knowledge in manipulating Java objects via understanding of Java APIs, especially with the Java Collections framework (since this applies to 'Cursor ' iterations through records stored in BDB). I have to agree, in theory with BDB's storage and indexing approach, since it does provide a low cache way of retrieving raw data and instantiating objects seem seamless. I wonder how BDB memory caching features perform once it goes against more storage data input. If I do decide to use BDB, I hope that I won't need to roll out my own solution for tuning the cache. Of course, this over bloating of memory depends solely with the complexity of the application and my usage of BDB. After coding a little in BDB, I also noticed some similarities to Hibernate since they have similar implementations using EJB and Java's Persistence APIs. Although BDB isn't a standard with most developers, since its approach to an embedded database requires no SQL seems unorthodox; I feel that its structural aspect and its targeted purpose of increasing performance in object relational coding feels much more natural.

The Verdict
At this point in time, I'm falling towards using BDB since the solution is quite simple and unobtrusive to the network environment setting. It will require less configuration on the administrative side, yet it gives more control over the configuration directly through the application itself. The other reason is that the class file structure is already in good standing for a BDB implementation. There will be no need for creating a client driver connection or bringing up a Network Server, as I previously planned on doing by using Derby.

The 'Vision' Document (unofficial)
The vision hasn't changed much since my first entry about Devcathlon. We are still pushing for a workable application that can be used by developers as an interactive assessment tool to better software project environments. At the moment no changes have been made to the current code base, rather more planning and testing have occured locally for assurance.
Right now, deciding on a preferable DB is the question at hand. Considering my research and tests, I would recomend using BDB as a possible DB implementation for Devcathlon. The DB implementation is pretty much a way of incorporating a persistent back-end for Devcathlon. I've yet to apply any specifics on future improvements, but I feel the need to get most of the targeted functions to work first in Devcathlon v1.0.

Plans
I'll need some time to develop a 'spike solution' of BDB with our current implementation of Devcathlon. This shouldn't be difficult considering that most of our models built in Wicket for Devcathlon are neatly separate object entities, so it makes for a perfect setting for using BDB.
I'll also be revising my vision document, since little has been put into making this a 'viral' record.
Also, since I'll be revising the application many times over, the use of a review (summary) at this time will not be committed until after a DB solution for Devcathlon is released.
*Notes on development environment: Since the project code base is housed via Google's Project Host, I've decided that we should create a branch for our DB solution. In addition to our branch, we should have a live development box that mimicks a production environment for shared consistency and remote testing.

3 comments:

austen.ito said...

I wonder if a non-sql based approach is the most flexible. With a regular relational database, you quickly write SQL to retrieve information. Can you do the same with BDB? Or do you have to manipulate everything through Java? SQL isn't the same across relational databases, but it can be ported. I'm not so sure it will be easy to move from BDB to another database.

Also with a relational database, you can always use Hibernate or something similar as your ORM. This would allow you to abstract the SQL away from you Java code.

I have no experience with BDB, but from what you've written, I'm not convinced it's the best approach.

Also, what are your thoughts on a normal relational database like postgres/oracle instead of an embedded db like derby?

austen.ito said...

(adding a post to receiving emails on follow up comments)

Anthony Du said...

First off, I agree with you about SQL being portable and much more standardized in many localized settings. However, since this project calls for low administrative access (zero, really) and local persistence, I decided to choose BDB because of that. But then again, you're right about using Hibernate as a SQL DB framework, which does make it easier to link relational databases. In the long run it would be beneficial to use a SQL relational DB, rather than BDB. I was a little skeptic too, but considering the small scope of this project so far, I wanted to get something out quick without dealing with SQL or having to touch any administrative configurations.

I've had experience in using PostgreSQL in RoR, along with MySQL within the same environment. PostgreSQL is more fully-featured than compared to MySQL, yet MySQL is still more popular and widely used. As for Oracle, I have no prior knowledge in using or experience in any enterprise-related (Oracle) products until now.

Thanks for putting this into perspective. I might have to rethink about a long-term advantage to using an ORM on top of a relational SQL DB.