Distributed, Decentralized Database For Mobile Devices

May 22, 2012

State of the art

It's been a while since my last post. For the past few weeks I was busy with multiple school-related projects, exams and so on, but at the same time I was working on my final project too.

So, where am I now?

At the suggestion of my mentors I switched the desktop web application from Java Servlets and JSP to Google Web Toolkit (GWT).

Why?

Because a web application built with servlets and JSP does almost all of its processing on the server: every interaction is a round trip that returns a freshly rendered page, and the browser only displays it.

With GWT it is different. GWT compiles Java into JavaScript that runs in the browser, so the user interface logic executes on the client and talks to the server through asynchronous calls. The server can then concentrate on exposing the data.

There are a lot of other pros and cons for each approach. Keeping the data and the heavy processing on the server still allows low-powered computers to be used at the front end and supports on-demand and other Internet-based applications with relatively little administrative or technical support.

To move on, a few days ago I added a new feature to my API: delta updates. That means I no longer exchange the whole database between the server and the mobile device, just the data that has been modified since the device's last synchronization.
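To make the idea concrete, here is a minimal sketch of a delta selection, assuming every row carries a last-modified timestamp and the device remembers the time of its last synchronization. The names DeltaSync, Row and changedSince are hypothetical and are not taken from the project's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a delta update; not the project's actual API.
class DeltaSync {
    /** A stored row plus the time (ms since epoch) of its last modification. */
    static final class Row {
        final String id;
        final String value;
        final long modifiedAt;

        Row(String id, String value, long modifiedAt) {
            this.id = id;
            this.value = value;
            this.modifiedAt = modifiedAt;
        }
    }

    /** Returns only the rows modified strictly after the client's last sync. */
    static List<Row> changedSince(List<Row> all, long lastSync) {
        List<Row> delta = new ArrayList<>();
        for (Row row : all) {
            if (row.modifiedAt > lastSync) {
                delta.add(row);
            }
        }
        return delta;
    }
}
```

A real implementation would also have to propagate deletions, for example by keeping a marker for removed rows; the sketch ignores that case.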

The exchange of C2DM messages is also done, so when the client on the desktop web app updates the data, a poke is sent to the Android device to inform it that it has to synchronize its database.

With these features I have removed another two of the restrictions I set at the beginning:

Database from the server side (cloud) will be accessible only from Android devices.
Changes on the client side will result with an update of the entire database on the cloud.

Now I have to continue testing the API and the desktop web application, fix bugs where necessary and start writing the theoretical part.

April 10, 2012

Progress

The generalization of the database schema is done. An authenticated user can now create their own schema through REST calls, based on their preferences, and interact with it.

A basic implementation of the C2DM protocol is also available. As a reminder, C2DM is a protocol that pushes notifications to the client when something changes on the server side.

For those of you who are interested in how to implement this on the server and the client, you can check this tutorial; it is very useful and easy to follow. It provides a simple example of getting the auth token from the C2DM servers, registering an Android client with C2DM and exchanging a message between client and server.

Next, I have to change the desktop web client as well and rebuild it on top of the functions that I've created.

A few days ago the first demo on the Android platform was released. It is available here. I will appreciate any kind of feedback. I know that there are some bugs to fix, but the main functionalities are met.

Let's point out again the main functionalities:

The user can create a Gmail account if one is not already associated with the phone.
Authenticate with it on the server side.
View (GET) the current Tasks on the server (if there are any).
Sync (GET) the client database with the server database.
Add (POST) a Task to the local database (Content Provider) and then to the server.
Update (PUT) a desired Task from the current list of available Tasks.
Delete (DELETE) a Task.
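The mapping between HTTP verbs and CRUD operations in the list above can be sketched without any web framework. This is only an illustration under stated assumptions: TaskResource, handle and the in-memory map are hypothetical stand-ins, and the status codes are the conventional ones rather than what the demo server is known to return.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, framework-free sketch of verb-to-CRUD dispatch for Tasks.
class TaskResource {
    // In-memory stand-in for the server-side task storage.
    private final Map<String, String> tasks = new TreeMap<>();

    /** Dispatches an HTTP-style verb to a CRUD operation and returns a status code. */
    int handle(String method, String id, String body) {
        switch (method) {
            case "POST":   // Create
                if (tasks.containsKey(id)) return 409;
                tasks.put(id, body);
                return 201;
            case "GET":    // Read
                return tasks.containsKey(id) ? 200 : 404;
            case "PUT":    // Update
                if (!tasks.containsKey(id)) return 404;
                tasks.put(id, body);
                return 200;
            case "DELETE": // Delete
                return tasks.remove(id) != null ? 204 : 404;
            default:
                return 405;
        }
    }

    /** Returns the stored task text, or null if the task does not exist. */
    String get(String id) {
        return tasks.get(id);
    }
}
```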

The following pictures should clarify what I was describing:

Authorization of an account

Main Activity - RUD functions.

Create function

April 3, 2012

Low-Level Datastore API

I had to postpone the ACLs for a while and focus on a more important requirement. Until now my system has offered data through a REST architecture only for a single database schema, so it's time to make it more general and support any type of database structure based on the user's preferences.

To achieve this I will use the Java Low-Level Datastore API to work with the datastore, because it doesn't require a predefined schema and I can expose the service capabilities directly.

The datastore writes data in objects known as entities, and each entity has a key that identifies it. Entities can belong to the same entity group, which allows a single transaction to operate on multiple entities.

Let’s create an Entity representing the login information of a user.

import java.util.Date;

import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;

String user    = "test@example.com";
String message = "Hello world!";
Date   date    = new Date();

// Kind "Login", key name = the user's e-mail address.
Entity LoginInfo = new Entity("Login", user);
       LoginInfo.setProperty("date", date);
       LoginInfo.setProperty("content", message);

Above we defined an Entity with a raw constructor. We are passing two strings: the kind (which plays the role of the schema name) and the key name (the unique identifier). The datastore is schemaless, so we can specify any string as a kind. In fact the number of kinds is limited only by how many we need, and as long as we don’t lose track of them, we can have many different kinds without having to create a class for each one.

The key name is what we’ll use to retrieve the user later on when we need it again. Think of it as a Map or Dictionary key.

Once we have an Entity object, we need to define the properties. In this example I defined the current authentication date and a welcome message as properties. Note that, again, we can define as many properties as we want.

After we construct the entity, we instantiate the datastore service, and put the entity in the datastore:

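import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;

DatastoreService datastore = 
             DatastoreServiceFactory.getDatastoreService();
datastore.put(LoginInfo);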

The low-level Java API provides a Query class for constructing queries that fetch the matching entities from the datastore. Here is a simple example based on our code:

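import com.google.appengine.api.datastore.Query;
import java.util.Iterator;

Query query = new Query("Login");
// asIterator() yields the matching entities one at a time.
Iterator<Entity> iterator = datastore.prepare(query)
                                     .asIterator();
while (iterator.hasNext()) {
    Entity person = iterator.next();
    // Work with the entity here, e.g. person.getProperty("date").
}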

This code creates a new query on the Login kind and returns an iterator over the matching Entity objects.
In the following days I will rewrite the entire API based on this approach to meet the requirement.

March 26, 2012

ACLs

Over the past few weeks I have successfully dealt with the following:

Added a security constraint: a user is not allowed to access the API without being authenticated.
Full CRUD functions for the desktop web application: added the Update (PUT) feature (a user can edit their entries).
Multitenancy: creating a namespace for each user.
Changed the schema of the database and changed the API accordingly.
Bug fix: get the date as a long and convert it to Date (a publicly reported Gson issue).
Serve API requesters the data from their own namespace.
Rearranged the entire project.

Now let's stop at one point from the list and discuss its pros and cons:

Multitenancy: Creating namespaces for each user.

As I said in my previous post, a big advantage of multitenancy is that it simplifies administration, and data becomes easier to manipulate because all namespaces share the same database schema.

But creating one namespace per user isolates users from each other, so we were thinking of splitting the system into layers. For example, two people could share a single list, so one person could add items to the list and the other person would see them appear on their phone.

To achieve this we have to implement access control lists and offer the client the possibility to create a shared namespace (to which they will add people as a group). For that I have to create groups and give each member of the group some permissions (read, write, execute).

In the following lines I will describe the requirements that I want to achieve:

A user can create a desired namespace.
They can share it with others.
The accepted users can read, write, execute on the namespace.
The owner of the namespace can delete the namespace.

Nice to have requirements:

The owner can see the members of their namespaces.
The owner of the namespace can set permissions for each member.
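The requirements above could be modelled roughly as follows. This is a minimal in-memory sketch, not the design that was finally chosen: SharedNamespace, Permission, share, allows, canDelete and membersVisibleTo are all hypothetical names, and users are identified by plain strings.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of a shared namespace with per-member permissions.
class SharedNamespace {
    enum Permission { READ, WRITE, EXECUTE }

    private final String owner;
    private final Map<String, Set<Permission>> members = new HashMap<>();

    SharedNamespace(String owner) {
        this.owner = owner;
        // The owner implicitly holds every permission.
        members.put(owner, EnumSet.allOf(Permission.class));
    }

    /** Only the owner may add a member or change a member's permissions. */
    void share(String actor, String member, Set<Permission> perms) {
        if (!owner.equals(actor)) {
            throw new SecurityException("only the owner can share the namespace");
        }
        members.put(member, perms.isEmpty()
                ? EnumSet.noneOf(Permission.class)
                : EnumSet.copyOf(perms));
    }

    /** True if the user is a member and holds the given permission. */
    boolean allows(String user, Permission permission) {
        Set<Permission> perms = members.get(user);
        return perms != null && perms.contains(permission);
    }

    /** Only the owner may delete the namespace. */
    boolean canDelete(String user) {
        return owner.equals(user);
    }

    /** The owner can list the members; anyone else is refused. */
    Set<String> membersVisibleTo(String user) {
        if (!owner.equals(user)) {
            throw new SecurityException("only the owner can list members");
        }
        return new TreeSet<>(members.keySet());
    }
}
```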

Now I am thinking about what I should use: a specialized framework or my own implementation. I have already searched the internet and found some frameworks, but they look too hard to follow. I will see.

March 7, 2012

/* TODO */

Google authentication for my system is done, BUT during the implementation some problems and questions arose, so let's take a look and try to answer them:

1. Last night, while I was working on a totally different project, I wondered whether I could access the data from the database via a GET request in the browser even though I was not authenticated, and after trying I found out that I could :) because I forgot to set the security constraint on my system. To solve this issue I googled it and found out that Java web applications for Google App Engine use a deployment descriptor file to determine how URLs map to servlets, which URLs require authentication, and other information. This file is named web.xml, and resides in the app's WAR under the WEB-INF/ directory. web.xml is part of the servlet standard for web applications.

In this file I added a <security-constraint> element that defines a security constraint for URLs that match a pattern. If a user accesses a URL whose path has a security constraint and the user is not signed in, App Engine redirects the user to the Google Accounts sign-in page. Google Accounts redirects the user back to the application URL after successfully signing in or registering a new account. The app does not need to do anything else to ensure that only signed-in users can access the URL.[Source] Below is the code that I had to add to fix this bug.

<security-constraint>
      <web-resource-collection>
            <url-pattern>/api/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
            <role-name>*</role-name>
      </auth-constraint>
</security-constraint>

2. Currently, to distinguish the data between clients, I've added two more fields beside their data: the email address and the user ID. This information (which, by the way, is unique) helps me to easily get their data after they log into the system. Below you can see a snapshot of how I did it.

/* Get the instance of the Database */
PersistenceManager db = PMF.get().getPersistenceManager();
/* Create the JDOQL query */
Query q = db.newQuery("select from " + Note.class.getName()
          + " where userId=='" + user.getUserId()
          + "' && emailAddress=='" + user.getEmail()
          + "' " + " order by date");
/* Execute the query */
List<Note> list = (List<Note>) q.execute();

On the other hand, my mentors said that this approach works, BUT a proper one should use the power of multitenancy, which is supported by the Google App Engine API. Basically, multitenancy is the name given to a software architecture in which one instance of an application, running on a remote server, serves many client organizations (also known as tenants). Using a multitenant architecture simplifies administration and provisioning of tenants. You can provide a more streamlined, customized user experience, and also aggregate different silos of data under a single database schema. As a result, the application becomes more scalable. Data becomes easier to segregate and analyze across tenants because all tenants share the same database schema[Source]. Below I've added an example of setting a namespace for an authenticated user.

// Only set a namespace if none has been set for this request yet.
if (NamespaceManager.get() == null) {
  // Assuming there is a logged in user.
  String namespace = UserServiceFactory.getUserService().
                     getCurrentUser().getUserId();
  NamespaceManager.set(namespace);
}

I will try to add this feature to my system, so this will be another task on my TODO list.

3. A small modification that I should make to the server-side application is to allow a user to edit their entries (Tasks), so that all CRUD functions are covered.

OK, so let's list the main tasks that I should solve in the near future:

Allow a user to edit their entries.
Add Multitenancy.
! Add the C2DM protocol, which will push notifications to the client when something changes on the server side.

P.S. The demo is available at this link. It's just a small application that demonstrates the functionalities of the system.

March 2, 2012

App Engine connected to Android Device

While trying to implement Google Accounts login in my application and searching the internet for information about Android and Google App Engine, I found an interesting Google talk.

In that session two engineers from Google presented a new feature, App Engine Tooling for Android. It's a complete set of Eclipse-based Java development tools for building Android applications that are backed by App Engine.

Just create a new "App Engine connected to Android device" application and Eclipse generates two projects, an Android application and an App Engine application, which together provide a simple example of communication between the GAE server and the client application.

The interesting part is that this kind of application takes care of authentication with a Google account and of the C2DM implementation, so basically you just log in with your Gmail account on both sides (server: GAE, client: Android app) and then you establish communication between those two entities. Then you can send messages from the server to the client.

The basic architecture of the "framework" looks very similar to a wide range of applications:

Unfortunately this is not suitable for my work because, as my mentor says, "it's not a good idea to combine RPC with REST Architecture".

Anyway, it was nice to "play" with this project and try to understand how it works, even though it took more than 5 hours to build and deploy it.

February 29, 2012

Where we are

For the past few weeks we have been trying to improve the functionalities of our system, so let's take a look at what we have done:

We switched the data exchange format used for serialization from XML to JSON (Gson).
We improved the CRUD functions: a user can add more than one entry to the database in a single POST call.
We solved the problem of generating a unique identifier for an entry by using Universally Unique Identifiers (UUIDs).
To ensure that the database stays consistent we added transactions, so each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them.
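Two of the points above, UUID identifiers and all-or-nothing writes, can be illustrated together. The sketch below is an in-memory stand-in and deliberately does not use the datastore's transaction API: EntryStore and addAll are hypothetical names, and atomicity is imitated by staging the changes on a copy that is swapped in only if every entry is valid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory illustration; the real system relies on datastore transactions.
class EntryStore {
    private Map<String, String> entries = new HashMap<>();

    /** Adds every entry or none: changes are staged on a copy and committed at the end. */
    List<String> addAll(List<String> contents) {
        Map<String, String> staged = new HashMap<>(entries);
        List<String> ids = new ArrayList<>();
        for (String content : contents) {
            if (content == null || content.isEmpty()) {
                // Abort before the commit, so nothing from this batch is stored.
                throw new IllegalArgumentException("empty entry");
            }
            String id = UUID.randomUUID().toString(); // random (version 4) UUID
            staged.put(id, content);
            ids.add(id);
        }
        entries = staged; // commit
        return ids;
    }

    int size() {
        return entries.size();
    }
}
```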

Now I will describe the next tasks:

Authentication with Google accounts - This kind of authentication is needed in order to use the C2DM protocol (I will describe this feature in a later post) and to distinguish data between clients.
Delta Updates - In order to optimize the network traffic and save time I will focus on implementing delta updates. A delta update only requires the user to download the data that has changed, not the whole database, so a client can be brought up to date almost immediately. If, for example, a local database of 100 megabytes is updated with 2 megabytes of new data, the system will download only the 2 megabytes instead of 102 megabytes.

In order to keep track of changes during the development and testing phases we decided to create a repository. We have created an SVN-based repository on Google Code.

February 16, 2012

Requirements and Restrictions

In this post I will try to describe the most important and natural requirements of my system.

Architectural Requirements

The system will be developed for Android Architecture using Google Technologies.
The persistence will be handled by a distributed database.
The database will be BigTable.
Data management will follow the NoSQL model.

Functional Requirements

The system will be designed for one application distributed on multiple mobile devices.
Transparent data (readable) will be supported by the system.
The system requires an internet connection to synchronize the data between the local database and the cloud database.
The system will use a Wi-Fi connection when one is available and fall back to GPRS otherwise. The user can choose which type of connection is preferred; Wi-Fi is the default. The application can also adapt to the connection: when only GPRS is available it will load only the relevant data.
Offline work will be supported by the system. The correctness of the application will not be affected when no network connection is available: the Content Provider will store data locally (cache approach) and synchronize it with the cloud when an internet connection becomes available.
The Cloud to Device Messaging (C2DM) protocol will be used to tell the application that there is new data on the server side, so that the application can fetch it. The C2DM service handles all aspects of queueing of messages and delivery to the target application running on the target device.
JSON (via Gson) will be the data exchange format used for serializing and transmitting structured data over a network connection.
To ensure consistency the system will be able to replicate the data on several storage devices.
Timestamps will be used to solve conflicts during resynchronization. We will use the "last update wins" approach because it's the simplest and most natural.
The communication between entities (server and client) will be ensured through a REST architecture using Create/Read/Update/Delete functions.
The database on the server side will be able to maintain and distinguish pieces of data from multiple clients.
Delta updates will optimize network bandwidth and save time: the user downloads only the data that has changed, not the whole database.
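The "last update wins" rule from the list above can be sketched as a merge of two replicas. This is a minimal illustration under assumptions: LastUpdateWins, Versioned and merge are hypothetical names, timestamps are plain milliseconds, and on an exact tie the local version is kept, which is an arbitrary choice made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of timestamp-based conflict resolution.
class LastUpdateWins {
    /** A value together with the time (ms since epoch) of its last update. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges two replicas key by key, keeping the version with the newer timestamp. */
    static Map<String, Versioned> merge(Map<String, Versioned> local,
                                        Map<String, Versioned> remote) {
        Map<String, Versioned> merged = new HashMap<>(local);
        for (Map.Entry<String, Versioned> entry : remote.entrySet()) {
            Versioned mine = merged.get(entry.getKey());
            // Ties keep the local version (an assumption of this sketch).
            if (mine == null || entry.getValue().timestamp > mine.timestamp) {
                merged.put(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }
}
```

The approach is simple, but it depends on the clocks of the devices being reasonably in sync, and the older of two conflicting edits is silently discarded.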

Non-Functional Requirements

The programming language used for developing the system will be Java.

Besides those requirements, at the beginning I will add some restrictions just to be sure that in the end I will have functional software. If the development of the project goes well, I will remove them one by one and try to solve each of them. The complexity of my system will increase with every removed restriction.

Restrictions

Network connection is available at any time.
One device will be used to avoid conflicts in synchronizing the cloud and local database. This implies that a network synchronization will take place before work continues.
The client database will fit in the cellphone memory.
Every application has its own database.
Database from the server side (cloud) will be accessible only from Android devices.
Changes on the client side will result with an update of the entire database on the cloud.

The software development methodology that I will try to use is Agile. Agile methods have proven their effectiveness and are transforming the software industry every day. The main goal is to deliver functional software. Tasks are divided into small increments and functional software is developed in short iterations ("timeboxes"). A good feature of this method is that it encourages rapid and flexible response to change.

It seems this method matches my design principle, Keep it simple, stupid! We will see if that is true.

February 2, 2012

Bigtable and The Skeleton of 3D for Android

Bigtable is a distributed database system designed to scale to a very large size (petabytes) across thousands of servers. It is owned by Google and used by more than sixty of their applications, such as Google Maps, Google Earth and Gmail.

It's closed source, although Google offers access to it as part of Google App Engine. Since its deployment (late 2003) Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability.

Each table on this system is a sparse, distributed, multi-dimensional map where data is organized into three dimensions: rows, columns and timestamps.

(row:string, column:string, time:int64) → string
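To get a feeling for this data model, the mapping can be imitated with nested sorted maps. This is only a toy model of the (row, column, time) → value signature, not how Bigtable is implemented; SparseTable, put and get are hypothetical names.

```java
import java.util.Collections;
import java.util.TreeMap;

// Toy model of the (row, column, timestamp) -> value map; not Bigtable's implementation.
class SparseTable {
    // row -> column -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one version of a cell; older versions of the same cell are kept. */
    void put(String row, String column, long time, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column,
                    c -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(time, value);
    }

    /** Returns the most recent value of a cell, or null if the cell is empty. */
    String get(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        TreeMap<Long, String> versions = columns.get(column);
        // Versions are sorted newest first, so the first entry is the latest one.
        return versions == null ? null : versions.firstEntry().getValue();
    }
}
```

The table is sparse in the sense that a row only stores the columns that were actually written; nothing is allocated for the others.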

In order to manage a huge amount of data efficiently, the tables are split at row boundaries and stored as tablets. Each tablet holds contiguous rows and is roughly 100-200 MB in size; the tablets are distributed across many machines.

Each machine stores about 100 tablets (in GFS). This setup allows good load balancing and fast recovery: if a machine goes down, about 100 other machines each take over one of its tablets, so the extra load on each is fairly small.

When sizes threaten to grow beyond a specified limit, the tablets are subject to three different types of compaction:

Minor Compaction writes the in-memory table out as a new SSTable. It has two goals: to reduce memory usage and to reduce the amount of data that has to be read during recovery if the server dies.
Merging Compaction, periodically executed in the background, reads the contents of a few SSTables and writes out a new SSTable.
Major Compaction rewrites all SSTables into exactly one.
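As a rough illustration of the last item, a major compaction can be pictured as merging several sorted key/value tables into one, where newer tables override older ones and deleted entries disappear for good. The sketch models each SSTable as a sorted map; Compaction, major and the TOMBSTONE marker are hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model: each SSTable is a sorted map; a tombstone value marks a deleted key.
class Compaction {
    static final String TOMBSTONE = "__deleted__"; // hypothetical deletion marker

    /**
     * Rewrites several tables (ordered oldest first) into exactly one.
     * Newer tables override older ones, and deleted entries are dropped.
     */
    static SortedMap<String, String> major(List<SortedMap<String, String>> oldestFirst) {
        SortedMap<String, String> merged = new TreeMap<>();
        for (SortedMap<String, String> table : oldestFirst) {
            merged.putAll(table); // later (newer) tables overwrite earlier values
        }
        // After a major compaction no deletion markers remain.
        merged.values().removeIf(TOMBSTONE::equals);
        return merged;
    }
}
```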

More details about the implementation, the data model and the Google infrastructure on which Bigtable depends can be found in this lecture from the University of Washington or in this paper.

Regarding the small REST API that I was about to develop, it proved to be quite easy considering that I had some experience with Google App Engine, Jersey and Java. So I have created a small application on GAE, and through REST calls over HTTP I can Create (POST/PUT), Read (GET), Update (POST) and Delete (DELETE) data in my table (Bigtable) in the cloud. I also created a simple Android application that can perform those operations as well.
Now I will focus on describing the requirements of my future software implementation, but for that I have to do some research to find the solutions that fit best.

January 26, 2012

The Idea

With the fast advancement of information technology, database management systems are becoming more and more advanced.

While at the beginning system designers and architects thought that central control was better for database management, nowadays, with relatively cheaper hardware, a distributed database has become the better choice.

The main purpose of the project is to allow users to reliably store and synchronize data between their mobile device and the cloud. When there is no network connection between the client and the cloud, the data is stored locally, and when the connection becomes available, the cloud database is updated accordingly. The database will be replicated to improve reliability, availability and fault-tolerance. The communication between clients and the cloud will be made through a REST API and the C2DM protocol. The Android SDK, Google App Engine and Bigtable are the technologies that will be used to develop this application.
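The store-locally-then-synchronize behaviour described above could look roughly like this. It is a sketch under assumptions, not the project's client code: OfflineQueue and its methods are hypothetical names, and a plain list stands in for the cloud database.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: changes are queued locally and pushed once a connection exists.
class OfflineQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> cloud = new ArrayList<>(); // stand-in for the cloud database
    private boolean online;

    /** Records a change locally and pushes it right away if a connection is available. */
    void write(String change) {
        pending.addLast(change);
        if (online) {
            flush();
        }
    }

    /** Called when connectivity changes; going online pushes everything queued so far. */
    void setOnline(boolean online) {
        this.online = online;
        if (online) {
            flush();
        }
    }

    /** Sends the queued changes to the cloud in the order they were made. */
    private void flush() {
        while (!pending.isEmpty()) {
            cloud.add(pending.pollFirst());
        }
    }

    int pendingCount() {
        return pending.size();
    }

    List<String> cloudContents() {
        return new ArrayList<>(cloud);
    }
}
```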

I think the picture below clearly describes the principle of the project.

Google App Engine connected to Android Architecture

This work will deal with the following problems: how to successfully integrate my system into an environment that allows replication, and how to design the algorithm for data selection.

It will proceed along the following points:

Analyze the requirements for a system that stores data in the cloud (or in an environment allowing replication) and allows mobile devices to load relevant parts of the data. Focus only on the database layer and interface.
Study and describe basic principles of databases and distributed systems.
Describe the algorithm used for data selection (range or context, etc.)
Select one DB and document the reasons for the decision.
Design the system architecture.
Implement solution (DB side).

This is a team project. I'll focus on the server side of the system; the client will be described and developed by my colleague Andreea Sandu.

In the following days I'll describe the Bigtable database provided by Google (GAE) and I'll try to develop a small REST API in Java with the CRUD functionalities.

Until then, "Keep it Simple, Stupid"!

January 25, 2012

Hello World!

Who am I?

As I said in my short description, I am an exchange student at ČVUT, Prague. I am in my final year of a Computer Science Bachelor degree. I'm interested in learning and excelling in new technologies and in using my education and experience to accomplish my goals.

Currently I’m focused on cloud computing, distributed systems and web development based on the Android platform. I have experience in C, Java, SQL and Networking.

What is this blog about?

During my thesis preparation I would like to share information about my progress and to describe step by step each milestone. Also I want to encourage everyone to give feedback and suggestions.

Acknowledgements

I would like to thank my mentors Ing. Jan Sedivy, CSc. and Ph.D. Tomáš Bařina for their trust, the time they will spend on regular consultations and their willingness to share their rich experience.

I will describe the main idea of my project and its content in a later post.

P.S. Writing a post is not as simple as I thought; those few lines took me more than one hour. I will get used to it :).

"Keep It Simple, Stupid!"
