February 29, 2012

Where we are

For the past few weeks we have been trying to improve the functionality of our system, so let's take a look at what we have done:

  • We switched the data exchange format used for serialization from XML to JSON (using GSON).
  • We improved the CRUD functions: a user can now add more than one entry to the database with a single call to the POST method.
  • We solved the problem of generating a unique identifier for each entry by using Universally Unique Identifiers (UUIDs).
  • To ensure that the database stays consistent, we added transactions: the operations in a transaction are applied atomically, which means that a transaction is never partially applied. Either all of the operations in the transaction are applied, or none of them.

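The batch POST, the UUID keys and the all-or-nothing behaviour described above can be sketched together in plain Java. This is a hypothetical in-memory stand-in (the `EntryStore` class and its `insertAll` method are invented for illustration); in the actual system atomicity comes from the transactions offered by the datastore on App Engine, while here the same effect is approximated by staging the changes on a copy and committing them with a single swap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory store; the real system keeps its data in Bigtable on GAE.
class EntryStore {
    private Map<String, String> entries = new HashMap<>();

    // Stores a whole batch (as sent in one POST), giving every entry a random UUID.
    // The batch is applied atomically: either all entries are stored or none are.
    synchronized List<String> insertAll(List<String> payloads) {
        Map<String, String> staged = new HashMap<>(entries); // work on a copy
        List<String> ids = new ArrayList<>();
        for (String payload : payloads) {
            if (payload == null || payload.isEmpty()) {
                // Abort: the staged copy is discarded and the live map is untouched.
                throw new IllegalArgumentException("empty entry in batch");
            }
            String id = UUID.randomUUID().toString(); // version 4 (random) UUID
            staged.put(id, payload);
            ids.add(id);
        }
        entries = staged; // commit: swap in the staged copy in one step
        return Collections.unmodifiableList(ids);
    }

    synchronized int size() {
        return entries.size();
    }

    synchronized String get(String id) {
        return entries.get(id);
    }
}
```

Copying the whole map for every batch is only acceptable in a toy; the point is that a failure in the middle of a batch leaves the stored data exactly as it was.
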
Next, I will describe the upcoming tasks:

  1. Authentication with Google accounts - This kind of authentication is needed in order to use the C2DM protocol (I will describe this feature in a later post) and to distinguish data between clients.
  2. Delta updates - In order to optimize network traffic and save time, I will focus on implementing delta updates. A delta update requires the user to download only the data that has changed, not the whole database, so any application that is ready for updating can be updated almost immediately. For example, if a local database of 100 megabytes is updated with 2 megabytes of new data, the system will download only those 2 megabytes instead of all 102 megabytes.
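
One possible way to compute such a delta is with timestamps, assuming every entry on the server records when it was last modified and the client remembers when it last synchronized. The `SyncEntry` and `DeltaSync` names are hypothetical and this is a sketch of the idea, not the final design: the server simply returns the entries changed after the client's last sync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry: payload plus the server time of its last modification.
final class SyncEntry {
    final String id;
    final String payload;
    final long lastModified; // milliseconds since the epoch

    SyncEntry(String id, String payload, long lastModified) {
        this.id = id;
        this.payload = payload;
        this.lastModified = lastModified;
    }
}

final class DeltaSync {
    private DeltaSync() {}

    // Returns only the entries changed after the client's last successful sync,
    // so the client downloads the delta instead of the whole database.
    static List<SyncEntry> changedSince(List<SyncEntry> all, long lastSync) {
        List<SyncEntry> delta = new ArrayList<>();
        for (SyncEntry e : all) {
            if (e.lastModified > lastSync) {
                delta.add(e);
            }
        }
        return delta;
    }
}
```

One caveat of this approach: a deleted entry simply disappears from the list, so the client would never learn about the deletion. Deletions would have to be kept as "tombstone" entries with their own timestamp until every client has synchronized.
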

To keep track of changes during the development and testing phases, we created an SVN-based repository on Google Code.

February 16, 2012

Requirements and Restrictions

   In this post I will try to describe the most important and natural requirements of my system.

Architectural Requirements

  1. The system will be developed for the Android platform using Google technologies.
  2. Persistence will be handled by a distributed database.
  3. The database will be Bigtable.
  4. Database management will follow the NoSQL model.
Functional Requirements
  1. The system will be designed for one application distributed on multiple mobile devices.
  2. The system will support transparent (human-readable) data.
  3. The system requires an internet connection to synchronize the data between the local database and the cloud database.
  4. The system will use a Wi-Fi connection when one is available and fall back to GPRS otherwise - The user can choose which type of connection is used primarily; Wi-Fi is the default. The user application will also be able to enforce this preference: when only GPRS is available, the application will load only the relevant data.
  5. The system will support offline work - The application will keep working correctly when no network connection is available: a Content Provider will store data locally (a cache approach) and synchronize it with the cloud when an internet connection becomes available.
  6. The Cloud to Device Messaging (C2DM) protocol will be used to notify the application that new data is available on the server, so that the application can fetch it. The C2DM service handles all aspects of queueing messages and delivering them to the target application running on the target device.
  7. JSON (via GSON) will be the data exchange format used for serializing and transmitting structured data over a network connection.
  8. To ensure consistency, the system will be able to replicate the data across several storage devices.
  9. Timestamps will be used to resolve conflicts during resynchronization. We will use the "last update wins" approach because it is the simplest and most natural.
  10. Communication between the entities (server and client) will follow a REST architecture using Create/Read/Update/Delete functions.
  11. The database on the server side will be able to maintain and distinguish pieces of data from multiple clients.
  12. The system will optimize network bandwidth and save time by using delta updates, which require the user to download only the data that has changed, not the whole database.
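
The "last update wins" rule from requirement 9 can be sketched as a comparison of two copies of the same record. `VersionedRecord` and `LastUpdateWins` are hypothetical names, and the tie-break (keep the cloud copy when both timestamps are equal, so every device converges to the same value) is an assumption the requirements do not specify.

```java
// Hypothetical record carrying the timestamp of its last update.
final class VersionedRecord {
    final String id;
    final String value;
    final long updatedAt; // milliseconds since the epoch

    VersionedRecord(String id, String value, long updatedAt) {
        this.id = id;
        this.value = value;
        this.updatedAt = updatedAt;
    }
}

final class LastUpdateWins {
    private LastUpdateWins() {}

    // Resolves a conflict between the local and the cloud copy of one record:
    // the copy with the newer timestamp wins.
    static VersionedRecord resolve(VersionedRecord local, VersionedRecord cloud) {
        if (!local.id.equals(cloud.id)) {
            throw new IllegalArgumentException("not two copies of the same record");
        }
        // A strictly newer local edit wins; ties go to the cloud copy (assumed tie-break).
        return local.updatedAt > cloud.updatedAt ? local : cloud;
    }
}
```

The rule is only as good as the clocks behind the timestamps: if a device's clock runs behind, its genuinely newer edit can lose, which is one reason to prefer timestamps assigned by the server.
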
Non-Functional Requirements
  1. The programming language used for developing the system will be Java.

     Besides these requirements, I will start with some restrictions, just to be sure that in the end I will have functional software. If development goes well, I will remove them one at a time and solve the problem each one was avoiding. The complexity of my system will increase with every restriction removed.

Restrictions
  1. Network connection is available at any time.
  2. Only one device will be used, to avoid conflicts when synchronizing the cloud and local databases. This implies that a network synchronization must take place before work continues.
  3. The client database will fit in the cellphone memory.
  4. Every application has its own database.
  5. The server-side (cloud) database will be accessible only from Android devices.
  6. Changes on the client side will result in an update of the entire database in the cloud.

     The software development methodology that I will try to use is Agile. Agile methods have proven effective and are changing the way software is built. The main goal is to provide functional software: tasks are divided into small increments, and functional software is developed in short iterations ("timeboxes"). A good feature of this method is that it encourages a rapid and flexible response to change.
     
      It seems this method follows my design principle, "Keep it simple, stupid!"; we will see if that is true.

February 2, 2012

Bigtable and The Skeleton of 3D for Android

      Bigtable is a distributed database system designed to scale to a very large size (petabytes of data) across thousands of servers. It was developed by Google and is used by more than sixty of its applications, such as Google Maps, Google Earth, Gmail and so on.

      It is closed source, although Google offers access to it as part of Google App Engine. Since its deployment (late 2003), Bigtable has achieved several goals: wide applicability, scalability, high performance, and high availability.


      Each table in this system is a sparse, distributed, multidimensional sorted map, indexed along three dimensions: row key, column key and timestamp. Each value is an uninterpreted string of bytes:


(row:string, column:string, time:int64) → string
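
To make this signature concrete, here is a toy, single-machine model of the map built from nested sorted maps. `ToyBigtable` and its methods are hypothetical and have nothing to do with any real Bigtable client API; they only illustrate the (row, column, time) → value structure and the fact that absent cells cost nothing. The example row key "com.cnn.www" and column "contents:" are taken from the Webtable example in the Bigtable paper.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of (row:string, column:string, time:int64) -> string.
// Not the real Bigtable API: it only illustrates the data model.
final class ToyBigtable {
    // row -> column -> timestamp -> value; every level is kept sorted by key.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Writes one version of a cell; older versions of the same cell are kept.
    void put(String row, String column, long time, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(time, value);
    }

    // All versions of one cell, or null if the cell was never written (sparse map).
    private NavigableMap<Long, String> versions(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Most recent value of a cell (timestamps ascend, so the last entry is newest).
    String getLatest(String row, String column) {
        NavigableMap<Long, String> v = versions(row, column);
        return (v == null || v.isEmpty()) ? null : v.lastEntry().getValue();
    }

    // Value that was current at the given time: the newest version with timestamp <= time.
    String getAt(String row, String column, long time) {
        NavigableMap<Long, String> v = versions(row, column);
        if (v == null) {
            return null;
        }
        Map.Entry<Long, String> e = v.floorEntry(time);
        return e == null ? null : e.getValue();
    }
}
```

For example, after writing "contents:" of row "com.cnn.www" at times 3 and 5, `getLatest` returns the version written at time 5 and `getAt(..., 4)` returns the version written at time 3.
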


      To manage huge amounts of data efficiently, tables are split at row boundaries and stored as tablets. Each tablet holds a contiguous range of rows and is about 100-200 MB in size, and the tablets are distributed across many machines.
Each machine stores about 100 tablets (in GFS). This setup allows good load balancing and fast recovery: if a machine goes down, other machines each take over one of its tablets, so the extra load on each of them is fairly small.
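
Splitting at row boundaries can be illustrated with a small, static sketch: walk the rows in key order and start a new tablet whenever the next row would push the current one past a size limit. `TabletSplitter` and its parameters are hypothetical; real Bigtable splits tablets dynamically as they grow, and the 100-200 MB figure above is a typical tablet size, not a parameter of this code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

final class TabletSplitter {
    private TabletSplitter() {}

    // Splits a table, given as row key -> row size in bytes (sorted by key), into
    // tablets of contiguous rows. A new tablet starts whenever adding the next row
    // would exceed maxTabletBytes. A single oversized row still gets its own tablet,
    // because splits only ever happen between rows, never inside one.
    static List<List<String>> split(SortedMap<String, Long> rowSizes, long maxTabletBytes) {
        List<List<String>> tablets = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (Map.Entry<String, Long> row : rowSizes.entrySet()) {
            long size = row.getValue();
            if (!current.isEmpty() && currentBytes + size > maxTabletBytes) {
                tablets.add(current); // close the current tablet at this row boundary
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row.getKey());
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            tablets.add(current);
        }
        return tablets;
    }
}
```

With a limit of 128 bytes and rows a=60, b=60, c=60, d=200, e=10, the result is [[a, b], [c], [d], [e]]: row d exceeds the limit on its own but is never cut in half.
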

       When sizes threaten to grow beyond a specified limit, tablets are subject to three different types of compaction:
  1. Minor Compaction - converts the in-memory buffer (memtable) into a new SSTable. It has two goals: to reduce memory usage and to reduce the amount of data that has to be read from the commit log during recovery if the server dies.
  2. Merging Compaction - executed periodically in the background, it reads the contents of a few SSTables (and the memtable) and writes out a new SSTable.
  3. Major Compaction - rewrites all SSTables into exactly one SSTable.
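
The merge behind merging and major compactions can be sketched by modelling each SSTable as a sorted map and applying the tables from oldest to newest. `Compaction`, `merge` and `TOMBSTONE` are hypothetical names. The sketch reflects one detail from the Bigtable paper: only a major compaction may drop deletion markers, because a smaller merge cannot see older SSTables that may still contain the deleted key.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class Compaction {
    // Hypothetical marker standing in for a Bigtable deletion entry.
    static final String TOMBSTONE = "\u0000DELETED";

    private Compaction() {}

    // Merges SSTables (each a sorted key -> value map), listed from oldest to newest,
    // into one new table; for the same key a newer value overwrites an older one.
    // A real compaction streams a k-way merge over sorted files; putAll is used
    // here for brevity.
    static SortedMap<String, String> merge(List<SortedMap<String, String>> oldestFirst, boolean major) {
        SortedMap<String, String> out = new TreeMap<>();
        for (SortedMap<String, String> table : oldestFirst) {
            out.putAll(table);
        }
        if (major) {
            // All SSTables were merged, so nothing older can resurface:
            // deletion markers (and the data they shadowed) can be dropped.
            out.values().removeIf(TOMBSTONE::equals);
        }
        return out;
    }
}
```

A merging compaction corresponds to `major == false` (tombstones are kept), while a major compaction corresponds to `major == true`.
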

    You can find more details about the implementation, the data model and the Google infrastructure on which Bigtable depends in this lecture from the University of Washington or in this paper.

       As for the small REST API that I was about to develop, it proved to be quite easy, since I already had some experience with Google App Engine, Jersey and Java. I created a small application on GAE, and through REST calls over HTTP I can Create (POST/PUT), Read (GET), Update (POST) and Delete (DELETE) data in my table (Bigtable) in the cloud. I also created a simple Android application that can perform the same operations.
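
The mapping of HTTP verbs onto CRUD operations described above can be sketched without any framework. This is a hypothetical, in-memory stand-in for the GAE/Jersey service: the `CrudDispatcher` class, its `Response` type and the status codes chosen are assumptions for illustration. It follows the mapping given in the text, where POST on the collection creates an entry, POST on an existing id updates it, and PUT creates an entry under a given id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical dispatcher mapping HTTP methods onto CRUD operations on one table.
final class CrudDispatcher {
    static final class Response {
        final int status;  // HTTP-style status code
        final String body; // entry value, generated id, or null

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    private final Map<String, String> table = new HashMap<>();

    // id is null for requests on the whole collection (e.g. POST /entries).
    Response handle(String method, String id, String body) {
        switch (method) {
            case "GET": // Read
                return table.containsKey(id) ? new Response(200, table.get(id)) : new Response(404, null);
            case "PUT": // Create (or replace) under a client-chosen id
                if (id == null) {
                    return new Response(400, null);
                }
                boolean existed = table.containsKey(id);
                table.put(id, body);
                return new Response(existed ? 200 : 201, null);
            case "POST":
                if (id == null) { // Create with a server-generated UUID
                    String newId = UUID.randomUUID().toString();
                    table.put(newId, body);
                    return new Response(201, newId);
                }
                if (!table.containsKey(id)) { // Update of an unknown entry
                    return new Response(404, null);
                }
                table.put(id, body); // Update
                return new Response(200, null);
            case "DELETE": // Delete
                if (!table.containsKey(id)) {
                    return new Response(404, null);
                }
                table.remove(id);
                return new Response(204, null);
            default:
                return new Response(405, null); // method not supported
        }
    }
}
```

In the real service each `case` would correspond to a Jersey resource method bound to the matching HTTP verb, with Bigtable in place of the `HashMap`.
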

      Next, I will focus on describing the requirements of my future software implementation, but first I have to do some research to find the solutions that fit best.