Monday, 30 April 2012

5 Things all Java developers should know when developing for the cloud

Over the last couple of years, "Cloud Computing" has replaced Web 2.0 as the new buzzword. Everywhere you read, hear and see that the cloud is coming. To most developers, this is still the same old sh*t. If you have experience developing distributed systems, then you should be fine, you say. Well, that is not entirely true: the IT department wants to deploy on a cheap cloud, and therefore some restrictions now apply. I will list 5 things that I think all developers should know when working with a cloud Platform as a Service (PaaS) provider such as Amazon Elastic Beanstalk or Google App Engine (GAE). This list also applies to IaaS architectures. Some of the points might be obvious to the more experienced; nevertheless, they need to be mentioned.

  • Static objects
We all know the difference between an instance variable (non-static) and a class variable (static). We use static to tell the JVM that there should be only one copy of the variable, shared by every instance of the class. If the static variable is declared with the "final" keyword and holds a hard-coded value, it will not cause a problem in a distributed environment, as the value will never change. The problem arises when we expect the value of the variable to change. As in any clustered environment, GAE and Beanstalk run your application in multiple JVMs, and each JVM holds its own copy of every static variable. If the value of your static variable changes in one JVM, the change is not propagated to the rest of the cluster, which leads to inconsistencies. I recommend that you avoid static variables unless they are declared "final" and their values are hard-coded, so that there is no way to change them at runtime.
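
To make the distinction concrete, here is a minimal sketch. The class and field names (AppConstants, VisitCounter, StaticStateDemo) are made up for illustration and do not come from any real application. The constant is safe because every JVM in the cluster sees the same hard-coded value; the counter is the risky pattern, because each JVM keeps its own copy.

```java
// Hypothetical example classes; the names are assumptions made for illustration.

final class AppConstants {
    // Safe in a cluster: a hard-coded constant is identical in every JVM.
    static final int MAX_LOGIN_ATTEMPTS = 3;

    private AppConstants() {
    }
}

final class VisitCounter {
    // Risky in a cluster: every JVM holds its own copy of this field,
    // so an increment made on one instance is invisible to the others.
    private static int visits = 0;

    private VisitCounter() {
    }

    static synchronized int recordVisit() {
        return ++visits;
    }
}

final class StaticStateDemo {
    public static void main(String[] args) {
        System.out.println("Max attempts: " + AppConstants.MAX_LOGIN_ATTEMPTS);
        // Within one JVM the counter looks correct, which is why the bug
        // usually shows up only after the application is scaled out.
        System.out.println("Visits seen by this JVM: " + VisitCounter.recordVisit());
    }
}
```

On a single instance VisitCounter behaves as expected. With two instances behind a load balancer, each one would report only the visits it handled itself.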

  • Caching Objects
This one is about performance: we cache to avoid expensive operations such as running database queries. Sometimes we need to cache objects in memory, so we implement our own caching strategy with a simple HashMap or one of the caching solutions available out there. Caching has many benefits, but implementing a caching strategy should be approached with care, because a cache has the same problem as static objects: it lives in the local JVM and is therefore not visible to the rest of the cluster. There are solutions. For example, GAE offers a Memcache service, and Beanstalk can make use of Amazon ElastiCache, which is protocol-compliant with Memcached. When developing for a PaaS environment, do not implement your own caching system; look for one that is supported by the vendor. I know this can lead to vendor lock-in.
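
One way to soften the lock-in is to hide the cache behind a small interface of your own, so that the vendor-backed implementation can be swapped later. The sketch below assumes Java 8 or later. SharedCache, LocalMapCache and CachedLookup are hypothetical names, and the map-backed class is only a stand-in for local development: it is not a distributed cache and calls no vendor API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical abstraction; a production implementation would delegate to
// the cache service supported by the platform vendor.
interface SharedCache {
    Optional<String> get(String key);

    void put(String key, String value);
}

// Local stand-in for development and tests only. It lives in one JVM,
// so it has exactly the visibility problem described above.
final class LocalMapCache implements SharedCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    @Override
    public void put(String key, String value) {
        entries.put(key, value);
    }
}

// Cache-aside lookup: try the cache first, fall back to the expensive
// loader (for example a database query) and remember the result.
final class CachedLookup {
    private final SharedCache cache;
    private final Function<String, String> expensiveLoader;

    CachedLookup(SharedCache cache, Function<String, String> expensiveLoader) {
        this.cache = cache;
        this.expensiveLoader = expensiveLoader;
    }

    String find(String key) {
        Optional<String> hit = cache.get(key);
        if (hit.isPresent()) {
            return hit.get();
        }
        String loaded = expensiveLoader.apply(key);
        cache.put(key, loaded);
        return loaded;
    }
}

final class CacheDemo {
    public static void main(String[] args) {
        CachedLookup lookup = new CachedLookup(new LocalMapCache(), key -> "value-for-" + key);
        System.out.println(lookup.find("user:1")); // loaded, then cached
        System.out.println(lookup.find("user:1")); // served from the cache
    }
}
```

The application code depends only on SharedCache, so moving between providers means writing one new implementation instead of touching every caller.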

  • Server-side Session
Something we take for granted in a single-server environment is storing application session data on the server. In my experience, mainly with GAE, I encountered multiple issues with session management. Since then, Google has fixed a lot of the issues with the way GAE handles sessions for Java applications. To minimize writing session data to a datastore, we store application state in memory. Most applications are written without any particular vendor in mind, so we use JEE as-is. This approach works if you deploy to any self-hosted clustered environment, but not on Google's PaaS. Google implements its own session management, which is off by default, so you need to enable it in appengine-web.xml and make sure that all the objects you store in the session implement the java.io.Serializable interface.
Note, for applications that enable asynchronous session persistence: session data is always written synchronously to memcache. If a request tries to read the session data when memcache is not available (or the session data has been flushed), it will fail over to the datastore, which may not yet have the most recent session data. This means that asynchronous session persistence may cause your application to see stale session data. However, for most applications the latency benefit far outweighs the risk.
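
The Serializable requirement is easy to get wrong, because a single non-serializable field only fails when the session is actually written out. A minimal sketch of a session object and a round-trip check follows. CartState and SessionSerializationCheck are hypothetical names, and the check uses only standard Java serialization, not any App Engine API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical session attribute. Every field it holds must itself be
// serializable (ArrayList and String both are).
final class CartState implements Serializable {
    private static final long serialVersionUID = 1L;

    private final List<String> itemIds = new ArrayList<>();

    void add(String itemId) {
        itemIds.add(itemId);
    }

    List<String> items() {
        return new ArrayList<>(itemIds);
    }
}

final class SessionSerializationCheck {
    private SessionSerializationCheck() {
    }

    // Serializes the value to bytes and reads it back, which is roughly
    // what a platform has to do to move a session between JVMs.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Object is not safely serializable", e);
        }
    }

    public static void main(String[] args) {
        CartState cart = new CartState();
        cart.add("book-42");
        System.out.println(roundTrip(cart).items());
    }
}
```

Running a check like this in a unit test for each session attribute catches serialization problems before deployment rather than in production.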

  • Event-driven Execution
This is about running a process at a given time, such as a scheduled task. In an environment you manage yourself, it is straightforward to implement a timer or scheduler service. But this is a clustered environment that you do not manage, and the provider's stack is different from yours. I personally use Quartz Scheduler when working in a single-server environment. In a clustered environment such as Beanstalk or GAE, it is difficult to know which instance will be triggered, and to make sure that the task executes only once. The folks at Google provide a solution with their own implementation of Cron for Java. At the time of writing, Amazon Elastic Beanstalk didn't have a solution yet. Therefore, when designing your system, decide beforehand which approach you will take to create scheduled tasks for your application.
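
If the platform gives you no scheduler, one common way to get "execute only once" is to let every instance fire the timer but require it to claim the run in a shared store first. This is not something either vendor prescribes, just a sketch of the idea. RunClaimStore, InMemoryClaimStore and NightlyReportJob are hypothetical names, and the in-memory store stands in for something genuinely shared, such as a database table with a unique key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract: returns true only for the first caller that
// claims a given run key. A real implementation must be backed by
// storage that all instances share.
interface RunClaimStore {
    boolean tryClaim(String runKey);
}

// In-memory stand-in used here so the sketch is runnable on its own.
final class InMemoryClaimStore implements RunClaimStore {
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    @Override
    public boolean tryClaim(String runKey) {
        return claimed.add(runKey); // add() is atomic and false if already present
    }
}

final class NightlyReportJob {
    private final RunClaimStore claims;
    private int executions = 0;

    NightlyReportJob(RunClaimStore claims) {
        this.claims = claims;
    }

    // Called by the local timer on every instance; only the instance
    // that wins the claim does the work for that scheduled date.
    boolean runIfUnclaimed(String scheduledDate) {
        if (!claims.tryClaim("nightly-report:" + scheduledDate)) {
            return false;
        }
        executions++; // the real task would run here
        return true;
    }

    int executions() {
        return executions;
    }
}

final class SchedulingDemo {
    public static void main(String[] args) {
        RunClaimStore shared = new InMemoryClaimStore();
        NightlyReportJob instanceA = new NightlyReportJob(shared);
        NightlyReportJob instanceB = new NightlyReportJob(shared);
        System.out.println("A ran: " + instanceA.runIfUnclaimed("2012-04-30"));
        System.out.println("B ran: " + instanceB.runIfUnclaimed("2012-04-30"));
    }
}
```

The two job objects play the role of two application instances. Whichever one claims the date first runs the task and the other skips it.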

  • JRE white list
I believe this relates to GAE J only. Google App Engine for Java doesn't allow the use of every API available in Java, especially those that require access to the file system. The fact that Google imposes such a restriction has led us to look elsewhere for some of our projects. The cost of redeveloping our applications to please them is much higher than deploying them elsewhere. Another downside of GAE J is that it doesn't fully support the JEE servlet specification. You cannot implement custom security for your application through your web.xml, which pushes you to use Google's own security mechanism. I would recommend GAE J for a greenfield project that can be built with these restrictions (here and here) in mind. If you are willing to be locked in to GAE J for your application, then I recommend it as a cost-efficient way to test your application; otherwise, look somewhere else.
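
For code that only touches the file system to build a temporary file, the usual workaround is to build the content in memory instead. The sketch below assumes that is acceptable for your data sizes. CsvExport is a hypothetical helper that uses only in-memory streams from the standard library, and it does no CSV escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: produces an export as bytes without creating a file.
final class CsvExport {
    private CsvExport() {
    }

    // Joins each row with commas and ends it with a newline.
    // Values are written as-is; quoting and escaping are left out.
    static byte[] toCsvBytes(List<String[]> rows) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintWriter out =
                 new PrintWriter(new OutputStreamWriter(buffer, StandardCharsets.UTF_8))) {
            for (String[] row : rows) {
                out.print(String.join(",", row));
                out.print('\n');
            }
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"id", "name"},
            new String[] {"1", "Ada"});
        System.out.print(new String(toCsvBytes(rows), StandardCharsets.UTF_8));
    }
}
```

The resulting byte array can be written straight to an HTTP response or handed to whatever storage service the platform provides.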

I hope this was helpful. If there is a mistake, feel free to get back to me and I will make any corrections. Also, I am sure that I am missing some other points; add them to the comments section.

P.S. here is a nice comparison from IBM

Cheers and Happy Coding.




Item Reviewed: 5 Things all Java developer should know when developing for the cloud Description: Rating: 5 Reviewed By: Armel Nene