Integrating Hazelcast and SimpleDB
Distributed programming and NoSQL have become more popular in recent years. With the emergence of cloud computing platforms, anyone can try distributed and scalable solutions on a small budget. So every developer can think bigger and imagine distributed, scalable systems without having to work for a giant company. In this document I will try to design and implement a scalable, distributed solution for a simple phonebook app with millions of users. The case is deliberately simple, because I want it to read like a getting-started tutorial for integrating Hazelcast and SimpleDB.
The Scenario
Let's keep the business requirements as simple as possible in order to focus on the technical ones. Here are our requirements:
- The phonebook will be a web application, public to everyone, with no security credentials. Anyone can add a contact with just a name and a phone number. Anyone can also search for a contact by giving her full name.
- Scalability: Take into account that the contact data may reach millions of records. I do not want to change my database solution later; I want it to scale horizontally.
- Performance: I want my web app to keep its performance as the number of records and visitors increases substantially.
Design Decision: Given the conditions above, I have decided to use distributed data structures backed by a NoSQL database.
Why Distributed Data?
Because of the third requirement: I want the app to stay fast despite millions of records, and I know that fetching data from memory is faster than fetching it from disk. So I want to store the contacts in RAM. Could I use a single machine with a lot of memory? Yes, but that would be very costly, as RAM is expensive, and also very risky, as you lose the data if this single machine crashes. So let's distribute the data over many machines with a fail-safe distribution framework.
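To make "fail-safe distribution" a bit more concrete, here is a minimal sketch of the general idea: each key is hashed to a partition, each partition has an owner, and a copy is kept on a different machine. This is not Hazelcast's actual assignment algorithm; the class name PartitionSketch, the member names and the round-robin placement are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: maps a key to a partition, an owner member and a backup member. */
final class PartitionSketch {
    private final List<String> members;
    private final int partitionCount;

    PartitionSketch(List<String> members, int partitionCount) {
        if (members.size() < 2) {
            throw new IllegalArgumentException("a backup needs at least two members");
        }
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.members = new ArrayList<>(members);
        this.partitionCount = partitionCount;
    }

    /** floorMod keeps the result non-negative even when hashCode() is negative. */
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Assumed placement: partitions are dealt out to the members round-robin. */
    String ownerOf(String key) {
        return members.get(partitionOf(key) % members.size());
    }

    /** The copy goes to the next member, so it never shares a machine with the owner. */
    String backupOf(String key) {
        return members.get((partitionOf(key) + 1) % members.size());
    }
}
```

With three members, every contact name gets one owner and one backup on a different member, so losing a single machine does not lose the entry.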
Why NoSQL?
I want my data system to scale horizontally, and NoSQL databases are designed for exactly this purpose. Relational databases may perform poorly in data-intensive applications, whereas NoSQL solutions can serve high read/write workloads.
Technology Decisions
The business requirement is simple, so we can handle it with just Java Servlet/JSP. As the distributed data cache I will use Hazelcast. As the NoSQL persistence layer I will use SimpleDB. I will deploy the app to multiple AWS EC2 instances.
Why SimpleDB?
SimpleDB is the NoSQL solution offered by Amazon Web Services. It is scalable and fast, but these are characteristics of NoSQL solutions in general, so you may prefer CouchDB or MongoDB. I prefer SimpleDB mostly for its ease of use. You do not administer or maintain the database; you just connect to it through its REST web service or its Java SDK. Also note that the bandwidth between EC2 and SimpleDB within the same region is free.
Why Hazelcast?
Hazelcast is an open-source data distribution solution. It is actively developed, and its community is growing. But what makes it attractive among the alternatives is its simplicity. The job, distributing data among machines, seems complicated, yet using Hazelcast is "deadly simple": just add a single jar to your project and distribute it. The fact that giants like Mozilla and Ericsson are actively using Hazelcast gives you confidence in its reliability.
OK, let's start coding
Note: I will not go through each step of the implementation in detail; instead you can download my project and look into it.
Step 1: Create the Project
Thank God, there is Maven. You can create a webapp from the maven-archetype-webapp archetype. But if you do not want to use Maven, you should create a Java web project and add the needed jars from these sites:
http://www.hazelcast.com/downloads.jsp
http://aws.amazon.com/sdkforjava
With Maven, the dependencies to add to the pom.xml are Hazelcast and the AWS SDK for Java.
Step 2: Model Classes
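As a minimal sketch of the model, assuming a contact is nothing more than the name and phone number from the requirements: the class name Contact and its field names are my assumptions here, not necessarily those of the downloadable project. The class is Serializable because objects stored in a Hazelcast map have to be serialized to travel between cluster members.

```java
import java.io.Serializable;
import java.util.Objects;

/** A phonebook entry; immutable, and Serializable so it can be copied between machines. */
final class Contact implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;   // full name, also used as the lookup key
    private final String phone;

    Contact(String name, String phone) {
        this.name = Objects.requireNonNull(name, "name");
        this.phone = Objects.requireNonNull(phone, "phone");
    }

    String getName() { return name; }

    String getPhone() { return phone; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Contact)) return false;
        Contact that = (Contact) other;
        return name.equals(that.name) && phone.equals(that.phone);
    }

    @Override
    public int hashCode() { return Objects.hash(name, phone); }

    @Override
    public String toString() { return name + " (" + phone + ")"; }
}
```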
There is nothing special in MyMapStore; it just delegates the work to our singleton service class, DataService.java.
DataService is the service class that connects to AWS SimpleDB using the AWS Java SDK. Its methods are mainly used by MyMapStore, which Hazelcast calls directly when getting from or putting to its distributed maps.
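The delegation between the two classes can be sketched roughly as follows. To keep the sketch runnable without the Hazelcast jar or AWS credentials, it uses two hypothetical stand-ins: SimpleMapStore is a trimmed version of the load/store/delete callbacks of Hazelcast's MapStore (the real interface also has bulk methods), and ItemStore with its in-memory implementation takes the place of the SimpleDB client. The domain name, the attribute name, the method names on DataService and the use of a plain String as the map value are all assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a SimpleDB client: a domain holds named items with string attributes. */
interface ItemStore {
    void putAttribute(String domain, String itemName, String attribute, String value);
    String getAttribute(String domain, String itemName, String attribute);
    void deleteItem(String domain, String itemName);
}

/** In-memory ItemStore so the sketch runs without AWS credentials. */
final class InMemoryItemStore implements ItemStore {
    private final Map<String, Map<String, String>> items = new ConcurrentHashMap<>();

    private static String key(String domain, String itemName) {
        return domain + "/" + itemName;
    }

    @Override
    public void putAttribute(String domain, String itemName, String attribute, String value) {
        items.computeIfAbsent(key(domain, itemName), k -> new ConcurrentHashMap<>())
             .put(attribute, value);
    }

    @Override
    public String getAttribute(String domain, String itemName, String attribute) {
        Map<String, String> attributes = items.get(key(domain, itemName));
        return attributes == null ? null : attributes.get(attribute);
    }

    @Override
    public void deleteItem(String domain, String itemName) {
        items.remove(key(domain, itemName));
    }
}

/** Singleton service: one item per contact, named by full name, with a single "phone" attribute. */
final class DataService {
    private static final String DOMAIN = "phonebook";  // assumed domain name
    private static final String PHONE = "phone";       // assumed attribute name
    private static final DataService INSTANCE = new DataService(new InMemoryItemStore());

    private final ItemStore db;

    private DataService(ItemStore db) { this.db = db; }

    static DataService getInstance() { return INSTANCE; }

    String findPhone(String fullName) { return db.getAttribute(DOMAIN, fullName, PHONE); }

    void savePhone(String fullName, String phone) { db.putAttribute(DOMAIN, fullName, PHONE, phone); }

    void removeContact(String fullName) { db.deleteItem(DOMAIN, fullName); }
}

/** Trimmed, hypothetical version of the load/store/delete callbacks of a Hazelcast MapStore. */
interface SimpleMapStore<K, V> {
    V load(K key);
    void store(K key, V value);
    void delete(K key);
}

/** Holds no state of its own; every callback is forwarded to the DataService singleton. */
final class MyMapStore implements SimpleMapStore<String, String> {
    private final DataService service = DataService.getInstance();

    @Override public String load(String fullName) { return service.findPhone(fullName); }

    @Override public void store(String fullName, String phone) { service.savePhone(fullName, phone); }

    @Override public void delete(String fullName) { service.removeContact(fullName); }
}
```

In the real application the map calls load when a key is missing from memory and store when an entry is put, so the servlet never has to talk to SimpleDB directly.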
At this point you need to tell Hazelcast about your map store implementation. You do this by adding a map-store entry that names your class under the <map> section of the default Hazelcast configuration.
Step 4: User Interface
We will have just a single web page to search contacts and add new contacts. In fact, search is a map.get operation and "add new contact" is a map.put operation. So our servlet will just call get and put on the Hazelcast map, nothing more. We handle the add and search operations in the same servlet; the post method calls the search or save method according to the button's "action" parameter.
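The dispatch logic can be sketched without the servlet API as a plain class. PhonebookHandler is a hypothetical name, the parameter names "name" and "phone" and the action values "search" and "save" are my assumptions, and a java.util.Map stands in for the Hazelcast distributed map (which also implements the Map interface, so get and put look the same).

```java
import java.util.Map;

/** Servlet-free sketch of the post method: dispatches on the "action" parameter. */
final class PhonebookHandler {
    private final Map<String, String> contacts;  // in the real app: the Hazelcast map

    PhonebookHandler(Map<String, String> contacts) {
        this.contacts = contacts;
    }

    /** Takes the request parameters and returns the message to show on the page. */
    String handle(Map<String, String> params) {
        String action = params.get("action");
        String name = params.get("name");
        if (name == null || name.trim().isEmpty()) {
            return "Name is required";
        }
        if ("search".equals(action)) {
            return search(name);
        }
        if ("save".equals(action)) {
            return save(name, params.get("phone"));
        }
        return "Unknown action";
    }

    /** Search is nothing more than a map get by full name. */
    private String search(String name) {
        String phone = contacts.get(name);
        return phone == null ? "No contact named " + name : name + ": " + phone;
    }

    /** Adding a contact is nothing more than a map put. */
    private String save(String name, String phone) {
        if (phone == null || phone.trim().isEmpty()) {
            return "Phone number is required";
        }
        contacts.put(name, phone);
        return "Saved " + name;
    }
}
```

In a real servlet the params map would be filled from the request parameters and the returned message forwarded to the JSP.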
Step 5: Deployment to AWS
Our application is almost complete. But we should also configure Hazelcast so that the instances we run discover each other. Just enable the AWS join setting in the Hazelcast network configuration.
Now you can deploy the war to multiple Amazon instances.
