Monday, November 24, 2008

Optimising RoR application for Amazon EBS

Amazon has a great feature called EBS (Elastic Block Store), which keeps data persistent in the event of an instance failure.
We have configured MySQL to keep its data files on EBS. Details can be found here: http://docs.google.com/Doc?id=dcn2ckbh_21gznbbjhr

Though a great feature, EBS has a couple of operational limitations.
1. It has a cryptic billing structure which bills based on (capacity + usage). Most of us can't really predict our disk usage, so we risk overshooting the budget.
2. EBS will be slower than locally mounted storage.

To overcome these issues, we decided to make some changes to our RoR application.

We decided to upload all user data to TWO locations. The 1st location is the usual "RoR-app-home/public/" directory, which is in the local storage of the instance.
The 2nd location is the EBS volume (/dev/sdh), mounted on /mnt/data-store of the instance. Within data-store, we created a few directories for storing various types of user data.
1. During upload, the data is copied to both locations, i.e. we write to both local storage and EBS.
2. During read, we read from the local directory of the EC2 instance. These local reads ensure that EBS is not hit with multiple read requests, which keeps our EBS costs low. There is also the speed benefit of reading from local storage as opposed to reading from EBS (Amazon AWS developers can correct me on the speed issue).

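Here is the sample code where we upload a video and the thumbnail associated with it. Keep an eye out for "video.rewind": the first write reads the uploaded file to the end, so it has to be rewound before it can be read again for the second write.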

Add in app/models/video.rb (for our application - you will have to adapt it for your app.)

VIDEO_UPLOAD_PATH = "public/video_player/videos/"
THUMBNAIL_UPLOAD_PATH = "public/video_player/thumbnail/"

# Disk usage optimisation for AWS:
# replicate videos and images onto the EBS volume mounted at /mnt/data-store.

VIDEO_UPLOAD_PATH_FOR_AWS = "/mnt/data-store/app-data/videos/"
THUMBNAIL_UPLOAD_PATH_FOR_AWS = "/mnt/data-store/app-data/thumbnails/"

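# Create the video record, then upload the video and its thumbnail.
# video and thumbnail are the uploaded file objects.
def self.create_video(video, thumbnail)
  if valid_video?(video) && valid_thumbnail?(thumbnail)
    video_filename = sanitize_attachment_name(video)
    thumbnail_filename = sanitize_attachment_name(thumbnail)
    # The block parameter is named v so that it does not clash with the uploaded video file.
    new_video = self.new do |v|
      v.video_name = video_filename
      v.thumbnail_name = thumbnail_filename
    end
    # The record is saved first so that its id is available for the file names.
    if new_video.save
      self.upload_video(video, video_filename, new_video.id) &&
        self.upload_thumbnail(thumbnail, thumbnail_filename, new_video.id)
    end
  end
end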

# Write the uploaded video to local storage first, then to EBS.
def self.upload_video(video, video_filename, video_id)
  video_path = VIDEO_UPLOAD_PATH + "#{video_id}_" + video_filename
  File.open(video_path, "wb") { |f| f.write(video.read) }
  # The first read left the upload at end-of-file; rewind before reading it again for EBS.
  video.rewind
  video_path_for_aws = VIDEO_UPLOAD_PATH_FOR_AWS + "#{video_id}_" + video_filename
  File.open(video_path_for_aws, "wb") { |f| f.write(video.read) }
end

# Write the uploaded thumbnail to local storage first, then to EBS.
def self.upload_thumbnail(thumbnail, thumbnail_filename, video_id)
  thumbnail_path = THUMBNAIL_UPLOAD_PATH + "#{video_id}_" + thumbnail_filename
  File.open(thumbnail_path, "wb") { |f| f.write(thumbnail.read) }
  # Rewind for the same reason as in upload_video.
  thumbnail.rewind
  thumbnail_path_for_aws = THUMBNAIL_UPLOAD_PATH_FOR_AWS + "#{video_id}_" + thumbnail_filename
  File.open(thumbnail_path_for_aws, "wb") { |f| f.write(thumbnail.read) }
end

Friday, November 21, 2008

Tutorial on hosting RoR app on Amazon AWS with EC2, EBS, Ruby Enterprise Edition (REE) and Phusion Passenger (mod_rails)

Background:
We built a tourism portal in RoR for one of our clients. You can have a look at it: www.tripladder.com
After building it, we were asked to host and manage it for them. Initially we went with knownhost, which is OK, but a production RoR application needs more RAM than we get on most VPS plans, especially if it does image processing. We did consider AWS, but at that time it did not have EBS, and the client did not initially expect enough traffic to justify a 'scalr' managed cluster. We were looking for a replacement for a dedicated server. Once EBS was launched, we immediately decided to move the site to AWS. The cost-benefit analysis is compelling.

The following tutorial starts off after signing up with AWS and configuring your desktop/laptop to be able to connect to AWS and launch instances, i.e. we assume that you have completed the 'Getting Started' section of AWS.

We started with the stock Fedora image and modified it to suit our requirements. We could have used CentOS, but Fedora-8 appeared at the top of the list and we went ahead with it.

The application hosting has the following steps.
  1. Launching an instance.
  2. Installing RoR, gems, plugins... We used rmagick, hence we had to install ImageMagick too.
  3. Installing REE and Phusion (mod_rails)
  4. Installing MySQL.
  5. Installing the application (checkout from subversion).
  6. Creating and attaching an EBS volume. MySQL with data on EBS
  7. Modifying the RoR app to save user upload files to EBS. (http://docs.google.com/Doc?id=dcn2ckbh_20hk4kc4d4)
  8. Installing and configuring a production level ferret server
  9. Configuring Apache to serve the application, caching optimisations for performance.
  10. Configuring permanent public IP (covered) and DNS (we have the domain parked with GoDaddy but this is not covered in this article)
  11. Configuring SMTP (email) support for RoR application.
  12. Once we have the perfect server setup, save it to S3.
  13. Periodic automated backups - Using Amazon snapshots.

The full tutorial is available here:

http://docs.google.com/Doc?id=dcn2ckbh_21gznbbjhr

Tutorial on hosting a RoR (Ruby on Rails) application on Amazon AWS with EC2 and EBS.

We built a tourism portal in RoR for one of our clients. You can have a look at it: www.tripladder.com
After building it, we were asked to host and manage it for them. Initially we went with knownhost, which is OK, but a production RoR application needs more RAM than we get on most VPS plans, especially if it does image processing. We did consider AWS, but at that time it did not have EBS, and the client did not initially expect enough traffic to justify a 'scalr' managed cluster. We were looking for a replacement for a dedicated server. Once EBS was launched, we immediately decided to move the site to AWS. The cost-benefit analysis is compelling.

The following tutorial starts off after signing up with AWS and configuring your desktop/laptop to be able to connect to AWS and launch instances, i.e. we assume that you have completed the 'Getting Started' section of AWS.

We started with the stock Fedora image and modified it to suit our requirements. We could have used CentOS, but Fedora-8 appeared at the top of the list and we went ahead with it.

The application hosting has the following steps.
  1. Launching an instance.
  2. Installing RoR, gems, plugins... We used rmagick, hence we had to install ImageMagick too.
  3. Installing MySQL.
  4. Installing the application (checkout from subversion).
  5. Creating and attaching an EBS volume. MySQL with data on EBS
  6. Modifying the RoR app to save user upload files to EBS.
  7. Installing and configuring a production level ferret server
  8. Installing and configuring mongrel cluster.
  9. Configuring Apache to proxy to mongrel cluster, caching optimisations for performance.
  10. Configuring permanent public IP (covered) and DNS (we have the domain parked with GoDaddy but this is not covered in this article)
  11. Configuring SMTP (email) support for RoR application.
  12. Once we have the perfect server setup, save it to S3.
  13. Periodic automated backups - This is available here: http://docs.google.com/Doc?id=dcn2ckbh_21gznbbjhr

For the full tutorial - go to this link.

http://docs.google.com/Doc?id=dcn2ckbh_20hk4kc4d4

Wednesday, November 19, 2008

A perfect world.. A world without IE ??

The list becomes huge when you pen down the IE bugs (even features), some of which you can't even dream of.
I have had my share of problems, struggling for hours to get things to work on a buggy IE (I refer to IE6 and IE7, which make no difference).

As it feels like a fitting context, I can't help quoting this from someone on Ajaxian.
"
If you've been working with the Ajax framework long enough, I'm sure you've run into at least a few speed bumps thanks to Internet Explorer. Not a day goes by that I don't have to rewrite a line of code, or tweak my css in order for IE to render what I think it should. But alas, this is the nature of software that comes from a company that views Standard Compliances as recommendations."

Let me share with you a couple of issues I had to face.

Recently, I needed to dynamically change the options of a SELECT box on my web page. What I did was construct a string of options and assign it to the SELECT tag using innerHTML. This worked perfectly on my favourite FF and even on Safari. I am pretty sure it would have worked on any browser you name, with the exception of IE.

Here is my code:
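var optionString = "";
for (var option in options) {
  // Wrap each entry in an OPTION tag.
  optionString += '<option>' + options[option] + '</option>';
}
var sel = document.getElementById("mySelectBoxId");
// This assignment is the step that fails in IE.
sel.innerHTML = optionString;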
What happened in IE was that the first option tag got truncated and my SELECT box was shown empty.
I racked my brain examining my code for a bug but found none. The mighty Google came to the rescue: it turned out that this was caused by a known BUG in IE.
The recommended workaround is to use DOM methods instead, which I did to good effect.
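For illustration, here is a minimal sketch of what a DOM-based version might look like. It is not my exact code: the function name replaceOptions and the labels array are made up for this example, and the document object is passed in as a parameter only so that the function is easy to try out in isolation.

```javascript
// Rebuild the options of a SELECT box with DOM methods instead of innerHTML.
// doc      - the document object (passed in so the function can be tested)
// selectId - the id of the SELECT element
// labels   - hypothetical array of strings, used as both value and text
function replaceOptions(doc, selectId, labels) {
  var sel = doc.getElementById(selectId);

  // Remove whatever options are currently in the SELECT box.
  while (sel.firstChild) {
    sel.removeChild(sel.firstChild);
  }

  // Create each OPTION element and append it to the SELECT box.
  for (var i = 0; i < labels.length; i++) {
    var opt = doc.createElement("option");
    opt.value = labels[i];
    opt.appendChild(doc.createTextNode(labels[i]));
    sel.appendChild(opt);
  }
  return sel;
}
```

In a page it would be called as replaceOptions(document, "mySelectBoxId", ["Red", "Green"]).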

The other issue arises while using AJAX. I don't call it a bug (it's rather more of a feature). I have a list of items which can be re-ordered by drag and drop. Whenever an item is dragged, I make an AJAX GET request to the server, and a re-ordered list of items has to be served as the response. Again, this works fine with FF and Safari. The problem is with IE again.

When one of the items is dragged, IE does not always make a request to the server, and the list gets re-ordered wrongly in the browser. It took me a whole day to figure out where things were going wrong.

IE is not making the request but serving a stale cached response from previous, identical requests. Some argue that it's a feature of any browser to cache the response to a GET request. But IE's caching has always been an issue.

We have two solutions for the above problem.
One is to make a POST request instead of a GET so that IE won't cache it.
"But I can't convince myself to make a POST request just to satisfy IE where a simple GET best fits the job. "
The other is to make every request unique. For this you can append a random string or a timestamp as a useless parameter at the end of your request URL.
"By making requests unique, you can get a proper response from the server in this case, but IE still caches this response. When these requests are numerous, the browser cache overflows and you may lose some important cached responses of other requests."

So, there is always a trade-off involved. It's up to the developer to choose the solution that best fits the job.
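The second solution (making every request unique) can be sketched roughly as below. The function name uniqueUrl and the parameter name "_ts" are arbitrary choices for this example, and the sketch assumes the URL has no "#fragment" part. The optional second argument is there only so that the result can be checked with a fixed timestamp.

```javascript
// Append a timestamp parameter so that no two GET URLs are identical.
// url - the request URL, with or without an existing query string
// now - optional timestamp; defaults to the current time in milliseconds
function uniqueUrl(url, now) {
  var stamp = (now === undefined) ? new Date().getTime() : now;
  // Use "?" if the URL has no query string yet, "&" otherwise.
  var separator = (url.indexOf("?") === -1) ? "?" : "&";
  return url + separator + "_ts=" + stamp;
}
```

The AJAX call would then request uniqueUrl("/items/reorder?id=5") instead of the bare URL (the path here is only a placeholder). The server can simply ignore the extra parameter.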

The list doesn't end here. These are just the issues I have come across; others might have met more. I will keep adding such issues if I find any. I hope I won't have to.

"
Have fun with your code and don't hate it :) hate the things that don't comply with the standards..."
Now, is it fitting to call it A perfect world.. A world without IE ??