Monday, November 24, 2008

Optimising RoR application for Amazon EBS

Amazon has a great feature called EBS (Elastic Block Store), which lets data persist in the event of an instance failure.
We have configured MySQL to keep its data files on EBS. Details can be found here: http://docs.google.com/Doc?id=dcn2ckbh_21gznbbjhr

Though a great feature, EBS has a couple of operational limitations.
1. Its billing structure is cryptic: it bills based on capacity plus usage. Most of us can't really predict our disk usage, so we risk overshooting the budget.
2. EBS will be slower than locally mounted storage.

To overcome these issues, we decided to make some changes in our RoR application.

We decided to upload all user data to TWO locations. The first location is the usual "RoR-app-home/public/" directory, which is on the local storage of the instance.
The second location is the EBS volume (/dev/sdh), mounted on /mnt/data-store of the instance. Within data-store, we created a few directories for storing various types of user data.
1. During upload, the data is copied to both locations, i.e. we write to both local storage and EBS.
2. During read, we read from the local directory of the EC2 instance. Reading locally ensures that EBS is not hit with multiple read requests, which keeps our EBS costs low. There is also the speed benefit of reading from local storage as opposed to reading from EBS (Amazon AWS developers can correct me on the speed issue).

Here is the sample code where we upload a video and the thumbnail associated with it. Keep an eye out for "video.rewind".

Add this in app/models/video.rb (for our application; you will have to adapt it for your app).

VIDEO_UPLOAD_PATH = "public/video_player/videos/"
THUMBNAIL_UPLOAD_PATH = "public/video_player/thumbnail/"

# Manage the path depending on OS.
# Disk usage optimisation for AWS:
# replicate videos and images onto the EBS volume mounted at /mnt/data-store.

VIDEO_UPLOAD_PATH_FOR_AWS = "/mnt/data-store/app-data/videos/"
THUMBNAIL_UPLOAD_PATH_FOR_AWS = "/mnt/data-store/app-data/thumbnails/"

# Create the video record, then upload the video and its thumbnail.
def self.create_video(video, thumbnail)
  if valid_video?(video) && valid_thumbnail?(thumbnail)
    video_filename = sanitize_attachment_name(video)
    thumbnail_filename = sanitize_attachment_name(thumbnail)
    # The block variable is called "record" so that it does not clash with the uploaded "video".
    @video = self.new do |record|
      record.video_name = video_filename
      record.thumbnail_name = thumbnail_filename
    end
    self.upload_video(video, video_filename, @video.id) && self.upload_thumbnail(thumbnail, thumbnail_filename, @video.id) if @video.save
  end
end

def self.upload_video(video, video_filename, video_id)
  video_path = VIDEO_UPLOAD_PATH + "#{video_id}_" + video_filename
  File.open(video_path, "wb") { |f| f.write(video.read) }
  # Write the same upload to EBS; rewind first, because the stream has already been read once.
  video.rewind
  video_path_for_aws = VIDEO_UPLOAD_PATH_FOR_AWS + "#{video_id}_" + video_filename
  File.open(video_path_for_aws, "wb") { |f| f.write(video.read) }
end

def self.upload_thumbnail(thumbnail, thumbnail_filename, video_id)
  thumbnail_path = THUMBNAIL_UPLOAD_PATH + "#{video_id}_" + thumbnail_filename
  File.open(thumbnail_path, "wb") { |f| f.write(thumbnail.read) }
  # Write the same thumbnail to EBS; rewind first, because the stream has already been read once.
  thumbnail.rewind
  thumbnail_path_for_aws = THUMBNAIL_UPLOAD_PATH_FOR_AWS + "#{video_id}_" + thumbnail_filename
  File.open(thumbnail_path_for_aws, "wb") { |f| f.write(thumbnail.read) }
end
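
The two upload methods follow the same pattern, so the dual write could also be pulled into one helper. The following is only a minimal sketch under a few assumptions: store_upload is a hypothetical name that is not part of our application, and the upload is read into memory once instead of being rewound and read a second time.

```ruby
require "fileutils"

# Hypothetical helper (not part of the original app): write one upload
# to every directory in `dirs` and return the paths that were written.
# `upload` is any IO-like object that responds to #read.
def store_upload(upload, filename, dirs)
  data = upload.read                      # read the stream once, so no rewind is needed
  dirs.map do |dir|
    FileUtils.mkdir_p(dir)                # make sure the target directory exists
    path = File.join(dir, filename)
    File.open(path, "wb") { |f| f.write(data) }
    path
  end
end
```

With the constants from above, a call might look like store_upload(video, "#{video_id}_#{video_filename}", [VIDEO_UPLOAD_PATH, VIDEO_UPLOAD_PATH_FOR_AWS]). Like the original code, it holds the whole file in memory, which may not suit very large videos.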

Friday, November 21, 2008

Tutorial on hosting RoR app on Amazon AWS with EC2, EBS, Ruby Enterprise Edition (REE) and Phusion Passenger (mod_rails)

Background:
We built a tourism portal in RoR for one of our clients. You can have a look at it: www.tripladder.com
After building it, we were asked to host and manage it for them. Initially we went with KnownHost, which is OK, but a production RoR application needs more RAM than we get on most VPS plans, especially if it does image processing. We did consider AWS, but at that time it did not have EBS and the client did not initially expect enough traffic to justify a 'scalr' managed cluster. We were looking for a replacement for a dedicated server. Once EBS was launched, we immediately decided to move the site to AWS. The cost-benefit analysis is compelling.

The following tutorial starts off after signing up with AWS and configuring your desktop/laptop to be able to connect to AWS and launch instances i.e. we assume that you have completed the 'Getting Started' section of AWS.

We have started with the stock Fedora image and modified it to our requirements. We could have used CentOS but Fedora-8 appeared at the top of the list and we went ahead with it.

The application hosting has the following steps.
  1. Launching an instance.
  2. Installing RoR, gems and plugins. We used rmagick, hence we had to install ImageMagick too.
  3. Installing REE and Phusion Passenger (mod_rails).
  4. Installing MySQL.
  5. Installing the application (checkout from Subversion).
  6. Creating and attaching an EBS volume. MySQL with data on EBS.
  7. Modifying the RoR app to save user upload files to EBS (http://docs.google.com/Doc?id=dcn2ckbh_20hk4kc4d4).
  8. Installing and configuring a production level ferret server
  9. Configuring Apache to serve the application, caching optimisations for performance.
  10. Configuring a permanent public IP (covered) and DNS (we have the domain parked with GoDaddy, but this is not covered in this article).
  11. Configuring SMTP (email) support for the RoR application.
  12. Once we have the perfect server setup, save it to S3.
  13. Periodic automated backups - Using Amazon snapshots.

The full tutorial is available here:

http://docs.google.com/Doc?id=dcn2ckbh_21gznbbjhr

Tutorial on hosting a RoR (Ruby on Rails) application on Amazon AWS with EC2 and EBS.

We built a tourism portal in RoR for one of our clients. You can have a look at it: www.tripladder.com
After building it, we were asked to host and manage it for them. Initially we went with KnownHost, which is OK, but a production RoR application needs more RAM than we get on most VPS plans, especially if it does image processing. We did consider AWS, but at that time it did not have EBS and the client did not initially expect enough traffic to justify a 'scalr' managed cluster. We were looking for a replacement for a dedicated server. Once EBS was launched, we immediately decided to move the site to AWS. The cost-benefit analysis is compelling.

The following tutorial starts off after signing up with AWS and configuring your desktop/laptop to be able to connect to AWS and launch instances i.e. we assume that you have completed the 'Getting Started' section of AWS.

We have started with the stock Fedora image and modified it to our requirements. We could have used CentOS but Fedora-8 appeared at the top of the list and we went ahead with it.

The application hosting has the following steps.
  1. Launching an instance.
  2. Installing RoR, gems and plugins. We used rmagick, hence we had to install ImageMagick too.
  3. Installing MySQL.
  4. Installing the application (checkout from Subversion).
  5. Creating and attaching an EBS volume. MySQL with data on EBS.
  6. Modifying the RoR app to save user upload files to EBS.
  7. Installing and configuring a production level ferret server
  8. Installing and configuring mongrel cluster.
  9. Configuring Apache to proxy to mongrel cluster, caching optimisations for performance.
  10. Configuring a permanent public IP (covered) and DNS (we have the domain parked with GoDaddy, but this is not covered in this article).
  11. Configuring SMTP (email) support for the RoR application.
  12. Once we have the perfect server setup, save it to S3.
  13. Periodic automated backups. This is covered here: http://docs.google.com/Doc?id=dcn2ckbh_21gznbbjhr

The full tutorial is available here:

http://docs.google.com/Doc?id=dcn2ckbh_20hk4kc4d4

Wednesday, November 19, 2008

A perfect world.. A world without IE ??

The list gets huge when you write down the IE bugs (even features), some of which you couldn't even dream of.
I have had my share of problems, struggling for hours to get things to work on a buggy IE (I mean IE6 and IE7, which makes no difference).

As it feels like a fitting context, I can't help quoting this from someone on Ajaxian:
"
If you've been working with the Ajax framework long enough, I'm sure you've run into at least a few speed bumps thanks to Internet Explorer. Not a day goes by that I don't have to rewrite a line of code, or tweak my css in order for IE to render what I think it should. But alas, this is the nature of software that comes from a company that views Standard Compliances as recommendations."

Let me share with you a couple of issues I had to face.

Recently, I needed to dynamically change the options of a SELECT box on one of my web pages. What I did was construct a string of options and assign it to the SELECT element using innerHTML. This worked perfectly on my favourite FF and even on Safari. I am pretty sure it would have worked on any browser you can name, with the exception of IE.

Here is my code:
var optionString = "";
for (var option in options) {
  optionString += '<option>' + options[option] + '</option>';
}
var sel = document.getElementById("mySelectBoxId");
sel.innerHTML = optionString;
What happened in IE was that the first option tag got truncated and my SELECT box was shown empty.
I racked my brain examining my code for a bug but found none. The mighty Google came to the rescue: it turned out to be caused by a known bug in IE.
Microsoft recommends using the DOM to achieve this, which I did, and it worked.

The other issue comes up while using AJAX. I don't call it a bug (it's more of a feature). I have a list of items which can be re-ordered by drag and drop. Whenever an item gets dragged, I make an AJAX GET request to the server, and a re-ordered list of items has to be served as the response. Again this works fine with FF and Safari. The problem is with IE again.

When one of the items is dragged, IE does not always make a request to the server, and the list gets re-ordered wrongly in the browser. It took me a whole day to figure out where things were going wrong.

IE is not making the request but serving a stale cached response from earlier, identical requests. Some argue that caching the response to a GET request is a feature of any browser. But IE's caching has always been an issue.

We have two solutions to the above problem.
One is to make a POST request instead of a GET, so that IE won't cache it.
"But I can't convince myself to make a POST request just to satisfy IE where a simple GET best fits the job."
The other is to make every request unique. For this you can append a random string or a timestamp as a dummy parameter at the end of your request URL.
"By making requests unique, you get a proper response from the server in this case, but IE still caches this response. When these requests are numerous, the browser cache overflows and you may lose some important cached responses of other requests."
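
The second approach can be sketched as follows. It is written in Ruby only to show the URL transformation; in the page itself the same thing would be done in JavaScript just before each AJAX call. The helper name cache_busted_url and the parameter name "_" are assumptions made for illustration.

```ruby
require "uri"

# Hypothetical helper: append a throwaway parameter so that every request
# URL is unique and the browser cannot answer it from its cache.
# `token` defaults to the current time in milliseconds.
def cache_busted_url(url, token = (Time.now.to_f * 1000).to_i)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query || "")   # keep any existing parameters
  params << ["_", token.to_s]                     # the server simply ignores "_"
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

cache_busted_url("/items/reorder?id=3", 42)   # => "/items/reorder?id=3&_=42"
```

A timestamp is usually enough; a random string works just as well as long as it differs between requests.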

So, there is always a trade-off involved. It's up to the developer to choose the solution that best fits the job.

The list doesn't end here. These are just the issues I ran into; others may have come across more. I will keep adding such issues as I find them. I wish I didn't have to.

"
Have fun with your code and don't hate it :) hate the things that don't comply with the standards..."
Now, isn't it fitting to call it A perfect world.. A world without IE ??

Sunday, September 28, 2008

auto complete issue with non-Mozilla browsers

Hi,
Playing with auto complete in Firefox is fun all the time, but it is not the same in other browsers (IE & Safari).
After Googling for many days I came up with a pleasant solution.

Rails 2.0: auto_complete is now a plugin (it's not in the core anymore), so you need to install it before using it:
script/plugin install auto_complete

Somewhere in your views you’ll want code somewhat like the following.
<%= text_field_with_auto_complete :article, :contains, { :size => 15 }, :skip_style => true %>

I specified a size for the text field with the :size => 15 value in the hash. I also included :skip_style => true, which keeps the helper from automatically inlining CSS styles into the page.

Now, the major issue I faced with the IE and Safari browsers:
Q. The Up and Down arrow keys do not work for auto complete in non-Mozilla browsers when the drop-down list appears.
The solution:
This is due to a bug in Scriptaculous. At the time of writing you will need to apply the following patch to fix it.
In controls.js, around line 86, you will find these lines:
Element.hide(this.update);
Event.observe(this.element, 'blur', this.onBlur.bindAsEventListener(this));
Event.observe(this.element, 'keydown', this.onKeyPress.bindAsEventListener(this));
},

Now replace the lines above with the following:

Element.hide(this.update);
Event.observe(this.element, 'blur', this.onBlur.bindAsEventListener(this));
// Observe keydown for non-Mozilla browsers per http://dev.rubyonrails.org/ticket/10126
if (Prototype.Browser.Gecko) {
  Event.observe(this.element, 'keypress', this.onKeyPress.bindAsEventListener(this));
} else {
  Event.observe(this.element, 'keydown', this.onKeyPress.bindAsEventListener(this));
}
},
Then restart the server; the arrow keys will now work in both IE and Safari.

Wednesday, September 24, 2008

improving performance with :select in rails

For newbies, Rails is amazing. With its constituent modules such as Active Record, Action Pack, Action Mailer etc. acting behind the scenes, Rails provides a lot of abstraction, making things simpler for developers. This simplicity through abstraction brings with it some issues which may degrade your application's performance when not taken care of properly.

Let's have a look at Active Record. It is Active Record that makes people go crazy about Rails. It shoulders the responsibility of database operations, giving users different flavours of methods to work with. But there are some pitfalls to consider:

1. The default 'find' method fetches all columns from a table row:
Active Record works at the row level, not at the column level. Consider a table "employees" with emp_id, emp_name, emp_salary, emp_address etc., up to 50 columns. When you want to make a detailed list of all the employees, you may tend to write

@employees = Employee.find(:all)

The above statement fetches all fifty columns of every single employee from the table and converts each row into an Employee object. What if you need only a subset of the columns (say emp_id, emp_name, emp_salary, emp_address) and not all of them? You can achieve this using the :select option of the find method:

@employees = Employee.find(:all, :select => 'emp_id, emp_name, emp_salary, emp_address')

This selects only the specified columns from the table and converts the rows into Employee objects. Accessing unspecified attributes on the resulting Employee objects may result in an error/exception, but you save a lot of overhead by not selecting all the columns and turning the rows into heavy objects.

Another case may be an articles table. Though it may seem to have fewer columns, one tends to use a column for the article body, which may contain text as well as images (usually such columns are set to BLOB type). Here too, to make a list of all the articles, you can write


@articles = Article.find(:all, :select => 'article_id, article_title, author_name, published_date')

and leave out the body column if you feel it is not needed. You are free to use all the other find options along with :select.
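
Roughly speaking, the difference shows up in the SQL that is sent to the database. The plain-Ruby sketch below only imitates that step: finder_sql is a made-up helper, not an Active Record method, and the SQL that Rails really generates also quotes names and adds conditions, ordering and so on.

```ruby
# Hypothetical stand-in for the query building done by find; not Rails code.
def finder_sql(table, options = {})
  columns = options[:select] || "*"       # without :select, every column is fetched
  "SELECT #{columns} FROM #{table}"
end

finder_sql("employees")
# => "SELECT * FROM employees"
finder_sql("employees", :select => "emp_id, emp_name")
# => "SELECT emp_id, emp_name FROM employees"
```

The narrower query means less data to transfer and lighter objects to build, which is where the saving comes from.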

2. Eager-loaded associations that contain heavy data:

class Author < ActiveRecord::Base
  has_many :articles
end

class Article < ActiveRecord::Base
  belongs_to :author
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :article
end

Assuming that you know how to eager-load associated models using the :include option, let's look at how we can fine-tune the eager-loaded models using the :select option with :include.

@author = Author.find(params[:id], :include => [:articles])

This fetches an author's record and all the article records that belong to this author. But how do we avoid fetching the heavy 'body' column from the articles table? Can we use :select to fetch only the desired columns from the associated model loaded through :include? Not out of the box: eager loading generates its own SELECT statement, so using :select together with :include is not supported. It can be achieved with a bit of extra code:

Download the patch from http://dev.rubyonrails.org/attachment/ticket/7147/init.5.rb, submitted by mrj to the Rails Trac.
Place this file in your lib directory and require it in environment.rb. For example, if the file is named 'include_with_select.rb' in lib, you can write in your environment.rb:
require 'include_with_select'

With this setup, you can freely select the desired columns in the :included associations:
@author = Author.find(params[:id], :include => [:articles[:article_id, :article_title, :author_name, :published_date]])

If you want to eager-load a set of comment attributes for every article, you can write it as:
@author = Author.find(params[:id], :include => [{:articles[:article_id, :article_title, :author_name, :published_date] => :comments[:comment_text, :comment_by]}])

This way we can achieve better performance using :select, with or without :include.

Thursday, September 18, 2008

have you ever measured your rails application performance?

Well, you have been developing Rails applications for a year or two. You did every bit of it The Rails Way: keeping it DRY, writing migrations, associating your models and building in test cases. The app seems to work great on your local machine. But are you sure it does on the production server? Have you ever used any profiling or benchmarking tools to measure the performance of your apps? If not, it's time to use one and make sure your apps run fast not only during development but also in the production environment.

There are many tools that serve the purpose. The two head-to-head competitors, RPM from New Relic and the tools from FiveRuns, both of which strongly back the Rails framework in becoming enterprise ready, make a good choice.

Given the task of fine-tuning a Rails app, I preferred New Relic over FiveRuns as it is very easy to get going. RPM Lite, the basic version of RPM, is free for as long as you want, and you can always upgrade to enjoy more features. For a developer's use, RPM Lite is enough.

RPM Lite is available as a plugin. I got the plugin installation link by subscribing at New Relic, installed it and restarted my server.
Note: The plugin installation creates a config/newrelic.yml file. I didn't pay much attention to it, but it may contain some interesting configuration.


Now, whenever I made a request to the server, the New Relic plugin monitored it and measured the time taken to process it. The results are available at http://localhost:3000/newrelic.

Amazingly, New Relic tracked my request right from the controller to the model to the view.
1. It gave me an analysis of the time spent in the controller, in the model/DB calls executed and in rendering the results to my view, to an accuracy of milliseconds.
2. It also produced a pie chart of the processes involved.
3. It provided me with an SQL view, where I could track the SQL executed.
4. It showed the number of such requests that could be served per second.
5. It also tracked AJAX requests.

From the analysis, I learnt that the app was spending most of its time in DB calls, which I eventually reduced.
This way, I made good use of RPM and fine-tuned the app, resulting in a notable performance boost. So why not give it a try right now? I hope RPM will serve the purpose for you too.