5/30/2008, 11:45AM, Portland, OR
Here are some of my notes from Railsconf 2008. Warning: these notes are only vaguely edited (but arguably still useful). There are certainly typos and errors. If executing code from one of these articles, your mileage may vary (from spontaneous self destruction of everything you know and love to spontaneous coolness). Also, you may want to double check “facts”. You definitely should not use these articles as a spelling reference. Feel free to post corrections.
Engine Yard Panel:
- Tom Mornini – CTO
- Taylor Weibley – Application Support Director
- Edward Muller – Automation Manager (fluent in binary)
- Ezra Zygmuntowicz – System Architect
- Jamie van Dyke
Problems they have had and how they solved most of them. Then questions and answers.
Main problem with Rails is ActiveRecord
Great for quickly building with RESTful resources. But…
- find(:all).each do |leak|
- testing is usually with 20 records.
- no indexes.
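The find(:all) point above can be sketched in plain Ruby. This is only an illustration, not the panel's code: FakeTable and its find_in_batches method are hypothetical stand-ins for a model backed by a large table. The idea is that loading every row at once holds the whole table in memory, while walking it in batches keeps the peak at one batch.

```ruby
# Hypothetical stand-in for a table: only the ids are known up front,
# rows are "loaded" on demand.
class FakeTable
  attr_reader :peak_rows_in_memory

  def initialize(row_count)
    @ids = (1..row_count).to_a
    @peak_rows_in_memory = 0
  end

  # Mimics find(:all): materialises every row in one array.
  def find_all
    rows = @ids.map { |id| { id: id } }
    track(rows.size)
    rows
  end

  # Yields rows in fixed-size batches, so only one batch is held at a time.
  def find_in_batches(batch_size)
    @ids.each_slice(batch_size) do |slice|
      rows = slice.map { |id| { id: id } }
      track(rows.size)
      yield rows
    end
  end

  private

  # Records the largest number of rows loaded in a single call.
  def track(count)
    @peak_rows_in_memory = [@peak_rows_in_memory, count].max
  end
end

table = FakeTable.new(10_000)
processed = 0
table.find_in_batches(500) { |rows| processed += rows.size }
puts "processed=#{processed} peak=#{table.peak_rows_in_memory}"
```

With 20 test records the difference is invisible, which is why the problem tends to show up only in production.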
Engine Yard has a team of SQL guys to optimize in this area.
One of the first problems they noticed is that you need indexes on all foreign keys. Without them, lookups on those columns are full table scans. Migrations automatically index ids but not foreign keys. Basically anything that ends with _id needs an index. Some database experts may disagree in particular cases, but those are the exception. Use EXPLAIN in MySQL (end the statement with \G for readable output) to make sure that the indexes are being used in the queries. Take a look at the New Relic optimization stuff, it’s super cool.
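The "anything that ends with _id needs an index" rule is easy to check mechanically. Here is a minimal sketch, assuming you can describe your schema as a plain hash. The SCHEMA constant and the missing_foreign_key_indexes helper are made up for illustration. It prints the add_index lines you would put in a migration.

```ruby
# Hypothetical schema description: table name => columns and indexed columns.
SCHEMA = {
  "comments" => { columns: %w[id post_id user_id body], indexed: %w[post_id] },
  "posts"    => { columns: %w[id user_id title],        indexed: [] }
}

# Returns [table, column] pairs for every *_id column that has no index.
def missing_foreign_key_indexes(schema)
  schema.flat_map do |table, info|
    info[:columns]
      .select { |col| col.end_with?("_id") }
      .reject { |col| info[:indexed].include?(col) }
      .map { |col| [table, col] }
  end
end

missing = missing_foreign_key_indexes(SCHEMA)
missing.each { |table, col| puts "add_index :#{table}, :#{col}" }
```

This only finds candidates by naming convention. Whether a query actually uses the index is still something to confirm with EXPLAIN.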
Plugins that give common problems
- Most common and hated plugin is Ferret. Indexes grow past 1G and are frequently corrupted. Use Sphinx instead. Sphinx will just update indexes on the delta.
- Image Science uses RubyInline, which needs environment variables to be set. It can work perfectly in development, and then in production monit clears those variables out. Not a huge problem but you should be aware of it.
- The Hodel 3000 logger is verbose by default and can generate huge log files. Adjust settings. Logging should be set to info.
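To show what "set logging to info" buys you, here is a small sketch using Ruby's standard Logger rather than the Hodel 3000 plugin itself (the log messages are invented). In a Rails app the usual equivalent is config.log_level = :info in the production environment file, but check that against your own setup.

```ruby
require "logger"
require "stringio"

# Log to an in-memory buffer so the effect of the level is easy to inspect.
io = StringIO.new
logger = Logger.new(io)
logger.level = Logger::INFO

logger.debug("SELECT * FROM users")              # dropped: below INFO
logger.info("Processing UsersController#index")  # kept

output = io.string
puts output
```

Only the info line reaches the log. At debug level every SQL statement is written too, which is where the huge log files come from.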
How much traffic can you take?
Depends on coding
General stats:
- Digg 10,0000+ visitors and 1% signup. What should you do? Cache it. At Engine Yard a slice could handle this quite easily. Customers are often concerned about a sudden out-of-the-blue Digg or TechCrunch hit. But in reality you usually know it is coming and there is no surprise, so you can be ahead of the curve.
- TechCrunch 1000+ visitors but they are more interested in what you are doing.
- Ruby Inside / Flow 500 – 1000 visitors a day, also don’t get many sign ups. It’s more about what can the visitor do.
- The Today Show – 100,000+ visitors in the first hour or so. Can get up to 10,000 signups in the first couple hours. You will have advance warning which is great.
- Fox Business – 2000+ connections.
How many queries can your site take?
It depends on code.
- Queries per page
- But you should cache!
- Best way to handle hike in traffic is cache / memcache
- Keep file input output down as low as possible
- Separate out to different disks, you don’t want simultaneous reads and writes
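The caching advice above boils down to: do the expensive work once, then serve the stored result until it expires. A minimal in-process sketch follows. TinyCache is a hypothetical class written for this note, not memcache and not a Rails API. A real deployment would put memcache behind the same fetch-or-compute shape.

```ruby
# Hypothetical in-memory cache with a time-to-live per entry.
class TinyCache
  def initialize(ttl_seconds, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock   # injectable so expiry can be tested without sleeping
    @store = {}
  end

  # Returns the cached value for key, or runs the block and caches its result.
  def fetch(key)
    entry = @store[key]
    now = @clock.call
    return entry[:value] if entry && entry[:expires_at] > now

    value = yield
    @store[key] = { value: value, expires_at: now + @ttl }
    value
  end
end

query_count = 0
cache = TinyCache.new(60)
3.times do
  cache.fetch("front_page") do
    query_count += 1          # stands in for the queries behind one page
    "rendered front page"
  end
end
puts "queries run: #{query_count}"
```

Three requests, one trip to the database. That ratio is what lets a single slice absorb a traffic spike.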
Deployment
Lots of customers are not up to speed on Capistrano, git / subversion. Engine Yard gives you a deploy.rb (shameless plug from them :-) gem source – a http://www.engineyard.com … Includes cap examples and a list of extra tasks – loads of things that it can do.
ebb | mongrel | thin. Which do customers prefer?
Alternatives to mongrel are possible but not necessarily production ready (ebb). You don’t need to be super concerned about it unless you are doing 1,000 hits per second. The bottleneck is usually the user code, which is far more relevant. That said, thin may be 13% faster but you need to be in the top 5% of internet traffic for this to matter.
Audience Q&A
What happened with NGINX? You haven’t heard much news about it because it’s in use and it just works. They haven’t really had any problems with NGINX. There were some minor issues but they were ironed out very quickly. They have seen 40MB/sec of static images served without NGINX even showing up in top.
Interpreters?
Rubinius and mod_rubinius are going to be the way to go in the future.
Thin instead of mongrel because of lower memory footprint?
There are a lot of reasons to save memory but mongrel is a minor concern. More than 3-4 mongrels per CPU is pointless. The bottleneck is file IO rather than memory. A common misconception is that you need more mongrels. Thin tends to back up processes more than mongrel. It is important to watch the real behavior of the app because every app uses resources differently.
Should static files be local or shared across cluster?
Cluster is fine @ Engine Yard, but depends on file system. NFS was cool in 1979…
How to handle asynchronous background processes?
Background Job (BJ) is pleasurable to work with. Written by Ara T. Howard.
Do you leave keep-alive on with NGINX?
Untested. But historical use with Apache pre-Rails was not successful.
Separate server for search workload? How should resources be distributed across the server farm, and how does this map to virtualization?
Engine Yard are strong believers in virtualization. They intended from early on to have only virtualized servers. The more involved they have gotten, the more they have found that in a well-built virtualization environment you quickly get the sense that you are operating on separate machines. NGINX, 3 Mongrels, and a memcache per instance. If you have a solid architecture you could have databases on multiple instances. A single server rack at Engine Yard with 18 servers would run the bottom 97% of all internet apps. There’s huge CPU capacity in a virtualization environment. CPUs are not typically the bottleneck (unless you are doing fluid dynamic calculations).
What is the Engine Yard uptime?
3-4 9s uptime. Rackspace rates using a questionable rating system.
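For a feel of what "3-4 9s" means in practice, the arithmetic is simple: n nines is an availability of 1 - 10^-n, so the allowed downtime is 10^-n of the year. A quick sketch (the helper name is made up, and a 365-day year is assumed):

```ruby
MINUTES_PER_YEAR = 365 * 24 * 60 # 525,600, ignoring leap years

# Downtime budget for an availability of "n nines" (3 => 99.9%, 4 => 99.99%).
def downtime_minutes_per_year(nines)
  unavailability = 10.0**(-nines)
  unavailability * MINUTES_PER_YEAR
end

[3, 4].each do |n|
  puts format("%d nines: about %.1f minutes of downtime per year",
              n, downtime_minutes_per_year(n))
end
```

So three nines allows roughly 8.8 hours of downtime a year, and four nines a bit under an hour.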