Thoughtlets, Music, and Code from Noah Thorp

Railsconf 2008 Scribblings: Hosting and the Woes (Engine Yard Panel)

Posted: June 13th, by Noah

5/30/2008, 11:45AM, Portland, OR

Here are some of my notes from RailsConf 2008. Warning: these notes are only vaguely edited (but arguably still useful). There are certainly typos and errors. If executing code from one of these articles, your mileage may vary (from spontaneous self-destruction of everything you know and love to spontaneous coolness). Also, you may want to double-check “facts”. You definitely should not use these articles as a spelling reference. Feel free to post corrections.

Engine Yard Panel:
  • Tom Mornini – CTO
  • Taylor Weibley – Application Support Director
  • Edward Muller – Automation Manager (fluent in binary)
  • Ezra Zygmuntowicz – System Architect
  • Jamie van Dyke

They covered problems they have had and how they solved most of them, followed by questions and answers.

The main problem with Rails is ActiveRecord.

It’s great for quickly building RESTful resources. But…
  • find(:all).each do |leak| (every row gets loaded into memory at once)
  • Testing is usually done with 20 records.
  • There are no indexes.

Engine Yard has a team of SQL guys to optimize in this area.

One of the first problems they noticed is that you need indexes on all foreign keys. Without them, queries do full table scans. Migrations automatically index ids but not foreign keys. Basically, anything that ends with _id needs an index. Some database experts may disagree, but exceptions are rare. Use EXPLAIN in MySQL (with \G for vertical output) to make sure that the indexes are being used by the queries. Take a look at the new New Relic optimization tools; they’re super cool.

Plugins that commonly cause problems

  • The most common and most hated plugin is Ferret. Its indexes grow past 1 GB and are frequently corrupted. Use Sphinx instead. Sphinx can update just the delta index.
  • ImageScience uses RubyInline, which needs environment variables to be set. It can work perfectly in development, but when you shift to production, monit clears those variables out. Not a huge problem, but you should be aware of it.
  • The Hodel 3000 logger is verbose by default and can generate huge log files. Adjust its settings: the log level should be set to info.

How much traffic can you take?

It depends on the code.

General stats:
  • Digg 10,0000+ visitors and 1% signup. What should you do? Cache it. At Engine Yard a slice could handle this quite easily. Customers often worry about a sudden, out-of-the-blue Digg or TechCrunch. But in reality there is usually no surprise, which lets you get ahead of the curve.
  • TechCrunch 1000+ visitors, but they are more interested in what you are doing.
  • Ruby Inside / Flow 500 – 1000 visitors a day; these also don’t bring many signups. It’s more about what the visitor can do.
  • The Today Show – 100,000+ visitors in the first hour or so. You can get up to 10,000 signups in the first couple of hours. You will have advance warning, which is great.
  • Fox Business – 2000+ connections.

How many queries can your site take?

It depends on the code.
  • Queries per page
  • But you should cache!
  • The best way to handle a spike in traffic is caching / memcache
  • Keep file input and output as low as possible
  • Separate work across different disks; you don’t want simultaneous reads and writes on the same disk

Deployment

Lots of customers are not up to speed on Capistrano or git / Subversion. Engine Yard provides a deploy.rb (shameless plug from them :-) through its gem source: gem sources -a http://www.engineyard.com … It includes Capistrano examples and a list of extra tasks; there are loads of things it can do.

ebb | mongrel | thin. Which do customers prefer?

Alternatives to Mongrel are possible but not necessarily production ready (ebb). You don’t need to be super concerned about it unless you are doing 1,000 hits per second. The bottleneck is usually your own application code, which is far more relevant. That said, Thin may be 13% faster, but you need to be in the top 5% of internet traffic for this to matter.

Audience Q&A

What happened with NGINX? You haven’t heard much new about it because it’s in use and it just works. They haven’t really had any problems with NGINX; there were some minor issues, but those were ironed out very quickly. They have seen NGINX serve 40 MB/sec of static images without it even showing up in top.

Interpreters?
Rubinius and Modrubinius are going to be the way to go in the future.

Thin instead of mongrel because of lower memory footprint?
There are a lot of reasons to save memory, but Mongrel is a minor concern. More than 3-4 Mongrels per CPU is pointless. The bottleneck is file I/O rather than memory. A common misconception is that you need more Mongrels. Thin tends to back up processes more than Mongrel does. It is important to watch the real behavior of the app, because every app uses resources differently.

Should static files be local or shared across cluster?
Cluster is fine @ Engine Yard, but depends on file system. NFS was cool in 1979…

How to handle asynchronous background processes?
Background Job (BJ) is pleasurable to work with. Written by Ara T. Howard.

Do you leave keep alive with NGINX?
Untested. But historical use with Apache, pre-Rails, was not successful.

Separate server for search workload? How should resources be distributed across a server farm, and how does this map to virtualization?
Engine Yard are strong believers in virtualization. From early on they intended to have only virtualized servers. The more involved they have gotten, the more they have found that in a well-built virtualization environment you quickly get the sense that you are operating on separate machines. Each instance runs NGINX, 3 Mongrels, and a memcache. If you have a solid architecture, you could have databases on multiple instances. A single server rack at Engine Yard with 18 servers would run the bottom 97% of all internet apps. There’s huge CPU capacity in a virtualization environment; CPUs are not typically the bottleneck (unless you are doing fluid dynamics calculations).

What is the Engine Yard uptime?
Three to four 9s of uptime. Rackspace rates uptime using a questionable rating system.

Railsconf 2008 Scribblings: Joel Spolsky (Keynote)

Posted: June 13th, by Noah

5/30/2008, 9:15am, Portland, OR


Intro by DHH on Joel:
  • Joel is the author of Joel on Software
  • Created a programming language that generates code in other programming languages
  • Fights complexity (anti-“architecture astronaut”)

Joel’s Talk Begins…

Why does Brad Pitt make so much more money and have a bigger brand than Ian Somerhalder? What makes software “blue chip” rather than off-brand?

The Herman Miller Aeron chair was the blue-chip chair. All types of companies make clones, but the clones never got market share. This happens in every field, including music.

Great software has three qualities that make it blue chip:
  • Make people happy
  • Obsess over aesthetics
  • Observing the culture code

Make People Happy (Agency)

[Very funny example about logging in to Windows to post a photo, then having to install Windows updates, then being asked to insert the original install disk to install a file. In a word, you feel enslaved (you are not in control of the process).]

Martin Seligman, the psychologist who pioneered the study of “learned helplessness”, liked to study happy people, which is rare among psychologists.

Example: someone dies > you are stuck in bed depressed > you lose your job, and you begin to believe that you lack agency, and then you teach yourself that you are helpless. Seligman’s strategy was to teach people that they have power over their environment by tricking the brain into a virtuous cycle of having agency (initially through accomplishing simple tasks).

Abercrombie’s e-commerce checkout forces you through 4 pages. Amazon gives you all the options at once (change address, shipping options, etc.). This is Joel’s first example of giving the user agency.

Make people happy by putting them in control (speed of response is also important here – the business case for AJAX).

Obsess over aesthetics

Samsung BlackJack vs. iPhone.
  • BlackJack is faster
  • iPhone bricks if you install third-party software
  • BlackJack is ugly and the iPhone is seamless and shiny (if you accidentally swallowed one it would go right down)
  • BlackJack has a replaceable battery; the iPhone doesn’t (avoiding the unaesthetic seam). Another example is the bottom of a MacBook vs. a ThinkPad.

Apple design decisions are about fashion. Another example: historic apartment buildings in Paris have no fire escapes. Fire escapes are required by code, but these buildings are classified as monuments, so they are not inspected. This is another case of treating aesthetics as more important than core functionality.

Programmers call this lipstick and say they want to work in the guts (the 90% of the iceberg below the surface). But it turns out that aesthetics are very important. To programmers it is not clear why artists such as Basquiat sell paintings for $50+ million. In particular, programmers don’t get modernist architecture.

Modernist principles remove all decoration from architecture. There is a parallel here with programmers’ gravitation toward the command line rather than glossy operating systems.

The Culture Code

Example: the promotion of sport utility vehicles. Deaths per million cars:
  • Toyota Camry 41
  • Ford Explorer 88
There are a number of factors, but you feel safe in these vehicles, and that has to do with promotion (see The Culture Code).
  • Everything around you should be round and soft
  • You should be up high (on the reptilian level, if you feel tall you feel you are dominating)
  • A key element of safety as a child is being fed; if everything is round, soft, and high, and you feel fed (cupholders), you feel safe.

Another example like this is the word “Enterprise”. It recalls Tokyo at night, a clean server room, the starship Enterprise, people in clean business suits looking up (image).

Another example is Web 2.0. Lots of logos. They don’t have visions, they go to parties… etc. _why the lucky stiff :-) How could beauty, happiness, motivation, pride, pleasure, and enthusiasm be involved in coding? Very funny counterexamples for Python and Java go here…

This all boils down to misattribution. You have a physical reaction and you think that it has something to do with the mental plane, but it doesn’t. People seated on the right-hand side at a movie rate it 10% higher than those on the left.

Erlang Pragmatic Studio - Day 3 Notes

Posted: February 15th, by Noah

Erlang Pragmatic Studio with Joe Armstrong & Dave Thomas – Chicago, Feb 13-25, 2008.

Day 3 – Feb 14, 2008.

Here are my notes from Day 3 of the Pragmatic Studio Erlang course. Note that these notes are only vaguely edited. Be aware that these are notes so there are certainly typos and errors.

Beginning of the Day

It is always good to develop from the specific to the general: e.g. make a simple app, then parameterize it with functions. A universal server that evaluates F is a very powerful generalization of a specific application (e.g. messaging).
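Here is a minimal sketch of the universal server idea. The process does nothing until it receives {become, F}, then evaluates F and turns into that server. The module name universal, the helpers become/2 and rpc/2, and the factorial behaviour are assumptions made for illustration, not code handed out in the course.

```erlang
-module(universal).
-export([start/0, become/2, rpc/2, factorial_server/0]).

%% Start a process that waits to be told what it should become.
start() ->
    spawn(fun universal_server/0).

universal_server() ->
    receive
        {become, F} -> F()   % evaluate F: this process now runs F's loop
    end.

become(Pid, F) ->
    Pid ! {become, F},
    ok.

%% Synchronous request/reply; replies are tagged with the server's Pid.
rpc(Pid, Request) ->
    Pid ! {self(), Request},
    receive
        {Pid, Reply} -> Reply
    after 1000 ->
        timeout
    end.

%% Hypothetical specific behaviour: a factorial server.
factorial_server() ->
    receive
        {From, N} ->
            From ! {self(), fac(N)},
            factorial_server()
    end.

fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N - 1).
```

After Pid = universal:start() and universal:become(Pid, fun universal:factorial_server/0), the call universal:rpc(Pid, 5) should return 120. The specific part (factorial) is just a fun; the generic part never changes.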

You can send a message to yourself:
self() ! a. % could tie into recursion

A lot of expert Erlang users (Tail-f, Kreditor) don’t use OTP, as they participated in its creation. But if you build up a gen_server in incremental steps, you may find it not so difficult. The question is whether OTP really fits the problem at hand.

If you do spawn_link from the shell, errors are reported back to the shell.

erl -boot start_sasl % SASL = System Architecture Support Libraries; gets you the error logger and a few other items

The SASL libraries were added to Erlang at a later date, for historical reasons.

Bit Syntax

In Ruby or other languages this would be fairly tedious (ANDs, ORs, bit shifts, etc.). Erlang has a high-level syntax that lets you specify the structure of binary data. For example, packing a 16-bit color with 5 bits of red, 6 of green, and 5 of blue:
Red = 2, Green = 61, Blue = 20.
=> 20
Bin = <<Red:5, Green:6, Blue:5>>.
=> <<23,180>>
io:format("~8.2.0B ~8.2.0B~n", binary_to_list(Bin)).
=> 00010111 10110100
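The same 5-6-5 layout can be wrapped in a small module. Building and matching use identical segment sizes, so one pattern serves to pack the color and another, with the same sizes, unpacks it. The module name rgb565 is an assumption for illustration.

```erlang
-module(rgb565).
-export([pack/3, unpack/1]).

%% Pack three colour channels into 16 bits: 5 red, 6 green, 5 blue.
pack(R, G, B) ->
    <<R:5, G:6, B:5>>.

%% Matching with the same segment sizes pulls the channels back out.
unpack(<<R:5, G:6, B:5>>) ->
    {R, G, B}.
```

rgb565:pack(2, 61, 20) gives <<23,180>>, and unpacking that binary gives back {2,61,20}.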

Bit Twiddling History:

Erlang was used for lots of protocols internally at Ericsson. They decided to implement every RFC, but that was too difficult: lots of bit twiddling. Because of this they made a sub-language for bit twiddling so that any protocol could be implemented, with optimal and efficient unpacking of data structures.

About a third of the details are in the Erlang book. The rest need to be found in the man pages.

There are also bit comprehensions – generators of bit streams. They are good for Huffman encodings, MIPS processing, etc., and very efficient. This is in the experimental phase but will be integrated into the system. Undocumented items are not official and may be changed; once the docs are written, it’s official.
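Here is a small sketch of comprehensions over binaries; the module name bitcomp is hypothetical. A generator of the form <<X:4>> <= Bin walks the binary four bits at a time. Wrapping the result in << >> builds a new binary instead of a list.

```erlang
-module(bitcomp).
-export([nibbles/1, bits/1, increment_bytes/1]).

%% Walk a binary four bits at a time, collecting a list.
nibbles(Bin) ->
    [N || <<N:4>> <= Bin].

%% Walk a binary one bit at a time.
bits(Bin) ->
    [B || <<B:1>> <= Bin].

%% Binary comprehension: build a new binary, one byte per input byte.
increment_bytes(Bin) ->
    << <<(X + 1)>> || <<X>> <= Bin >>.
```

For example, bitcomp:bits(<<23>>) yields [0,0,0,1,0,1,1,1], which is the first byte of the RGB example above.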

Erlang mailing list erlang.org > FAQs and mailing lists.

Page 86 of the book has a great SHOUTcast server for distributed audio. There is an exercise for bit processing, but we skipped it in the interest of time.

File-at-a-time I/O

Reading a file at a time is about 10-20 times faster than reading a line at a time. Erlang is not as efficient as some other languages for string matching. Use [H|T] to parse. Only multi-gigabyte inputs are difficult with this approach. Binary input and output is definitely the fastest.
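Here is a minimal sketch of the file-at-a-time approach. It assumes the file fits comfortably in memory: one file:read_file/1 call, then the lines are split out of the binary in memory. The module name and the use of binary:split/3 are assumptions, not the course's code.

```erlang
-module(whole_file).
-export([lines/1]).

%% Read the whole file in one call, then split it into lines in memory.
lines(File) ->
    {ok, Bin} = file:read_file(File),
    %% global: split on every newline; trim: drop trailing empty parts
    binary:split(Bin, <<"\n">>, [global, trim]).
```

Without the trim option, a trailing newline would leave an empty <<>> element at the end of the list.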

file:consult(File) -> {ok, [Term]} | {error, Why}

Term Access – read configuration file
-module(test4).
-export([read_config/1]).

read_config(File) ->
    {ok, [{host, Host, port, Port}]} = file:consult(File),
    %% ... use Host and Port here
    {Host, Port}.

Term IO

  • inefficient

io:read('enter a term').
enter a term > {hello, "joe"}. % the term must end with a full stop
=> {ok,{hello,"joe"}}
io:get_line(...) % read a line from the shell

erl_scan:string("abc,123,{hello,joe}"). % turn into erlang tokens - very efficient
% written with explicit recursion that is more efficient than regex
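erl_scan only produces tokens. The sketch below pairs it with erl_parse:parse_term/1 to get an actual term back. It assumes the string holds a single term, so the comma-separated example above is wrapped in a list here. The module name term_io is hypothetical.

```erlang
-module(term_io).
-export([parse_term/1]).

%% String -> tokens -> term. parse_term/1 needs a terminating dot token.
parse_term(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String ++ "."),
    {ok, Term} = erl_parse:parse_term(Tokens),
    Term.
```

term_io:parse_term("[abc,123,{hello,joe}]") returns the list [abc,123,{hello,joe}].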

file:pread ...
These map onto standard file manipulations:
file:list_dir(Dir)
file:delete(File)
...

filename:split(FileName) => [Component]

DNS servers are actually fairly easy to write. A real DNS server does tree walking for the subdomain, domain, and TLD (xxx.xxxx.xxx).

Lots of DNS servers have interesting traffic patterns… tie-ins with advertising sites, etc.

Erlang doesn’t link foreign code into its kernel. For C libraries you need to establish a socket and send messages using binaries.

Orientation:

  • nodes – when you say erl it starts a node. Doesn’t register this node by default.
  • Distributed Erlang consists of a number of nodes that know about each other.
  • SMP takes advantage of multi-core – this is not distributed Erlang
  • Socket distribution – is also not distributed Erlang
erl -sname Name % short name
erl -name Name % long (fully qualified) name

All nodes must share a magic cookie. Authentication is a simple challenge/response, and traffic is not encrypted, though you can communicate over SSH. DNS and name resolution are often poorly configured, which causes some difficulties when setting up distributed Erlang apps. There are some FAQs on debugging distributed Erlang.

Three new primitives:

  • spawn(Node, Mod, Func, Args) – links work the same
  • alive(Node) – tell the system you are alive
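Here is a sketch of spawn/4 doing a round trip: it spawns a process on Node and has it report which node it is running on. It assumes the same module is loaded on Node. With node() as the target it also runs on a single, non-distributed node, which is a convenient way to test locally first. The names where_am_i, ask/1, and reply/1 are hypothetical.

```erlang
-module(where_am_i).
-export([ask/1, reply/1]).

%% Spawn reply/1 on Node and wait for it to report its node name.
ask(Node) ->
    Pid = spawn(Node, ?MODULE, reply, [self()]),
    receive
        {Pid, NodeName} -> NodeName
    after 5000 ->
        timeout
    end.

%% Runs on Node: send the node name back to the caller.
reply(From) ->
    From ! {self(), node()}.
```

Against a second node started with erl -sname, where_am_i:ask('joe@Daves-Powerbook') should return that node's name, provided the cookies match.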

Main libs:

  • rpc
  • global
Dave: Mnesia is cool because you can query it using list comprehensions.

Test on local nodes, then over a cluster.

erl -sname dave

rpc:call('joe@Daves-Powerbook', erlang, node, []).
rpc:call('joe@Daves-Powerbook', erlang, exit, [kill]).
rpc:call('joe@Daves-Powerbook', erlang, halt, []).

Shell commands for interacting with erl nodes:

  • ^G gets you into the shell’s job control (user switch) mode
  • j – list jobs
  • r 'dave@Daves-Powerbook' % open a remote shell
  • c 2 – connect to job 2
  • now at (dave@Daves-Powerbook)>
  • toolbar:start(). % show current processes – queries OTP structure

Security:

  • Security on distributed Erlang is very coarse-grained. It’s all access or nothing.
  • Perfect for a fire-walled corporate cluster.

You need to share the cookie with the other machines. Check your home directory for .erlang.cookie.

OTP – Open Telecom Platform

This is the platform / framework for building Erlang applications.

  • Error logs
  • Hot swapping modules
  • Common tasks that everyone needs
  • etc.
  • Mnesia
  • SASL
  • SNMP Agents
  • Web Server

There are lots of undocumented corners; the Programming Erlang book is very helpful. It’s Open Source in the sense that you can see the source, but commit rights are controlled. It is production quality: there is no difference between what Ericsson releases in its products and the Open Source distribution. There are 15 people working full time to maintain it; it is stable and battle-tested.

OTP Principles

  • Joe Armstrong’s PhD thesis
  • erlang.org/doc/design_principles/part_frame.html
  • client-server gen_server
  • finite state machines gen_fsm
  • event handling gen_event
  • supervision trees supervisor
  • Applications – e.g. mnesia, standard libraries; collections of processes shipped in their own right, with start, stop, etc.
  • Releases – 4-5 applications combined form a product
  • Application upgrade – upgrade a running system

Finite state machines

  • State x Event -> State1 x Action
  • Way of writing pattern matching on state and event.
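State x Event -> State1 x Action maps directly onto function clauses. The following is a hypothetical code lock written as a plain function, not with gen_fsm, just to show pattern matching on state and event. The states, events, and the code 1234 are made up.

```erlang
-module(code_lock).
-export([next/2]).

%% next(State, Event) -> {NewState, Action}
next(locked, {button, 1234})   -> {open, unlock_door};
next(locked, {button, _Wrong}) -> {locked, beep};
next(open, timeout)            -> {locked, lock_door};
next(State, _Other)            -> {State, ignore}.   % unknown events
```

A gen_fsm (or today’s gen_statem) callback module is organised along the same lines; the behaviour adds the process, timeouts, and supervision.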

gen_event

  • no reply back (error log)
  • event to new state, no reply
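Here is a minimal gen_event handler sketch that just remembers the events it has been notified of. Note that gen_event:notify/2 sends an event and gets no reply, which matches the notes. The handler name event_log_handler and the seen query are assumptions for illustration.

```erlang
-module(event_log_handler).
-behaviour(gen_event).
-export([init/1, handle_event/2, handle_call/2]).

init([]) ->
    {ok, []}.

%% Each event moves the handler to a new state; nothing is sent back.
handle_event(Event, Seen) ->
    {ok, [Event | Seen]}.

%% gen_event:call/3 lets a client query one specific handler.
handle_call(seen, Seen) ->
    {ok, lists:reverse(Seen), Seen}.
```

Usage: start a manager with gen_event:start_link(), attach the handler with gen_event:add_handler/3, then fire events with gen_event:notify/2.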

Order of reading:

  • Joe’s thesis – best intro to OTP: sics.se/~joe/thesis
  • erlang.org/doc/design_principles/part_frame.html
  • forthcoming O’Reilly book

Behaviors:

  • OTP name for “design patterns”
  • Callback module

Case studies:

  • AXD301 – 1.6 million lines of Erlang, built using OTP.
  • Nortel Networks

Factoids:

  • gen_server is the most used behavior.
  • supervisor_bridge – lets a process that isn’t written as an OTP behaviour (e.g. one wrapping a C module) sit in a supervision tree, making it fault tolerant within Erlang.
  • gen_server – A generic client-server model
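Here is a minimal gen_server sketch. The generic client-server loop lives in gen_server; the callback module only supplies init/1, handle_call/3, and handle_cast/2. The module name tally_server and its API are hypothetical.

```erlang
-module(tally_server).
-behaviour(gen_server).
-export([start_link/0, increment/0, value/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Client API: a locally registered server holding one integer.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, 0, []).

increment() ->
    gen_server:cast(?MODULE, increment).   % asynchronous, no reply

value() ->
    gen_server:call(?MODULE, value).       % synchronous request/reply

%% Callbacks: only the application-specific parts.
init(Start) ->
    {ok, Start}.

handle_call(value, _From, Count) ->
    {reply, Count, Count}.

handle_cast(increment, Count) ->
    {noreply, Count + 1}.
```

cast/2 is fire-and-forget, while call/2 waits for a reply. Messages between two processes arrive in order, so value() sees both earlier increments.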

Types of Supervision:

  • 1:1 vs 1:N supervision (OTP calls these one_for_one and one_for_all)
  • 1:1 – if a child dies, only that child is restarted
  • 1:N – if a child dies, all its sibling processes are restarted too
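Here is a sketch of the two strategies using OTP’s own names: one_for_one for 1:1 and one_for_all for 1:N. The two do-nothing workers (worker_a, worker_b) and the restart limits are assumptions made up for illustration.

```erlang
-module(sup_demo).
-behaviour(supervisor).
-export([start_link/1, start_worker/1, init/1]).

%% Strategy is one_for_one ("1:1") or one_for_all ("1:N").
start_link(Strategy) ->
    supervisor:start_link(?MODULE, Strategy).

init(Strategy) ->
    SupFlags = #{strategy => Strategy, intensity => 5, period => 10},
    Children = [child(worker_a), child(worker_b)],
    {ok, {SupFlags, Children}}.

child(Name) ->
    #{id => Name, start => {?MODULE, start_worker, [Name]}}.

%% Hypothetical worker: a registered process that just waits.
start_worker(Name) ->
    Pid = spawn_link(fun idle/0),
    true = register(Name, Pid),
    {ok, Pid}.

idle() ->
    receive stop -> ok end.
```

Under one_for_one, killing worker_a restarts only worker_a. Started with one_for_all, the supervisor would restart worker_b as well.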

trapexit.org is a good site.

Distributed applications:

  • Must have the same beam code on all machines. There are checksums.
  • Must have the same version of Erlang.

One way of solving this is to have them share the same NFS backend.

Explicit code distribution. Send the code in a message:
{ok, Bin} = file:read_file("xxx.beam"),
Term = {apply, erlang, load_module, [Mod, Bin]},
gen_tcp:send(Socket, term_to_binary(Term))
Then, to run it, send:
{apply, xxx, start, [...]}
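The notes show only the sending side. Here is a minimal sketch of the receiving end, assuming the complete packet has already been read off the socket: it decodes the packet with binary_to_term/1 and applies the call. The module name is hypothetical. Because it runs arbitrary calls, it only belongs on a trusted network.

```erlang
-module(code_receiver).
-export([handle/1]).

%% Decode a packet made with term_to_binary/1 and run the request.
%% Executes whatever it is sent: trusted networks only.
handle(Packet) when is_binary(Packet) ->
    {apply, Mod, Func, Args} = binary_to_term(Packet),
    apply(Mod, Func, Args).
```

Sending {apply, erlang, load_module, [Mod, Bin]} through this loads the module. A following {apply, Mod, start, Args} then starts it.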

dynamic code upgrade:

  • Mod:Func(Args, ...) the latest version of Mod is called
  • If you reload Mod then you can run two versions of the code at the same time
  • You can only have two versions of the code running at the same time, an old and a new version
  • To load a third version you must call erlang:purge_module(Mod) before reloading the module (think of this as a two place shift register)
  • to load code call erlang:load_module(Mod, BeamCodeBin)
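Here is a sketch of a long-running process written for hot code loading; the module name is hypothetical. The fully qualified ?MODULE:loop/1 call is what moves the process onto the newest loaded version of the module.

```erlang
-module(hot_loop).
-export([start/0, loop/1]).

start() ->
    spawn(fun() -> loop(0) end).

%% ?MODULE:loop/1 is a fully qualified call, so after a reload the
%% process jumps to the new version on its next message.
loop(Count) ->
    receive
        bump ->
            ?MODULE:loop(Count + 1);
        {get, From} ->
            From ! {count, Count},
            ?MODULE:loop(Count)
    end.
```

After recompiling, code:purge(hot_loop) followed by code:load_file(hot_loop) swaps in the new version. The running process picks it up the next time it handles a message.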

One process per whole Erlang instance on the OS. This would be very dangerous. But GC and as many things as possible should be written in Erlang.

Hierarchy of tests:
  • A unit test framework that comes from France (used by ejabberd).
  • Ericsson has a test server that is distributed with Erlang: massive regression and unit tests.
  • Kreditor has modified it to run the tests any time anything is checked into their repositories.
  • QuickCheck, written by John Hughes. Originally written in Haskell, it has been ported to Erlang. It generates random tests that satisfy certain properties and does some automatic reduction of failing cases to minimize the error conditions. It is expensive to use.
  • EUnit – Dave found this somewhat cumbersome. Straight pattern matching was easier, and it didn’t seem to add a huge amount of value. He would like to see how naming conventions could be used to automate testing with some sort of runner: find *.erl, run anything named test, and report anything that doesn’t return a pass. gen_server seems easily testable by calling the handler functions directly.
Not currently.
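EUnit already covers part of what Dave wished for: it finds and runs every zero-argument function whose name ends in _test. The sketch below reuses the RGB example from the bit syntax section. The module name bits_tests is an assumption.

```erlang
-module(bits_tests).
-include_lib("eunit/include/eunit.hrl").

%% Functions ending in _test are exported and run automatically.
pack_test() ->
    ?assertEqual(<<23, 180>>, <<2:5, 61:6, 20:5>>).

unpack_test() ->
    <<R:5, G:6, B:5>> = <<23, 180>>,
    ?assertEqual({2, 61, 20}, {R, G, B}).
```

eunit:test(bits_tests) runs both tests and returns ok when they pass.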

Petri Nets or activity diagrams?

Message sequence diagrams are used frequently. Joe is not a fan of drawing programs, although message sequence diagrams are extremely useful (Y = time, X = process). Joe is writing a program for animating Erlang.

erlang:trace(Pid, true, ['receive', send, call]). % the flag list says what to trace

This process gets lots of messages. Joe is writing a program that stores them in a file and then animates the activities of these processes over time. He is working with a games developer on the animation.
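Here is a small sketch of the raw tracing BIF. It traces the messages one process receives, and the trace events come back to the tracer (by default, the calling process) as ordinary messages. The module name and the one-shot process are hypothetical. A real animation tool would also trace sends and spawns.

```erlang
-module(trace_demo).
-export([run/0]).

%% Trace what Echo receives; each event arrives as
%% {trace, Pid, 'receive', Msg} in the tracer's mailbox.
run() ->
    Echo = spawn(fun wait_one/0),
    1 = erlang:trace(Echo, true, ['receive']),   % 1 process matched
    Echo ! hello,
    receive
        {trace, Echo, 'receive', Msg} -> Msg
    after 1000 ->
        timeout
    end.

wait_one() ->
    receive _Any -> ok end.
```

Writing those trace tuples to a file, with timestamps added via the timestamp flag, would give the raw material for the kind of animation described above.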

Erlyweb?

Joe hasn’t spent much time with this or with YAWS (Yet Another Web Server).

Erlang graphic user interface?

Lots have been written, but they tend to be hard to use; wxWidgets is being used for 3D process modeling. There are interfaces to SDL, GTK, Cairo, etc. The Tcl library is the main supported one, and it works everywhere. Lots of people have been doing GUIs in web browsers with ActionScript 3 and Flex. You can also use Flex 2 as a device driver for Erlang (e.g. for a video stream), which makes these applications more portable: Flash becomes a device driver. Drop-dead beautiful; Flash with AIR may be the way to go.

Redeployable packaging?

Erlang is packaged for Windows, Linux, and Mac. CEAN (Comprehensive Erlang Application …) ... Wings. Martin Logan, faxian. Martin Logan also has generator applications for Erlang (Erlware?).

Good frameworks for doing ontological modeling and reasoning?

Expert systems shell. University of Corona. Reported at the Erlang User Conferences. Multi-agent listener, but not generalized to OWL.

List of reference projects?

No canonical reference. But check trapexit.org.

Where is Erlang center of gravity?

Erlang is attracting lots of interest in finance, especially in London, where a finance-industry conference is coming up. There is much interest from banks, which have extreme real-time demands. There are a number of purely Erlang-based trading companies, and a debt-buying company that purchases debts and collects on them, buying each debt a few moments after the transaction. One of Mnesia’s founders is involved.

Future changes – where is it going?

Not many changes to the language. There are millions of lines of legacy code that must stay compatible, so the syntax is unlikely to change. But implementation speed and multi-processor support are likely to improve.

Embedded Erlang?

4 MB is probably the minimum size. It used to be 640 KB, but at that size the domain may be better suited to C.

String localization – UTF8?

In a sense it is solved; in another sense it is not. In Erlang there is no string type: a string is a list of integers, and a Unicode code point is just an integer, so it can be stored. However, the converter from text to binary and back has not been written. It is not difficult, though; it is just a library change.
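Since these notes were taken, OTP has added the library piece described here: the unicode module, plus a /utf8 type in the bit syntax. Below is a sketch of both directions; the module name is hypothetical. Code points are written as integers to sidestep source-file encoding.

```erlang
-module(utf8_demo).
-export([encode/1, decode/1, encode_bits/1]).

%% List of code points (plain integers) -> UTF-8 encoded binary.
encode(CodePoints) ->
    unicode:characters_to_binary(CodePoints).

%% UTF-8 binary -> list of code points.
decode(Utf8Bin) ->
    unicode:characters_to_list(Utf8Bin).

%% The same encoding, written with the /utf8 bit-syntax type.
encode_bits(CodePoints) ->
    << <<C/utf8>> || C <- CodePoints >>.
```

For "héllo" ([104,233,108,108,111]), the é (233) becomes the two bytes 195,169 in UTF-8, so the binary is one byte longer than the list.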

What is a good candidate for Erlang?

Telecom applications with lots of small processes, where individual processes do small amounts of short computation. Erlang is good at coordination, concurrency, and switching; it is not good at matrix multiplication, GIF encoding, etc. Financial applications fit the Erlang profile.

Libraries for interfacing to SQL databases?

ODBC libraries. Or you could just open a socket to the DB. Java has good support for arbitrary sockets. Check Yariv’s ErlyWeb implementation.

Summary

Fault tolerance drives a lot of the design decisions behind Erlang. It is a simple functional language. Function selection is by pattern matching. Variables are immutable.

Concurrent Erlang

  • 3 primitives: spawn, send (!), and receive – much simpler than the object-oriented model.
  • register / unregister can be used to associate a name with a process
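The three primitives plus register/2 are enough for a named echo server. Below is a minimal sketch; the module name echo_server is hypothetical.

```erlang
-module(echo_server).
-export([start/0, call/1]).

%% spawn: create the process; register: give it a name.
start() ->
    Pid = spawn(fun loop/0),
    true = register(?MODULE, Pid),
    Pid.

%% send (!) a request tagged with our Pid, then receive the reply.
call(Msg) ->
    ?MODULE ! {self(), Msg},
    receive
        {?MODULE, Reply} -> Reply
    after 1000 ->
        timeout
    end.

loop() ->
    receive
        {From, Msg} ->
            From ! {?MODULE, Msg},
            loop()
    end.
```

After echo_server:start(), any process can call echo_server:call(ping) by name, without knowing the server’s Pid.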

Fault Tolerant

  • catch .. throw, try … catch … end
  • link, process_flag(trap_exit, true)
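Here is a sketch of both mechanisms; the module name is hypothetical. try ... catch turns a runtime error into a value. With trap_exit set, the death of a linked process arrives as an {'EXIT', Pid, Reason} message instead of killing the caller.

```erlang
-module(trap_demo).
-export([safe_div/2, watch_crash/0]).

%% try ... catch ... end: turn a runtime error into a return value.
safe_div(A, B) ->
    try A div B of
        Q -> {ok, Q}
    catch
        error:badarith -> {error, divide_by_zero}
    end.

%% link + trap_exit: a linked crash becomes an ordinary message.
watch_crash() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(boom) end),
    Reason = receive
                 {'EXIT', Pid, Why} -> Why
             after 1000 ->
                 timeout
             end,
    process_flag(trap_exit, Old),   % restore the caller's setting
    Reason.
```

Without the trap_exit flag, the exit(boom) in the linked process would take the caller down with it. This is the building block supervisors are made from.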

Distributed Erlang

  • spawn(Node, Mod, Func, Args)
  • or explicit term passing

Benefits of Erlang:

  • Technology is becoming more concurrent, and people are looking for solutions to that problem (cooperative applications, etc.). Erlang is a working solution. It is not operating-system dependent, and it has 20 – 30 years of experience with fault tolerance, with uptime measured in years.
  • Multi-Core ready
  • Processes in the language

Some projects to research:

  • AXD301 – the biggest Erlang (and functional programming) project ever made: 60 programmers x 3 years. Runs the backbone of British Telecom. Market leading in that sector. Peaked at nine 9s of reliability. This wasn’t a commercial project on 2-3% of the market.
  • Kreditor (Kreditor.se) – buys debt. #2 IP startup in Sweden. $150 year. Self-funded. Founded by alumni of Bluetail.
  • SimpleDB (Amazon)
  • CouchDB (document db)
  • MochiWeb (Mochi Media)
  • Ejabberd (jabber server – by extension Twitter?)
  • ErlyWeb (Erlang web framework inspired by Rails)