Erlang Pragmatic Studio with Joe Armstrong & Dave Thomas – Chicago, Feb 13-25, 2008.
Day 2 – Feb 14, 2008.
Here are my notes from Day 2 of the Pragmatic Studio Erlang course. Note that these notes are only vaguely edited. Be aware that these are notes so there are certainly typos and errors.
Concurrency
Review of an implementation of map:
map(F, []) -> [];
map(F, [H|T]) ->
    [F(H) | map(F, T)].
Erlang is a concurrent language first and a functional programming language second. Why concurrent? The world is concurrent, both the external world and the computing world. In most languages you inherit all the limits of the operating system in regard to processes. Erlang is not bound by the operating system: the emulator handles concurrency.
- Programming concurrent activities in sequential languages is artificially difficult.
- Shared memory (locks, mutexes, coroutines, processes, thread, deadlock, livelock, failure, thread-safe)
- Message passing
Web servers must spawn a new thread for every request. In Erlang, when one process crashes, only that process crashes, not the whole system. This is not the case in other environments. Isolation of components is the key to the pattern.
Dave:
- process = object
- message = method
- can apply the same techniques in OOD. The Smalltalk paradigm of passing messages unifies this (to a certain extent).
Joe: In Alan Kay's original conception of object orientation, messaging was primary. The problem with a function call is that the message must return to the caller.
Concurrency in other languages:
- can only create a very small number of processes
- heavy weight
- only supports message passing and not the kind of error handling semantics that Erlang has
- Pid ! Message
- If Pid is hidden you cannot send a message to the process
Message arrives in a mailbox (like email, can be stacked). You never know if a message arrives (if you want to be sure send a message back).
Only three primitives are needed to turn a sequential program into concurrent code:
- spawn – create a new process
- send
- receive – try to match pattern
Spawn
Three syntaxes:
- spawn(fun foo/0) – runs foo/0 in a new process
- spawn(fun() -> ... end) – spawns an inline fun
- spawn(Mod, Func, [Arg1, Arg2, ..., ArgN]) – performs apply(Mod, Func, [Arg1, Arg2, ..., ArgN]) in a new process.
In Pid1 evaluate Pid2 = spawn(fun() -> ... end)
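The three spawn forms can be tried side by side. This is a minimal sketch, not code from the course: the module name spawn_forms and the helpers idle/0 and report/2 are made up for illustration.

```erlang
-module(spawn_forms).
-export([run/0, report/2]).

%% idle/0 takes no arguments, so it suits the spawn(fun idle/0) form.
idle() ->
    receive stop -> ok end.

%% report/2 must be exported because spawn/3 calls it through apply/3.
report(Parent, Tag) ->
    Parent ! {Tag, self()}.

run() ->
    Parent = self(),
    Pid1 = spawn(fun idle/0),                          % named fun
    Pid2 = spawn(fun() -> report(Parent, inline) end), % inline fun
    Pid3 = spawn(?MODULE, report, [Parent, mfa]),      % Mod, Func, Args
    true = is_pid(Pid1),
    Pid1 ! stop,
    %% Pid2 and Pid3 are already bound, so these patterns only match
    %% replies that really come from those two processes.
    receive {inline, Pid2} -> ok end,
    receive {mfa, Pid3} -> ok end,
    done.
```

Note that only the spawn/3 form requires the target function to be exported.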
Send
- Pid ! Msg
- returns the message (e.g. A ! B ! C ! Msg)
Receive
Same shape as a case expression:
receive
    Pattern -> ...
    ...
end
self() is the pid of the current process. Including it in a message allows the recipient to respond to us.
% receive whatever message comes first
receive
Msg ->
...
end
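Putting send, receive and self() together: a minimal sketch of a process that answers whoever wrote to it. The module name echo_demo and the {From, Msg} message shape are assumptions chosen for this example.

```erlang
-module(echo_demo).
-export([run/0]).

%% Waits for one {From, Msg} message and sends Msg back to From,
%% tagged with our own pid so the caller can tell who replied.
echo() ->
    receive
        {From, Msg} ->
            From ! {self(), Msg}
    end.

run() ->
    Pid = spawn(fun echo/0),
    Pid ! {self(), hello},          % include self() so echo/0 can reply
    receive
        {Pid, Reply} -> Reply       % only accept a reply from Pid
    after 1000 ->
        timeout
    end.
```

Calling echo_demo:run() should return hello; the after clause is only there so the caller does not hang if something goes wrong.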
% receive messages in order: foo first, then bar
receive
    foo -> true % bar stays in the mailbox until foo is received
end,
receive
    bar -> true
end
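The ordering trick above can be checked by sending the messages in the "wrong" order. A small sketch (the module name ordered_receive is made up):

```erlang
-module(ordered_receive).
-export([run/0]).

run() ->
    %% Send bar before foo; both land in our own mailbox.
    self() ! bar,
    self() ! foo,
    %% The first receive skips bar, because only foo matches.
    First = receive foo -> foo end,
    %% bar was left in the mailbox and is picked up now.
    Second = receive bar -> bar end,
    [First, Second].
```

The result is [foo, bar] even though bar arrived first, because receive scans the mailbox for the first message that matches a pattern.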
B!{transfer,self()}
receive
{transfer, A} ->
C ! {transfer, A}
end
receive
...
promise and yield are non-primitive abstractions.
Are we hiding the wrong thing? Should we expose the protocol?
Pros:
- Hides message structure
- Makes module reusability easy
- We know how to define APIs
Cons:
- More code
- We don’t know how to describe protocols
- Protocols do not exist in Erlang as first class objects
Key factor: minimizing the cost of crossing machine boundaries.
Abstracting server functionality:
- run a server that can run in the background
- needs some kind of state
- pass in state at the start
- in case of counter pass in 0
- then cast
Mutate state through recursion. Two more items.
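A counter is the usual illustration of state carried through recursion. The sketch below is an assumption about what such a server might look like, not the course's code; the module name counter_server and the bump/read/stop messages are invented for the example.

```erlang
-module(counter_server).
-export([start/0, bump/1, read/1, stop/1]).

%% Start the server with an initial state of 0.
start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronous request: fire and forget.
bump(Pid) ->
    Pid ! bump,
    ok.

%% Synchronous request: send our pid and wait for the answer.
read(Pid) ->
    Pid ! {read, self()},
    receive
        {Pid, N} -> N
    after 1000 ->
        timeout
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

%% The "mutable" state is just the argument of the next recursive call.
loop(N) ->
    receive
        bump ->
            loop(N + 1);
        {read, From} ->
            From ! {self(), N},
            loop(N);
        stop ->
            ok
    end.
```

Messages between one pair of processes arrive in the order they were sent, so two bump calls followed by read return 2.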
Registered processes:
- Any program that wants to send a message to a Pid must have Pid in a local variable.
- Use register / unregister
- register(name, Pid), name ! Term sends a message to Pid
- whereis(Name) returns Pid or undefined
- mind the propagation delay
Example of register: register(counter, spawn(fun() -> counter(1) end)).
From the shell, see all the registered processes with: regs().
Registered names are global resources. If two processes register with the same name one will fail.
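A short sketch of register, whereis and a name clash. The module name reg_demo and the registered name reg_demo_ponger are hypothetical; the sketch assumes that name is not already taken on the node.

```erlang
-module(reg_demo).
-export([run/0]).

run() ->
    Name = reg_demo_ponger,  % hypothetical name, assumed to be free
    Pid = spawn(fun() -> receive {ping, From} -> From ! pong end end),
    true = register(Name, Pid),
    Pid = whereis(Name),     % the name now resolves to Pid

    %% Registering a second process under the same name raises badarg.
    Other = spawn(fun() -> receive stop -> ok end end),
    Clash = try register(Name, Other)
            catch error:badarg -> name_taken
            end,
    Other ! stop,

    %% Send by name instead of by pid.
    Name ! {ping, self()},
    Reply = receive pong -> pong after 1000 -> timeout end,
    {Reply, Clash}.
```

The name is released automatically when the registered process exits, so no explicit unregister is needed here.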
receive… after
Wait for a certain period of time (timeouts):
receive
    Pattern1 -> ...;
    Pattern2 -> ...
after
    TimeInMilliseconds -> % can also be the atom infinity
        Actions
end.
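A runnable version of the receive ... after shape, as a minimal sketch. The module name timeout_demo and the {reply, Value} message are assumptions, and the sketch assumes the caller's mailbox holds no stray {reply, _} messages.

```erlang
-module(timeout_demo).
-export([wait_for/1, run/0]).

%% Wait up to Millis milliseconds for a {reply, Value} message.
wait_for(Millis) ->
    receive
        {reply, Value} -> {ok, Value}
    after Millis ->
        timeout
    end.

run() ->
    Empty = wait_for(50),     % nothing has been sent yet
    self() ! {reply, 42},
    Got = wait_for(50),       % the message is already waiting
    {Empty, Got}.
```

Here run/0 returns {timeout, {ok, 42}}: the first call gives up after 50 ms, the second finds the message immediately.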
Pid = spawn(fun() -> wait() end),
Pid ! {do, F}
wait() ->
receive
{do,F} ->
F()
end.
Security risks.
SMP and Processes:
- Erlang can take advantage of multiple processors.
- Enable using -smp +S n options only if compiled with correct flags
- spawn
- send
- receive
- default max processes is 327xx
- increase processes with: erl +P 1000000000
- once it passes
processes:max(1).
MACRO: Debugging
-define(DEBUGGING,true).
-ifdef(DEBUGGING).
-define(DEBUG(X),X).
-else.
-define(DEBUG(X), void).
-endif.
?DEBUG(io:format("Found:~p~n", [Files])),
...
Two ways to do distributed computation:
- distributed Erlang as cluster with one name space. Consider different machines may have different versions of Erlang. Getting the right code onto the right machines may also cause issues. Fully connected machines don’t scale well – > 90 machines gets difficult.
- Explicit sockets. No issues of ownership. More easily scalable as other issues are encapsulated – no memory is shared at all.
Client
{ok, Socket} = gen_tcp:connect(Host, Port, [Options]),
ok = gen_tcp:send(Socket,Data),
...
receive
    {tcp, Socket, Data} ->
        %% do something with the data
        ...;
    {tcp_closed, Socket} ->
        %% take care of this
        ...
end
How does TCP work?
- The server needs to be able to handle multiple clients trying to connect to it. The protocol is slightly different.
- C level API – listen(ip, port);
All that listen does is declare that the server will accept connections on the given port. Call accept to turn a pending connection into an IO object that you can use. Typically each connection is handled by its own process.
A single-shot server accepts one connection. The version below is parallel: as soon as a connection is accepted it spawns a new acceptor, then serves the socket itself.
start_server() ->
    {ok, Listen} = gen_tcp:listen(Port, ...),
    spawn(fun() -> par_loop(Listen) end).
par_loop(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    spawn(fun() -> par_loop(Listen) end),
    loop(Socket).
Packet lengths:
- A 4 byte length header is automatically added/removed by the system when calling gen_tcp:send, and messages are assembled to the correct length before {tcp, Sock, Data} messages are sent to the controlling process.
gen_tcp:connect(Host, Port, [..., {packet, 4}, ...])
gen_tcp:listen ....
Sending Erlang terms
gen_tcp:send(Socket, term_to_binary(Term))...
receive
{tcp, Socket, Data} ->
Term = binary_to_term(Data),
...
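The pieces above ({packet, 4}, term_to_binary, binary_to_term, and a listen port of 0) can be combined into one loopback round trip. This is a sketch under assumptions, not the course's server: the module name term_echo, the {echo, Term} reply shape and the 5 second timeout are all invented for illustration.

```erlang
-module(term_echo).
-export([run/0]).

run() ->
    %% Port 0 asks the OS for any free port; inet:port/1 tells us which.
    {ok, Listen} = gen_tcp:listen(0, [binary, {packet, 4}, {active, true},
                                      {ip, {127,0,0,1}}]),
    {ok, Port} = inet:port(Listen),
    spawn(fun() -> serve_once(Listen) end),

    %% Client side: both ends must agree on {packet, 4}.
    {ok, Socket} = gen_tcp:connect({127,0,0,1}, Port, [binary, {packet, 4}]),
    ok = gen_tcp:send(Socket, term_to_binary({hello, [1, 2, 3]})),
    Reply = receive
                {tcp, Socket, Data} -> binary_to_term(Data)
            after 5000 ->
                timeout
            end,
    gen_tcp:close(Socket),
    gen_tcp:close(Listen),
    Reply.

%% Accept a single connection, echo one term back, then close.
serve_once(Listen) ->
    {ok, Socket} = gen_tcp:accept(Listen),
    receive
        {tcp, Socket, Data} ->
            Term = binary_to_term(Data),
            ok = gen_tcp:send(Socket, term_to_binary({echo, Term}))
    end,
    gen_tcp:close(Socket).
```

Because of {packet, 4}, each {tcp, Socket, Data} message carries exactly one complete encoded term, so binary_to_term never sees a partial message. On an untrusted network, binary_to_term on raw input deserves more care than this sketch takes.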
The Middle Man Pattern:
- Middle man has to be written for a particular protocol.
- like unix filters (Dave) – piping
loop(Pid,Socket) ->
receive
{Pid,Msg} ->
gen_tcp:send(Socket, term_to_binary(Msg)),
loop(Pid, Socket);
{tcp, Socket, Data} ->
Pid ! binary_to_term(Data),
loop(Pid, Socket);
{'EXIT', Pid, _Why} ->
gen_tcp:close(Socket);
{tcp_closed, Socket} ->
exit(Pid, socket_closed)
end.
use term_to_binary and binary_to_term.
When to use:
- , separates individual expressions (smallest scope)
- ; separates clauses (pattern matching)
- . ends a function (biggest scope)
% from simple_coordinate_server.erl
% pass in milliseconds or infinity to sleep forever
sleep(T) ->
receive
after T ->
void
end.
gen_tcp:listen(0, ...) % gets a free port
Concurrent patterns:
- client server model is the most used pattern when writing concurrent programs.
- can also use worker manager pattern.
- or finite state machine.
OTP course would require Erlang background.
try
...
catch
exit:Why ->
From ! {Name, die, Why}, % die and tell calling process
loop(State)
end
Delegation is a nice place to do load balancing: client, delegator, responder.
Erlang was developed in a trusted environment, and many of the techniques are suited to that kind of environment.
This is an empty server that will run anything passed to it (a guest account is slightly safer but still incredibly permissive):
Pid = spawn(fun() -> wait() end),
...
Pid ! {do, F}
...
wait() ->
receive
{do, F} ->
F()
end.
Many variations of concurrency:
- Resource managing
- Mobile Code
- Mobile code with transactions
- Systems have non-functional behavior (scalability, fault tolerance) – usually defined very weakly.
- These are usually vaguely defined.
- In Erlang these things can be specified clearly.
In Erlang you often write functional and non-functional code separately. These can then be used independently. Can abstract things like scalable fault-server and fault tolerant processing. Cool!
You can say how many failed processes are allowable in a given period. This can be specified programmatically or in product requirements, e.g. if there are three machines and one machine fails and data is lost, it is under warranty, but if two machines die and data is lost, it is out of warranty.
links
Links define error propagation paths. If A is linked to B then:
- If A fails then B will be notified
- If B fails then A will be notified
- messages are sent with send
- signals are sent when processes die. Signals are not messages.
- if process A is linked to B then A will be sent an exit signal if B dies.
- application level
- level 2 – trap exits
- level 1 – core level to trap exits
- if you do spawn followed by link the process might die before you get to the link statement
- spawn_link is like spawn followed by link only the two are performed atomically.
- trap_exit
- fault tolerant systems need two separate computers
- let one process do the work
- let some other process fix the error, even remotely. If a whole machine is crashed it must be fixed remotely. This pattern should also be applied on a local system – cross machine boundary should be handled in the same manner for scalability.
- exit signals are like uncaught exceptions that escape from the process
- failing processes should fail early
How do you know when a machine fails? In distributed Erlang, there should be a heartbeat process that is sent between machines (1-10 times a second).
use halflink for monitoring.
Link methods:
- link
- unlink
- process_flag(trap_exit, true) – trap exits
- {'EXIT', Pid, Why} – message sent when a process dies
These methods are orthogonal to spawn, send, receive.
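A minimal sketch of spawn_link combined with trap_exit. The module name link_demo and the exit reason boom are made up for the example.

```erlang
-module(link_demo).
-export([run/0]).

run() ->
    %% Trap exits so that exit signals arrive as ordinary messages
    %% instead of killing this process. Remember the old setting.
    Old = process_flag(trap_exit, true),
    %% spawn_link creates the process and the link atomically, so the
    %% child cannot die before the link is in place.
    Pid = spawn_link(fun() -> exit(boom) end),
    Result = receive
                 {'EXIT', Pid, Why} -> {exited, Why}
             after 1000 ->
                 timeout
             end,
    process_flag(trap_exit, Old),   % restore the caller's setting
    Result.
```

Without the process_flag call, the exit signal with reason boom would propagate over the link and terminate the calling process as well.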