Erlang Joe Armstrong & Dave Thomas Chicago, Feb 13-25, 2008

Here are my notes from Day 1 of the Pragmatic Studio Erlang course. Note that these notes are only vaguely edited. Be aware that these are notes so there are certainly typos and errors.

Overview

“First public Erlang course in US.” Joe invented Erlang in 1986.

Fault tolerant systm:

  • Two computers, no shared data. Needs to be isolated computation.
  • Concurrent & distributed by default.
  • Then becomes scalable. Not sharing data key for fault tolerance.

Next to tackle – low power computing. Lots to tackle. Computing and MIDI messaging interest.

Uses:

  • Faster rules engine. Better distributed system than Java. Distributed locking across nodes is really error prone.
  • P2P. Center server is used to coordinate edge – erlang.
  • Internet worm emulator. Evaluate security.

Functional programming overlap with Ocaml, Haskel, Erlang family. Central idea of non-mutable state is critical to all functional languages – and in stark contrast to object oriented programming.

  • Day 1 – Sequential programming
  • Day 2 – Parallel programming
  • Day 3 – Distributed programming and OTP

Pairing suggested for the labs. One core concept for the labs.

  • Write a simple distributed client-server.
  • Write a DNS like server resolver.
  • Write a universal server and register it with DNS.
  • Send code for a web server to the universal server and tell it to execute it (mobile code)
  • Correct an error in the code and fix the error without stopping the server

Going from one to many machines can take hours to solve (e.g. sorting out why packets aren’t making it through firewalls).

The erl shell

Erlang shell: More a controlling environment than it is a full fledged interpreter (in comparison to say ruby’s IRB). How to get out:

  • unix \ aborts everything
  • G => submode => press q
  • ^C – a for abort

When you type ^G it’s like ssh:

  • Can jump to different shells. It’s distributed.
  • type h for help

Shell has standard keybindings – has emacs keybindings as well.

Data Types and Assignment

Integer arithmetic in Erlang is very good. However, if you want extremely fast integer arithmetic you would writ it in C.

X = 1.

Works differently than object oriented languages. Once value is assigned it cannot be changed. Variables are unbound or bound. Thread safety is anathema to Erlang. This is because there is no mutable state and semiphores are not necessary for protecting them.

In Erlang X=1 is a pattern matching in operator (not traditional assignment).

”=” is a very conventionally overloaded opperator. = was chosen for ease of use, but it is not used as it is in other languages.

Worker supervisor model of programming. Assign process to do something. If it fails get some other process to do it. Likewise if pattern fails, it just throws an error. Not like the constant error checking in mutable state applications.

Note: pattern matching errors in Erlang 5.6 are nicer than in 5.5.

Variables uppercase: X, Y, Bob

Numbers:

  • Integers – 1 34
  • Base – 2 2#101010
  • Base – 8 8#0723
  • Floats – 12.34e-07

Atoms:

  • lowercase
  • true false hellow_world
  • they evaluate to themselves

In Erlang everything is a 32 bit or 64 bit word. Atoms are stored once in memory (like ruby symbols). For comparison:

  • X = joe.
  • Y = joe.
  • X = Y. %These point to the same memory location

Tuples:

  • {food, egg}
  • can be nested {person, {firstname, “joe”},{lastname,”armstrong”}}
{C, D, C } = {1, 2, 3}.

Atoms can also be matched: {person, Name} = {person, “Dave”}.

f(). %forget all bound variables

Note: you may get an error in the book if you already have a variable bound and attempt to reuse it in another example.

h2. Lists List are used in 95% of your functions.
[1,2,3,4]
[] %the empty list

A = [1,2,3].
[C, D, E] = A.

Lists are a naturally recursive data structure. Lists can be expressed as a head and a tail: [H | T]

Head is something, tail is always a list. In C this would be a linked list. This notation originated in Prolog. Much of Erlang originated in Prolog – in fact the earliest versions of Erlang were implemented in Prolog.

Why Lists and Tuples? (these are the glue in the language):

  • Lists are for variable number of things. Lists are traversed and the first element is the only one that can be efficiently accessed through head and tail notation.
  • Tuples are a container for a fixed number of things.

Construct a new list (on the right hand side of the operator it’s constructor):
A = [1,2,3].
B = [ fred | A].
Deconstruct a list using pattern matching (on the left hand side of the operator it’s a deconstructor):
[ H | T ] = A.
H = 1.
T = [2,3].

Head of the list could have multiple values: [ H1, H2 | T] = [1,2,3].

Compiler optimizes your code so that it is only patterns. If and case expressions are eliminated.

h2. Strings
Strings are actually integers:
A = "Dave".
[H | T] = A.
H = 68.
Lists of numbers will be returned as characters:
[65, 66, 67].
=> "ABC"
”_” matches anything and doesn’t bind:
A = {name, "Dave", state, "TX"}.
{ name, Name, _, _} = A.

++ concatenates lists.

Interesting list length counts:
A = "hi,".
B = "there".
length([A,B]).
=> 2
length(A ++ B).
=> 8
length([A | B]).
=> 6
[A | B]
=> ["hi,",116,104,101,114,101] %"hi" is a nested list

Modules

Useful shell commands:
help(). %show all shell commands
b(). %display all bindings
v(N) %value of a query. Type the number of the prompt or -X and it will return the value at that point in your program.
creating a module:
-module(dave).
-compile(export_all). %export all of the 

Functions are defined by their name and “arity” (number of arguments that they take). Semicolons separate pattern matches and the functionality that should be applied. In erlang there is a tendency to write patterns rather than to write conditional loops.

Example of sum function with a test:
-module(dave).
-compile(export_all).

test() ->
    6 = sum([1,2,3]),
    10 = sum([1,2,3,4]),
    0 = sum([]),
    yes.

sum([H | T]) ->
    H + sum(T);
sum([]) ->
    0.

Why doesn’t erlang have a loop?

  • Joe: you don’t need it.
  • Dave: A loop usually includes a variable with state. Implicitly this has mutable variables.

Could write a test that runs all test methods found in each module and make sure final atom “hooray” or “yes” is returned.

Erlang idiom is not to do defensive error case checks. Program for the happy case in Erlang. Then the question is what are you going to do with errors?

The set of things that can happen outside of the specifications is nearly infinite. In Erlang only code that is specified and nothing else. If it goes outside, let it crash. This is very different than defensive programming.

  • Let it crash.
  • Crash early. If it crashes early it will be available in the logs and the state will be traceable.

On shipping, wrap the errors.

In a concurrent world there are many more things that can go wrong and are not reproducible. Joe’s thesis was on postmortem analysis of bugs.

%%File name is amod.erl
-module(amod).
%%these can be called from outside the module
-export([foo/1,bar/2]).
compile from command line:
erlc amodule.erl
=> outputs amodule.beam

run it: erl -noshell -pa /path/to/code -s Mod Func Args

erl has about 60 flags.

erl scripting
escript:
/#!/opt/local/bin/escript
main(List) -> %%main is first function called?
    ...
Code paths etc:
./erlang or ${HOME}/.erlang
code:add_patha("/path/").
loads using undefined function – done using an exception. Tail recursive variable that carries through the state. Equivalent of System.out.printline:
    io:format("Hello from: ~p~n",[:file:get_cwd()]).
  • p
  • n
If you installed from macports you will have the man page. If you compile you will have to add them in explicitly. View man pages like this:
erl -man io

Erlang is also on the website. May be faster than local man pages. erlang.org/doc/man

datatypes:

  • integers
  • floats
  • atoms

structured datatypes:

  • lists
  • tuples

Functions

”->” translates to “returns”.

BIF means “Built in Function”. Allows you to get out of Erlang and closer to the metal. These are not written in Erlang. Can see them in:

  • erl -man erlang
  • or in the book
gen_tcp:send(Socket, term_to_binary(Term)), ...

Any term can be serialized into binary. Joe: This is very optomized, a highlight of the language:

receive
    {tcp_data, Socket, Bin} ->
        Term = binary_to_term(Bin)
end.

tuple_to_list sometimes erlang:now(). In theory erlang: BIFs are implemented in erlang. But it turns out that this is not completely true. This is quirky.

Select a function to apply (Erlang meta-programming anyone?):
apply(Mod, Func, [Arg1, Arg2, ...]). %this is like a ruby send
M = list_to_atom("erlang"). %string is in fact a list
Funs, a bit like calling a lambda:
Double = fun(X) -> 2*X end. %define a function and then use it as a parameter!
lists:map(Double,[1,2,3,4])
lists:map(fun(X) -> 2*X end, [1,2,3,4]). %more idiomatic

There is no currying.

Writing your own control structures is common. For is not built into the language. See Joe’s Erlang book.

Funs can return funs:
MakeAdder = fun(Inc) -> fun(X) -> X + Inc end end. %this is a closure
Add5 = MakeAdder(5).
Add5(10).

Can write programs that only generate closures and evaluate them later (lazy evaluation). This is a technique that haskell programmers often use.

Some of the most useful functions are:

  • map – takes a list produces a list of values.
  • fold – takes a list and produces a single value.

lists:map(fun(X)-> X* X end, [1,2,3]).
=> [1,4,9]

Joe calls this list at a time programming. Makes programs much shorter.

foldl

  • applies a function to every element in a list, updating an accumulator as it goes. Returns a single value (the accumulator).
  • lists:foldl(Fun, Acc0, List)
  • great for summing

write sum using an accumulator (may be a bug/typo here):
sum(L) -> sum(L,0).

sum([H|T], Sum) ->
    Sum1 = Sum + H, %tail recursion is applied to Sum1 since it is called once and never accessed again & garbage collected.
    Sum(T, Sum1);
sum([], Sum) ->
    Sum. %at the end just return the accumulation value

Custom folds are extremely useful….

List Comprehensions:

  • come from set theory
  • eg {x : x is even} – the set of all even numbers
  • in erlang the colon is written as ||
  • [X || X <- L, (X rem 2) =:= 0]. % same as {x : x is even}
  • the above should be more efficient than map – compiler is optimized for list comprehensions.

[ Constructor || Pattern <- List, Predicate, ... ]

Generate the cartesian project:
[ {X, Y} || X <- [a,b,c], Y <- [1,2,3]].
[{a,1},{a,2},{a,3},{b,1},{b,2},{b,3},{c,1},{c,2},{c,3}]
The quicksort list comprehension:
-module(qsort).
-export([sort/1]).

sort([Pivot]|T) -> %arbitrarily choose first element as pivot
  sort([X || X <- T,X < Pivot])
    ++[Pivot]++
    sort([X || X <- T, X >= Pivot]);
sort([]) ->
    [].

Note that the quicksort function in Erlang actually is quite a bit longer than this and handles lost of special cases.

Accumulators pass extra argument(s) into the function:
-module(accum).

evens_and_odds(L) -> evens_and_odds(L,[],[]).

evens_and_odds([H|T,E,O]) when H rem 2 =:= 0 -> evens_and_odds(T, [H|E], 0); %gaurd in the first clause
evens_and_odds([H|T, E, O]) -> evens_and_odds(T,E,[H | O]); %no gaurd in the second clause
evens_and_odds([], E, O) -> {E, O}.

Adding gaurds allows the compiler to compile better. Erlang has multiple entry points to the function unlike Java and C. If you write your code without multiple entry points and with an if statement, you don’t have the opportunity to reorder the clauses. If you write this with an if statement you have done the compilers job. Better not to.

There are a few equality operators. Only different when comparing floating point numbers:
* == %fuzzy equal for integers and floats comparison
* =:=

Can’t call functions in gaurds. Know that they can be parsed into optimal decision trees.

  • Single Gaurd
  • Full Guard – comma seperated list of single guard conditions.

Gaurds sequence:

  • Commas are interpreted as and.
  • Semicolons are interpreted as or.

It’s messy to write a new function for every condition. This is reduced by writing guards:

case and if

%case statement
goto_work(Day)->
    case classify(Day) of
        weekday -> true; %could add a gaurd after weekday
        weekend -> false
    end.

goto_work(Day) ->
    if
        Day =:= monday -> true;
        Day =:= tuesday -> false;
        true -> false
    end.
compiler reduces case and if to:
f(pattern1) ->
    action;
f(pattern2) ->
    action2.

Booleans:

  • true and false are atoms
  • these are by convention only – but this is strongly recommended convention

Given a temperature that returns hot or cold. It is better to use is_hot and is_cold and return true and false. This will allow you to use functions that work with booleans. e.g. lists:filter(Fun, L) -> L’

Tame Large Tuples with Records. These expand into tuples. Joe thinks this syntax may improve (e.g. he says “it sucks”).

Example:
%Define a record - module syntax only. Won't work on the command line.
-record(person, {name, firstname, lastname, age, sex, weight, height})

X = #person{name="jane",age=32, sex=female} %%creates a new record
birthday(#person{age=N} = X) ->
    X#person{age = N+1}

There are quite a few functions that return tuples that are generated by records. These are records that are returned as tuple. If you include a specific header file it will be interpreted as a record.

It’s better not to use records for short tuples. This is the idiom.

Not mentioned here:

  • macros – first introduced for language customization (internationalization), then grew a bit in scope.
  • parse transforms – for implementing extensions to the language

Exceptions.

Overview:

  • exceptions are generated when errors occur
  • exceptions can be generated by a program: throw, exit, error.
  • convert exceptions into values

X=atom_to_list(123).
error...
X. % is unbound

catch F(X) returns F(X) if no exception is generated while evaluating the function. if the error is thrown returns a tuple.

some_function(X, Y) ->
  case something_that_might_crash(X,Y) of
    {'EXIT',Why} ->
        %% handle error;
... Or try catch. Gives more fine grained control.
start(Arg1, Arg2) ->
    try
        begin
            ok = start_server(),
            L1 = dothis(Arg1),
        end
    end
    catch
        exit: Pattern ->
        ...
    after
        ...
    end.
Abnormal termination with throw:
..
    catch
        throw:X ->
            X
    end.
Use exceptions for “impossible” situations – thinking defensively goes awry. Everything has return value so the above actually returns the tuple {ok}. This takes a correct crash and returns something incorrect:
opcode(load) -> 1;
opcode(store) -> 2;
opcode(X) -> %%invalid: what I do now?
    io:format("bad opcode:~p~n",[X]).
a “correct solution” to the above:
opcode(load) -> 1;
opcode(store) -> 2;
opcode(X) -> exit({ebadOpCode, X}).
Solution two: do nothing (best):
opcode(load) -> 1;
opcode(store) -> 2.
Turn values into exceptions:
exitIfCannotOpenFile(File) ->
    case file:open(File, [read]) of
        {ok, Pid} ->
            Pid;
        {error, Why} ->
            exit(Why)
    end.
Improving and error:
readConfigFile ->
    case file:consult(File) of
        {ok, Stuff} -> Stuff;
        {error, enoent} -> exit({eBadConfigFile, File})
    end.

Three ways of generating exceptions:

  • exit – punish the programmer
  • throw
  • error

Alternative to defensive programming – one big catch at the top of the program. Crash early, crash often.

Main sequential libraries:

  • lists.erl – list manipulation. Extremely common, should be familar with these. There are lots of these. Joe says about 15 are in common use, then diminishing returns after that.
  • file.erl
  • ets.erl – Hash tables in memory.
  • dets.erl – Persistent hash tables on disk
  • dict.erl – fast key-value dictionary
  • io.erl and io_lib.erl
  • filename.erl
  • filelib.erl
  • file.erl

Above the BIF level you can actually read the source code of the lbiraries

  • find using code:which(lists)
  • mac location of libraries (for maports install)/opt/local/lib/erlang/lib/stdlib-1.15
  • have a look at /opt/local/lib/erlang/lib/stdlib-1.15/src

Three common patterns

Pattern 1 – Terminate after value is found
do(..., [H,T], ...)     -> ...;     % found pattern, terminate recursion return value
do(...,[_|T], ...)     -> do(...,T);     % pattern not found, pass in tail and recurse further
do(..., [], ...) -> ... .             % end of recursion return value
example of pattern 1:
member(H, [H|T]) -> true;
member(H, [_|T]) -> member(H,T);
member(H, []) -> false.
Pattern 2 – Build lists as the output of a recursion.
filter(Fun, [H|T]) ->     %grab a function and a list split into head and tail
    case Fun(H) of        %evaluate function on head
        true -> [H|filter(Fun,T)];    % if true, return append head to the filter called on tail
        false -> filter(Fun, T)        % if false, return apply filter to tail without appending
    end;
filter(Fun,[]) -> []. % if passed the empty set return the empty set
Pattern 3 – Build results in reverse order in an accumulator
partition(Fun, L) -> partition(Fun,L,[],[]). %delegate public function to another function

partition(Fun,[H|T],Yes,No) -> %take a list of a function, split list, Yes list, and No list
    case Fun(H) of               %apply function
        true     -> partition(Fun,T,[H | Yes], No); %if true recurse Fun, Tail, Add head to Yes, and No.
        false     -> partition(Fun,T,Yes, [H | No]) %if false recurse Fun, Tail, Yes, Add head to No.
    end;
partition(Fun,[],Yes,No)-> 
    {reverse(Yes),reverse{No}}. %if list is empty pass back the reversed list
Methods of reversing a list:
[H | Yes] %adding and needs reverse
Yes ++ [H] %right order but slower than reversing at the end

Remember, all of these can be done with pattern matching:

  • validation
  • deconstructing lists
  • case statements
  • list comprehensions

They all use this one form:
f(Pattern1) ->
    action1;
f(Pattern2) ->
    action2.