Seven Languages In Seven Weeks: Erlang - Day 1
I just started the Erlang chapter of my Seven Languages in Seven Weeks book by Bruce Tate. Erlang is a functional language; that is, it relies heavily on functions and the concept of immutable state. In an object oriented language, we typically have objects that change state over time. In a functional language, we have data types that remain static. And, when we try to alter the state of these immutable data types, we actually end up creating new values altogether. For example, adding an item to a tuple creates a new tuple, leaving the original tuple unchanged (thinks Lists in ColdFusion).
Apparently, by creating these immutable states, it allows the language to be perfect for concurrency; that is, it allows the language to excel at distributed, parallel processing by making it unlikely that concurrent threads will be competing for shared resources. As you saw in Scala - Day 3, however, concurrency is something that currently confuses me; as such, I don't want to get any deeper than that in my explanation.
I found the syntax of Erlang to be a bit confusing. It borrows heavily from Prolog in its form and seems to take after both Prolog and Scala in the way that it does pattern matching. But as far as low-level syntax, I kept getting tripped up by commas and semi-colons. Some lines ended in a comma and some in a semi-colon; however, the requirement of semi-colons was dependent on the parent clause and the statements that might come after the parent clause.
HW1: Write a function that uses recursion to return the number of words in a string.
As someone who would traditionally use listLen() or reMatch() to get the number of words in a given string, it wasn't readily apparent to me how recursion could even be used to solve such a problem. But, then I remembered an example that I saw in Prolog in which an array was used to accumulate data over the course of iteration. I thought that I might use the same approach here - I could use an array to build up the collection of words as I recursed over the "tail" of the given text. Then, the length of the resultant array would define the number of words in the given string.
%-- Write a function that uses recursion to return the number
%-- of words in a string.
-module( hw1 ).
-export( [get_word_count/1] ).
-export( [collect_words/3] ).
%-- -------------------------------------------------------- --%
%-- -------------------------------------------------------- --%
%-- I get the number of words in the string (as delimited by spaces).
get_word_count( Text ) ->
%-- Gather the words and then return the length of the resultant
%-- collection.
length(
collect_words( Text, [], [] )
)
.
%-- -------------------------------------------------------- --%
%-- -------------------------------------------------------- --%
%-- I gather the words in the given Text into the Words collection.
%-- I use the Word list to build intermediary words.
collect_words( Text, Words, Word ) ->
%-- Check to see if we have any more text to recursively
%-- iterate over.
if (length( Text ) == 0) ->
%-- No more text to examine; check to see if our current
%-- word has a length. If so, then we can add it to the
%-- aggregate set. If not, just return the previously
%-- built collection.
if (length( Word ) /= 0) ->
%-- Add the current word to the collection.
lists:append( Words, [Word] );
true ->
%-- Just return the current collection.
Words
end;
%-- We do have more text to examine (and therefore more words
%-- to collection). --- ELSE
true ->
%-- Pop off the first character of the given text.
[FirstCharacter | RestOfText] = Text,
%-- Check to see if the first character is a space. If
%-- so, then we want to add the aggregated word to the
%-- the current word collection and continue examining
%-- the rest of the text.
%--
%-- NOTE: When we pop off the first character, we actually
%-- get an integer, NOT a string (there are no *real* strings
%-- in Erlang. As such, we have to compare the FirstCharacter
%-- to the ASCII value 32, or as short hand $C.
if (FirstCharacter == ($ )) ->
%-- Check to make sure our word has any length. If not,
%-- then we are dealing with multiple spaces in a row
%-- and shouldn't bother adding the words.
if (length( Word ) == 0) ->
%-- Just pick up at the next character.
collect_words(
RestOfText,
Words,
Word
);
%-- We have an intermediary word of non-zero. length
true ->
%-- Append the current word and move on to the next
%-- character to conitnue iteration.
collect_words(
RestOfText,
lists:append( Words, [Word] ),
[]
)
end;
%-- The first character is not a space - keep building
%-- up the current word. --- ELSE
true ->
collect_words(
RestOfText,
Words,
lists:append( Word, [FirstCharacter] )
)
end
end
.
The first part of this code defines the module. Most of the code that you write in Erlang centers around modules of code that contain name-spaced functions. In my solution, the module is called "hw1" and it exports two functions - get_word_count/1 and collect_words/3. get_word_count() is the function that determines the number of words in the given text; but, it doesn't really do any of the heavy lifting. Rather, it hands the processing off to collect_words(), which gathers all the space-delimited words into a list and returns it. The get_word_count() function then returns the length of that resultant array.
As you can see, collect_words() takes a list as its 2nd and 3rd parameters. The first list is the current collection of known words; the second list is the current word being parsed. The function iterates over each character in the text recursively, building up a word. When it hits a space character, it adds the current word to the list of known words and then resets the intermediary word character list (that 3rd parameter).
When we compile the above code and then query it, we get the following output:
1> c( hw1 ).
{ok,hw1}
2> get_word_count( "Hey Katie, you look really beautiful today." ).
** exception error: undefined shell command get_word_count/1
3> hw1:get_word_count( "Hey Katie, you look really beautiful today." ).
7
4>
The c(FILE) command compiled my Erlang code and loaded the module. I believe the name of the module needs to match the actual file name("hw1" :: "hw1.erl"), but I'm not sure on that. After I loaded the module, you can see that I tried to invoke the exported method, get_word_count(). My first attempt errored-out because I forgot to name-space the invocation. My second attempt, the one that worked, started with my module name-space, "hw1:". And, as you can see, it reported back 7 words.
HW2: Write a function that uses recursion to count to ten.
%-- Write a function that uses recursion to cound to ten.
-module( hw2 ).
-export( [count_to_ten/0] ).
-export( [count_to_ten/1] ).
%-- -------------------------------------------------------- --%
%-- -------------------------------------------------------- --%
%-- Our base version of the function allows for invocation without
%-- any arguments.
count_to_ten() -> count_to_ten( 1 ).
%-- -------------------------------------------------------- --%
%-- -------------------------------------------------------- --%
%-- Our recursive version of the function allows for invocation with
%-- a given starting number.
count_to_ten( N ) ->
%-- Check to see if we have reached our target value.
if (N == 10) ->
%-- Output the final number.
io:format( "10~n" ),
%-- Return true for successful completion.
true;
%-- Check to see if we are less than 10 - if so, we'll stil
%-- have some recursion to do.
(N < 10) ->
%-- Output the current number.
io:format( integer_to_list( N ) ++ "~n" ),
%-- Count the next value and return the result.
count_to_ten( N + 1 );
%-- If neither of the above two guard statements executed, then
%-- the user is trying to count above 10. Return false to
%-- indicate that no successful action has taken place.
true ->
false
end
.
This solution was cool because it allowed a multi-signature rule-base like we could do in Prolog. Here, I am defining one version of count_to_ten() that doesn't require any input; this will assume that you are starting from 1. I then define another version of count_to_ten(N) that takes a starting value (and counts up to 10).
This problem threw me through a loop a bit because I kept getting errors like this:
9> hw2:count_to_ten( 15 ).
** exception error: no true branch found when evaluating an if expression
in function hw2:count_to_ten/1
As you can see in my code, I am using an IF/ELSE statement to determine the recursive nature of the counting. The problem is, if you start with a value greater than 10, there is no counting left to do. Unfortunately, it appears that an IF statement must have a valid branch at all times - you can't simply skip over it when it doesn't apply. As such, I needed to use a "true" statement at the end to make sure that all routes through the IF statement would execute successfully. This "default" branch, however, simply returns "false" to indicate that no counting took place.
NOTE: Notice how all but the last IF/ELSE bodies need to end in a semi-colon. Try to put one after that final "false" and you'll get a syntax error.
Anyway, when we compile the above code and query it, we get the following console output:
12> c( hw2 ).
{ok,hw2}
13> hw2:count_to_ten().
1
2
3
4
5
6
7
8
9
10
true
14> hw2:count_to_ten( 8 ).
8
9
10
true
15> hw2:count_to_ten( 15 ).
false
As you can see, both rules (with and without a parameter) were able to recursively count to 10.
HW3: Write a function that uses matching to selectively print, "success" or "error: message," given input of the form {error, Message} or success.
If you're like me, whenever you see the term "matching," you think "Regular Expression." And, while regular expressions are beyond sexy, that's not what matching means in the context of languages like Prolog, Scala, and Erlang. In this context, matching deals with data types. As you'll see in the following code, the incoming parameter is matched against two patterns: an atom and a tuple. The match then determines which branch the engine executes:
%-- Write a function that uses matching to selectively print
%-- "success" or "error: message" given input of the form
%-- {error, Message} or success.
-module( hw3 ).
-export( [print_message/1] ).
%-- -------------------------------------------------------- --%
%-- -------------------------------------------------------- --%
%-- I print a success or error message.
print_message( Value ) ->
%-- Check to see see what kind ova value we are working with.
%-- We can use the CASE statement to do pattern matching on
%-- the variable.
case Value of
%-- The value is the atom - success.
success ->
io:format( "success~n" );
%-- The value is an error tuple. When this branch is
%-- executed, the Message variable is unified with the
%-- second element within the incoming tuple.
{error, Message} ->
io:format( "error: " ++ Message ++ "~n" )
end
.
As you can see, the incoming parameter, Value, is being matched against two case statements: success and {error, Message}. The first case is the atom, success. The second case statement is a tuple composed of the atom, error, and the variable, Message. Just as with Prolog and Scala, when the incoming parameter is found to be a tuple, it is then "unified" with the case statement. During that unification, the error atoms match up and the second value of the incoming tuple gets copied into the local Message variable.
NOTE: An atom is a literal - a constant with name.
When we compile the above code and query it, we get the following console output:
18> c( hw3 ).
{ok,hw3}
19> hw3:print_message( success ).
success
ok
20> hw3:print_message( {error, "Uh-oh, now you've done it!"} ).
error: Uh-oh, now you've done it!
ok
21> hw3:print_message( noMatchForMe ).
** exception error: no case clause matching noMatchForMe
in function hw3:print_message/1
As you can see, the code works great for our success atom and our error tuple; however, it raises an exception when the internal switch statement cannot match the incoming value (in this case, the noMatchForMe atom). If we wanted to make sure that the method never raised an exception, we could add a wild-card case statement to match any incoming value:
%-- Write a function that uses matching to selectively print
%-- "success" or "error: message" given input of the form
%-- {error, Message} or success.
-module( hw3 ).
-export( [print_message/1] ).
%-- -------------------------------------------------------- --%
%-- -------------------------------------------------------- --%
%-- I print a success or error message.
print_message( Value ) ->
%-- Check to see see what kind ova value we are working with.
%-- We can use the CASE statement to do pattern matching on
%-- the variable.
case Value of
%-- The value is the atom - success.
success ->
io:format( "success~n" );
%-- The value is an error tuple. When this branch is
%-- executed, the Message variable is unified with the
%-- second element within the incoming tuple.
{error, Message} ->
io:format( "error: " ++ Message ++ "~n" );
%-- This is our wild-card match.
_ ->
%-- Since our method wasn't really meant to handle this
%-- form of input, just return false.
false
end
.
In Erlang, the single underscore, "_", is a wild-card. It is used when you don't necessarily care what the value is (ie. that you don't need it to be unified with any statement). I'm not suggesting that this approach makes any sense in this particular situation - I just wanted to demonstrate it.
I really didn't like Prolog at all; Erlang, on the other hand, seems to have some of the powerful features of Prolog without dogmatic use of unification. That is, I can still leverage all the goodness of pattern matching; but, I can execute more traditional IF/ELSE and CASE statements to make the algorithms much more readable. I'm also starting to like the concept of immutable state. It feels weird to constantly make copies of data values; but, at the same time, there is something comforting about the required thought-process.
Want to use code from this post? Check out the license.
Reader Comments
Sounds like a major memory leak to me. Did you run across any text that addressed that concern?
Well written article, if only all dudes offered the same shtuff as you, the series of tubes would be a much better place. Please keep it up! 80's Sitcom.
@Randall,
It's definitely not a memory leak - it's a "feature" of the language. I believe (but don't quote me) that it's done to make sure that you can't have two parallel threads working on the same piece of data.
Think about ColdFusion where you have a struct that is passed by reference. You could theoretically have two threads mutating the struct at the same time. As such, you could have a race condition where something like a structCount() provides an inaccurate number in one of the threads.
If everything is passed by value, then you don't have to worry about any processes having "side effects" on your data.
But, like I said, concurrent processing is still something I've just barely started to scratch the surface on.
I gathered that; maybe you misunderstood my concern.
If you've got a "Google-sized" application and you've got millions/billions/googolplex of users all making copies of data structures, seems to me you'd eat up your RAM.
I see how it's good in certain situations. I'm just concerned how it would play out in a large-scope arena, and wondered if you (or anybody reading this) would also be concerned. At what point does the language call a time out and collect garbage versus having 5,000 copies of a variable? Or am I being paranoid for no good reason?
@Randall,
I definitely can't speak to that effect. What I can say, though, is that Erlang does power CouchDB (whatever that means exactly). And, I believe that CouchDB is being used in very large, many-gig apps; so, I think it can work, as long as the situation is appropriate. But, really, I know very little about this stuff.
@Randall, the simple answer is: you are "being paranoid for no good reason". That Erlang creates new copies of transformed data (along with Scala on immutable data and Clojure) is just a feature of functional languages and it's handled very efficiently under the hood, both in terms of performance and in terms of memory usage - and garbage collection. It's no different from most Java programs which create lots of small interim objects and immediately throw them away to get their job done.
You'll only get a memory leak if you hold onto the old version of the data as well as the version - in other words if you specifically create a data structure that grows over time. That's no different to any other language where creating such a data structure eventually causes you to run out of memory.
Erlang, in particular, was created for very high performance telephony systems, often running on fairly limited hardware I believe.
Hope that helps?
FWIW, I think functional programming is finally going to be the next big thing since it lends itself so well to concurrency and parallelism - and with immutable data, code is easier to test (and develop correctness proofs for). I started doing functional programming in the early 80's and I've watched it with interest, sitting quietly in the background, waiting for the shift to multi-core, multi-processor, multi-server, message-passing architectures we are starting to see in the mainstream :)
I've bought the book and I am just taking the challenge. Ben has helped me to solve a very difficult problem I had with the cfhttp.
Jon, From Colombia. Many thanks
@Sean,
Thanks for the insight. Functional programming does seem to be cool; but, I think it requires a good amount of syntactic sugar in order to be convenient (which these languages seem to have).
@Jon,
Outstanding :) I hope you enjoy the book. I have been loving it.
About the syntax: the easiest way, and also the correct way, is not to view ',' and ';' as terminators (as in C/Java) but as separators. The ';' can only occur between clauses (function, case, if, and receive) and means that either the clause before the ';' is chosen or a clause occurring after it. The ',' means first evaluate the expression before the ', and then the expressions following it.
With these most blocks are implicit and there is very rarely need for writing explicit blocks (begin ... end). It helps keep the code shorter. http://ferd.ca/on-erlang-s-syntax.html is a good expo on the syntax written by the author of http://learnyousomeerlang.com/, which is well worth reading.
Thanks, your solutions helped me - as always - a lot to do my homework.
FYI: I think I've found a much easier (IMHO) solution of the count to ten:
A more quick solution to "count to ten" problem.
The function requires, as input, a number from which to start counting.
I hope this is useful to other readers.
A much simpler approach to the word count problem:
It doesn't handle corner-case strings like "", but at least it's consistent with the material covered in the first Erlang day -- it doesn't rely on conditionals and what-not.
@Ben
Thanks for writing such great articles. They helped me a lot with the homework.
@Brent
Your solution is consistent with the words function of the string module in the standard library:
A solution for the count to ten task which uses a guard:
See also http://www.erlang.org/course/sequential_programming.html#guardedfuncs
Selective matching without using case can be achieved in the following way:
I hope this approaches are helpful for some readers.
@Tom,
Some comments on your comment:
- io:format takes a format string that works in a similar fashion to C's printf so you could do it as
io:format("~w~n", [N])
- You should really only return booleans, true/false, when you actually mean a boolean value. A common value to return to show that the function has succeeded is 'ok'. Especially when the function is called for its side-effects.
- We generally avoid adding catch-all clauses as you do in print_msg/1 but prefer to let the function generate an error when it gets an unknown/illegal argument.
Hi,
I've been an erlang lover for a while, and since a year i work on an application written in CFML. It's cool to see this article because it's kind of the opposite.
But I don't agree with "the need for syntaxic sugar" and that constructs like if/else make the algorithms more readable.
(Here there's no need to handle everything else that could come as a parameter, in Erlang as in other FP languages like Haskell)
Erlang has very little syntactic sugar which I think is a Good Thing. The syntax is very consistent, especially in its use of ',' and ';' which seem to cause a lot of problems. 'if' does have its uses and could not comfortably be replaced with case or functions, although because of its definition it is not used that often.
And as in @niahoo example we tend not to do defensive programming and let things crash when there is an error. It is the erlang way!