Data Structures in Elm

When I first started learning Elm, I had very little sense of what data structures were available to me, and which of these data structures I should actually use. Do I use an Array or a List? Should I use a Set if I’m also going to be mapping over it? What’s the difference between an Object in JavaScript and a Record in Elm? Why does the compiler tell me there’s a problem with Dict.Dict, when I’m definitely using a Set? The distinctions and best use cases of each structure become more clear with practice, but hopefully also through reading this blog post.

Records

What are Records and How Do We Make One

Records (semantically named) are frequently employed to record state or to model information. They employ key-value pairs, with (likely-familar) dot notation for accessing values.

To create a record, you can write out the desired key-value pairs: { name = "Wuthering Heights", author = "Bronte", readByMe = True }. (Note that notation here may be backwards from what you expect. Type signatures, in Elm, use a colon. Assignment uses an equals sign)

If we are planning to create more than one record with the same fields on it, we may want to create a type alias that will describe the shape of these records.


type alias LibraryEntry =
    { name : String
    , author : String
    , readByMe : Bool
    }

Creating a record with these fields looks like this:


wutheringHeights =
    { name = "Wuthering Heights"
    , author = "Bronte"
    , readByMe = True
    }

Now we can use LibraryEntry in type signatures, rather than writing out all three requisite fields. We can also use the LibraryEntry as a constructor. LibraryEntry takes three arguments, in order: name, author, and readByMe. It will hand us back a record of the form we desire.


wutheringHeights =
    LibraryEntry "Wuthering Heights" "Bronte" True


diaspora =
    LibraryEntry "Diaspora" "Egan" False

As you can imagine, this pattern comes in very handy when there is a great deal of data to model, or when there are many records that you’d like to create that should all share the same type.

Getting and Setting on Records

Now we have wutheringHeights and diaspora, representing books in our library. But wait! I actually read Diaspora by Egan two weeks ago. We need to update the readByMe field.


{-

Updating records follows this pattern:

newRecordName =
    { oldRecordName | fieldName = newFieldValue }

-}

newDiaspora =
    { diaspora | readByMe = True }

Note that diaspora is unchanged by this operation: diaspora.readByMe == False and newDiaspora.readByMe == True.

Dot notation is not the only way we can access a value on a record. For every field name on a record, we are given a corresponding accessor function named after the field. For example: .readByMe diaspora == False and .readByMe newDiaspora == True.

You will never have to write (\book -> book.name); you can use .name newDiaspora without defining anything additional.

Extensability and Polymorphism

The first iteration of the Elm record was based heavily on (‘Extensible records with scoped labels’ by Daan Leijen), and many features of the Elm record reflect Leijen’s work. The paper is well worth a read for those interested in modeling data. However, the paper’s titular descriptors–“extensible”, “with scoped labels”–no longer apply to Elm records.

Extensibility (which was paired with restrictability) of a record value meant that a record could have fields added to it and removed from it: e.g., { wutheringHeights - name } == { author = Bronte, readByMe = True }. In practice, developers did not find this feature useful, and extensibility of record values was removed.

The removal of the extensibility of records also meant the end to scoped labels (labels of the same name pointing to different values on the same record, e.g. `{ diaspora + name | name {- This type alias uses "a" to say that a LibraryEntry is any record with name, author, and readByMe fields. -} type alias LibraryEntry a = { a | name : String , author : String , readByMe : Bool }

{- This type alias says that it is a LibraryEntry with exactly one more property–isbn. -} type alias LibraryEntryWithISBN = LibraryEntry { isbn : String }

wutheringHeightsWithISBN : LibraryEntryWithISBN wutheringHeightsWithISBN = { name = “Wuthering Heights” , author = “Bronte” , readByMe = True , isbn = “1853260010” }

Records in Elm remain polymorphic, meaning they can be comprised of fields with all sorts of different types. While I’ve only used Strings and Bools in the library example above, records can hold any values. The following is reasonable, for instance:


type alias Widget entry =
    { text : entry -> Html
    , attributes : entry -> Html
    , getState : entry -> Bool
    }

Records are also transitive with respect to their fields. That is, defining:


oneBeforeTwo =
    { one = 1
    , two = 2
    }


twoBeforeOne =
    { two = 2
    , one = 1
    }

We will find that oneBeforeTwo and twoBeforeOne are equal, even though the fields are declared in different orders. If, though, in twoBeforeOne our field values were strings instead of numbers, we’d find that the compiler would not let us make the comparison at all: we’d be causing a type mismatch.

This is because Records in Elm are structurally typed: the use of records is dependent on the type of their contents. We can’t just compare Records with each other because they’re both Records–we can compare Records if all their fields have the same types.

When we create functions consuming Records, they can be as general or as specific as we please. I can write a getName function that takes any Record with a name field, or I can write a displayHeader function that requires name, author, publishDate, editDate, yourFirstBorn, and maybe a list of related titles too, just because.


getSentenceForLibraryEntry : LibraryEntry -> String
getSentenceForLibraryEntry {name, author} = -- note that Elm gives us clean desctructuring
    author ++ " wrote " ++ name ++ "."


getSentenceForLibraryEntry : LibraryEntry -> String
getSentenceForLibraryEntry libraryEntry =
    libraryEntry.author ++ " wrote " ++ libraryEntry.name ++ "."

Tuples

The Basics of Tuples

Tuples are similar to records, except that rather than having named fields, the meaning of the fields comes from their order. Plus, Tuples are comparable (which, for our purposes, is most relevant when talking about Sets and Dicts, so we’re going to leave it for now).

Let’s talk middle school math for a hot second. Suppose I want to graph some points on to the XY plane. I could create a point without Tuples:


type alias Point =
    { x : Float
    , y : Float
    }

aPoint =
    Point 1 0

So that’s fine. But we have a well-known pattern for expressing XY coordinates. Do we really want to have to say Point or type out { x = 1, y = 0 } every time we want to make a point? With tuples, we don’t need to.

Tuples are denoted with () and contain comma-separated values. So writing the coordinate (1,0) in an Elm Tuple just looks like (1,0).

To access the values of a Tuple, we are given a couple of helpers. fst (1,0) gives us the first value, 1, and snd (1,0) gives us the second value, 0. So, using aPoint from above, I might have x = fst aPoint and y = snd aPoint.

There’s more than one way to create a tuple.


{- We can write it out explicitly. -}
yeOldeBasicCoord =
    (1, 0, 2)


{- We can also use a prefix operator, `,`, to construct the tuple. -}
yeFahncyCoord =
    (,,) 1 0 2

Destructuring Tuples

Wait–but now there are three values in the tuple… So far we’ve only seen fst and snd for accessing the values in a tuple. Elm’s destructuring is quite powerful, and we can use it to access any value in a tuple. We can write our own function for accessing the third value in a tuple, if we want:


{- "_" here is used to indicate that we don't use the first two values from the tuple. -}
z : (Float, Float, Float) -> Float
z (_, _, thirdVal) =
    thirdVal

We can write out a function that uses more than one value from a tuple:


{- Note the type signature here! Tuples' values don't all need to be of the same type. -}
formatNameAndAge : (String, String, Int) -> String
formatNameAndAge (firstName, lastName, age) =
    firstName ++ " " ++ lastName ++ " (" ++ (toString age) ++ ")"

So suppose we wanted to refer to author Neil Gaiman in this text: we might say formatNameAndAge ("Neil", "Gaiman", 55), or “Neil Gaiman (55)”.

Lists

Square Brackets and What’s Inside Them

In Elm, [] signifies an empty list. Lists are just linked lists (ordered collections where each node of the collection points to the subsequent node, if there is one), where all the values must be same type. You get head and tail, just like you’d expect.

head gives you a Maybe with the value of the first element. tail gives you the rest of the List. Want to add an element? Cons :: is an infix operator that adds to the front of the List. You can concat, append, or intersperse. Or partition or unzip or pretty much whatever else you can think of to do. My point: yay lists!

List Operations in Practice with Elm Html

Lists are vital for something I do a lot of: writing HTML in Elm through the elm-html package. The elm-html DSL is very clean: the functions generating HTML nodes are named familiarly (e.g., div, span) and take two lists as arguments, which leads to very readable code.

When using the node-generating Html functions, the first argument is a list of attributes on that node (we also have functions like id and class for producing these attributes). The second argument is a list of Html–that is, a list of children.

So, something like this is reasonable:


renderHello : Html
renderHello =
    div
        [ class "hello" ]    -- We could also throw in a click listener here, or add our own custom attribute or property
        [ text "Hello world!"
        , span [] [ text "oh look I'm a child of the div" ]
        ]

Reasonable, but boring. Let’s go back to our library.


type alias LibraryEntry =
    { name : String
    , author : String
    , readByMe : Bool
    }


wutheringHeights =
    LibraryEntry "Wuthering Heights" "Bronte" True


diaspora =
    LibraryEntry "Diaspora" "Egan" False

We need more books. And do we really need to have wutheringHeights and diaspora floating around cluttering the namespace? Let’s make a list of our books instead of naming each individually.


myListOfBooks : List LibraryEntry
myListOfBooks =
    [ LibraryEntry "Wuthering Heights" "Bronte" True
    , LibraryEntry "Diaspora" "Egan" True
    , LibraryEntry "A Handmaid's Tale" "Atwood" True
    , LibraryEntry "Doors of Stone" "Rothfuss" False
    , LibraryEntry "The Taming of the Shrew" "Shakespeare" False
    , LibraryEntry "The Outsiders" "Hinton" True
    , LibraryEntry "A Scanner Darkly" "Dick" True
    ]

Let’s create a function for displaying a single book:


viewBook : LibraryEntry -> Html
viewBook {name, author, readByMe} =
    div
        [ classList [ ("is-unread", readByMe) ] ] {- classList takes List (String, Bool),
                                                     where each tuple represents a className and whether to apply it -}
        [ div [ class "book-name" ] [ text name ]
        , div [ class "book-author" ] [ text author ]
        ]

Great! Now how should we display our bookList?

Our bookList is just a list of LibraryEntry’s, and the second argument to the containing element needs to be a List of Html… We just need to map from data to Html directly.

My favorite style is to use the |> operator. This style makes it very easy to follow what data is important and most complex. This version of viewBookList emphasizes that the contents of the “book-collection-container” div is just a transformation of bookList.


viewBookList : List (LibraryEntry) -> Html
viewBookList bookList =
    bookList
        |> List.map viewBook
        |> div [ class "book-collection-container" ]

Here’s another version of the same function. This version mirrors the familiar shape of HTML more closely. Some disadvantages: if we decide later to add another mapping or operation on the bookList, it will be harder to add if using the style below (above, it’s just adding a line: List.filter .readByMe, for instance).


viewBookList : List (LibraryEntry) -> Html
viewBookList bookList =
    div
        [ class "book-collection-container" ]
        (List.map viewBook bookList)

Having our data stored in a List means that it is very, very easy to map from data -> view. And immutability means that we can do whatever we want to our Lists without worrying about causing unintended changes somewhere.

The ease with which Lists can be manipulated can be helpful with odd-shaped data. For instance, suppose I want to combine my bookList with someone else’s bookList.


viewTwoBookList : List (LibraryEntry) -> List (LibraryEntry) -> Html
viewTwoBookList bookList willsBookList =
    bookList
        |> List.append willsBookList
        |> List.map viewBook
        |> div [ class "book-collection-container" ]

I could also create my Lists of Html with viewBook, and then append the Lists.

Suppose I stole one of Will’s books. I don’t want to store it in myListOfBooks, because it’s not mine, but I do want this one book to show up in my Html. I can add this book I’ve stolen to the front of my bookList, wherever I’m calling viewBookList:


viewAllBooks =
    let
        theBookIStole =
            LibraryEntry "Axiomatic" "Egan" True

        completeBookList =
            theBookIStole :: myListOfBooks
            {- This infix operator (::) is pronounced "cons".
               That is what I did to steal his book. ;)
            -}
    in
        viewBookList completeBookList

One core function worth mentioning is indexedMap. Coming from JavaScript, this may feel like it’s going to be extremely important. In JS, I found that I often used indices as a key way to access my data (this is a joke. Key? Get it? Because Arrays are just fancy Objects?). Depending on what you’re doing, this function might be helpful to you too.

Maybe you want to zebra-stripe your book entries in a silly way (no nth-child(2n+1) selectors for you).


viewBook : Int -> LibraryEntry -> Html
viewBook index {name, author, readByMe} =
    div
        [ classList
            [ ("is-unread", readByMe)
            , ("is-even", ((index + 1) % 2 == 0))
            ]
        ]
        [ div [ class "book-name" ] [ text name ]
        , div [ class "book-author" ] [ text author ]
        ]

As you may notice from the modular arithmetic going on in this example, Lists in Elm are 0-indexed.

Other functions to be aware of are concatMap and filterMap. Both are semantically named. concatMap applies a function to each element in a List and concatenates the resultant Lists. filterMap applies a function to each element in a List, and keeps the successes.

As you start doing all kinds of fancy nonsense, take a look at elm-list-extra.

Arrays

Arrays are ordered collections, where each item in the collection must be of the same type. In Elm, [] doesn’t correspond to arrays ([] is an empty list, discussed above). To create an empty array, use Array.empty.

To create an Array of a given length with a specific default element, do: Array.repeat 9 "Tessa". This will give you an Array filled with 9 “Tessa”’s. Some might say this is too many. A third option is to use Array.initialize, whose first argument is the desired length of the array. Array.initialize’s second argument is a function expression operating on the index; the result of this operation is the value of the array at that index.


arrayOfTessas =
    Array.initialize 9 (\index -> "Tessa" ++ (toString index))

Then arrayOfTessas will be an Array that in JavaScript-land looks like: ["Tessa0","Tessa1","Tessa2","Tessa3","Tessa4","Tessa5","Tessa6","Tessa7","Tessa8"].

Okay, sweet. But obviously “Tessa3” is my favorite, and the one that I want. How do I get it?

In JavaScript, you get a value back for any index or key you ask for in an array, since an array is just an object. (In JavaScript, arr["Something I want"] tends to be undefined, but what can you do.)

In Elm, accessing a value at an index that doesn’t exist will fail explicitly–there’s no question of whether you got undefined because you were doing something marvelous with your array like [0,1,undefined,"Bwahaha that's going to confuse me later"]. Instead, in Elm, you will get back Nothing if there was nothing at that index (this is a Maybe).

Suppose I have an array comprised of my LibraryEntrys from before.


myBooks : Array LibraryEntry
myBooks =
    Array.fromList [ wutheringHeights, diaspora ]
    {- For convenience, we're using Array.fromList to convert this list of library entries into an array. -}

Suppose I try to get the value at the second index. This is going to fail, since arrays’ indexing starts at 0, and the length of the array is 2. But I try it anyway: myBooksSecondIndex = Array.get 2 myBooks. So what’s myBooksSecondIndex? It’s Nothing.

So maybe you’re thinking that there are situations where you might have Nothing in an Array. That’s fine too.


maybeMyBooks : Array (Maybe LibraryEntry) -- Note that the type signature indicates that the elements are Maybes
maybeMyBooks =
    Array.fromList [ Just wutheringHeights, Just diaspora, Nothing ]

Now, if we try to access the second index: myBooksSecondIndex = Array.get 2 maybeMyBooks we have myBooksSecondIndex == Just Nothing. This is unlikely to be a useful way of structuring things, and I’d encourage you to stahp, but it does illustrate the clarity that Elm brings to a familiar JavaScript problem: is that undefined because it’s supposed to be, or because I failed to set a value?

Speaking of setting values: we can push a value to the end of the array, append an Array to another to create a new array, or we can set a value at an index. Setting values is a bit different than in JS.

In JavaScript, I can do something like this:


var aRiddle = "What's the longest word in the English language?"

var myAnswerArray = []

// Hmmmmmmmmmmmm

myAnswerArray.push("S")
myAnswerArray[5279] = "S"

console.log(myAnswerArray.length + " feet") // => "5280 feet"

var myAnswer = "Smiles! There's a mile in the middle!"

In Elm, not so much.


myAnswerArray =
    Array.fromList ["S"]
        |> Array.set 5279 "S"

Array.set will not update the Array it’s passed if the given index is out of bounds. We tried to set “S” as a value at index 5279, but that’s way out of bounds, so this set will just not occur. Array.toList myAnswerArray is just ["S"]. No mile in the middle here!

Array Implementation

Arrays are implemented as relaxed-radix balanced trees. They’re an extension to vectors (as used in Clojure and Scala), but with optimized concatenation. Effectively, even though they’re immutable, they have constant time getting and setting. Obviously, this is magic (read: math). To read the math and learn the magic: Bagwell and Rompf’s “RRB-Trees: Efficient Immutable Vectors”

To read more about Elm’s implementation, read this blog announcement.

As cool as the math behind arrays is, and as familiar as they might seem, we tend not to use them. Currently, there are reliability issues with arrays and many Elm developers tend toward using lists instead. But fear not! Array is being rewritten, in pure Elm instead of Native. Be excited.

Sets

What are sets, and when should you use them?

Elm’s sets are a very valuable data structure, and one that’s worth knowing well. All the basic functionality of sets will be familiar from middle and high school math (also, college level, if you took Analysis or Real Analysis or Abstract Algebra or Logic or Philosophy). A set is a collection of unique items; in Elm, these items must be comparables (ints, floats, chars, strings, lists, or tuples) and of the same type.

So when might it make sense to use a set? Whenever you find yourself doing a lot of filtering to check for List entries’ inclusion in other Lists, you should think about using Sets instead. Is your collection of elements closed? Is your collection of elements a subset of another collection? Should the elements of your collection be unique? Here too you may want to use a Set instead of a List.

In deciding to use a set rather than another data structure, there’s one other important consideration: are the elements of your collection comparable? Sets can only contain comparable values. So our collection of sweet library books? Not going to work as a set.

Using a Set

Suppose that our LibraryEntrys had unique ids. We could store a string id in a set.


type alias LibraryEntry =
    { id : LibraryEntryId
    , name : String
    , author : String
    , readByMe : Bool
    }

type alias LibraryEntryId =
    String

With these ids, we can build a library basket (because library baskets are definitely a thing don’t worry about it there are totally baskets) and a collection of checked out books.


model =
    { yourLibraryBasket = Set.empty -- Set.empty constructs an empty set
    , checkedOutBooks = Set.empty
    }

Now, checking to see if a book (a LibraryEntry with, say, id "10") is checked out is simple: Set.member "10" model.checkedOutBooks returns a boolean describing whether or not “10” is an element in model.checkedOutBooks. Is the book already in our basket? Set.member "10" model.yourLibraryBasket will tell us.

Implementation

Sets are actually just a wrapper around Elm’s Dicts, which are implemented with a red black tree. Be aware that as a result of this implementation, == is not reliable (with Sets or Dicts). The order in which elements are added to a RB-tree impacts the form the tree takes, which makes equality a sticky concept.

Let’s walk through an example.


set1 = Set.fromList ([1,2,3,4])

set2 = Set.fromList ([1,2,3,4])

set3 = Set.fromList ([4,3,2,1])

We have three sets, each with the same values: 1, 2, 3, and 4. The only difference between the Sets is in the order in which these values are added. For these sets, does the root node have value 2 or 3? Who knows?


listVersionsAreOrdered =
    Set.toList set1 == Set.toList set2 && Set.toList set2 == Set.toList set3 -- True

setsCreatedInSameOrderAreEqual =
    set1 == set2 -- True

uhOhNotEqual =
    set2 == set3 -- False

Moral: don’t ask whether sets are equal, because what you’re asking is “are these two trees made out of the same values && were these trees balanced in the same way?” It’s worth noting that this warning about equality might not always apply. Current plans for Elm include changing the equality behavior for sets and dicts to not be dependent on the balancing of an RB-tree. For now, the documentation provides a warning.

Dicts

What is a Dictionary?

A dict maps unique keys to values. There’s a fair amount of overlap in the use cases for dictionaries and sets, but dictionaries allow for storing a bit more information. Since values are indexed by unique keys, we can try to Dict.get 'some comparable' Dict.empty, which may give us Nothing.

Our Library Has a Dictionary

What’s one book libraries should definitely have? That’s right, dictionaries.

Our library book loaning system from the set example might work better as a dict. With a dict, we can tie checked out ids to library entries directly.


type alias LibraryEntry =
    { id : LibraryEntryId
    , name : String
    , author : String
    , readByMe : Bool
    }


type alias LibraryEntryId =
    String


model =
    { yourLibraryBasket = Dict.empty -- Dict.empty constructs an empty dict
    , checkedOutBooks = Dict.empty
    }


henryV = LibraryEntry "0" "Henry V" "Shakespeare" True

Now, suppose we want to add Henry V to our library basket: Dict.insert henryV.id henryV model.yourLibraryBasket will give us back a dictionary with a key “0” pointing to a LibraryEntry describing the play Henry V. This can make it very easy to work with henryV.

Suppose we just want to display books that are in our basket. When we were using a set to hold the ids of our books, this would have required some filtering/checking every book id to see if it was in the set of basket ids or not. With a dict, we can do something similar, (Dict.member "0" model.yourLibraryBasket == True), but we can also just grab all of the values in our Dict:


type alias Model =
    { yourLibraryBasket : Dict LibraryEntryId LibraryEntry -- Note the type signature here
    , checkedOutBooks : Dict LibraryEntryId LibraryEntry
    }


model =
    -- Dict.singleton creates a dictionary with 1 key-value pair. henryV is this value.
    Model (Dict.singleton henryV.id henryV) Dict.empty


viewBook : LibraryEntry -> Html
viewBook {name, author, readByMe} =
    div
        [ classList [ ("is-unread", readByMe) ] ] {- classList takes List (String, Bool),
                                                     where each tuple represents a className and whether to apply it -}
        [ div [ class "book-name" ] [ text name ]
        , div [ class "book-author" ] [ text author ]
        ]


viewBookList : List (LibraryEntry) -> Html
viewBookList bookList =
    bookList
        |> List.map viewBook
        |> div [ class "book-collection-container" ]



viewLibraryBasket : Model -> Html
viewLibraryBasket model =
    let
       yourBasket =
            model.yourLibraryBasket
                |> Dict.values -- This gets us just the values in a list
                |> viewBookList
    in
        div
            [ class "basket" ]
            [ h3 [] [ text "Your Basket" ]
            , yourBasket
            ]

Now, how do we remove books from our basket? The semantic Dict.remove "0" model.yourLibraryBasket. Let’s write out a function that will either remove books from our basket or add them to our basket, on a case-by-case basis.


type Action
    = RemoveFromBasket LibraryEntryId
    | AddToBasket LibraryEntry


update : Action -> Model -> Model
update action model =
    case action of
        RemoveFromBasket libraryEntryId ->
            { model | yourLibraryBasket = Dict.remove libraryEntryId model.yourLibraryBasket }

        AddToBasket libraryEntry ->
            { model | yourLibraryBasket = Dict.insert libraryEntry.id libraryEntry model.yourLibraryBasket }

At this point, you might be seeing the Elm Architecture beginning to emerge. We’re not going to wire everything together here, since our focus is on the data structures rather than the architecture. As an exercise, try building an mini-application out of what we’ve begun here.


Tessa Kelly
@t_kelly9
Engineer at NoRedInk

10 notes

  1. noredinktech posted this