Wednesday, May 15, 2013

Groovy Tutorial Pt. 2

The most underrated question ever: Why?

I hope you've asked this question. "Why learn Groovy or any other programming language?" There are many reasons. Take it from Will I Am. In short, as cars became ubiquitous throughout the world, learning to drive became a necessary life skill. As computers, and the availability and use of data, keep growing, programming is, and will be, the new driver's license. At least, Wolfram thinks so.

Identify The Problem

All around us we repeat work by hand when we could be automating it. For instance, many times we have two sets of data that need to be linked. Consider a file with the following contents:
item name,relationship,other item name
Tree,Parent Of,Branch
Branch,,
Leaf,Child Of,Branch
You might recognize this as a CSV (comma-separated values) file. These files are often the text export of Excel spreadsheets or the output of some other program. With this file we want to match up the relationships of each item. Let's start with the conceptual organization of this data. Obviously we want to treat each line as its own set of information with three distinct parts. In Java we would use classes to represent this data. In Groovy we can write a class like so:
// title, relationship, other
public class Thing {
    public String title
    public String rel_type
    public String o_title
}
This gives us a way to represent many Things. The public keyword is saying "let anyone use this data I'm about to define". Then come the data type and the data name. When you are at the mall, you might be a public Human jake. We first define a class (the formal container to hold the data) and then three Strings. Each String can hold a sequence of characters (letters, digits, and so on). We then give each String a different name, since we have three distinct pieces of textual data from our CSV file. Later I'll show you how to read the data from an actual text file, but for now let's just represent the data in a way that lets us manipulate it easily. The most basic way to create this data for use is as follows:
// Create a new instance of the Thing
Thing a_thing = new Thing()
// Assign values to the class members.
a_thing.title = 'Tree'
a_thing.rel_type = 'Parent Of'
a_thing.o_title = 'Branch'
In line two we are creating a new container, a new well-defined backpack. In lines 4-6 we use the dot operator "." to peek inside the Thing called a_thing and give each attribute its own value. To test that everything is working as expected, print the object.
println a_thing
Uh oh... what happened? What is that thing (pun intended)? That is the default text form of our created Thing: its class name, an @ sign, and a hash code. You really don't need to know anything more about it, except that it may look different from one run of the program to the next. 'Nough said. So how do we easily view the data we just created? Groovy actually provides a method called dump() that will look inside your object and try to give you a textual representation of its contents.
println a_thing.dump()
// <Thing@... title=Tree rel_type=Parent Of o_title=Branch>
Now that looks pretty sweet. You should get an output like what's in the comment in line 2 above. But wait, where does dump() come from and what else is out there? Well, to understand where dump() comes from we have to give a short intro to Object-Oriented Programming (OOP). Remember in grammar class, when you learned that a noun was a person, place, or thing? This means that everything tangible has a sort of root attribute: it is a noun. All nouns have some kind of name, some kind of weight, and other things that describe them all. In contrast, not all nouns have eyeballs. Humans, which are nouns, have eyeballs. So this way of qualifying everything under a single root and then recursively grouping, like the Latin names of living creatures, is core to OOP. In Groovy, that root "noun" is called an Object. All objects have a set of core functions like dump(), each(), and toString(). So that's where babies, I mean dump()s, come from: logical groupings of data.
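To make the "everything is an Object" idea concrete, here is a small sketch. It assumes the same Thing class from above; the toString() override and its arrow format are my own illustration, not something Groovy requires.

```groovy
// Same three-field class as above, plus our own toString().
class Thing {
    public String title
    public String rel_type
    public String o_title

    // Replace the default "Thing@..." text that every Object starts with.
    // The arrow format is just an illustration.
    String toString() {
        return title + ' -> ' + rel_type + ' -> ' + o_title
    }
}

Thing a_thing = new Thing()
a_thing.title = 'Tree'
a_thing.rel_type = 'Parent Of'
a_thing.o_title = 'Branch'

// Every Thing is also an Object, which is why dump() is available.
assert a_thing instanceof Object

println a_thing          // now uses our toString()
println a_thing.dump()   // dump() still works without us writing it
```

The first println should now show readable text instead of the class name and hash code, while dump() keeps working because we never had to define it ourselves.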
Checkpoint: Add an int member to the Thing class and dump the contents.

Abstract & Simplify

Now, in Groovy we have a few ways to simplify this classic Java approach. Instead of setting the attributes one by one, we can just name each one in the constructor of the object like so:
Thing my_thing = new Thing(title: 'Tree', o_title: 'Branch')
println my_thing.dump()
You'll notice this time that we chose not to set a relationship. In the output you see a null value as you should expect. So we've just minimized the number of lines of code and it's just as readable.
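If you'd rather check the result than eyeball the dump() output, here is a minimal sketch using Groovy's assert. It assumes the same Thing class as before; the same_thing variable is just a name I made up for the comparison.

```groovy
class Thing {
    public String title
    public String rel_type
    public String o_title
}

// Set only two of the three fields, each one by name.
Thing my_thing = new Thing(title: 'Tree', o_title: 'Branch')

assert my_thing.title == 'Tree'
assert my_thing.o_title == 'Branch'
// We never set rel_type, so it keeps its default value of null.
assert my_thing.rel_type == null

// Order does not matter, because each value is matched by name.
Thing same_thing = new Thing(o_title: 'Branch', title: 'Tree')
assert same_thing.title == my_thing.title
assert same_thing.o_title == my_thing.o_title
```

If every assert passes, the script finishes silently. A failing assert stops the script and shows you which value was wrong.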
Now, we've talked about Lists, but if we stare at our Thing class we see that it resembles a phone book (where names map to numbers). Each attribute name maps to a String value. So we can take an even more abstract view of the data and represent the object creation with a Map:
def my_thing = [ title:'Tree', o_title: 'Branch' ] as Thing
println my_thing.dump()
Notice the syntax of the map. Where an empty List is [], an empty Map is [:]. On the left side of the colon is your key, and on the right side is your value. Consider a phone book: the left side is the name of a person and the right side is his phone number (or vice versa). The structure of the mapping is up to you and your needs. We don't need the Java keyword new because Groovy by default creates a new List or Map when you use this syntax. You also see that we can replace Thing with the keyword def and everything works peachy. It works because we've thrown in the clause 'as Thing'. as is a keyword in Groovy that lets you coerce (cast) an object from one type to another.
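Here is a quick sketch of the two literal forms side by side. The phone_book map, with its names and numbers, is made up purely for illustration.

```groovy
// An empty List and an empty Map look almost the same.
def empty_list = []
def empty_map = [:]

assert empty_list instanceof List
assert empty_map instanceof Map

// A tiny phone book: names are the keys, numbers are the values.
// (Names and numbers are invented for this example.)
def phone_book = [ jake: '555-0100', anna: '555-0199' ]

assert phone_book.size() == 2
assert phone_book.jake == '555-0100'

// No "new" keyword anywhere, yet these are ordinary Java collections.
// We call getClass() here because phone_book.class would look up a key named "class".
println empty_list.getClass().name   // java.util.ArrayList
println phone_book.getClass().name   // java.util.LinkedHashMap
```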
Checkpoint: Remove the as Thing clause and see how the output changes. Hint: remove the .dump() call to get cleaner output. Next, add a rel_type of "Parent Of" to the map. Finally, add the as Thing statement back and compare outputs.
Let's talk about this Map. To access the members of a class we use the dot operator in Groovy, right? Well, to access the values of a Map by key, we can use it in the very same way, as seen below in line 2. In fact, we can pull this information out of our map object in a bunch of different ways.
def my_thing = [ title:'Tree', o_title: 'Branch' ]
println "The title: " + my_thing.title
println "The title: " + my_thing.'title'
println "The title: " + my_thing['title']
def title = 'title'
println "The title: " + my_thing[title]
But wait a minute, if we can describe our data in the form of a map, and we don't lose any data or any ability to manipulate the data, why do we have this whole public class Thing blah blah blah!?
Excellent question, and the answer is... we don't! I showed you the concept of classes because Java was built on an Object-Oriented paradigm, and the sooner you begin to think of your data as just objects with relationships to each other, the better programmer you'll be. Finally, note that in many complex systems it's good to have well-defined types, but for simple scripting like what we're doing, Lists and Maps can get almost any job done.
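One place where well-defined types earn their keep is catching typos. The sketch below is only an illustration: it uses a cut-down Thing with a single field, and the misspelled name titel is deliberate.

```groovy
// A cut-down Thing with one field, just for this comparison.
class Thing {
    public String title
}

// A Map accepts any key, so a misspelled name quietly becomes a new entry.
def map_thing = [ title: 'Tree' ]
map_thing.titel = 'Branch'          // typo on purpose
assert map_thing.size() == 2
assert map_thing.title == 'Tree'    // the real title was never changed

// A class only knows the names it declared, so the same typo fails loudly.
def class_thing = new Thing(title: 'Tree')
def caught = false
try {
    class_thing.titel = 'Branch'    // same typo
} catch (MissingPropertyException e) {
    caught = true
}
assert caught
```

For a short script the Map's flexibility is usually a convenience. In a bigger program, the loud failure is often what you want.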
Checkpoint: Represent all the data in our text file as three maps.

Answer:
def tree = [id: 1, title:'Tree', o_title: 'Branch', rel_type: 'Parent Of']
def branch = [id: 2, title:'Branch', o_title: null, rel_type: null]
def leaf = [id: 3, title:'Leaf', o_title: 'Branch', rel_type: 'Child Of']
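With the three answer maps in hand, we can take a first pass at the original goal of matching up the relationships. This is only a sketch: the things and links lists and the sentence format are my own choices, and it assumes every o_title matches the title of one of our maps.

```groovy
def tree = [id: 1, title:'Tree', o_title: 'Branch', rel_type: 'Parent Of']
def branch = [id: 2, title:'Branch', o_title: null, rel_type: null]
def leaf = [id: 3, title:'Leaf', o_title: 'Branch', rel_type: 'Child Of']

// Put the three maps in a List so we can walk through them.
def things = [tree, branch, leaf]

// For every map that names another item, look that item up by its title.
def links = []
things.each { thing ->
    if (thing.o_title != null) {
        def other = things.find { it.title == thing.o_title }
        links << "${thing.title} (${thing.id}) is ${thing.rel_type} ${other.title} (${other.id})".toString()
    }
}

links.each { println it }
// Tree (1) is Parent Of Branch (2)
// Leaf (3) is Child Of Branch (2)
```

Branch has no relationship of its own, so it is skipped, but it still gets found as the "other" item for both Tree and Leaf.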