I'll be attending ParaPLoP 2009 over the next few days. It's a patterns workshop along the traditions of PLoP but focusing exclusively on patterns for parallel programming.
I am the author or co-author of the following three pattern papers, which you can find on the program page:
- Barrier Synchronization
- Patterns for Collective Communication
- Patterns for Topology Aware Mapping
Most of the patterns submitted are part of the Our Pattern Language (OPL): A Design Pattern Language for Engineering (Parallel) Software catalog proposed by Kurt Keutzer (EECS UC Berkeley) and Tim Mattson (Intel). OPL is based on Tim's earlier book Patterns for Parallel Programming.
While we have already identified quite a few patterns, there's definitely still a lot of work that needs to be done to produce a sizable pattern language that covers most of the patterns that most programmers will encounter in their programming career. However, I think that the layered approach that the Berkeley folks are advocating has great potential. The layered approach allows programmers to focus on different patterns depending on their skill sets and contributions to their project. At the top layer, application programmers focus on understanding the high level patterns and take advantage of the parallelism by using libraries and frameworks. And at the bottom layer, platform programmers focus more on understanding the low level patterns and develop libraries and frameworks to be used by other programmers.
Writing good software is hard. And writing good parallel programs is even harder. Patterns help make the task easier by showing the best practice principles that novice parallel programmers can learn from. And it's great that both UIUC and UC Berkeley are working together to catalog such patterns.
Ralph Johnson is leading the patterns project as part of the UPCRC at UIUC. And one of his projects is to mine for patterns by examining how different algorithms are expressed in different languages and frameworks, in particular those developed at UIUC (e.g. Deterministic Parallel Java, Charm++, Actor Foundry). Kent Beck, Samira Tasharofi, Amin Shali and I will be contributing to this project and our preliminary results can be found here.
Room 101: Newspeak Prototype Escapes into the Wild:
"The Newspeak prototype is now available at http://newspeaklanguage.org/downloads/ . "

Yes, the long awaited Newspeak is finally released! It's far from being complete but at least something that runs is out. I was in a class about Actor Programming Languages and Systems last semester and we were supposed to talk about Newspeak (and possibly do a group project on it if people were interested). Unfortunately, at that time, only the language specifications were released. And it's not as fun just reading about the specs and proving things about it.
The other good news? It runs on an Intel Mac. I was initially under the impression that it would only run on Windows.
Other initial impression: the UI is much improved compared to other Squeak implementations (which it is based on). Like the Pharo project, Newspeak is probably aiming at creating a professional, open-source Smalltalk platform. Unlike Pharo, it still has remnants of eToys though.
Looks like I might be canceling some initial reading plans to play around with this over the weekend.
This article is a summary of these three papers about Statistical Debugging: Holmes: Effective Statistical Debugging via Efficient Path Profiling (technical report from Microsoft to appear in ICSE 2009), Scalable Statistical Bug Isolation and Bug Isolation via Remote Program Sampling.
Statistics is often used (and misused) as an effective way to finagle people into believing something that isn't really significant. There are many such cases (and you see them a lot on TV too). The one that I remember off the top of my head is a poster by hi5 where they had a nice bar graph which showed that they were the "fastest growing social network in 2007" compared to other sites such as MySpace, Facebook, etc. You have to read their claim (in quotes) properly to actually see what they are trying to claim and why it isn't really significant. Hint: they needed to use the word "growth". So, if hi5 had 100 members in 2007 and grew to 100 000 members in 2008, then its growth is 1000-fold (pretty impressive). But if Facebook had 100 000 members in 2007 and grew to 1 000 000 in 2008, then its growth is only 10-fold. So hi5 wins in terms of "growth" but in terms of users, that is pretty insignificant (how much more can Facebook grow once almost everyone you know is already on it?). So, what exactly is hi5 trying to sell to the people who are reading the poster? Trying to hire undergraduate students to work for them? Trying to show that they are going to overtake Facebook soon?
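To make the arithmetic concrete, here is a small sketch using the made-up membership numbers from the paragraph above (they are illustrative, not real figures for either site). The helper name `growth` is my own.

```python
def growth(before, after):
    """Return (fold increase, absolute number of new members)."""
    return after / before, after - before

# Hypothetical numbers from the example above, not real membership data.
hi5_fold, hi5_gain = growth(100, 100_000)
fb_fold, fb_gain = growth(100_000, 1_000_000)

print(f"hi5:      {hi5_fold:.0f}-fold growth, {hi5_gain} new members")
print(f"Facebook: {fb_fold:.0f}-fold growth, {fb_gain} new members")
```

The "fastest growing" site in relative terms gained far fewer members in absolute terms, which is the part the bar graph leaves out.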

As the example shows, you can use statistics for your own personal gain. And still keep your conscience clean because you are not really lying at all. And it is up to the reader himself to sieve through those numbers and decide whether they even mean anything significant. I believe that inundating the reader with such insignificant numbers is a prime example of using statistics for evil.
So it is refreshing to read a paper where statistics is used as part of the fundamentals of the technique itself and presented up-front in a convincing and compelling manner; Statistical Debugging (SD) is the use of statistics for good where numbers can actually help programmers locate pernicious bugs that manifest themselves infrequently and can be hard to track down.
SD relies on the fundamental idea of sampling a program. There are many tools out there that can sample programs but they are mostly used in profiling the hotspots of a program; in other words, the parts that are frequently encountered. However, such sampling is not useful for debugging since pernicious bugs are often not in the hotspots of a program. Thus, for SD to work, you need a different kind of sampling – one where it would be possible to sample the infrequently encountered parts of the program itself. This sounds simple but it can be prohibitively expensive if you are trying to observe all those little facts.
Solving the issue of collecting useful debugging information is the focus of their first paper, Bug Isolation via Remote Program Sampling, where the authors present a novel way of doing the sampling so that they can collect interesting properties of the system. They identify a set of predicates (statements that can be true or false for a particular execution of a program) that is useful for C/C++ programs. And for each run of the program, they would sample each predicate by modeling a Bernoulli process (like flipping a coin whenever each predicate is encountered to see if it needs to be sampled). Basically, this means that for each execution of a program, the predicate has a fair chance of being sampled. Contrast this to the typical way of doing a profile, i.e. using a periodic timer. If the execution of a program is short, some of the important predicates might be missed. So using a Bernoulli process guarantees a fair sample for each predicate.
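The coin-flipping idea can be sketched in a few lines of Python. This is only my illustration of Bernoulli sampling, not the authors' instrumentation (which targets C/C++ programs); the class name `PredicateSampler`, the `rate` parameter and the example predicate are all assumptions of mine.

```python
import random

class PredicateSampler:
    """Record a predicate's value only when a biased coin flip says so."""

    def __init__(self, rate, seed=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate                  # probability of sampling each encounter
        self._rng = random.Random(seed)   # private generator, reproducible with a seed
        self.counts = {}                  # predicate name -> [times true, times false]

    def observe(self, name, value):
        # One independent Bernoulli trial per encounter of the predicate.
        if self._rng.random() >= self.rate:
            return                        # coin flip says: skip this encounter
        record = self.counts.setdefault(name, [0, 0])
        record[0 if value else 1] += 1

def run_once(sampler, values):
    """A toy 'instrumented' program with a single predicate site."""
    for x in values:
        sampler.observe("x < 0", x < 0)

sampler = PredicateSampler(rate=0.01, seed=42)  # hypothetical 1-in-100 sampling rate
run_once(sampler, range(-500, 500))
print(sampler.counts)
```

Every encounter of the predicate has the same chance of being recorded, no matter when in the run it happens, which is what a periodic timer cannot promise for short executions.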
Instrumenting a program so that we can collect information about each of its predicates can be very expensive because there can easily be a million predicates for a typical program. And that is where the authors introduce a novel method for collecting the information which relies on the Law of Large Numbers. It is infeasible to collect all the results of all those predicates from just one person on one machine. Instead it would be better to collect the information from as many people as possible. And each version of the program that runs on a different machine is designed to monitor a subset of all the predicates. Given enough machines, it is possible to merge all the results and get a useful matrix.
A sample matrix might look like the following:
| Executions | Predicate #1 | Predicate #2 | Predicate #3 | Predicate #4 |
|---|---|---|---|---|
| Pass | True | False | True | Not observed |
| Fail | False | False | True | Not observed |
| Fail | True | True | Not observed | Not observed |
| Pass | True | True | True | False |
| .. | .. | .. | .. | .. |
For each execution, we check the value of each predicate: true, false or not observed. And from that matrix (which should be much larger) we try to find a correlation between program failure and a particular predicate. The paper Scalable Statistical Bug Isolation discusses how such a correlation is found and how noise is reduced. This technique is usable even if there are multiple bugs in the same program; SD is able to generate a correlation for each different bug. And it's also usable when you have non-deterministic bugs. SD works best when you have a large set of data for each execution of the program.
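As a rough sketch of what "finding a correlation" can mean, the snippet below scores each predicate in the four-row sample matrix above. The three scores (how often runs fail when the predicate is true, how often they fail when it is merely observed, and the difference between the two) follow my recollection of the Failure/Context/Increase scores in Scalable Statistical Bug Isolation; treat the exact definitions as an assumption rather than a faithful reproduction of the paper.

```python
from fractions import Fraction

# Each run: (failed?, {predicate: value}); a missing predicate means "not observed".
RUNS = [
    (False, {"P1": True,  "P2": False, "P3": True}),
    (True,  {"P1": False, "P2": False, "P3": True}),
    (True,  {"P1": True,  "P2": True}),
    (False, {"P1": True,  "P2": True,  "P3": True, "P4": False}),
]

def score(runs, pred):
    """Return (failure, context, increase) for pred, or None if it was never true."""
    f_true = s_true = f_obs = s_obs = 0
    for failed, observed in runs:
        if pred not in observed:
            continue                      # not observed in this run
        if failed:
            f_obs += 1
        else:
            s_obs += 1
        if observed[pred]:
            if failed:
                f_true += 1
            else:
                s_true += 1
    if f_true + s_true == 0:
        return None                       # never observed to be true
    failure = Fraction(f_true, f_true + s_true)   # P(fail | predicate true)
    context = Fraction(f_obs, f_obs + s_obs)      # P(fail | predicate observed)
    return failure, context, failure - context

for name in ("P1", "P2", "P3", "P4"):
    print(name, score(RUNS, name))
```

With only four runs nothing stands out (no predicate gets a positive increase), which is a small reminder of why SD needs a large number of executions before the scores mean anything.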
It's important to note that SD does not tell you that a particular predicate is the root cause of a bug. All it tells you is that there is a high probability (80% or higher) that this predicate is the root cause of a particular bug. And the programmer can then start focusing his efforts on figuring out why that particular predicate is a point of failure. Sometimes the bug itself does not lie in that particular predicate, but it is usually within close proximity so examining the predicate itself is a good start.
This is useful enough because sometimes it is almost impossible to detect such bugs otherwise since they manifest themselves infrequently. And because the overhead of collecting such information is low (remember that each instrumented program is specialized to only monitor a subset of the predicates) it is possible to run the collection on a user machine where SD can capture the execution of the program under the actual environment of a real user (vs. a controlled environment on the developer's machine).
The latest paper in SD, Holmes: Effective Statistical Debugging via Efficient Path Profiling, evaluates using a different set of predicates (path profiles) that might help programmers identify the bug more easily. The previous method of using predicates will report the location (file and line number) of the predicate that is believed to be the root cause. While the predicate is usually within close proximity to the actual bug, the programmer still has a lot of code to sieve through. Path profiles provide more information about the execution of the program and contain the actual execution path that the program went through. This is useful for identifying bugs that involve complex control flow through the program.
In conclusion, SD is a novel idea in helping programmers find subtle bugs that might be hard to identify otherwise. Good unit tests can usually help eliminate a lot of bugs but it is hard to write tests for everything (and the cost of maintaining such detailed tests is very high). SD is not perfect nor does it claim to be. Instead it is another useful tool in a programmer's arsenal for hunting down bugs when trying to manually locate them has proven to be unfruitful.
The papers also impress me because of their honesty in doing their own evaluations and experimentations; they explicitly state that their experiments are done under controlled environments. And they actually have an on-going project where they are evaluating SD in real applications.
Prediction is very difficult, especially about the future.

During New Year's week I was watching TV and happened to switch over to the History Channel. It was showing something about the Bible code. More specifically, the title of the show was The Bible Code: Predicting Armageddon. Don't ask me why the History Channel decided to show something apocalyptic during New Year's week. I guess most people are happy to associate impending doom with the New Year instead of blooming hope.
A slight tangent before I get to the gist of this post....
For those unfamiliar with the Bible Code, it's a book that postulates that the Hebrew Bible contains hidden messages in the form of a code – the Bible code – that hold predictions about the future. The exact details of the postulated cipher can be found on its Wikipedia page.
Anyway, by using the Bible code, the authors claim to be able to find records of all the major historical events that have transpired including the two World Wars, the Holocaust, the assassination of prominent figures, etc. They concluded that there was "strong statistical evidence" that such encodings could not just be random.
Interesting. So the Bible code actually encodes all the events that have happened. Then could it be deciphered so we could use it to predict events that have yet to transpire? Sure. But there's a catch. We won't actually know how to look for those predictions. It's easy to look for things that have happened because we have clues and keywords to look for in the code. But for predicting the future, we have no idea on what to actually look for. Catch-22.
And that, to me, is a prime example of confirmation bias. The wikipedia article illustrates this easily with the 2-4-6 problem. We only look for what we seek to discover in the first place. And we conveniently ignore what we don't want to discover (or don't really know about yet). We conduct experiments and case studies but all too often we interpret the results to suit what we want to verify.
All right, back to the gist of this post. I wrote this post with a focus on TDD: Test-driven Development. TDD is one of the more controversial practices in agile software development today. And it is also one of the most misunderstood practices.
In Aim, Fire, Kent Beck says:
"Test-first coding isn't testing."
It's more about design. Writing tests first forces the developer to think about the design of the different units. Each unit should be designed so that it can be unit tested easily (and preferably in isolation) from other units.
I'll be honest and say that the first time I heard about TDD, I didn't grasp this fundamental concept. Instead, I too thought that it was all about writing your tests up-front. And, initially, I wasn't very keen on the idea. I believe that adequate testing is definitely useful. But I wasn't really convinced why we needed to do test-first. Wasn't it just as useful to have tests slightly later after the initial design so that your tests actually have a chance of, erm..., passing?
So I used to read papers studying the success of TDD with my own confirmation bias. I always looked out for little things that the authors missed that could invalidate their claims of the success of TDD. They weren't hard to find since it was impossible to do a fool-proof study of TDD in any actual environment.
But here's the interesting part. Now that I am more in favor of TDD, those little things still cause me to be skeptical on how useful TDD is (especially if the authors forget the part that TDD isn't just about testing first!). The case studies aren't really conclusive enough to help me predict if using TDD is a requirement for good software. N.B. Evaluations on small projects aren't particularly helpful either because when your project is small it is likely to succeed even if you don't have a proper process.
Sure, TDD's proponents are still enamored by it. But the views of its opponents (maybe that is too strong a word) cannot be ignored either.
Some of the most important things about writing software include delivering the product to the client on time, ensuring that the product has good quality, ensuring that the product fulfills the requirements and also ensuring that the code is maintainable for subsequent releases.
And right now, we don't have strong evidence that TDD is essential in accomplishing those tasks. There are teams that do not do TDD (whether for design or testing) and yet produce exceptional code. There are teams that start off being gung-ho about TDD and stop doing it halfway because they run into problems. So what does following TDD actually tell us?
And it's not just about TDD. What about things such as refactoring, pair-programming, and all the other pillars of agile development? Or what about all the latest trends in software development such as SOA, cloud computing, etc.?
We still don't have a good way to evaluate such things other than to try them out. Trying them out isn't a bad thing but some of these practices cost time and money and could be prohibitively expensive to try out on a whim. And while some would justify it as paying the cost up-front instead of later during the maintenance stage, no one actually knows for sure whether the cost is worth it. And after trying a technique out, unless we do proper experiments we can't measure its actual merit. And without proper data, we are inclined to make skewed predictions about our ability to replicate the success we had in one software project in our other projects.
And when you cannot actually verify those claims, you run into the danger of herd mentality, religious debates, zealotry. And when something new comes along, you either obstinately stick to your old practices or apostatize and switch over to newer paradigms.
There needs to be more research on studying how to effectively measure the effects of some software development technique. Now, it could be extremely hard to do or even impossible. But without proper studies, we only have our gut instincts to rely on and that is no better than flipping a coin and letting it predict what software practices to follow....
Now I like all the agile development practices. I find that they make me feel more productive. And they give me better confidence that I am writing good code. But is that enough as a measurement of how useful a practice is?
Here are some of the TDD papers that I have read that some might find interesting:
- Evaluation of Test-Driven Development
- A Prototype Empirical Evaluation of Test Driven Development
- Realizing Quality Improvement Through Test Driven Development: Results and Experiences of Four Industrial Teams
- On the Effectiveness of Test-first Approach to Programming. See discussion here.
In Outliers, Gladwell presents an interesting chapter about how Asians are better in Math; he tries to bring his point across by illustrating how the traditional Asian habit of planting rice in the paddy field can be seen as an influence on the way Asians approach life (and its problems) and how that approach helps them to be better at math.

Like most of Gladwell's books so far, that particular chapter is punctuated with a whole bunch of anecdotes and narratives. They make for an interesting read but do not help make the chapter more cohesive. In fact, they even appear to be a series of unrelated events that misguide the reader into thinking that they are more strongly related than they appear.
In particular, Gladwell has this story about the way numbers are represented in different languages. He gives the example of this random series of numbers: 4, 8, 5, 3, 9, 7, 6. And he wants the reader to memorize that series. In English, one would read it out as: four, eight, five, three, nine, seven, six. In Malay (another language that I know), one would read it as: empat, lapan, lima, tiga, sembilan, tujuh, enam. And finally, in Cantonese (or other Chinese dialect) it would be: say, putt, mmm, sum, kau, chud, loke. So, in terms of syllables, Chinese trumps English; and both Chinese and English trump Malay, which is really inefficient when it comes to counting syllables. Gladwell suggests that the way numbers are pronounced impacts the ability of the user to memorize the series. And this impacts the ability to count and thus perform other math skills such as addition. So, his observations suggest that Chinese users should be able to do math better than other language users.
As someone who knows three languages, I find this really hard to believe. If I were to count out loud, I would do it in English (so English is my primary language for math). However, most of the time, I don't need to count out loud. I just do all the calculations in my head with the most concise notation possible: numerals e.g. 1, 5, 12, etc. Why would I even care about the number of syllables when I can efficiently use the numeral notation?
Things might be very different if I were forced to use the Roman numeral notation, which does not lend itself well to mathematical manipulation. However, as far as I know, most languages will use the decimal numeral system that we are used to. In fact, if Gladwell were to extrapolate his notion of "efficiency in representation", he might be shocked to discover that the Chinese way to represent numbers is fairly inefficient compared to our decimal system. For instance, to represent 999, we use three digits. In Chinese, it would be 九 百 九 十 九 (if the text does not come out right, see minute 3:49 of this video). Now, I don't count in Chinese so I am not sure how much this impacts the ability to add. The decimal system feels natural to me since I can add in each position and just carry over to the next position as necessary. The Chinese could have some special way to do the addition that accounts for the longer notation.
Anyway, my point in bringing this up is not really to criticize Gladwell. Rather I want to point out the fact that there are a lot of unrelated events that if taken superficially seem to corroborate something that isn't quite right; it could be because we only have a shallow understanding of the matter and have not even begun to dig deeper into the crux of the problem. Or it could be that we are not defining the problem that we are trying to solve properly.
And that brings me to two papers in CS that I have read recently: The Geography of Programming and Towards Harmony-Oriented Programming. The former was the basis for the latter paper.
The first paper, The Geography of Programming, was based mostly on the observations from the book The Geography of Thought, which I have read and do not really agree with. I found the observations and experiments in the book to be jaded and outdated. Most students in the current era have already been affected by the globalization movement (or rather the diffusion of Western thought). Therefore, most students aren't easily classified as Western or Eastern thinkers anymore. In fact, most students today are hybrid thinkers, being able to utilize both Western and Eastern thought processes as necessary.
Which is why I don't really agree with how the authors presented their ideas in the first paper. I do not have a problem with the idea of creating a less rigid programming paradigm – Harmony-Oriented Programming – that would allow for greater flexibility in creating software. I think that idea is interesting and has already been attempted in various forms throughout the years, e.g. aspect-oriented programming, conscientious software, agents, etc. What I do not like about the paper is the mythicism that the authors choose to surround their work with. In particular, they choose to shroud their work with invocations of Eastern thinking and philosophy. I found that to be completely unnecessary and confusing. And it is fundamentally wrong, because they have taken the notion of Eastern philosophy completely out of context, making it part of a series of unrelated events to corroborate their work (much the same as what Gladwell has done).
As for the second paper, my colleague Jeff has already written a blog post on it during the presentation at OOPSLA 2008. Like I said, I like the work and research that they are doing, but I find their approach to promoting it rather unappealing. It's what I would call a series of unrelated events slapped together to serve as a rather weak metaphor to differentiate it from other existing work.
Harmony-Oriented Programming (HOP)
The problem that HOP is supposed to address is the rigidity of software. However, Brooks has already pointed out that sometimes, people are too enamored with the perceived malleability of software. To promote malleability, HOP advocates the separation of code and data. Inherently code needs to act on data. But instead of binding the data directly to the code, the data is supposed to come in from different sources. One can think of this as being similar to the idea of eval in dynamic languages where a snippet of code is executed in some environment that provides bindings for the variables. However, in HOP, the environment is not provided explicitly -- the environment exists and permeates all the code snippets. All code snippets are supposed to pick up whatever they need from the rich environment. And in turn, those code snippets will contribute back data into the environment that other code snippets can take advantage of (sounds a lot like a proliferation of global variables to me). The authors are trying to create an actual programming environment to evaluate HOP. However, in both papers, they have not yet addressed the important fundamental issue of how to prevent unwanted interactions between code snippets if all of them will be accessing the same environmental primordial soup of data at the same time.
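To show the kind of unwanted interaction I have in mind, here is a toy sketch in Python. It is not how HOP is actually implemented (the papers do not give me enough detail to claim that); a plain dictionary stands in for the shared environment, and the snippet names and keys are all hypothetical.

```python
# A plain dict stands in for the shared environment (an assumption for illustration).
environment = {}

def thermometer_snippet(env):
    # Contributes a temperature reading in degrees Celsius.
    env["temperature"] = 20

def converter_snippet(env):
    # Picks up whatever "temperature" is in the environment and assumes Celsius.
    if "temperature" in env:
        env["fahrenheit"] = env["temperature"] * 9 / 5 + 32

def physics_snippet(env):
    # An unrelated snippet that happens to use the same key, but in Kelvin.
    env["temperature"] = 293

thermometer_snippet(environment)
converter_snippet(environment)
good_reading = environment["fahrenheit"]   # derived from the Celsius value

physics_snippet(environment)               # silently overwrites "temperature"
converter_snippet(environment)
bad_reading = environment["fahrenheit"]    # now derived from the Kelvin value

print(good_reading, bad_reading)
```

Nothing crashes and nothing warns; the converter simply produces a wrong answer because another snippet reused the same name. That is exactly the global-variable problem, and it is the issue I would like the authors to address.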
In Classic Testing Mistakes, Brian Marick presents several problems with the way testing is done nowadays. One of the mistakes that I have found most interesting is the over-reliance on beta testers (or just anyone who uses the software).
"Beware of an over-reliance on beta testing. Beta testing seems to give you test cases representative of customer use - because the test cases are customer use. Also, bugs reported by customers are by definition those important to customers. However, there are several problems...."
He gives several reasons for this (the original paper is very readable so you might want to read it first) but I have recently encountered two of them personally:
- (Someone that I know...) When your product doesn't perform properly, some users not only do not report a usability problem, but they also quickly conclude that your product sucks. They don't bother looking on the forums or seeking help; instead they completely conclude that your product is not usable and it's not their fault that they don't RTFM. Now, as a developer you would be interested to at least capture what prompted this review of your software instead of seeing a whole bunch of "x product sucks" on the internet (try it: google for "eclipse sucks", "netbeans sucks", etc). Most problems can be easily resolved by asking on the forums and the data that the developer gets from this could be used to improve the product in the next release to make it more usable.
Scenario 1 usually happens when someone is forced to use a particular product. He usually doesn't feel satisfied that he was not given the freedom to choose the product (in particular an IDE). And when things begin to go wrong, instead of thinking that he might be doing it wrongly, he concludes that the product sucks.
Of course, Scenario 1 would be a completely different case if the user had actually spent some time seeking help and was told that there is no solution for the problem that he is having. In that case, the product really needs to be fixed.
- (Personal experience; I feel slightly hypocritical about this...) I needed to use NetBeans over the past week to play around with JavaFX. Because Apple doesn't ship a Java 1.6 JVM for the older 32-bit Core Duo machines, I had to use Windows XP under VMWare Fusion. Anyway, there were a lot of things that NetBeans did not support that could make development easier. I could definitely still use the product and there was nothing seriously wrong with it but there are some features that could definitely make the product more usable. Disclaimer: I am an Eclipse/IntelliJ user so I might not be familiar with the philosophy of NetBeans. Did I bother reporting such issues? No, unfortunately, because I just wanted to get things done, pack-up all the JavaFX components that I created as a jar that I could then use from Eclipse (as part of my actual plug-in project).
So Scenario 2 happens when the user doesn't really need to use the product all that much but is just a casual user (for the time being). However, the problems that the user encounters early as a new user are actually pretty useful for the software developer since they could be used to improve the usability of the product in the future.
So there you have it, two personal encounters that demonstrate why it is not really a good idea to rely too much on beta/end users to test your product. I have submitted one usability problem to Eclipse though, and that was only because I use Eclipse so much daily that the issue really bugged me. It was a UI issue and it is one that I have trained myself to circumvent. Nonetheless it bugged me enough the first time I saw it that I decided to report it so that new users will not have to suffer through it.
So end users with some form of product loyalty – people who use it constantly by their own volition – are the only people who are likely to report bugs with your product. Or, if you have read The Tipping Point, they are likely to be the mavens who take pride in their knowledge of the product.
Some chapters of The Tipping Point are actually pretty useful for software testing since they give some idea of who is more likely to help promote your product and who is more likely to report problems. Compared to non-software products, we do have a better way to keep track of all the issues that come in (bugzilla, etc).
The panelists: (from left to right) Marjan Mernik, Juha-Pekka Tolvanen, Gabor Karsai, Charles Consel and Kathleen Fisher.
DSLs seem to be a rather popular topic at OOPSLA this year. There was a workshop on it and several tutorials as well. And there was also the DSL panel which I attended this morning. Most of the talks and tutorials on DSLs assume that the audience is using specialized modeling tools to create them. Basically those tools provide a specialized environment for creating the meta-model for your DSL, specifying the grammar of your DSL and helping you specify templates for code generation (which actually makes your DSL useful). Such tools aren't meant to help you create a full-fledged programming language. Instead they help you quickly create a small language that can be refined in an agile and iterative process. One of the panelists commented that this might be more productive (and less intimidating) than using lex and yacc.
So, as a summary of the panel, let me present to you the good, the bad and the ugly about DSLs:
The Good
- Most people are already using DSLs in the industry and it seems that there have been some case studies that strongly suggest that using DSLs actually helps improve productivity. When asked, Juha-Pekka, who presented those results, told the audience that productivity was measured in terms of what the customer told him – so they could be measuring that it used to take 6 hours to complete something and now it takes 1 hour to do it; thus gaining a 6-fold increase in productivity. Such numbers were actually reported on his slides.
- Most panelists agree that using a simplified language like a DSL helps the user avoid certain mistakes that a general-purpose language might incur. After all, it is not possible to express certain constructs in the DSL that would be possible in a normal programming language thus lessening the chances for making those mistakes.
The Bad and The Ugly (we can just merge them)
- As far as I know, it's hard to actually tell what the good qualities of a DSL are. So if I am designing a DSL, how do I know which direction to steer it in so that my customer benefits more from it? For instance, is DSL A better than DSL B? And in what aspect? The only answer provided to this question was that creating a DSL is an iterative process and depends on what your customers like.
- Which brings me to the second point: how do we duplicate the success that we achieved with one DSL in another project? If we cannot systematically distill the finer points of each DSL, it would be hard to repeat its success on another project. It is also wasteful to start from scratch every time – though the panelists seem to argue that since it's a domain-specific language, it might be better to create very specialized ones for each customer.
- And this leads me to my third point: how do we share DSLs among different projects? Currently, very little research is being done on the best way to share DSLs among projects. Also, do we actually need to share the DSL or just the underlying meta-model that the DSL embodies? A simple and naive way might just be to form a standard and have everyone use that. Though, of course, that naturally leads to a bloated DSL since everyone would have some specific feature that no one else needs.
- The panelists also agree that there are currently very few tools to help DSL developers. There aren't many tools to help build a Domain-Specific Environment (DSE) that includes all the features of a normal IDE such as code completion, debugging, etc. But more importantly, there aren't tools to help domain analysts distill the features of the domain itself – the most important step in creating a ubiquitous language as a basis for the DSL.
- Also, when asked if they prefer an internal DSL (implemented using the constructs of an existing programming language – common in Ruby) or an external DSL (implemented as a new language), the panel seems to agree that it might be better to use an external DSL so that the user will not be tempted to escape from the domain and use constructs outside of the DSL. For instance, if we were to implement an internal DSL using Ruby, we would have at our disposal all the features of the Ruby language, which can be used in our DSL. This makes it tempting to use existing language features in an ad-hoc manner and pollute the actual DSL.
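To make the internal/external distinction in the last point a bit more concrete, here is a minimal sketch in Python (not Ruby, and not anything that was shown at the panel). The discount-rule domain and every name in it (Rule, discount, parse_rule) are hypothetical and made up purely for illustration. The internal version accepts any Python callable as a condition, so host-language code can leak into a rule; the external version only accepts text that matches a tiny closed grammar.

```python
import re


class Rule:
    """A discount rule in a made-up pricing domain (hypothetical example)."""

    def __init__(self, percent):
        self.percent = percent
        self.condition = lambda total: True

    def when(self, condition):
        # Internal DSL: any Python callable is accepted here, so arbitrary
        # host-language code can end up inside a "rule".
        self.condition = condition
        return self

    def apply(self, total):
        if self.condition(total):
            return total * (100 - self.percent) / 100
        return total


def discount(percent):
    """Entry point of the internal DSL."""
    return Rule(percent)


# External DSL: the only accepted sentences look like
#   discount <number> when total > <number>
#   discount <number> when total < <number>
_RULE = re.compile(r"discount (\d+) when total (>|<) (\d+)")


def parse_rule(text):
    match = _RULE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid rule: {text!r}")
    percent = int(match.group(1))
    op = match.group(2)
    limit = int(match.group(3))
    if op == ">":
        return Rule(percent).when(lambda total: total > limit)
    return Rule(percent).when(lambda total: total < limit)


# Internal DSL used as intended.
internal = discount(10).when(lambda total: total > 100)

# Internal DSL "polluted" with unrelated host-language code (a side effect).
audit_log = []
leaky = discount(10).when(lambda total: audit_log.append(total) or total > 100)

# External DSL: same rule, but written as text in the closed grammar.
external = parse_rule("discount 10 when total > 100")
```

Both internal and external apply the same discount, but only the external form rejects input such as "discount 10 when import os" (it raises ValueError), which is roughly the property the panel seemed to value.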
The interesting part of the panel only happened during the second half when the audience was given the opportunity to ask questions. The first half was wasted (pardon the word) on having each panelist spend 10-15 minutes talking about their work and their position on DSLs. In my opinion, that was a serious mistake since it only gave the audience about 30-45 minutes for the interesting questions.
Coincidentally, I just found out that Martin Fowler actually has a work-in-progress about DSLs. It might be better to read about DSLs from him, since his view on how to do DSLs might be less biased than that of a DSL enthusiast who is already too enamored with the very idea.
"Brooks was fooled."
– Dick Gabriel
At OOPSLA 2007, Fred P. Brooks gave a keynote entitled Collaboration and Telecollaboration in Design. I actually wrote a blog entry about it. Overall, I thought that it was a fantastic keynote.
Software Engineering Matters : OOPSLA '07: Collaboration and Telecollaboration in Design:
"There could be multiple architects but there is one chief architect who has the authority to make the final decisions. Even in an open source project like Linux, Linus Torvalds still has the ultimate veto power on what to put in and what not to put in. This almost seems like a dictatorship but if that is what is required for conceptual integrity then it is something that has to be done."
However, there was one over-simplification that Fred Brooks used; he claimed that all of the most innovative products in the world today are created by one individual (or at most two). As examples, he used Albert Einstein, Thomas Edison and the Wright Brothers (the only group of two that Brooks cited).
A couple of people, including Ralph Johnson, pointed out after Brooks's talk that this wasn't necessarily true; in fact, a lot of excellent products today are not created by just one individual. Instead, the product is usually created by a committee or group, and it would not have been possible without those other individuals.
I definitely agree that Brooks might have oversimplified his examples to get his points across. But then again, he isn't really to blame. Society has always ascribed greatness to one particular individual for each accomplishment. That is just how the world works. And though some might say it is unfair for just one individual to claim all the credit, it certainly makes more sense to say that Albert Einstein came up with the theory of relativity than to say that Einstein collaborated with Tom, Dick and Harry and they all helped him to come up with that theory. Besides being a mouthful, it is hard to trace which individuals actually influenced Einstein. If you wanted to, you could even optimistically claim that it was the series of all his interactions with different people that enabled him to formulate his theory. But no one does that because, while people realize that such works of marvel aren't the works of an individual, it's best to attribute them to the individual who seemed most responsible for them.
Most people would have left it at that. However, some people might ponder it for a few more days. And a small minority might even think about it for a few more weeks. But there is likely to be only one individual who would ponder it for a whole year and write an essay on it – Dick Gabriel : )
So the title of his essay this year is Designed by Designer, in which he cites numerous examples (and personal interviews) to show why most grand works of art are seldom, if ever, just the work of one careful individual seeing everything through from beginning to end. It is insufficient (and unfair) to study only the mind of the perceived genius who created that work; instead we also need to study his collaboration with his colleagues, and consider how his work could possibly have just appeared randomly(!), or how the work was already there waiting to be discovered and the first person to do so was given the title of genius.
For me, Dick's presentation today was a very interesting one. It was definitely well-planned and well-executed in Dick Gabriel's style – it was artsy, akin to watching a National Geographic or Discovery Channel documentary. It wasn't really a counter-argument to Brooks; rather, I see it as a valid alternative perspective to what Brooks had said. After all, no one really knows what is the best way to create such grand works of art (and similarly no one actually knows how to create complex software perfectly each time). There is always the possibility that a small agile and egalitarian team could produce software of equal quality to that of a big group with organization and bureaucracy. And there is always the possibility that a group wrapped with organizational chains can also produce software of equal quality to that of a small agile group.
Some groups function better with the illusion of a romantic genius to guide them into unknown territories. Other groups prefer to approach everything as a team without a leader – who needs a leader after all? We could all just take turns since none of us is infallible – and approach each new challenge as a group.
Dick's essay will be available in the OOPSLA proceedings. I will be spending more time reading (and appreciating) it when I get back. There simply isn't enough time during OOPSLA for deep, contemplative reading.
Seaside BOF
I didn't get to take any pictures in this session because I came in late and was sitting at an awkward spot. James Robertson from Cincom was there and has posted some pictures on his blog. I am actually in one of those pictures :)
Michael Lucas-Smith did a demo of WebVelocity – Cincom's new in-your-web-browser development environment for Seaside. I missed the first 10 minutes of it but it's similar to the screencast that he had done before. WebVelocity is cool, and there are two things that I really like about it.
First, it has ample documentation provided for Seaside. Besides the base library from Cincom, I think that this is one of the best documented Smalltalk projects ever. Not only do they have comments for the classes and methods, but they also have a nice getting started guide all built-in and easily accessible. That was really fantastic! And to make it even better, they had an integrated search widget that allowed you to search through the class names, method names and, now, comments! Something that was missing from Smalltalk before.
Second, WebVelocity bravely goes where no Smalltalk has gone before: it lets you edit your source code inside a single window – in this case, your web browser. You can actually see all your methods in one editor without having to open multiple windows! I think this is one of the most interesting (and smart) approaches that Cincom can take to get people to use Smalltalk. I believe that they have reduced the entry barrier for Smalltalk by presenting it in a more familiar environment to newcomers. Hopefully they get some positive feedback and comments on this and use that to help them structure their tools to suit both new and veteran users.
Squeak BOF
I took the pictures using my iPhone so they are all pretty bad actually. The only way I could salvage them was by turning them into black and white pictures....
There were three presentations in this year's Squeak BOF.
First, Goran presented his project on creating a lightweight Simple-CGI replacement for Seaside development in Squeak. He calls it Blackfoot, and at the time of the demo he still had some bugs in getting Seaside to function. However, he is confident that he will be able to fix it and publish the code soon. His primary goal with this project was to create a replacement that was small, simple and fast. And his micro-benchmarks showed that it was about twice as fast.
Next, Dave Ungar presented The Birth of Manycore Squeak. Basically he demonstrated what he and Sam Adams have been working on at IBM Research: writing a new Squeak VM that could run on top of the Tilera 64 multicore chip. Right now he has hacked the VM so that it can actually run the MVC UI in Squeak. It's also able to do simple object migration from core to core. However, as he emphasized, this is still work in progress and there are lots of things to be done. In particular, it might require adding some new primitives to Smalltalk to make multicore concurrent programming easier. And it might also require changing the programming model to make it easier to program things concurrently.
And finally, Jecel Assumpcao Jr. talked about Issues in Smalltalk Hardware Design. He gave a lengthy introduction to the various attempts at creating specialized hardware to execute Smalltalk throughout the years. Most (if not all) of those projects are now dead and obsolete. However, he is interested in creating a modern implementation for Squeak. From the looks of it he already has a draft of the architecture and ISA that he is planning to support. The spec for his Squeak bytecode processor, Plurion, is available from his Siliconsqueak web page. It wasn't clear from his presentation if he intends to create a multicore bytecode processor or just a single core one.
There weren't as many people at this year's BOF as I had expected. It was cool that the Cincom guys were there this time around though.
update: The videos are now available from Goran's site.
This year marks the 50th anniversary of Lisp. While I am not a Lisp hacker, I have dabbled with it (and lambda calculus) enough that I do take interest in things that happen in the Lisp world. There was the Lisp50@OOPSLA workshop today but I only had time to attend the first talk of the day by Guy Steele and Dick Gabriel.
Guy Steele and Dick Gabriel presented their combo-talk on The Evolution of Lisp. And like every combo-talk by Guy and Dick, it was entertaining and informative. I had never actually realized how many versions of Lisp there were until today! The slides that Guy and Dick showed contain the history of Lisp from the 1960s till now. They could even divide the different implementations geographically across the United States and showed how different versions of Lisp from different states influenced one another. While the primary centers for Lisp development were at MIT and Stanford, there have been many different efforts outside of those two universities as well.
The presentation must have been endearing for most of the Lisp hackers in the crowd. For me, it was also an insightful look into how one of the oldest programming languages had evolved and to see its impact on other modern programming languages.
For me, the most memorable part of the presentation was when Dick emphasized that though there were so many different versions of Lisp, each with its own little variation (making integration and code sharing hard), the explosion of those different implementations actually fueled a lot of interest and research into making better implementations of Lisp. Now that there are standards for Lisp, fewer dialects have popped up. And so by gaining standardization, you do lose out on some of the innovation because no one is interested in creating something that doesn't conform to the standards anymore. I thought that was something interesting to think about since we usually take it that standardization is always the best route. But Lisp has shown that innovation can oftentimes blossom through many different home-brew implementations.


