Spoonful of Hacking: 2009

Sunday, 27 December 2009

Webdesign and Supercompilation

With supercompilation being a long-forgotten technique invented decades ago, and with “webdesign” term being usurped by graphic artists & HCI experts, I doubt this post will be anything close to popular, but as always, it will not stop me from expressing my opinion. But let’s take it slowly now.

Let us assume the “web design” in a good, broad sense now: not just the omnipresent “logo on the right vs logo on the left” & “10 tips to get more clicks”. Just as software design comprises multiple heterogeneous activities concerning the making of a piece of software, just as language design is about how to create a good language suited for the target domain, web design is in general about how to make a web site, a web service or a web app well.

Super-compilation is a program transformation method of aggressive optimisation: it refactors the code based on the most possible assumptions, throwing away all dead code, unused options and inactivated functionalities. If was irrelevant or at least unproductive during the structured programming epoch, but the results of super-compilation were promising before that and remain promising in our time, during the epoch of multi-purpose factory factory frameworks.

The current (at least since 1999) trend in web design is dynamics and more dynamics. The content and its presentation is separated, and most of the time what the end-user sees is what is being generated from the actual content stored somewhere in a database by using the representation rules expressed in anything from XSL to AJAX (in software we would call such process “pretty-printing”). However, this is necessary only for truly dynamic applications such as Google Wave. In most of the other rich internet applications the content is being accessed (pretty-printed) much more often than being changed. When the super-compilation philosophy is applied here, we quickly see that it is possible to store the pre-generated data ready for immediate end-user demonstration. If the dependencies are known, one can easily design an infrastructure that would respond to any change of data with re-generation of all the visible data that depend on it. And that is the way it can and should be — I’m running several websites, ranging from my personal page to a popular contest syndication facility, all completed with this approach: the end-user always sees the statically generated solid XHTML, which is being updated on the server whenever the need arises, be it once a minute or once a month. Being static is not necessarily a bad thing, especially if you provide the end-user with all the expected buttons and links. Saves time and computational effort on all the on-the-fly processing requests.

When will it not work: for web apps that are essentially front-ends for a volatile database access; for web apps that are truly dynamic in nature; for web apps where user preferences are inexpressible in CSS & access rights. When will it work: pretty much everywhere else. Think about it. Have fun.

Thursday, 3 December 2009

Type V clones

Clone detection has been an active research topic for decades by now, but it’s among those that never wither. We all know the basic classification of clone types: Type I is for two pieces of code that are identical in all aspects except perhaps for whitespace (indentation and comments), Type II is for two structurally identical pieces of code with variations only in whitespace and naming, Type III is for two pieces of code that have syntactically mapping constructs but can bear additional statements/expressions somewhere in the middle, and Type IV is for two semantically equivalent pieces of code that have the same functional behaviour but can be implemented differently.

Copy-paste programming is by far not the only cause for clones, we all know that too. And recently there has been another cause evolving: syndication and aggregation. There are just too many web services and RIAs, no-one can register on each one of them. (In fact, very rare ones go half as far as I do). Thus, in order to broaden one's potential audience, the users let the services propagate the same pieces of data: blog posts are fed into twitter updates, they become facebook status updates, etc. These updates are hyperlinked and heavily annotated, so I can’t help thinking about them as strictly structured grammar-abiding data (better known as “code”). The rules for propagation vary from bi-directional synchronisation to quite obfuscated schemes of one-directional non-information-preserving transformations. One the other hand, front-end grammarware (web-2.0-ware) like TweetDeck allows end users to aggregate updates from different sources on one screen (in the case of TweetDeck, we’re talking about Twitter, Facebook, MySpace and LinkedIn). In this case, the end users can receive the same information multiple times through different paths.

This leads us to the necessity of introducing Type V clones as two pieces of differently structured data representing the same information. The main difference is that such clones will most of the time be non-equivalent, with one derived from the other in a known (or partially unknown) manner. Some other scenarios exemplifying the non-triviality of this, follow:

“Identity X is connected to identity Y” coming from service A does not mean “identity X is connected to identity Y” on service B as well. However, these identities will appreciate being notified about the possibility to connect on service B as well (if not to be automatically connected).

“Identity X posted text T” is the same as “identity X posted text T with link L”, if L links to one of the clones, otherwise the second one is more complete.

“Identity X posted text T1 with link L” is a neglectable clone of “identity X posted text T2”, if T1 is a truncated version of T2 and L links to the second one.

If “identity X posted text T” often occurs together with “identity Y posted text T”, then X and Y might be the same entity.

When we have two streams which are known to be clones, we can try to establish the mapping by automated inference.

If we know the transformation R that makes an update U' on service B from an update U on service A, and we have U' at hand but U is unavailable (security issues, service is down, etc), we need to [partially] reverse R, as we did in our hacking days.

There is much more than that to be done, I’m just providing you with the most obvious raw ideas. Of more advanced topics one can immediately name identity clone detection, data mining, topic analysis and coverage metrics.

Friday, 25 September 2009

SCAM/ICSM/Twitter mapping

@abramh — Abram Hindle, PhD student, University of Waterloo, Canada
@avandeursen — Arie van Deursen, Software Engineering Research Group, Delft University of Technology, The Netherlands
@SebDanicic — Sebastian Danicic, Goldsmiths College, University of London, UK
@davema — David Ma, Calgary, Canada
@frama_c — Pascal Cuoq, INRIA, France
@grammarware — Vadim Zaytsev, PhD student, Koblenz, Germany
@ICSMconf — consolidated account set up by Jamie Starke
@jamiestarke — Jamie Starke, University of Calgary, Canada
@j_ham3 — James Hamilton, PhD student, University of London, UK
@JurgenVinju — Jurgen Vinju, CWI, Amsterdam, The Netherlands
@nicbet — Nicolas Bettenburg, PhD student, Software Analysis and Intelligence Lab, Queen’s University, Canada
@quinndupont — Quinn DuPont, Algorithmics Inc., Canada
@ssepotsdam — ?
@tkobabo — Takashi Kobayashi, Nagoya, Japan
@taoxiease — Tao Xie, North Carolina State University, USA
@tiagomlalves — Tiago Alves, PhD student, SIG, Amsterdam, The Netherlands
@tomzimmermann — Thomas Zimmermann, Microsoft Research, University of Calgary, Canada
@yk2805 — Yiannis Kanellopoulos, SIG, Amsterdam, The Netherlands

Please send updates or comment them here if necessary.

Thursday, 24 September 2009

Architecture Evaluation

During the ICSM presentation of Eric Bouwers about criteria for assessing implemented architectures I asked a question that raised a discussion that was proposed by Yuanfang Cai to be taken off-line. Since I already left Edmonton, I’m taking it on-line instead. There is no doubt Eric has made considerable contribution by analysing SIG expert opinions, reports and interviews, my question was more about the relation between architecture evaluation and architecture evolution and his proposal to integrate regular architecture re-evaluation into maintenance activities.

One of the definition of architecture that I remember from the time working in the same department with Hans van Vliet is that it comprises those components, dependencies, properties, configuration elements, etc—in other words, those parts of a system design that do not change with time or are the most reluctant to change with time. I.e., the easier it is to discard or to change something, the less place it deserves in the architecture. If you think the problem is purely terminological, please direct me to a perfect definition, and I will shut up. However, I believe there are some deeper issues here.

Can architecture re-evaluation be used as a system analysis tool that can deliver useful and non-trivial results?

So far I can imagine three scenarios: (1) the software system evolves without changing its architecture; hence, re-evaluation is redundant since it will provide the same results we already obtained; (2) the software system is redesigned in the meantime in such a way that its architecture changes as well; hence, re-evaluation is needed since we can no longer rely on the outdated data; (3) the software system evolves in such a way that the properties of its architecture can shift without noticing; hence, the answer to the question from the previous paragraph is definitely "YES". The first two scenarios are trivial, the third one is not, and I call for examples. So far I can think of only external ones, like when a new technology is introduced and makes parts of the existing system outdated/obsolete/incompatible/… Are there internal ones?

Thursday, 9 July 2009

GTTSE/Twitter mapping

@1TTechnologies — Denis Avrilionis, One Tree Technologies, Luxembourg
@BBasten — Bas Basten, CWI, Amsterdam, The Netherlands
@Elsvene — Sven Jörges, Ruhrpott, Germany
@Felienne — Felienne Hermans, PhD student, Delft, The Netherlands
@GorelHedin — Görel Hedin, Lund University, Sweden
@GorkaZubia — Gorka Puente, PhD student, University of the Basque Count, Spain
@grammarware — Vadim Zaytsev, PhD student, Koblenz, Germany
@inkytonik — Anthony Sloane, Macquarie University, Sydney, Australia
@JeanMarieFavre — Jean-Marie Favre, University of Grenoble, France
@JurgenVinju — Jurgen Vinju, CWI, Amsterdam, The Netherlands
@MedeaMelana — Martijn van Steenbergen, MSc student, Utrecht, The Netherlands
@MichalPise — Michal Pise, Czech Technical University, Czech Republic
@notquiteabba — Ralf Lämmel, Koblenz, Germany
@PaulKlint — Paul Klint, CWI, Amsterdam, The Netherlands
@PauloBorba — Paulo Borba, Software Productivity Group, Pernambuco, Brazil
@radkat — Ekaterina Pek, PhD student, Koblenz, Germany
@TerjeGj — Terje Gjøsæter, PhD student, Grimstad, Norway
@TvdStorm — Tijs van der Storm, CWI, Amsterdam, The Netherlands

Please send updates or comment them here if necessary.

Wednesday, 10 June 2009

Floating code snippets in LaTeX

Last week I decided to separate “figures” that contain diagrams, graphs and parse trees with “figures” that contained source code snippets. In LaTeX it means the former kind stays in figure environment, while the latter needs to reside within its own. I googled for solution and was quite surprised how the web was full of cluttered random hacks that were done without any understanding of TeX internals. My solution is 9 lines long, and it solves three problems: the environment itself, the list of them and referencing issues.

\usepackage{float}
\usepackage{tocloft}
\newcommand{\listofsnippetname}{List of Listings}
\newlistof{snippet}{lol}{\listofsnippetname}
\floatstyle{boxed}
\newfloat{snippet}{thp}{lol}[chapter]
\floatname{snippet}{Listing}
\newcommand{\snippetautorefname}{Listing}
\renewcommand{\thesnippet}{\thechapter.\arabic{snippet}}

The first two lines connect two packages: one for a mechanism of defining new floating object types, one for toc-ish lists. On the next two lines we define the new list — at this point the new LaTeX counter is already created but not used anywhere. floatstyle can be plain, boxed or ruled — I decided for boxed since I was using boxedminipage inside the old-style figures anyway. Then we define a new float type which fails to define a new counter and has to use the one for the list we already made, just as planned. We finish up by giving the new floating environment some names.

That’s it, we’re done. Just use \begin{snippet}…\end{snippet} and \autoref{…} as you usually would with figures and tables. I see no need to create more counters, to brutally mess with @addtoreset and theHsnippet, etc. Hacks must be simple, effective and beautiful.

Monday, 8 June 2009

Python pains, part 3

Another thing I miss is perfect matching in in and not in operators. And when I say ‘perfect’, I mean prolog. Yes, I’m mad at python because it’s not prolog. Just take a look at this wonderful code:

Neat, eh? So, why can’t I do this, then?

(Yes, I know what the underscore variable in python is for, just pretend you don’t and roll along, ok?)

The nastiest thing about this issue is that it is not universally solvable. Once you’ve implemented a special magic matching function for this particular case, there will be someone who wants to write:

If the whole tuple on the left hand side is defined, one can do the matching perfectly in python. If some parts ain’t, one cannot and has to write an awkward external matching/traversal function. Too bad!

Friday, 5 June 2009

Python pains, part 2

Binary operators are not directly reimplementable, this ain’t Ruby. Some of them can be changed implicitly with a bit of __smth__ magic, others are implemented in the class of one side with a parameter of the other side—and it happens that I often need it the other way around. Suppose the idea of the script is to open the file, read all the lines, stick them together, strip the result, parse it, perform some xpath and create an new element for each element found that way. Easy? Yes. Realistic? Well, the reroot2nonterminals transformation generator is written that way. What do we see there?

And how this would have looked like in my dream python?

Yeah, I know, I know. The problem is inhererent and not really the sole fault of python. However, python already has that nifty self mechanism, which eliminates the difference between a.method() and method(a), and that duality is commonly exploited with string module, for instance. The only thing I need here is two selfs for binary expressions. Or more, for n-ary… Or even…

(The ElementTree is intentionally left out for its side effects, don’t bother). This code actually works, with some magic woven in the Wrapper class. Anyway, if any language ever does this kind of meta-programming naturally, it would be beyond good and evil, I could honestly retire as a language engineer, shave my head, get a peg leg and become a pirate ninja. For now at least it is implementable as meta-hack.

Thursday, 4 June 2009

Ten Dying IT Skills

The main problem with the article Ten Dying IT Skills is sloppiness. The author, Linda Leung, is apparently not knowledgeable enough in the areas she writes about, hence the weird choice for list items, hence the complete mess in the explanation.

Take (3), for instance. ‘Microsoft eventually replaced J++ with Microsoft .Net.’ It is plain impossible to replace a language (J++) by a platform (.NET). J++ has been superseded by J#, MS VS 6.0 grew up to be become MS VS .NET, and the underlying infrastructure is CLI instead of JVM (if anyone cares). Even then, why is this a ‘dying skill’? One of the main design goals for J# was to make a language that would allow experienced Java programmers to develop software for .NET. If you possess a good knowledge of J++, it’s not a problem to find a job as a J# programmer.

Now take (6). She maintains that Extreme Programming is a dying skill, which is utter nonsense. From the article it seems that XP is dead since 2003. In reality, Agile Manifesto has been published in 2001 and the methodology only began blooming ever since. ‘Losing ground due to the proliferation of offshore outsourcing of applications development’ also seems unlikely—outsourcing or in-house development, but the software needs to be developed and go through its life-cycle, which is almost never waterfall nowadays. XP is not the only rapid application development/agile software development method, by the way, there’s also Scrum, DSDM, FDD and many others. Even on Twitter, look at @KentBeck, @WardCunningham, @RonJeffries, etc, they are all alive and kicking liek whoa. Look at Ruby on Rails, for crying out loud! If that’s not hot, I don’t know what is. In modern language engineering DSLs also tend to be produced in an iterative fashion.

Skip to (9). I fail to see how the conclusion that HTML is a dying skill is drawn from the fact that ‘good grasp of HTML isn't the only skill required of a Web developer’. Sure, it’s not the only one, but a crucial one. Yes, no-one is writing HTML in text editors anymore like we did ten and fifteen years ago. However, the value of validation and conformance has been understood and appreciated since then, and there are so many HTML embedded languages and technologies—like PHP or ASP—that are simply dangerous to use without HTML knowledge. Bottom line: HTML is not a dying skill, it’s an absolute must. A ‘web programmer’ that states (s)he doesn’t know HTML will never get it to a job interview, (s)he will be scratched out at the first line of HR folks by a big fat red marker. Don’t call us, we’ll call you (or maybe not).

And, finally, (10), the good old Cobol. ~~Leave Cobol alone!~~ Oops, I meant, stop bashing it! A lot of people have been talking for decades about how dead Cobol is, and those are the same people that use hole in the wall machines, ticket booking services and other Cobol applications on a daily basis. Their salary is surely being calculated either in Cobol or in ABAP (which is another step from Cobol to hell). Nobody seems to be aware that most business-critical software is run in Cobol. Why? Because it works. It just works. And I’m not talking about some legacy systems that maintenance programmers are afraid to touch—I’ve been to a couple of Cobol conferences and talked to people from industry (IBM, ABN AMRO, ING Group, Micro Focus, SOGETI, Getronics-PinkRoccade, etc), they are well aware of all pros and cons, and they want to stick to Cobol, at least for backend. SOA is not used to get rid of Cobol, but rather to support it and to introduce new interfaces to the same applications. New applications are being developed in Cobol, too. It takes a lot of time to train a Cobol programmer, and let’s be honest, it is not the most exciting job in the world, but there’s always a market for that, and Cobol programmers and mainframe experts are paid much, much better than Java programmers. Not the least reason for that is the constant lack of new blood.

Python pains, part 1

I’ve been a devoted python hacker for almost a decade, and I don’t plan to back off (at least unless Python 3000 forces out 2.x). I program in dozens of other languages, but whenever I need to hack something up, python’s almost always the choice for me. Still, there are things I miss/dislike about it, and I plan to dedicate a couple of posts to them.

The first thing I don’t like is the semantics of list methods. Those methods always disrupt the fp-ish flow of my script. I want to write:

But I have to write:

And that only if I won’t need the original value of a!

The point is: if I ever want to mutate the original list, I’d be willing to write something like a = a.reverse(). On the other hand, assuming that this is what I always want is far-fetched and limiting to my functional way of thinking.

The universal solution would be to make a library of wrappers like this:

Which is ugly (see for yourself), not integrated (a.append(b) vs append_(a,b)) and involves a lot of copying—more than probably really required. So, for better or worse, I end up each time implementing only wrappers specific to my current app.

Spoonful of Hacking