Code to nowhere

Blog :: N. T. Rutherford


I’m currently writing the approach and case-study/prototype sections of my thesis, to get off the paper trail for a while and move towards evaluating something. While fleshing out these and other sections I’ve been on the lookout for references justifying the whole exercise. Here are a couple of things I found that might be of general interest:

I’m also on the lookout for something correlating poor availability and/or performance (especially due to Slashdot-style surges or ‘flash crowds’); ping me if you have something in mind.

P.S. The ogre/troll story might still happen, but not in the near future: they keep moving around.

Filed under distributed systems qos thesis web


Pretty Ubuntu terminal colours

Solarized is an interesting general-purpose colour theme for the terminal, editors, etc. Setup on Ubuntu isn’t explained on the project page, but essentially you can install it in your editor/IDE of choice by following the appropriate project page (recall that Sublime Text 2 consumes TextMate themes). GNOME Terminal is a little more fiddly, but fortunately there’s a script for that. I was put onto it by the blinks theme from oh-my-zsh.

Filed under ubuntu linux terminal colours


You know you’re making progress when…

…you realise that you significantly misinterpreted the initial work.

When that’s caused by opaque or inconsistent writing it’s quite annoying. It now appears that my system will be more widely applicable than I’d thought, as it operates on key ranges or partitions rather than individual files. It still depends on “actuation points” in the storage systems for manipulating the ranges and choosing which servers are involved, but this is less constraining than I’d previously thought. Time to revisit the related work and the second system-integration target.

That’s good, actually, I think? What worries me is that I don’t think I’d have realised this without access to the source code. I’m picking the paper apart now to find out whether I misread it or it’s just not properly explained, but I’m getting the feeling that their terminology is inconsistent and there’s a section missing from their system outline.

I’d like to give feedback on this, a paper review if you like, but as it’s already been published it’s too late. That seems unfortunate. Why don’t we have a standard place to go and comment on papers? If we do, then it’s not standard enough yet. This could ease the problem of reproduction and falsification papers being in the minority (or, in my view, wholly absent), in systems research at least. If difficulty in publication is the issue, then let’s sidestep the publishers. This is somewhat paradoxical, given they’re there to enforce peer review.

On a more positive note, I’m very grateful to Berkeley’s CS lab for open sourcing their prototype code for closer examination by the outside world. This is a brave, bold, and potentially embarrassing move, but to my mind essential to peer review. You may write one thing in your paper, but have done something quite different in your code, perhaps unintentionally (bugs?). Indeed, issues with instrumentation code could scupper your paper’s results section. It’s also great for people wanting to build on your system, be it your suggested future work, or something they’ve thought of.

For reproducibility it could be argued that the next researcher should start again and reproduce the whole system, to see if they get the same result. While I appreciate the idea, I think it’s unrealistic to expect everyone to start again: systems can be prohibitively large. Moreover, what would such a reproduction aim to eliminate, software bugs or measurement error? You don’t eliminate bugs by starting again; rather, you produce new ones.

TLDR: +1 for open-source prototypes, +1 for having post-publication peer-review, -1 for “bad”* writing.

* the paper’s actually quite good, and it’s hard to cover everything for everyone.

Ed: this Scientific American article “Secret Computer Code Threatens Science” is worth a read.

Filed under thesis


C++ in a day for the lightly seasoned

The following is half summary, half rant about the C++ I was able to pick up in a day. I have a background in C, Java and Ruby, so a lot of it was already familiar, but the syntax, idioms, and some of the language features are still quite different. My aim was to pick up enough to understand how to do constraint programming with Gecode, though I think I went a little beyond that.

I started at Google’s code school, but wouldn’t recommend it unless you have no idea what you’re doing and a lot of time to spend on it (see the footnote). I found http://www.cplusplus.com/doc/tutorial and http://www.cprogramming.com/tutorial helpful. This article on C++ from Java could be interesting, but I’m going to skip it and get on with the Gecode homework assignments now.

The first half of each tutorial I’ve seen is essentially a C tutorial, and can be skipped if you’re comfortable with control statements and basic pointers.

Control structures

The same as C.

Objects

There are some confused tutorial writers out there who think “class” and “object” are synonyms, but never mind that for now.

Class definitions look a lot like Java, but with some important differences.

Member functions (methods) can be declared in the class body but defined somewhere else, typically in a separate source file. I like this idea a lot, as it provides a clean interface, but have no idea how well it scales.
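A minimal sketch of that split, using a made-up Greeter class: the class body only declares the constructor and greet(), and the definitions follow outside it, tied back to the class with the Greeter:: prefix. In a real project the body would normally sit in a header (say greeter.h) and the definitions in greeter.cpp; both are shown in one listing here so it compiles on its own.

```cpp
#include <string>

// Interface: the class body only declares its member functions.
class Greeter {
public:
    explicit Greeter(const std::string& name);
    std::string greet() const;

private:
    std::string name_;
};

// Implementation: the definitions can live elsewhere; the Greeter:: prefix
// (scope resolution) says which class each one belongs to.
Greeter::Greeter(const std::string& name) : name_(name) {}

std::string Greeter::greet() const {
    return "Hello, " + name_;
}
```

Callers only need to see the class body, e.g. Greeter("Ada").greet() returns "Hello, Ada".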

Object instantiation is a lot more concise than in Java (no assignment needed!), but be aware of the difference between variables that hold objects and pointers to objects. Also read up on polymorphism through pointers and virtual functions, and compare how virtual functions work in other languages. I prefer how Java and Ruby abstract this: methods there are dispatched dynamically by default.
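A small sketch of the object-versus-pointer distinction, with hypothetical Shape and Circle types. A Circle variable holds the object itself; passing it by value to a function expecting a Shape copies only the Shape part (so the base-class name() runs), while calls through a reference or a pointer are dispatched to Circle’s override because name() is declared virtual.

```cpp
#include <string>

struct Shape {
    virtual ~Shape() = default;                         // safe deletion through a Shape*
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// By value: only the Shape part of the argument is copied ("slicing"),
// so Shape::name() runs even when a Circle is passed in.
std::string by_value(Shape s) { return s.name(); }

// By reference: the virtual call is dispatched on the object's real type.
std::string by_reference(const Shape& s) { return s.name(); }

std::string demo() {
    Circle c;        // the variable holds the object itself, no new needed
    Shape* p = &c;   // a pointer to that same object
    return by_value(c) + "," + by_reference(c) + "," + p->name();
}
```

demo() returns "shape,circle,circle". The by-value case has no direct equivalent in Java or Ruby, where variables always hold references to objects.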

The relationship between classes and structs (unions) is interesting, coming from languages where objects are something special. Emphasis seems to be put on memory locations rather than object notions such as encapsulation. I’m sure there are other talks/tutorials which improve on this (perhaps here) but I found this interesting.
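To make the relationship concrete, a minimal sketch (PointS, PointC and Number are made-up names): struct and class declare the same kind of type and differ only in whether members are public or private by default. A union is also a class type, but its members share the same storage.

```cpp
// struct: members (and base classes) are public unless stated otherwise.
struct PointS {
    int x = 0;
    int y = 0;
    int sum() const { return x + y; }   // structs can have member functions too
};

// class: members (and base classes) are private unless stated otherwise.
class PointC {
    int x = 0;
    int y = 0;
public:
    void set(int nx, int ny) { x = nx; y = ny; }
    int sum() const { return x + y; }
};

// union: all members overlay the same memory, so the union is only as
// large as its largest member (plus any padding).
union Number {
    int i;
    double d;
};

int point_sum() {
    PointS s;
    s.x = 1;        // fine: x is public
    s.y = 2;
    PointC c;
    c.set(3, 4);    // c.x = 3; would not compile: x is private
    return s.sum() + c.sum();
}
```

point_sum() returns 10.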

Having seen how inheritance works (or doesn’t: friendship, for example, isn’t inherited), it may be confusing to see constructors that seem to inherit from things that aren’t class names. Actually, “:” is overloaded here: after a constructor’s parameter list it introduces a member initializer list, which can call base-class constructors and initialise data members. This confused me a lot, and is what pushed me into spending yesterday on this.
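A minimal sketch of a member initializer list, with hypothetical Animal and Dog classes. After the colon in Dog’s constructor, Animal(name) calls the base-class constructor (a class name), while tricks_(tricks) initialises a data member, which is not a class name at all.

```cpp
#include <string>

class Animal {
public:
    explicit Animal(const std::string& name) : name_(name) {}
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

class Dog : public Animal {
public:
    // Everything between ':' and the body is the member initializer list.
    // Animal(name) runs the base-class constructor; tricks_(tricks)
    // initialises a data member before the (empty) body executes.
    Dog(const std::string& name, int tricks) : Animal(name), tricks_(tricks) {}
    int tricks() const { return tricks_; }

private:
    const int tricks_;  // a const member must be initialised in the list;
                        // assigning to it in the body would not compile
};
```

Members are initialised before the constructor body runs, which is why const members (and references) can only be set this way.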

Being able to specify public/protected/private for inherited classes, and having direct multiple inheritance, are interesting aspects of the language, though, as usual, the same effects can be achieved through other patterns such as composition.
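A sketch of both features, using made-up Named and Counted base classes: Widget inherits publicly from two bases at once, Hidden inherits privately (so Named’s members are usable inside Hidden but not by its callers), and Composed gets the same behaviour by holding a Named member instead, which is the composition alternative.

```cpp
#include <string>

struct Named {
    std::string name() const { return "widget"; }
};

struct Counted {
    int count() const { return 1; }
};

// Multiple inheritance: a Widget is both Named and Counted, and public
// inheritance keeps the bases' public members public.
struct Widget : public Named, public Counted {};

// Private inheritance: name() is usable inside Hidden, but
// Hidden().name() would not compile from outside.
class Hidden : private Named {
public:
    std::string label() const { return "hidden " + name(); }
};

// Composition: hold a Named instead of being one, and forward what's needed.
class Composed {
public:
    std::string label() const { return "composed " + part_.name(); }

private:
    Named part_;
};
```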

Templates

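Are roughly the C++ counterpart of Java generics (e.g. List<Integer> and List<String>), though they work differently: the compiler generates a separate instantiation for each type argument instead of erasing the types as Java does.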
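A minimal sketch of a function template over std::list (largest and square are illustrative names, not library functions). The same definition works for std::list<int> and std::list<std::string>; unlike Java generics, templates can also take values, such as the int N in square, as parameters.

```cpp
#include <list>
#include <string>

// The compiler stamps out largest<int>, largest<std::string>, ... as needed.
template <typename T>
T largest(const std::list<T>& items) {
    T best = items.front();              // precondition: items is not empty
    for (const T& item : items) {
        if (best < item) best = item;    // T only needs to support operator<
    }
    return best;
}

// A non-type template parameter: N is a compile-time integer.
template <int N>
constexpr int square() { return N * N; }
```

For example, largest on a std::list<int> holding 3, 9 and 4 returns 9, and square<7>() is the compile-time constant 49.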

STL

This was disappointing: from what I’d heard in the past, I thought it was some kind of macro system. Instead it’s the C++ counterpart of Java’s collection classes, providing containers such as std::list and std::vector along with algorithms that operate on them. There’s a tutorial, too. Of course, coming from Java and other languages with reasonable standard libraries this isn’t a big deal, but I’m forgetting that C came with no such containers and you had to write, find, or buy everything for yourself.
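As a small taste of the containers and algorithms working together, here’s a sketch that counts words with std::map and orders them by frequency using std::vector and std::stable_sort (by_frequency is a made-up helper). Roughly, std::vector plays the role of Java’s ArrayList and std::map that of TreeMap.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns the distinct words, most frequent first.
std::vector<std::string> by_frequency(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;          // keys kept in sorted order
    for (const auto& w : words) ++counts[w];    // operator[] inserts 0 if missing

    std::vector<std::pair<std::string, int>> entries(counts.begin(), counts.end());
    std::stable_sort(entries.begin(), entries.end(),
                     [](const std::pair<std::string, int>& a,
                        const std::pair<std::string, int>& b) {
                         return a.second > b.second;   // descending by count
                     });

    std::vector<std::string> result;
    for (const auto& e : entries) result.push_back(e.first);
    return result;
}
```

With the input {"b", "a", "b", "c", "a", "b"} it returns {"b", "a", "c"}; the map visits keys alphabetically, and stable_sort preserves that order among words with equal counts.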

Memory management

Godspeed.
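Godspeed indeed, but one thing worth knowing early: since C++11 the standard smart pointers take care of much of the bookkeeping. A minimal sketch (Resource is a made-up type that counts its live instances) contrasting manual new/delete with std::unique_ptr, which deletes its object automatically when it goes out of scope. This is the RAII idiom: tie a resource’s lifetime to a scoped object.

```cpp
#include <memory>

// Counts live instances so we can observe when destructors run.
struct Resource {
    static int live;
    Resource() { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// Manual, C-style: every new needs a matching delete on every path out of
// the function, including early returns and exceptions.
void manual() {
    Resource* r = new Resource();
    // ... use *r ...
    delete r;
}

// RAII: the unique_ptr owns the object and deletes it when r is destroyed,
// however the scope is left.
void automatic() {
    std::unique_ptr<Resource> r(new Resource());
    // ... use *r ...
}

// Ownership can be handed to the caller; the object then lives until the
// caller's unique_ptr is destroyed.
std::unique_ptr<Resource> make_resource() {
    return std::unique_ptr<Resource>(new Resource());
}
```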

Footnote, on teaching C++ to beginners

The Google code school for C++ provides some well-meaning practical examples, sets the activity in context by explaining some things they do at Google, and gives examples of other interesting software, but largely delegates the tuition to outside tutorials. Those tutorials are incomplete and not so hot.

To be fair to the authors, though, writing a beginner’s guide to C++ is a futile endeavour: C++ is a terrible language for beginners. There are too many concepts to cram into a small space. When you’re teaching a concept, avoid detours. If you can do a one-line hello world, that’s great. If you have to explain stdio, compilation, linking and platforms to get there, you’ve already lost. I remember some years ago being very confused by stdio and how it related to my program. If you’re writing in Ruby or some other higher-level language you just don’t need to see these things; they’re abstracted away. Don’t try to teach systems concepts, object concepts, procedural programming concepts and algorithms (etc.) to someone, especially me, all at once!

I wish I’d picked up PHP, Python, or even Lisp instead of buying C++ books when I was younger. Alas, I wanted to work on Quake mods. Never got past chapter 3/4/5 on structs/pointers/malloc. Now, some years on, having used other languages to build things, learn object concepts, pick up system concepts through shell usage etc, it all fits together and seems trivially simple. Back then, it was overwhelming. :)

Now may the search-space pruning commence!


LaTeX for Ubuntu, 2012, part 2

I’ll wrap this up tersely, as I should really be getting on with my thesis.

I’ve now settled on the following:

  • Sublime Text 2
  • Zotero
  • Makefile & latexmk, either from terminal or ST2 build integration
  • ST2 LatexTools is promising, but needs work, hence the Makefile.
  • Okular and Adobe acroread for PDF viewing
  • Version control & backup: git, GitHub, Dropbox

Straight to the point: if you’re a TextMate user migrating to Linux I can’t recommend Sublime Text 2 highly enough. It’s the only thing that comes close, without investing time in mastering emacs or vim. Kile is interesting, and if you’re familiar with KDE and like Kate then it’s a great option, but I just didn’t like it. I never tried writing anything in Texlipse, and didn’t give the vim extension a go.

While nosing around the ST2 LatexTools code and issue tickets I discovered latexmk, a perl hack which runs latex enough times to resolve all the missing references and whatnot. Much better than clicking a button 5 times as in TextMate. I’ve thrown together a very basic template.

I’m going with Zotero over Mendeley for reference management, for the time being, since LatexTools has a nice citation completion feature, which makes Zotero’s lack of a citekey export feature moot. I may reconsider this in future, but for now I want the bib file to be an encapsulated part of the project, not a dependency on a system-wide file provided by Mendeley. I don’t know what I’ll do about the agglomeration of papers I have scattered between the two programs; another job for the summer.

Okular is a PDF viewer for KDE, but installs easily on Ubuntu through the software centre, and works reasonably well in gnome 3. However, it’s a compromise. While inverse search is nice to have (shift-click text in the pdf file, and you’ll be taken to the tex source file where that text appears), the pdf rendering is abysmal and actual-size is not actual-size due to screen dpi bugs. Evince (Gnome’s PDF viewer) does get the size right, if you fix your screen’s DPI settings (with xrandr), but doesn’t have menus or documentation indicating whether it does inverse search. It also doesn’t render things very well. Acroread integrates with nothing but the mothership, though it does render things properly, and even lets you specify your DPI as an application preference. It provides a better reading experience than evince and okular, but is less convenient while editing. None of them are as good as Preview.app or Skim on Mac OS, though I very much like how Okular’s selection tool works.

So, tools picked, it’s time to get to work.

Filed under ubuntu latex


Abstract #1

It seems strange writing an abstract for a literature study, but then I’m not in the habit of submitting independent literature studies.

This study comprises the literature and system surveys performed for the masters thesis project “Self-optimisation for elastic Cloud storage services”. It introduces the concept of self-management, in particular self-optimisation, and its application to storage systems for cloud computing platforms. Furthermore, to provide a basis and motivation for the project, it reviews recent work in the domain, and pertinent concepts from AI, reinforcement and statistical machine learning, and constraint programming.

The result of this study will be used to extend the Berkeley SCADS Director, with the aim of reducing provisioning costs by exploiting predictable traffic patterns.

Now for the hard part — the rest of the document.

Filed under thesis cloud self-management self-optimisation


Today, many programmers believe that this complexity is best managed by using only a small set of well-understood techniques in their programs. They have composed strict rules (best practices) about the form programs should have, and the more zealous among them will denounce those who break these rules as bad programmers.

What hostility to the richness of programming—to try to reduce it to something straightforward and predictable and to place a taboo on all the weird and beautiful programs! The landscape of programming techniques is enormous, fascinating in its diversity, and still largely unexplored. It is certainly littered with traps and snares, luring the inexperienced programmer into all kinds of horrible mistakes, but that only means you should proceed with caution and keep your wits about you. As you learn, there will always be new challenges and new territory to explore. Programmers who refuse to keep exploring will surely stagnate, forget their joy, and lose the will to program (and become managers).

Marijn Haverbeke, Eloquent Javascript.

I’m hovering somewhere around the last sentence at the moment. Time to stop worrying about style and get on with making things.


LaTeX for Ubuntu in 2012

This post will skip over my installation process, and give an initial opinion of the Texlipse TeX editor for Eclipse. In future I’ll aim to comment on which LaTeX editor I found best, and whether I settled on Zotero or Mendeley for reference management (where they should be helpful in finding or importing documents, and in working with LaTeX documents).

Installation

Coming from a Mac, where installing MacTeX (TeX Live) is very easy, this was excessively time-consuming. TeX Live is the way to go. On Ubuntu, or other Debian-based systems, this is a bit fiddly, as the deb package is two years out of date. Fortunately, installing TeX Live 2011 directly is pretty easy, and there’s a work-around to let the package manager know that you’ve installed something else.

I hadn’t seen equivs before, so it was a useful learning exercise. Essentially it lets you build a dummy package declaring that your custom installation provides a set of apt packages, so apt won’t install them as dependencies of other things. This is important when installing something like auctex, which will otherwise try to install the old TeX version, something to avoid.

Getting some work done - editing and previewing LaTeX->pdf

Another challenge is finding a good editor. I’m led to believe there are a few options, and I’ll go through the ones I’ve tried. My previous experience is using TextMate to edit and build, which worked relatively well, and had reasonably good templates, snippets, and documentation integration, so my expectations are high.

Over the coming weeks I’ll be trying out:

  1. TeXlipse, an Eclipse plugin
  2. TODO: AUCTEX, an emacs mode
  3. TODO: vim-latex
  4. TODO: kile

TeXlipse Eclipse plugin

I’m not an Eclipse user unless I’m writing Java (which is rare), and I’ve never seen an Eclipse plugin I liked, so I’m sceptical about this option. Still, it was recommended by my thesis supervisor, and came out well on Wikipedia’s feature comparison so I thought I’d give it a spin. Installation was trivial.

  1. Install Eclipse (I used Eclipse Indigo from the Ubuntu software centre)
  2. Follow the installation guide, installing TeXlipse and the PDF viewer plugin from the same update site. Note that the Eclipse menus have changed, so you’ll have to fight with it a bit, but it’s still pretty easy, and given your goal I’ll assume you can figure it out.

Having done that you’ll be able to start a new LaTeX project in Eclipse. I was impressed by the structure options provided by the setup dialogue: not essential, but an improvement over the one-directory mess I’d have had previously. I pushed output PDFs into print/, tex files into src/ and left tmp as it was. That done, I was confronted with the LaTeX perspective and my empty thesis to fill in. I must say, it was really weird seeing LaTeX source in Eclipse rather than Java.

Further configuration: http://texlipse.sourceforge.net/manual/configuration.html

The viewer will do strange things out of the box; you can use half of the solution found here to get the external viewer preview button working.

Good points

  1. Build management is handled for you
  2. Project file layout configuration is nice to have
  3. Autocompletion is essential for any Eclipse plugin - it’s there.

Bad points

  1. Window management is quite broken; admittedly this is Eclipse’s fault, but dependencies are like that. I’m using GNOME 3 on Ubuntu 11.10 with Sun Java 6 (installed via the hacks mentioned on the Ubuntu Java documentation page). For example, scroll-bars are difficult or impossible to control, and internal windows/panes jump away from you when you try to move them.
  2. Viewing/previewing integration is undocumented and broken out-of-the-box. Very poor user experience compared with the Mac OS experience using Textmate and Skim via synctex, or even Textmate’s internal viewer.

More to follow when I’ve tried using it for a while, and also the other editors.

Filed under tex latex ubuntu


All Change - Controlling Elastic Cloud Storage

Recently I found out that my supervisor will be unavailable, and that I should change my thesis topic. My feelings on this are a mixture of trepidation and optimism. The time constraints are not ideal, but the new topic is interesting and the supervision situation is much improved.

I’ll now be working (in KTH’s Software and Computer Systems department) on controlling cloud storage elasticity, specifically cost optimisation for horizontally scalable storage, e.g. key-value stores, with machine learning techniques. That is, typically a website will have more visitors during daytime than at night, so at night we can turn off servers to save money - this is complicated by file availability, popularity (load balancing), and file relocation speed and cost.

My starting point is The SCADS Director: Scaling a Distributed Storage System Under Stringent Performance Requirements by Trushkowsky et al. We’re interested to see whether their approach can be improved with respect to diurnal (day/night) workload shifts. My belief at present is that their spike handling is good, but their recovery to a cheap global state, when there is more time available to reorganise, can be improved. That said, this needs to be measured, as their system’s heuristic approach has a simple elegance.

Significant parts of this work will be identifying problems addressed by existing work that may be reframed as machine learning problems, surveying cloud-based storage systems to find those that provide enough control to integrate with the controller, and building a prototype controller whose performance can be measured against alternative approaches. It’s still quite preliminary at the moment. I have some ideas, and will get feedback on them next week when presenting a project outline.

It’s good to be working with machine learning again, as it’ll help me decide between that and distributed concurrent programming as areas for my PhD. I have been (provisionally, pending a rubber stamp) accepted on the EMJD-DC programme, and should be starting at UCL in September. Brussels is a beautiful city, and I’ll be living either there or close by. I’ll need to find some time to brush up on my French in the coming months. It’s been a long time, but I think it’ll be fun, especially when the films and books start making sense.

Filed under thesis emdc


Installing a current Scala version on Ubuntu 11.10

Based on this post and this env documentation.

  1. Download the archive from the Scala site (you could probably use the jar file and mv it)
  2. Extract it and move it into /opt (with sudo)
  3. Edit /etc/environment to add Scala’s bin directory to your PATH
  4. Log in again so the new PATH is picked up, then run scala

More details are in the posts above. You may prefer to symlink the Scala binaries into a directory that’s already on your PATH rather than modifying it, but that’s more work and more to uninstall.

Ideally apt-get’s version would be updated. Using /etc/environment has the advantage of being system-wide rather than specific to e.g. bash login shells, so it will work in other shells (zsh) and with cron.

Update: be careful with $vars in the environment file. It isn’t a shell script, so variables such as $PATH aren’t expanded there, and if the PATH line is broken you’ll no longer be able to log in to your account (and will have to fix the file in a terminal without any working PATH; good luck finding the full program paths for sudo, cat, and whatnot!).

Filed under scala ubuntu