Archive for the ‘Programming’ Category:


This is why you use git

I wanted distributed version control before there was distributed version control.  When it finally became practical, I looked at the options and chose Mercurial – because Mercurial clearly works like a distributed version control system ought to, while git was… let’s say idiosyncratic (because ‘weird looking’ would be rude).

Recently (when we open sourced XenServer) I found myself having to get to grips with git and GitHub.  Over the last year or two I’ve begun to understand why git turns out to be better in many respects for real-world version control use cases (except for managing patch queues on Windows – hg still wins there).

Today gave me an example of ‘one of those times’…

My problem was:

I had submitted a patch to a public repository – but it turned out to have problems, and for reasons of time constraints I had to revert it.  Since this was all done on GitHub, I used the very handy GitHub revert feature.  The upstream repository then progressed somewhat without me.

Meanwhile, I fixed the problems in a local private repo, which didn’t pull in the revert from upstream.  So I ended up with branches looking like:

Upstream: —–Patch A—-Changes—-Revert Patch A—-More Changes

Local:         —–Patch A—-Fixes for Patch A—-More fixes for Patch A

What I wanted to submit to upstream was Patch A again, with all fixes, rebased to work on top of the existing upstream… all as one big patch.

How I managed this was:

Clone the upstream branch into my personal GitHub repository.

Create a pull request against my clone’s upstream branch which reverted ‘Revert Patch A’ – and merge it, giving:

Personal Upstream : —–Patch A—-Changes—-Revert Patch A—-More Changes——-Revert “Revert Patch A”

I then fetched this into my local git repo.

I rebased my previous local branch against this branch, leaving me with a branch containing:

—–Patch A—-Changes—-Revert Patch A—-More Changes——-Revert “Revert Patch A” —–Fixes for Patch A—-More fixes for Patch A

And finally, another interactive rebase let me squash things down to:

—–Patch A—-Changes—-Revert Patch A—-More Changes——-Patch A (2nd attempt)

 

This was then sent back up to GitHub so I could issue a pull request to merge to upstream.
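In command-line terms, the whole dance went something like this (a sketch rather than a real transcript – the remote name, branch names and commit id are made up, and in my case the revert-of-revert was actually created through a GitHub pull request rather than at the console):

# In my clone of upstream: undo the upstream revert
git revert abc1234                        # creates Revert "Revert Patch A"

# In my local repo: fetch that branch and rebase my fix branch onto it
git fetch personal upstream-copy
git rebase personal/upstream-copy fixes-for-patch-a

# Interactively squash the revert-of-revert and all the fixes into one commit
git rebase -i upstream/master             # mark everything after 'More Changes' as squash

# Push back up, ready for the pull request
git push personal fixes-for-patch-a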

 

This certainly isn’t a workflow you would see in a non-distributed VCS – but it also isn’t the sort of workflow you would expect to see with Mercurial (which has more of a policy of not changing history, rather than trusting you and giving you the tools to perform patch management).

Opening Up XenServer – Reworking XenIface.sys

When we first began to discuss the specifics of open sourcing XenServer I had to raise my hand and say “No”.  Not because I didn’t want the software to be opened up, but because I knew that one of the modules I look after (XenIface.sys – the XenServer Windows PV Interface driver) had code in it which we wouldn’t be allowed to ship under an open source licence (we could ship the binaries – we could even give away the source code, but we couldn’t let other people give that code away).  We had to come up with a new plan.  And the plan that was agreed on was the one I suggested:

The interesting bits of XenIface are in files which we unambiguously own.  But there is no chance we will ever get the right to give the rest away.  Nor is there any chance of us having the time to rewrite that code by the date XenServer goes open source.  So what we can do is just put the files we own out into the world, and rewrite the rest in our own time.

This was a dangerous plan.  Specifically because I know how corporations work, and once the buzz of hitting the ‘We’ve open sourced everything’ milestone has passed, people will want new features, not just rewrites of perfectly functional code.  It would be too easy for things to get missed.

The last couple of weeks have seen a lot of excitement.  We opened up (most of) the code.  The XenServer team all gathered in a kitchen at work to drink Champagne and celebrate.  There was also the release of Citrix XenServer 6.2 accompanying the open sourcing.  And (on the Windows team at least) we started our new development processes with all the code changes happening out in the open, going via GitHub.

But behind the scenes, I’ve been getting some programming done too.  Specifically, I managed to find the time (quite where from, I’m not sure) to rework XenIface, strip it of the code we don’t own and get it working again.  The way I did this was quite simple: I took the bottom end of the XenVif driver, pulled out all the networking and providing-PDOs-for-NDIS-drivers-to-attach-to code, and then plumbed in the remaining parts of XenIface.

The code isn’t quite ready for public consumption yet, but:

It installs, enables, disables and uninstalls nicely

It passes our dev tests (which we use to ensure the WMI and IOCTL interfaces don’t change)

It has passed the HCK successfully (once I found the filter for something I was pretty sure was an HCK bug and persuaded it to install)

 

This is all, primarily, a testament to the quality of the XenVif code, which was so easy to pull apart and put into the XenIface driver.

Before we put it out in the world, I just want to prove that our internal system testing won’t show up any regressions – but I think we’re likely to be safe, as the changes to the code which provide the functionality have been very minor.

I wasn’t really expecting to see this code finished for another month or two (and was quite scared it would never happen at all).  Instead it looks like it’ll be out the door well ahead of schedule.  And once XenIface is in place, that’s the complete driver set needed to let you run the Guest Agent and have the full PV-tools-optimised Windows guest experience.

 

It would just be nice if it was easy to install.

 

The good news there is that the installer package – the final missing part of the PV tools – is also close to being releasable.  We didn’t ship that because we didn’t have confidence that the click-through licence it displays was the right thing to display.  Well, I’ve finally had word back from legal about what we should be displaying, and so there is only a bit of plumbing to do before that can be put up in all of its glory too.

A debugging war story

The bug had been around for most of the last year.  It was intermittent and a pain to replicate.  We hadn’t heard anything about it from our customers, but our test system threw the errors up when we performed our nightly runs.  And the failures were all slightly different, with different tests failing each time.

All we knew was that we were blue-screening, and that the problem was something to do with memory corruption.

The first thing to do was look at the crash dumps.  And the good news was that our drivers weren’t on the stack.  The bad news was that more and more testing showed the problem only occurred when our drivers were loaded.  You see, the problem was clearly a memory corruption – but the crash wasn’t happening at the time memory was being corrupted – no, the crash was happening when the memory got freed. You couldn’t tell who corrupted the memory, just that it got corrupted.

We tried Driver Verifier.  As soon as we put Verifier to work on the driver we suspected, all the problems went away.

We did find that close to where the corruptions occurred there was often a memory pool allocated with the tag ‘HAL’.  What was interesting about this pool, which looked like some sort of mapping between addresses and page frame numbers, was that it seemed to have one entry too many – it had overflowed the space allocated for it.  The good news: it wasn’t one of our pools.  The bad news: I was beginning to suspect that something like a double free of memory had caused this situation to arise.

Because we thought our driver might be causing this, we added all the instrumentation in the world to its memory allocations and frees.  But this didn’t show anything up.  The driver seemed to be working perfectly.

We were close to giving up.

One of our test engineers went through the test logs and came up with a set of situations most likely to cause the problem.  With a bit of effort he made a reproduction of the bug that could happen in about an hour – much better than the six-hour repro we had earlier.  One of the things he found was that the issue mainly happened on Windows 2008 SP2 32-bit.

We then went through ruling out any number of potential hypotheses.  Everything from ‘Was a DVD in the drive at the time?’ to ‘Does it only happen on machines with 2 CPUs?’.  Once we had ruled out the impossible, whatever remained, however unlikely, was sure to be the cause.

Unfortunately, we ended up with the same suspicious driver.  And the same lack of clues.

Not knowing where else to look, I tried reproducing the reproduction on a checked build of 2008 SP2.  I didn’t hold out much hope.  We frequently use checked builds in developing our code, and this issue looked to be timing-specific – the checked build was going to play havoc with the timing.

I installed the drivers, rebooted, and:

Assertion failed: RaidIsRegionInitialized

OK.  Great.  What now?  Google was our friend.  Well, almost.  We found two results.  One was an MSDN page which didn’t mention anything about this.  The other wasn’t clear, but had a few lines of hope:

“You may need to call GetUncachedExtension, even if you’re not going to use it. IIRC, on Win7 Storport would allocate the DMA adapter object during the GetUncachedExtension context. Your adapter likely doesn’t have any DMA restrictions, so Storport probably doesn’t really need the DMA adapter object, which is why everything works without the call.”

http://lists.openfabrics.org/pipermail/nvmewin/2012-March/000075.html

And, as it turned out, we did.  We did need to call GetUncachedExtension, even though there was no reason for us to do so.

One line fixed our Storport driver, removed the bugs, fixed everything.
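For the curious, the shape of the fix was roughly as follows – a minimal sketch from memory rather than the actual diff: the callback skeleton and the extension size are illustrative, but StorPortGetUncachedExtension, SP_RETURN_FOUND and SP_RETURN_ERROR are the real Storport API:

#include <storport.h>

/* Our HwStorFindAdapter callback, heavily abbreviated.  The fix is the
 * single StorPortGetUncachedExtension call: we never touch the memory,
 * but (per the mailing list post above) this call is where Storport
 * sets up its DMA adapter object on some OS versions. */
ULONG
HwFindAdapter(
    PVOID DeviceExtension,
    PVOID HwContext,
    PVOID BusInformation,
    PCHAR ArgumentString,
    PPORT_CONFIGURATION_INFORMATION ConfigInfo,
    PBOOLEAN Reserved3
    )
{
    /* ... existing adapter discovery and ConfigInfo setup elided ... */

    /* The one line: request an uncached extension even though we have
     * no use for it (the size here is arbitrary). */
    if (StorPortGetUncachedExtension(DeviceExtension, ConfigInfo,
                                     PAGE_SIZE) == NULL) {
        return SP_RETURN_ERROR;
    }

    return SP_RETURN_FOUND;
}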

A year of irritating, intermittent bluescreens gone.  And a good way to understand, roughly, what was happening: Microsoft Windows was freeing memory which we had never asked it to allocate.  More or less a double free.

It’s astounding how often my job ultimately comes down to being a Google Monkey.  But there was a lot of work to lead us to Google.  And some bad luck too – we used checked builds a lot, but – it turns out – not the 2k8 checked build, which was the one that had the assertion.  We only used that this time because 2k8 was part of the repro we found.

But figuring this out is something that our team (and it was absolutely a team effort) can be proud of.

Today is a good day to code.

The Art of Being Invisible


Recently Citrix commissioned a survey into the public perception of cloud computing, and it went ever so slightly viral.  Which was presumably the intent – to get magazines and websites to publish articles which link Citrix with cloud computing, rather than actually to learn anything new about the cloud.  I have nothing against this – Citrix is a big player in the growing cloud, but anyone who hasn’t noticed this (and many haven’t) probably still considers them to be ‘those MetaFrame people’ – so any PR that works is probably a good thing.

What I found out from watching this unfold was:

Not many people writing articles about surveys actually link to the original source

Even when I got to the original source, I wasn’t able to locate the survey questions people were given, or the responses to those questions – just the results, as digested by the company.  Which means I have absolutely no idea of the context in which to put the results.

Most people who actually reported on the survey didn’t seem to care.  They pretty much parroted the press release data.  Again, as I would have expected – that seems to be what tech journalism is all about.  But it would be nice to see more people out there who get some interesting data and actually think about it – and its implications – before writing anything.

And finally, as the survey suggests:  Not many people know what cloud computing is.

Which isn’t a surprise, because it is a made-up term which loosely describes a whole bunch of tech industry trends.  In short, I think we can safely say it comes from those vague technical drawings of infrastructure where you might draw a few data centers, each with a bunch of servers and storage inside, then link them by straight lines to a picture of a cloud – often with the words ‘The Internet’ inside, to suggest the data centers were connected together via someone else’s infrastructure.  As people are increasingly hosting their technology on someone else’s infrastructure, rather than in bits of a datacenter maintained by company employees, we say that technology is in the cloud.

The public don’t know about this.  And frankly they don’t care.

And also they shouldn’t.

My day job is developing a key part of the infrastructure for the cloud.  Without it big parts of what we call the cloud wouldn’t work – or at best would have to work in a very different and less good way.  You will almost certainly have used part of this product in some way today.  And you probably don’t even realise it, or care.  So why don’t I care that no-one knows about the cloud?  Why don’t I wish more people would love my work and sing its praises?

Because, if I do my job well, my work is invisible.  Every time you notice anything about my work, any time you worry that it exists in any way, shape, or form, you’re keeping me up at night because I’m not doing my job well.

I’ll give you an example:  Electricity.  To get electricity there are power stations, huge networks of wires, substations, transformers, all ending up at a plug socket in your house.  You don’t notice these.  You don’t care.  Unless – that is – it all stops working… or perhaps you have some technical problem like trying to run a 110 volt appliance in the UK.  If electricity wasn’t invisible – if we had to ring up and request enough power for our TV set to run, then we would care more – and enjoy our lives a little bit less.

Cloud computing is actually all about making computing into a utility, just like electricity.  It is about not having to worry about where servers are.  It is about not having to worry about where your data is.  Now, some people have to worry about electricity – if you’ve ever set up a data center, you’ll know that you need to start caring about all sorts of issues which don’t worry the home owner.  Similarly, if you work in the IT industry, you’ll have all sorts of worries about aspects of cloud computing which end users simply shouldn’t ever have to care about.

So if you ask a man in the street about the cloud, he should remain more worried about the sort of cloud which rains on him.  And, to determine how worried he should be, he’ll probably ask Siri on his iPhone.  And not care about how Siri takes his voice input and uses vast numbers of computers to respond to it with data generated by meteorological offices who process big data over vast grids of computers.  He won’t worry about anything which goes on in between, any more than he worries about how to charge his iPhone when he gets home.

Consumers already have their heads in the cloud.  They don’t realise it.  And they don’t care.  Because they are already used to it.  To them the cloud isn’t anything new, it’s just how things are these days.  As for companies and programmers – we need to make the cloud less and less obvious, less and less difficult.  One shouldn’t need to think about doing something in the cloud, because that should be the easiest way to do things.  We have to take the blocks of code we put together, and make them blocks which work across the cloud as seamlessly as they currently work across CPU cores.  We need to stop thinking in terms of individual computers and individual locations – and those of us who build the code need to make it easier and easier to do this.

We are already on our way.  But would I want to be the number one cloud computing company?  No, I would want to be the number one computing company – because once everyone is in the cloud, the cloud vanishes, and we are back playing the same game we always played.

 

Rethinking Social Networks : Different Applications

 

Assuming we don’t want to replace Facebook, we are left with trying to use social networks in other applications.  These need to be applications that lots of people are going to want to use (otherwise the social aspect is useless), which perhaps have a viral way of grabbing people’s attention (using social to sell them), and which fundamentally are made better by being social.

When I’m thinking of the sorts of application which come along and grow big, my first port of call is to ask “What are geeks using right now which hasn’t caught on in the mainstream?”  There are two things that currently come to mind:

Bug Tracking Systems

and

Distributed Version Control

Now – clearly neither of these is a new idea (although I was thinking of distributed version control well before it hit the mainstream consciousness – but that’s another story).  And both of these exist to some extent outside of geekdom: you have a certain level of version control in various word processing systems and online data storage systems, and ticketing systems of various types exist in various industries (mainly industries with support desks).  So how do we make them different, and make them social?

Bug Tracking:

As I said, ticketing systems are used in many industries.  In my job I have to handle both the customer support ticket system and the internal bug tracking system.  In my time I’ve used quite a few bug tracking systems of various colours.  They generally have common characteristics:

Someone enters a bug into the system (we could generalise this as ‘someone enters a thing for you to do into the system’).  This raises a ticket.

They assign the ticket to the person they think is responsible

This person is made aware of the issue by an email arriving.

If they don’t think they are the person responsible, they pass the issue on to someone else (and that person gets an email)

 

Better systems let you say things like:

This particular task consists of multiple subtasks

and

Before I can work on this particular task, someone else must complete another task.

 

This is starting to look like a general ‘to do’ system.  Indeed, I’m astonished when I hear that most companies don’t use a system like this to manage their projects, and keep track of things that have to be done and when they are to be done by.  That also suggests to me that, given a more friendly user interface, we might be onto a winner.

So we’ll start with a single user ‘to do’ program.  They can enter tasks, and mark them as done.  They can also break them down into subtasks, and put dependencies between tasks.  All that requires is a friendly UI to make everything clear.  There are good examples on the net.

Now, let’s take a leaf out of TripIt’s book.  When you sign up to the todo app’s site, you get an email address.  You can forward any email you get assigning you things to do to that address, and it will get turned into a todo within the system (which may well be a todo along the lines of ‘TODO: Generate todo tasks from this email’).  The first social aspect is that each task will be associated with the original email – which means you can send an automated email back saying something like ‘I’ve identified the following tasks from your email – if you want to see that I’m keeping up with them, please go to this web page’.  Moreover, you could only allow someone with the email address you originally identified to log in to that site (using email-based authentication).

We can go further.  What if you generated a task which someone else had to do?  Now, it’s pretty bad form to say ‘here is something you’ve got to do, which I’ve already put into a task tracking system’ – but you could add a task ‘wait for a response from this person’ and send them a querying email from your todo system.  Moreover, if that person is already using the system, the email could be automatically redirected to their todo list box – which would mean they would have a task that you could monitor.  If they are not using the system, well, every email you send will have an advert encouraging them to give it a try.

Monetisation could come from apps (see passim), enterprise subscriptions (with walled gardens that won’t stay in someone’s account once they leave the company, and mass email subscriptions), or premium subscriptions.

 

Distributed Version Control

The concept here is that there exists some sort of document (or set of documents), wherein each person can have a copy and make their own changes, then pass the document on to someone else who already has a copy of the document, who can decide if they want to accept some or all of those changes into their copy.  There isn’t one true central copy.  Also, you can go back in time and see how the document has changed.

We geeks use it to keep track of our source code.

But in the real world it would seem great for managing that big collection of stuff you have to keep track of for a project (or, on a smaller scale, for a meeting).

I see it as being like this:

I have some sort of application where I can store various pieces of text, photos, lists, links to web pages, other documents etc., and keep them all together in one place.  In the old days that place would have been a file; these days it would be a web site somewhere.  Now, I might want to let someone else look at this collection of documents, while I might want to let others edit it.  Easy – I just set it to email them links to the document.  Now all those people have accounts where they can take a copy of the document, and where they can edit their own copy to their heart’s content.  Some will also have the ability to see what changes I’ve made since – and fewer still will have the ability to suggest I accept some of their changes (it would be something along the lines of ‘Show Ben These Changes’ in the UI.  To us geeks, I’m talking about a GitHub pull request).

Again, it is fundamentally social – and all the app I’m describing needs to actually be is something akin to a wiki – or HyperCard.

 

Go with either of these ideas, and you have the potential to exploit the still underexploited social arena.

Rethinking Social Networks : The App.Net move


Social networks are high in people’s minds right now.  Twitter is annoying its developers, trying to become an island rather than the convenient platform it used to be.  Facebook is a mess – a jumble of confusing options, an unfriendly interface, and adverts jumping out at every corner – it reminds me more of the pre-Google AltaVista than anything else.  And there is a reaction to this.  The Diaspora project seems to have gone nowhere, but newcomer App.Net has hit a Kickstarter target – and, by getting enough people to make a cash commitment, has become interesting.

App.Net makes two points:

  • At the moment, the customers of social networking sites are not the users, but the advertisers.  So long as the users are tied in, they will remain, and their eyeballs will be able to be exchanged for the contents of advertisers’ wallets.  A social network designed for users needs to be funded by the users – they need to be the customers
  • What makes a social network work is when it ceases to be a website and becomes a platform

It’s worth describing two geek fallacies before we continue:

Fallacy 1:  Any good internet project is distributed in nature.

This is the flaw of Diaspora.  We geeks love us some hard distributed systems problems, but they take away from the user the simplicity of going to a single place – the same place as everyone else – to get what they want.  Distributed technologies such as social media require people to provide servers – but these servers have to be paid for, so people will charge.  Charging isn’t too bad, except any such server must, by its nature, be a commodity; there is little room for differentiation.  It is hard to see why anyone would want to get into this game – see the decline of Usenet servers as an example.

Fallacy 2: It is all about the platform

UIs are for wusses.  What matters is the clever technology underneath.  This is both true, and false.  What matters to most users is that they get the features they are looking for – it doesn’t matter if the backend has some hyper-clever architecture or runs in Spectrum BASIC, if it does the job and keeps out of the way.  Geeks think differently – they want to know that their lives are going to remain easy as they interact with the system over time, so they design platforms which you can build good products on top of, but don’t care that much about the product.  I fear this might be what App.Net are doing.  I hope I’m proven wrong.

Where App.Net have been clever is in using Kickstarter for some cash.  Not because they needed the cash (if you can convince that number of individuals to pony up $50, you can probably convince some investors to do likewise).  Getting the cash gave App.Net some publicity – because Kickstarter is hot right now, and social networks are causing consternation – and for a social network to get going, it needs publicity.  But it also got a number of people to tie themselves into the service – and the sort of people who would fund a new social network are early adopters, the thought leaders in the social sphere, and this could be very important to App.Net’s growth.

But it could be more important to the people who paid for the developer licence.

Right now, if I wanted to try something new and interesting in the social world, I would seriously consider tying it in with App.Net – because it’s a small market of exactly the sort of people you want playing with your fresh idea.

I don’t think there is anything special about App.Net in itself, but I expect it to be a breeding ground for interesting social-graph-based applications.  So in App.Net’s case, perhaps by building the platform they are doing the right thing, even if it isn’t the right thing for them.

Incidentally, I have a number of thoughts about the next moves that could be made in social networking – I’ll be writing about them over the next few days.

Turn on, tune in, log out


I hate logging into websites.  And I’m a master of it.  After a number of hassles with external websites revealing my passwords to the world, I now have a LastPass setup managing my login credentials and Google Authenticator to keep my Google identity (which is probably more important than my birth certificate) extra-secure.  And does this work?  Well, most of the time – but the other day I got locked out of a banking account and had to reregister because I got my details wrong 3 times in a row.  Was it my fault?  Well, maybe, but I don’t for the life of me know what exactly I did wrong – or how I could avoid doing the same things wrong in the future.

I also hate writing login mechanisms for websites.  They always seem overly complicated – the sort of thing you wish could just be done better by someone else.

My first experience of a half-decent login system was with tripit.com.  With tripit, all I had to do was start forwarding emails to their email address.  I got an email back telling me I could go to a magic URL, where all my details were awaiting me.  But then it asked me to set up a password to secure it all.

On most websites, it doesn’t matter who I am.  It only matters that I am the same person I was last time I came to the website.  Why should I have to log in?  In fact, a lot of web sites recognise me and log me in automatically if I’ve been there fairly recently.

So, Idea 1: Instead of making me sign up to a website, just assign me a cookie, and then whenever I come back – in a day, a week, or a decade – read that cookie and log me back in.

This is simple.  It is session management.  Website writers are going to have to do session management anyway, so why not just have two sessions – a short-term ‘is currently doing a particular thing’ session and a long-term ‘is still the same person’ session?

There seem to be two big problems here:  Security and Multiple Browsers

The security problem is that cookies remain the same and are passed in plaintext.  This can be resolved, in part, by making all connections via https.  All the evidence seems to suggest https-only is the way to go for lots of reasons, so this is no big loss.  As for cookies remaining the same – well, the website can always decide to change your cookie on its own schedule (so long as you log in).  And because the cookie will be website-generated, it will only ever work for that website – and if that website gets hacked, well, they probably have access to everything you stored on that site anyway.  No huge risk, so far as I can see.  (We could probably do more complicated things like making the cookie you provide a public key, and making session initiation a challenge-response procedure, but that probably isn’t needed.)

The multiple browser issue is harder.  If I want to access the same account from two browsers, the cookie-generated account is a problem.  Different cookies will be generated for each browser – meaning you are a different person at work and at home.  There are three ways around this:

1 – add the ability to set a username and password on your account once you have logged in.  Let people associate different browsers with the same account by then logging in with the username and password.

2 – add the ability to associate an email address with your account.  You can then request that the site emails you with a way to log in on other machines.  Since you’ll need to do this if you have the username/password mechanism, you’ll probably have to do this anyway.

3 – make the problem your browser’s problem.  Come up with a way of identifying the cookies as shareable, then let your browsers choose a central location to allow them to be synched.  This is ideal, as it also allows browsers to let you switch between multiple accounts with the same site.

So – this seems quite simple, but you still need to write code to register an email address with your site (at least until all the browsers come up with a solution to synch cookies).  My suggestion here is:

Idea 2: Someone needs to create a web service which will manage the storing of email addresses and associating them with an internal representation of user ids.  All your site would have to do for registration purposes is provide a form which sent the email address and internal id to said service.  When a user wanted to receive a log-in email, you would make a simple request providing the email and your site’s id, and get back the internal id.  You would then look up the id, then generate an email which would lead the user to a page which would allow them to access their long-term session cookie.

Some bonus ideas:

Idea 3: If I send an email from my valid account to ‘logmein@whatevermywebsiteis.com’, it should, by return post, send me an email that would log me in – meaning I wouldn’t have to do any setup work.

Idea 4: You could happily sign up with multiple email addresses.  The site should only send to the email address you request.

Idea 5: If you lose control of an email address, you may need to revoke it.  The best way to manage this would be to allow you to revoke all email addresses, then let you assign them again one by one.

xoxco makes a similar argument (which didn’t so much inspire this piece as make me think there was probably something up the tree worth barking at).

Why you should develop your web app in public

For mortals – those of us not gifted with insane levels of insight about how other people work – the process of design goes something like this:

Find out about a problem

Figure out a way to solve that problem

Come up with a suggested way of solving the problem

Show the suggested solution to someone who has the problem

Listen to what they have to say about it

Change your understanding of the problem

Iterate.

 

This is true when it comes to designing the next ubercool widget, and it’s true when designing the stodgiest piece of business management software.  It is true when designing web apps.  Unless you have a huge usability lab and can fund focus groups, your best way of testing web app ideas is to get them out there, in front of people, and see what they think, so that you can iterate and improve the design.

I’ve been thinking about this for a while, and came to the conclusion that it might be a good idea to do all of your web app development in public – not just making the app available online, but also the source code.  It was an idea which gnawed away at the back of my skull, not fully formed, until I read the following article http://techblog.netflix.com/2012/07/open-source-at-netflix-by-ruslan.html and specifically the following quote:

We’ve observed that the peer pressure from “Social Coding” has driven engineers to make sure code is clean and well structured, documentation is useful and up to date.  What we’ve learned is that a component may be “Good enough for running in production, but not good enough for Github”.

I want to explore whether the benefits of opening your web app code to the world outweigh the disadvantages.

Let’s start by remembering that we have had this argument before.  Back in the late nineties we were discussing whether Open Source code was the way forward.  Gradually that argument has matured, and a world without open source software would be unrecognisable – and probably significantly behind our current world.  So there is social benefit to some forms of open code – by releasing your code, and playing your part in the open source community, you are doing a good thing, and moving technology forwards.

Let’s also consider that, in the case of web apps, sharing your code is not a common or usual behaviour.  There must be a reason for this (even if the reason is wrong, or out of date).  Perhaps, in looking for these reasons, we can understand why, in the web app space, people are less inclined to share.

The arguments I can think of stand as follows:

I want to get the full benefit of the code I have written.  If I were to share my code, others would be able to compete with me based on what I have written.  I might lose out to someone who hasn’t written the code.

Sharing the code, packaging and documenting the code, is too difficult or labour intensive.  It isn’t what I want to spend my time doing.

I don’t want to expose my code to the world, only the functionality, because I am scared of being judged on the basis of my code.

Opening the code might allow people to spot security holes

When writing an article such as this, it is always a worry that I am creating straw men to knock down – so I would be pleased to learn about other arguments against opening up code to the world – please mail me.

The Netflix quote addresses the argument about judgement.  By exposing your code to the world, you are forced to make the sorts of decisions about your code you would generally only make in the face of peer review.  It will lead to better, more readable, more maintainable code.  This is a good thing.  If the problem is not the quality of your code, but rather the fear of public ridicule, then I suggest that posting and being damned is absolutely the best way of getting over this.

The question of security holes is similar. We know that security by obscurity is ineffective against a dedicated attacker – all you are doing by not publishing your code is giving yourself a false sense of confidence.  By opening the code, you not only open it to potential attackers, you also open it to other people who may wish to use your code, and who may spot the flaws and help you correct them.  Open Source software has a deserved reputation for addressing security issues well – there is no reason why open web apps should not do the same.

Sharing and packaging the code is too difficult?  A while ago, I would have agreed.  But now we have GitHub.  While GitHub is far from perfect, all you have to do is keep a copy of your code there.  I don’t make any suggestion that you need to make your code easy for other people to use – just that you make it available.  If other people care about packaging your code, documenting it, making it nice – let them.  That’s their work, not yours.  You don’t have to become a community leader, you just have to keep on doing what you enjoy doing.

The final one of my arguments against opening your code is that you don’t want anyone else to benefit from your work.  To this I might make a few comments:

1.  You must be kidding.  Odds are your site is running on an open source language on an open source operating system.  You’re using web browsers (the majority of which are open source these days) to let people get to your site – over the internet (a technology which has had a huge amount of development from other people).  You’re probably using open source libraries and open source databases and web servers.  You are absolutely standing on the shoulders of giants.  Is your shitty first-draft web app really that difficult to come up with, in the big scheme of things?

2. You’re not kidding?  Right, well in that case, consider that the value of a web site is more than just the value of the code.  It is also the value of the design, the graphics, the quality of the site’s dev ops and the community which uses it.  You have the opportunity to succeed in all these areas.  If you don’t win in these areas, you probably don’t deserve to win.

3. Still not convinced?  Copying the first draft of a web app is cheap.  Especially since web apps don’t generally do things which are particularly complicated – the thing that gives them value is the idea, and the way it is made available to the user (the design).  Copying the design and idea – then writing code to fit – is a lot easier than writing the code from scratch.  Just by putting your web app out there you are making it easy to copy – especially by the big boys you are, presumably, trying to disrupt.  When you introduce a new idea, the hope is you get big before Google or Facebook notice you have disrupted anything.

4. First mover advantage works.  In the open source world we don’t often see major forks of code – and when we do, it is normally because people want to do something significantly different with the code.  This is less clear in the web app space, but let me point out an example: Wikipedia.  Right now, I can download all the data.  I can download MediaWiki.  I can set up my own Wikipedia.  But I won’t beat Wikipedia.  Because they have the community.  Because they were there first.

5. You might benefit from other people’s code.  If they are using your code, and making changes to it, then you get the benefit of those changes.  If you want to be sure of this, release your code under an Affero licence.  You might also benefit from someone using your code.  Let’s consider Google – let’s say they decide to compete with you.  Unless you’ve got the community sewn up, you’re screwed, whether they copy your code or write their own version.  However, if your code is available, why the hell wouldn’t they try to use your code, and bring you onboard?  Sure, it might not be the megabucks you make from creating the next big web app, but it’s an income based on your work, your love and your passion.

I really don’t see anything except for upsides when it comes to releasing the code which runs your web app.  All you are doing is adding to the ecosystem you draw from.  And in that sense, not only is sharing your code a logical imperative, it is a moral imperative too.

(Random question:  Why can I not fork GitHub?  It really seems like the sort of thing I ought to be able to do)

Building An Idea Assembly Line

When we look at the history of technology successes, we are reminded time and time again they come from incubator areas.  Not from the artificial technology incubators that VCs might set up to house new start up companies, but from regions which are good at incubating companies.

What these regions have are some combination of:

Universities

Big Technology Companies

Research Labs

Now – the advantage of having these sorts of institutions is that they put lots of bright people together, playing with technology.  And specifically, the people they put together playing don’t have to be entrepreneurs – they are getting paid some sort of wage (good or bad) to come up with new things.

The typical VC approach is to wait until there are companies formed by the few of these people brave enough to sacrifice their working wage and go it alone.  The hope is that the best ideas have risen to the top, and have been picked up by people capable of running companies.

But, if you talk to a VC, a common issue which causes them not to back a start up is the poor quality of the team, not the quality of the idea.  And many, many companies wind up pivoting to follow a different idea from the one they got together to pursue.  You need a combination of good ideas and good people throughout the start up’s life.  You need the idea generators to stay in the company.  And you need a strong business team to take on the best ideas.

I wonder if the solution is to come up with an artificial equivalent of the research lab in order to breed ideas, and form teams.  I call this the Idea Factory.

Imagine an office (perhaps a big, brightly coloured open plan office – or perhaps something different) where some number of bright people are employed.  Their job is to have ideas, to prototype them, and to demonstrate them to one another. They also spend some time on building reusable frameworks to make prototyping new ideas easier and easier.

The time spent on prototyping ideas should be short (perhaps measured in days), and we should be quick to drop ideas.  People should feel free to provide support to each other, in taking the ideas they like and adding to them, or improving on them.  Forking ideas, and using their own skills, or even just making suggestions for other people to fork.

Over time, teams would grow, and some ideas would rise above others.  These ideas can then be shown to the world.  At conferences.  In papers.  On YouTube.  Or by going live on the web.  The question becomes ‘what is the minimum needed to get this idea out there?’

And this is where the VCs come in.  VCs already know people who are good at forming businesses, people who know what to do next.  They can take a team with a good idea and match them up with the business skills they need to move one step on.

Now – the important point here is that, all the time people are working in the idea factory, they are being paid.  You want people in the factory with a range of experiences – from fresh hungry graduates, through to the world-weary sorts who have seen everything and know how things really work.  So there are going to be a range of salaries.  Perhaps, because the work environment is unusual, you might be able to get away with offering a lower salary than the market would usually require.  The question of salaries is where the VCs take a risk – how much time and money will these people need per idea?  By the time the idea is being fitted out into a standalone start up, the risk should be much reduced, and the VCs should be happy about getting a higher rate of return.  [Also, one presumes that an idea factory would be a good source of patents, if one of the members were to be a patent lawyer.]

Idea Factories might be the way to inspire entrepreneurial growth in towns currently lacking it, or for a small group of VCs to monopolise start ups (and get a better share of the equity than they might otherwise manage).

But wait – Idea Factories might not just be good for VCs.  Consider your big company – not quite a company the size of Google, but the sort of tech company which regularly takes over large convention centres to support their customer base.  These companies often need new ideas.  They could set up internal idea factories along the same lines, getting people to play with the sort of technologies they are interested in.  It would lower the risk of disruption, and – even if all the ideas came to nought – give them a way of showing they support innovation.

This is an idea which I think – based on rough estimates – has legs and is worthy of further investigation.  Please contact me if you think you might want to play a part in bringing an Idea Factory to life – either as a venture investor, or within your company – because I would very much like to help.

A Sandbox In Every Walled Garden

The Apple world seems to be throwing a wobbly at the announcement that sandboxing will be required for all apps in the Mac OS App Store.  People are discussing what might be a better solution, and whether we might eventually only ever be able to install applications from the walled garden of the Apple store.

It all gave me a sense of déjà vu.  Weren’t we having the same thoughts over in Microsoft land a few months ago?  Are Apple developers really that out of touch about what is going on on other platforms that they haven’t noticed the parallels?  Apparently yes – I haven’t seen the words Metro or WinRT mentioned in this discussion.  Which is odd, because surely how the competition is trying to solve the same problem – and going down the paths which have Apple devs so up in arms – can feed into their strategy for how to approach our brave new world.

So, herewith, a cheatsheet aimed at showing the parallels:

What Apple Have Announced?

Mac OS apps for the Mac OS App Store will have to implement sandboxing – which is to say, developers will have to list a set of capabilities that their app requires, and then not make any calls which require capabilities they have not listed.  It appears (from people saying that this is currently buggy and affects AppleScript) that this is enforced at runtime.

What Apple Have Not Announced That People Are Scared Of?

It might seem only a short way from there to declaring that the app store becomes the only way to install apps on your Mac.  And only a short way from there to giving Apple the chance to have a kill switch on every Mac Os application.

What People Have Suggested As Alternatives?

Certification.  Specifically, having per-developer certificates signed by Apple, so that if someone does something bad, Apple can revoke their trust in the developer certificate.  And ditching the whole sandboxing idea.

What Have Microsoft Announced?

If you want to use the new Metro UI, you can only use a subset of Win32 calls alongside calls to the new WinRT runtime.  Furthermore, you must specify a set of capabilities your app will be using at compile time.

The only way to install your apps will be via the new Windows App Store (this isn’t strictly true… there is a way to install developer-signed apps on a developer’s own machines, and we are expecting to hear of a way for enterprises to install apps which will presumably be more than just the App Store).

To get your apps into the app store, your app will have to pass a set of tests.  Microsoft will run these tests, but they also provide them to developers, so that developers know if they will pass.  Once MS have validated you pass the tests, MS will sign your app and put it in the app store.

One of the tests that MS will provide is to ensure you make no calls which are not allowed by the set of capabilities you have requested.  In short, the sandboxing is done prior to signing, rather than at runtime.  (This has some security issues, specifically in the area of self modifying code.  I presume MS plan to handle this via legal and social means, rather than technical)

Oh, and all your old Win32 Apps will continue to run unaffected – but not via Metro.  You will even be able to install Win32 apps via the App Store.

Have Microsoft Ever Done Anything Like This Before?

Yes.  We’ve had driver signing for years – each driver type has its own set of functions it is allowed to call, and there are any number of testing hoops you have to jump through in order to pick up a signature.  Just check my Twitter stream to see how much pain WHQL causes me every so often.

This means Microsoft have experience of the real-world implications of trying to manage a certification and signing scheme.  The main implication being “for every rule we lay down, there are exceptions” – generally many exceptions; there seem to be as many special cases as there are drivers.  Half of the fun of passing WHQL is convincing Microsoft that a set of rules they require drivers to obey is wrong, or insufficient, or just plain shouldn’t apply to your driver.  The good thing is that Microsoft can usually be convinced.  Eventually.

Now, I’ve no idea if Microsoft’s driver signing experience will feed into their app signing experience, but there are enough similarities between the processes for me to guess there has been some communication between departments on this issue.

Are Microsoft Doing Anything That Seems Wrong?

The biggest problem seems to be requiring that things are signed.  Because once you require apps to be signed, you need to sign every script you run (or just sign scripting languages – in which case you’ve lost most of the security you were aiming for).  It looks like the solution to this involves dev certificates, which so far are only available via Visual Studio.  So all development will involve Visual Studio in one way or another.  (Incidentally, PowerShell has had signed scripts since day one – maybe there is some intention to integrate that architecture – but I don’t see a straightforward path.)  It may be that all scripting will stay on the Win32 side of the fence.

Is there anything Apple could learn from Microsoft?

Firstly, MS are allowing old apps to continue to work with no changes.  There is no Win32 walled garden.  All the changes are only for people who want to use the new WinRT hotness.  Now, we’ve no idea if anybody will want to use WinRT, but MS do seem to be providing us with a world where people get to make the choice between two different environments.

MS are also allowing the sale of old-style apps via their app store.  It seems that this will be more ‘providing a link to your company’s website’ and less ‘a fully integrated install experience’ for Win32 apps.  As far as I can see, it’s a way MS can make money while still saying ‘do this at your own risk’.  I’m guessing here that anything distributed this way may have to be an MSI – if so, you might just be giving the app store the ability to uninstall apps which turn out to be dangerous.

MS realise that there are exceptions, that app stores and enterprises won’t mix (think bespoke software), that admins have to have some control over what users install, and that perhaps some software won’t fit into the model they are testing for.  Apple have always wanted to provide the user with the best experience, whereas MS are more about providing the developer with the best way to ship their software.  Apple is about fitting in around how Apple work, whereas MS is more about MS fitting in around how your application works – and we see this with the attitude towards sign-time vs runtime tests for sandboxing.  With Metro, MS is trying to learn from Apple; Apple could probably stand to learn a few things from MS too.

Are there any other thoughts?

Moving to OS X brought Apple a whole load of developers who wanted Unixy tools on a reliable machine with a nice UI.  OS X comes with many, many scripting languages which are able to access the core of the system and do everything a compiled program can do.  Do we honestly think that Apple are going to restrict those scripting languages so that scripts can no longer access the system?  That one move would cause a major rupture in the dev community and harm Apple significantly.  MS can get away with it (if you want to develop for Metro, use IronPython on top of the CLR and you’re happy – if you want to run a script, there’s Win32), but without a new hotness to tie all these changes to, Apple would just be taking developers’ favourite toys away – and suffering the tantrums that follow.

Of course, most Mac users don’t know or care about what a scripting language is.  These will be the people who use the app store.  Just like iTunes makes it easier to get music (so fewer and fewer people bother buying CDs and ripping them to fill their iPod), the App Store makes it easier to get your apps – your average user won’t consider getting apps any other way.  There is no need to restrict the techie few that the Mac software ecology depends upon.

And is Signing the answer?

Signing isn’t a flawless solution to all your problems.  Assume you have a killer app your system depends upon – let’s say “Photoshop” for the Mac.  Assume the manufacturers of Photoshop were to bring out another piece of software Apple didn’t like (I’ll call it ‘flush’).  If Apple wanted to revoke the signature for flush, they could either revoke the signature on every release of flush ever made (and on the new applications ‘flish’ and ‘flosh’ that might be submitted thereafter), or they could revoke the developer’s signature and lose their killer app (and annoy many customers in the process).

Signing also requires that certificate lists are kept up to date on every system involved (or that you have reliable internet connectivity all the time).

But signing does allow for technical, legal and social means of deciding which apps to allow to run.

Most notably though – signing is really, really irritating to have to do all the time – especially if you’re scripting.  I can’t see it as a real solution to the problem if you want to keep developers hanging around.  What you need is to just let people write their scripts and get on with using their machines…  By all means make developers go through some sort of hoop once to be able to script and install their own software (let’s say by joining a group, or turning off a particular feature of their user account), but don’t come up with a technical solution that will only irritate.
