Visualizing Data

There is plenty of data in the world. Mostly we are unable to process it accurately, and we rely on intuitive judgments to draw meaning from it. Hans Rosling, Professor of International Health at the Karolinska Institutet in Stockholm, has developed new techniques of data visualization over the past few years that let us draw more accurate information from large data sets.

Hans Rosling developed the techniques to interpret world health and economic data. He draws conclusions that are not obvious from a casual observation. He provided a remarkable presentation of his techniques at the international Technology, Entertainment, Design (TED) conference in 2006 and followed up with another in 2007.

Rosling sold his visualization software, Trendalyzer, to Google in March 2007. Google have indicated that the software (or a version of it) will be freely available, but they have not announced what they plan to do with it.

For me, the business significance is the potential benefit of not just the statistical analysis of data, but the visualization of it. The human brain is designed to process vast amounts (in digital terms) of data through visual pattern recognition. Statistical software can process the data and draw all the same conclusions, but not in a way we can easily understand. Visualization software provides the link.

Here are a few companies in this area:

One of the principal lessons Hans Rosling gives is that we don’t need more data or better analysis. We need better presentation of the data so we can make more sense of it. That is worth bearing in mind if you are considering a Business Intelligence project.

Power supply for travellers

Here’s a really useful device for travellers. We have probably just grown used to travelling with several chargers for different devices like the laptop, the mobile and the iPod. The iGo charger means you can carry just one.

You buy the charger you need, then Power Tips for the devices you want to charge.

  • You can buy one charger for AC and one for the car
  • The top of the range charger will handle your laptop as well as all your other devices
  • Works with international AC power supplies
  • You can buy Power Tips for just about everything you could need.

Have a look at iGo.

PCs and Cars

Following the analogy of the PC and the car, the next stage of evolution for the car will be to achieve a form of distributed management.

The aims of distributed management will be to:

  1. Make cars more reliable, safer, more secure.
  2. Handle the conflicting interactions between cars sharing the same network.
  3. Provide a degree of social control.

For the user:

  • SMS message if your car moves when it is outside your own mobile phone radio cell.
  • Automatic collision avoidance between two cars in the same vicinity. If another car is approaching, both cars are automatically alerted and take avoidance measures.
  • Auto alert for parts failure.
  • Journey report and analysis.

For the network:

  • Auto-streaming. You join a busy segment of the network and your car comes under auto-control, regulating the speed and distance until you leave the segment. This allows many more cars to use the segment at a much faster average speed.
  • Auto-junction. You approach a junction. Your car obtains a ticket to cross. You are allowed to cross in the most balanced effective pattern. Your speed and direction are controlled for the duration of the crossing.
  • Space-finder. Automatic parking space identification, first-in priority and guidance to space.
  • Auto-pay for parking spaces, toll roads and controlled zones.
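The auto-streaming claim, that controlled spacing lets many more cars use a segment, can be checked with some rough arithmetic. This is only a sketch: the car length and the two headway values are assumed figures chosen for illustration, not measurements, and the function name is made up. It uses the ordinary relation that flow equals speed divided by the road space each car occupies.

```python
def lane_flow_per_hour(speed_m_s: float, headway_s: float,
                       car_length_m: float = 4.5) -> float:
    """Vehicles per hour passing a point in one lane at a steady speed.

    headway_s is the time gap kept behind the car in front.
    car_length_m = 4.5 is an assumed typical car length.
    """
    if speed_m_s <= 0 or headway_s < 0 or car_length_m <= 0:
        raise ValueError("speed and car length must be positive, headway non-negative")
    # Road space used by one car: its own length plus the gap it keeps.
    spacing_m = car_length_m + speed_m_s * headway_s
    return 3600.0 * speed_m_s / spacing_m


if __name__ == "__main__":
    speed = 30.0  # m/s, roughly 108 km/h (assumed)
    human = lane_flow_per_hour(speed, headway_s=2.0)      # assumed human gap
    automated = lane_flow_per_hour(speed, headway_s=0.5)  # assumed controlled gap
    print(f"human drivers:   {human:.0f} cars/hour")
    print(f"auto-controlled: {automated:.0f} cars/hour")
```

With these assumed numbers the controlled lane carries a bit more than three times the traffic at the same speed. The real gain would depend on how short a gap the network could safely enforce.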

Social control:

  • Your car will not start without a valid license and insurance.
  • Trucks are disabled from entering residential zones at night.
  • Cars receive penalties for stopping in controlled zones.
  • Monthly emission allowance calculated on actual emissions.
  • Accident and near-miss investigations. Authorities study data to identify dangerous driving.
  • Dangerous drivers disabled from driving any vehicle.
  • Auto-enforcement of traffic directions (no right turn, one way).

For network optimisation, you would need a critical mass: enough vehicles capable of using the system to make it worth having. The best way to get there would be to provide some independent benefit that does not depend on other users. For example:

  • congested roads and bridges open at peak time only to networked vehicles
  • traffic lights change at clear junctions for networked vehicles.

Any more ideas?

Mainframes and trains

Trains are like the old mainframes. They should be more efficient than a car, but in practice they aren’t.

Like a PC, the car has the advantage over the train of:

  • distributed, not central, control
  • variety of type to suit different and unknowable needs
  • many competing suppliers.

Cars share a road system with only a few enforced rules. They interact with each other with a complexity and subtlety that no designer could possibly foresee. They use the road infrastructure much more intensively than trains can use rail. Overall, although they are obviously a sub-optimal way to travel, they beat the train for most people most of the time.

Like a mainframe, a train has the advantage in a few situations:

  • when the load is very heavy
  • for predictable, repeated loads
  • when you are not paying for it.

Transport engineers are like IT people. They would much rather everyone used the train. But in practice they spend their time dealing with the unpredictable world of the car.

IT Support

Have you ever noticed that the more senior people in IT don’t like dealing directly with end users? It’s because users can be so difficult to deal with and don’t fit into the neat patterns that IT should have. Every single person in IT should be on user support for a day.

You need real social skills to be a good support guy. You don’t need a lot of technical skill. The user often doesn’t know the difference if you tell them the motherboard needs to be replaced to fix a problem in Word. In fact they think you are a genius for figuring that out. Every single internet problem can be blamed on the firewall.

There is also a strange symbiotic relationship. A lot of users are bored and frustrated by their work. An IT problem creates a welcome distraction that is not their fault. It suits the engineer as it keeps the call rate up and keeps him employed. Removing the cause of these problems is a bad idea, as it would make him unemployed. Doctors need patients. Policemen need criminals. IT support people need users with problems.

Quick Transit Gloria Mundi

Quick Transit is a product from Transitive Corporation that recently won the 2007 European ICT Grand Prize for Innovation. The product stems from research work at Manchester University, UK, going back more than ten years under Alasdair Rawsthorne, now CTO.

Quick Transit is a technology for virtualizing an operating system (OS) by translating one set of OS calls to another. This means that you can run an application or service designed for one OS on hardware that does not run that OS. How does that help?

In summary, it gives vendors a way to help you migrate legacy applications or services to new hardware running a different OS. You could take your clunky old UNIX application and put it on your new standard hardware running Linux or a Linux virtual machine.
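To make the idea of translating one set of OS calls to another concrete, here is a toy sketch. It is not how Quick Transit is implemented. Every name in it (`legacy_open`, `legacy_put`, `host_open`, `host_write`, `Translator`) is hypothetical, and the cache is only a stand-in for whatever caching a real translator does.

```python
from typing import Callable, Dict


# Hypothetical services offered by the new host OS (stand-ins, not real calls).
def host_open(path: str) -> str:
    return f"host opened {path}"


def host_write(handle: str, data: str) -> str:
    return f"host wrote {len(data)} bytes to {handle}"


# Hypothetical mapping from the legacy OS's call names to host equivalents.
TRANSLATION_TABLE: Dict[str, Callable[..., str]] = {
    "legacy_open": host_open,
    "legacy_put": host_write,
}


class Translator:
    """Routes calls made by a legacy application to the host OS."""

    def __init__(self, table: Dict[str, Callable[..., str]]) -> None:
        self._table = table
        self._cache: Dict[str, Callable[..., str]] = {}
        self.lookups = 0  # how many times a call had to be translated afresh

    def call(self, name: str, *args: str) -> str:
        target = self._cache.get(name)
        if target is None:
            # First use of this call: translate it and remember the result.
            self.lookups += 1
            if name not in self._table:
                raise NotImplementedError(f"no translation for {name}")
            target = self._table[name]
            self._cache[name] = target
        return target(*args)


if __name__ == "__main__":
    translator = Translator(TRANSLATION_TABLE)
    print(translator.call("legacy_open", "/data/report"))
    print(translator.call("legacy_put", "/data/report", "hello"))
```

The application keeps making the calls it always made. Only the layer underneath changes, which is why the application itself does not have to be ported.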

VMWare is probably the best-known platform for virtualization. VMWare allows you to run multiple instances of an OS on one server. You can run a Windows, Linux or Netware guest OS on a Windows or VMWare host OS. This means you can consolidate your current servers onto fewer larger servers. You can preserve application isolation by running Windows or Linux applications in a dedicated OS but on shared hardware. You can run Linux applications on a Windows server. You can bring virtual servers into play when you need them and store them when you don’t. As a vendor you can distribute your software in a virtual appliance complete with OS, ready to run on a VMWare host.

VMWare works by virtualizing the hardware device drivers. So instead of interacting with the real network card, for example, the guest OS interacts with a standard virtual card that then interacts with the real physical card through the host OS. So you achieve all sorts of versatility, but all on x86 hardware and OS. What you can’t do is run a non-x86 OS on VMWare, because they don’t use hardware the same way. So, for example:

  • you can’t run a PowerPC, SPARC, MIPS or PA-RISC OS on VMWare, and
  • you can’t run VMWare on PowerPC, SPARC, MIPS or PA-RISC hardware.

This rules out whole sectors of consolidation. For example, if you have a legacy application that runs on a Sun Solaris SPARC OS, you can’t run it on your VMWare infrastructure. You have to port the application to another OS, migrate away from the application, or continue to use SPARC hardware.
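The device indirection described above, and the reason it stops at the processor boundary, can be pictured with a toy model. This is a sketch under simplifying assumptions, not VMWare's design. The class names and the single `cpu` string are invented for illustration.

```python
from typing import List


class PhysicalNic:
    """Stand-in for the real network card, driven by the host OS."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def transmit(self, frame: bytes) -> None:
        self.sent.append(frame)


class VirtualNic:
    """The standard virtual card a guest OS sees instead of the real one."""

    def __init__(self, backing: PhysicalNic) -> None:
        self._backing = backing

    def transmit(self, frame: bytes) -> None:
        # The guest's traffic reaches the wire only through the host's card.
        self._backing.transmit(frame)


class Host:
    """A host that virtualizes devices but not the processor."""

    def __init__(self, cpu: str) -> None:
        self.cpu = cpu
        self.nic = PhysicalNic()

    def start_guest(self, guest_cpu: str) -> VirtualNic:
        # Only devices are swapped out; guest code still runs on the real CPU.
        if guest_cpu != self.cpu:
            raise ValueError(f"cannot run a {guest_cpu} guest on a {self.cpu} host")
        return VirtualNic(self.nic)


if __name__ == "__main__":
    host = Host("x86")
    guest_nic = host.start_guest("x86")
    guest_nic.transmit(b"hello")
    print(host.nic.sent)
```

In this model an x86 guest starts and sends traffic through the shared card, while a SPARC guest is refused outright, because nothing in the device layer can make its code run on the host's processor.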

Quick Transit is the product that could solve this problem. It works by translating OS calls from one to another. This means that you could run your legacy application on Quick Transit, running on a Linux virtual machine. Transitive claim little reduction in performance in the translation, through the use of caching and optimization. The technology is explained here. Does it work?

  • Apple have implemented Quick Transit, under the name "Rosetta", in Mac OS X so that older PowerPC Mac applications can run on their newer Intel x86 hardware.
  • Intel have co-developed a version to enable applications designed for Solaris SPARC to run on their x86, x64 and Itanium processors running Linux. For Intel, it provides a way to move customers to their platforms, and in particular to develop the new Itanium platform, which so far lacks applications developed for it.
  • IBM have announced that they will use Quick Transit to enable x86 Linux applications to run on their System p systems.
  • Sun have recently announced a version to enable applications designed for Solaris SPARC to run on Solaris x86.

So the technology evidently works and is being adopted. What are the limitations?

  1. There are no implementations of Quick Transit running on Windows, or Windows applications on Quick Transit. All translations at the moment are on Unix/Linux variations.
  2. You have the OS working in memory as well as the Quick Transit cache, so you would want a lot of memory. If you want to use it in multiple virtual machines, in practice it seems best suited to the newer Itanium platform, which can address more memory more effectively.

So in summary:

  • VMWare is for consolidating Windows and Linux applications onto fewer servers
  • Quick Transit is for migrating legacy applications onto any flavour of Linux on any hardware.

SharePoint or Wiki

A lot of information exists in small chunks. Collaboration could be said to be putting small chunks of information from different people together to create new information. E-mail, SMS and Instant Messaging reflect the power of the small chunk of information, or "message". Microsoft Office SharePoint Server is one way to handle this in the large structured organisation, but third-party wikis and blogs are a more cost-effective way in the less structured organisation.

  • What are you doing tonight?
  • Nothing, why?
  • Do you want to see a movie?
  • What’s on?
  • Bloodfest.
  • OK, let’s go. What time is it?
  • 8:30. I’ll book.
  • See you there.
  • Roger.
  • Yes?
  • No, roger, Roger.
  • Oh!

This is clearly not the same sort of information that you put into a Word document, an Excel spreadsheet or a PowerPoint presentation. I think there is a case that it is more important. In most of the interviews I read with important people, they say either that they don’t use computers or that they love their Blackberry. The Word document is a kind of bureaucratic necessity. When did your FD or CEO last produce a Word document themselves?

Microsoft tried to structure messages into folders, in Exchange. But it didn’t really work and now public folders are gone and replaced by SharePoint. SharePoint Server offers a window into the conventional document store but also a place for less structured communication, with personal web sites, blogs and wikis. With SharePoint Server Microsoft are trying to turn their dominant Document franchise into a more informal Collaboration franchise.

The problem with Office SharePoint Server is that you need a license for Windows Server, plus a license for SharePoint Server, plus a license for Office if that’s the editor you use. Microsoft want to bind more informal methods of communication into a sort of Platform license.

That’s fine for large structured organisations like multi-nationals and the public sector where they can use an Enterprise volume license agreement. But it is a lot of overhead for a distributed informal organisation.

The alternative is to put up an industrial strength wiki like Confluence from Atlassian, and a blog server like Movable Type. You have an unlimited user license in the enterprise and you don’t need an Office, a Windows or a SharePoint Server license to use it.

SharePoint and CALs

I am not sure how much further Microsoft can go with the Client Access License (CAL) concept as the way of charging per user for services.
Of course they are very clever people, and I am sure they have a cunning plan. But for the moment it is very complex to work out the cost of providing services like SharePoint with Microsoft technologies.

Microsoft would like every user of a service to have a CAL. So you have a base Windows CAL because all MS services run on Windows. Then you have extra CALs for using SQL Server, SharePoint Server, Exchange Server, Terminal Server etc.

Within the enterprise this all makes sense, and the cost is softened by volume licensing agreements that bundle client licenses into one.

A key point is that the CAL entitles the user to use the service on any number of servers, as long as the server is licensed on a per-user rather than a per-server basis. So in the enterprise the user has one CAL for Windows Server allowing them to access any Windows Server in the enterprise.

However, the basic idea of the CAL does not work so well outside the enterprise. Firstly, on economic grounds. It is one thing to have a few thousand CALs for employees. But if you have a few hundred thousand users of an Internet service you can’t pay for a CAL for each of them. You might say, why shouldn’t they cost the same? But an internet user may use the service very infrequently, and may not be using the full range of services available to the enterprise user. For example, an internet user is not benefiting from Print services, DFS, Replication, and all the other built-in Windows features. Because these users are not using the full feature set, alternative services from Open Source or third parties do much the same job at a much lower per-user cost. Similarly, an Exchange CAL makes sense if you are using the full range of Exchange services in the enterprise, but not for an internet mail service using open standards like IMAP.

Secondly, on practical grounds. Microsoft are defining the CAL boundary in terms of the user’s contract of employment. They are saying that employees need a CAL, and non-employees are covered by something called a Connector license or an Internet license. In the case of SharePoint for example, you require SharePoint Internet Edition to provide services to non-employees, at a very large cost. The Internet edition must run on a separate server, and must not be accessible to employees. Similarly you would need an internet-valid license for SQL. But employment is getting to be a very old-fashioned concept in defining people’s working relationships. What would this mean if you had a joint venture with staff from five different companies working on the same project? I am sure it is all defined somewhere in Microsoft’s license agreements, but it is not easy to work out what the license costs should be if you set up an extranet. Of course you can ask Microsoft, but that’s a bit like saying "how much do I owe you?".

To use SharePoint Server 2007 for an extranet you might need:

  • Windows Server CALs per employee
  • SharePoint Server CALs per employee
  • Windows Server External Connector per server
  • SharePoint Server Internet edition per server
  • SQL Server processor licenses for both internal and external users, or CALs per user (employee or not)
  • Forms Server 2007 for Internet sites is licensed per server and does not require CALs.

Then you need to think about what you may need for development and testing. And the licensing for passive standby servers.
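To see how the items in the list above add up, here is a small worked sketch. Every price in it is a made-up placeholder, not a Microsoft list price. The class and field names are hypothetical, and the SQL Server line assumes the per-processor option rather than CALs.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ExtranetPlan:
    employees: int          # internal users, each needing CALs
    external_servers: int   # servers exposed to non-employees
    sql_processors: int     # processors licensed for SQL Server


@dataclass
class PriceList:
    """Unit prices. All values supplied here are placeholders."""
    windows_cal: float
    sharepoint_cal: float
    external_connector: float   # per server
    sharepoint_internet: float  # per server
    sql_processor: float        # per processor


def extranet_cost(plan: ExtranetPlan, prices: PriceList) -> Dict[str, float]:
    """Return each license line item and the total for the plan."""
    items = {
        "Windows Server CALs": plan.employees * prices.windows_cal,
        "SharePoint Server CALs": plan.employees * prices.sharepoint_cal,
        "Windows External Connector": plan.external_servers * prices.external_connector,
        "SharePoint Internet edition": plan.external_servers * prices.sharepoint_internet,
        "SQL Server processors": plan.sql_processors * prices.sql_processor,
    }
    items["Total"] = sum(items.values())
    return items


if __name__ == "__main__":
    plan = ExtranetPlan(employees=100, external_servers=2, sql_processors=4)
    # Placeholder prices chosen only to make the arithmetic easy to follow.
    prices = PriceList(10.0, 20.0, 1000.0, 5000.0, 3000.0)
    for name, cost in extranet_cost(plan, prices).items():
        print(f"{name}: {cost:,.0f}")
```

Note what is missing from the inputs: the number of external users. Under the per-server and per-processor items it does not change the bill, whereas every extra employee adds two CALs.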

Where is this all going? Microsoft are talking a lot about providing their software as services. I think that is all about CALs. When the internet is as fast as the LAN, and when it makes sense for someone else to be running the infrastructure (since it is all the same) then CALs don’t really work any more.

Google Apps

The new Google Apps Premier Edition provides a low-cost alternative to Microsoft Office for less intensive or more price-sensitive desktop users.

Premier Edition adds the means for integration with Active Directory accounts and with existing e-mail, so you don’t have to log on to separate systems. It also means no ads.

It could be useful for people working from home, or for very low usage in the office, maybe on a Linux desktop.