# posted by Tony
Mon, February 2, 2004
We have been working for some time on Twingle, a tool for finding information in your email. The original, server-based version used the Jakarta Lucene engine, through the Perl-Java bridge Inline::Java.
However, in order to fully develop our vision of the next version of Twingle, we needed more control over the finer nuances of searching through email. And, as the next phase of Twingle's development is to include a downloadable version of the software, we needed to make it easier for people to install - when the lead developer gave up after 6 hours of trying to get it all working on his own machine at home, we knew we had a problem!
For the last year we have employed renowned Perl guru Simon Cozens to work with us on Twingle, and as his final project we asked him to port Lucene to Perl.
And so, Kasei is proud to announce the release of Plucene - a Perl port of Lucene. As with much of the software produced by Kasei, it is released as open source (in the past year Simon, Tony, Marty and Marc have released over 60 Perl modules).
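For the curious, indexing and searching mail with Plucene can look something like the sketch below, via the Plucene::Simple wrapper. The method names, fields and query here are written from memory and should be treated as assumptions - the module's own documentation is the authority.

```perl
use strict;
use warnings;
use Plucene::Simple;   # assumed wrapper interface; the lower-level Plucene classes also work

# Open (or create) an index directory - the path is illustrative only.
my $index = Plucene::Simple->open('/tmp/mail-index');

# Index a couple of messages; the ids and field names are our own invention.
$index->add(
    'msg-001' => { subject => 'Plucene released', body => 'A pure Perl port of Lucene' },
    'msg-002' => { subject => 'Lunch?',           body => 'Anyone fancy a curry today?' },
);

# Searching returns the ids of matching documents; exact query syntax
# depends on the module, so check the POD before relying on this.
my @hits = $index->search('curry');
print "$_\n" for @hits;
```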
We'd like to publicly thank Marc and Simon for all their hard work on Plucene, and wish Simon all the best on his future projects.
# posted by Karen
Sat, December 6, 2003
Although we originally promised to use this weblog to rant about the sorry state of ecommerce, shopping online is still my preferred way to buy. At Christmas this is even truer, as traditional shopping becomes an absolute nightmare. More than ever, shop assistants seem to resent being distracted from their job by customers, as they have so many more important things to do, like stacking the shelves, answering the phones and chatting to their friends.
On Friday, needing a decorative gift bag for a bottle of vodka, I tried Birthdays, where I noticed that the one I liked was in a special 3 for 2 offer. Knowing that I could easily use the others I decided to take the 3. I fought my way through the narrow aisle to the counter, only to discover that there was only one shop assistant serving, and that I had to fight my way half-way back down the other narrow aisle to the end of the queue. Three other employees hovered by the stock room door, presumably so they could make a quick getaway if anyone approached to ask a question.
When it was finally my time to be served, I handed over my three bags, and prepared to pay my £4.98. When I was instead asked for £7.47, I explained that the bags were in the offer. The shop assistant retorted that as there weren't any promotional stickers on the bags they could not be on offer. When I tried to explain that I had deliberately searched for bags without the stickers, as I didn't think I would be able to remove them without damaging the bags, she called for the manager, who, without looking into the matter, reiterated that if they had no stickers they couldn't be in the offer. When I again pointed out that other identical bags had the stickers, she snippily stated that "someone must have just stuck those on".
As the manager disappeared back into the stock room, the assistant asked if I still wanted to buy the bags, and looked really put out when I told her that I would only buy one bag. As this meant cancelling the other items, she needed to get the manager to come back out again to change the details on the till. By now the queue behind me stretched the whole length of the shop, but of course none of the loitering staff deigned to open up another till.
When the manager finally returned to adjust the till, she discovered that the bags actually were on offer, that the till had automatically applied the discount, and what was my problem? Losing patience, I asked what made her think that £2.49 times two was £7.47. Looking puzzled, she checked the till again, and discovered that the assistant had actually rung through four bags. They removed one, and with the matter now "solved", perfunctorily took my £5 and moved on to the next customer.
Like most customers I won't make any sort of complaint about this. Instead, I'll do what Sam Walton always took great pains to warn his staff about:
There is only one boss: the customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else.
# posted by Tony
Fri, October 24, 2003
IT Week this week references the Web Effectiveness Report 2003, which, amongst other things, reveals that only 19% of web site managers surveyed review their log files to look for problems with their site.
When we originally built BlackStar we were fairly novice Perl programmers. We knew that all Perl should really have "strict" mode and "warnings" turned on - so we made sure we did that. But as long as everything ran correctly, we didn't really pay much attention to the warnings that were emitted.
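For readers who haven't met them, the two pragmas go at the top of every script or module, and even a tiny sketch like this one shows the kind of noise an unchecked variable produces:

```perl
#!/usr/bin/perl
use strict;      # refuse to compile undeclared variables, symbolic refs and barewords
use warnings;    # emit runtime warnings for undefined values, dubious conversions, etc.

my $total;
$total += $_ for 1 .. 10;   # warns: "Use of uninitialized value" on the first addition
print "$total\n";           # prints 55, warning notwithstanding
```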
But after a while we realised that the profusion of verbiage cluttering up our logs was highly distracting, and would ensure that nobody really paid them much attention - and the useful information about real problems would be obscured.
So we decided to clean this up, to ensure that anything appearing in the log would most likely be a symptom of a real-life actual problem, of which someone should probably be notified straightaway.
This was much easier to decide than to actually implement, however. We were introducing all manner of new quality procedures at the time, so it was relatively easy to at least decree that any new code, or any alteration of existing code, should be free from warnings. This way we could at least halt the growth of new warnings (although with site visits growing 30-40% per month, the volume of messages was still growing quickly). We even set aside a little time in the programming schedule for removing some of the most egregious offenders, which probably removed about 50% of the warnings.
The slow clean-up of old code being gradually tidied in passing wasn't going to get us anywhere fast enough, though. So we introduced a new system that was disarmingly simple, but remarkably effective. Every night a process collated the error logs from across the various web servers, and generated a report of the ten most common problems. This report would be emailed to the entire programming team, and the next day one person would take responsibility for ensuring that the top item on that list vanished.
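As a rough illustration, the nightly job needed little more than the following sketch - the log locations and the way duplicate warnings are collapsed are assumptions here, not the actual BlackStar code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collate last night's error logs and report the ten most common warnings.
# The glob below is purely illustrative - substitute wherever your logs live.
my @logs = glob('/var/log/apache/*/error_log.1');

my %count;
for my $file (@logs) {
    open my $fh, '<', $file or next;
    while (my $line = <$fh>) {
        chomp $line;
        # Strip timestamps and request-specific noise so identical
        # warnings from different requests collapse into one bucket.
        $line =~ s/^\[[^\]]+\]\s*//;
        $line =~ s/\bline \d+\b/line N/g;
        $count{$line}++;
    }
    close $fh;
}

# Print the ten most frequent warnings; in practice this would be
# piped to mail and sent to the whole programming team.
my @top = (sort { $count{$b} <=> $count{$a} } keys %count)[0 .. 9];
for my $warning (grep { defined } @top) {
    printf "%6d  %s\n", $count{$warning}, $warning;
}
```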
This approach proved very effective, and within a few months the volume of warnings had decreased dramatically. It was so successful that we started to apply the approach to many other areas of the business. Any problem that was monitored over time was a candidate for it - and with an entirely web-based business we had a lot of data to monitor. Someone became responsible for checking the list of the most common search terms that returned no results, and adding a re-mapping so that that search would automatically be transformed into what the person probably wanted. Someone else would ensure that the most visited DVD page that didn't provide a cross-reference to the VHS of the same item was given that linkage. Someone in the warehouse would upload the cover of the most visited in-stock item that hadn't previously been scanned.
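The search-term fix, for instance, needed nothing cleverer than a lookup table consulted before the real search ran - something along these lines, with hypothetical table entries and function name.

```perl
use strict;
use warnings;

# Hand-maintained remappings for popular searches that returned nothing.
# Each day the top offender from the "no results" report earned an entry.
my %remap = (
    'lord of the ring' => 'lord of the rings',
    'startrek'         => 'star trek',
    'matix'            => 'matrix',
);

sub normalise_query {
    my ($query) = @_;
    my $key = lc $query;
    $key =~ s/^\s+|\s+$//g;                                # trim whitespace
    return exists $remap{$key} ? $remap{$key} : $query;    # fall back to the original query
}

print normalise_query('  Startrek '), "\n";    # prints "star trek"
```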
Most of these tasks took less than 5 minutes. They would rarely be a top priority in amongst the constant firefighting, growing pains, and breakneck pace of developing new features and systems in the crazy world of exponential growth. In normal circumstances none of these things would even have appeared in someone's list of top 10 priorities. But the simple action of ensuring that each day the top occurrence of each problem was removed created a staggering cumulative effect.
The art of time management is usually a matter of ranking your items by importance and urgency, and prioritising according to how high things appear on each axis. But most books, articles and seminars on the topic stop there. Spending a few minutes each day doing something that is neither particularly important, nor particularly urgent, but that has a beneficial outcome, has value. When everyone in an organisation is doing likewise, and those tasks are automatically selected based on their potential benefit, that value can be enormous.
# posted by Marc
Thu, September 25, 2003
Recently, we were told that Mars was closer to the Earth than it had been at any time in the past 60,000 years. This gained much media attention, and sparked a wider interest in astronomy amongst the general public.
This was followed by the revelation that the Earth was in danger of being hit by an asteroid. As this seemed to be of considerable significance, rather than just a general curiosity, all the data was re-examined, and the recalculations indicated that we should be safe after all, as there was only a 1 in 909,000 chance of it hitting.
One might think that, in the light of such a dramatic reinterpretation of the data, someone might similarly return to the Mars data to see if the figures there were incorrect too. But although both cases are essentially the same mathematically, the approach and resulting presentation were very different.
It seems that someone, somewhere, noticed that Mars seemed to be getting closer to the Earth. Curious as to whether or not this was so, they gathered the data from recent observations, and traced its orbit back in time to estimate its earlier positions. Out of this popped the media-friendly statistic that the last time Mars was so close was 60,000 years ago. A sensible margin of error here might be 1%, but "60,000 +/- 600 years" doesn't make for such nice headlines.
So why is one portrayed as much more accurate than the other? Of course, there are differences in the calculations, but essentially it comes down to scientists presenting their information in a way to make it more palatable to the public. But should science be something which can be played in a particular way in order to make it more acceptable to certain audiences? Science is as much value-judgement based as anything else, and the universe is not the large clockwork instrument the Victorians believed it to be.
What is really at stake is the different way that issues can be presented to the public. And especially when it comes to anything technical, be it physics, chemistry or computer science, the general public are all too willing to believe what they are being told. The gurus all tell us about the Next Big Thing. The scientists all tell us How It Is. All the critical analysis is done for us, and as a result we accept sloppy reporting without question.
We all want to believe that the latest methodology or technology will make us better people, more productive workers, and even rich. But the Latest Thing may just be the Same Old Thing from a different angle. It is important not to forget any lessons we have already learnt, not to instantly abandon what we know to be right just to keep up with the all-singing, all-dancing bandwagon.
# posted by Marc
Mon, August 4, 2003
Software development methodologies are designed to help us produce cleaner, better and more maintainable code. Books and journals are produced at a staggering rate, filled with the latest answers to all our coding and maintenance woes. As programmers we are all by now expected to be hyper-productive and error-free.
The names change, but the ideologies remain similar. Beck and Fowler may have displaced Weinberg and Constantine, who in turn displaced Knuth and Kernighan, but these are all mere pretenders. As Isaac Newton said, albeit in sarcasm, we all stand on the shoulders of giants. In order to see where all this wisdom originated, we must travel back further still - back to the tales we were told as children, the stories we heard at our mother's knee. We need to rediscover the truths that are in our folklore.
I believe the time is ripe for significantly better documentation of programs [where the programmer] chooses the names of variables carefully and explains what each variable means. - Donald Knuth
A fine sentiment from the ever-wise Knuth. Of course, it is common sense to name your variables wisely. It adds an extra layer of meaning to your code, which can greatly ease future maintenance. But doesn't that sound like a familiar lesson? Where did we first come across the power of naming? Rumpelstiltskin, of course. Think back to the difficulties the Queen faced in puzzling out the naming convention there!
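A trivial sketch makes the point (the names and figures are made up): the second version needs no puzzling out, because the names carry the meaning.

```perl
use strict;
use warnings;

# Cryptic: the reader has to guess what $x, $y and $z stand for.
my ($x, $y, $z) = (4.98, 0.175, 0);
$z = $x * (1 + $y);

# Self-describing: the same calculation, but it reads like the problem.
my $net_price   = 4.98;
my $vat_rate    = 0.175;
my $gross_price = $net_price * (1 + $vat_rate);

printf "Price including VAT: %.2f\n", $gross_price;
```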
Or consider the interface to a software library. This is notoriously hard to get right, and the price of getting it wrong is even higher than that of badly named variables. So software developers learn about the principle of encapsulation. Rather than displaying the innards for all to see, we should hide away all the workings, providing a clean and usable front end. This is, of course, a great piece of advice. But again it is neither new, nor original. In fact, this information has been around for a very long time. Consider the tale of Hansel and Gretel. If the children had been able to see the cauldrons, potions, jail cells and oven hidden in the witch's house, they would never have gone near it.
But instead, the witch encapsulated all the horrible things, and put all the things that children like on display. (Of course, the witch had a bug in her system, which allowed the children to escape, but that is a different matter entirely!)
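In code, the same idea might look like this entirely hypothetical module: callers get a small, friendly interface, while the unpleasant internals stay out of sight.

```perl
package GingerbreadHouse;   # a hypothetical module, purely for illustration
use strict;
use warnings;

sub new {
    my ($class) = @_;
    # Internal state the caller never needs to know about.
    my $self = {
        _cauldrons  => 3,
        _oven_temp  => 220,
        _jail_cells => 1,
    };
    return bless $self, $class;
}

# The public interface: small, friendly, and it tells you nothing
# about cauldrons or ovens.
sub sweets_on_display { return qw(gingerbread candy-cane barley-sugar) }

package main;
my $house = GingerbreadHouse->new;
print join(', ', $house->sweets_on_display), "\n";
```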
Even the issues involved in choosing a software library were implanted in us when we were youngsters. I spend most of my time programming in Perl, which has one of the largest public resources of any language: the Comprehensive Perl Archive Network
(CPAN). One of the benefits of these sorts of libraries is that any problem you are trying to solve may already have been solved by someone before you who has released their code for free use. In fact this has probably happened more than once, in many different ways. So it is always good to choose amongst these libraries carefully, in order to find the one that will solve your own particular problem, in the best manner. Of course we learnt this in our youth from Goldilocks. She didn't like the big bowl of porridge, so she tried the middle sized one, which still wasn't to her liking, but the littlest one was exactly what she was looking for. (Of course, there are lessons to be learnt in this story about hacking and computer misuse, but that's for another day!)
Or consider testing. The recent rise of agile methodologies, such as XP, has re-awakened developers to the power of testing. A good test suite, with full regression tests, can save a project. In fact, some go as far as to recommend you write your tests before you write any code.
But again this is hardly a new concept. Remember Chicken Licken? Unable to correctly identify a problem, he believed his entire system was failing, and led his whole project to disaster. Had he only had a simple test suite, he could have been much more confident in his environment, and quickly recognised the problem as nothing more than untested external input.
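In Perl the conventional tool for this is Test::More; a few checks like the ones below (sky_height and the reading are obviously hypothetical) would have told Chicken Licken that the sky was fine and only his input needed examining.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# A hypothetical routine under test.
sub sky_height { return 10_000 }

# A hypothetical reading from the outside world - the acorn.
my $reading = 0;   # something just hit us on the head

is( sky_height(), 10_000,       'the sky is where we left it' );
ok( sky_height() > 0,           'the sky has not fallen' );
ok( $reading < sky_height(),    'whatever hit us came from below the sky' );
```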
I could go on and on. There are many other tales which also show that far from being a new science, Computer Science has just tapped into our universal stories and dressed them up with new terminologies. But hopefully this small taster demonstrates that there may be alternative sources when you're looking for further information on a new concept you've discovered in your favourite buzzword-compliant journal.