Thursday, March 22, 2007

Voyager and Blackwell

We are in the process of setting up Blackwell's Collection Manager system to help streamline our purchasing and ordering of books. As part of the process (which I should add is still underway) we have developed a couple of simple Perl scripts that might be useful to others using Collection Manager. They have not been tested in a production environment, so use them at your own risk :)

1. MOUR (Mini Open Url Redirector)
Collection Manager has a per-user preference for setting up an OpenURL resolver. The idea is that you point this at your OpenURL server and it will let you know if a book is already in your catalogue. Unfortunately our OpenURL resolver doesn't link into our library catalogue properly for books, and users would have to click through two links to even see the results. The Mini Open Url Redirector simply redirects the browser to the library catalogue search for the book concerned, with no need to go through any intermediate screens.
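
The redirect idea can be sketched roughly as follows. This is a minimal Python sketch, not the Perl script mentioned above, whose source is not shown here. The catalogue address and its `type`/`q` parameters are hypothetical placeholders, and I am assuming the incoming request carries the ISBN or title under the usual OpenURL keys (`rft.isbn`, `rft.btitle` or `rft.title` in version 1.0; `isbn`, `title` in version 0.1). What Collection Manager actually sends may differ.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical catalogue search endpoint; replace with your own.
CATALOGUE_SEARCH = "https://catalogue.example.org/search"


def catalogue_redirect(query_string: str) -> str:
    """Build a catalogue search URL from an incoming OpenURL query string."""
    params = parse_qs(query_string)

    def first(*keys):
        # Return the first non-empty value found under any of the given keys.
        for key in keys:
            values = params.get(key)
            if values and values[0].strip():
                return values[0].strip()
        return None

    # Prefer an ISBN match; fall back to a title search.
    isbn = first("rft.isbn", "isbn")
    if isbn:
        query = {"type": "isbn", "q": isbn.replace("-", "")}
        return CATALOGUE_SEARCH + "?" + urlencode(query)
    title = first("rft.btitle", "rft.title", "title")
    if title:
        return CATALOGUE_SEARCH + "?" + urlencode({"type": "title", "q": title})
    # Nothing usable in the request: send the user to the plain search page.
    return CATALOGUE_SEARCH
```

A CGI wrapper around this would only need to answer with a redirect status and a `Location:` header containing the returned URL.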

2. Currency Split
Collection Manager exports orders to an FTP server, from which they can be downloaded and imported into Voyager. Unfortunately Voyager can only handle orders that all use the same currency. This script takes the file produced by Collection Manager and splits it into a set of files, one for each currency. These files can then be imported separately into Voyager.
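
The splitting step might look something like the sketch below. It is written in Python rather than Perl and is not the script described above. It also assumes a simplified export: a tab-separated text file with a header row and a column named `currency`. The column name and the output file naming are my own illustrative choices; the real Collection Manager export format is not described in this post and may well be different (for example MARC records), in which case only the grouping idea carries over.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_currency(order_file, out_dir, currency_column="currency"):
    """Write one file per currency found in order_file; return the paths written."""
    groups = defaultdict(list)
    with open(order_file, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames
        for row in reader:
            # Group on the upper-cased code so "usd" and "USD" share a file.
            groups[row[currency_column].strip().upper()].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(order_file).stem
    written = {}
    for currency, rows in sorted(groups.items()):
        target = out_dir / f"{stem}_{currency}.txt"
        with open(target, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
        written[currency] = target
    return written
```

Each output file keeps the original header row, so every file has the same layout as the export it came from.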

Surveying Users

One of our team's objectives this year will require us to create at least one web survey form. There have been other instances where we have been asked to assist in creating web surveys or tests, so I have been investigating possible options for expediting this. There are some good solutions for this purpose out there! One that struck me as extremely well thought through is PHPSurveyor. This is an open source product that is one of the best examples of this style of software development I have seen. The current stable version is PHPSurveyor 1.0. The tool appears to be robust and extremely functional, with a “minimalist” interface that, whilst not exactly intuitive, only takes about an hour to get the hang of (it would be faster if someone was available to instruct you…).

The features of PHPSurveyor that really attract me are:
  • Whilst setup is not completely automated, very clear instructions are provided. In my Windows XP "test" environment, setup was straightforward, and we did not experience much difference in the Linux server environment we set up for production.

  • The survey user interface is clean of “branding” which is great!

  • Questions can be set to appear singly or in groups (groups are really pages in this context).

  • There is a cool piece of functionality that allows questions to be dependent on an answer provided to an earlier question. This means that the user can be presented with questions that are relevant and not have to be presented with questions that they would have to ignore…

  • There is a ribbon gauge that advises the user of their progress through the questionnaire.

  • The reporting options for results and the exporting of results are straightforward.


This product is worth the trouble of implementing, although users will need some support to get it working initially.

    Friday, March 16, 2007

    PERL Data File Search Script

    In the Powershell - Get Inventory Script post last month I briefly mentioned a web-based server script that we use to search the data output files we copy to our intranet server. Because I have noticed a significant boost in traffic to the website hosting the Get Inventory Script, I have cleaned up and rewritten the PERL script I developed for searching these files and I am making it available here. I used PERL for this task because the process runs on our Linux intranet server, which of course has PERL installed natively.

    This PERL script is designed to locate all the HTML files in a single directory. Each file is named after the hostname of the workstation that generated it when it was created by the Powershell - Get Inventory Script. The script therefore either searches for computer names and displays the complete data set for each matching computer on one page, or searches the content of each file for the requested string and displays the results. Which type of search is done depends on the user's choice in the original search form (Hostname or Keyword). If a Keyword search is selected, any HTML tags in the source are removed before the remaining text is compared with the string being sought. This reduces the number of false hits!

    By searching all the data files harvested (copied) from the workstations there is no need to set up a database to store the data, with the interaction overhead that causes. Because we are searching the content of ALL the files we can isolate discrete data quickly. For example, if I am asked how many installations of EndNote we have, I can run a Keyword search for EndNote. The returned data will tell me the number of workstations with EndNote (124 at the time of writing) and then list the names of the workstations, each followed by the version details of EndNote on that workstation.
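
    The two search modes can be illustrated with a short sketch. This is Python, not the PERL script offered for download, and it only returns the matching hostnames rather than building a results page. The function names and the `*.html` file pattern are my own assumptions, and the regular expression used to strip tags is deliberately crude.

```python
import html
import re
from pathlib import Path

# Crude tag matcher: good enough for simple generated reports, not a parser.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    """Remove HTML tags and decode entities, leaving only the visible text."""
    return html.unescape(TAG_PATTERN.sub(" ", markup))


def search_inventory(directory, term, mode="keyword"):
    """Return sorted hostnames whose file name (hostname mode) or
    visible text (keyword mode) contains term, ignoring case."""
    needle = term.lower()
    hits = []
    for path in sorted(Path(directory).glob("*.html")):
        hostname = path.stem  # files are named after the workstation
        if mode == "hostname":
            matched = needle in hostname.lower()
        else:
            text = strip_tags(path.read_text(encoding="utf-8", errors="replace"))
            matched = needle in text.lower()
        if matched:
            hits.append(hostname)
    return hits
```

    Counting installations is then just `len(search_inventory(directory, "EndNote"))`. Stripping the tags first means a file that only mentions the word inside a tag attribute (a link target, say) is not counted.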

    Links:
  • Download and implement the PERL Data File Search Script.

  • View the Powershell - Get Inventory Script post.

    Sunday, March 4, 2007

    A Spellchecker for Webvoyage

    At the ANZREG Conference a couple of weeks ago, I presented the work I did enabling spellchecking functionality for WebVoyage. I was very pleased with the reaction I got, with a number of people showing interest in using the same or a similar system in their catalogues.

    As promised I have put all the code and some limited documentation on the web. Naturally I'm not going for a Pulitzer Prize in writing, or aiming to make the documentation absolutely complete, but if you have some input you want to make into the webpages, the documentation or the code, or just want to discuss the ideas involved, please feel free to contact me. My email address is j.brunskill AT waikato.ac.nz


    Combining Datafiles

    Dealing with text-based data extraction can be time consuming and a real hassle, especially if you have to combine data files without causing a file to "blow out" with duplicated data. To help me automate a couple of processes, I wrote this console application in VB .NET. The file is small (15KB) and does not require installing, but you will need the Microsoft .NET Framework Version 1.1 on any workstation on which you want to run this application.

    The zip file download of NGCombine.exe is mounted on my personal website. The downloaded zipfile will need to be opened and NGCombine.exe can be copied into the system directory of your workstation or to a directory of your choosing. If you load the file into the system directory you will not need to use a full filepath to call it.

    NGCombine.exe has been limited to processing 1,000,000 records (that is, lines of text) per file, which I think is plenty for most of us! Details of the call syntax follow:

    NGCombine.exe
    Combines the contents of two text files sorting the data and eliminating empty lines. (If applied to a single file, the file is sorted.)

    Syntax NGCombine [/a [X:\...]] [/n [X:\...]] [/o [X:\...]] [/e] [/r] [/?]

    Parameters
    /a [X:\...]
    Required: Specifies filepath for the file containing original or "Archive" data.

    /n [X:\...]
    Specifies filepath for the file containing "New" or incoming data. If this file is not specified, a data sort will occur on the original or "Archive" data only.

    /o [X:\...]
    Specifies filepath for the output data. If this file is not specified, the default output filepath is the original or "Archive" data filepath.

    /e
    Eliminates any duplicate lines of data.

    /r
    Reverses the sort order.

    /?
    Displays help at the command prompt.
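
    To make the behaviour of the switches concrete, here is a minimal Python sketch of the combine step as described above. It is not the VB .NET source of NGCombine.exe. In particular I am assuming plain string ordering for the sort; the real tool's sort rules (case handling, locale) are not documented here.

```python
def combine(archive_lines, new_lines=(), eliminate_duplicates=False, reverse=False):
    """Merge two sequences of lines, drop empty ones, and return them sorted.

    archive_lines        plays the role of the /a file
    new_lines            plays the role of the optional /n file
    eliminate_duplicates corresponds to /e
    reverse              corresponds to /r
    """
    merged = [line.rstrip("\r\n") for line in list(archive_lines) + list(new_lines)]
    # Empty (or whitespace-only) lines are always discarded.
    merged = [line for line in merged if line.strip()]
    if eliminate_duplicates:
        merged = list(set(merged))
    return sorted(merged, reverse=reverse)
```

    Following the syntax above, the equivalent command line call would be along the lines of NGCombine /a C:\data\archive.txt /n C:\data\new.txt /e, where both file paths are made-up examples.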

    Tuesday, February 27, 2007

    Compare your Library with LibraryThing

    Tim Spalding of LibraryThing just released a new set of data: an export of all the ISBNs that LibraryThing knows about.

    Tim writes:
    Over on Next Generation Catalogs for Libraries, NCSU's Emily Lynema, asked me:
    "Do you have any idea of the coverage of non-fiction, research materials in LT? Have you done any projects to look at overlap with a research institution (or with WorldCat)?"
    No, we haven't. And I'm dying to find out, both for academic and non-academic libraries.
    So I decided to see how hard it would be to write a script to compare the LibraryThing dataset against a simple export from our library system. It turns out it didn't take too long. I have posted the Perl source code on my personal website, so you no longer have that as an excuse for not helping Tim out.

    Here are the stats for The University of Waikato Library:

    Out of approximately 500,000 Bib records in our database I found 292,073 distinct ISBNs. (I originally reported only about 178,460; that figure turned out to be the result of an error during normalisation.) LibraryThing has 1,774,322 ISBNs, so they have about six times as many as us!


    UoW Library and LibraryThing have 73,377 ISBNs in common (originally reported as 45,259), which means that LibraryThing only has about 25% of the ISBNs we have, or in other words 75% of our ISBNs are ones that LibraryThing doesn't have. This seems like a surprisingly large number given how much larger LibraryThing's database is. Tim may have the right idea though, as he said he suspects LibraryThing users tend to have the paperback (cheaper) copies of books rather than the more expensive hardcover versions that libraries tend to buy. It would be interesting to see if that is in fact the reason, or if we just have a very different set of resources from what is cataloged in LibraryThing.
    Database                Total ISBNs   Unique ISBNs   Percentage Unique
    University of Waikato       292,073        218,696              74.88%
    LibraryThing              1,774,322      1,700,943              95.86%

    Total ISBNs in common: 73,377
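
    The comparison itself boils down to normalising both lists and intersecting them. The sketch below is Python, not the Perl script mentioned above, and the normalisation rule is my own assumption: take the first token of each field, drop hyphens, and convert 10-digit ISBNs to their 13-digit form so that both forms of the same ISBN compare equal. The rules actually used for the figures above are not described in this post.

```python
import re


def normalise_isbn(raw):
    """Return a 13-digit ISBN string, or None if raw does not look like an ISBN."""
    tokens = raw.strip().split()
    if not tokens:
        return None
    # Use only the first token so qualifiers such as "(pbk.)" are ignored.
    cleaned = tokens[0].replace("-", "").upper()
    if re.fullmatch(r"\d{9}[\dX]", cleaned):
        # ISBN-10 to ISBN-13: prefix 978, drop the old check digit, recompute.
        core = "978" + cleaned[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    if re.fullmatch(r"\d{13}", cleaned):
        return cleaned
    return None


def compare(ours, theirs):
    """Return (common, only_ours, only_theirs) as sets of normalised ISBNs."""
    a = {n for n in map(normalise_isbn, ours) if n}
    b = {n for n in map(normalise_isbn, theirs) if n}
    return a & b, a - b, b - a
```

    The percentages in the table follow directly from the set sizes: 218,696 / 292,073 is about 74.88% unique to Waikato, which leaves 73,377 / 292,073, or roughly 25%, shared with LibraryThing.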


    I figure since they asked the question, NCSU Libraries should be next...



    Update: Updated Figures after discovering I had dropped a whole bunch of ISBN's when normalising them.

    Wednesday, February 21, 2007

    Citizen Preservation, A vision for the future.

    I attended the ANZREG (Australia & New Zealand Regional EndUser Group) conference in Wellington this week.
    We were privileged to have the NZ National Librarian (Penny Carnaby) speaking about the National Digital Heritage Archive (NDHA) as well as other projects such as the National Resource Discovery System, the concept of kete (basket of knowledge) and a whole lot more.

    She talked a lot about the way the internet is evolving, with web 2.0 concepts bringing content creation into the hands of everyday people, and how that is changing the way content needs to be archived and preserved.

    This got me thinking: if web 2.0 is all about giving ordinary people the tools and resources they need to produce content, shouldn't we also begin to put the tools and resources in people's hands to preserve and describe their content?
    Preservation isn't exactly a foreign concept to most people. I mean, people collect stamps and antique furniture, and they rewrite grandma's favorite chocolate cake recipe in a new book so that it doesn't get lost. We all like to hold on to family heirlooms and all manner of odds and ends. So is there a place for a national library, or for that matter anyone, to make tools and resources available to everyday people and set them loose to protect and preserve their content, history, and the like?

    I asked that question (slightly more succinctly, I might add) of Penny Carnaby, and I love her response. Note: This is stated as I remember it, not even slightly 'word for word'.

    "Imagine this picture: an elderly man walks into the national library, his grandson reaching up to hold his hand. Under their arms are books filled with old New Zealand and international stamps collected over the decades. Together the two sit down at a computer and begin to scan in and annotate the collection, making it available to the world."

    It is such a nice picture, isn't it? The people who care about the data, the people who have the data, are able to release that data so that others have access to it. I don't think by any stretch of the imagination that the national library will undertake to build such a system, but it is a vision of what the future may be like.

    I don't know about you, but I'd love to see it happen!