Hands-on with VuFind

One of the roles of the Information Discovery & Access Group (IDAG) is to stay current on technologies related to library searching. One relatively well-known search tool in the library landscape is VuFind, an open-source search interface built on top of Apache’s Solr platform. We decided to try installing VuFind to accomplish several things:

  • Gain experience and knowledge about this specific tool
  • Find out more about search technologies in general
  • Possibly perform some user testing to see how our patrons might respond to a different search interface

This is the first in a series of posts about the process of installing, configuring and testing VuFind here at Dartmouth. If you’re not interested in the technical aspects of installing the software, stay tuned for a subsequent post that will explore the interface itself.

Installation

Installing VuFind was not very difficult, but there were some obstacles. The first was the platform. The VuFind Project's installation instructions focus on specific server platforms: Windows, Ubuntu Linux, and Fedora Linux. The library’s servers run Red Hat Linux, which is related to Fedora. However, to keep this experiment from affecting production services around the library, we decided to install VuFind on a Mac Mini acting as a server. As a result, we had to treat the installation instructions as a rough roadmap rather than as commands we could follow verbatim.

There are a number of components needed:

  • The VuFind software, available on github
  • The Apache web server
  • A MySQL database
  • The PHP programming language
  • The Java Development Kit (JDK)

Most of these components come preinstalled on a Mac. We wanted to leave the system versions in place and install separate versions that matched what VuFind expected, and that we could modify without altering core system components. Homebrew, a package manager for the Mac operating system, makes this easy to do (see note 1 below).

Once all of these pieces were installed and configured, we had an operational system, but no content in it.

Importing data

The library’s catalog data is held in our Integrated Library System (ILS), Sierra, which is provided by Innovative Interfaces, Inc. Sierra supports the back-office workflows needed to run library operations, and VuFind complements it by providing a search interface.

In the Sierra system, we used a feature called “Data Exchange” to create an export file of MARC records. The process of exporting approximately three million records took hours. The resulting file was 3.6 gigabytes of data. We copied the file over to the Mac Mini and imported the records. We had to make some adjustments to the configuration so that the record number from our Sierra system would become the unique identifier in VuFind.
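Before importing, it can be worth sanity-checking a file this size. At 3.6 GB for about three million records, the average record is roughly 1.2 KB. The Python sketch below walks a binary MARC (ISO 2709) file one record at a time, using the length stored in each record's leader. It also extracts the value that should become the unique ID. It assumes the Sierra record number was exported in field 907 subfield a. That is a common convention in Innovative exports, but you should confirm it against your own Data Exchange output. The `sierra_id` normalization and all function names are hypothetical. VuFind's own importer (SolrMarc) does this work during a real import, so this only illustrates the record structure and is not the process we used.

```python
import io

# ISO 2709 / MARC 21 structural characters
RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def iter_marc_records(stream):
    """Yield raw records from a binary MARC file, one at a time.

    The first five bytes of each leader hold the total record length,
    so a multi-gigabyte export never has to be loaded into memory.
    """
    while True:
        length_bytes = stream.read(5)
        if not length_bytes:
            return
        length = int(length_bytes)
        yield length_bytes + stream.read(length - 5)


def get_subfield(record, tag, code):
    """Return the first value of subfield `code` in field `tag`, or None."""
    base_address = int(record[12:17])  # leader positions 12-16
    directory = record[24:base_address - 1]  # 12-byte entries, minus terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        if entry[:3] != tag.encode("ascii"):
            continue
        length, start = int(entry[3:7]), int(entry[7:12])
        data = record[base_address + start:base_address + start + length]
        # A data field is indicators + subfields; chunk 0 holds the indicators.
        for chunk in data.rstrip(FIELD_TERMINATOR).split(SUBFIELD_DELIMITER)[1:]:
            if chunk[:1] == code.encode("ascii"):
                return chunk[1:].decode("utf-8")  # assumes UTF-8 (leader/09 = 'a')
    return None


def sierra_id(record):
    """Hypothetical normalization: '.b1234567x' -> 'b1234567'.

    Assumes 907 $a holds the record number with a leading period and a
    trailing check character; adjust to whatever your VuFind setup expects.
    """
    raw = get_subfield(record, "907", "a")
    return raw.lstrip(".")[:-1] if raw else None


def build_record(fields):
    """Assemble a minimal MARC record from (tag, bytes) pairs, for testing."""
    directory, data = b"", b""
    for tag, value in fields:
        field = value + FIELD_TERMINATOR
        directory += tag.encode("ascii") + b"%04d%05d" % (len(field), len(data))
        data += field
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    total = base_address + len(data) + 1  # +1 for the record terminator
    leader = b"%05d" % total + b"nam a22" + b"%05d" % base_address + b" a 4500"
    return leader + directory + data + RECORD_TERMINATOR


if __name__ == "__main__":
    sample = build_record([("001", b"ocm00000001"), ("907", b"  \x1fa.b1234567x")])
    for rec in iter_marc_records(io.BytesIO(sample * 2)):
        print(sierra_id(rec))  # prints b1234567 twice
```

Counting the records this way and comparing the total against the number Sierra reported is a cheap check that the export wasn't truncated.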

VuFind builds a search index from the records so that it can quickly find relevant data for a query, which is why we import the records as a batch. However, some pieces of data need to be updated continuously. The most obvious one is an item’s status: is it currently available or checked out? Because this data can change from moment to moment, VuFind does not rely on the index for it. Instead, VuFind comes with plugins (which it calls ILS drivers) that work with most of the major ILSs, including Sierra. The plugin checks current availability in real time by querying the Sierra database at the moment an item appears in a VuFind search result.
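The split between a batch-built index and live status lookups can be illustrated with a small Python sketch. VuFind's actual ILS drivers are PHP classes. `FakeIls`, `statuses`, and `render_results` here are hypothetical names standing in for a live Sierra query, not VuFind's API. In the sketch, titles come from the index snapshot, and status is fetched at display time in one batch per results page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchHit:
    """A record as stored in the search index (a batch-loaded snapshot)."""
    record_id: str
    title: str


class FakeIls:
    """Hypothetical stand-in for a live ILS lookup such as a Sierra query."""

    def __init__(self, live_status: Dict[str, str]):
        self.live_status = live_status
        self.calls = 0

    def statuses(self, record_ids: List[str]) -> Dict[str, str]:
        self.calls += 1
        return {rid: self.live_status.get(rid, "Unknown") for rid in record_ids}


def render_results(hits: List[SearchHit], ils: FakeIls) -> List[str]:
    """Combine indexed data with availability fetched at display time."""
    # One batched lookup for the whole page, made when the page is rendered.
    status = ils.statuses([hit.record_id for hit in hits])
    return [f"{hit.title} [{status[hit.record_id]}]" for hit in hits]


if __name__ == "__main__":
    ils = FakeIls({"b1000001": "Available", "b1000002": "Checked out"})
    page = [SearchHit("b1000001", "Walden"), SearchHit("b1000002", "Moby-Dick")]
    print(render_results(page, ils))
```

With this design, the ILS is queried only for the records a user actually sees. A status change shows up on the next page view without any reindexing.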

Making changes

Because VuFind is open source software, we are able to read and modify its code. Almost as soon as we had the system installed, we noticed a problem with the display of certain records. We traced the problem to a variable that wasn’t being initialized, edited the code to initialize it ourselves, and then submitted that change back to the VuFind community. This is one of the biggest advantages of open source software: work that we do to improve our own experience can benefit users of the software around the world.

Beyond changes to the code, there are many possibilities available to us in customizing the way the software indexes our data. There are specific notes fields in some records that are not searchable through our Sierra system without paying to have the data reindexed. With VuFind, we can configure the indexing to include whatever data we want to make searchable.
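Conceptually, an indexing configuration maps search-index fields to the MARC fields and subfields that feed them. In VuFind, that mapping lives in SolrMarc properties files such as `marc_local.properties`. The Python sketch below only models the idea. The record shape and the index field names are hypothetical. So is the choice of 590 (a local note field) as the notes field we want to make searchable.

```python
from typing import Dict, List, Tuple

# Simplified record: (tag, [(subfield_code, value), ...]) pairs
Record = List[Tuple[str, List[Tuple[str, str]]]]

# Hypothetical mapping from index field to MARC tag + subfield code.
# 590 (local note) stands in for "notes fields we want searchable".
INDEX_SPEC: Dict[str, List[str]] = {
    "title": ["245a", "245b"],
    "notes": ["500a", "590a"],
}


def build_document(record: Record,
                   spec: Dict[str, List[str]] = INDEX_SPEC) -> Dict[str, List[str]]:
    """Collect the values each index field should contain for one record."""
    doc: Dict[str, List[str]] = {}
    for index_field, sources in spec.items():
        values: List[str] = []
        for source in sources:
            tag, code = source[:3], source[3:]
            for field_tag, subfields in record:
                if field_tag == tag:
                    values.extend(value for sub, value in subfields if sub == code)
        if values:  # omit index fields that have no data in this record
            doc[index_field] = values
    return doc
```

Making another field searchable is then a one-line change to the mapping rather than a change to the indexing code. That flexibility is what distinguishes this setup from paying to have Sierra reindex.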

Looking Forward

There are a number of steps that still need to be taken in order to have a fully functional search interface:

  • Build some code to update the records in VuFind regularly. Currently the data in VuFind is a snapshot of what was in Sierra as of mid-October, but records are added, edited and deleted every day. We need to set up a pipeline so that the data in VuFind is kept up to date as changes continue to be made in Sierra.
  • Filter out records that should be suppressed because the item is no longer held at Dartmouth.
  • Integrate the library’s locations into the system. Currently the system doesn’t differentiate between the various branches around campus, but VuFind does support this – we just need to get the configuration done.
  • Integrate patron data into the system. There are features of the software that require login (including tagging records, adding reviews, etc.). We don’t have VuFind integrated with the campus login system yet.
  • Examine and adjust the indexing. The current setup is the generic one included with VuFind; making it work best with our data will require input from the library’s metadata experts.
  • Get feedback from our users.
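The first two items above, keeping VuFind in sync with Sierra and dropping suppressed records, both come down to applying a batch of changes keyed by record ID. The minimal Python sketch below uses a plain dictionary to stand in for the Solr index. It also uses a hypothetical boolean `suppressed` field to stand in for however Sierra actually flags records that should not display.

```python
from typing import Callable, Dict, Iterable

Index = Dict[str, dict]


def apply_changes(index: Index,
                  changed: Iterable[dict],
                  deleted_ids: Iterable[str],
                  is_suppressed: Callable[[dict], bool]) -> Index:
    """Apply one batch of changes exported from the ILS to the search index.

    Deleted records are removed, changed or new records are upserted by ID,
    and records flagged as suppressed are removed instead of indexed.
    """
    for record_id in deleted_ids:
        index.pop(record_id, None)  # tolerate IDs that were never indexed
    for record in changed:
        if is_suppressed(record):
            index.pop(record["id"], None)
        else:
            index[record["id"]] = record
    return index


def suppressed_flag(record: dict) -> bool:
    """Hypothetical rule: a boolean 'suppressed' field on the exported record."""
    return bool(record.get("suppressed"))
```

Applying the same batch twice leaves the index unchanged, so a failed nightly run can simply be retried.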

Look for another post soon which will get into some of the differences between the VuFind interface and the current search interface.


1. The basic steps followed after installing Homebrew were:

  • brew install httpd22
  • brew install php56 --with-homebrew-apxs --with-pear --with-postgresql
  • brew install php56-intl
  • brew install php56-mcrypt
  • brew install mysql
  • brew cask install java
  • Install VuFind:


Encore Duet – Discovery platform from Innovative Interfaces

Earlier this year, I watched a webinar about a search interface from Innovative Interfaces. I wanted to share my notes on what it allows – and some questions the demonstration didn’t answer.

Encore is the most current library catalog product from Innovative Interfaces, Inc., and it supersedes the OPAC that Dartmouth College Library uses now. Encore Duet is an expansion of the Encore product to include access to electronic articles, e-books, locally created digital collections, and the library’s traditional collections of materials, in one search interface. Duet’s default search presents results from across all the different content types available. It’s offered as a service that can be hosted either locally or in the cloud.

The goal of Encore Duet is to provide an integrated user experience. Innovative is building its own relevance and value ranking into the system. The product includes instant index updating and real-time availability, and Innovative presents these features as a way to save library staff time. (A demonstration instance of the system is available at http://encore-academic.iii.com, although some features will not work because it doesn’t include any institution’s subscribed content.)

The interface is built around a search box with three default tabs: CatalogPlus, Catalog, and Digital Content. The CatalogPlus tab is an all-in-one search that integrates all of the content Duet can access. The Catalog tab searches within the records that make up the library catalog (content currently served at http://libcat.dartmouth.edu). The Digital Content tab searches across the harvested digital objects that the library has configured.

The search of catalog records integrates many of the features of the catalog itself. Users can place holds, request items, and renew items without leaving the Duet interface, and current item availability is shown as well. One interesting built-in feature is a shelf browse widget. Clicking on the call number displays a small number of the titles that would sit nearest the selected record on the physical shelves, allowing a browsing experience that approximates visiting the shelves in person. This browse display can be either a graphical view with book jacket thumbnail images or a simple list in call number order.

The library can determine which facets are available to users. It can reorder or remove them, or set them to be open or closed by default. Locally created indexes can also serve as the basis for facets beyond the standard ones that Encore Duet provides.

Another interesting feature is the “Promote relevance” button, which lets library staff boost specific records so that they appear closer to the beginning of search results. No details were given on how that is accomplished.
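Innovative didn't say how the shelf browse widget works. One simple way to get "nearest on the shelf" behavior is a binary search over call numbers that have already been normalized into sortable keys. Normalization matters because, compared as plain text, QA76 sorts before QA9. The Python sketch below assumes such keys already exist. The function name and sample keys are hypothetical.

```python
import bisect
from typing import List, Tuple

# (sort_key, title) pairs, already sorted by a normalized call-number key.
Shelf = List[Tuple[str, str]]


def shelf_neighbors(shelf: Shelf, sort_key: str, k: int = 2) -> List[str]:
    """Return the titles shelved around sort_key: up to k before it and
    k after it (plus the item itself, when sort_key is on the shelf)."""
    keys = [key for key, _ in shelf]
    i = bisect.bisect_left(keys, sort_key)
    lo, hi = max(0, i - k), min(len(shelf), i + k + 1)
    return [title for _, title in shelf[lo:hi]]


if __name__ == "__main__":
    shelf = [("QA0009", "Item 1"), ("QA0076", "Item 2"),
             ("QA0268", "Item 3"), ("QA0402", "Item 4")]
    print(shelf_neighbors(shelf, "QA0076", k=1))  # ['Item 1', 'Item 2', 'Item 3']
```

Zero-padding the class number, as in these sample keys, is one way to make plain string comparison agree with shelf order.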

The interface links directly to PDFs of article content in some cases, with OpenURL linking as a fallback. A link resolver is included with Encore Duet. It's not clear whether it's a new product or just a bundling of WebBridge, the OpenURL resolver product Innovative has sold for years. Article searching is provided through a partnership between III and EBSCO. The demonstration showed content from EBSCO and ProQuest, and the presenters claimed that the interface doesn’t privilege results from one provider over another. Even so, we would need to confirm that content from every provider a library subscribes to is included. E-books from EBSCO and OverDrive were highlighted, and the presenter showed how to check out OverDrive e-books without leaving the Duet interface. The system is designed around using Duet as the library’s knowledgebase. That would eliminate some of the work we currently do in Sierra’s Electronic Resource Management module.
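The "direct link first, OpenURL as a fallback" behavior can be sketched in Python. The parameter names follow the Z39.88-2004 key/encoded-value (KEV) format for journal articles. The resolver base URL and the `pdf_url` field are hypothetical, and this is not necessarily how Encore Duet or WebBridge builds its links.

```python
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical resolver


def openurl_for(article: dict) -> str:
    """Build a Z39.88-2004 KEV OpenURL for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Only include the citation elements we actually have.
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "date"):
        if article.get(key):
            params["rft." + key] = article[key]
    return RESOLVER_BASE + "?" + urlencode(params)


def best_link(article: dict) -> str:
    """Prefer a known direct full-text link; otherwise fall back to the resolver."""
    return article.get("pdf_url") or openurl_for(article)
```

The fallback is only as good as the resolver's knowledgebase. If the resolver has no target for a citation, the user lands on a page with no full text, which is the kind of dead end raised in the questions below.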

Duet can be configured to harvest digital objects from up to three external repositories, which are then indexed and included in the main search. The example shown brought thumbnail images and metadata into Duet from a CONTENTdm instance. This is interesting to Dartmouth, since we already use CONTENTdm for many of the objects we’ve digitized.
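Innovative didn't say which protocol Duet uses for harvesting. CONTENTdm exposes its collections over OAI-PMH, so the Python sketch below assumes OAI-PMH with unqualified Dublin Core (`oai_dc`). The base URL is hypothetical. The network request is left out so the parsing can be checked offline.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request; later pages use the resumption token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text: str) -> Tuple[List[Dict], Optional[str]]:
    """Extract identifiers, titles and links, plus the token for the next page."""
    root = ET.fromstring(xml_text)
    items = []
    for record in root.iterfind(".//oai:record", NS):
        items.append({
            "id": record.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [el.text for el in record.iterfind(".//dc:title", NS)],
            "links": [el.text for el in record.iterfind(".//dc:identifier", NS)],
        })
    # An empty or missing resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return items, token or None
```

A full harvester would fetch `list_records_url(base, token)` in a loop, passing each page to `parse_list_records`, until the returned token is `None`.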

The tabbed setup might cause confusion about which things are where – certainly some results returned in the Catalog and CatalogPlus tabs will be digital content. There is a similar source of confusion in the faceting that’s provided – users can limit to the catalog by using the tab or the facet on the left. “At the library” is a facet under the heading “Availability” with nothing to show how it’s different from “Library Catalog,” and users may be uneasy with this ambiguity as well.


Questions that I still have:

  • Does it integrate with StackMaps?
  • Is the relatively slow response time seen in the demonstration typical?
  • How can we prevent users from hitting dead ends in their searches? During the demonstration, the presenter used the facet for EBSCO EDS full text, and the link brought her to a page saying that there were no matching results. (I tried the same search in Summon and got a link through to the article, which is apparently available through Wiley Online Library. I wonder if that’s one provider that wouldn’t work with Encore Duet?) It may also be significant that some of the articles found had links to both Full Text Finder and WebBridge, III’s OpenURL link resolver. That could confuse users, and it’s not clear what maintenance, if any, this would require of library staff.
  • What would a service like this cost, especially in comparison to something like Summon?
  • We know that library users want simplicity in finding things, and a single search box with a unified results set seems like a good attempt at simplicity. But many of our peer institutions have been moving toward bento-box style results, with different types of search results broken out into different compartments within a results page. Which approach will better serve our patrons?
  • Have there been published reports on the usability of the service?