Hands-on with VuFind

One of the roles of the Information Discovery & Access Group (IDAG) is to stay current on technologies related to library searching. One relatively well-known search tool in the library landscape is VuFind, an open-source search interface built on top of Apache’s Solr platform. We decided to try installing VuFind to accomplish several things:

  • Gain experience and knowledge about this specific tool
  • Find out more about search technologies in general
  • Possibly perform some user testing to see how our patrons might respond to a different search interface

This is the first in a series of posts about the process of installing, configuring and testing VuFind here at Dartmouth. If you’re not interested in the technical aspects of installing the software, stay tuned for a subsequent post that will explore the interface itself.

Installation

Installing VuFind was not very difficult, but there were some obstacles. The first was the platform. The installation instructions given by the VuFind Project focus on installing the software on specific server platforms: Windows, Ubuntu Linux, and Fedora Linux. The library’s servers run on Red Hat Linux, which is related to Fedora. However, in order to keep this experiment from impacting production services around the library, we decided to install VuFind using a Mac Mini as a server. For this reason, the installation instructions had to be treated as a rough roadmap rather than a set of commands that could be followed verbatim.

There are a number of components needed:

  • The VuFind software, available on GitHub
  • The Apache web server
  • A MySQL database
  • The PHP programming language
  • The Java Development Kit (JDK)

While most of the components are installed by default on a Mac, it was desirable to leave the system versions in place and install versions that matched what VuFind expected (and could be modified without altering the fundamental system components). Homebrew is a package management tool for the Mac operating system that makes this easy to do (see note 1 at the end of this post).

Once all of these pieces were installed and configured, we had an operational system, but it had no content in it.

Importing data

The library’s catalog data is held in our Integrated Library System (ILS). The software we use is Sierra, which is provided by Innovative Interfaces, Inc. VuFind provides a search interface that complements the back-office workflow support needed to run library operations, which Sierra provides.

In the Sierra system, we used a feature called “Data Exchange” to create an export file of MARC records. The process of exporting approximately three million records took hours. The resulting file was 3.6 gigabytes of data. We copied the file over to the Mac Mini and imported the records. We had to make some adjustments to the configuration so that the record number from our Sierra system would become the unique identifier in VuFind.
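
As a rough illustration of the identifier adjustment described above, the sketch below derives an index identifier from an exported record number and checks for duplicates. It is not VuFind's import configuration. The function names and the assumed record-number shape (an optional leading dot, the letter "b", seven digits, and a trailing check character) are assumptions made for this example.

```python
import re

# Assumption: an exported bib record number looks like ".b1234567x", i.e.
# an optional leading dot, "b", seven digits, then one check character.
_RECORD_NUMBER = re.compile(r"^\.?(b\d{7})[0-9xX]$")


def sierra_to_index_id(raw):
    """Return the record number without dot and check character, or None."""
    match = _RECORD_NUMBER.match(raw.strip())
    return match.group(1) if match else None


def find_duplicate_ids(raw_numbers):
    """Return identifiers that occur more than once, since IDs must be unique."""
    seen, duplicates = set(), set()
    for raw in raw_numbers:
        record_id = sierra_to_index_id(raw)
        if record_id is None:
            continue  # skip values that do not match the assumed shape
        if record_id in seen:
            duplicates.add(record_id)
        seen.add(record_id)
    return sorted(duplicates)
```

Running a duplicate check like this before a multi-hour import is a cheap way to catch identifier problems early.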

VuFind indexes the records so that when you give it a search query it can find relevant data, which is why we import the records as a batch. However, there are some pieces of data that need to be updated continuously. The most obvious one is the item’s status – is it currently available or checked out? Since this data can change moment to moment, VuFind comes with plugins that can work with most of the major ILS systems, including Sierra. The plugin checks the current availability in real-time by querying the Sierra database at the moment when the item appears in a search result on VuFind.
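
The split between batch-indexed data and live status can be sketched roughly as follows. This is not the code of VuFind's Sierra plugin. `attach_live_status`, `fake_ils_lookup`, and the status strings are hypothetical, and looking up a whole page of results in one batch is a design choice of this sketch rather than something we know about the plugin.

```python
def attach_live_status(results, fetch_status):
    """Add a current status to each search result at display time.

    results: list of dicts from the search index, each with an "id" key.
    fetch_status: hypothetical callable taking a list of ids and returning
        a dict of id -> status string, standing in for the ILS query.
    """
    ids = [result["id"] for result in results]
    live = fetch_status(ids)  # one batched lookup per page of results
    # Copy each result so the indexed data itself is left unchanged.
    return [dict(result, status=live.get(result["id"], "unknown"))
            for result in results]


def fake_ils_lookup(ids):
    """Stand-in for the real-time ILS query; the data here is invented."""
    checked_out = {"b1000002"}
    return {i: ("checked out" if i in checked_out else "available") for i in ids}
```

The indexed records never store the status, so nothing needs to be reindexed when an item is checked out or returned.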

Making changes

Because VuFind is open source software, we are able to read and modify its code. Almost as soon as we got the system installed, we noticed a problem with the display of certain records. We traced the problem to a variable that wasn’t being initialized, edited the code to initialize it ourselves, and then submitted that change back to the VuFind community. This is one of the biggest advantages of open source software – work that we do to improve our own experience can benefit users of the software around the world.

Beyond changes to the code, there are many possibilities available to us in customizing the way the software indexes our data. There are specific notes fields in some records that are not searchable through our Sierra system without paying to have the data reindexed. With VuFind, we can configure the indexing to include whatever data we want to make searchable.
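
To make the idea of configurable indexing concrete, here is a minimal sketch of a rule table that decides which MARC fields feed which search fields. It does not use VuFind's actual configuration format. `INDEX_RULES`, `build_index_document`, and the choice of tag 590 as an example of a local notes field are assumptions for illustration.

```python
# Hypothetical mapping from index field name to the MARC tags that feed it.
# Tag 590 is used here as an example of a local notes field.
INDEX_RULES = {
    "title": ["245"],
    "notes": ["500", "590"],
}


def build_index_document(record_id, marc_fields, rules=INDEX_RULES):
    """Build a flat document for the search index.

    marc_fields: dict of MARC tag -> list of field text values.
    """
    document = {"id": record_id}
    for index_field, tags in rules.items():
        values = []
        for tag in tags:
            values.extend(marc_fields.get(tag, []))
        if values:
            document[index_field] = values
    return document
```

Making a new field searchable then means adding a tag to the rule table and reindexing, rather than changing the search software itself.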

Looking Forward

There are a number of steps that still need to be taken in order to have a fully functional search interface:

  • Build some code to update the records in VuFind regularly. Currently the data in VuFind is a snapshot of what was in Sierra as of mid-October, but records are added, edited and deleted every day. We need to set up a pipeline so that the data in VuFind is kept up to date as changes continue to be made in Sierra.
  • Filter out records that should be suppressed because the item is no longer held at Dartmouth.
  • Integrate the library’s locations into the system. Currently the system doesn’t differentiate between the various branches around campus, but VuFind does support this – we just need to get the configuration done.
  • Integrate patron data into the system. There are features of the software that require login (including tagging records, adding reviews, etc.). We don’t have VuFind integrated with the campus login system yet.
  • Examine and adjust the indexing. The current setup is the generic one that comes included with VuFind, but making it work best with our data will require the input of metadata experts in the library.
  • Get feedback from our users.
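
The first two steps in the list above (regular updates and suppression) could be planned with something like the following sketch. We have not built this pipeline. `plan_sync`, the record layout, and the "suppressed" flag are hypothetical names chosen for the example.

```python
def plan_sync(indexed, source):
    """Compare the search index with the current ILS data.

    indexed: dict of record id -> last-modified stamp held in the index.
    source: dict of record id -> {"modified": stamp, "suppressed": bool}.
    Stamps are assumed to be ISO 8601 strings in the same time zone and
    format, which compare correctly as plain text.
    """
    visible = {rid: rec for rid, rec in source.items()
               if not rec.get("suppressed", False)}
    to_add = sorted(rid for rid in visible if rid not in indexed)
    to_update = sorted(rid for rid, rec in visible.items()
                       if rid in indexed and rec["modified"] > indexed[rid])
    # Deleted and newly suppressed records both leave the index.
    to_remove = sorted(rid for rid in indexed if rid not in visible)
    return to_add, to_update, to_remove
```

Treating suppressed records the same way as deleted ones keeps the index limited to items that patrons should actually be able to find.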

Look for another post soon which will get into some of the differences between the VuFind interface and the current search interface.

 

 


1. The basic steps followed after installing Homebrew were:

  • brew install httpd22
  • brew install php56 --with-homebrew-apxs --with-pear --with-postgresql
  • brew install php56-intl
  • brew install php56-mcrypt
  • brew install mysql
  • brew cask install java
  • Install VuFind:

 

Encore Duet – Discovery platform from Innovative Interfaces

Earlier this year, I watched a webinar about a search interface from Innovative Interfaces. I wanted to share my notes on what it allows – and some questions the demonstration didn’t answer.

Encore is the most current library catalog product from Innovative Interfaces, Inc., and it supersedes the OPAC that Dartmouth College Library uses now. Encore Duet is an expansion of the Encore product to include access to electronic articles, e-books, locally created digital collections, and the library’s traditional collections of materials, in one search interface. Duet’s default search presents results from across all the different content types available. It’s offered as a service that can be hosted either locally or in the cloud.

The goal of Encore Duet is to provide an integrated user experience. Innovative is building its own relevance and value ranking into the system. The product includes instant updating of indexes and real-time availability, and it is being presented as something that can save library staff time by including these features. (There is a demonstration instance of the system available at http://encore-academic.iii.com, although some features will not be working because it won’t have an institution’s subscribed content included.)

The interface is built around a search box with three default tabs to search in: CatalogPlus / Catalog / Digital Content. The CatalogPlus tab is an all-in-one search which integrates all of the content that Duet can access. The Catalog tab searches within the records that make up the library catalog (content currently served at http://libcat.dartmouth.edu). The Digital Content tab provides searching across the harvested digital objects that the library has configured.

The search of the catalog records integrates many of the features of the catalog itself – placing holds, requesting or renewing items are all functions that can be done in the Duet interface. Current availability of item records is also included. One interesting feature that’s built in is a shelf browse widget – clicking on the call number displays a small number of the titles that would be nearest on the physical shelves to the record selected, allowing for a browsing experience that approximates visiting the shelves in person. This browse display can be either a graphical representation with book jacket thumbnail images, or a simple list in call number order. The library can determine which facets are available to users and can have them re-ordered, removed, and/or set to be open or closed by default. Additionally, locally created indexes can be used as the basis of facets beyond the standard ones that Encore Duet provides. Another interesting feature is the “Promote relevance” button. This feature is available for library staff to boost specific records in order to move them closer to the beginning of search results, but no details were given on how that is accomplished.
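
The shelf browse widget described above can be approximated with a sorted list and a binary search. This is only a sketch of the idea, not how Encore Duet implements it. `shelf_neighbors` is a hypothetical name, and the sketch assumes each call number has already been normalized into a key that sorts in shelf order, which is the hard part in practice.

```python
import bisect


def shelf_neighbors(shelf, call_number_key, radius=2):
    """Return the titles shelved around a given call number.

    shelf: list of (sort_key, title) tuples, already sorted by sort_key.
    call_number_key: normalized sort key of the selected record.
    radius: how many neighbors to show on each side.
    """
    keys = [key for key, _ in shelf]
    position = bisect.bisect_left(keys, call_number_key)
    start = max(0, position - radius)
    end = min(len(shelf), position + radius + 1)
    return shelf[start:end]
```

The same window of results could feed either display mentioned above, the book jacket view or the simple list in call number order.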

The interface provides direct linking to PDFs of article content in some cases, with OpenURL linking being used as a fallback. Their link resolver is included with Encore Duet, although it’s not clear if it’s a new product or just a bundling of WebBridge, the OpenURL resolver product they’ve sold for years. The article searching is provided through a partnership between III and EBSCO. Content provided by EBSCO and ProQuest was shown in the course of the demonstration, and the presenters claimed that the interface doesn’t privilege results from one provider over another, but ensuring the inclusion of content from all the providers a library subscribes to is important. E-books from EBSCO and OverDrive were highlighted, and the ability to check out e-books from OverDrive while in the Duet interface was shown. The system is based on the idea of using Duet as the library’s knowledgebase. This would obviate the need for some of the work that we currently do in the Electronic Resource Management module in Sierra.
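
The "direct link first, OpenURL as a fallback" behavior can be sketched as below. This is not Encore Duet's code. The record field names and the resolver base URL are hypothetical, while the query keys follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for journal articles.

```python
from urllib.parse import urlencode


def article_link(record, resolver_base):
    """Prefer a direct PDF link; otherwise build an OpenURL for the resolver.

    record: dict with optional keys "pdf_url", "title", "journal", "issn",
        "volume", "year" (hypothetical field names for this sketch).
    resolver_base: base URL of the library's link resolver (an assumption).
    """
    if record.get("pdf_url"):
        return record["pdf_url"]
    params = {
        "url_ver": "Z39.88-2004",  # OpenURL 1.0 version identifier
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": record.get("title"),
        "rft.jtitle": record.get("journal"),
        "rft.issn": record.get("issn"),
        "rft.volume": record.get("volume"),
        "rft.date": record.get("year"),
    }
    # Leave out metadata the record does not have.
    query = urlencode({key: value for key, value in params.items() if value})
    return resolver_base + "?" + query
```

Whether the resolver can then find full text depends on the library's knowledgebase, which is where the maintenance questions raised later in this post come in.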

Duet can be configured to harvest digital objects from up to three external repositories, to be indexed and included in the main search. The example that they showed brought thumbnail images and metadata into Duet from a CONTENTdm instance. This is interesting to Dartmouth, as we already use CONTENTdm for many of the objects we’ve digitized.

The tabbed setup might cause confusion about which things are where – certainly some results returned in the Catalog and CatalogPlus tabs will be digital content. There is a similar source of confusion in the faceting that’s provided – users can limit to the catalog by using the tab or the facet on the left. “At the library” is a facet under the heading “Availability” with nothing to show how it’s different from “Library Catalog,” and users may be uneasy with this ambiguity as well.


Questions that I still have:

  • Does it integrate with StackMaps?
  • Is the relatively slow response time seen in the demonstration typical?
  • How can we prevent users from coming to dead ends in their searching? During the demonstration, the presenter used the facet for EBSCO EDS full-text – and the link brought her to a page saying that there were no matching results. (I tried the same search in Summon and got a link through to the article, which is apparently available through Wiley Online Library. I wonder if that’s one provider that wouldn’t work with Encore Duet?) It might be significant that some of the articles found had links for both full-text finder and “WebBridge”, which is III’s OpenURL link resolver. Having two links might be confusing to users, and it’s not clear what maintenance, if any, this would require from library staff.
  • What would a service like this cost, especially in comparison to something like Summon?
  • We know that library users want simplicity in finding things, and a single search box with a unified results set seems like a good attempt at simplicity. But many of our peer institutions have been moving toward bento-box style results, with different types of search results broken out into different compartments within a results page. Which approach will better serve our patrons?
  • Have there been published reports on the usability of the service?
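
For readers unfamiliar with the bento-box approach mentioned in the list above, the difference from a unified result list comes down to grouping. The sketch below is a generic illustration, not any vendor's implementation. `bento_boxes` and the "type" and "title" keys are hypothetical.

```python
def bento_boxes(results, per_box=3):
    """Group a relevance-ordered result list into per-type compartments.

    results: list of dicts with "type" and "title" keys (hypothetical names).
    per_box: maximum number of titles shown in each compartment.
    """
    boxes = {}
    for result in results:
        box = boxes.setdefault(result["type"], [])
        if len(box) < per_box:  # keep only the top few of each type
            box.append(result["title"])
    return boxes
```

A unified list shows `results` as-is, while a bento layout shows each compartment separately, so the choice between the two is largely a presentation decision made on top of the same search results.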

Open Educational Resources: New Initiatives for Creation and Discovery

Open Educational Resources, or OERs, include full works like textbooks, as well as smaller units of content that can be repurposed as needed for the learning goals of a course. These are key resources for new approaches to course design and delivery, including but not limited to Massive Open Online Courses (MOOCs). The creation and discovery of OERs have been advanced by initiatives involving librarians, computing experts, instructional designers, and faculty. They are enabled by Creative Commons licenses. Here are a few notable examples of technology platforms that make it easier to create OERs, initiatives to support that creation, and discovery services specifically for OERs:

  • Rice University’s Connexions provides a platform that includes a content management system, an XML structure, and tools for writing and assembling content. It also offers existing content on which to build, organized into what they call “modules” and “collections” and licensed for that purpose.
  •  Lumen Learning, founded by David Wiley, BYU Business School, offers support for faculty to work with and develop OER content, and provides consulting services for institutions to help plan for incorporating OERs.  David Wiley explains why in his TED talk:  http://www.youtube.com/embed/Rb0syrgsH6M
  • The Open Education Initiative at UMass Amherst, started in 2011, provides funding for competitive grants to faculty to develop content.  Faculty can use a variety of platforms to develop content, but first learn about resources for finding existing content, and about licensing to make the material reusable.
  • Open Textbook publishing at Oregon State University involves the Library, the OSU Press, and the OSU Extended Campus Open Education Resources Unit, and provides funding for competitive grants to faculty to create open textbooks. See OSU Request for proposals for details on the program.
  • The Open Textbook Library is the result of a new project at the University of Minnesota focused on enhancing discoverability and peer review of OERs, including open textbooks.  David Ernst, University of Minnesota Chief Information Officer in the College of Education and Human Resources, and Executive Director of the Open Academics Textbook Initiative discusses this in his TEDx talk: http://www.youtube.com/watch?v=eA9Tv-OvoZU
  • Flat World Knowledge includes a catalog of resources and an online editor so faculty can customize materials; it still offers affordable options but no longer offers completely free access.

For more catalogs, lists and platforms for OERs, see the guide from UMass:  OER For Educators

Questions to ponder:

  1. Should librarians select these kinds of resources for inclusion in our key discovery tools, such as the Catalog and Summon?  If so, which ones?
  2. What is the value to academic institutions in supporting the development of OERs financially and/or with staff support?
Image: Global OER Logo from UNESCO

CrossMark: Tool for Identifying Changes in Journal Articles

You may notice the CrossMark symbol on the PDF of a recent journal article you have downloaded. The icon is linked to information about this journal article, and keeps you updated with any changes even though you have downloaded the PDF to your own computer, as long as you are connected to the internet. You may also see it on the HTML version of an article. The CrossMark icon link will most likely tell you that the version of the journal article you are viewing is current, but it will also warn you if there have been updates to the article, and then link to those updates.

Updates could include corrections, changes in a data set, or retractions.

The DOI (digital object identifier) registration service CrossRef has developed the CrossMark service for use by publishers who use CrossRef DOIs. See CrossMark examples implemented by a variety of publishers.

 

Discovering Open Access Articles

Several tools for discovering journal articles, such as Web of Science, IEEE Xplore, PubMed, and ScienceDirect, now have ways for you to limit a search to open access articles or to identify the open access articles within the result set of your search. Open access articles are free to read regardless of whether the reader has access to the journal through an institutional subscription.

Because it is important to be able to identify open access articles and to know what kinds of uses of them are permitted, NISO is sponsoring a working group of stakeholders to develop “Recommended Practices for Open Access Metadata and Indicators”. The adoption of standard metadata will enable transfer of that data among information providers and publishers, and could further enhance discovery of this information, for example in web-scale discovery services like Summon.

Meanwhile, you can use the following tools to locate open access articles; look for similar options in other search tools:

  • In the new Web of Science platform, run your search, display the results, and find the open access option at the end of the “Refine Results” list of options. This will show you the number of articles in your result set that are open access; then apply “refine” to limit your set to these.
  • IEEE Xplore offers the option on the search page.
  • PubMed offers a filter for “free full text”.
  • ScienceDirect provides browsing of journals by “open access” for completely open journals or “contains open access” for those where some articles are open access, as well as a refinement on your search to open access articles.

Summon 2.0 – Preview It Now!

Summon 2.0 is not just a new look for a user interface to search for vast amounts of scholarly content. It provides new functions and content now, with more to come as it develops over the next few months. It’s available to preview now, so have a look!

Highlights of the new look and features that you’ll see in Preview:

  • 3 columns so additional information does not cover the existing information
  • Research guides, subject specialist librarians and topic overviews display in the third column to provide additional sources of information on the topic
  • Overviews of topics, currently from three sources with more to come: Credo Reference, Encyclopedia Britannica and Wikipedia
  • Facets are selected by links instead of check boxes
  • Results are grouped in “roll-ups” by content type such as images and newspaper articles
  • Results are grouped into broad disciplines
  • Additional suggested search terms are provided through use of controlled vocabularies from a variety of sources, including some index and abstract services

The Summon Preview for Dartmouth URL is:

dartmouth.preview.summon.serialssolutions.com

 

 

BrowZine – Journal Reading Shelf for iPad

BrowZine brings the experience of browsing current journal shelves – enjoying the cover art, scanning the table of contents, and reading the full text – to your iPad. This new app from Third Iron allows you to build your own journal browsing shelf from your choice of open access and subscription-based journals from a wide range of scholarly and scientific publishers. You can set up current awareness notifications, and save and download articles to Zotero, Mendeley, Dropbox and other services.

There is a free version of the app that you can use for open access materials, and for a fee, an institution can set up your BrowZine experience to include the journals to which your institution subscribes. Stay tuned for a Dartmouth trial of BrowZine!


Discoverability Challenge

An interesting blog post from Lorcan Dempsey:

http://orweblog.oclc.org/archives/002206.html

The discoverability challenge is “not now only to improve local systems, it is to make library resources discoverable in other venues and systems, in the places where their users are having their discovery experiences. These include Google Scholar or Google Books, for example, or Goodreads, or Mendeley, or Amazon. It is also to promote institutionally created and managed resources to others. This involves more active engagement across a range of channels.”