Ivies+ Discovery Day 2016

I attended the Ivies+ Discovery Day at MIT on Monday.

Chris Bourg gave the opening talk; her main focus was the importance of serendipity in discovery. How do we build serendipity into our discovery environments?

I think it is important to consider how machine learning and artificial intelligence can be used in a library context. Can we leverage the API libraries of large systems like Alexa or Siri or open source versions of these to offer “intelligent” interfaces to research?

Laura Morse gave an update on the NISO Open Discovery initiative. She encouraged us to review the completed Conformance Statements and the library talking points sections.

Some takeaways from the show and tell of discovery environments:

  • Google Sheets is a common tool used to generate Best Bets of resources
  • Bento box design is pervasive though Harvard went with a unified intermingled list; is Bento still the best choice as columns/containers continue to grow?
  • Use of Blacklight is pervasive.
  • Showing all results with a blank search, then using facets or other limits to focus the results afterwards
  • Call number browse to show related items
  • Displaying images in a grid
  • Smart fulfillment using Umlaut
  • Penn showed many options up front in their search box
  • Separation of articles and books

There weren’t any discussions around how indexing decisions were made in Solr – maybe that could be an activity for next year?

Yale presented on their path to building unified discovery:

  • They started with Columbia’s code
  • Moved to 360 Link
  • Project went from 2013 to 2016; ended up with 4 developers focused on the project
  • Integration of Spotlight and Avalon are upcoming
  • Move to use evidence based design
  • Create a MARC driven database list using Blacklight

Nancy Pressman-Levy shared some tidbits from the various meetings she had attended:

  • Yewno had a big presence at ALA – MIT and Stanford are testing
  • The new Refworks will work more like Zotero
  • In late fall, ProQuest Summon will add a citation trail feature, more staff editing abilities for the Database Recommender, an improved topic explorer, searching by database name, and personalization

A follow-up related to Google Analytics: 360 Link can be integrated into analytics but requires some vendor setup.

Links to review in more detail







Notes on Readings on Discovery – Future of the Library Website


“biggest problem for library websites, is that there is little future for the library website. That’s because people will get less and less information through web browsers. Indeed, consider how often you use a web browser on your phone versus an app. Developments in AI, Augmented Reality and Virtual Reality will compound that trend.”

“Within a decade or two, I expect people will look back on web pages as a brief, transitory medium bridging print information to linked data. And as our AI, VR and AR technologies take off, they will liberate information from the old print paradigms altogether.”

“you could define the future online library as something between an MMORPG, Meetup.com and the TED conference”


Notes From Readings on Discovery – Catalog focused blog posts update

Tim Spalding has posted his second installment:


“From analyzing usage statistics … learned that enrichments hidden behind tabs or other such elements are effectively invisible and rarely used. Users have been trained to scroll up and down, but they are averse to clicking to see ‘what else there is.’”

“User experience also throws the distinction between website and catalog into question. Libraries think of them as separate things, often placing them under separate teams. But users do not.”

Libraries today exist on social media as much as anywhere else; enrichments should too.

Search for something on Google, and “some excellent hits from your library” should “show up as well.”

Even as technology unifies, libraries must resist closed solutions and ‘walled gardens.’ Catalogs and other services must be open to enrichment

We are “in the infancy of recommender systems, and ‘serendipity-systems’ have barely been tried

Serendipity and ‘tripping over things’ plays a role in library discovery

Notes From Readings on Discovery – Catalog focused blog posts

Tim Spalding is writing a series on catalog enrichment:


the first wave of catalog enrichment was all about static elements—book summaries, tables of contents, published reviews” and “cover images

such enrichments came from the growing sense that Amazon—for all its faults—was doing something right: presenting books in a way patrons wanted to see them.

The second wave used “cross-site JavaScript” and “exposed user data,” especially their reviews and tags

the second wave “moved catalog enrichment beyond data into full-fledged two-way services” … “library patrons could add their own reviews, create lists, and search and browse hand-picked and algorithmic read-alikes while still essentially in the catalog itself

I would add faceted displays, locally implemented discovery layers like Blacklight, and the merging of catalogs into discovery services like Summon as a third wave – maybe Tim will discuss these efforts in his follow-up article?

Karen Coyle is writing a series on the development of library catalogs, based on her presentations at ELAG:



Because we present the catalog as a retrieval tool for unrelated items, users have come to see the library catalog as nothing more than a tool for known item searching. They do not see it as a place to explore topics or to find related works.

In the past catalogs had cards, one for the main entry and separate cards for headings: “all bibliographic data was subordinate to a layer of headings that made up the catalog…  that heading layer was… the only entry to the content of the library.

Innovation:  Use “printed cards and type (or write) the desired headings onto the top of the card. Each of these would have the full bibliographic information” … users would no “longer need to follow ‘see’ references from headings to the one full entry card in the catalog

The printed card “combined bibliographic information and heading tracings in a single ‘record’, with the bibliographic information on the card being an entry point to the headings.”

Innovation: “The MARC record was designed to have all of the information needed to print the set of cards for a book” … “Here again the bibliographic information and the heading information were together in a single unit, and it even followed the card printing convention of the order of the entries, with the bibliographic description at top, followed by headings. With the MARC record, it was possible to not only print sets of cards, but to actually print the headers on the cards, so … they were ready to go into the catalog at their respective places.

the catalog is composed of cards for headings that have attached to them the related bibliographic description. Most items in the library are represented more than once in the catalog. The catalog is a catalog of headings.

In most computer-based catalogs, the relationship between headings and bibliographic data is reversed: the record with bibliographic and heading data, is stored once; access points, analogous to the headings of the card catalog, are extracted to indexes that all point to the single record.

Indexes of the database system are not visible to the user. This is the opposite of the card catalog where the entry points were what the user saw and navigated through.

online catalogs are not “a linear, alphabetically ordered list of headings.” Databases  “encourage the use of searching rather than linear browsing. Even if one searches in headings as a left-anchored string, … a search results in a retrieved set of matching entries, not a point in an alphabetical list. There is no way to navigate to nearby entries. The bibliographic data is therefore not provided either in the context or the order of the catalog.

if the catalog uses a visible and logical order, like alphabetical by author and title, or most recent by date, there is no way from the displayed list for the user to get the sense of “where am I?” that was provided by the catalog of headings.

A “major difference between the card catalog and the computer catalog: the ability to search on individual words in the bibliographic record rather than being limited to seeking on full left-anchored headings… keyword searching was both a boon and a bane because it was a major factor in the loss of context in the library catalog.

So how do we get context back? Faceting based on headings seems to provide that. I will check back with Karen as she continues her series.


Notes from Readings on Discovery: Who, What, When, Where and Why of Library Discovery

Peter Murray gave the concluding remarks at the NISO Forum on The Future of Library Resource Discovery, and he recorded his thoughts on his blog. I have pulled out some points worth further reflection:


One vision of a discovery service — ask a question into the air and get an instant, accurate, and concise answer back — is exemplified by the Amazon Echo. Is this what many library users are looking for when they come to our one-search-box discovery service? Do we want our discovery service to look like this, and in what ways do we want our service to be distinctly different?

Do users know how to navigate the web, operate a mouse, and understand the user interface cues that are now ingrained in our [the librarians’] experience?

Do users have speech, mobility or visual impairments? Do they have the life experience needed to even form the question they are asking, or are they a budding scholar in an unfamiliar research area?

Our discovery services need to take this range of “whoness” into account. They need to work for a wide variety of skills, abilities and knowledge.

Our discovery service should be deeply rooted in the tradition of the reference interview.

The art of the reference interview carefully guides the user through the maze of possibilities. The user might not even know they’ve been led through one of a thousand paths that could have been taken when the question was first formed. Do our discovery layers account for that complexity as they lead users to their end goal?

While our library discovery interface should have all of the responsive design techniques that make it scale from phone size to wall size, we should not lose sight of where users will conduct their research — on their tablets and desktops.

Within our own community I think we are looking for the ubiquity of our discovery service in all the places the user is.

Contextual clues that the discovery service could use to tune its reference interview algorithm include the time of day, day of the week, and week of the year.

In a world that is being given customization, complete contextual awareness, and near-prescient capabilities for meeting its information needs, is it important that the library profession chase after those capabilities? I don’t know.

Smaller devices are more likely to be used for “known item” searches — such as when the patron wants to send a citation to a collaborator in the course of a conversation.

I will still argue that very, very few are going to conduct a full-blown literature search — with all of the bells and whistles — on a small mobile device.

Notes from Readings on Discovery – The University of Oxford Report

I am kicking off a new series of posts that will capture research in information/resource discovery that I am pursuing as part of my role as Emerging Technologies Librarian. I recently finished reading the University of Oxford Resource Discovery report that came out in late 2015. Following are the highlights/notes from my reading. Any conversations/comments about these findings would be welcome.

Highlight, page 4
Good resource discovery tools, though, are not simply about making research easier and faster, but about facilitating the creation, preservation and discovery of knowledge by enabling new modes of research—especially across disciplines.

Highlight, page 5
Resource discovery is defined as any activity which makes it possible for an individual to locate information which he or she needs. Such material and such activities may be digital or analogue in nature

Highlight, page 11
Firstly, resource discovery is very discipline-specific. While quite a few people do start their search at Google, many start at the library catalogue. Within certain disciplines, though, searchers will jump straight to the top resources in their field (arXiv for Physics, PubMed for Medicine, WestLaw or similarly specialized tools for Law). These findings are consistent with the well-documented understanding of the differences in ‘known-item’ versus subject searching, and emphasize that while both happen in all disciplines, the sciences are often dominated by the former. One notable exception to this is evidence-based medicine, where researchers are often engaged in very thorough subject-based searching.

Students need to learn how to search.

Discovery is not as simple as ‘novice’ vs. ‘expert’. Experts in their fields may use some of the same discovery tools and techniques as incoming students in certain circumstances. A professor in one discipline may, for example, use Wikipedia or basic Google searches to familiarize themselves with a new topic just as a new student might.

Asking people, and knowing who to ask, seems to make the difference between simply finding what you need to complete an assignment and becoming an expert researcher.

Highlight, page 12
None of the respondents said that they used open/public social media platforms for asking resource discovery questions.

Expertise in a domain requires two things: an understanding of the parameters of your domain and an understanding of the available and relevant resources in those areas.

Highlight, page 13
Interactive visualization and visual analytics should have significant roles in the next generation of resource discovery technology. It is important to understand what visualization is really for. The key is to save the user’s time and reduce their cognitive load.

Highlight, page 15
Expertise requires two things: an understanding of the parameters of your domain and an understanding of the available and relevant resources in those areas. ‘Experts’ have varying levels of confidence about their mastery of these domains, but all seem to have a clear sense of its ‘borders’. Therefore, resource discovery should at least in part be about helping people to identify and define these borders.

Highlight, pages 16-17
Using collection-level metadata, provide an interactive diagram that represents the range of collections. For example: works on paper vs. objects; printed vs. manuscripts; visual vs. textual; digitized vs. not digitized; catalogued vs. not catalogued. Dates could be contextualized within ranges of centuries.

Highlight, page 17
Provide an immediate visual guide to how many collections there are and their relative sizes, which ones are searchable electronically, which are catalogued in print indices and which are not yet catalogued.

Overlaps in collection provenance, topic or format could provide starting points for navigation.

Highlight, page 18
Cross collection search would ideally allow discovery across libraries and museums, while still allowing users to narrow their search to one or more collections (and allowing the individual collections to brand their own home pages)

Index and expose existing item-level metadata.

Provide a clear sense of what is being missed when searching.

The object would be to visualize not only what is being found, but what is not being found due to the lack of item-level metadata.

Provide an immediate visual representation of what is available and in what format.

Highlight, page 19
Searchers at all levels rely on people— librarians, colleagues, supervisors, mentors or experts in their field —to find resources when other search methods have failed.

Provide a graph of the professional networks

Highlight, page 21
Providing a reliable source for upcoming talks by division or subject area would be heavily used and well-received.

Asking people (colleagues, librarians, curators) for help in locating resources was universal among the users interviewed.

For many (across the disciplines), the process of research is as important as the outcome.

Highlight, page 22
Exposing metadata for indexing by Google and Google Scholar would undoubtedly assist those who start their searches on the open web. Working with subject-specific repositories like arXiv and PubMed, and publishers like JSTOR would further assist in connecting users with specific resources.

Citation chaining is ubiquitous in all areas of research across all disciplines. CrossRef and other individual tools and databases have gone some of the way towards making this easier, but citation chaining is still not heavily present. Searchers in all disciplines use cited references as authoritative points of departure for finding more resources on a topic.

Facilitate precise ‘known-item’ searching.

Highlight, page 23
Investment in the analytics and data infrastructure to support evidence based decision making across collections.

UK National Archives Discovery page provides a good example of a portal for discovery services across diverse collections. It combines a cross-collection search with prominent featured ‘popular collections’ as well as research guides.

Highlight, page 24
Collections cannot be discovered using electronic search tools unless they have some sort of representative electronic description. High quality description is key to discovery.

Highlight, page 32
A vast majority of respondents (even those known experts in their field) did not have high-confidence that they were “on top of” everything that was happening in their domains.

Of the incoming students (both graduate and undergraduate) very few tried to monitor new publications and were mostly responding to suggestions from supervisors and instructors.

Most senior academics had developed mechanisms for coping with ‘keeping up’. These usually involved a combination of social media, informal communications (email from colleagues), conferences, Zetoc and table of contents alerts from specific journals, and/or more formal roles such as serving as editor or reviewer for relevant journals.

Those that do use social media, use it as a way to monitor interest groups, people, conferences, blogs in their field, or as a mechanism to promote their own projects or work.

Highlight, page 33
The more people expand the boundaries of what and where they are searching (with tacit and explicit assistance from mentors in their domains, and often through their own trial and error), the more expert they become in their field because they learn the boundaries as well as the tips and tricks for finding the most credible sources and the less well-known parts of the collections.

Highlight, page 40
The differences between [web-scales services] are not that significant… thinking that …there are some ‘good’ and some ‘bad’… is probably wrong. It’s not really about the product, it’s about the willingness of the vendor to overcome problems, and about their attitude to their customers.

In the categories of relevance ranking and successful full-text linking, the differences among EDS, Summon, and Metalib+ were statistically insignificant and no one system greatly outperformed the others. Google Scholar fell behind in both categories. In the category of up-to-date results, the disparities were slightly greater though not significant enough to warrant a recommendation, with EDS scoring best, followed by Metalib+, then Summon, with Google Scholar falling greatly behind the others in this category, with the exception of searches in the Sciences, where it scored higher than the other systems for up-to-date results. Known-item searches are the only area where Google Scholar significantly outperformed the other systems across disciplines.

Highlight, page 44
All the participants use simple search as default. Users are now used to the Google model where ‘people do a search first and then filter afterwards’.

Users tend to use the default tab the most, regardless of what it is.

There are two ways in which search results are presented: a single list and a Bento box. Blacklight and VuFind customers use the Bento box approach, as it is part of the platform design. According to one university, the Bento box is popular with institutions that preferred a home-grown system. Another university mentioned that external research showed a 50/50 split for user preferences for a single list vs. the Bento box approach. When they were investigating the options, there was no clear winner. The choice of approach seems to largely depend on the underlying solution.

There is general agreement amongst the participants that a significant proportion of resource discovery starts outside the library, mainly in Google or Google Scholar. This is the primary reason why Utrecht University decided to concentrate its efforts on providing discovery support for users irrespective of where discovery starts, rather than investing significant resources and effort in acquiring and maintaining a resource discovery platform.

Highlight, page 45
Since the first Faculty Survey in 2000, we have seen faculty members steadily shifting towards reliance on network-level electronic resources, and a corresponding decline in interest in using locally provided tools for discovery.

Thinking the unthinkable: A library without a catalogue – reconsidering the future of discovery tools for the Utrecht University Library

Highlight, page 48
Optimization of records for search engines and linked data are seen as important but are not fully explored at present.

It is important to collect user behaviour statistics/ data in new ways, like watching Pinterest traffic of access to the organization’s collections.

Highlight, page 53
People use the Principle of Least Effort, preferring easy-to-get information over harder-to-get information, no matter how high the quality of the latter, as a rule.

They want instant results and instant gratification because a fundamental tenet is that convenience trumps equality

“Berrypicking” describes an evolving strategy, refining how discovery is carried out as initial information discovered changes the conceptual model the user has of what they are looking for: “gathering information a piece at a time while the information need and search criteria continue to evolve.”

Users of discovery services tend to blithely repeat methods which gave acceptable results in the past. This behaviour is more as predicted by Gratification Theory, where past success in finding relevant material means that the same method is re-used in the future.

The Information Search Process described by Kuhlthau divides the discovery process into task initiation, topic selection, prefocus exploration, focus formulation, information collection, search closure, and starting writing. Perceptions of anxiety decrease through this process, accompanying a progression from ambiguity to specificity.

Highlight, page 54
Students carry out resource discovery principally to satisfy immediate academic requirements (essays, examinations, etc.), and this is associated with a “certain amount of anxiety”. As a result, convenience and familiarity outweigh suitability as criteria for services and methods used for discovery, and this is accompanied by “hesitancy” about asking for assistance from tutors or librarians

Six typical behaviours seen in university faculty members:

  • starting: reading reviews and review articles, initial exploratory searches, etc. – actions undertaken before the main discovery exercise
  • chaining: tracking citations forwards and backwards from a known item
  • browsing: semi-directed searching, e.g. using author names or looking along a shelf of physical items
  • differentiating: using differences between items to determine relevance
  • monitoring: current awareness of activity in a research field
  • extracting: systematic analysis of a specific source (e.g. a publisher’s web pages) to identify material of interest.

Highlight, page 55
Recent research shows “less overall difference between the physical sciences and the humanities” than expected. A series of case studies across the disciplines found that “from a broad sociological view, it is striking how much consistency there is across the fields and disciplines”

Highlight, page 62
They [commercial systems] primarily use a Lucene index at the backend. Lucene provides search index (also known as “inverted index”) creation, storage, and management facilities with document ranking algorithms (e.g., Boolean, TF-IDF, Cosine, Fuzzy, and so on). These modern tools also use probabilistic models to map documents to terms and then rank results. Probabilities are generated using methods such as TF-IDF and other language models. Although the exact methods of commercial systems are unpublished, it is likely they use techniques similar to Lucene.
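The TF-IDF ranking idea mentioned in the passage above can be illustrated with a minimal sketch. This is a toy example only, not Lucene’s actual implementation — Lucene’s practical scoring adds length normalization, field boosts, and other refinements:

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Score documents against a query with a simple TF-IDF model.

    Terms that appear in fewer documents (lower document frequency)
    receive a higher IDF weight, so rare query terms dominate the score.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency within this document
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears nowhere; contributes nothing
            idf = math.log(n_docs / df[term])  # rarer terms weigh more
            score += tf[term] * idf
        scores.append(score)
    return scores

docs = [
    "library discovery systems index bibliographic records",
    "the library catalog lists books",
    "search engines rank documents by relevance",
]
print(tf_idf_scores("library discovery", docs))
```

Here the first document scores highest because it matches both query terms, and “discovery” occurs in only one document, so its IDF weight is large; the third document matches neither term and scores zero.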

Highlight, page 63
In a library, many of the files are non-textual (e.g., media, archive, images, scanned copies of invoices, books, catalogs, etc.). Therefore, improved metadata-based retrieval is essential. Similar to Web search, an Enterprise Search Engine can use relevance feedback information that can be used to learn and improve the ranking of the search results in the enterprise search as no search can guarantee to find all the relevant file or how to correctly specify the search criteria. Most importantly, it would be useful to assist search with effective visualization and interaction.

Highlight, page 64
Resource discovery in libraries is powered at the back end either by a federated search or by a pre-harvested search. In both cases users search via a single access point. In federated search, the search term is looked for across multiple databases, while in pre-harvested search it is looked for in pre-harvested indexes of content, with weighting applied to help with relevancy ranking. By implementing the single search screen, users are able to retrieve results from multiple databases. For both approaches, the results are then returned, often as a ranked list in a paginated format. Although it is easier for users to retrieve search results using the single search box, users found that they have difficulties in comprehending the list of results returned to them.

For example, Tag clouds and inkblots that enable us to view the categorical information and statistical information about the documents, revealing various signature patterns for comparative analysis. Similarly, visualizing changes of themes and topics over time within a collection of documents, or visualizing of multi-faceted relationships of keywords either within documents or across a large collection of documents. A visual search engine for exploring Wikipedia through its semantic relationship, or a visual analytics tool for exploring academic publications through citations, ranking, and techniques of summarization and automatic clustering.

Highlight, page 65
Among all of the scholarly work that has been published by the visualization and information-retrieval communities on navigation and exploration of Web information space, the VisGets, Fluid Views, and PivotPaths are the most significant.

Highlight, page 66
Another example is using a Voronoi treemap to organize search results.

Highlight, page 70
Glyphs are visual entities composed of several visual channels representing multivariate qualitative and/or quantitative attributes.

Highlight, page 73
Despite its many advantages, there are also several drawbacks to Google Scholar mentioned by our interviewees, for example: (i) it returns a vast number of results from which it is difficult to filter out the relevant articles; (ii) detecting the omission of relevant articles becomes challenging due to the sheer volume of results, and there is also concern that the list may not be accurate and up-to-date; (iii) as a keyword-based search, results can be skewed by the use of “incorrect” search terms; (iv) it is unable to search across PDF documents for embedded scientific data; and (v) the lack of subject categorizations (or tagging) makes it difficult to carry out a “top-down search” of drilling down into a particular subject or retrieving articles on similar (or associated) topics. There are also other drawbacks, but they are directed more towards Google itself: the results that are returned can sometimes be overwhelmed by commercial interests, e.g., adverts, and there is the question of the validity and provenance of the information and whether it can be trusted.

Highlight, page 75
A number of interviewees explained that the search terms that they used are typically the results from conversations with colleagues, peers, and collaborators. Some of the search terms used are based on their experience from reading through articles as well as refinement of the search terms by heuristic. Library orientations were also mentioned as a source of recommendation for search terms as well as domain specific databases to search on.

Highlight, page 77
Most of the interviewees usually search for the latest published articles and specialized information that are relevant to their fields. For science-based subjects, the interviewees would search for algorithms and experimental methods. Other information that was also searched for include patents, talks, overview explanations of specific topics, other researchers that are in a similar field, and who is doing what in specific subject areas.

Highlight, page 78
[Search system should] generate the most common and related keywords and term-based subjects. It should also bridge the gaps between different fields.

Until you have captured all of the “synonyms” of the search term you could be missing out on a lot of references that may be important.

Generate the view for the co-citations of the articles as well as visualizing the degree of strength of the connections between the articles.

Display the timeline

Evolution of the paper by seeing who referenced it and who it referenced

Integrate between both the publications and social metadata where you would be able to receive social recommendation on articles and books as well as finding articles based on what the other researchers in similar fields are also accessing.

View the author’s profile and see their publications list.

Highlight, page 79
Search and extract embedded information inside PDF documents

An aggregator that would allow you to integrate the different platforms, such as Mendeley, Google Scholar, and JSTOR.

A tool that could help promote collaboration between researchers by enabling them to see the institutional affiliations of the researchers based on their expertise and allows you to set-up a chat system to communicate with these researchers.

Constructing a visual analytics tool that would complement the existing search engines and reference managers by implementing the items mentioned in the wish list by our interviewees and more. Perhaps this visual analytics tool can help bring back the “physical shelf browsing and serendipitous discovery”

Highlight, page 88
Today’s scholars can take advantage of “altmetrics” both to measure the impact of their own work and as an aid for the discovery of well-regarded research articles. Altmetrics basically measure the informal citations of articles in various forms of social media, immediately giving a picture of the importance of an article which rounds out the information given by traditional citation counting (as well as being quicker to respond to new citations, and applicable to a wider set of academic outputs by including such things as research data)

Hands-on with VuFind

One of the roles of the Information Discovery & Access Group (IDAG) is to stay current on technologies related to library searching. One relatively well-known search tool in the library landscape is VuFind, an open-source search interface built on top of Apache’s Solr platform. We decided to try installing VuFind to accomplish several things:

  • Gain experience and knowledge about this specific tool
  • Find out more about search technologies in general
  • Possibly perform some user testing to see how our patrons might respond to a different search interface

This is the first in a series of posts about the process of installing, configuring and testing VuFind here at Dartmouth. If you’re not interested in the technical aspects of installing the software, stay tuned for a subsequent post that will explore the interface itself.


Installing VuFind was not very difficult, but there were some obstacles. The first was platform. The installation instructions given by the VuFind Project focus on specific server platforms: Windows, Ubuntu Linux, and Fedora Linux. The library’s servers run on Red Hat Linux, which is related to Fedora. However, to keep this experiment from impacting production services around the library, we decided to install VuFind on a Mac Mini used as a server. For this reason, the installation instructions had to be treated as a rough roadmap rather than a set of commands that could be followed verbatim.

There are a number of components needed:

  • The VuFind software, available on GitHub
  • The Apache web server
  • A MySQL database
  • The PHP programming language
  • The Java Development Kit (JDK)

While most of the components are installed by default on a Mac, it was desirable to leave the system versions in place and install versions that matched what VuFind expected (and could be modified without altering the fundamental system components). Homebrew is a package management tool for the Mac operating system that makes this easy to do.¹

Once all of these pieces were installed and configured, we had an operational system, but with no content in it.

Importing data

The library’s catalog data is held in our Integrated Library System (ILS). The software we use is Sierra, provided by Innovative Interfaces, Inc. Sierra supplies the back-office workflow support needed to run library operations; VuFind provides a search interface that complements it.

In the Sierra system, we used a feature called “Data Exchange” to create an export file of MARC records. The process of exporting approximately three million records took hours. The resulting file was 3.6 gigabytes of data. We copied the file over to the Mac Mini and imported the records. We had to make some adjustments to the configuration so that the record number from our Sierra system would become the unique identifier in VuFind.
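As a sketch, the batch load looked roughly like this. VuFind ships an `import-marc.sh` wrapper around SolrMarc; the install path and the Sierra MARC field holding the record number are assumptions here, not our actual values:

```shell
# Override the unique id in import/marc_local.properties so it is taken from
# the MARC field carrying the Sierra record number (field/subfield below are
# placeholders -- check your own Sierra export profile), e.g.:
#   id = 907a, first

# Then run VuFind's bundled batch importer against the export file:
cd /usr/local/vufind          # install location assumed
./import-marc.sh /data/sierra-export.mrc
```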

VuFind indexes the records so that when you give it a search query it can find relevant data, which is why we import the records as a batch. However, there are some pieces of data that need to be updated continuously. The most obvious one is the item’s status – is it currently available or checked out? Since this data can change moment to moment, VuFind comes with plugins that can work with most of the major ILS systems, including Sierra. The plugin checks the current availability in real-time by querying the Sierra database at the moment when the item appears in a search result on VuFind.
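Because status checks go through an ILS driver rather than the index, enabling them is a configuration change. A minimal sketch, assuming a local settings directory and VuFind’s Sierra driver name:

```shell
# Point VuFind at the Sierra ILS driver in the local config override
# (paths are assumptions; the driver queries Sierra for live status):
cat >> local/config/vufind/config.ini <<'EOF'
[Catalog]
driver = Sierra
EOF
```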

Making changes

Because VuFind is open source software, we are able to read and modify its code. Almost as soon as we got the system installed, we noticed a problem with the displays of certain records. We traced the problem to a variable that wasn’t being initialized, edited the code to initialize the variable ourselves, and then submitted that change back to the VuFind community. This is one of the biggest advantages of open source software – work that we do to improve our own experience can benefit users of the software around the world.

Beyond changes to the code, there are many possibilities available to us in customizing the way the software indexes our data. There are specific notes fields in some records that are not searchable through our Sierra system without paying to have the data reindexed. With VuFind, we can configure the indexing to include whatever data we want to make searchable.
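A minimal sketch of that kind of customization, assuming the dynamic-field conventions of VuFind’s Solr schema (the MARC tag and the Solr field name below are placeholders):

```shell
# Map a local notes field (e.g. MARC 590 subfield a) to a dynamic Solr field
# by adding a line to import/marc_local.properties, then re-run the import:
cat >> import/marc_local.properties <<'EOF'
local_notes_str_mv = 590a
EOF
./import-marc.sh /data/sierra-export.mrc
```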

Looking Forward

There are a number of steps that still need to be taken in order to have a fully functional search interface:

  • Build some code to update the records in VuFind regularly. Currently the data in VuFind is a snapshot of what was in Sierra as of mid-October, but records are added, edited and deleted every day. We need to set up a pipeline so that the data in VuFind is kept up to date as changes continue to be made in Sierra.
  • Filter out records that should be suppressed because the item is no longer held at Dartmouth.
  • Integrate the library’s locations into the system. Currently the system doesn’t differentiate between the various branches around campus, but VuFind does support this – we just need to get the configuration done.
  • Integrate patron data into the system. There are features of the software that require login (including tagging records, adding reviews, etc.). We don’t have VuFind integrated with the campus login system yet.
  • Examine and adjust the indexing. The current setup is the generic one that comes included with VuFind, but making it work best with our data will require the input of metadata experts in the library.
  • Get feedback from our users.
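The first of these steps might be sketched as a nightly job. The export filenames and paths are assumptions; `import-marc.sh` and `util/deletes.php` are utilities that ship with VuFind:

```shell
# Hypothetical nightly sync: index the day's new and changed records, then
# remove records deleted in Sierra (the "flat" format is one id per line):
cd /usr/local/vufind
./import-marc.sh /exports/sierra-changes.mrc
php util/deletes.php /exports/sierra-deletes.txt flat
```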

Look for another post soon which will get into some of the differences between the VuFind interface and the current search interface.



1. The basic steps followed after installing Homebrew were:

  • brew install httpd22
  • brew install php56 --with-homebrew-apxs --with-pear --with-postgresql
  • brew install php56-intl
  • brew install php56-mcrypt
  • brew install mysql
  • brew cask install java
  • Install VuFind:
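That last step, eliding the details, might look something like the following (install location assumed; VuFind’s bundled installer script generates the Apache configuration and local settings directory):

```shell
# Fetch the source from GitHub and run the bundled installer:
git clone https://github.com/vufind-org/vufind.git /usr/local/vufind
cd /usr/local/vufind
php install.php
```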


Encore Duet – Discovery platform from Innovative Interfaces

Earlier this year, I watched a webinar about a search interface from Innovative Interfaces. I wanted to share my notes on what it allows – and some questions the demonstration didn’t answer.

Encore is the most current library catalog product from Innovative Interfaces, Inc., and it supersedes the OPAC that Dartmouth College Library uses now. Encore Duet is an expansion of the Encore product to include access to electronic articles, e-books, locally created digital collections, and the library’s traditional collections of materials, in one search interface. Duet’s default search presents results from across all the different content types available. It’s offered as a service that can be hosted either locally or in the cloud.

The goal of Encore Duet is to provide an integrated user experience. Innovative is building its own relevance and value ranking into the system. The product includes instant updating of indexes and real-time availability, and it is presented as something that can save library staff time by including these features. (There is a demonstration instance of the system available at http://encore-academic.iii.com, although some features will not work because it doesn’t include an institution’s subscribed content.)

The interface is built around a search box with three default tabs to search in: CatalogPlus / Catalog / Digital Content. The CatalogPlus tab is an all-in-one search which integrates all of the content that Duet can access. The catalog tab searches within the records that make up the library catalog (content currently served at http://libcat.dartmouth.edu). The Digital Content tab provides searching across the harvested digital objects that the library has configured.

The search of the catalog records integrates many of the features of the catalog itself – placing holds, requesting or renewing items are all functions that can be done in the Duet interface. Current availability of item records is also included. One interesting feature that’s built in is a shelf browse widget – clicking on the call number displays a small number of the titles that would be nearest on the physical shelves to the record selected, allowing for a browsing experience that approximates visiting the shelves in person. This browse display can be either a graphical representation with book jacket thumbnail images, or a simple list in call number order. The library can determine which facets are available to users and can have them re-ordered, removed, and/or set to be open or closed by default. Additionally, locally created indexes can be used as the basis of facets beyond the standard ones that Encore Duet provides. Another interesting feature is the “Promote relevance” button. This feature is available for library staff to boost specific records in order to move them closer to the beginning of search results, but no details were given on how that is accomplished.

The interface provides direct linking to PDFs of article content in some cases, with OpenURL linking used as a fallback. A link resolver is included with Encore Duet, although it’s not clear whether it’s a new product or just a bundling of WebBridge, the OpenURL resolver product they’ve sold for years. The article searching is provided through a partnership between III and EBSCO. Content from EBSCO and ProQuest was shown during the demonstration, and the presenters claimed that the interface doesn’t privilege results from one provider over another; ensuring the inclusion of content from all the providers a library subscribes to is important. E-books from EBSCO and OverDrive were highlighted, and the ability to check out e-books from OverDrive within the Duet interface was shown. The system is based on the idea of using Duet as the library’s knowledgebase, which would obviate the need for some of the work that we currently do in the Electronic Resource Management module in Sierra.

Duet can be configured to harvest digital objects from up to three external repositories, to be indexed and included in the main search. The example they showed brought thumbnail images and metadata into Duet from a CONTENTdm instance. This is interesting to Dartmouth, as we already use CONTENTdm for many of the objects we’ve digitized.

The tabbed setup might cause confusion about which things are where – certainly some results returned in the Catalog and CatalogPlus tabs will be digital content. There is a similar source of confusion in the faceting that’s provided – users can limit to the catalog by using the tab or the facet on the left. “At the library” is a facet under the heading “Availability” with nothing to show how it’s different from “Library Catalog,” and users may be uneasy with this ambiguity as well.


Questions that I still have:

  • Does it integrate with StackMaps?
  • Is the relatively slow response time seen in the demonstration typical?
  • How can we prevent users from coming to dead ends in their searching? During the demonstration, the presenter used the facet for EBSCO EDS full text – and the link brought her to a page saying that there were no matching results. (I tried the same search in Summon and got a link through to the article, which is apparently available through Wiley Online Library. I wonder if that’s one provider that wouldn’t work with Encore Duet?) It might be significant that some of the articles found had links for both a full-text finder and “WebBridge,” III’s OpenURL link resolver; this could confuse users, and it’s not clear what maintenance, if any, would be required of library staff.
  • What would a service like this cost, especially in comparison to something like Summon?
  • We know that library users want simplicity in finding things, and a single search box with a unified results set seems like a good attempt at simplicity. But many of our peer institutions have been moving toward bento-box style results, with different types of search results broken out into different compartments within a results page. Which approach will better serve our patrons?
  • Have there been published reports on the usability of the service?

Open Educational Resources: New Initiatives for Creation and Discovery

Open Educational Resources, or OERs, include full works like textbooks, as well as smaller units of content that can be repurposed as needed for the learning goals of a course. These are key resources for new approaches to course design and delivery, particularly, but not limited to, Massive Open Online Courses (MOOCs). The creation and discovery of OERs has been advanced by initiatives involving librarians, computing experts, instructional designers, and faculty, and is enabled by Creative Commons licenses. Here are a few notable examples of technology platforms that make it easier to create OERs, initiatives to support that creation, and discovery services specifically for OERs:

  • Rice University’s Connexions provides a platform including a content management system, an XML structure, and tools for writing and assembling content, along with existing content on which to build – which they call “modules” and “collections” – licensed for that purpose.
  • Lumen Learning, founded by David Wiley of the BYU Business School, offers support for faculty to work with and develop OER content, and provides consulting services to help institutions plan for incorporating OERs. David Wiley explains why in his TED talk: http://www.youtube.com/embed/Rb0syrgsH6M
  • The Open Education Initiative at UMass Amherst, started in 2011, provides funding for competitive grants to faculty to develop content.  Faculty can use a variety of platforms to develop content, but first learn about resources for finding existing content, and about licensing to make the material reusable.
  • Open Textbook publishing at Oregon State University involves the Library, the OSU Press, and the OSU Extended Campus Open Education Resources Unit, and provides funding for competitive grants to faculty to create open textbooks. See OSU Request for proposals for details on the program.
  • The Open Textbook Library is the result of a new project at the University of Minnesota focused on enhancing discoverability and peer review of OERs, including open textbooks.  David Ernst, University of Minnesota Chief Information Officer in the College of Education and Human Resources, and Executive Director of the Open Academics Textbook Initiative discusses this in his TEDx talk: http://www.youtube.com/watch?v=eA9Tv-OvoZU
  • Flat World Knowledge includes a catalog of resources and an online editor so faculty can customize materials; it still offers affordable options but no longer provides completely free access.

For more catalogs, lists and platforms for OERs, see the guide from UMass:  OER For Educators

Questions to ponder:

  1. Should librarians select these kinds of resources for inclusion in our key discovery tools, such as the Catalog and Summon?  If so, which ones?
  2. What is the value to academic institutions in supporting the development of OERs financially and/or with staff support?
Image: Global OER Logo from UNESCO