Bibliographic Database Migration (Bookends to Zotero)

Lane DeNicola
 

Revision History
Overview
The Solution (Short Version)
The Steps Involved in Migration
Considerations Specific to Bookends and Zotero
One Approach to Migrating Your Library
Requirements and Limitations
Bugs and Comments
Licensing

Revision History

January 4, 2009: addition of link to Zotero forum thread on field conversion in word processing documents; minor clarifications of prose.
August 5, 2008: correction of typos; addition of link back to post in the Zotero forum; addition of Bookends library verification command.
July 6, 2008: correction of typos
June 29, 2008: initial version

Overview

This short document will be of interest to those considering a move from the Bookends bibliographic manager to Zotero. The #1 source of information on this topic is (of course) the Zotero documentation itself, along with the users' import/export forum; this document is intended as an anecdotal supplement to those resources. It focuses in some detail on one very specific case, Bookends users moving to Zotero, which has seen a bit of discussion but has remained relatively obscure compared with migrations from more popular reference managers.

A notice about this webpage has been posted to a thread in the Zotero forum.

Since this is typically a one-time process for any given user (exporting your library once for import into a new reference manager), and since the problems lie at the intersection of how Bookends and Zotero do things (and since I haven't yet made the time to sort out the details of the Zotero development process), it made sense to me to write a quick-and-dirty, highly user-configurable utility, independent of both Bookends and Zotero, that "impedance matches" them, rather than making (or lobbying for) alterations to the source code of either. A Zotero plugin might be the simplest vehicle for some users, but the approach outlined here relies on a Perl script (and a custom Bookends bibliographic format file, which would likely be necessary in any case).

This approach provides a number of advantages; see the details below.

You don't need to be a programmer to get this approach to work, but a bit of Unix-ish expertise with the MacOS X Terminal (command-line) would be helpful.  You will also likely need to edit a (plain text) data file, so it wouldn't hurt to be generally handy with a good text editor that has no problem dealing with multiple encodings (e.g. TextWrangler or Emacs, both of which are free and available for the Mac).

The be2z package and Zotero import were tested on two test platforms, which are referred to in this document as follows:

Test Platform A: Intel Core 2 Duo (custom built) with 4 GB of memory running Fedora 8 (Werewolf), Linux kernel 2.6.25.4-10
Test Platform B: 800 MHz G4 (PowerPC) iMac with 768 MB of memory running MacOS X 10.4.11

Wherever used, the generic terms Bookends, Zotero, and Firefox refer specifically to Bookends version 9.1 and Zotero version 1.0.6/Firefox 2.0.0.15.

I won't detail here the advantages and disadvantages of switching to Zotero or any other package—more than enough information and user reactions can be found on the aforementioned forum and the Wikipedia entry comparing reference management software. A principal rationale for me was that I was switching from MacOS X to Linux.

The Solution (Short Version)

This archive contains three items: a Bookends Format file (named "BEDump"), a Perl script (named "be2z") and a data file on which it depends ("be2zm.pm").  After installing BEDump you will be able to export your Bookends library to this custom tagged format. The be2z package can then process that intermediate (BEDump) file—along with your Attachments folder, if one exists—allowing import into Zotero in the usual manner.

The Steps Involved in Migration

The full process of moving from one bibliographic manager to another—even when just a single user will be building and making exclusive use of the database—involves a number of steps aside from installing and configuring the new software:

  1. You also have to learn the ins and outs of a new interface, including the way it works (or doesn't) with your word processor.
  2. You may need to install plug-ins allowing the new bibliographic software to interact with your word processor or other software.
  3. You may need to obtain, define, or at least double-check bibliographic stylesheets for the journals or bibliographic formatting standards you typically use.
  4. Most recently developed bibliographic packages permit online search and citation download via one or more protocols (Z39.50, HTTP, etc.). In some cases you may need to configure the software to deal with proxies or other aspects of your Internet connection.
  5. Most importantly here, you have to export your existing bibliographic database and import it into your new manager, possibly doing some data clean-up or processing in-between.  This is the part of the migration process this document is concerned with.
  6. A separate issue (not addressed here) is the possible conversion of citation fields in legacy documents.  If you have unfinished drafts or any documents that you hope to cut-and-paste from that contain embedded citation fields (constructed using your old bibliographic software), you'll need to consider whether or not having those citations in the older format is a concern. My experience is that this is the aspect of migration that has been least addressed, and very little provision is ever made for converting existing documents to new bibliographic managers (but see this thread for relevant discussion).

Considerations Specific to Bookends and Zotero

Consideration #1: Export/Import Formats

Bookends 9.1 exports to the following formats:
while Zotero 1.0.6 can import data in the following formats:
The lack of overlap here complicates switching from Bookends to Zotero.  Both the Refer and RIS formats are comparatively simple and text-based, however, and their use for import/export has been advocated by Zotero developers in the forum.  Refer is a text-based, tagged format originally designed as a part of the Unix troff (and GNU groff) typesetting software, but it was also supported by Endnote prior to version 10. RIS is another text-based, tagged format, this one designed for importing references into the proprietary Reference Manager package from ISI (the same folks who make Endnote). Not surprisingly, this is the format now supported in Endnote X (version 10).

The Refer format has no provision for retaining links to attachments, an important consideration if (like me) you've made significant use of Bookends' ability to attach full-text PDFs of articles to their records in the database (an ability Zotero also possesses).

RIS does have some provision for retaining attachment links, but version 9 of Bookends did not come with an RIS output format specification file. An RIS format file is available for download from the developer's website, but these format files (which, sadly, are in a proprietary binary format) require the latest version of Bookends and cannot be used by earlier versions. Version 10 of Bookends, meanwhile (a $29 upgrade for those who purchased any earlier version prior to July 1, 2007), actually produces "faulty" RIS output. An obvious alternative is to produce your own RIS format file via the Bookends Formats Manager, but it has a fairly cumbersome interface involving obscure control codes and an inscrutable grammar (see the Bookends User Manual section on the Formats Manager, pp. 118-146 in version 9).

Consideration #2: Mapping Reference Types and Fields

Zotero 1.0.6 offers 34 reference types, while Bookends 9 offers 27 (ten of them user-definable). Further, not all reference types available in Bookends are available in Zotero (e.g. "Edited Book"). Finally, the 35 reference types in the RIS spec do not overlap 100% with those of either Zotero or Bookends. The user therefore has to decide on the most appropriate mapping from one to the other. This table compares the three (note that the first four "unused" Bookends reference types are shown with example customizations to illustrate how they could be mapped to Zotero reference types):

Bookends | RIS | Zotero
Artwork | ART (Art Work) | Artwork?
Audiovisual material | ADVS (Audiovisual material) | Film
Book | BOOK (Book, Whole) | Book
Book chapter | CHAP (Book chapter) | Book Section
Conference proceedings | CONF (Conference proceeding) | Conference Paper
Dissertation | THES (Thesis/Dissertation) | Thesis?
Edited book | BOOK (Book, Whole) | Book
Editorial | --- | ---
In press | INPR (In Press) | Manuscript
Journal article | JOUR (Journal) | Journal Article
Letter | PCOMM (Personal communication) | Interview
Map | MAP (Map) | Map?
Newspaper article | NEWS (Newspaper) | Newspaper Article
Patent | PAT (Patent) | Patent?
Personal communication | PCOMM (Personal communication) | Interview
Review | --- | ---
Internet | ELEC (Electronic Citation) | Web page
Congressional testimony (Unused 1) | HEAR (Hearing) | Hearing?
Film (Unused 2) | MPCT (Motion picture) | Film
Organizational literature (Unused 3) | PAMP (Pamphlet) | Manuscript
Report (Unused 4) | RPRT (Report) | Report
Unused 5 | STAT (Statute) | Statute?
Unused 6 | --- | ---
Unused 7 | --- | ---
Unused 8 | --- | ---
Unused 9 | --- | ---
Unused 10 | --- | ---
--- | ABST (Abstract) | ---
--- | BILL (Bill/Resolution) | Bill?
--- | CASE (Case) | Case?
--- | COMP (Computer program) | Computer Program?
--- | CTLG (Catalog) | ?
--- | DATA (Data file) | ?
--- | GEN (Generic) | ?
--- | ICOMM (Internet Communication) | E-mail? Forum post? Instant Message?
--- | JFULL (Journal, full) | ?
--- | MGZN (Magazine article) | Magazine Article?
--- | MUSIC (Music Score) | ?
--- | SER (Serial: Book, Monograph) | ?
--- | SOUND (Sound recording) | Audio Recording?
--- | UNBIL (Unenacted bill/resolution) | ?
--- | UNPB (Unpublished work) | Manuscript?
--- | VIDEO (Video recording) | Video Recording?
--- | --- | Blog Post
--- | --- | Dictionary Entry
--- | --- | Document
--- | --- | Encyclopedia Article
--- | --- | Letter
--- | --- | Podcast
--- | --- | Radio Broadcast
--- | --- | TV Broadcast
--- | --- | Presentation

Similarly, Bookends, RIS, and Zotero contain different fields, and the fields available generally vary with the reference type (to complicate things, fields may have different labels depending on the reference type). Again, the user must decide how these fields are to be mapped from one application to the other. This table once again shows all the fields available in Bookends and RIS, and how the mapping is configured by default so as to get a desired mapping into Zotero fields (as stated, however, be2z makes this user-configurable, to some degree):

Bookends Code | Field | RIS Tag | For Zotero Import
a | Author(s) | A1 or AU | AU
b | Abstract | N2 | N2
d | Date | Y1 or PY | PY
e | Editor | A2 or ED | ED
f | Journal (full) | JF or JO | JF
h | Attachments | L1 | L1 (Zotero imports as PDF attachment)
i | Issue # | IS or CP | IS
j | Journal (short) | JA | JA
k | Keyword(s) | KW | KW (gets massaged)
l | Location | CY | CY
n | Notes | N1 or AB | N1
p | First page | SP | SP
p or p- | Page range | EP | EP (gets massaged)
s | Short title | AV | not imported by Zotero
t | Title | T1, TI, CT, BT | TI
u | Publisher | PB | PB
v | Volume | VL | VL
u1 | User1 | U1 | N1
u2 | User2 | U2 | N1 (Bookends often uses for Translator)
u3 | User3 | U3 | N1 (Bookends often uses for Edition)
u4 | User4 | U4 | N1
u5 | User5 | U5 | M1 (Bookends often uses for Call #)
u6 | User6 | --- | SN (Bookends often uses for ISSN/ISBN)
u7 | User7 | --- | N1 (Bookends often uses for Language)
u8 | User8 | --- | N1
u9 | User9 | --- | N1
u10 | User10 | --- | N1
u11 | User11 | --- | N1
u12 | User12 | --- | N1
u13 | User13 | --- | N1
u14 | User14 | --- | N1
u15 | User15 | --- | N1
u16 | User16 | --- | N1
y | Ref type | TY | TY (see the reference-type table above)
z | URL | UR | L2 (Zotero imports as HTML attachment)
@ | Unique ID # | ID | not imported by Zotero
# | Sequential # | --- | be2z strips out
! | Database name | --- | be2z strips out
--- | Misc 1 | M1 | M1 (Zotero imports into "Extra:" field)
--- | Misc 2 | M2 | M2 (Zotero imports into "Extra:" field)
--- | Misc 3 | M3 | not imported by Zotero
--- | Link to fulltext (PDF) | L2 | L2 (Zotero imports as HTML attachment)
--- | Related record | L3 | be2z strips out
--- | Link to image | L4 | L4 (Zotero imports as image attachment)
--- | ISSN/ISBN | SN | be2z translates from User6
--- | Reprint status | RP | be2z strips out
--- | Series title | T3 | be2z strips out
--- | Series author | A3 | be2z strips out (Zotero does not import)
--- | Secondary date | Y2 | be2z strips out
--- | Periodical abbr. 1 | J1 | be2z strips out
--- | Periodical abbr. 2 | J2 | be2z strips out
--- | Address | AD | be2z strips out
--- | End of Record | ER | ER
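The table marks the keyword and page-range fields as ones that "get massaged" without saying how. Purely as a hypothetical illustration (this is not be2z's code, and the name `split_pages` is invented here), splitting a Bookends page range into the RIS SP and EP tags might look like this in Perl, the language be2z itself uses. It assumes the text has already been decoded from UTF-8, so an en dash can be matched as a single character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not taken from be2z): split a page range such as
# "123-145" or "123" + en dash + "145" into first and last page for the
# RIS SP and EP tags. A single page yields an undefined last page.
sub split_pages {
    my ($range) = @_;
    if ($range =~ /^\s*(\S+?)\s*(?:-+|\x{2013})\s*(\S+)\s*$/) {
        return ($1, $2);
    }
    (my $single = $range) =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
    return ($single, undef);
}

# Example: print RIS-style lines for one range
my ($from, $to) = split_pages("123-145");
print "SP  - $from\n";
print "EP  - $to\n" if defined $to;
```

Abbreviated ranges such as "123-45" are passed through unchanged; expanding them would be a separate (and riskier) decision.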

Consideration #3: Text Encoding

Text encoding raises another set of issues, at least when text-based formats serve as the intermediate import/export mechanism. Currently the Zotero developers follow the RIS spec in assuming that an RIS file to be imported is text in the IBM-850 encoding (see related forum thread). They do not enforce that spec strictly, however: Zotero will (for example) accept entries for the Date field in a variety of formats (see related forum thread), beyond just the "YYYY/MM/DD/other info" stipulated in the spec.

When you export RIS (or any other text-based format) from Bookends, however, it uses the default encoding of "plain text" on MacOS X, which is Unicode (under MacOS 9 it was yet another encoding: MacRoman).
If (like me) you downloaded many of the references in your Bookends library from online sources, you likely have many accented and other specialized characters in your data that you yourself never typed in.

There are utilities for converting text files from one encoding to another (e.g. iconv on Linux), but when you translate from an encoding that represents many characters to a more limited one that can represent only a small subset of them, decisions have to be made about how to represent the characters that have no equivalent. For example, there is no "em dash" character (—) in IBM-850. It may be fine from the user's perspective to substitute a regular dash (-) or two (--) in such cases, but iconv offers no way to specify such substitutions yourself, and only some support for helping you find all such characters in your export file. Many people perform this substitution manually in a text editor (such as Emacs), but a search-and-replace on "all characters that cannot be encoded in IBM-850" is hardly a trivial problem.
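Finding the offending characters, at least, can be automated: Perl's standard Encode module can test each distinct character against its cp850 (IBM-850) table. The sketch below is not part of be2z; `unencodable_chars` is a hypothetical name, the script assumes a UTF-8 export file given on the command line, and its output merely imitates the style of the `%encoding_map` entries in be2zm.pm (with the replacement left blank for you to fill in).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use charnames ();

# Return the distinct characters of $text that have no IBM-850 (cp850)
# representation, in order of first appearance.
sub unencodable_chars {
    my ($text) = @_;
    my (%seen, @bad);
    for my $ch (split //, $text) {
        next if $seen{$ch}++;
        my $copy = $ch;   # encode() with a CHECK value may modify its input
        # FB_CROAK makes encode() die on an unmappable character; eval traps it
        my $ok = eval { encode('cp850', $copy, Encode::FB_CROAK); 1 };
        push @bad, $ch unless $ok;
    }
    return @bad;
}

# Usage: perl find850.pl Bibliography.utf8
if (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    for my $ch (unencodable_chars($text)) {
        my $name = charnames::viacode(ord $ch) || 'UNKNOWN';
        # Print a stub line: long Unicode name, empty replacement, hex code
        printf qq{    "\\N{%s}" => "",  # %04x\n}, $name, ord $ch;
    }
}
```

Accented Latin letters such as e-acute pass the test, since IBM-850 includes them; typographic punctuation such as the em dash and curly quotes does not.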

On a related note, if you're migrating away from MacOS, the end-of-line characters in your export file must be translated to the Windows (CR LF) or Unix/Linux (LF) convention.
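A minimal Perl sketch of that translation, assuming a UTF-8 (not UTF-16) export file: in UTF-8, CR and LF are single bytes, so the file can be processed as raw bytes without decoding. It converts classic Mac (CR) and Windows (CR LF) endings to Unix LF; `normalize_eol` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert classic Mac (CR) and Windows (CR LF) line endings to Unix LF
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;   # a CR, optionally followed by LF, becomes one LF
    return $text;
}

# Usage: perl fix_eol.pl in.utf8 > out.utf8
if (@ARGV) {
    open my $in, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$in> };
    close $in;
    binmode STDOUT, ':raw';
    print normalize_eol($text);
}
```

For a Windows target you would then turn each LF into CR LF (`s/\n/\r\n/g`) after normalizing.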

One Approach to Migrating Your Library

The approach detailed here involves exporting your bibliographic library to a customized (but simple) tagged-format text (Unicode) file, intermediate processing using a Perl script, and subsequent import into Zotero.

Phase 1: Export

Download this archive, which contains a Bookends Format file (named "BEDump"), a Perl script (named "be2z") and a data file on which it depends ("be2zm.pm"). Place the dump format file in your Bookends format directory (typically something like <home-directory>/Library/Application Support/Bookends/Custom Formats/). Writing out your entire Bookends library using the BEDump format ensures that every field, regardless of reference type or customizations, gets written out to a text file (UTF-16 encoded, under MacOS X). This file can then be parsed and manipulated much more simply using text editors such as TextWrangler or text-savvy tools such as Perl. You can also construct a BEDump format file yourself (which would allow you, for example, to employ this solution with Bookends version 10). Simply specify the following in the Primary Order field (for all 27 reference types), ensuring no additional wrapping takes place if you cut-and-paste:

$%y$ y¬$%a$ a¬$%b$ b¬$%d$ d¬$%e$ e¬$%f$ f¬$%h$ h¬$%i$ i¬$%j$ j¬$%k$ k¬$%l$ l¬$%n$ n¬$%p$ p¬$%p-$ p-¬$%s$ s¬$%t$ t¬$%u$ u¬$%v$ v¬$%u1$ u1¬$%u2$ u2¬$%u3$ u3¬$%u4$ u4¬$%u5$ u5¬$%u6$ u6¬$%u7$ u7¬$%u8$ u8¬$%u9$ u9¬$%u10$ u10¬$%u11$ u11¬$%u12$ u12¬$%u13$ u13¬$%u14$ u14¬$%u15$ u15¬$%u16$ u16¬$%z$ z¬$%@$ @¬$%#$ #¬$%!$ !¬$%%$¬
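Once exported, the dump is easy to read back with a short script. The layout assumed below is inferred from the format string above rather than documented anywhere: each field appears as a line "%<code> <value>", multi-line values (authors, notes) continue on the following lines, and each record ends with a line containing only "%%". Check a few records of your own export before relying on this. `parse_bedump` is a hypothetical name, and a continuation line that itself begins with "%" would be misread as a new field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse BEDump-style text into a list of records (hash refs: code => value).
# ASSUMED layout: "%<code> <value>" per field, untagged continuation lines
# for multi-line values, and a line holding only "%%" after each record.
sub parse_bedump {
    my ($text) = @_;
    my (@records, %current, $last_code);
    for my $line (split /\n/, $text) {
        if ($line eq '%%') {                       # end of record
            push @records, {%current} if %current;
            %current = ();
            undef $last_code;
        }
        elsif ($line =~ /^%(\S+) ?(.*)$/) {        # new field: code, then value
            $last_code = $1;
            $current{$last_code} = $2;
        }
        elsif (defined $last_code) {               # continuation of previous value
            $current{$last_code} .= "\n$line";
        }
    }
    push @records, {%current} if %current;         # tolerate a missing final "%%"
    return @records;
}

# Usage: perl parse_dump.pl Bibliography.utf8
if (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    $text =~ s/^\x{FEFF}//;     # drop a byte-order mark, if present
    $text =~ s/\r\n?/\n/g;      # tolerate Mac or Windows line endings
    my @records = parse_bedump($text);
    printf "%d records parsed\n", scalar @records;
}
```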

You needn't specify a Secondary Order, and authors' and editors' names can in general be first-name-first or last-name-first (Zotero appears to import either). However, the Punctuation:Between names and Punctuation:Between last names must be a semicolon. See p. 126 of the Bookends User Manual, version 9, for details on the process of creating a file format.

Once you have the format file in place, ensure it's available by selecting Biblio->Formats Manager..., checking the BEDump checkbox, and clicking Done.

At this point you will likely want to do any final clean-up of your Bookends references (e.g. consolidation of keywords; removal of returns from all but the author, editor, keyword, and note fields; insertion of semicolons where necessary between multiple keywords). Perform a final library verification using File->Database Maintenance->Verify... (see p. 89 of the User Manual) and back up the result.

Ensure all your references are showing (Window->List View->All or CMD-L) and select all the records in your library (Refs->Mark->Mark All References or SHIFT-CMD-A), or select some subset if you don't wish to export/import them all. Go to the Bibliography Formatter (Biblio->Bibliography Formatter... or SHIFT-CMD-B), select BEDump and as UTF-8 from the output format pull-down menus, and select Send the bibliography to Disk. Click Make Bib and you'll be prompted to name the export file. Export may take several minutes depending on the speed of your Mac and the size of your library.

Phase 2: Processing

With export completed, you're ready for processing. The be2z script translates the BEDump output into a quasi-RIS file importable by Zotero. The be2zm module (a little chunk of Perl data/code) contains user-configurable mapping tables used by be2z. Ensure the permissions for both files are set correctly (e.g. 755: read/write/execute for you, read/execute for everyone else), and put them wherever you wish (<home-directory>/bin is a fine choice), but keep in mind that be2zm.pm must be on your Perl "include path" (keeping them in the same directory usually works). The be2z package offers the following:

The script is designed to be run in a Terminal window (on the command line), like so:

[lane@localhost]$ be2z <path-to-BEDump-file> <absolute-path-to-attachments-folder>
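At its core, the translation be2z performs is a table-driven rewrite of each record. The Perl sketch below is not be2z's implementation: `record_to_ris` is a hypothetical name, and the `%type_map`/`%field_map` entries are a small subset taken from the mapping tables earlier in this document. It takes one record as a hash of Bookends field codes to values, emits the RIS "TAG  - value" line layout, opens each record with TY and closes it with ER, and splits authors, editors and keywords on semicolons and newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative subsets in the style of be2zm.pm's %type_map and %field_map
my %type_map = (
    'Journal article' => 'JOUR',
    'Book'            => 'BOOK',
    'Book chapter'    => 'CHAP',
);
my %field_map = (
    a => 'AU', e => 'ED', t => 'TI', d => 'PY', f => 'JF',
    v => 'VL', i => 'IS', k => 'KW', n => 'N1', u => 'PB',
);

# Tags whose values are split into one RIS line per entry
my %multi = map { $_ => 1 } qw(AU ED KW);

# Convert one record (hash ref: Bookends field code => value) to RIS text
sub record_to_ris {
    my ($rec) = @_;
    my $type = defined $rec->{y} ? $rec->{y} : '';
    die "No %type_map entry for reference type '$type'\n"
        unless exists $type_map{$type};
    my @lines = ("TY  - $type_map{$type}");       # RIS records open with TY
    for my $code (sort keys %$rec) {
        next if $code eq 'y';
        my $tag = $field_map{$code} or next;       # unmapped codes are dropped
        my @values = $multi{$tag}
            ? (grep { length } split /\s*[;\n]\s*/, $rec->{$code})
            : ($rec->{$code});
        push @lines, "$tag  - $_" for @values;
    }
    push @lines, 'ER  - ';                          # ...and close with ER
    return join("\n", @lines) . "\n";
}

print record_to_ris({ y => 'Book', a => 'Doe, J.', t => 'Example' });
```

Dying on an unknown reference type, rather than guessing, is a deliberate choice in this sketch: it forces every type in your library to appear in the type map. Sorting the remaining field codes only makes the output deterministic.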


Given a file "Bibliography.utf8" the script will produce an output file "Bibliography-processed.utf8" ready for inspection and Zotero import. In all likelihood you'll need to tweak some aspects of be2z prior to use. In particular:

1) You may need to change the first line of be2z to match the path to your Perl executable.

2) You should at least look over the three main associative arrays (hashes) defined in be2zm.pm (%field_map, %type_map, and %encoding_map). If you do modify these structures (add or omit elements), remember to escape all special characters in your substitutions, in particular + ? . ' * " ^ $ ( ) [ ] { } | and the backslash (\). Also, every element except the last must be followed by a comma, placed before any comment on the same line (Perl also accepts a comma after the last element), e.g.:

%encoding_map = (
     "\N{EN DASH}"                                => "-",        # 2013
     "\N{EM DASH}"                                => "--",       # 2014
     "\N{LEFT SINGLE QUOTATION MARK}"             => "\'",       # 2018
     "\N{RIGHT SINGLE QUOTATION MARK}"            => "\'",       # 2019
     "\N{LEFT DOUBLE QUOTATION MARK}"             => "\"",       # 201c
     "\N{RIGHT DOUBLE QUOTATION MARK}"            => "\"",       # 201d
     "\N{BULLET}"                                 => "\*",       # 2022
     "\N{HORIZONTAL ELLIPSIS}"                    => "...",      # 2026
     "\N{TRADE MARK SIGN}"                        => "\(TM\)",   # 2122
     "\N{BLACK FOUR POINTED STAR}"                => "\*",       # 2726
     "\N{GREEK CAPITAL LETTER OMEGA}"             => "\(omega\)",# 03a9
     "\N{DOUBLE LOW-9 QUOTATION MARK}"            => "\""        # 201e  <------------- Note this line has no comma!
     );

3) For %field_map, you should leave the Bookends field codes (those on the left side of the =>) just as they are, but modify the RIS tags as you see fit.

4) For %type_map, the Bookends reference types (those on the left side of the =>) should be modified to exactly match those in your Bookends library (these are case-sensitive).  The RIS reference type tags (those on the right side of the =>) should be modified to match the desired Zotero reference type.

5) For %encoding_map, list every Unicode character in your Bookends library that has no direct translation in the IBM-850 encoding, along with the string to be substituted for all occurrences of that character. As illustrated above, the characters must be specified using the \N{} notation and the long-form Unicode names, the most recent version of which is available here. Note that the replacement strings must themselves be representable in the IBM-850 encoding (currently there's no check on this).

6) Tracking down precisely which non-translatable characters exist in your Bookends library, and where, is not a trivial process, but it can be eased somewhat by trial-running be2z on the exported UTF-8 file. When it attempts to write a non-translatable character into the output IBM-850 file, it issues a warning and substitutes the hexadecimal code point of the offending character (in the form \x{dddd}) in the output. You can then search for strings of that form in the trial output file (using any competent text editor such as Emacs or TextWrangler), look up the long name corresponding to that character in the aforementioned table, and add a corresponding substitution line to %encoding_map as described above. Doing so will then take care of every occurrence of that character in your library. Obviously, if your library contains a great many unusual characters (e.g. many of your references are in Korean), this approach would likely be impractical.
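The substitute-then-encode step described in items 5 and 6 can be sketched with Perl's standard Encode module. This is an illustration under assumptions, not be2z's actual code: the map is a four-entry subset, and `apply_encoding_map` and `to_cp850` are hypothetical names. Encode's FB_PERLQQ fallback writes each character that is still unmappable as a \x{....} escape, the same form item 6 tells you to search for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ':full';   # lets \N{...} names work on Perl 5.8 as well
use Encode qw(encode);

# A small subset in the style of be2zm.pm's %encoding_map
my %encoding_map = (
    "\N{EM DASH}"                     => "--",
    "\N{LEFT DOUBLE QUOTATION MARK}"  => '"',
    "\N{RIGHT DOUBLE QUOTATION MARK}" => '"',
    "\N{HORIZONTAL ELLIPSIS}"         => "...",
);

# Replace every occurrence of each mapped character. \Q...\E makes the key
# match literally, so this sketch needs no manual escaping.
sub apply_encoding_map {
    my ($text, $map) = @_;
    for my $char (keys %$map) {
        my $replacement = $map->{$char};
        $text =~ s/\Q$char\E/$replacement/g;
    }
    return $text;
}

# Encode as IBM-850 (cp850). Characters that are still unmappable come out
# as a literal \x{....} escape (PERLQQ fallback) instead of aborting.
sub to_cp850 {
    my ($text) = @_;
    return encode('cp850', $text, Encode::FB_PERLQQ);
}

# The em dash is mapped, e-acute exists in cp850, and the CJK character
# has no cp850 form, so it appears as an escape you can search for.
my $bytes = to_cp850(apply_encoding_map("caf\x{e9} \x{2014} \x{4e2d}", \%encoding_map));
print "$bytes\n";
```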

be2z currently provides no feedback on its progress unless an error occurs, but on a library of 1,800 records with approximately 160 attached documents (250 MB), processing took approximately 20 seconds on Test Platform A and around 90 seconds on Test Platform B.

Phase 3: Import

Import the processed (quasi-RIS) file as you normally would, via Zotero's Actions menu (the little gear icon). As before, how long this takes will vary with the size of your library, the number of attachments, and the speed of your computer.  On the test library described above, the import required about 8-10 minutes on Test Platform A; the same library required about 90 minutes to import on Test Platform B.

Requirements and Limitations

be2z requires Perl (it's been tested successfully on versions 5.8.6 and 5.8.8) as well as a couple of the modules that come standard with Perl (Encode and PerlIO).

Note that be2z keeps periods in imported keywords, and it splits authors, editors, keywords, and any fields mapped to those fields on semicolons and newlines only, not on spaces or periods. Thus the following in the keyword field of a Bookends reference:

Weimar Republic -- Germany. Optics.

will be translated into a single tag in Zotero, which is likely not what you want. The only alternatives are to 1) insert semicolons in all such instances in Bookends prior to export, or 2) insert semicolons in the exported BEDump file using a (Unicode-capable) text editor. Either of these may involve quite a bit of work if you have a big library, as there's no simple, generalizable way to search for such cases.
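Because there is no general rule, the most a script can do is point out likely candidates for review. The heuristic below is hypothetical (not part of be2z): it flags keyword fields containing a period followed by whitespace and a capital letter, as in the example above, and proposes a semicolon-separated version. Expect false positives on abbreviations such as "U.S. Army", so review each suggestion before applying it in Bookends or in the BEDump file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical heuristic: a period followed by whitespace and an upper-case
# letter often marks two keywords run together without a semicolon.
sub suspicious_keywords {
    my ($keywords) = @_;
    return $keywords =~ /\.\s+[[:upper:]]/ ? 1 : 0;
}

# Suggest a semicolon-separated version for manual review
sub suggest_semicolons {
    my ($keywords) = @_;
    (my $s = $keywords) =~ s/\.\s+(?=[[:upper:]])/; /g;
    $s =~ s/\.\s*$//;    # also drop a trailing period
    return $s;
}

my $kw = 'Weimar Republic -- Germany. Optics.';
print suggest_semicolons($kw), "\n" if suspicious_keywords($kw);
```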

Further, only those fields mapped to author, editor, keyword, or note should contain returns (newlines); be2z will stop if it detects other fields stretched across multiple lines in the exported quasi-RIS file. Note also that Zotero replaces returns/newlines in notes with spaces on import.
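A pre-flight check along these lines can list every offending field in one pass rather than stopping at the first. It is a hypothetical sketch, not be2z's own test: it works on one record at a time, given as RIS tag => value pairs, and treats AU, ED, KW and N1 as the only tags allowed to span lines (adjust this if your field map differs).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# RIS tags allowed to span several lines: author, editor, keyword, note
my %may_span = map { $_ => 1 } qw(AU ED KW N1);

# Given RIS tag => value pairs for one record, describe the fields that
# contain a newline but are not allowed to
sub multiline_violations {
    my (%fields) = @_;
    return map  { join ': ', $_, $fields{$_} }
           grep { !$may_span{$_} && $fields{$_} =~ /\n/ }
           sort keys %fields;
}

print "$_\n" for multiline_violations(TI => "A title\nsplit in two", AU => "Smith, J.\nJones, K.");
```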

There appears to be no RIS tag, including "ED", that Zotero maps to its Editor field; all editors are imported into the Contributor field.

be2z should give a bit of feedback on its progress (currently it does not).

Bugs and Comments

I'd be very interested in hearing of your experience with trying to get be2z to work, regardless of the platform you're working on. Let me hear from you, good or bad, and I'll update the script as time permits.  Please mention all relevant version numbers (Bookends, Zotero, Firefox, your operating system, and Perl interpreter) and describe the library you're attempting to import.

Licensing

This web page is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 License. See package contents for separate licensing.