History Database Project/C-base: ERD redux!

OK.  This is my third try to create a table structure for C-base.  Yesterday, I met with UVA Library database specialist Mary Ellen MacNeil, who manages a sophisticated FileMaker Pro database for the Dolley Madison Papers.  She mentioned that “Notes” really seemed like the core table that should have access to every other table.  One sure way to support this is simply to put Notes at the center of the database and use join tables to connect it with every other table.  I’m not sure it’s right, but I think it’s the best way to move forward and continue developing C-base prior to the first meeting with the grad students on 1/30.  So, today, I am trying to learn more about using these join tables to open portals between tables to work with data.



This image represents the ERD (Entity Relationship Diagram) behind C-base.  It describes the eleven tables (or entities) that house different kinds of data and visualizes their relationships to one another.  On its own, each table is like a single spreadsheet:  each row in the spreadsheet is a separate record in the database; each column is a field, or a general type of data.  (In the field “Bibliography,” for instance, you can create a bibliography citation record in Chicago style for The Ideological Origins of the American Revolution.)  The power of a relational database is that we can put these different entities together to build searches, generate reports, and organize activities, mixing these data sets together in interesting ways.

Three key ideas inform this structure.

  1. Taking notes is at the center of historical research, and so Notes is the table at the center of C-base, the one that all the other tables feed into. Notes are our annotations, comments, ideas, and transcriptions from Sources and Objects that we can tag using Keywords and arrange to support Projects.
  2. Different kinds of data might inform the notes, but these must be organized on their own to get the most out of them and to keep our data “clean” (that is, free of repetitions and errors across the database). The Sources table includes bibliographic metadata about books, articles, archival documents, maps, and any other materials historians consult. Objects are the texts, images, PDFs, statistical tables, and documents we collect and store in digital form.  Projects are the scholarly products of this work with Sources, Objects, and Notes, such as chapters, books, articles, visualizations, and annotated bibliographies.  Agents are people–historical figures that we would like to keep track of in our notetaking. Keywords are the terms we use to tag our notes with themes and subjects; they will be the way we search the database and organize our Notes to complete Projects.
  3. Join tables are the means by which we manage the relationships of these separate tables to Notes (and to one another).  As this ERD shows, each entity has a defined “one-to-many” relationship with at least one other entity.  Each of our primary tables (Agents, Projects, Objects, Sources, and Keywords) figures as the “one” or the parent in a one-to-many relationship with a “many” or child table that joins it to our key table, Notes.  We will use these join tables to make use of data from each of these primary tables in Notes.  Getting this relationship diagram right is the key to making a relational database work.  This structure provides a stable architecture on which we can combine data from each of these different tables/entities in illuminating ways.
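
To make the join-table idea concrete, here is a minimal sketch in Python using the built-in sqlite3 module rather than FileMaker itself.  It models only Notes and Keywords, linked by one join table.  The table names, column names, and sample rows are all invented for illustration; they are assumptions, not the actual C-base schema.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with Notes, Keywords, and a join table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY, term TEXT NOT NULL UNIQUE);
        -- Join table: each row links exactly one note to one keyword.
        -- Notes and Keywords are each the "one" side; this table is the "many".
        CREATE TABLE note_keywords (
            note_id INTEGER NOT NULL REFERENCES notes(id),
            keyword_id INTEGER NOT NULL REFERENCES keywords(id),
            PRIMARY KEY (note_id, keyword_id)
        );
        """
    )
    # Invented sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO notes (id, body) VALUES (?, ?)",
        [(1, "Sample note A"), (2, "Sample note B"), (3, "Sample note C")],
    )
    conn.executemany(
        "INSERT INTO keywords (id, term) VALUES (?, ?)",
        [(1, "taxation"), (2, "trade")],
    )
    conn.executemany(
        "INSERT INTO note_keywords (note_id, keyword_id) VALUES (?, ?)",
        [(1, 1), (2, 1), (2, 2), (3, 2)],
    )
    return conn


def notes_for_keyword(conn, term):
    """Return the bodies of all notes tagged with the given keyword."""
    rows = conn.execute(
        """
        SELECT n.body
        FROM notes AS n
        JOIN note_keywords AS nk ON nk.note_id = n.id
        JOIN keywords AS k ON k.id = nk.keyword_id
        WHERE k.term = ?
        ORDER BY n.id
        """,
        (term,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo()
    print(notes_for_keyword(demo, "taxation"))
```

Because the link lives in its own table, a note can carry several keywords and a keyword can tag several notes without either table repeating the other's data.  The same pattern would apply to the joins between Notes and Agents, Projects, Objects, and Sources.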

History Database Project (C-base): Thinking through Sources

Our FileMaker Pro template is taking shape in preparation for an initial presentation on January 30.  I’m going to call it “C-base” (short for Corcoran Department of History Research Database).

This week I met with Ivey Glendon, manager of the Metadata Analysis & Design Unit at the UVA Library.  We discussed the great promise of using C-base as a personalized research portal that would open onto the web to make use of state-of-the-field tools and databases.

We brainstormed ways to create a rich group of fields in the Sources table without overburdening users who might not want or need all of that articulated metadata associated with our books, articles, maps, and archival materials.

The tentative solution is two layouts for the Sources table, one called Citations and the other called Sources.  “Citations” has just two fields visible: Footnote and Bibliography.  Each can hold your source citation in the correct Chicago-style format with the least amount of fuss.  If you use Zotero along with C-base, you can simply generate these two kinds of citations and paste them in.  If you don’t, library catalogs and other databases usually generate formatted citations.  You can use the Library’s Virgo catalog, for instance, to look up a source like this one.  Then pull down the Item Action menu and select “cite,” which takes you to this page, from which you can cut and paste the citation and plop it in these fields.  It looks like the UVA Library is going to start generating these in Chicago style–so little to no editing will be necessary.

So that’s fine for a basic citation, but I’m thinking it’s worth a bit more time to have the full range of metadata for every source within C-base, allowing for much more detailed analysis.  I’m interested in researching a book project called “The Political Economy of the American Revolution” that would involve working with hundreds of original pamphlets.  It will be worthwhile to be able to search through this corpus by distinct fields.  I could then list these pamphlets in order of publication date and visualize them on a timeline with other events, compare them by place of publication, and do some text analysis of the terms in the titles.  But I can’t do any of this if all of the metadata is lumped together in a single text field.  In the Sources table, I’ve recreated the full Zotero field list, which is displayed in the Sources layout.  My provisional plan is to leverage the power of Zotero’s bibliographic tools to populate these fields in C-base.  Here’s the idea:  add bibliographic items to Zotero collections, output these to a CSV (spreadsheet) file, and then import this metadata into C-base.  That’s a few extra steps, but in my tests it works well–especially because FileMaker can import data from files and match the column headers of a spreadsheet to its field names.  One could just forget about C-base and do all of this within Zotero without all of the extra work, but I’m just not satisfied with Zotero as a tool for note-taking and analysis.  Because it’s such a great bibliographic management system, however, I’m working to find the best ways to link it to C-base.  We are going to look into RefWorks and see what that has to offer.  This reminds me of David Weinberger’s book, Small Pieces Loosely Joined: A Unified Theory of the Web, which argues that the ideal tools for digital communication aggregate particular tools that each do their own thing quite well.
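
To show why separate fields matter, here is a rough sketch in Python of the CSV step described above.  It is not how FileMaker performs its import.  It only illustrates matching spreadsheet column headers to field names, then sorting by publication date and grouping by place.  The column headers, the field names in `FIELD_MAP`, and the sample rows are all assumptions made up for this example.

```python
import csv
import io

# Invented sample standing in for a CSV export; the rows are placeholders.
SAMPLE_CSV = """Title,Author,Publication Year,Place
Sample Pamphlet B,Author Two,1776,Philadelphia
Sample Pamphlet A,Author One,1765,Boston
Sample Pamphlet C,Author Three,1774,Boston
"""

# Assumed mapping from CSV column headers to (hypothetical) C-base field names.
FIELD_MAP = {
    "Title": "title",
    "Author": "creator",
    "Publication Year": "year",
    "Place": "place",
}


def load_sources(csv_text):
    """Read CSV text and return one dict per row, keyed by database field name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {field: row[header].strip() for header, field in FIELD_MAP.items()}
        record["year"] = int(record["year"])  # store the year as a number so it sorts correctly
        records.append(record)
    return records


def by_year(records):
    """Return the records ordered by publication year, earliest first."""
    return sorted(records, key=lambda r: r["year"])


def count_by_place(records):
    """Count how many records were published in each place."""
    counts = {}
    for record in records:
        counts[record["place"]] = counts.get(record["place"], 0) + 1
    return counts


if __name__ == "__main__":
    sources = load_sources(SAMPLE_CSV)
    for source in by_year(sources):
        print(source["year"], source["place"], source["title"])
    print(count_by_place(sources))
```

None of this sorting or counting would be possible if the year and place were buried inside a single formatted citation string, which is the argument for keeping the full field list alongside the simpler Citations layout.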