<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
   <title>Digitization Project</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/" />
   <link rel="self" type="application/atom+xml" href="http://bloggery.wlu.edu/digproj/atom.xml" />
   <id>tag:bloggery.wlu.edu,2008:/digproj/130</id>
   <updated>2008-05-14T13:05:12Z</updated>
   <subtitle>Blog for tracking the library&apos;s digitization efforts.</subtitle>
   <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.32</generator>

<entry>
   <title>Loading Documents</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2008/05/loading_documets.html" />
   <id>tag:bloggery.wlu.edu,2008:/digproj//130.3257</id>
   
   <published>2008-05-14T12:58:20Z</published>
   <updated>2008-05-14T13:05:12Z</updated>
   
   <summary>I tried loading a test document into the archive today, and there are a few things to be aware of when loading documents: First, since each page is its own file, there needs to be a way to tell people...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      I tried loading a test document into the archive today, and there are a few things to be aware of when loading documents:

First, since each page is its own file, there needs to be a way to tell people which file represents the first page, second page, etc.  There is a &quot;description&quot; field for each file; I suggest we use this to designate the order of the pages.  It would also be good if it designated whether the file was the large archival file or the display image.

So:

&quot;First page: Large archival quality&quot;

Is a possible model.

Second, I&apos;m seeing problems with the transcripts.  Since the file names are going to be part of the URLs, they can&apos;t contain spaces.  The transcripts also don&apos;t seem to have any line breaks, which means that when loaded into a web browser, they scroll off to the side.  I am going to try to fix these problems, but it will probably have to be done manually.
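
A minimal sketch of how both transcript fixes might be scripted rather than done by hand. The function names, the choice of underscores, and the 80-character width are assumptions for illustration, not settings the archive requires.

```python
import textwrap


def url_safe_name(filename):
    """Collapse each run of whitespace in a filename into one underscore."""
    return "_".join(filename.split())


def wrap_transcript(text, width=80):
    """Wrap each paragraph to `width` columns, keeping blank-line breaks."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)
```

For example, url_safe_name("Barclay transcript 1.txt") gives "Barclay_transcript_1.txt". The wrapped text would still need to be saved back over the original transcript file.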
      
   </content>
</entry>
<entry>
   <title>Names</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2008/05/names.html" />
   <id>tag:bloggery.wlu.edu,2008:/digproj//130.3256</id>
   
   <published>2008-05-13T18:01:54Z</published>
   <updated>2008-05-13T18:05:19Z</updated>
   
   <summary>For the purposes of forming titles, the name of Barclay&apos;s mother is &quot;Mary Elanor Paxton Barclay&quot; and the name of the sister is &quot;Hannah Moore Barclay.&quot;...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      For the purposes of forming titles, the name of Barclay&apos;s mother is &quot;Mary Elanor Paxton Barclay&quot; and the name of the sister is &quot;Hannah Moore Barclay.&quot;
      
   </content>
</entry>
<entry>
   <title>Scanning standards</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2008/05/scanning_standards.html" />
   <id>tag:bloggery.wlu.edu,2008:/digproj//130.3232</id>
   
   <published>2008-05-07T18:26:07Z</published>
   <updated>2008-05-14T13:11:41Z</updated>
   
   <summary>I&apos;m relocating this from the staff wiki, since all the rest of the information about the project is here. Note that some of this information conflicts with earlier posts; where that is the case, this information takes precedence. Scanning Standards For...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      <![CDATA[I'm relocating this from the staff wiki, since all the rest of the information about the project is here.  Note that some of this information conflicts with earlier posts; where that is the case, this information takes precedence.

<strong>Scanning Standards</strong>

For the Barclay letters project: 

Approximate time per image: 4 mins 30 secs 

1. Open the HP solution center. 

2. Choose "scan picture." 

3. Make sure the initial scan screen is set to scan a color picture, that it is scanning from the glass, that it is saving to a file, and that it is saving as a .tiff file. The resolution should be set to 300. I have tried to make all these settings defaults, but check just in case. 

4. Click Scan 

5. On the next screen, set the filename and the base scan name. Files should be saved to c:/digital project. Then click OK. 

6. On this next screen, you need to adjust the lighten/darken and sharpness settings. These are found to the right of the screen. The light/darkness settings should be: Highlights: -20, Shadows: 0, Midtones: 0. Sharpness should be high. The scanning software will give you attitude about some of these settings later; just remember, it's wrong, and it needs to use your settings. 

7. Now adjust the scan area by grabbing and moving the dotted lines. 

8. Now hit accept. The software will start giving you grief here about your settings; make sure you tell it to use yours and not the recommended scanning settings. 

9. Now, open the file in Photoshop. As you are opening the file, you will need to right-click on it, choose "rename" and use your delete and arrow keys to clean off the last three numbers in the scan filename. The scanner appends these automatically; I have not been able to make it stop. 

10. Depending on how you scanned it, you may need to rotate the image so the text is reading left to right. Do this by going to Image->Rotate Canvas. 

11. Go to Image->Image Size and reduce the size to APPROXIMATELY 800 pixels wide by 600 pixels high. This is not a hard and fast rule: the letters are different shapes and the exact measurements you use will vary. Use your judgement. 

12. Go to File->Save As. Set the "format" drop down to "jpeg." Adjust the filename. Hit Save. 
In the image quality dialogue, set the quality slider to the exact middle. Click OK. 

13. Rinse and repeat. 
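
The rename in step 9 could probably be scripted. A minimal sketch, assuming the scanner's suffix is always exactly three digits placed right before the file extension; the function name is hypothetical.

```python
import re
from pathlib import Path


def strip_scan_suffix(name):
    """Remove three trailing digits from the part of the name before the extension."""
    path = Path(name)
    # Assumption: the scanner adds exactly three digits at the end of the stem.
    stem = re.sub(r"\d{3}$", "", path.stem)
    return stem + path.suffix
```

Under that assumption, strip_scan_suffix("1856_06_02_archive_1_4001.tiff") returns "1856_06_02_archive_1_4.tiff". It should only be run on freshly scanned files, since it would also trim a name that legitimately ends in three digits.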

<strong>File Naming</strong>
Files are named by the following protocol: 

First, the date of the letter in numeric YYYY_MM_DD format: 1867_01_20 

Then "archive" or "web" depending on whether the image is a tiff or a jpeg. 

Then the sequence numbers, separated by underscores: 1_5 is the first of five sides. 

Example: 1856_06_02_archive_1_4.tiff 

Is the first page of a four-page letter written on June second, 1856, and this particular image is an archival tiff. 
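
A minimal sketch of the protocol as a function, following the example filename above. The function name is hypothetical, and the ".jpg" extension for web images is an assumption, since only the tiff extension is shown here.

```python
from datetime import date

# Assumed extensions: only ".tiff" appears in the example filename.
EXTENSIONS = {"archive": ".tiff", "web": ".jpg"}


def scan_filename(letter_date, kind, side, total_sides):
    """Build a name such as 1856_06_02_archive_1_4.tiff."""
    if kind not in EXTENSIONS:
        raise ValueError(f"kind must be 'archive' or 'web', got {kind!r}")
    stamp = f"{letter_date.year:04d}_{letter_date.month:02d}_{letter_date.day:02d}"
    return f"{stamp}_{kind}_{side}_{total_sides}{EXTENSIONS[kind]}"
```

Calling scan_filename(date(1856, 6, 2), "archive", 1, 4) reproduces the example filename.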

<strong>Storage</strong>
We are storing the files at c:/digital project. Kyle is trying to back them up to a network drive periodically. 
]]>
      
   </content>
</entry>
<entry>
   <title>Workflow</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2008/02/workflow.html" />
   <id>tag:bloggery.wlu.edu,2008:/digproj//130.2918</id>
   
   <published>2008-02-04T14:05:09Z</published>
   <updated>2008-02-04T14:24:42Z</updated>
   
   <summary>Here is how documents get scanned and loaded into Dspace: * Jean scans the documents according to scanning guidelines we&apos;ve established (they are posted to this blog). All files, both transcripts and image files, use the same naming convention: (author&apos;s last name)-(type)-(month)-(day)-(year)...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      <![CDATA[Here is how documents get scanned and loaded into Dspace:

*  Jean scans the documents according to scanning guidelines we've established (they are posted to this blog).  All files, both transcripts and image files, use the same naming convention:

(author's last name)-(type)-(month)-(day)-(year)

"type" is:

tr = transcript

ar = archival tiff

ds = display jpg

So for example:

<em>Barclay-tr-January-17-1862.txt</em>

Is a typical filename for a transcript.

* As they are available, Jean loads the documents into Dspace at the following URL:  http://dspace.nitle.org/handle/10090/1065. She makes a "first pass" at filling out the metadata using the guidelines posted on this blog.  Each submission will contain three files:  an archival tiff, a display jpeg, and a transcript file in text format.

*  Once submitted, the item goes through an accept/reject step, which Kyle is responsible for.  Primarily, this is so that the filenames and types can be checked to make sure all the requisite files are present and uncorrupted.

*  The submission then goes to Holt so that the metadata can be checked and expanded.  Specifically, his responsibility is to expand the description and subject keyword fields.  

*  Once Holt has done this, the submission goes to Vaughan for a final metadata check.  When he is satisfied, it goes into the public archive.]]>
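
A minimal sketch of how the file check in the accept/reject step might be automated, based on the naming convention above. The function names are hypothetical, and the sketch assumes last names contain no hyphens; it checks only that all three file types are present, not that the files are uncorrupted.

```python
REQUIRED_TYPES = {"tr", "ar", "ds"}  # transcript, archival tiff, display jpg


def file_type(filename):
    """Return the type code from (last name)-(type)-(month)-(day)-(year).ext."""
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected filename: {filename!r}")
    return parts[1]


def missing_types(filenames):
    """Return the required type codes that a submission does not contain."""
    return REQUIRED_TYPES - {file_type(name) for name in filenames}
```

A submission with only "Barclay-tr-January-17-1862.txt" and "Barclay-ar-January-17-1862.tiff" would report {"ds"} as missing; the ".tiff" extension in that second name is an assumed example.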
      
   </content>
</entry>
<entry>
   <title>Revised (again) Metadata framework</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2008/02/revised_again_metadata_framewo.html" />
   <id>tag:bloggery.wlu.edu,2008:/digproj//130.2917</id>
   
   <published>2008-02-04T13:53:30Z</published>
   <updated>2008-02-04T14:04:34Z</updated>
   
   <summary>Fields not mentioned in this document should be left blank. Authors Should be filled in with the name of the writer of the letter. Title This would follow the format: Letter, (writer) to (recipient), (date in Month, day, year format)...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      <![CDATA[Fields not mentioned in this document should be left blank.

<strong>Authors</strong>

Should be filled in with the name of the writer of the letter.

<strong>Title</strong>

This would follow the format:  <em>Letter, (writer)  to (recipient), (date in Month, day, year format)</em>
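
A minimal sketch of the title format as a function. The function name is hypothetical, and rendering the date as "June 2, 1856" is an assumption about what "Month, day, year" should look like.

```python
from datetime import date

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def letter_title(writer, recipient, written):
    """Format a title as: Letter, (writer) to (recipient), (Month day, year)."""
    when = f"{MONTHS[written.month - 1]} {written.day}, {written.year}"
    return f"Letter, {writer} to {recipient}, {when}"
```

For instance, letter_title("John Barclay", "Hannah Moore Barclay", date(1856, 6, 2)) gives "Letter, John Barclay to Hannah Moore Barclay, June 2, 1856".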

<strong>Publisher</strong>

Leave this blank.  The metadata template will fill this in with the information:

"Washington and Lee Special Collections"


<strong>Date of Issue</strong>

This should be filled in with the date the letter was written.

<strong>Type</strong>

Set to "other"

<strong>Language</strong>

English (United States)

<em>Hit the "next" button</em>

<strong>Subject Keywords</strong>

This will contain zero to an unlimited number of key phrases or words that describe important concepts in the letter.  These phrases are not part of a formal subject classification system.  They are selected by our volunteer and expanded and vetted by our historian.

<strong>Description</strong>

This field will contain a one to three sentence description of the letter created by our volunteer and vetted by our historian.

]]>
      
   </content>
</entry>
<entry>
   <title>We Got Problems...</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2008/01/we_got_problems.html" />
   <id>tag:bloggery.wlu.edu,2008:/digproj//130.2854</id>
   
   <published>2008-01-22T19:21:46Z</published>
   <updated>2008-01-22T19:45:15Z</updated>
   
   <summary>I took our prototype metadata framework and tried using it to load a sample item into the database yesterday. There were a couple of problems: * While the &quot;date.created&quot; field does exist, the only way to load any information into...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      I took our prototype metadata framework and tried using it to load a sample item into the database yesterday.  There were a couple of problems:

*  While the &quot;date.created&quot; field does exist, the only way to load any information into this field is to go in as an admin AFTER the entire submission process is over and put it in manually.  Worse than this, you can&apos;t search on this field.  At all.  The only date Dspace seems to care about is the date the item was loaded into the system.

*  Likewise, the &quot;publisher&quot; field has to be filled out manually after the entire submission process is over, and this also cannot be searched on.

* The &quot;type&quot; field, which we had wanted to be able to set as &quot;letter&quot; has only fixed choices, and &quot;letter&quot; is not one of them.  I was able to manually edit this field, but only after the entire submission process is done.


      
   </content>
</entry>
<entry>
   <title>Revised Framework</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2008/01/revised_framework.html" />
   <id>tag:bloggery.wlu.edu,2008:/digproj//130.2830</id>
   
   <published>2008-01-18T14:03:52Z</published>
   <updated>2008-01-22T19:21:00Z</updated>
   
   <summary>On this one, I am labeling everything with the field into which it will go. Publisher &quot;Washington and Lee Special Collections&quot; This information is uniform for all items. Date.created This should be filled in with the date the letter was...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      <![CDATA[On this one, I am labeling everything with the field into which it will go.

<strong>Publisher</strong>

"Washington and Lee Special Collections"

This information is uniform for all items.

<strong>Date.created</strong>

This should be filled in with the date the letter was written, in dd/mm/yyyy format.

<strong>Title</strong>

We are still working on how to formulate unique titles for each document.

<strong>Description</strong>

This field will contain a one to three sentence description of the letter created by our volunteer and vetted by our historian.

<strong>Keywords</strong>

This will contain zero to an unlimited number of key phrases or words that describe important concepts in the letter.  These phrases are not part of a formal subject classification system.  They are selected by our volunteer and expanded and vetted by our historian.

<strong>Type</strong>

We had wanted this field to be "letter," but the choices seem to be fixed, and letter is not one of them.  So I am using "image."

Though not metadata per se, the title of the collection within Dspace shall be:

"The letters of John Barclay (MSS number)"

We had discussed the necessity of creating a page or document describing the collection.  There are a few ways to go about doing this:

*  Within Dspace, each collection has a "description" field.  To see what this looks like when displayed, go here:  <a href="http://dspace.nitle.org/handle/10090/1065">http://dspace.nitle.org/handle/10090/1065</a>

* I could try to find an existing field in the database to use for a URL, as we discussed previously.]]>
      
   </content>
</entry>
<entry>
   <title>Framework</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2007/12/framework.html" />
   <id>tag:bloggery.wlu.edu,2007:/digproj//130.2733</id>
   
   <published>2007-12-12T18:31:38Z</published>
   <updated>2007-12-12T18:50:43Z</updated>
   
   <summary>From Vaughan&apos;s notes, with my own expansions: Name of institution This should probably be kept in the &quot;publisher&quot; field. We need a standardized name (&quot;Washington and Lee?&quot; &quot;Washington &amp; Lee?&quot; &quot;W&amp;L?&quot;). Name of Repository Hmmm. I&apos;m not seeing a field...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      <![CDATA[From Vaughan's notes, with my own expansions:

<strong>Name of institution</strong>

This should probably be kept in the "publisher" field.  We need a standardized name ("Washington and Lee?"  "Washington & Lee?"  "W&L?").

<strong>Name of Repository</strong>

Hmmm.  I'm not seeing a field that leaps out as a logical place to put this.  We could make it part of the description?

<strong>Name of collection</strong>

I'm not sure this should even go in the metadata, because we could actually use the name of the collection in the real world as the name of the collection in Dspace, where it would serve the same purpose.

<strong>MSS number of collection</strong>

Most logical place I can see would be an "identifier" field.  These fields are usually used for numeric identifiers that are unique to the item, though (examples would be the ISBN number).

Are we sure we need this?  If the document is available on the web, why does it matter to the user how we classify or organize the physical material?  Ideally, they wouldn't even need to access the physical version.  Ever.

<strong>Date of Document</strong>

I'm assuming this refers to when the letter was written.  Given that, the best place would probably be the "date.created" field.

<strong>Identification of document</strong>

Is this the title?  If so, it should definitely go in the "title" field.

<strong>Summary Info</strong>

Definitely the "description" field.

<strong>Keywords</strong>

I think these can either be placed in the description field, or in the subject field.  If we actually have some home-grown subject headings, then we ought to put those there and put keywords in the description field.]]>
      
   </content>
</entry>
<entry>
   <title>Answers to Questions</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2007/12/answers_to_questions.html" />
   <id>tag:bloggery.wlu.edu,2007:/digproj//130.2727</id>
   
   <published>2007-12-12T17:36:01Z</published>
   <updated>2007-12-12T18:27:08Z</updated>
   
   <summary>Some of the things I was asked to look into last time: 1. Character limits in the Dspace archive: There appear to be none anywhere. I tried loading an item with a page&apos;s worth of text from Wikipedia entered for...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      Some of the things I was asked to look into last time:

1.  Character limits in the Dspace archive:  There appear to be none anywhere.  I tried loading an item with a page&apos;s worth of text from Wikipedia entered for every free-text field...and it took the information and displayed it.  However, there may still be character limits in the external services that might harvest our data, so it would still be wise to keep the fields reasonably short.

2.  Copyright:  I called Sally Waint, who is our resident copyright expert.  She said we are good to go.

3.  Coverage/inclusive dates fields:  These fields are free-text fields, just like most of the other fields.  It seems we can put in dates in any format we like:  &quot;7/10/06&quot; or &quot;last Tuesday.&quot;  I don&apos;t see anything in the Dublin Core standard about how dates should be entered, but I will keep looking.
      
   </content>
</entry>
<entry>
   <title>Metadata Frameworks</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2007/11/metadata_frameworks.html" />
   <id>tag:bloggery.wlu.edu,2007:/digproj//130.2657</id>
   
   <published>2007-11-12T19:14:01Z</published>
   <updated>2007-11-12T19:43:32Z</updated>
   
   <summary>I&apos;ve been doing some searching for metadata information for the past few days, and unfortunately, am not having a lot of luck. Some of the letter projects are using metadata in ways that we can&apos;t because of technical limitations. Many...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      <![CDATA[I've been doing some searching for metadata information for the past few days, and unfortunately, am not having a lot of luck.  Some of the letter projects are using metadata in ways that we can't because of technical limitations.  Many simply do not have any metadata information readily available, or the information they do have available is at a level of detail that's not really sufficient for our needs.  And some of them are part of much larger projects that have complicated metadata schemas designed for a large collection made up of many different kinds of objects: letters, maps, photos, etc.  

We may not need a template from another project to begin with.  Our archive is actually pretty rigid about the types of data it expects to receive.  It is using a framework called Dublin Core, which calls for specific information to be attached to every uploaded object.  You can read more about the Dublin Core elements at their <a href="http://dublincore.org/documents/dces/">webpage</a>.  

To summarize, there are fifteen elements that are used to describe any object uploaded into the archive. We cannot add more elements.  We could choose to "ignore" certain elements by not filling them out with any information.  Each element has a name and a general indication of what type of data should be entered into it.  For example, the "title" element is defined as "the name by which the resource is formally known."  The descriptions of the information that goes into each element are fairly vague; it is up to us to decide what "title" means in the context of a specific collection.

So the question before us is not "what types of information do we want to keep about this project." It is instead, "How will we use these fifteen elements to describe the items in this collection?" 

Take the "title" example above.  Do we want a title for every letter?  If so, how do we form that title?  Do we take the letter writer's name, append the date of the letter, and enter that as "title?"  Or do we form a title in some other, completely different way?  Or not use that element at all (leave it blank)?]]>
      
   </content>
</entry>
<entry>
   <title>Metadata Meeting</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2007/11/metadata_meeting_1.html" />
   <id>tag:bloggery.wlu.edu,2007:/digproj//130.2643</id>
   
   <published>2007-11-08T14:53:57Z</published>
   <updated>2007-11-08T15:05:05Z</updated>
   
   <summary>Present: Felker, Merchant, Stanley We spent most of the first part of this meeting bringing Holt up to speed on the project: what we are trying to do, our limitations, etc. We agreed that the Barclay letters are a really...</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      Present:  Felker, Merchant, Stanley

We spent most of the first part of this meeting bringing Holt up to speed on the project: what we are trying to do, our limitations, etc.

We agreed that the Barclay letters are a really good collection for this project.  Scanning of those items will continue.  For right now, we are storing them on the portable hard drive in my office.

We decided to look at the metadata that other digital projects are using to classify historical materials.  Vaughan will contact some of the people at UVA to ask about the organizational information they keep, and I will do some hunting on the web for other examples.

I also agreed to send Holt a few examples of the digitized images to look at.

I will call another meeting in a week or so.
      
   </content>
</entry>
<entry>
   <title>Metadata meeting</title>
   <link rel="alternate" type="text/html" href="http://bloggery.wlu.edu/digproj/2007/10/metadata_meeting.html" />
   <id>tag:bloggery.wlu.edu,2007:/digproj//130.2575</id>
   
   <published>2007-10-24T15:49:34Z</published>
   <updated>2007-10-24T17:31:51Z</updated>
   
   <summary>Present: Laura Turner, Kyle Felker, Vaughan Stanley, Frank Settle We started to discuss creating a metadata schema, but on Frank&apos;s advice, we decided that it would be wise to include a member of the history faculty in these discussions....</summary>
   <author>
      <name>Kyle Felker</name>
      
   </author>
   
   
   <content type="html" xml:lang="en" xml:base="http://bloggery.wlu.edu/digproj/">
      Present:  Laura Turner, Kyle Felker, Vaughan Stanley, Frank Settle

We started to discuss creating a metadata schema, but on Frank&apos;s advice, we decided that it would be wise to include a member of the history faculty in these discussions.  The project that we have chosen may not be the one most valued by historians, and we will need their help in trying to do resource description.

So, Vaughan is going to make contact with a member of the history faculty and ask if he is interested in participating.  If he is, we will schedule another meeting that includes him.


      
   </content>
</entry>

</feed>
