Dena'ina Case Study: From Cassette to Easy-Access Software
- Introduction: Easy-Access Software
- The Dena'ina Audio Collection
- Digitization of the Dena'ina audio cassettes
- Aligning with ELAN
- Using XSL to transform to HTML
- The Finished Product
- Follow the path of the Dena'ina Data
The initial audio digitization was funded by the University of Alaska President's Special Projects Fund.
This case study shows the steps taken to convert cassette and open-reel recordings of traditional stories in Dena'ina Athabascan into a sentence-by-sentence, user-friendly HTML display of the stories and their translations. "Easy-access" principles were key in the development of the project: the final products are intended to be simple to use and based on archival-quality files. Specifically, the developers of the project, Dr. Gary Holton, Dr. James Kari, and graduate researchers Andrea Berez, Sadie Williams, and Olga Müller, kept the following points in mind:
- Product is written in non-proprietary, open-source code and requires no expensive software to operate (i.e., requires only a browser and QuickTime)
- Product requires only a computer with a CD drive and speakers to operate (i.e., no special hardware needed)
- Product does not require internet connection to operate (i.e., no streaming audio)
- Product has an attractive, easy to use, and intuitive interface
- Similar products could easily be made within the community, with only minimal training
- Product is based on archival-quality files: the very act of creating the product also produces an XML file; furthermore, previously archived XML files can easily be turned into products
Dena'ina is a fairly well-documented language of the Cook Inlet region of Alaska. From the 1960s through the 1990s, several linguists collected recordings of spoken Dena'ina on audio cassette and open-reel tape. These recordings included traditional stories, prayers, songs, wordlists, and ethnographic information. Many of these cassettes resided in the Alaska Native Heritage Center Archives, but some were in other locations, such as private homes and various Alaskan libraries. Linguist James Kari has been working to gather the audio together to form the Dena'ina Audio Collection, a comprehensive assemblage of linguistic recordings from the past forty years.
In 2003 and 2004, graduate researchers at the University of Alaska Fairbanks converted the tapes to digital format. The Dena'ina Audio Collection currently includes over 200 audio CDs, and the collection is growing as more Dena'ina audio recordings surface.
A fraction of the audio has been transcribed and translated into English over the years, and some has been published in various books on the Dena'ina people. Dr. Kari worked with Dena'ina speakers to correct previous translations and typed them into WordPerfect. Dena'ina orthography has only one non-Roman character, ł, and its capital, Ł. In the past, this character was typed as a backslash (\) when the ł character was not available, as on typewriters or in pre-Unicode days. Dr. Kari's texts contained both versions (note: hatted-h and hatted-y were used in older Dena'ina texts, but not in Dr. Kari's).
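The character clean-up described above can be sketched in a few lines of Python. This is only an illustration, not the conversion tool the project used: the function name and the sample string are hypothetical, and the sketch always substitutes the lowercase ł, so any position that should carry the capital Ł would still need manual review.

```python
import unicodedata

LEGACY_STAND_IN = "\\"  # the character typed when ł was unavailable
BARRED_L = "\u0142"     # ł, LATIN SMALL LETTER L WITH STROKE


def normalize_barred_l(text: str) -> str:
    """Replace the legacy stand-in with ł and return NFC-normalized text.

    Assumption: every occurrence of the stand-in represents ł. Capital Ł
    cannot be inferred from the stand-in alone and is not handled here.
    """
    return unicodedata.normalize("NFC", text.replace(LEGACY_STAND_IN, BARRED_L))


# Made-up placeholder string (not a real Dena'ina word) mixing both versions.
sample = "xa\\xa xa\u0142xa"
print(normalize_barred_l(sample))
```

Running the same function over text that already uses ł leaves it unchanged, so files containing both versions end up consistent.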
Initially, the Alaska Native Language Center in Fairbanks received a grant from the University of Alaska President's Special Projects Fund to support converting the Dena'ina audio collection to digital form. Graduate researchers used a dual tape deck to feed the audio signal through an Edirol and into a PC. They used the Peak software to record each digital file, which they then burned onto CD. Once all two hundred cassettes had been digitized, the collection was backed up both to a free-standing hard drive and to the Arctic Region Supercomputing Center (http://www.arsc.edu).
Dr. Kari provided both the WordPerfect file and the digital audio file of some 20 traditional Dena'ina stories to graduate researcher Andrea Berez. After converting the files to Unicode, she used the ELAN software, created by the Max Planck Institute for Psycholinguistics, to create a two-tier alignment of the audio selections. ELAN also produces an XML file, which can be used for archiving purposes and, more importantly to this project, can be converted to a user-friendly HTML display with XSL (eXtensible Stylesheet Language).
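To make the two-tier alignment concrete, here is a minimal Python sketch that reads an ELAN-style XML file and pairs each transcribed sentence with its translation and time span. The embedded sample is a trimmed, hypothetical fragment: the tier names `text` and `translation` and the bracketed sentences are placeholders, and the sketch assumes both tiers are time-aligned against the same time slots, which may not match how the project's files were actually structured.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed ELAN-style document with two aligned tiers.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="4250"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="9100"/>
  </TIME_ORDER>
  <TIER TIER_ID="text">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
        TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
      <ANNOTATION_VALUE>[Dena'ina sentence 1]</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
        TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
      <ANNOTATION_VALUE>[Dena'ina sentence 2]</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation">
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a3"
        TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
      <ANNOTATION_VALUE>[English translation 1]</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
    <ANNOTATION><ALIGNABLE_ANNOTATION ANNOTATION_ID="a4"
        TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
      <ANNOTATION_VALUE>[English translation 2]</ANNOTATION_VALUE>
    </ALIGNABLE_ANNOTATION></ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tier(root, tier_id, slots):
    """Return (start_ms, end_ms, value) for each annotation on one tier."""
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            value = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            segments.append((start, end, value))
    return segments


def read_alignment(eaf_text, text_tier="text", gloss_tier="translation"):
    """Pair each sentence with the translation covering the same time span."""
    root = ET.fromstring(eaf_text)
    # Assumes every time slot carries an explicit TIME_VALUE in milliseconds.
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
             for ts in root.iter("TIME_SLOT")}
    glosses = {(s, e): v for s, e, v in read_tier(root, gloss_tier, slots)}
    return [(s, e, v, glosses.get((s, e), ""))
            for s, e, v in read_tier(root, text_tier, slots)]


for start, end, sentence, gloss in read_alignment(SAMPLE_EAF):
    print(start, end, sentence, gloss)
```

Each resulting row carries everything a sentence-by-sentence display needs: where the audio segment starts and ends, the Dena'ina text, and the English translation.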
After the alignments were complete, an XSL stylesheet was developed to render the XML files produced by ELAN into HTML. A few pieces of needed information were added to each XML file (for example, the story titles in Dena'ina and English), and then each file was converted to HTML. The stylesheet allowed for the addition of graphics, fonts, colors, and QuickTime plug-ins to play each audio selection. QuickTime was used as the media player because of the ability to use the <starttime> and <endtime> elements to play small portions of the audio, eliminating the need to cut the audio file into many small files.
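The project performed this rendering step with XSL. As a stand-in, the Python sketch below shows the same kind of mapping: one aligned sentence becomes a block of HTML whose player is limited to that sentence's portion of a single audio file. Several details are assumptions rather than facts from the project: the start and end times are passed as `starttime` and `endtime` attributes on an `embed` tag, the time format is `HH:MM:SS:FF` with the last field counting thirtieths of a second, and the file name, class names, and layout are hypothetical.

```python
from html import escape


def to_timecode(ms: int) -> str:
    """Convert milliseconds to HH:MM:SS:FF.

    Assumption: FF counts thirtieths of a second.
    """
    total_seconds, rem_ms = divmod(ms, 1000)
    frames = rem_ms * 30 // 1000
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def render_sentence(audio_src, start_ms, end_ms, text, translation):
    """Render one aligned sentence as an HTML fragment (hypothetical layout)."""
    return (
        '<div class="sentence">\n'
        f'  <embed src="{escape(audio_src)}" autoplay="false" controller="true"'
        f' starttime="{to_timecode(start_ms)}" endtime="{to_timecode(end_ms)}">\n'
        f'  <p class="text">{escape(text)}</p>\n'
        f'  <p class="translation">{escape(translation)}</p>\n'
        '</div>'
    )


# Placeholder values; "story.wav" is a hypothetical file name.
fragment = render_sentence("story.wav", 0, 4250,
                           "[sentence 1]", "[translation 1]")
print(fragment)
```

Because every fragment points at the same source file and differs only in its time range, the audio never has to be cut into per-sentence files.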
Below is a screenshot of the final product, a sixteen-story compilation on CD-ROM. This project unites Dena'ina audio and text with an English translation for the first time.
- Get started: Summary of the Dena'ina conversion
- Digitize audio data: Audio pages (Classroom)
- Convert characters to Unicode: Conversion page (Classroom)
- Align text: Interlinearized glossed text pages (Classroom)
- Store data: XML pages (Classroom)
- Render data: Stylesheets pages (Classroom)