About Conversion


This section is designed for linguists who have data in one format, physical or digital, and wish to convert it to another format. Conversion is a complex topic, and can be divided into at least three categories: character conversion, format conversion, and audio and video conversion.

This section will deal with each of these in turn. For all types of conversion, however, the following principles are helpful:

Character Conversion

Only one kind of character conversion makes sense in the current digital climate: conversion into the Unicode standard. This standard already contains almost all the characters linguists need, and later versions are expected to add the rest. The rule for linguists is therefore simple: convert your textual data into Unicode. Other encodings, such as ASCII, are acceptable, but they do not support as many languages as Unicode does.
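When the source encoding is a known standard, the conversion itself is mechanical. The following is a minimal sketch, assuming the data is a byte string in ISO 8859-1; the function name `to_utf8` and the sample word are illustrative and not part of any tool mentioned here.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # "strict" raises an error on byte sequences the source encoding rejects.
    # Note that some single-byte encodings accept every byte, so a wrong
    # guess about the source encoding can still pass silently.
    text = raw.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


# Assumed sample: the word "café" stored as ISO 8859-1 bytes.
legacy = "caf\u00e9".encode("iso-8859-1")
converted = to_utf8(legacy, "iso-8859-1")
print(converted.decode("utf-8"))
```

The important design point is that the source encoding must be stated explicitly; guessing it is where most conversion errors come from.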

Character conversions are difficult for linguists, however. Numerous utilities exist for converting from one character set to another. The Unix utility iconv will convert between ISO 8859-X and Unicode, and the C3 system, developed by the Trans-European Research and Education Networking Association (TERENA), will convert between European character sets. But such facilities are rarely of use to linguists, since they are designed to convert one standard character set into another. They will thus convert Cyrillic into Latin, for example, or ISO 8859-X into Unicode. Linguists, however, have in the past represented IPA either by using the non-Unicode encodings defined by fonts such as IPAKiel or the SIL font suite, or with the (X)SAMPA alphabet in ASCII. Many have simply used arbitrary characters they selected themselves. These are hard to convert into Unicode precisely because of their arbitrariness: no standard mapping exists, so one has to be written for each convention.
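For an ASCII-based convention such as X-SAMPA, a conversion amounts to a lookup table applied with longest match first, since some symbols are written with more than one ASCII character. The sketch below is illustrative only: the table `XSAMPA_TO_IPA` covers a handful of symbols, and the function name `convert` is an assumption, not part of any existing utility.

```python
# Partial, illustrative mapping; a real table would cover the whole inventory.
XSAMPA_TO_IPA = {
    "r\\": "\u0279",  # X-SAMPA r\  -> IPA turned r
    "S": "\u0283",    # X-SAMPA S   -> IPA esh
    "N": "\u014b",    # X-SAMPA N   -> IPA eng
    "T": "\u03b8",    # X-SAMPA T   -> IPA theta
    "@": "\u0259",    # X-SAMPA @   -> IPA schwa
    "I": "\u026a",    # X-SAMPA I   -> IPA small capital I
    ":": "\u02d0",    # X-SAMPA :   -> IPA length mark
}


def convert(text: str, table: dict) -> str:
    """Replace every known symbol, trying longer symbols before shorter ones."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(convert("TIN", XSAMPA_TO_IPA))
```

The same function works for an arbitrary personal convention, provided the table is replaced with one that documents that convention.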

The best comprehensive character conversion facility existing so far is TECkit, produced by SIL. However, it is a complex piece of software, and it requires some skill to use. It can be extended to incorporate new mappings, but this is not easy to do. If you are interested in trying this yourself, there is a useful tutorial on the SIL site. For the ordinary user, however, it is probably easier to use a Unicode-aware word processor such as Word 2000 or XP and globally replace characters by hand. The E-MELD project is currently developing a utility which will allow you to do simple mappings from one character set to Unicode, but it is not yet ready for general use.


Format Conversion

If you have been storing your data in one program (FileMaker Pro, for example, or even Word) and wish to move it to another, perhaps more useful, program, there is as yet no straightforward way to do so. Which conversions you need will depend on where you are coming from and where you are going. The E-MELD project is currently developing utilities which will take data files from programs that linguists commonly use, in particular Shoebox, Excel and FileMaker Pro, and convert them to a standard XML format, which can be read into XML-aware programs such as E-MELD's FIELD lexical analysis tool set. These utilities are not yet ready for general use.
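To give a sense of what such a conversion involves, here is a minimal sketch that turns backslash-marked records of the kind Shoebox stores into XML. It is not the E-MELD utility: the element names `lexicon`, `entry` and `field`, the function name `shoebox_to_xml`, and the sample records are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data: each record starts with \lx, followed by other fields.
SAMPLE = """\\lx kitabu
\\ps n
\\ge book

\\lx soma
\\ps v
\\ge read
"""


def shoebox_to_xml(text: str, record_marker: str = "lx") -> ET.Element:
    """Build an XML tree with one <entry> per record and one <field> per line."""
    root = ET.Element("lexicon")
    entry = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything without a field marker
        marker, _, value = line[1:].partition(" ")
        # A record marker (or the very first field) opens a new entry.
        if marker == record_marker or entry is None:
            entry = ET.SubElement(root, "entry")
        field = ET.SubElement(entry, "field", name=marker)
        field.text = value.strip()
    return root


root = shoebox_to_xml(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the original marker as an attribute, rather than inventing an element name for each one, means no information is lost even for markers the script does not know about.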

Audio & Video Conversion

To preserve the integrity of your audio and video data, the best rule of thumb is: don't convert. This is seldom practical, however, since magnetic media deteriorate over time, and the equipment needed to play them often becomes obsolete even sooner. It is therefore recommended that you always preserve the data in its original form and maintain a conversion history for all data, so that any loss of information during conversion is fully documented and can be traced later. When you have to convert, do the minimum number of format conversions you need, since some degree of information loss is almost inevitable in conversion from analog to digital formats.
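One way to keep a conversion history is to append a record to a log file each time a file is converted. The sketch below is only one possible layout: the function names, the JSON field names, and the choice of SHA-256 checksums are assumptions, not a format prescribed here.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(path: str) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_conversion(log_path: str, source: str, target: str,
                      tool: str, notes: str = "") -> dict:
    """Append one conversion step to a line-per-entry JSON log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "source_sha256": sha256_of(source),
        "target": target,
        "target_sha256": sha256_of(target),
        "tool": tool,    # free text: the software or equipment used
        "notes": notes,  # free text: settings, known losses, operator
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Calling `record_conversion` once per conversion step, with the file paths and a short description of the tool and settings, leaves a chain of entries from which the path back to the original can be reconstructed. The checksums make it possible to confirm later that a stored file is still the one the log describes.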

To make a digital copy of analog data:

Digital-to-digital conversion is lossless when done properly. In order to convert from one digital format to another:

Digital conversion from recent audio and video formats is relatively simple. However, professional expertise is needed to convert materials on older media, such as audio wire recordings, wax cylinders, or nitrate or cellulose acetate film. If you have media of this kind, seek help from librarians or archivists at your institution to avoid ruining irreplaceable recordings.


The content of this page was developed following the recommendations of the Resource Conversion working group.
