Edwardians Online
ATTENTION : This web site is no longer updated, but remains functional for archival purposes.
Please visit ESDS Qualidata Online for the current version.

XML Application for Qualitative Data

Project Aims
Specific Objectives for an XML Application
Why do we Require a Uniform Format?
Why an XML Format?
Advantages of Using Standards
Development of an XML Application for Qualitative Data
Making Use of Available XML Applications
Applying the TEI and DDI in an Application for Qualitative Data
The TEI
TEI Guidelines and DTDs for Encoding Texts
Edwardians Phase I: Applying the TEI Guidelines for Transcriptions of Speech
The Data Documentation Initiative (DDI)
Extending the DDI Framework for Qualitative Data


Project Aims
A main aim of this project is to produce a comprehensive application in an XML (eXtensible Markup Language) format appropriate for interchange that will enable sophisticated online searching and information retrieval from encoded texts, and which is potentially applicable to other qualitative datasets. Ideally, the application should meet a number of specific objectives.

Specific Objectives for an XML Application
Firstly, we require an application that will support the encoding of the content of the various types of documents produced in qualitative research. These include, for example, interview transcriptions, research diaries, survey questionnaires, case notes and transcriptions of focus groups, as well as contextual documentation such as newspaper clippings, letters of correspondence and researchers' notes.

The application format should also be capable of supporting the encoding of the researcher's original analysis of the datasets and any annotations they may have added to the primary materials.

We also require an application that is capable of providing formalised links between the texts and associated audio and video materials, with a view to providing, in the long term, integrated, multimedia resources.

Finally, the application should be able to represent metadata at the individual file, or interview level, and for the entire collection.

Why do we Require a Uniform Format?
A uniform format for encoding the content of datasets is useful for both data providers and users for the following reasons:

Why an XML Format?
The eXtensible Markup Language (XML), an internationally defined standard for data exchange, is a potentially useful technology for making the features of qualitative data explicit in machine-readable form.

XML is, in proper terms, a 'meta-language', a language that formally comprises the rules for defining a markup language. The standard allows for the specification of markup languages that make structural elements explicit in a document using a system of ordinary textual tags that are embedded in the text in an ordered hierarchy or 'tree' system. In other words, XML permits descriptive markup systems in which nested pairs of markup codes are used simply to provide names to categorise or classify parts of a document. The power of XML lies in the fact that it is extensible. That is, it allows for the creation of descriptive markup systems based upon a common vocabulary, but this vocabulary can be extended to accommodate the special requirements of a particular user or domain.
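The idea of nested tags forming an ordered hierarchy can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree` module. The element names (`interview`, `question`, `answer`, `place`) and the sample text are invented for illustration only; they are not taken from the project's DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names and this text are invented
# for illustration and do not come from any published tag set.
SAMPLE = """
<interview id="int001">
  <question>Where were you born?</question>
  <answer>In <place>Colchester</place>.</answer>
</interview>
"""

def outline(element, depth=0):
    """Return one indented line per element, showing how the tags nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The printed outline shows `place` nested inside `answer`, which is in turn nested inside `interview`: the tags only name parts of the text, and the tree structure follows from how they are nested.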

Various programs designed for different processing purposes can be applied to a descriptively marked up document. Furthermore, processing can be restricted to certain designated sections of the marked up document, according to the individual requirements of the user, and so the same document can be re-used for different purposes in different ways.
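As a minimal sketch of this kind of selective processing, the Python fragment below runs two different extractions over the same marked-up text: one collects only the questions, the other only the place names. The markup and its content are hypothetical and serve only to illustrate the principle.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, invented for illustration.
SAMPLE = """
<interview>
  <exchange>
    <question>Where did your father work?</question>
    <answer>At the mill in <place>Bolton</place>.</answer>
  </exchange>
  <exchange>
    <question>Did you move later?</question>
    <answer>Yes, to <place>Salford</place>.</answer>
  </exchange>
</interview>
"""

root = ET.fromstring(SAMPLE)

# Process only the interviewer's questions, e.g. to rebuild the interview schedule.
questions = [q.text for q in root.iter("question")]

# Process only the tagged place names, e.g. to build a geographical index.
places = [p.text for p in root.iter("place")]

print(questions)
print(places)
```

Neither extraction alters the source document, so the same file can serve both purposes.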

Advantages of Using Standards
Using a standard such as XML has a number of advantages for data developers, data providers and users in general. Firstly, there is a wide range of non-proprietary tools and related languages for manipulating and processing text from XML documents, and new tools continue to appear almost daily. Among the more established tools for processing XML are stylesheets; the standard being developed for XML is the eXtensible Stylesheet Language (XSL). Stylesheets are popular among publishers because they support the notion of 'writing once and publishing everywhere': it is relatively straightforward to provide different audiences with different 'views' of an XML-encoded text. Secondly, other standards, frameworks, protocols and applications make use of generic standards such as XML. Therefore, by adopting XML we are joining a wider community of users and increasing opportunities for compatibility and interchange.

Development of an XML Application for Qualitative Data
XML and related tools for creating and processing documents in XML have rapidly been adopted by communities of users for whom semantic tagging for their own application areas is essential. Examples where XML tag sets are specially adapted to allow markup of the types of information specific to the user community include the Data Documentation Initiative (DDI) for the social sciences and the Text Encoding Initiative (TEI).

With increasing recognition of the benefits of XML in creating non-proprietary, cross-platform applications, there has been interest in, and calls for, the development of a qualitative data XML markup language from members of the social science research community who are eager to encourage the re-use of social science data. Indeed, there has been some progress in applications for exporting coded data produced by CAQDAS (Computer Assisted Qualitative Data Analysis Software) packages in an XML format (specifically ATLAS.ti). However, further work is required on the definition of a common XML framework and associated DTDs (Document Type Definitions).

The development of an XML application for marking up the content of qualitative datasets ideally will require support and contributions from various members of the social science community: data creators; CAQDAS software developers; data providers and end users.

Areas in which a community effort is of particular importance include:

Edwardians Online aims to provide the foundations for such an initiative.

Making Use of Available XML Applications
In our preliminary research we considered the option of 'going back to the drawing board' to create a new application of XML specifically for the purpose of marking up the content of spoken interviews and other types of qualitative material.

There are, however, two existing XML applications, the Text Encoding Initiative (TEI) and the Data Documentation Initiative (DDI), which together are particularly relevant for our encoding objectives.

The TEI and DDI are in use in a wide range of projects in the UK, the US and Europe, and an application for qualitative data would benefit from the expertise and experience of these user communities. Furthermore, both applications have detailed documentation, and making use of these standards would create opportunities for using existing and forthcoming application tools.

A cost-effective and generally advantageous option is to develop an application tailored for qualitative data, but one which is compliant with these models. In Edwardians Online, we are currently working on the adaptation and integration of TEI and DDI DTDs for a prototype DTD for qualitative data.

Applying the TEI and DDI in an Application for Qualitative Data

The TEI
The Text Encoding Initiative (TEI), now an international consortium, provides an SGML (Standard Generalized Markup Language) application and, more recently, a sophisticated and comprehensive XML application, together with guidelines for the markup of different types of texts, including prose, drama, dictionaries, verse and transcriptions of speech.

The guidelines, DTDs, a list of international scholarly projects using TEI conformant markup and links to TEI software are available online at the TEI homepage.

TEI Guidelines and DTDs for Encoding Texts
The TEI guidelines are based upon the concept of an all-inclusive DTD for encoding all types of text. Users may select elements from the full TEI DTD in a customized application, thus compiling their own TEI compliant DTD.

Different tag sets are provided for different document types. Thus, in any particular application, a user may select one of the main subsets of the full DTD, a 'base' tag set, according to the particular type of text they are interested in encoding, for example a drama, a dictionary, verse or a transcribed record of speech.

Of course, a document may contain different types of text, for example, a book may contain verse and drama. To encode this type of work, users would select a 'mixed base', in other words, a combination of the basic tagsets.

TEI applications will also make use of a 'core' tag set, which includes a number of mandatory elements for encoding a TEI document, for example, the TEI header for encoding metadata and elements for describing features common to documents in general.

In addition, a number of specialised elements or 'additional' tag sets can be selected according to the specific content requirements of an application. These include tags for identifying features for specific analytical purposes or tags for marking up specific content features such as person names, place names, dates, monetary amounts, tables or figures. The TEI is therefore useful for encoding datasets with a mixed format, for example, collections comprising text, tabular and graphical data.
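To illustrate how content-level tags of this kind can be exploited, the sketch below collects tagged person names, place names and dates from a short fragment. `persName`, `placeName` and `date` are element names defined in the TEI Guidelines, but the fragment itself is invented, it omits any namespace declaration and the surrounding TEI document structure, and the helper `collect_features` is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

# Invented fragment; a real TEI document would sit inside a full TEI
# structure (header, text, body) and might declare a namespace.
FRAGMENT = """
<p>My uncle <persName>Tom</persName> moved to <placeName>Leeds</placeName>
in <date>1908</date> and later to <placeName>Hull</placeName>.</p>
"""

def collect_features(xml_text, tags=("persName", "placeName", "date")):
    """Return a dict mapping each tag name to the text of its occurrences."""
    root = ET.fromstring(xml_text)
    found = {}
    for tag in tags:
        values = [el.text for el in root.iter(tag)]
        if values:
            found[tag] = values
    return found

features = collect_features(FRAGMENT)
print(features)
```

An index of people, places or dates across a whole collection could be built by running the same function over every encoded file.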

The generality of the TEI is thus particularly useful for our purposes since it can accommodate the encoding of the various types of qualitative data as described in our specific objectives.

Moreover, the flexibility of the TEI is such that a TEI-conformant DTD can be easily extended or modified to include other specialised element sets from the TEI, for different research requirements. For example, analytical elements for linguistic analysis could be added to the DTD for the markup of datasets in which a more detailed grammatical and transcription analysis was required. This might be important to a secondary study interested in, say, conversation analysis. This flexibility is therefore especially relevant for our objective of providing qualitative data in a format suitable for re-use.

Edwardians Phase I: Applying the TEI Guidelines for Transcriptions of Speech
Interview transcripts are perhaps the most common document type within the class of qualitative data. In the Edwardians Online project, which aims to develop a prototype DTD for qualitative data by working with the example of the Edwardians collection of interview texts, we are particularly interested in the guidelines and DTD components for transcribed spoken material. Initial research has shown that these can be used to represent the main structural elements in the FLWE (Family Life Work Experience) text, and the content of qualitative interview texts in general, in a transparent and straightforward manner.

Accordingly, we are undertaking markup of the main interview texts and are in the process of creating and documenting a TEI conformant DTD for the project. In Phase II of the project we hope to generalize from this example and to specify a framework for a full DTD for qualitative data.

In Phase II of the project we will also test the encoding and digitization of contextual documents in the collection of interview material such as letters of correspondence and autobiographical essays.

By selecting the full set of TEI elements for encoding features in transcribed speech in this DTD we are allowing for any future encoding of a more sophisticated transcription of the audiotapes.
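As a rough illustration of utterance-level markup for an interview transcript, the sketch below uses the TEI elements `u` (utterance, with a `who` attribute identifying the speaker) and `pause`, and groups the transcribed speech by speaker. The transcript text, the speaker identifiers and the helper `utterances_by_speaker` are assumptions made for this example; the project's actual DTD and encoding conventions may differ.

```python
import xml.etree.ElementTree as ET

# Invented transcript fragment. <u> and <pause/> are TEI elements for
# transcribed speech; the speaker identifiers are hypothetical.
TRANSCRIPT = """
<text>
  <body>
    <u who="interviewer">How many brothers and sisters did you have?</u>
    <u who="respondent">Six. <pause/> No, seven with the baby.</u>
    <u who="interviewer">And where did you all sleep?</u>
  </body>
</text>
"""

def utterances_by_speaker(xml_text):
    """Group the plain text of each utterance under its speaker."""
    root = ET.fromstring(xml_text)
    result = {}
    for u in root.iter("u"):
        # itertext() also yields the text that follows empty elements
        # such as <pause/>; split/join normalises the whitespace.
        words = " ".join("".join(u.itertext()).split())
        result.setdefault(u.get("who"), []).append(words)
    return result

speech = utterances_by_speaker(TRANSCRIPT)
for speaker, turns in speech.items():
    print(speaker, len(turns))
```

Because the pauses are marked as elements rather than typed into the text, a reader-friendly view can ignore them while a more detailed analysis can still locate them.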

The Data Documentation Initiative (DDI)
The Data Documentation Initiative (DDI) is a framework that aims to "establish an international criterion and methodology for the content, presentation, transport, and preservation of 'metadata' about datasets in the social and behavioral sciences". A useful introduction to the DDI and a list of projects implementing this framework is available at the DDI home page.

With the DDI's XML-based DTD for the markup of social science metadata or 'codebooks', metadata can now be created in a uniform, highly structured format that is easily and precisely searchable via web-based interfaces.
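A minimal sketch of how such structured metadata supports precise searching is given below. The element names (`codeBook`, `stdyDscr`, `citation`, `titlStmt`, `titl`, `stdyInfo`, `subject`, `keyword`) follow the DDI codebook structure, but the two records are invented and heavily abbreviated, and `search_keyword` is a hypothetical helper rather than part of any DDI tool.

```python
import xml.etree.ElementTree as ET

# Two invented, heavily abbreviated codebook records.
CODEBOOKS = [
    """<codeBook><stdyDscr>
         <citation><titlStmt><titl>Example Study A</titl></titlStmt></citation>
         <stdyInfo><subject>
           <keyword>family life</keyword><keyword>employment</keyword>
         </subject></stdyInfo>
       </stdyDscr></codeBook>""",
    """<codeBook><stdyDscr>
         <citation><titlStmt><titl>Example Study B</titl></titlStmt></citation>
         <stdyInfo><subject><keyword>housing</keyword></subject></stdyInfo>
       </stdyDscr></codeBook>""",
]

def search_keyword(codebooks, term):
    """Return the titles of studies whose keywords include the term."""
    hits = []
    for xml_text in codebooks:
        root = ET.fromstring(xml_text)
        keywords = [k.text.strip().lower() for k in root.iter("keyword") if k.text]
        if term.lower() in keywords:
            hits.append(root.findtext("stdyDscr/citation/titlStmt/titl"))
    return hits

print(search_keyword(CODEBOOKS, "Employment"))
```

The search matches only text tagged as a keyword, so a study that merely mentions the term elsewhere in its description is not returned.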

The DDI has received support across the social science community and is already in place in major European and US social science archives. For example, the Council of European Social Science Data Archives (CESSDA) is moving its Integrated Data Catalog (IDC) to DDI format. The DDI is also supported by international research projects such as NESSTAR (Networked Social Science Tools and Resources), and any DDI application may benefit from their sophisticated suite of software.

Extending the DDI Framework for Qualitative Data
To date, the DDI does not address the problem of describing qualitative material, although it would be advantageous for the social science community if guidelines and an appropriate format were established within this general framework. Consequently, in Edwardians Online we are also investigating how elements from the DDI may be adapted for a qualitative model, using the example of the FLWE collections.

Since we are also using the TEI, this work involves research into the integration and mapping of elements from the two DTDs. In this way we aim to ensure that our prototype application for qualitative data is compliant with both DTDs, making use of particular elements in each whilst eliminating redundancy or 'overlap'.
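One way to picture such a mapping is as a 'crosswalk' table that pairs an element path in one DTD with its counterpart in the other. The sketch below copies values from a TEI header into a DDI-style structure using such a table. The two pairings shown are illustrative assumptions only and are not the project's actual mapping; the header content and the helper names `ensure_path` and `tei_header_to_ddi` are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative crosswalk: TEI header path -> DDI codebook path.
# These pairings are assumptions for the example, not the project's mapping.
CROSSWALK = {
    "fileDesc/titleStmt/title": "stdyDscr/citation/titlStmt/titl",
    "fileDesc/titleStmt/author": "stdyDscr/citation/rspStmt/AuthEnty",
}

# Invented TEI header fragment.
TEI_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Interview 001</title>
      <author>Example Researcher</author>
    </titleStmt>
  </fileDesc>
</teiHeader>
"""

def ensure_path(root, path):
    """Walk the path below root, creating missing elements; return the last one."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node

def tei_header_to_ddi(tei_xml):
    """Build a DDI-style tree holding the values mapped from a TEI header."""
    header = ET.fromstring(tei_xml)
    codebook = ET.Element("codeBook")
    for tei_path, ddi_path in CROSSWALK.items():
        value = header.findtext(tei_path)
        if value is not None:
            ensure_path(codebook, ddi_path).text = value.strip()
    return codebook

ddi = tei_header_to_ddi(TEI_HEADER)
print(ET.tostring(ddi, encoding="unicode"))
```

Each value is recorded once in the source header and carried across by the table, which is one way of avoiding the same metadata being entered separately under each scheme.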

ESDS Qualidata, University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ.
Tel : +44 1206 873058 Fax : +44 1206 872003 Email : qualidata@esds.ac.uk
Updated : 20 January 2003 © 1996 - 2003 University of Essex. All rights reserved.