Project Aims
Specific Objectives for an XML Application
Why do we Require a Uniform Format?
Why an XML Format?
Advantages of Using Standards
Development of an XML Application for Qualitative Data
Making Use of Available XML Applications
Applying the TEI and DDI in an Application for Qualitative Data
The TEI
TEI Guidelines and DTDs for Encoding Texts
Edwardians Phase I: Applying the TEI Guidelines for Transcriptions of Speech
The Data Documentation Initiative (DDI)
Extending the DDI Framework for Qualitative Data
The application format should also be capable of supporting the encoding of the researcher's original analysis of the datasets and any annotations they may have added to the primary materials.
We also require an application that is capable of providing formalised links between the texts and associated audio and video materials, with a view to providing, in the long term, integrated, multimedia resources.
Finally, the application should be able to represent metadata at the individual file, or interview level, and for the entire collection.
XML is, in proper terms, a 'meta-language', a language that formally comprises the rules for defining a markup language. The standard allows for the specification of markup languages that make structural elements explicit in a document using a system of ordinary textual tags that are embedded in the text in an ordered hierarchy or 'tree' system. In other words, XML permits descriptive markup systems in which nested pairs of markup codes are used simply to provide names to categorise or classify parts of a document. The power of XML lies in the fact that it is extensible. That is, it allows for the creation of descriptive markup systems based upon a common vocabulary, but this vocabulary can be extended to accommodate the special requirements of a particular user or domain.
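As a minimal sketch of the 'tree' idea described above, the following Python fragment parses a small descriptively marked up passage and prints its element hierarchy. The element names (interview, section, question, answer, place) and the sample text are invented for illustration only and are not taken from any project DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and invented sample text, for illustration only.
SAMPLE = """<interview>
  <section topic="childhood">
    <question>Where were you born?</question>
    <answer>In <place>Bolton</place>, in 1895.</answer>
  </section>
</interview>"""


def outline(element, depth=0):
    """Return one indented line per element, showing the nesting hierarchy."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each pair of tags simply names the part of the document it encloses; the indentation in the printed outline mirrors the ordered hierarchy in which the tags are nested.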
Various programs designed for different processing purposes can be applied to a descriptively marked up document. Furthermore, processing can be restricted to certain designated sections of the marked up document, according to the individual requirements of the user, and so the same document can be re-used for different purposes in different ways.
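The point about re-use can be sketched as follows: two small, independent routines are applied to the same marked up document, each restricted to the parts it needs. This is an illustration under stated assumptions rather than project code; the element and attribute names (turn, speaker, place) and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup and invented sample text, for illustration only.
SAMPLE = """<interview>
  <turn speaker="interviewer">Where did your father work?</turn>
  <turn speaker="respondent">At the mill in <place>Bolton</place>.</turn>
  <turn speaker="interviewer">And your mother?</turn>
  <turn speaker="respondent">She stayed at home in <place>Leigh</place>.</turn>
</interview>"""


def respondent_text(root):
    """Process only the respondent's turns, ignoring the interviewer's."""
    turns = []
    for turn in root.iter("turn"):
        if turn.get("speaker") == "respondent":
            # itertext() gathers text from nested elements such as <place>.
            turns.append(" ".join("".join(turn.itertext()).split()))
    return turns


def place_index(root):
    """Process only the <place> elements, wherever they occur."""
    return sorted({place.text for place in root.iter("place")})


root = ET.fromstring(SAMPLE)
print(respondent_text(root))
print(place_index(root))
```

Neither routine alters the document, so the same file can serve a reader interested in the respondent's words and an indexer interested only in place names.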
With increasing recognition of the benefits of XML in creating non-proprietary, cross-platform applications, members of the social science research community who are eager to encourage the re-use of social science data have shown interest in, and called for, the development of an XML markup language for qualitative data. Indeed, there has been some progress in applications for exporting coded data produced by CAQDAS (Computer Assisted Qualitative Data Analysis Software) packages in an XML format (specifically ATLAS.ti). However, further work is required to define a common XML framework and associated DTDs (Document Type Definitions).
The development of an XML application for marking up the content of qualitative datasets will ideally require support and contributions from various members of the social science community: data creators, CAQDAS software developers, data providers and end users.
Areas in which a community effort is of particular importance include:
Edwardians Online aims to provide the foundations for such an initiative.
There are, however, two existing XML applications, the Text Encoding Initiative (TEI) and the Data Documentation Initiative (DDI), which together are particularly relevant to our encoding objectives.
The TEI and DDI are in use in a wide range of projects in the UK, the US and Europe, and an application for qualitative data would benefit from the expertise and experience of these user communities. Furthermore, both applications have detailed documentation, and making use of these standards would create opportunities for using existing and forthcoming software tools.
A cost-effective and generally advantageous option is to develop an application tailored for qualitative data, but one which is compliant with these models. In Edwardians Online, we are currently adapting and integrating the TEI and DDI DTDs to produce a prototype DTD for qualitative data.
The guidelines, DTDs, a list of international scholarly projects using TEI-conformant markup and links to TEI software are available online at the TEI homepage.
Different tagsets are provided for different document types. Thus, in any particular application, a user may select one of the main subsets of the full DTD, a 'base' tag set, according to the particular type of text they are interested in encoding, for example a drama, a dictionary, verse or a transcribed record of speech.
Of course, a document may contain different types of text, for example, a book may contain verse and drama. To encode this type of work, users would select a 'mixed base', in other words, a combination of the basic tagsets.
TEI applications will also make use of a 'core' tag set, which includes a number of mandatory elements for encoding a TEI document, for example, the TEI header for encoding metadata and elements for describing features common to documents in general.
In addition, a number of specialised elements or 'additional' tag sets can be selected according to the specific content requirements of an application. These include tags for identifying features for specific analytical purposes or tags for marking up specific content features such as person names, place names, dates, monetary amounts, tables or figures. The TEI is therefore useful for encoding datasets with a mixed format, for example, collections comprising text, tabular and graphical data.
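The selection of base, mixed base and additional tag sets described above can be sketched as a small helper that writes the document type declaration for a TEI document. This is a hedged illustration: it assumes the SGML/XML DTD versions of the Guidelines (P3/P4), in which tag sets are switched on through parameter entities in the DTD subset. The root element name TEI.2, the file name tei2.dtd and the entity names used here follow that convention as we understand it and should be checked against the Guidelines before use. The function tei_doctype itself is hypothetical.

```python
# Base tag sets assumed to be available (names follow the TEI P3/P4 convention).
BASE_TAG_SETS = {"prose", "verse", "drama", "spoken", "dictionaries", "terminology"}


def tei_doctype(bases, additional=()):
    """Build a DOCTYPE declaration that switches on the chosen tag sets.

    The core tag set and the TEI header are always included by the DTD,
    so they need no declaration here.
    """
    bases = list(bases)
    if not bases:
        raise ValueError("at least one base tag set must be selected")
    unknown = set(bases) - BASE_TAG_SETS
    if unknown:
        raise ValueError(f"unknown base tag set(s): {sorted(unknown)}")

    names = []
    if len(bases) > 1:
        # More than one base: a 'mixed base' must be declared as well.
        names.append("TEI.mixed")
    names += [f"TEI.{base}" for base in bases]
    names += [f"TEI.{extra}" for extra in additional]

    lines = ['<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [']
    lines += [f"  <!ENTITY % {name} 'INCLUDE'>" for name in names]
    lines.append("]>")
    return "\n".join(lines)


# A transcribed interview with tagged names and dates.
print(tei_doctype(["spoken"], additional=["names.dates"]))
# A book containing both verse and drama.
print(tei_doctype(["verse", "drama"]))
```

The first call selects a single base for transcribed speech plus one additional tag set; the second shows the mixed base case, where the combination of bases is declared explicitly.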
The generality of the TEI is thus particularly useful for our purposes since it can accommodate the encoding of the various types of qualitative data as described in our specific objectives.
Moreover, the flexibility of the TEI is such that a TEI-conformant DTD can be easily extended or modified to include other specialised element sets from the TEI, for different research requirements. For example, analytical elements for linguistic analysis could be added to the DTD for the markup of datasets in which a more detailed grammatical and transcription analysis was required. This might be important to a secondary study interested in, say, conversation analysis. The TEI is therefore especially relevant for our objective of providing qualitative data in a format suitable for re-use.
Accordingly, we are undertaking markup of the main interview texts and are in the process of creating and documenting a TEI-conformant DTD for the project. In Phase II of the project we hope to generalise from this example and to specify a framework for a full DTD for qualitative data.
In Phase II of the project we will also test the encoding and digitization of contextual documents in the collection of interview material such as letters of correspondence and autobiographical essays.
By selecting the full set of TEI elements for encoding features in transcribed speech in this DTD we are allowing for any future encoding of a more sophisticated transcription of the audiotapes.
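To illustrate why allowing the full set of speech elements costs little now, the sketch below reads a short transcription encoded with the TEI utterance element u (with its who attribute) and a pause element. The sample text and speaker codes are invented, and the fragment is a simplified stand-in for a complete TEI document. A program that only needs the words of each turn can ignore the finer transcription detail, while a later, more sophisticated study can count or inspect it.

```python
import xml.etree.ElementTree as ET

# Simplified, invented fragment; speaker codes INT and RES are assumptions.
SAMPLE = """<text>
  <body>
    <u who="INT">What was the house like?</u>
    <u who="RES">Two up, two down <pause/> and very cold.</u>
  </body>
</text>"""


def utterances(root):
    """Return (speaker, words) pairs, skipping over nested detail such as pauses."""
    result = []
    for u in root.iter("u"):
        words = " ".join("".join(u.itertext()).split())
        result.append((u.get("who"), words))
    return result


def count_elements(root, tag):
    """Count occurrences of a transcription feature, e.g. 'pause'."""
    return sum(1 for _ in root.iter(tag))


root = ET.fromstring(SAMPLE)
print(utterances(root))
print(count_elements(root, "pause"))
```

The same file therefore supports both a plain reading text and a feature-level analysis, without re-encoding.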
With the DDI's XML-based DTD for the markup of social science metadata or 'codebooks', metadata can now be created in a uniform, highly structured format that is easily and precisely searchable via web-based interfaces.
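A rough sketch of what 'precisely searchable' means in practice: because a DDI codebook places the title and keywords in known elements, a search can target exactly those fields. The fragment below uses DDI codebook element names (codeBook, stdyDscr, citation, titlStmt, titl, stdyInfo, subject, keyword) as we understand them; the two records, their titles and keywords, and the function find_by_keyword are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two invented codebook records, heavily abbreviated.
RECORDS = [
    """<codeBook><stdyDscr>
         <citation><titlStmt><titl>Sample interview collection A</titl></titlStmt></citation>
         <stdyInfo><subject>
           <keyword>family life</keyword><keyword>oral history</keyword>
         </subject></stdyInfo>
       </stdyDscr></codeBook>""",
    """<codeBook><stdyDscr>
         <citation><titlStmt><titl>Sample survey B</titl></titlStmt></citation>
         <stdyInfo><subject><keyword>employment</keyword></subject></stdyInfo>
       </stdyDscr></codeBook>""",
]


def find_by_keyword(records, keyword):
    """Return the titles of studies whose keyword list contains `keyword`."""
    wanted = keyword.strip().lower()
    titles = []
    for xml_text in records:
        root = ET.fromstring(xml_text)
        keywords = {
            (k.text or "").strip().lower()
            for k in root.findall("stdyDscr/stdyInfo/subject/keyword")
        }
        if wanted in keywords:
            titles.append(root.findtext("stdyDscr/citation/titlStmt/titl"))
    return titles


print(find_by_keyword(RECORDS, "oral history"))
```

The search matches only the keyword field, so a study that merely mentions the phrase elsewhere in its documentation would not be returned.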
The DDI has received support across the social science community and is already in place in major European and US social science archives. For example, the Council of European Social Science Data Archives (CESSDA) is moving its Integrated Data Catalog (IDC) to DDI format. The DDI is also supported by international research projects such as NESSTAR (Networked Social Science Tools and Resources), and any DDI application may benefit from their sophisticated suite of software.
Alongside our use of the TEI, this work also involves research into integrating and mapping elements from the two DTDs. In this way we aim to ensure that our prototype application for qualitative data is compliant with the different DTDs, making use of particular elements in both whilst eliminating redundancy or 'overlap'.
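One way to picture the mapping of elements between the two DTDs is as a 'crosswalk' table that pairs an element in the TEI header with its nearest counterpart in a DDI codebook, so that a value is recorded once and derived for the other format. The sketch below is purely illustrative: the two pairings shown are assumptions made for the example, not the mapping adopted by the project, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: TEI header path -> DDI codebook path.
CROSSWALK = {
    "fileDesc/titleStmt/title": "stdyDscr/citation/titlStmt/titl",
    "fileDesc/titleStmt/author": "stdyDscr/citation/rspStmt/AuthEnty",
}

# Invented sample header, heavily abbreviated.
TEI_HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample interview</title>
      <author>A. Researcher</author>
    </titleStmt>
  </fileDesc>
</teiHeader>"""


def ensure_path(parent, path):
    """Return the element at `path` under `parent`, creating missing steps."""
    node = parent
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def tei_header_to_ddi(header):
    """Copy each mapped TEI header value into a new DDI-style codeBook element."""
    code_book = ET.Element("codeBook")
    for tei_path, ddi_path in CROSSWALK.items():
        source = header.find(tei_path)
        if source is not None and source.text:
            ensure_path(code_book, ddi_path).text = source.text.strip()
    return code_book


ddi = tei_header_to_ddi(ET.fromstring(TEI_HEADER))
print(ET.tostring(ddi, encoding="unicode"))
```

Because both mapped values share the citation element, ensure_path reuses it rather than creating a second one, which is the kind of overlap a real mapping would also need to avoid.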