The Handbook of Discourse Analysis
Douglas Biber and Susan Conrad

variation. In section 2.1 we introduce the multidimensional approach to register variation with specific reference to the MD analysis of English. In 2.2 we then briefly summarize the major patterns of register variation across English, Korean, and Somali.

2.1 Overview of the multidimensional (MD) approach to register variation

The MD approach to register variation was developed to provide comprehensive descriptions of the patterns of register variation in a language. An MD analysis includes two major components: (1) identification of the underlying linguistic parameters, or dimensions, of variation; and (2) specification of the linguistic similarities and differences among registers with respect to those dimensions.

Methodologically, the MD approach has three major distinguishing characteristics: (1) the use of computer-based text corpora to provide a broad representation of the registers in a language; (2) the use of computational tools to identify linguistic features in texts; and (3) the use of multivariate statistical techniques to analyze the co-occurrence relations among linguistic features, thereby identifying underlying dimensions of variation in a language. MD studies have consistently shown that there are systematic patterns of variation among registers; that these patterns can be analyzed in terms of the underlying dimensions of variation; and that it is necessary to recognize the existence of a multidimensional space (rather than a single parameter) to adequately describe the relations among registers.

The first step in an MD analysis is to obtain a corpus of texts representing a wide range of spoken and written registers. If there are no pre-existing corpora, as in the case of the Korean and Somali analyses, then texts must be collected and entered into a computer.
The texts in these corpora are then automatically analyzed (or “tagged”) for linguistic features representing several major grammatical and functional characteristics, such as: tense and aspect markers, place and time adverbials, pronouns and nominal forms, prepositional phrases, adjectives, adverbs, lexical classes (e.g. hedges, emphatics, speech act verbs), modals, passives, dependent clauses, coordination, and questions. All texts are post-edited interactively to correct mis-tags.

Next, the frequency of each linguistic feature in each text is counted. (All counts are normalized to their occurrence per 1000 words of text.) A statistical factor analysis is then computed to identify the co-occurrence patterns among linguistic features, that is, the dimensions. These dimensions are subsequently interpreted in terms of the communicative functions shared by the co-occurring features. Interpretive labels are posited for each dimension, such as “Involved versus Informational Production” and “Narrative versus Non-narrative Concerns.” In addition, dimension scores for each text are computed by summing the major linguistic features grouped on each dimension; this score provides a cumulative characterization of a text with respect to the co-occurrence pattern underlying a dimension. Then, the mean dimension scores for each register are compared to analyze the salient linguistic similarities and differences among spoken and written registers.

To illustrate, consider English Dimension 1 in figure 9.1. This dimension is defined by two groups of co-occurring linguistic features, listed to the right of the figure. The top group (above the dashed line) consists of a large number of features, including
Register Variation: A Corpus Approach

Figure 9.1 Mean scores of English Dimension 1 for twenty-three registers: “Involved versus Informational Production” (F = 111.9, p < .0001, r² = 84.3%)

[Figure: registers plotted on a vertical scale running from 35 at the Involved pole to –20 at the Informational pole. Registers in the order shown, from Involved to Informational: telephone conversations; face-to-face conversations; Personal letters; spontaneous speeches; public conversations; Romance fiction; prepared speeches; Mystery and adventure fiction; General fiction; Professional letters; broadcasts; Science fiction; Religion; Humor; Popular lore; Editorials; Hobbies; Biographies; Press reviews; Academic prose; Press reportage; Official documents; Ecology research articles.]

Linguistic features (listed to the right of the plot; positive features above the dashed line, negative features below it)

Positive features: Private verbs; that deletion; Contractions; Present tense verbs; 2nd person pronouns; do as pro-verb; Analytic negation; Demonstrative pronouns; General emphatics; 1st person pronouns; Pronoun it; be as main verb; Causative subordination; Discourse particles; Indefinite pronouns; General hedges; Amplifiers; Sentence relatives; wh-questions; Possibility modals; Nonphrasal coordination; wh-clauses; Final prepositions

Negative features: Nouns; Word length; Prepositions; Type–token ratio; Attributive adjectives
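The counting and scoring steps described in section 2.1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the feature labels, and the counts are hypothetical, and treating the negative group of features as subtracted from the positive group is an assumption that the text above does not spell out. Any standardization of feature rates before summing is left out for simplicity.

```python
from typing import Dict, Iterable


def normalize_per_1000(raw_counts: Dict[str, int], n_words: int) -> Dict[str, float]:
    """Convert raw feature counts for one text to rates per 1000 words."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return {feature: count * 1000.0 / n_words for feature, count in raw_counts.items()}


def dimension_score(
    rates: Dict[str, float],
    positive: Iterable[str],
    negative: Iterable[str],
) -> float:
    """Sum the rates of the positive features and subtract the rates of the
    negative features (assumption: negative features count against the score).
    Features missing from the text contribute zero."""
    pos_total = sum(rates.get(feature, 0.0) for feature in positive)
    neg_total = sum(rates.get(feature, 0.0) for feature in negative)
    return pos_total - neg_total


# Hypothetical raw counts for a single 500-word text (invented for illustration).
counts = {"contractions": 20, "private_verbs": 10, "nouns": 90, "prepositions": 45}
rates = normalize_per_1000(counts, n_words=500)

# A small subset of the Dimension 1 feature groups, chosen for illustration only.
score = dimension_score(
    rates,
    positive=["contractions", "private_verbs"],
    negative=["nouns", "prepositions"],
)

print(rates)
print(score)
```

With these invented counts the text scores well below zero, which would place it toward the Informational pole. The mean of such scores over all texts in a register would then give one plotted value of the kind shown in figure 9.1.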