The Handbook of Discourse Analysis


29.10.2014 Views

186 Douglas Biber and Susan Conrad

first and second person pronouns, questions, “private” verbs (such as think or know), and contractions. The bottom group has fewer features, including nouns, attributive adjectives, and prepositional phrases. The statistical analysis shows that these two groups have a complementary relationship and thus constitute a single dimension: when a text has frequent occurrences of the top group of features, it will tend to have few occurrences of the bottom group, and vice versa.

When dimension scores are computed for English Dimension 1, conversation texts are identified as the register that makes the most frequent use of the top group of features. Figure 9.1 plots the Dimension 1 score for several English registers, providing a graphic representation of the relations among registers with respect to this group of linguistic features. Conversation texts, with the largest positive Dimension 1 score, tend to have frequent occurrences of first and second person pronouns, questions, stance verbs, hedges, and the other features above the dashed line; at the same time, relative to the other registers, conversation texts have notably few occurrences of nouns, adjectives, prepositional phrases, and long words. At the other extreme, registers such as official documents and academic prose have the largest negative score, showing that they are marked for the opposite linguistic characteristics: very frequent occurrences of nouns, adjectives, prepositional phrases, and long words, combined with notably few occurrences of first and second person pronouns, questions, stance verbs, etc.

Considering both the defining linguistic features together with the distribution of registers, each dimension can be interpreted in functional terms. Thus, the top group of linguistic features on English Dimension 1, associated most notably with conversation, is interpreted as reflecting interactiveness, high involvement, and on-line production.
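The way a dimension score combines the two complementary feature groups can be illustrated with a minimal sketch. It assumes, as is common in descriptions of MD analysis but not spelled out in this passage, that each feature's rate per 1,000 words is standardized across the corpus and that a text's score is the sum of its standardized rates for the top group minus the sum for the bottom group. The feature names, the function names, and all of the numbers are hypothetical and chosen only for illustration; they are not taken from the corpora discussed here.

```python
from statistics import mean, pstdev

# Hypothetical feature groups, loosely modeled on the two groups described above.
POSITIVE = ["first_person_pronouns", "second_person_pronouns",
            "private_verbs", "contractions"]
NEGATIVE = ["nouns", "attributive_adjectives", "prepositional_phrases"]

# Invented rates per 1,000 words for three toy "registers" (illustrative only).
TOY_CORPUS = {
    "conversation": {"first_person_pronouns": 60, "second_person_pronouns": 40,
                     "private_verbs": 30, "contractions": 45,
                     "nouns": 140, "attributive_adjectives": 20,
                     "prepositional_phrases": 60},
    "fiction": {"first_person_pronouns": 30, "second_person_pronouns": 15,
                "private_verbs": 18, "contractions": 20,
                "nouns": 190, "attributive_adjectives": 45,
                "prepositional_phrases": 95},
    "academic": {"first_person_pronouns": 5, "second_person_pronouns": 1,
                 "private_verbs": 6, "contractions": 1,
                 "nouns": 300, "attributive_adjectives": 80,
                 "prepositional_phrases": 140},
}


def standardize(rates):
    """Convert a list of rates to z-scores (mean 0, standard deviation 1)."""
    mu = mean(rates)
    sigma = pstdev(rates)
    if sigma == 0:
        # A feature with no variation carries no information about the texts.
        return [0.0 for _ in rates]
    return [(r - mu) / sigma for r in rates]


def dimension_scores(texts):
    """Return {text_id: score}; texts maps text_id -> {feature: rate}."""
    ids = list(texts)
    z = {}
    for feature in POSITIVE + NEGATIVE:
        column = standardize([texts[t][feature] for t in ids])
        z[feature] = dict(zip(ids, column))
    # Assumed scoring rule: top-group z-scores count positively,
    # bottom-group z-scores count negatively.
    return {
        t: sum(z[f][t] for f in POSITIVE) - sum(z[f][t] for f in NEGATIVE)
        for t in ids
    }


if __name__ == "__main__":
    for text_id, score in dimension_scores(TOY_CORPUS).items():
        print(f"{text_id}: {score:.2f}")
```

Under these assumptions, a text with many top-group features and few bottom-group features receives a large positive score, and a text with the opposite profile receives a large negative score, which is the pattern the passage reports for conversation versus official documents and academic prose.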
For example, interactiveness and involvement are reflected in the frequent use of you and I, and the private verbs that convey the thoughts and feelings of the participants, as well as many other features. The reduced and vague forms – such as contractions, that-deletions, and general emphatics and hedges – are typical of language produced under real-time constraints. The bottom group of linguistic features, associated most notably with informational exposition, is interpreted as reflecting careful production and an informational focus. That is, as exemplified below, nouns, prepositional phrases, and attributive adjectives all function to convey densely packed information, and the higher type–token ratio and longer words reflect a precise and often specialized choice of words. Such densely informational and precise text is nearly impossible to produce without time for planning and revision.

As noted earlier, one of the advantages of a comparative register perspective is to understand the linguistic characteristics of a particular register relative to a representative range of registers in the language. This advantage can be illustrated with respect to the specific register of research articles in biology (in the subdiscipline of ecology). Figure 9.1 shows that this register is extremely marked on Dimension 1, with a considerably larger negative score than academic prose generally. Even a short extract from an article shows the high density of informational features from Dimension 1 (nouns are underlined, prepositions italicized, and attributive adjectives capitalized):

There were MARKED differences in root growth into regrowth cores among the three communities, both in the distribution of roots through the cores and in the response to ELEVATED CO2. In the Scirpus community, root growth was evenly distributed throughout the 15-cm profile, with no SIGNIFICANT differences in root biomass among the 5-cm sampling intervals within a treatment.

Register Variation: A Corpus Approach 187

All three of these features serve the purpose of densely packing the text with information about specific referents. Nouns refer to entities or concepts, and are then further specified by prepositional phrases, attributive adjectives, or other nouns which function as premodifiers (e.g. root growth). Clearly, the emphasis in this text is on transmitting information precisely and concisely, not on interactive or affective concerns.

Furthermore, by considering the scores of other registers on Dimension 1, we can see that such densely packed informational features are not typical in more colloquial registers of English. For this reason, it is not surprising that many novices experience difficulty when asked to read biology research articles or write up research reports like a professional (cf. Walvoord and McCarthy 1990; Wilkinson 1985). Even with this very brief examination of just one dimension in the MD model of English, we can see why, linguistically, these texts are challenging and why students are unlikely to have had practice with such densely informational prose.

2.2 Comparison of the major oral/literate dimensions in English, Korean, and Somali

The MD methodological approach outlined in the last section has been applied to the analysis of register variation in English, Korean, and Somali. Biber (1995) provides a full description of the corpora, computational and statistical techniques, linguistic features analyzed, and multidimensional patterns of register variation for each of these languages. That book synthesizes these studies to focus on typological comparisons across languages. Here we present only a summary of some of the more striking cross-linguistic comparisons.
Table 9.4 presents a summary of the major “oral/literate” dimensions in English, Korean, and Somali. Oral/literate dimensions distinguish between stereotypical speech – i.e. conversation – at one pole, versus stereotypical writing – i.e. informational exposition – at the other pole. However, as discussed below, each of these dimensions is composed of a different set of linguistic features, each has different functional associations, and each defines a different set of relations among the full range of spoken and written registers.

The first column in table 9.4 lists the co-occurring linguistic features that define each dimension. Most dimensions comprise two groups of features, separated by a dashed line on table 9.4. As discussed above for Dimension 1 in English, these two groups represent sets of features that occur in a complementary pattern. That is, when the features in one group occur together frequently in a text, the features in the other group are markedly less frequent in that text, and vice versa. To interpret the dimensions, it is important to consider likely reasons for the complementary distribution of these two groups of features as well as the reasons for the co-occurrence pattern within each group.

It should be emphasized that the co-occurrence patterns underlying dimensions are determined empirically (by a statistical factor analysis) and not on any a priori

