Review of “Lexical repulsion between sense-related pairs” by Antoinette Renouf and Jayeeta Banerjee
Mark Myslín
In International Journal of Corpus Linguistics 12:3. 2007. (pp. 415–444)

Antoinette Renouf and Jayeeta Banerjee say they have discovered the tails side of the traditional collocation coin: active ‘lexical repulsion’ between word pairs, the opposite of the lexical attraction that has long informed corpus-based linguistic studies. Certain word pairs seem to repel each other purely by convention: we do not say not almost, even though not nearly is perfectly acceptable. Renouf and Banerjee argue that lexical repulsion is an “unexplored textual feature”[1] and propose some original applications. Overshadowing their entire study, however, and sometimes seeping unintentionally into their analyses, is the largely unacknowledged reality that lexical repulsion is less a new force of language than a simple extension of the notion of collocation.

Renouf and Banerjee frame this particular article as a jumping-off point for future studies of lexical repulsion, here focusing exclusively on sense-related pairs: the synonyms almost/nearly, seat/chair, argue/discuss, and pretty/attractive, and the antonyms black/white and hot/cold. The study uses z-scores, which compare the observed frequency with which two words co-occur against the frequency expected if all the words in the corpus were randomly ordered. Lexical repulsion is defined as z ≤ -2, ‘weak’ collocation as -2 < z < 2, and ‘strong’ collocation as z ≥ 2. With almost/nearly, for example, nearly repelled 78 words that strongly collocated with almost (such as see, seems, and become), while almost repelled 28 words that strongly collocated with nearly (such as far, man, and not). For each pair, Renouf and Banerjee semantically categorize the ‘anti-collocates’ and offer a brief analysis: in this case, that almost repels physical measurement while nearly repels contexts of modality.
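
The review does not reproduce the z-score formula itself. As a rough illustration only, the sketch below assumes one common textbook formulation for collocation z-scores (a binomial approximation), which may differ from the one Renouf and Banerjee used: if a collocate makes up a proportion p of the corpus, it is expected E = p × f(node) × span times in the window slots around the node word, and z = (O - E) / sqrt(E(1 - p)), where O is the observed count and span is the number of window positions counted per node occurrence (assumed to be 1 here). The function names and all counts are hypothetical; only the three thresholds and the corpus size come from the review.

```python
import math


def collocation_z(observed: int, node_freq: int, collocate_freq: int,
                  corpus_size: int, span: int = 1) -> float:
    """Z-score for how often a collocate co-occurs with a node word.

    Assumed model (a common binomial approximation, not necessarily the one
    Renouf and Banerjee used): if words were randomly ordered, each of the
    node_freq * span window slots would hold the collocate with probability
    p = collocate_freq / corpus_size.
    """
    p = collocate_freq / corpus_size
    expected = p * node_freq * span
    return (observed - expected) / math.sqrt(expected * (1 - p))


def classify(z: float) -> str:
    """Apply the thresholds reported in the review."""
    if z <= -2:
        return "repulsion"
    if z >= 2:
        return "strong collocation"
    return "weak collocation"


# Hypothetical counts; only the 800-million-word corpus size comes from the review.
CORPUS_SIZE = 800_000_000
for observed in (20, 50, 80):
    z = collocation_z(observed, node_freq=100_000, collocate_freq=400_000,
                      corpus_size=CORPUS_SIZE)
    print(f"observed={observed}: z={z:.2f} -> {classify(z)}")
```

With these hypothetical counts the expected co-occurrence count is 50, so observing 20 co-occurrences gives z ≈ -4.24 (repulsion), 50 gives z ≈ 0 (weak collocation), and 80 gives z ≈ +4.24 (strong collocation). The same assumed formula also shows why collocate frequency matters: for a rare collocate with an expected count of 0.01, a single co-occurrence yields z ≈ 9.9, whereas complete absence yields only z ≈ -0.1. Under this model, the absence of a pair can register as repulsion (z ≤ -2) only when its expected count is roughly 4 or more.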

From these analyses, the paper concludes that lexical repulsion is, in general, semantically explicable and not a mere matter of convention.

Renouf and Banerjee’s presentation is generally clean and direct. The tone of the article is clear and at times even entertaining, with quips such as the remark that it is “a sad reflection on UK broadsheet journalism that attractive is more closely associated with business deals than with human appearance.”[2] Methodology and data are for the most part sufficiently transparent, with one important exception. Each word’s anti-collocates are listed, presumably by ascending z-score, but without any individual numerical data, which leaves the reader wondering how strong certain anti-collocates are or how many tokens actually occurred. This is especially true when the authors toss out teasers such as the following, with no numerical explanation: “Where the collocate in question, e.g. reclining, is rare in the corpus, the significance scores for its co-occurrence are bigger.”[3] Simply including individual word frequencies in the lists would have removed much of the mystery.

The choice of corpus for the study, while not spectacular, is hardly dismissible: 800 million words compiled from one British newspaper from 1989 to 2000 and another from 2000 to 2006, a corpus chosen simply “for convenience,” as a neatly tucked-away endnote admits.[4] Although the usual deficiencies of newspaper corpora in representativeness[5] do apply, and the data sample is subject to the whims of just two newspaper editorial boards, Renouf and Banerjee are refreshingly upfront about the limitations of their corpus.
Rather than pretending to analyze all of the English language, they specify early on that the paper’s conclusions apply only to journalese, and they periodically remind the reader of this with phrases such as “in our journalistic corpus.”[6]

Regarding the meat of the study, however, Renouf and Banerjee consistently tiptoe around the fact that lexical repulsion can be seen as a function of collocation rather than as “another force” of language.[7] This shows through in their analyses of word pairs. Taking the example above, in which almost avoids numerical measurement while nearly repels modal auxiliaries used in hedging, Renouf and Banerjee go on to restate their claim practically verbatim, this time in terms of attraction. “In other words,” they write (with striking regularity across examples), “nearly seems to modify precise numbers, while almost contributes to a discourse which is down-playing certainty.”[8] Their final analyses, then, are couched exclusively in the language of traditional attraction-based studies. One such study, Kaunisto (1999), achieves the same goal of drawing semantic distinctions between related word pairs (electric/electrical and classic/classical) more simply, and perhaps more gracefully, than Renouf and Banerjee do. Kaunisto compares, side by side, the frequency of each word’s occurrence in different contexts, using clean and legible tables that allow the reader to judge the strength (or presence) of repulsion directly rather than rely on numberless anti-collocate lists.[9] In the end, Renouf and Banerjee’s new framing of “lexical repulsion” may be little more than a freshly branded but slightly less direct route to the kind of knowledge already available through traditional methods.

It is also worth noting that in some cases Renouf and Banerjee’s semantic analysis is essentially nonexistent.
One of their stated goals is to describe objective, quantifiable differences between words that seem to be differentiated strictly by convention, but there is little apparent effort to this end in cases such as pretty/attractive. Although they thoroughly describe the more obvious differences (pretty is a general adverbial, while attractive describes deals and propositions), they offer little insight into the more interesting and problematic question of physical good looks, writing somewhat banally that “pretty repels nouns with human reference which are not described as pretty.”[10] Unhelpful conclusions such as this one are thankfully infrequent, but for some readers they may, at first glance, diminish the appeal of lexical repulsion as a tool of semantic analysis.

Renouf and Banerjee briefly propose a number of applications of lexical repulsion that are admirable in intent but ultimately imperfect in practice. For instance, non-native speakers of English would seem to benefit from knowing which word combinations are conventionally incompatible but otherwise plausible. To this end, Renouf and Banerjee suggest generating “lexical repulsion lists” for given headwords, which non-native speakers could consult to improve their writing. Any fully automated compilation of such lists based on Renouf and Banerjee’s method, however, would be problematic. Because corpora provide only probabilistic, not absolute, negative evidence, it would be erroneous to assume that a headword and a word x with a co-occurrence z-score of -2 or lower in a particular corpus cannot co-occur, or that the headword’s sense-related counterpart would be more appropriate with x.
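
Mechanically, the list generation proposed here could amount to little more than a threshold filter over precomputed z-scores. The sketch below assumes such scores already exist for one headword; the function name and the scores are hypothetical, not figures from the study. It keeps each z-score beside its word, so the strength of each repulsion stays visible, and it sorts by ascending z-score, the order the review presumes the authors used.

```python
from __future__ import annotations


def repulsion_list(scores: dict[str, float],
                   threshold: float = -2.0) -> list[tuple[str, float]]:
    """Return (collocate, z) pairs at or below the repulsion threshold,
    most strongly repelled first (ascending z-score)."""
    repelled = [(word, z) for word, z in scores.items() if z <= threshold]
    return sorted(repelled, key=lambda pair: pair[1])


# Hypothetical z-scores for collocates of "discuss"; not values from the study.
discuss_scores = {"supporters": -3.1, "analysts": -2.4, "options": 5.0, "plans": 1.2}
for word, z in repulsion_list(discuss_scores):
    print(f"discuss / {word}: z = {z}")
```

The filter flags any pair that is rare relative to chance in the corpus at hand; nothing in the z-score itself can tell a genuinely unidiomatic combination from an acceptable one that the corpus simply happens to underuse.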

Examples of this hazard abound in Renouf and Banerjee’s study. Although it lists words such as people, groups, scientists, analysts, and supporters as repelled by discuss but attracted by argue, it does not follow that “Campaign supporters discussed their options” is ungrammatical or would be better rephrased as “Campaign supporters argued their options.” An infallible lexical repulsion list, if such a thing can exist, would probably require significant manual trimming, but the subjectivity of such a process would compromise the empirical advantage of a purely corpus-based method.

A second, related application proposed by Renouf and Banerjee lies in natural language processing (NLP). Typists would be alerted to contextually ‘inappropriate’ words, as determined by lexical repulsion measures, in much the same way they are already alerted to misspelled words. Although this approach would check every word typed, not just the (possibly relatively few) words a second-language learner would think to look up in a cumbersome lexical repulsion list, the fundamental hazards discussed above still apply. Further, automatically checking typed words’ collocates against potentially vast repulsion lists in real time might, for the time being, be excessively taxing on mainstream personal computers.

None of this is to say that Renouf and Banerjee’s perspective of examining repulsion rather than attraction is without merit. One interesting sociocultural application, which cropped up unexpectedly in the comparison of black and white and received only a passing mention in the study, is the idea of lexical repulsion as a measure of subtle, potentially subconscious, stereotyping in speech and writing.
In contrast to black as an ethnic term, white repelled positive words such as successful, talented, ambitious, and respected, suggesting that in the popular mind such qualities may be defaults for white people but must be expressly specified when applied to black people (a problem that received some attention in 2007, when a United States senator was described as an “articulate and bright” African-American[11]). One can imagine a number of potential studies of stereotyping based on what is not said about certain groups (in other words, what is quietly assumed), as opposed to what is said about other groups.
Renouf and Banerjee’s study is commendable for highlighting a less commonly considered
perspective on collocation and sharing some truly interesting data, but the fact remains that their
findings are not groundbreaking. In their concluding remarks, they describe their finding that
“lexical repulsion is not just an arbitrary matter of convention but is explicable in terms of semantic and other qualities” as “rather fundamental.”[12] However, lexical repulsion can reasonably be understood
as a function of collocation in general, since attraction-based approaches such as Kaunisto’s show
repulsion as clearly as Renouf and Banerjee’s study does. If so, the basic principles of collocation
also apply to repulsion, and traditional studies such as Kaunisto’s have already shown that
collocation reveals semantic differences between words that seem to be differentiated only by
convention. Examining language through the lens of lexical repulsion rather than attraction can yield
some intriguing new insights in certain contexts, but it is important to understand the place of
repulsion within the broader framework of established language features such as collocation.

Notes