Doppelleben

Between Belgium and Berlin

Pushing the limits of sociolinguistics
May 15, 2009

This essay is merely a wandering of my mind and contains thought experiments that I would like to debate with my colleagues.

Studying actual language use immediately shows that there is a huge amount of variation to be accounted for. Differences between two distinct languages (inter-lingual variation) are easy to spot: you could compare Dutch and German and intuitively list their similarities and differences. Sometimes even variation within one language (intra-lingual variation) is clear. Laymen can intuitively write down the differences between Belgian Dutch and Netherlandic Dutch, German German and Austrian German, or British English and Irish English, etc.

Sociolinguistics

Sociolinguistics looks for a connection between social parameters and intra-lingual variation. A very simple research question could concern the difference between male and female speakers of Irish English. Traditionally, sociolinguists look for differences along the following sociological lines: region, socio-economic situation (social class), ethnicity, sex (gender), age, education, religion, power differences between two speakers, social distance between speakers, etc. They hope to find a correlation between these and a particular linguistic feature. These linguistic features can be of all kinds: syntax (e.g. word order), morphology (e.g. adjective declension), phonology (e.g. h-deletion), pragmatics (e.g. requests), etc.

There are two ways of looking at the situation: either you start from a certain sociological profile and look at how a linguistic feature behaves, or you start from a varying linguistic feature and see whether a sociological profile correlates with the internal linguistic variation. A lot of interesting research has been done in this vein, by big names such as Labov, Chambers and Eckert. Traditionally, sociolinguists, sometimes called variationists, are opposed to generative linguists. The latter study an abstract concept, "competence", which can be seen as an idealized language. They are not interested in variation, but want to describe language as "it is in all our heads". The variationists, on the other hand, are more interested in language as "it comes out of our mouths", thus incorporating "errors" and "mistakes". Moreover, this variation is claimed to be not random but sociologically motivated.

Looking at the large body of work that sociolinguists have already produced, one cannot help but notice that the research looks at only one feature at a time, or sometimes combines just a couple of features. For example, male working-class speakers from village A are compared with male working-class speakers from village B on their use of the word "like". It would of course be very ambitious (but oh so complete and beautiful) if research could compare all possible sociological fingerprints on the basis of all possible linguistic features. In practice, this task is not feasible because of limits to data collection, time constraints, sampling, ethics, etc.

Not everything, but…

Starting from this ideal situation, one could say that each possible sociological fingerprint matches (probably) a unique "vector" of linguistic features. The granularity of these sociological fingerprints would be so fine that probably only one or two actual language users fit each fingerprint. One could conclude that we might as well skip the step of detailed sociological description and simply observe individual people.

Based on their linguistic feature vectors, one could then use the similarity measures of neo-structuralist distributional lexical semantics to determine which language users "speak" alike. Only at that point does sociology come in, looking for sociological structure within the group of "speak-alike" language users.
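As a rough illustration of this similarity step, here is a minimal Python sketch. It assumes that each speaker has already been reduced to a vector of relative feature frequencies, and it uses cosine similarity, one common measure in distributional semantics. The speaker names, the choice of features and all numbers are invented toy values, not research data.

```python
import math

# Hypothetical relative frequencies of three linguistic features per speaker
# (h-deletion rate, share of "like" as a discourse marker, share of V-pronouns).
# These numbers are invented for illustration only.
speakers = {
    "speaker_A": [0.80, 0.10, 0.30],
    "speaker_B": [0.75, 0.15, 0.35],
    "speaker_C": [0.05, 0.60, 0.90],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(name, data):
    """Return the other speaker whose feature vector is closest to that of `name`."""
    others = (other for other in data if other != name)
    return max(others, key=lambda other: cosine(data[name], data[other]))

for name in speakers:
    print(name, "speaks most like", most_similar(name, speakers))
```

Only after such speak-alike groups are found would one consult the (separately stored) sociological fingerprints, in line with the order of steps described above.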

However, a major flaw in this reasoning appears here. The leap from a sociological fingerprint to an actual language user is problematic, as language users tend to "change" their language depending on situational, micro-sociological parameters such as power and social distance. Moreover, one cannot expect the same behavioural tendencies from all language users alike: I might speak in a more regional way when I meet somebody from my own region, whereas somebody else might actually try to speak more "standard" when he or she meets someone from my region.

This calls for a three-dimensional matrix representation. The cells contain the appropriate values; we assume that the sociological fingerprints of the speakers (which are to be "described" by linguistic features) are kept in a separate database. Moving left to right along a row runs through the linguistic features; moving top to bottom along a column runs through the speakers observed in the corpus; moving front to back runs through the conversational partners. In other words, for every conversational partner, we make a separate vector of the speaker's linguistic features.
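A minimal Python sketch of such a speaker × partner × feature matrix, using nested lists so that it needs no external libraries. All speaker names, feature names (including the hypothetical "regional_pronoun") and values are invented; `record`, `vector` and `shift` are illustrative helpers, not an existing tool. The `shift` function is one crude way, assumed here, to quantify how much a speaker adapts to different interlocutors.

```python
# Hypothetical axes for a toy corpus; all names and numbers are invented.
SPEAKERS = ["anna", "bert"]
PARTNERS = ["anna", "bert", "stranger"]
FEATURES = ["h_deletion", "regional_pronoun", "hedges"]

# cube[s][p][f] holds the relative frequency of feature f in the speech of
# speaker s when talking to partner p; None marks unobserved combinations.
cube = [[[None] * len(FEATURES) for _ in PARTNERS] for _ in SPEAKERS]

def record(speaker, partner, feature, value):
    """Store one observed value in the speaker x partner x feature cube."""
    s, p, f = SPEAKERS.index(speaker), PARTNERS.index(partner), FEATURES.index(feature)
    cube[s][p][f] = value

def vector(speaker, partner):
    """The speaker's linguistic feature vector for one conversational partner."""
    return cube[SPEAKERS.index(speaker)][PARTNERS.index(partner)]

def shift(speaker, feature):
    """Spread (max - min) of one feature across all partners the speaker was
    observed with: a crude indicator of how strongly the speaker adapts."""
    f = FEATURES.index(feature)
    values = [row[f] for row in cube[SPEAKERS.index(speaker)] if row[f] is not None]
    return max(values) - min(values)

# Invented example: Anna (assumed to share a region with Bert) uses the
# regional pronoun often with Bert and rarely with a stranger.
record("anna", "bert", "regional_pronoun", 0.7)
record("anna", "stranger", "regional_pronoun", 0.2)
print(vector("anna", "stranger"))
print(shift("anna", "regional_pronoun"))
```

Adding a further axis, such as age or recording date, would simply mean one more level of nesting (or one more index).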

The Sky is the Limit: caveat

A representation in a three-dimensional matrix might seem the limit of what we can conceptualize, but it is not. Consider a four-dimensional matrix, with the three dimensions already mentioned plus a dimension of age. Suddenly, the study is no longer "in apparent time" and synchronic, but opens up to a diachronic survey.

This all seems wonderful, of course, but there is an important caveat, and it is called time. Collecting this amount of data would take years, let alone transcribing the recorded samples, which could take decades. Even then, we would only have raw data, with no coding for linguistic features or sociological fingerprints whatsoever.

There are two ways of dealing with this issue. On the one hand, the coding could be done automatically: computational linguists might be able to help implement programs that fill in coding schemes automatically. On the other hand, the principle of "many hands make light work" could be applied. If there were a generally accepted coding scheme, that is, a standard way of POS tagging, syntactic parsing and transcribing audio material, it would be possible to combine all the research done in the field into one large dataset that could be studied as a whole.

Will this remain a dream?

 

Variational pragmatics
May 11, 2009

Source: Schneider, K. and Barron, A. (2008). Where pragmatics and dialectology meet: Introducing variational pragmatics. In K. Schneider and A. Barron (eds.), Variational Pragmatics, pp. 1-31. Pragmatics and Beyond New Series. John Benjamins Publishing Company.

The domain of Variational Pragmatics combines pragmatics and variational linguistics. It studies pragmatic variation in geographical and social space, and thus operates at the interface of pragmatics and variational linguistics, focusing on both regional and social variation. Variational pragmatics aims to combine and complement dialectology and pragmatics.

On the one hand, dialectology is claimed to miss out on certain pragmatic elements. For a long time, dialectology focused on accentology (variation in pronunciation). Lexical and some grammatical aspects entered the domain as well, but research questions about, for example, variation in polite language use did not arise. However, "the choice of how to say something may depend upon who is talking to whom under what social circumstances" (Wolfram & Schilling, 2006).

On the other hand, pragmatics has been looking for universal features, assuming that language communities are homogeneous wholes. Wierzbicka wrote a seminal paper on this, titled "Different cultures, different languages, different speech acts". The title gives it away: the paper goes against the common assumption that what holds for English holds for other languages as well. In a rather intuitive way, she shows that pragmatics should look into naturally occurring variation in language use (as dialectology does). In that sense, approaches to intralingual pragmatic variation are the forerunners of variational pragmatics.

Before moving on to a detailed survey of the research possibilities, a note on terminology. The term "dialect", common in dialectology, refers to regional variation. However, with the shift towards other types of variation, the term "dialect" came to be generalized to denote all types of varying language use, whether the variation is due to region, social factors, etc. The ambiguity between "dialect" as 'regional variation' and "dialect" as the hypernym 'all types of varying language use' is problematic. Therefore, the term "variety" is introduced to denote the hypernym, and the term "dialect" is replaced by "regiolect", since "dialect" has acquired a negative connotation. In short, "variety" is an umbrella term for whatever variation one may encounter in language use.

By combining the insights of pragmatics and dialectology, we can now more or less define the dimensions along which language can vary. On the one hand, there is more or less stable variation that "defines" a language user in a constant way:

  • regional variation (national, subnational, pluricentric)
  • socio-economic variation
  • ethnic variation (native, non-native, integration)
  • gender (male, female, homosexual)
  • age (young, old)
  • education
  • religion
  • etc.

On the other hand, situational variation may occur along the following (micro-social) dimensions:

  • social distance (symmetric, asymmetric conversations)
  • power
  • assimilation
  • etc.

Pragmatics makes it possible to identify which tangible features can be examined to measure these dimensions. There are five levels of features:

  • formal level: T/V pronouns, response tokens, discourse markers, hedges
  • action level: requests, thanking, apologies, invitations
  • interactional level: small talk, adjacency pairs
  • topic level: taboo topics, conversational topics as a discourse-structuring device
  • organisational level: turn taking, pauses, overlaps

These (levels of) features can be used to distinguish the dimensions mentioned above. As an example, the combination of a certain scientific topic, long pauses and many requests may “define” the dimensions of an asymmetrical conversation between an older professor and a younger student (age, social distance, power). This observation is one of the main points in my PhD research.
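To make the idea of combining features from different levels concrete, here is a minimal Python sketch of one conversation coded on the five levels, plus a toy rule that mirrors the professor/student example. The field names, the feature labels and especially the thresholds are hypothetical placeholders, not empirical values or an established coding scheme.

```python
from dataclasses import dataclass, field

@dataclass
class CodedConversation:
    """One conversation coded on the five levels of pragmatic features.
    Field names are hypothetical; counts are per conversation."""
    formal: dict = field(default_factory=dict)          # e.g. {"V_pronoun": 12, "hedge": 7}
    action: dict = field(default_factory=dict)          # e.g. {"request": 9}
    interactional: dict = field(default_factory=dict)   # e.g. {"small_talk_turns": 1}
    topic: set = field(default_factory=set)             # e.g. {"scientific"}
    organisational: dict = field(default_factory=dict)  # e.g. {"long_pauses": 5}

def looks_asymmetric(conv, min_requests=5, min_long_pauses=3):
    """Toy rule: a scientific topic, many requests and long pauses together
    are taken as a sign of an asymmetric conversation.
    The thresholds are arbitrary placeholders, not empirical values."""
    return ("scientific" in conv.topic
            and conv.action.get("request", 0) >= min_requests
            and conv.organisational.get("long_pauses", 0) >= min_long_pauses)

# Invented example of a coded conversation.
conv = CodedConversation(action={"request": 9}, topic={"scientific"},
                         organisational={"long_pauses": 5})
print(looks_asymmetric(conv))
```

In real research, such a hand-written rule would more likely be replaced by a statistical model estimated from coded data; the sketch only shows how features from different levels can jointly point to one dimension.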