Doppelleben

Between Belgium and Berlin

Variational pragmatics May 11, 2009

Source: Schneider, K. and Barron, A. (2008). Where pragmatics and dialectology meet: Introducing variational pragmatics. In K. Schneider and A. Barron (eds.), Variational Pragmatics, pp. 1-31. Pragmatics and Beyond New Series. John Benjamins Publishing Company.

The domain of variational pragmatics combines pragmatics and variational linguistics. It studies pragmatic variation in geographical and social space, and thus operates at the interface of the two fields, focusing on both regional and social variation. Variational pragmatics aims to combine and complement dialectology and pragmatics.

On the one hand, dialectology is said to overlook certain pragmatic elements. For a long time, dialectology focused on accentology (variation in pronunciation). Lexical and some grammatical aspects later entered the domain as well, but research questions about, e.g., variation in polite language use did not come up. However, “the choice of how to say something may depend upon who is talking to whom under what social circumstances” (Wolfram & Schilling, 2006).

On the other hand, pragmatics has been looking for universal features, assuming that language communities are homogeneous wholes. Wierzbicka wrote a seminal paper on this, titled “Different cultures, different languages, different speech acts”. The title gives it away: the paper challenges the common assumption that what is valid for English holds for other languages as well. In a rather intuitive way, she shows that pragmatics should look into the naturally occurring variation in language use (as dialectology does). In that sense, approaches to intralingual pragmatic variation are the forerunners of variational pragmatics.

Before turning to a detailed survey of the research possibilities, a note on terminology. The term “dialect”, common in dialectology, refers to regional variation. However, with the shift to other types of variation, the term “dialect” has been generalized to denote all types of varying language use, whether the variation is due to region, social factors, etc. The ambiguity between “dialect” as ‘regional variation’ and “dialect” as the hypernym ‘all types of varying language use’ is problematic. Therefore, the term “variety” is introduced for the hypernym, and “dialect” is replaced by “regiolect”, as “dialect” has acquired a negative connotation. In short, “variety” is an umbrella term for whatever variation one may encounter in language use.

Combining the knowledge of pragmatics and dialectology, the dimensions along which language use can vary can now be roughly defined. On the one hand, there is relatively stable variation that “defines” a language user in a constant way:

  • regional variation (national, subnational, pluricentric)
  • socio-economic variation
  • ethnic variation (native, non-native, integration)
  • gender (male, female, homosexual)
  • age (young, old)
  • education
  • religion
  • etc.

On the other hand, situational variation may occur along the following (micro-social) dimensions:

  • social distance (symmetric, asymmetric conversations)
  • power
  • assimilation
  • etc.

Pragmatics, in turn, tells us which tangible features can be examined to measure these dimensions. There are five levels of features:

  • formal level: T/V pronouns, response tokens, discourse markers, hedges
  • action level: requests, thanking, apologies, invitations
  • interactional level: small talk, adjacency pairs
  • topic level: taboo topics, conversational topics as discourse-structuring devices
  • organisational level: turn taking, pauses, overlaps

These (levels of) features can be used to distinguish the dimensions mentioned above. For example, the combination of a scientific topic (topic level), long pauses (organisational level) and many requests (action level) may “define” an asymmetrical conversation between an older professor and a younger student, along the dimensions of age, social distance and power. This observation is one of the main points of my PhD research.
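To make the link between feature levels and dimensions a little more concrete, here is a minimal sketch, assuming a transcript whose utterances have already been hand-annotated with feature labels. The snake_case label names, the `level_profile` helper and the toy professor/student transcript are all hypothetical; the point is only that features are grouped by the five levels and tallied per speaker, so that speaker profiles can then be compared along dimensions such as age or power.

```python
from collections import Counter, defaultdict

# The five levels of pragmatic features, with example features from the list above.
# The label names are hypothetical annotation tags, not a standard tag set.
LEVELS = {
    "formal": {"t_v_pronoun", "response_token", "discourse_marker", "hedge"},
    "action": {"request", "thanking", "apology", "invitation"},
    "interactional": {"small_talk", "adjacency_pair"},
    "topic": {"taboo_topic", "topic_shift"},
    "organisational": {"turn_taking", "pause", "overlap"},
}

# Reverse lookup: feature label -> level.
FEATURE_TO_LEVEL = {f: level for level, feats in LEVELS.items() for f in feats}


def level_profile(utterances):
    """Count annotated features per speaker, grouped by level.

    `utterances` is a list of (speaker, [feature labels]) pairs.
    Labels that belong to no level are ignored.
    """
    profile = defaultdict(Counter)
    for speaker, labels in utterances:
        for label in labels:
            level = FEATURE_TO_LEVEL.get(label)
            if level is not None:
                profile[speaker][level] += 1
    return profile


# Hypothetical toy transcript: professor (P) and student (S).
transcript = [
    ("S", ["request", "hedge"]),
    ("P", ["pause", "discourse_marker"]),
    ("S", ["request", "pause"]),
    ("P", ["turn_taking"]),
]

for speaker, counts in sorted(level_profile(transcript).items()):
    print(speaker, dict(sorted(counts.items())))
```

In this toy example the student's profile is dominated by the action level (requests) and the professor's by the organisational level, which is the kind of contrast the paragraph above describes.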

 

Economy and science, and what about open standards? April 5, 2009

While working on some corpus-based projects, I suddenly realized that most of the material I am using is not free. “Free” has two meanings:

  • “Free” as in “free to use, and you do not have to pay for it”.
  • “Free” as in “the source behind it can be seen and manipulated by everybody”.

Although I do not pay anything for the data I am using, the university does, so it is free in sense 1 only from my point of view. Meanwhile, the largest chunk of my time goes into making my scripts talk to each other, which is where sense 2 comes in. Every corpus, parser, tagger, etc. has its own output format, abbreviations and peculiarities, so comparing their outputs is quite a task. Luckily, these outputs are open (and not in some proprietary format), so I am actually able to manipulate their structure. This is not always possible (imagine output in MS Access format, which would force me to use MS Office…).

So here comes my almost rhetorical question: should scientific material be free, in both senses of the word? I am inclined to answer “no” to the money part, but “yes” to the openness of the source. Let me argue and rant about this a bit.

Economy

E. is training to work in a publishing house, or at least in the publishing industry. One of her teachers sometimes reminds the students that publishing is also an economic activity, and he says it as an excuse. Indeed, publishing scientific editions and knowledge with the goal of helping science can be seen as part of science itself, so why should we block scientific progress for the sake of economy and profit? Pushing this argument a little further would mean that all scientific books should be free (gratis) to use in science: let’s not stall science because of money issues! Yet these scientific books are usually very expensive.

Now, E. is learning why they are so expensive. I cannot go into the details, but editing literature, or devising structures that make the information accessible, is very labour-intensive. And of course, a publishing house is an economic enterprise, and every worker needs to get paid. To be honest, these expensive books are very often bought by university libraries out of university budgets, and the ordinary scientist simply fills in a form to order the book. The scientist does not really feel the cost of the book. Admittedly, students often buy these books too, out of their own pockets.

Science

This brings me to the point of view of the science guy. (I am bluntly generalizing from my own situation here; other people may think differently, and please do.) So here is my situation: I receive a kind of salary in the form of a scholarship; moreover, I have a nice, warm little office to work in; in this office stands a computer for which I never had to pay, along with all the office tools I could dream of. There is a huge library two doors to the right, in which I can freely browse the latest books, and if I do not find a book, I can order it without ever seeing a bill. My research is heavily corpus-based, and all (or at least most) of these texts are freely available from the research unit’s server, which is a real powerhouse. I also read a lot of articles, and these can be downloaded from the internet for free, because the university has bought subscriptions to all the relevant websites.

So, in short: my “work” rests entirely on an underlying system that provides me with tools, material, etc., and it is seemingly free (gratis) for me. This is of course a great stimulus for doing research without worrying about financial aspects (the golden rule here is “to stay within reasonable boundaries”). In that respect, the university and my research unit have completely succeeded in creating a motivating environment.

Beyond the university’s control, of course, is the quality of the material. Up to now, I have talked about the financial part; now I’d like to turn to the “standardization” and “open source” aspect of this lengthy post. As an example, I will use my beloved corpora. Each of them has a different structure (which is fine, of course), but also a different way of representing that structure. Some use XML, others use comma-separated value text files, and still others have no structure at all… Within the conveniently XML-structured corpora, a proliferation of tags and DTDs is in use. In practice, this means that every corpus needs its own dedicated “program” to search it.
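One common way around the “one program per corpus” problem is to write a small reader per format that converts everything into the same internal representation, so the search code only has to be written once. The sketch below assumes two hypothetical corpus layouts (an XML file with `<w pos="...">` elements and a CSV file with `word,pos` columns) and uses only the Python standard library; the sample data, function names and tag labels are illustrative, not taken from any real corpus.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data in two different corpus layouts.
XML_SAMPLE = """<text id="t1">
  <s><w pos="PRON">I</w><w pos="VERB">wait</w></s>
</text>"""

CSV_SAMPLE = "word,pos\nI,PRON\nwait,VERB\n"


def read_xml_corpus(source):
    """Return (word, tag) pairs from <w pos="..."> elements, in document order."""
    root = ET.fromstring(source)
    return [(w.text, w.get("pos")) for w in root.iter("w")]


def read_csv_corpus(source):
    """Return (word, tag) pairs from CSV text with 'word' and 'pos' columns."""
    reader = csv.DictReader(io.StringIO(source))
    return [(row["word"], row["pos"]) for row in reader]


def search(tokens, tag):
    """A single search function that works for every corpus once it is normalized."""
    return [word for word, pos in tokens if pos == tag]


print(search(read_xml_corpus(XML_SAMPLE), "VERB"))
print(search(read_csv_corpus(CSV_SAMPLE), "VERB"))
```

Adding a new corpus then means writing one more reader, rather than one more search program; a corpus without any structure would still need its own tokenization step before it fits this scheme.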

Things get even more complicated when you add linguistic knowledge to the corpora with a part-of-speech tagger or a full parser. Each parser (but it’s slowly improving!!!) has its own way of presenting the data, again requiring different “programs”. Combining corpora to extend the research is a very painful job, as a lot of time is needed to make “the corpora talk to each other”.
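The same normalization idea applies to tagger output. The sketch below assumes two hypothetical but typical layouts: one sentence per line with `word/TAG` tokens, and one token per line with tab-separated `word`, `TAG` and `lemma` columns. The tag names (`PP`, `VVP`, `CC`) and function names are illustrative only.

```python
def parse_slash_format(line):
    """Parse 'word/TAG word/TAG ...' into (word, tag) pairs.

    rpartition splits on the LAST slash, so a token like 'and/or/CC'
    keeps its inner slash; a token without any slash gets tag None.
    """
    pairs = []
    for tok in line.split():
        word, sep, tag = tok.rpartition("/")
        pairs.append((word, tag) if sep else (tok, None))
    return pairs


def parse_tab_format(block):
    """Parse one token per line ('word<TAB>TAG<TAB>lemma'); the lemma is dropped.

    Assumes every non-blank line has at least two tab-separated columns.
    """
    pairs = []
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. sentence separators
        word, tag = line.split("\t")[:2]
        pairs.append((word, tag))
    return pairs


SLASH_OUTPUT = "I/PP wait/VVP and/or/CC"
TAB_OUTPUT = "I\tPP\tI\nwait\tVVP\twait\n"

print(parse_slash_format(SLASH_OUTPUT))
print(parse_tab_format(TAB_OUTPUT))
```

This only unifies the layout. Different taggers also use different tag sets, so mapping one tag set onto another is a separate step that this sketch does not cover.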

The same goes for literature and for scholarly editions of, for instance, medieval works. Every edition, sometimes even within a single publishing house, has a different structure (internally and digitally). If there were a standard for hypertext editions, it would be possible to build an extensible interface for searching all kinds of editions, making life easier for scientists. That is what an edition is there for, right?

One of the arguments for NOT doing this is economy and profit. If every publishing house had to present its material in the same format/structure, compatible with a single interface, the diversity between publishing houses would be gone, and there would be no reason to buy the edition of publishing house X instead of Y. (This is not entirely true, I know, but please bear with me for the sake of the argument.)

Reconciliation?

So, finally, is it possible to reconcile both sides of “free”: one pulling in the direction of proprietary formats for economic reasons, the other asking for open formats for the sake of science? I think it is, judging by the success of OpenOffice, sourceforge.net and other open projects.

Moreover, I think open formats would bring a huge step forward in quality. When you no longer have to buy MS Office just because everybody else uses it, you can start to look at other office suites and compare them on their features, instead of on how widely they are used in your circle.

To conclude, open standards and formats are the way to go, and they certainly represent the future of working with computers. Soon, people will no longer accept that certain files can be opened in only one program, and I am speaking about accountants, SAP-related things, etc. On that day, the world will truly become “free”, at least in the second sense of the word.

 

Eternal waiting December 10, 2008

Filed under: phd — Tom @ 12:48

I am about two months into my PhD project, and the waiting time for the script that counts the non-lexical features in my corpus is already so long that the cartoon below is very appropriate (not entirely accurate, but who cares).

[Cartoon: Eternal Waiting]

I have 43 features implemented and working (more or less) correctly, but there are still 60+ to go. If the runtime doubles, I will have to wait more than an hour for the counts. After that, the factor analysis and the computation of factor scores take some time as well.
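For readers wondering what such a counting script roughly does, here is a minimal sketch, assuming each feature can be expressed as one regular expression over raw text. The three features, the word pattern and the function names are hypothetical (real non-lexical features are often syntactic and need tagged input); counts are normalized per 1,000 words, a common convention in corpus work so that texts of different lengths can be compared before a factor analysis.

```python
import re

# Hypothetical feature set for illustration; each feature is one regular
# expression applied to raw text.
FEATURES = {
    "first_person_pronoun": re.compile(r"\b(?:I|me|my|we|us|our)\b", re.IGNORECASE),
    "contraction": re.compile(r"\b\w+'(?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE),
    "question": re.compile(r"\?"),
}

# A word is a run of word characters, optionally with one apostrophe part ("don't").
WORD = re.compile(r"\b\w+(?:'\w+)?\b")


def feature_rates(text, per=1000):
    """Count each feature in one text and normalize to occurrences per `per` words."""
    n_words = len(WORD.findall(text))
    if n_words == 0:
        return {name: 0.0 for name in FEATURES}
    return {
        name: len(pattern.findall(text)) * per / n_words
        for name, pattern in FEATURES.items()
    }


def rate_table(texts):
    """One row of feature rates per text: the kind of matrix a factor analysis takes."""
    return {text_id: feature_rates(text) for text_id, text in texts.items()}


corpus = {"t1": "I don't know. Do we wait?", "t2": "The script is still running."}
for text_id, rates in rate_table(corpus).items():
    print(text_id, {name: round(value, 1) for name, value in rates.items()})
```

Since every feature adds one more pass over each text, the runtime grows roughly linearly with the number of features. With 43 of at least 103 features (43 + 60) in place, a linear extrapolation gives about 103 / 43 ≈ 2.4 times the current runtime, so somewhat more than double.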

The eternal waiting for the counts sometimes drives me crazy…