blog.berndweiss.net

Covering, among other topics: quantitative sociology, statistics, R & LaTeX

Sociolects in R

The following message on “Sociolects in R” just came in on the r-help mailing list. It shows (once again) the visionary charm of R, which SPSS, for one, will still be panting after decades from now. Peter Dalgaard writes euphorically:

“The R translation teams have done a great job in making R usable for people who do not have English as their mother tongue. However, even within English-speaking countries, there are groups which have trouble with the language, and it may be valuable to support the sociolects of these groups too.

Thanks to a generous contribution from Lars Polifo, these features will be made available in an upcoming version of R. As it turns out, there are some particularly interesting challenges that need to be addressed. Consider, for instance, the translation of the t-test in the locale en_SF_US.UTF8 (notice the interjection of the code “SF” to denote “San Fernando Valley”):

t.test(extra ~ group, oh, baby, data = sleep)

Welch Two Sample t-test

data:  extra by group

t = -1.8608, like, df = 17.776, like, wow, p-value = 0.0794

alternative hypothesis: true difference in means is like, ya know, not equal to 0

95 percent confidence interval:

-3.3654832  0.2054832

sample estimates:

mean in group 1 mean in group 2

0.75            2.33

Notice that in addition to the simple message string modifications, it has been necessary to modify the parser so as to delete obviously superfluous arguments such as “oh” or “baby” (a particular issue here is that the argument “like” might actually be intended to mean likelihood). Similarly, for se_KC_SE.UTF8 (KC for “kitchen”) we have alternate spellings of arguments like “data”:

t.test(ixtra ~ gruoop, deta = sleep)

Velch Tvu Semple-a t-test

deta:  ixtra by gruoop

t = -1.8608, dff = 17.776, p-felooe-a = 0.0794

elterneteefe-a hypuzeesees: trooe-a deefffference-a in meuns is nut iqooel tu 0

95 percent cunffeedence-a interfel:

-3.3654832  0.2054832

semple-a isteemetes:

meun in gruoop 1 meun in gruoop 2

0.75            2.33

Canadian English poses particular problems, which have not yet been resolved. If we are to do it properly, it would entail modifications to the R language itself. For instance, we’d have to introduce a “four” loop and change the end-brace to the four-character string “eh?}”.”

(This is post no. 50 on the Quanti|Soz|Blog!)

Written by Bernd Weiss

April 1st, 2008 at 5:14 pm


This work by Bernd Weiß is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Germany.