Chinese linguistics is highly interdisciplinary: it lies at the intersection of the humanities and social sciences in studying Chinese language, culture, society, and mind. The Workshops on Data and Methods in Chinese Linguistics will bring insights from data science to shed light on research methods in Chinese (psycho)linguistics, and we hope to draw language researchers’ attention to research methods and data analysis, which are at the center of the current methodological movements across disciplines. The Workshops will consist of invited talks and hands-on exercises, held both virtually on Zoom and in the Social Science Research Commons of IUB for those who are on campus.

The theme of the conference is “Data, Methods, and Application in Chinese Linguistics”, which is also meant to draw attention to the nature of different types of linguistic data, the ways data can be handled and analyzed, and the applications of these fundamental methodological issues in linguistics and language pedagogy. Through the workshops and the conference theme, we hope to help researchers share their data-processing practices, with attention to the methods used in dealing with linguistic data. The choice of this theme is in line with current movements in multiple disciplines that emphasize replicability of research, data analysis using new statistical methods, and implications for research designs and methodology.

Our confirmed speakers include James Myers (National Chung Cheng University) and Shravan Vasishth (University of Potsdam). This workshop is co-sponsored by the Workshop in Methods (WIM) of Indiana University, Bloomington. Participants can also obtain information about the workshops at WIM's event sites.


Workshop 1: 9-11am Friday, September 23 (EST)

On Zoom and in IU's Social Science Research Commons

Characters and grammar: How linguists can become more fluent in R

James Myers

National Chung Cheng University

R is not just statistics software, but a full-fledged computer language, and with its thousands of extra packages linguists can program it to do things far beyond basic statistical analysis and graph-making (though of course R is great at those things too). In this talk I hope to give a painless introduction to the grammar of R for those who have not yet dared to try it, while still offering some new ideas to more experienced users. The empirical focus is on my own explorations of the “grammar” of Chinese characters, which will allow me to survey a variety of methods of particular use to Chinese linguists, including how to work with non-Roman writing systems, how to analyze text corpora, how to compile data from lab experiments, and of course how to run statistical analyses and make graphs, from the simple to the fancy, including new types of analyses that you can invent yourself with some basic concepts in probability and a bit of programming. If time permits, I will also demonstrate how diverse and powerful R’s extra packages really are by highlighting tools for sound and image processing (the latter useful for the study of writing systems and sign language). Above all, I will emphasize that there is no reason to feel intimidated: anybody can become ever more fluent in R through workshops like these, textbooks, internet searches, and most importantly, patient trial and error.

Workshop 2: 4-5pm (For discussion & Q&A) Friday, September 23 (EST)

A pre-recorded lecture will be posted here. A screening of the lecture will take place 2-4pm at the Social Science Research Commons.

Q&A will take place on Zoom and in IU's Social Science Research Commons during the above time.

The amazing saga of the Chinese relative clause: A cautionary tale for tomorrow's psycholinguists

Shravan Vasishth

University of Potsdam

In 2003, an explosive paper appeared in the prestigious journal Cognition. Its central claim, based on a single 40-subject self-paced reading experiment, was that in Chinese, object relative clauses are read more easily than subject relatives. The claim is surprising because in other languages, subject relatives are generally easier to read than object relatives. Then, in 2013, a second single-experiment paper (this time with 37 subjects) was published in Language and Cognitive Processes, claiming to replicate the original 2003 effect.

I will use the 2013 data (the 2003 data are not available, but I will discuss some problems in the published statistics as well) as a case study to show that the claim had no support in that data-set. The claim had no support because the data were analyzed incorrectly and several odd properties of the data were ignored.

The 2013 data-set and its published analysis are a nice illustration of some of the main problems plaguing psycholinguistics and linguistics today. These problems have contributed to the replication crisis unfolding in the field. The idea that one can just load the data into R or some other software and essentially hit enter to run the analysis needs careful rethinking. The central problems that I will discuss are the dangers of interpreting p-values in low-power studies, fitting models that have serious violations of the underlying assumptions, and finally, the fact that surprising claims require overwhelming evidence.

These problems are not unique to these Chinese relative clause data; they appear in many published papers (I will discuss some specific examples). The talk will be a tutorial-style introduction to some ways in which data analysis in repeated-measures designs can be carried out in a more thoughtful manner, without overstating the findings. Such an approach will lead to more robust analyses and, in doing so, save us from repeatedly being sent down a garden path.