Welcome to the website of the workshop on Methodology and the Evaluation of Intelligent Virtual Agents, which will be held at the Intelligent Virtual Agents (IVA) conference on November 5, 2018 in Sydney, Australia.

The aim of the workshop is to critically but constructively discuss the empirical methods that are used in Human-Computer Interaction (HCI), specifically in the area of Intelligent Virtual Agents. The social and life sciences are in a crisis of methodology, as the results of many scientific studies are difficult or impossible to replicate in subsequent investigations (e.g. Pashler & Wagenmakers, 2012). The Open Science Collaboration (2015) observed, for example, that the effect sizes of replications were about half of the reported original effect sizes and that, where 97% of the original studies had significant results, only 36% of the replication studies did. In fact, it has been suggested that more than 50% of published research findings are likely false (i.e. theories hold no or very low verisimilitude) (Ioannidis, 2005). Many of the methods employed by HCI researchers come from the fields that are currently in a replication crisis. Hence, do our studies have similar issues?

Long before the replication crisis hit psychology, Meehl (1990) identified ten obfuscating factors that often make research on psychological theories uninterpretable. Reviewing these factors gives us an idea of the scope of the problems that our research methodology might face:

  1. Loose derivation chain: very few derivation chains running from the theoretical premises to the predicted observational relation are deductively tight;
  2. Problematic auxiliary theories: each auxiliary theory is itself nearly as problematic as the main theory we are testing;
  3. Problematic ceteris paribus clause;
  4. Experimenter error;
  5. Inadequate statistical power;
  6. Crud factor: everything correlates with everything;
  7. Pilot studies: a true pilot study is a main study in the small, but these are often not published, which can lead to a line of research being dropped;
  8. Selective bias in submitting reports;
  9. Selective editorial bias;
  10. Detached validation claim for psychometric instruments: claiming a measure is ‘valid’ without further consideration.

A variety of ideas to improve research practices have been proposed, and it is likely that these ideas can benefit the methods used in the field of HCI. Some actionable points leading to open and reproducible science are pre-registration of experiments, replication of findings, collaboration, and education of researchers. The replication crisis needs our attention, and as we reflect on our methods it makes sense to discuss our scientific methods and practices in general.

In recent years there has been limited activity on this topic at IVA conferences. A workshop aimed at improving the quality of IVA research and methods should be welcomed by all IVA researchers. During the workshop we will discuss the methodological challenges identified in other fields and how they relate to the methods we use in our field. Additionally, we will discuss the proposed remedies and whether these are applicable to the research we conduct. We will discuss whether questions such as those posed above are relevant and, if so, how to go about answering them. This workshop is intended as a starting point and will be the first of a series of workshops (at IVA and other conferences in the field) on this topic.

The goal is to embrace a positive, proactive approach that is sustainable and will lead to better science (no naming and shaming). The idea is to foster discussion, and one way to achieve this is by having provocative statements to respond to. We invite participants to submit thought-provoking statements about methodology in HCI and/or to respond to the statements that we propose. Ten provocative statements to start the discussion:

  1. HCI research is too focussed on novelty.
  2. Sample size estimation is impossible to do when evaluating new technology.
  3. Experimental design/methodology are seen as a necessary evil and boring by HCI researchers.
  4. Theory building is difficult because of the technical implementation of auxiliary hypotheses.
  5. Knowledge of theories and concepts is insufficient in HCI (e.g. basic emotions) and failures do not lead to re-evaluation of assumptions.
  6. HCI relies on small corpora and ground truth does not exist.
  7. Technology focus creates legacy problems.
  8. Custom/proprietary technology prevents accurate replication.
  9. Open science is prevented by novelty focus, technical one-off solutions, and conference schedule.
  10. Valorisation & entrepreneurialism, for which HCI is a key field, are at odds with proper conduct of science.

Participation is encouraged for all who are interested in good science. We welcome contributions that discuss methodology in HCI and/or relate to the following topics:

  • Replicability of studies;
    • Methodological pitfalls specific to HCI;
    • Tools and procedures that can improve replicability;
  • Validity of HCI research;
    • What are we investigating (are the definitions clear)?
    • Do we agree on definitions and what we are investigating?
    • Are we asking the right questions?
    • What are the answers worth?
  • Generalisability of results;
    • From theoretical background to concrete predictions;
    • Relating data from HCI experiments back to theory (e.g. Gratch, 2017).

Program Schedule

08:30 – 09:30 Conference registration and welcome coffee

09:30 – 09:45 Introduction by the organizers

09:45 – 11:00 Keynote (Alex Holcombe): Boosting reproducibility and the robustness of results: Lessons from experimental psychology

11:00 – 11:30 Coffee break

11:30 – 12:30 Presentations of submissions (3 x 20 min)

11:30 – 11:50 Hannes Högni Vilhjálmsson – When a Virtual Agent is a Flawed Stimulus

11:50 – 12:10 Jelte van Waterschoot – Challenges in Long-term Evaluations of ECAs in the Real World

12:10 – 12:30 Merijn Bruijnes – The Embodied Debate Corpus

12:30 – 13:30 Lunch break

13:30 – 14:15 Presentations of submissions (2 x 20 min)

13:30 – 13:50 Claudio Pedica – Believability needs skin in the game

13:50 – 14:10 Deborah Richards – Should we use human-human factors for validating human-agent relationships? A look at rapport

14:15 – 14:30 Panel discussion

14:30 – 15:30 Introduction talk: create a generic IVA evaluation questionnaire

15:30 – 16:00 Coffee break

16:00 – 17:00 Breakout session (including presentations of group discussions)

17:00 – 17:15 Finishing notes by the organizers

17:30 – 20:30 Social


Gratch, J. (2017). Understanding the mind by simulating the body: Virtual humans as a tool for cognitive science research. In S. E. F. Chipman (Ed.), The Oxford Handbook of Cognitive Science (Vol. 1). Oxford University Press.

Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195-244.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Pashler, H., & Wagenmakers, E.-J. (2012). Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528-530.