New media, new methods... new understanding of representativity?

September 4, 2019

Welcome to our blog! We will use it to regularly publish articles, news and insights related to the development of our software toolkit, PAUL. To start, we would like to briefly summarize what Twister Research is working on. This first post explains, in a rather broad context, why we decided to build PAUL. We would also like to share some thoughts on the research methods we use, and on why we are convinced that this approach will become an increasingly relevant part of market and opinion research.


So, what do we do?

PAUL allows our clients to carry out live surveys at large scale, and analyze their results directly as they come in. It was specifically developed for the needs of media companies, so journalists, editors and researchers can easily and meaningfully engage with their audiences. Participants, on the other hand, can share their opinions on a variety of topics, have a direct influence on media content and compare their opinions with others.

With regard to method, surveys in PAUL can be categorized as online polling. In market and opinion research, this method has proven itself as a modern alternative to classic methods such as telephone interviews, mostly because it is more time- and cost-effective and more scalable.

When professional market researchers want to gain insights about their group of interest in a time- and cost-effective manner, they mostly work with so-called online panels, that is, communities of people who are willing to participate regularly in online surveys. PAUL is developed to give companies the opportunity to build their own communities, and to give them access to the PAUL community so they can interact with it.

In that sense we are building a classic online panel. However, the PAUL community is envisioned to become a very dynamic one, as our media clients will use PAUL to regularly interact with their audience. In the process, they recruit participants from all kinds of socio-demographic and social backgrounds. As the context and content of our surveys are very diverse, so are the interests and psychographic characteristics of our participants.


Sample, selection and representativity

Germany is our launch market. Strictly speaking, the samples we provide do not represent the entire German population, but only the portion that has internet access. Since we offer both smartphone apps and a web app, participants need access to at least one internet-ready device. Participation via our smartphone app is limited to the iOS (10.0+) and Android (4.4+) operating systems, although these account for the vast majority of smartphone operating systems today. [1]

As an increasing share of the German population is equipped with mobile devices and internet access [2], the majority of the German population is by now able to participate in our surveys. Nevertheless, one should bear in mind that this channel of panel or sample recruitment may lead to distortions. Consequently, some survey results should be interpreted with these limitations in mind.

Firstly, it follows logically that internet users and those who prefer not to use the internet exhibit different media consumption behavior. A 60-year-old smartphone user might have a different opinion about media and technology than someone who does not use smartphones and has a conservative attitude towards new technology. PAUL users therefore tend to have a rather high web affinity.

Furthermore, the self-selection of our participants needs to be considered: they must be aware of our community and must decide to participate. As long as our surveys are only visible to specific audiences or users, participants are obviously not randomly sampled. As with other sampling methods, this may imply that participants have an affinity for the specific topic of the survey.

Another particularity of our sampling follows from the fact that PAUL is primarily deployed in the context of TV formats. When a participant is recruited for a survey via a TV show, and that show deals with topics that are also part of the survey, one cannot rule out the possibility that the participant's opinion is influenced by the show itself.

Figure: PAUL sampling process

Is distortion entirely avoidable?

In general, selection biases may arise when three types of populations deviate from one another: the target population, the frame population and the inference population. The target population is the entirety of research subjects about whom one wants to make a statement. The frame population is the entirety of research subjects who have a real chance of becoming part of the sample. The inference population is the entirety of research subjects about whom one can actually draw conclusions.

As an example:

  • We want to make a statement about adult citizens (target population).
  • To do so, a researcher uses extensive telephone lists. All citizens who appear on such a list can become part of the sample (frame population).
  • It is only possible to make statements about citizens who a) own a telephone, b) are on the list and c) are willing and able to participate (inference population).

Taking the example into account, it becomes clear that distortions through the process of sampling might occur with any survey method. Selection biases are the norm, not the exception. Consequently, a researcher has to control for differences between those who participate in telephone interviews and those who do not. The same goes for online panels, where a researcher should expect to find higher web-affinity among participants than among nonparticipants.
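The three populations from the telephone example can be sketched as plain sets. This is only an illustration: the five people, their attributes and all names in the code are invented, and owning a telephone and being on the list are merged into a single flag to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    adult: bool          # belongs to the group we want to make a statement about
    on_phone_list: bool  # owns a telephone and appears on the researcher's list
    willing: bool        # willing and able to participate


# Invented toy data, not real survey records.
people = [
    Person("A", adult=True, on_phone_list=True, willing=True),
    Person("B", adult=True, on_phone_list=True, willing=False),
    Person("C", adult=True, on_phone_list=False, willing=True),
    Person("D", adult=False, on_phone_list=True, willing=True),
    Person("E", adult=True, on_phone_list=True, willing=True),
]

target = {p.name for p in people if p.adult}
frame = {p.name for p in people if p.on_phone_list}
inference = {p.name for p in people if p.adult and p.on_phone_list and p.willing}

not_reachable = target - frame                  # adults the list cannot reach
outside_target = frame - target                 # listed people we do not want to study
nonrespondents = (target & frame) - inference   # reachable adults who do not take part
coverage = len(inference) / len(target)         # share of the target we can speak about

print(sorted(not_reachable), sorted(outside_target), sorted(nonrespondents), coverage)
```

In this toy case only half of the target population ends up in the inference population, and the gap has two different causes (the list and the willingness to participate) that a researcher would need to examine separately.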

Online panel providers often work with monetary incentives, which risks distortion from so-called 'professional participants'. Professional participants are those whose main motivation for taking part in surveys is the monetary reward, and who consequently end up taking many surveys, including ones for which they may not fit the target population.

So is all of this representative?

In the discussion about the representativity of online surveys, self-selection is often named as a critical point. This is because, according to strictly scientific standards, results can only be representative when the criterion of random sampling is fulfilled, that is, when every person in the target population has the same chance of getting into the sample.

Considering this requirement against the facts that landline telephones are decreasingly present in households [2], and that willingness to participate in telephone surveys is also decreasing [3], it is questionable whether representativity in that sense still exists (or has ever existed). Nowadays, it is not possible to reach the whole population via either online or offline methods. In practice, a combination of several methods is often the preferred way to meet this requirement as well as possible. This, of course, is rather complex, expensive and time-intensive.

Our approach is to focus on online research, and to control and clarify potential distortions in our samples and survey results. By placing equally great value on data quality, transparency and consultation, we believe our clients will get a better understanding of how their insights can be interpreted. One of our goals is to create a more differentiated understanding of the concept of representativity and of the different kinds of populations described above. Only then will it be possible to reflect on the deeper meanings of survey results.

Using the term "representative" is tempting because it is well established and is seen as synonymous with quality results, yet using it this way does not do justice to the complex nature of empirical analysis.


What does “representative” even mean?

As the discussions about representativity seem to be more current and present than ever, we would like to go into further detail. Not only in public debate but also among professionals, there is no consensus on how the term should be used and understood. The use of the term as an indicator of quality is a basic problem of empirical research, because it describes how well the characteristic features in the sample represent the characteristic features of the population. [4]

The main problem is that, in practice, it is hardly possible to guarantee that every person in the population has the same chance of ending up in the sample. The problem is least prominent with small and clearly defined populations in which all individuals are equally reachable, for example students enrolled in one study program who all have an active school e-mail address.

The term “representativity” is, in fact, not a statistical term [5], but rather a linguistic convention that describes the eligibility of survey results for generalization. In the context of mathematically stringent statistics the term would not be used due to its ambiguity. Instead, external validity would be used as a quality criterion.

So even in professional discussion “representativity” is not unambiguous terminology, but rather a popular linguistic convention. This convention is very popular and widespread with people not actively engaged in statistics. So why is it used professionally, even when professionals do not share a common understanding? We consider the following reasons:

  • The wider public, and thereby also media users, have a basic understanding of what the term broadly represents.
  • Journalists tend to use and echo such popular terminology, because it has meaning for their audience and because they are often not specialists themselves. A comparable example is the use of the term Artificial Intelligence, or AI, in both professional and popular culture for certain types of statistical modeling, even though it is often, strictly speaking, inappropriate.
  • Market or opinion researchers tend to use the term to indicate quality, and consequently justify the price of their services.

In a certain sense, one could speak of a colloquial and not strictly scientific meaning of representativity. From the perspective of market research, a sample is often considered representative when it corresponds with the basic population in terms of essential characteristics (e.g. shares for age, gender, region), because in practice it is never possible to control all possible characteristics.
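One common way to bring a sample in line with such population shares is post-stratification weighting. The sketch below is a minimal illustration of that general technique, not a description of how PAUL processes data; the age groups, the population shares and the sample are all invented numbers.

```python
from collections import Counter

# Assumed population shares by age group (invented, not official statistics).
population_shares = {"18-39": 0.30, "40-59": 0.35, "60+": 0.35}

# Invented sample: one age-group label per respondent (100 respondents).
sample = ["18-39"] * 50 + ["40-59"] * 30 + ["60+"] * 20


def sample_shares(groups):
    """Return the share of each group among the respondents."""
    counts = Counter(groups)
    total = len(groups)
    return {group: count / total for group, count in counts.items()}


def poststratification_weights(pop_shares, samp_shares):
    """Weight per group = population share divided by sample share."""
    return {group: pop_shares[group] / share for group, share in samp_shares.items()}


shares = sample_shares(sample)
weights = poststratification_weights(population_shares, shares)

# After weighting, each group counts according to its population share.
weighted_share_60_plus = sum(weights[g] for g in sample if g == "60+") / len(sample)

print(shares)
print(weights)
print(weighted_share_60_plus)
```

Here the over-represented youngest group is weighted down and the under-represented oldest group is weighted up. Weighting of this kind only corrects the characteristics that are explicitly controlled, and it cannot help when a group is missing from the sample altogether.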


What to make of it

Let us summarize: if a researcher wants to conduct a survey and generate “representative” results according to the common definition, she needs to ensure that the sample matches the target population on essential characteristics.

Recent changes in media consumption have changed how accessible potential survey participants are. Due to the decreasing willingness to participate in telephone interviews, among other factors, online research has become an established method, particularly in market research.

Figure: Landline telephone equipment has been steadily decreasing in Germany since 1998.

It has become clear that innovative methods and approaches are at least as suitable for generating general statements as traditional methods. On the other hand, it should be noted that several methodological details need to be considered if one wants to keep distorting effects as low as possible.

With all the above in mind, we challenge ourselves to be able to generate insights that are just as “representative” (or externally valid) for the basic population as insights generated via traditional methods.

We will go into further detail in our upcoming blog posts. Thanks for reading!

[1] See: StatCounter 2019:

[2] See: Statistisches Bundesamt 2018:

[3] See: Pew Research Center 2019:

[4] Bortz, J. / Döring, N.: Forschungsmethoden und Evaluation in den Sozial- und Humanwissenschaften, 2016, p. 298.

[5] Ibid. and Schnell, R. / Hill, P. B. / Esser, E.: Methoden der empirischen Sozialforschung, 2008, p. 305f.
