Evaluating Research beyond Scientific Impact: How to Include Criteria for Productive Interactions and Impact on Practice and Society

Currently, established research evaluation focuses on scientific impact – that is, the impact of research on science itself. We discuss extending research evaluation to cover productive interactions and the impact of research on practice and society. The results are based on interviews with scientists from (organic) agriculture and a review of the literature on broader/social/societal impact assessment and the evaluation of interdisciplinary and transdisciplinary research. There is broad agreement about which activities and impacts of research are relevant for such an evaluation. However, the extension of research evaluation is hampered by a lack of easily usable data. To reduce the effort involved in data collection, the usability of existing documentation procedures (e.g., proposals and reports for research funding) needs to be increased. We propose a structured database for the evaluation of scientists, projects, programmes and institutions, one that will require little additional effort beyond existing reporting requirements.


Methods
Literature analysis of core issues 1 and 2.
Guideline-based qualitative interviews with 22 agricultural researchers with regard to core issues 1 and 3: The scientists interviewed were selected to cover a broad range of subjects within (organic) agricultural sciences and neighbouring disciplines, to include expertise in interdisciplinary and transdisciplinary research, knowledge transfer, research management and research evaluation, and to involve a diversity of research institutions – universities, federal departments and

FORSCHUNG | RESEARCH
and the impact of research on practice and society, 2 with the focus on (organic) agricultural research. We also sought to identify barriers to such procedures. In the following, we present the main results of our study and propose a concept for improved data collection.

Material and Methods
Three core issues were investigated interdependently using three methodological approaches.
Core issues:
1. Review of the established evaluation system, specifically from the perspective of applied, practice-oriented, interdisciplinary and transdisciplinary research (see Wolf et al. forthcoming).
2. Reviews of existing concepts for the evaluation of interdisciplinary and transdisciplinary research and social/societal/broader impact assessment.
3. Suggestions for the evaluation of practice-oriented organic agricultural research.

1 Funded by the German Federal Ministry of Food, Agriculture and Consumer Protection (BMELV) in the Federal Organic Farming Programme (BÖLN).
2 We refer to "productive interactions" because they comprise research practices that may lead to impact (see Spaapen and Van Drooge 2011) and the impact itself, with regard to both "practice" and "society". We thus aim to cover the many terms used for evaluation beyond scientific impact, such as social, societal, broader, political and environmental impact, end-user relevance and the evaluation of inter- and transdisciplinary research. While "society" is the more inclusive term, "practice" refers to the importance of the practical application of agricultural research. The term "practice orientation" includes all research activities providing results for use and benefit beyond science (practitioners and society) – regardless of the approach used.
Example for practice-oriented agricultural research: Below-root fertilisation with compost. The use of compost has a long history in sustainable food systems. Basic research discovered the biology of pathogen suppression by compost. Applied research gathered evidence of the effects in field trials. Together with an agricultural machinery manufacturer, a machine for the line application of compost during potato planting was developed.
based on certain implications as to how research should be conducted or how impact is generated. The evaluation of interdisciplinary and transdisciplinary research focuses on participative processes and knowledge integration (Nowotny et al. 2001, Pohl and Hirsch Hadorn 2006), which are important for change processes (Manring 2007) and the generation of target and transformation knowledge (Hennen et al. 2004, Pohl and Hirsch Hadorn 2006, p. 35). These qualitative concepts (table 1, p. 109) aim in particular to contribute to learning processes. Some theoretical concepts use logic models3 divided into inputs, (processes), outputs, outcomes and impact. On the one hand, they are used in cost/benefit approaches, for example, in agri-environmental research (Pearson et al. 2012) and in agricultural research in development cooperation, where impact is focused on improvements in the production of certain commodities (Davis et al. 2008). Pearson et al. (2012) as well as Davis et al. (2008) indicate shortcomings in the recording of societal and environmental impacts and their dependency on various assumptions. On the other hand, such logic models are used for qualitative evaluation, as in the HERG Payback model, where the authors point out that impact is a result of the whole system (Buxton 2011). This is also stated for agricultural innovation systems and applied in Impact Pathway Evaluation for research and innovation (Douthwaite et al. 2003) and in concepts for development cooperation (Reuber and Haas 2009).
Bridging processes and impacts without assessing the whole system is the task of the SIAMPI approach, another theoretically grounded model, which focuses on productive interactions as a prerequisite and proxy for impact. Productive interactions are defined here as "exchanges between researchers and stakeholders in which knowledge is produced and valued that is both scientifically robust and socially relevant" (Spaapen and Van Drooge 2011, p. 212). They are divided into direct, indirect (by publications, exhibitions, etc.) and financial interactions (Spaapen and Van Drooge 2011). The SIAMPI concept – in combination with Impact Pathway Evaluation – was also used by the French National Institute for Agricultural Research (INRA) to evaluate cases where sets of actions are related to certain impacts (ASIRPA 2012).

Results: Criteria, Evaluation Tools and Challenges for the Extension of Research Evaluation
The results of interviews, synthesis workshop and literature analysis presented in this paper concern the following aspects: criteria for the evaluation of practice-oriented (organic) agricultural research; evaluation tools; challenges confronting the extension of research evaluation.

Evaluation beyond Scientific Impact in Different Fields
Concepts for an evaluation beyond scientific impact exist for different evaluation objects (project, programme, institution or scientist) and feature a high degree of conceptual diversity. We will focus mainly on ex post evaluation.
Many concepts are specifically tailored to certain approaches, such as interdisciplinary (e. g., Huutoniemi et al. 2010, Huutoniemi 2012) and transdisciplinary research (e. g., Pohl et al. 2011 for ex ante, Bergmann et al. 2005 for formative and ex post evaluation). Some are defined in combination with a certain discipline or research field and/or a specification of the impact, for example, end-user relevance of applied agricultural research (Lyall et al. 2004), societal impact of translational research in medicine (Niederkrotenthaler et al. 2011), or impact of environmental research on policy (Shaw and Bell 2010).
Concepts that aim to evaluate the social/societal/broader impact of research in general are increasingly being developed, as evidenced by a special issue of Research Evaluation (Donovan 2011) and a recent literature review (Bornmann 2012). Key examples are the Payback model of the Health Economics Research Group (HERG), created for medicine and adapted for general use (Klautzer et al. 2011), and the Social Impact Assessment Methods for research and funding instruments through the study of Productive Interactions between science and society (SIAMPI). The latter is based on the conception of Evaluating Research in Context (ERiC) (2010), part of the Standard Evaluation Protocol used by all Dutch universities and academic research organisations (Spaapen et al. 2011). Furthermore, the Research Councils UK (RCUK) insist on the inclusion of an "impact summary" and intended "pathways to impact" in proposals submitted (RCUK 2010), and on the ex post recording of outcomes and impacts for case studies (RCUK 2010, 2011a).
Another distinction can be drawn between pragmatic ad hoc concepts and theoretically grounded concepts. Examples of pragmatic concepts in agriculture are Formas (2007, 2009) and Pedersen et al. (2009), and some concepts of funding agencies that evaluate the impact of environmental research on policy (Shaw and Bell 2010, Bell et al. 2011). Theoretically grounded concepts are

Criteria for the Evaluation of Practice-oriented (Organic) Agricultural Research
The main results of the interviews are a clear statement indicating the necessity for a broad and adaptable set of criteria, the identification and rating of core issues for evaluation criteria, and the suggestion of criteria for productive interactions and the impact of research on practice and society.
The interview results are followed by the number of persons who offered suggestions in relation to the total number of interviewees4 (e. g., 6/22) or participants of the synthesis workshop (e. g., 4/10 WS). The outcome of the semi-quantitative assessment is marked with the initial (Q).

Necessity for a Broad, Adaptable Set of Criteria
Figure 1 shows the interviewees' opinion that evaluation of practice-oriented research in (organic) agriculture should be based on a broad set of criteria (11/22 + 7/10 WS). Broad means 1. research practices that may lead to impact (productive interactions) as well as the subsequent impact itself (11/22) and 2. both quantitative and qualitative criteria (7/22). Interviewees argue in favour of a broad, adaptable set of criteria so as to respect the individuality of research and differences in target groups, which lead to unique impacts and specific benefits. In line with the goal of our study – i.e., extending research evaluation in general – the criteria need to fit and be adaptable to specific evaluation objects, contexts, purposes and times. The process of adaptation should include and be open to the selection of criteria (4/22 + 7/10 WS), the selection and integration of representatives from practice and/or society (8/21), as well as the consideration of specific conditions for scientists' work, for example, in relation to resources and requirements for research (2/10 WS) or the context for implementation in practice (8/22).
The preference observed here for a broad set of criteria corresponds to recommendations in the evaluation literature (Holbrook and Frodeman 2011, Frodeman and Holbrook 2011). Quality is seen as a relative concept "driven by the variability of goals and criteria" (Klein 2006, p. 75), and the combination of narratives with relevant qualitative and quantitative indicators is regarded as state-of-the-art (Donovan 2011). Criteria and interpretation of evaluation results can be adapted according to the characteristics and contexts of projects (Daschkeit and Loibl 2007, ERiC 2010) or the goals of research programmes (Braun et al. 2009), or developed with the participation of evaluees (Stokols et al. 2003, Bergmann et al. 2005, Blackstock et al. 2007) or evaluees and practitioners (Pedersen et al. 2009).

Core Issues for Evaluation from Scientists' Viewpoints
Following the qualitative analysis (figure 2, p. 108), interviewees consider practice orientation and interdisciplinarity to be of particular importance. They also discussed social relevance intensively, as it was found to be important but difficult to assess. Diversity of topics seemed to be an issue primarily at the research funding level. Other issues were often seen "in conjunction with" or "in relation to" practice impact and interdisciplinarity. The majority of the interviewed scientists pointed out that target groups and other persons and groups affected by research need to be identified and taken into account (17/21), especially for the evaluation of social/societal relevance and sustainability.

Criteria for the Evaluation of Impact on Practice and Society
In accordance with the results from the interviews, we concentrate here on criteria relating to productive interactions and their subsequent impact on practice and society. The criteria (table 2, p. 110) have been drawn from the literature review and the results of the interviews and synthesis workshop.

Interview results concerning the evaluation of practice-oriented research in (organic) agriculture: interviewees and workshop participants favour specific evaluation on the basis of a broad and adaptable set of criteria.

FIGURE 1:
Most evaluations are carried out as "stand-alone procedures", which means that data are assessed solely for one specific evaluation. Thus existing sources like project reports (1, 3, 8, 10, 15, 16, 18) or data/documentation of research programmes (8, 10, 18) are used. Usually, additional information is collected via interviews with the evaluees (2, 3, 8, 10, 15, 16, 18) and/or scientific experts (2, 8, 10, 11), and in some cases via workshops/focus groups or mentoring (10, 13, 18). Some evaluations also include data collection within practice and society, with interviews (2, 3, 5, 8, 15) or workshops (3, 10). With the focus on assessing the impact on policy, Boaz et al. (2009) also show the predominant use of interviews, case studies and documentary analysis. Data collection thus requires great effort.

5 Discursive evaluation is a process in which evaluees and evaluators reflect on evaluation results in a constructive discourse (cf. Bergmann et al. 2005).
Core issues of organic agricultural research evaluation, as addressed in the interview guidelines. Ratings are derived from qualitative analysis of the interviews.

FIGURE 2:
On the basis of our literature review, we selected concepts for the detailed analysis of criteria and evaluation tools, with the focus on formative and ex post evaluation and a broad range of evaluation objects (table 1). In table 2, we cite the concepts with the numbers used in table 1 if they refer to similar criteria or involve criteria that seemed suitable for supplementing the interview results.
The criteria listed in table 2 can be used for the interdependent evaluation of scientists, projects, institutions and research programmes. They are characterised on a three-level scale referring to their roughly estimated applicability.
The concepts differ in extent and detail. Having said that, table 2 shows a high level of agreement between the literature and the interview results regarding the criteria proposed. In general, processes and outputs receive more emphasis in the literature than impact. Interviewees mention content, contacts, publications and products as preconditions for impact and provide detailed criteria, most of which seem to be applicable with moderate effort. Suggestions for impact assessment – seen as outstandingly important for agricultural research – are backed by less detailed criteria, but can be divided into three main stages: response, application and impact of application. There are also fewer concepts with criteria for impact because of the specific nature of the challenges involved in impact assessment.

Evaluation Tools
Evaluation tools and methods were reviewed in the literature (below cited with the numbers from table 1) and explored in the interviews. We focus on the usability of tools for the extension of research evaluation to include productive interactions and the impact of research on practice and society. The evaluation of preconditions for impact, such as processes and outputs, is the predominant focus in evaluation concepts.
However, evaluation beyond scientific impact is generally confronted with problems regarding the way scientific and non-scientific reviewers are selected and whether or not they are sufficiently competent (e. g., Daschkeit 2007, Skolits et al. 2009). Furthermore, shortcomings in the extent, availability and quality of data (e. g., Uriarte et al. 2007, Spaapen and Van Drooge 2011, Bell et al. 2011, Holbrook and Frodeman 2011) were extensively discussed in the literature and interviews, thus becoming a point of focus in our further investigation.
Criteria beyond scientific impact are already used for ex ante assessment, for example, in specific calls of the Seventh Framework Programme (Regulation [EC] 1906/2006, Holbrook and Frodeman 2011) and in grant proposals for the US National Science Foundation (NSF) (Frodeman and Holbrook 2011, Holbrook 2012). Braun et al. (2009) remark that data available for the Framework Programmes are more suitable for programme administration than for ex post evaluation. They recommend simplified databases for the creation of indicators. In the Seventh Framework Programme, for example, contributions to sustainable development based on expected impacts were monitored via qualitative text analysis and interviews (Martinuzzi and Hametner 2012).
In addition, work is being done on the standardisation of data collection at institutional level (e. g., EC 2010, Van Vught and Ziegele 2011, Science and Technology for America's Reinvestment: Measuring the Effect of Research on Innovation, Competitiveness and Science [STAR METRICS]6, WR 2013). However, attempts to assess the transfer of knowledge, social impact or regional engagement are hampered by a lack of reliable data (EC 2010, Van Vught and Ziegele 2011).

A Concept for Improved Data Collection
The lack of easily usable data for evaluation beyond scientific impact is a problem affecting all evaluation objects, from project to programme, from scientists to institutions. Accordingly, we developed an initial concept for continuous data collection that is independent of individual evaluations but provides a basic service for them.

One example of generalised continuous data assessment can be taken from the RCUK (7). The Outcome Collection System uses a database to record outcomes and impact for three years after project completion (RCUK 2010, 2011a). The system also provides data for, and accommodates the reporting requirements of, the Research Excellence Framework and the Higher Education Statistics Agency (RCUK 2010, 2011b).

TABLE 1: For the detailed analysis of evaluation criteria and tools, 18 concepts were selected with the focus on formative and ex post evaluation in transdisciplinary and agricultural research and societal impact in general.
Interviewees mentioned the same possible data sources for "stand-alone procedures", but put more emphasis on "self-documentation" of scientists' productive interactions and impacts as a basis for the general extension of research evaluation (9/10 WS). Core issues of database-assisted self-documentation were that it needs to be complemented by feedback from practice and society, and that additional documentation effort for scientists needs to be avoided.

Challenges to Extending Research Evaluation
The direct evaluation of impact is most challenging. It raises the question of the attribution of impact to research, which in "systems of innovation" is regarded more in terms of "contribution". Additionally, it is difficult to deal with the time gap between research and impact and with unexpected and unintended impacts (Buxton 2011, Spaapen and Van Drooge 2011).

Criteria for evaluating practice-oriented achievements of scientists, projects, institutions and programmes.

TABLE 2:

The concept aims to reduce the time and expense involved, and enables existing evaluation processes (e. g., in institutes) to be extended and common evaluation procedures for practical and societal impact to be developed (e. g., in conjunction with scientific publications, as suggested by Niederkrotenthaler et al. 2011, or an ex post peer review of projects).

The concept outlined in figure 3 shows that the main data sources are structured proposals and reports. They include information about research processes, results, productive interactions, impact and context. Via the database system they can be assigned to and filtered for different evaluation objects and serve evaluation processes.

In the following, we describe the concept in terms of its required characteristics. Documentation should:
- be broadly applicable, because it 1. transcends the borders of individual evaluation objects, 2. includes different types of research, for example, different approaches, topics and disciplines, and 3. accommodates multiple evaluation purposes and a complete range of productive interactions and impacts. Broadly applicable documentation enables connections to be made between the establishment of an extended evaluation and the high variety of objectives of evaluation.
- use application and reporting for projects 7 in a structured form (example in figure 4, p. 112) as the main (but not the only!) source of data. Most research is conducted in the context of projects, which have to be documented anyway. Therefore, the type of documentation may change, but the amount of effort involved does not increase. Using structured applications and reports also provides benefits for research funding by facilitating project administration and further development. Furthermore, use by research funding will assure the participation of a critical mass of scientists for a profound and regularly updated dataset (see Holbrook 2012).
- enable data input whenever results are generated or impacts observed, independent of any specific evaluation. This will alleviate the time gap problem.
- support the assignment of productive interactions and impacts to different evaluation objects (project, programme, institution or scientist) via a database. The combination of categorised and free-text information ensures 1. high usability for different evaluation objects, purposes and research contexts via individual filtering and detailed or aggregated use of data, 2. transparency and reliability of data, and 3. consideration of the individuality of research.
- be grounded in the "reality" of practice-oriented research to ensure robust, reliable data. Thus, documentation needs to be developed with and subsequently driven by researchers, and should be complemented by information from practice and society, in order to record the real productive interactions and impacts and to admit more viewpoints in evaluation than the purely scientific ones.

For future developments, the benefit can be broadened or linked with current trends in science, such as open access, the potential of new media, and possibilities for stakeholder communication.

Improved data collection: The use of structured proposals and reports (and other existing data sources) for the database system reduces the amount of effort required. The database allows data to be assigned to different evaluation objects and used for different evaluation processes.

FIGURE 3:
7 A rough analysis of the requirements for research project applications and reports used by the German government departments, together with consideration of the objectives documented for BMELV research in general and its programmes BÖLN and Innovationsförderung, indicates that they go a long way toward covering the information needs of evaluation, but – as already mentioned – not in the form of easily usable data (see Wolf et al. forthcoming).
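To make the database concept more concrete, the following is a minimal sketch of a documentation record that combines categorised fields with free text and can be assigned to and filtered for different evaluation objects (project, programme, institution or scientist). All class, field and identifier names here are illustrative assumptions, not the schema proposed by the authors.

```python
from dataclasses import dataclass, field

@dataclass
class InteractionRecord:
    """One documented productive interaction or impact (illustrative sketch)."""
    category: str                 # categorised information, e.g., "direct", "indirect", "financial"
    description: str              # free text, respecting the individuality of research
    target_groups: list = field(default_factory=list)
    # assignment to evaluation objects via the database, e.g., {"project": "P1"}
    assigned_to: dict = field(default_factory=dict)

def filter_records(records, object_type, object_id):
    """Filter records for one evaluation object (detailed or aggregated use)."""
    return [r for r in records if r.assigned_to.get(object_type) == object_id]

# Hypothetical entries: one direct and one indirect productive interaction
records = [
    InteractionRecord("direct", "field day on compost application",
                      target_groups=["farmers"],
                      assigned_to={"project": "P1", "institution": "I1"}),
    InteractionRecord("indirect", "article in a trade journal",
                      target_groups=["advisory services"],
                      assigned_to={"project": "P2", "institution": "I1"}),
]

# Filtering by project yields one record; aggregating by institution yields both.
print(len(filter_records(records, "project", "P1")))      # 1
print(len(filter_records(records, "institution", "I1")))  # 2
```

The same records thus serve project-level and institution-level evaluation without additional documentation effort, which is the point of the combination of categorised and free-text information described above.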

The documentation procedure described above will be further developed in a test phase, integrating the needs of scientists, evaluators and project funding organisations. We will also explore the inclusion of non-scientific persons in the evaluation process. Our further research will aim at a high degree of procedure usability and will verify whether or not the expected benefits can be achieved.
Using the data for different purposes, for example, funding decisions and learning processes (e. g., Rogers and Jordan 2010, Manring 2007), would indeed raise questions of trust and of users' conflicting interests. Thus, verification, careful use of data and evaluation consequences, and/or an accepted institution to run the database would be required.
Some might see extended evaluation as a threat to the "freedom of research", but it might equally well be asked whether the established hierarchical and discipline-based evaluation system really serves that freedom. From our point of view, evaluation with regard to scientific excellence and societal benefits opens up freedom in terms of the plurality of research (cf. Frodeman et al. 2012), and strengthens the democratisation of public research funding as called for by the Federation of German Scientists (VDW 2010).
The overview of documentation in figure 3 is detailed in figure 4. It illustrates the possible structure of proposals and reports and gives an example of the information that should be accessed for direct productive interactions and impact.

Discussion and Conclusions: Improved Documentation and Commitment in the Science-Society System Are Needed
The proposed concept for improved data collection is based on the variety of reviewed evaluation concepts and their substantial overlap in terms of criteria. Accordingly, even evaluations with different foci may be based on the same data – and all are currently hampered by the lack of reliable, easily usable data.
Our study started with a focus on agricultural research, but the literature analysis shows comparable challenges in different research fields. Therefore, our findings seem to be applicable in other disciplines too. A categorisation based on the properties instead of the "names" of productive interactions and impacts (e. g., two-way communication, practical demonstration and target group instead of "field day") (figure 4) will serve this generalisation.
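The property-based categorisation described above can be sketched as a simple mapping from discipline-specific activity names to generic properties; the activity names and property labels below are illustrative assumptions chosen for the agricultural example, not a fixed vocabulary from the study.

```python
# Illustrative mapping: discipline-specific activity names -> generic
# properties of productive interactions (communication mode, format,
# target group). The entries are assumptions for demonstration only.
ACTIVITY_PROPERTIES = {
    "field day": {
        "communication": "two-way",
        "format": "practical demonstration",
        "target_group": "practitioners",
    },
    "trade journal article": {
        "communication": "one-way",
        "format": "written dissemination",
        "target_group": "practitioners",
    },
}

def properties_of(activity):
    """Return the property-based categorisation of a named activity."""
    return ACTIVITY_PROPERTIES.get(activity, {})

print(properties_of("field day")["format"])  # practical demonstration
```

Because evaluation then operates on the generic properties rather than on names like "field day", the same database structure can be reused in disciplines where the concrete activities differ.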
The extension of evaluation beyond scientific impact requires discussion by a wide range of actors in the science-society system, which is already in progress. Indeed, in our study the interviews and the literature review of criteria were carried out independently. For that reason, the interviews deliver important background information for developing criteria, tools and data assessment, but lack a discussion of existing criteria and tools. This discussion will be conducted in the follow-up project with regard to the data available.
FIGURE 4: First draft of the structure of proposals and reports (left) as an input for the database system shown in figure 3. Exemplified details of information for productive interactions and impact, accessed in the structured reports via text and categories (right).