Open Access

Peer-reviewed

Research Article

Factors influencing plagiarism in higher education: A comparison of German and Slovene students

Roles Conceptualization, Data curation, Formal analysis, Project administration, Supervision, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation Department of Personnel and Education, Faculty of Organizational Sciences, University of Maribor, Kranj, Slovenia

Roles Funding acquisition, Writing – original draft, Writing – review & editing

Affiliation Faculty of Natural Sciences and Mathematics, University of Maribor, Maribor, Slovenia; School of Electronic and Information Engineering, Beihang University, Beijing, China

Roles Conceptualization, Data curation, Investigation, Supervision, Writing – original draft, Writing – review & editing

Affiliation Department of Economics and Law, Frankfurt University of Applied Sciences, Frankfurt, Germany

Roles Data curation, Formal analysis, Investigation, Writing – original draft

Affiliation Department of Methodology, Faculty of Organizational Sciences, University of Maribor, Kranj, Slovenia

Roles Formal analysis, Resources, Writing – original draft

Roles Funding acquisition, Project administration, Supervision, Writing – original draft

Roles Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing

  • Eva Jereb, 
  • Matjaž Perc, 
  • Barbara Lämmlein, 
  • Janja Jerebic, 
  • Marko Urh, 
  • Iztok Podbregar, 
  • Polona Šprajc

PLOS

  • Published: August 10, 2018
  • https://doi.org/10.1371/journal.pone.0202252

Over the past decades, plagiarism has been classified as a multi-layer phenomenon of dishonesty that occurs in higher education. A number of research papers have identified a host of factors, such as gender, socialisation, efficiency gain, motivation for study, methodological uncertainties or easy access to electronic information via the Internet and new technologies, as reasons driving plagiarism. This paper examines whether such factors are still relevant and whether the factors influencing plagiarism differ between German and Slovene students. A quantitative paper-and-pencil survey was carried out in Germany and Slovenia in the 2017/2018 academic year, with a sample of 485 students from higher education institutions. The major findings of this research reveal that easy access to information-communication technologies and the Web is the main reason driving plagiarism. In that regard, there are no significant differences between German and Slovene students in terms of personal factors such as gender, motivation for study, and socialisation. In this sense, digitalisation and the Web outrank national borders.

Citation: Jereb E, Perc M, Lämmlein B, Jerebic J, Urh M, Podbregar I, et al. (2018) Factors influencing plagiarism in higher education: A comparison of German and Slovene students. PLoS ONE 13(8): e0202252. https://doi.org/10.1371/journal.pone.0202252

Editor: Andreas Wedrich, Medizinische Universität Graz, AUSTRIA

Received: May 21, 2018; Accepted: July 6, 2018; Published: August 10, 2018

Copyright: © 2018 Jereb et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: MP was supported by the Slovenian Research Agency (Grant Nos. J1-7009 and P5-0027), http://www.arrs.gov.si/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Many of those who teach in higher education have encountered plagiarism as a form of dishonesty in the classroom. According to the Oxford English Dictionary online (2017), plagiarism is ‘the practice of taking someone else’s work or ideas and passing them off as one’s own’. Perrin, Larkham and Culwin define plagiarism as the use of an author’s words, ideas, reflections and thoughts without proper acknowledgment of the author [ 1 – 3 ]. Koul et al. define plagiarism as a form of cheating and theft since in cases of plagiarism one person takes credit for another person’s intellectual work [ 4 ]. According to Fishman, ‘Plagiarism occurs when someone: 1) uses words, ideas, or work products; 2) attributable to another identifiable person or source; 3) without attributing the work to the source from which it was obtained; 4) in a situation in which there is a legitimate expectation of original authorship; 5) in order to obtain some benefit, credit, or gain which need not be monetary’ [ 5 ]. But why do students use someone else’s words or ideas and pass them off as their own? Which factors influence this behaviour? The main focus of our research is to discover the factors influencing plagiarism and to see whether they differ between German and Slovene students.

Koul et al. pointed out that particular circumstances or events should be considered in the definition of plagiarism since plagiarism may vary across cultures and societies [ 4 ]. Hall has described Eastern cultures (the Middle East, Asia, Africa, South America) and Western cultures (North America and much of Europe) using the idea of ‘context’, which refers to the framework, background, and surrounding circumstances in which an event takes place [ 6 ]. Western societies are generally ‘low context’ societies. In other words, people in Western societies play by external rules (e.g., honour codes against plagiarism), and decisions are based on logic, facts, and directness. Eastern societies are generally ‘high context’ societies, meaning that people in Eastern societies put strong emphasis on relational concerns, and decisions are based on personal relationships. Nisbett et al. have suggested that differences between Westerners and Easterners may arise from people being socialised into different worldviews, cognitive processes and habits of mind [ 7 ]. In Germany, there has been ongoing reflection on academic plagiarism and other dishonest research practices since the late 19th century [ 8 ]. However, according to Ruiperez and Garcia-Cabrero, in Germany, 2011 became a landmark year with the appearance of an extensive public debate about plagiarism—brought back into the limelight because of an investigation into the incumbent German Defence Minister’s doctoral thesis [ 9 ]. Aside from the numerous cases of plagiarism detected in academic work since 2011, several initiatives have enriched the debate on academic plagiarism. For example, the development of a consolidated cooperative textual research methodology using a specific Wiki called ‘VroniPlag’ has made Germany one of the most advanced European countries in terms of combating these practices. Similar to Germany, Slovenia has also paid increased attention to plagiarism in recent years. 
The debate about plagiarism became public after it was discovered that certain Slovene politicians had resorted to academic plagiarism. Today, universities in Slovenia use a variety of tools (Turnitin, plagiarism plug-ins for Moodle, plagiarisma.net, etc.) to detect plagiarism. The focus of this research is to investigate the factors influencing plagiarism and whether these factors differ between Slovene and German students. The research questions (RQ) of the study were divided into three groups:

  • RQ group 1: Which factors influence plagiarism in higher education?
  • RQ group 2: Are there any differences between male and female students regarding factors influencing plagiarism? Are the factors influencing plagiarism connected with specific areas of study (technical sciences, social sciences, natural sciences)?
  • RQ group 3: Does students’ motivation affect the factors influencing plagiarism? Are there any differences between male and female students in this respect?

In addition, for all three research question groups, we also wanted to know if there were any differences between the German and Slovene students.

Theoretical background

Plagiarism is a highly complex phenomenon and, as such, it is likely that there is no single explanation for why individuals engage in plagiarist behaviours [ 10 ]. The situation is often complex and multi-dimensional, with no simple cause-and-effect link [ 11 ].

McCabe et al. noted that individual factors (e.g. gender, average grade, work ethic, self-esteem), institutional factors (e.g., faculty response to cheating, sanction threats, honour codes) and contextual factors (e.g., peer cheating behaviours, peer disapproval of cheating behaviours, perceived severity of penalties for cheating) influence cheating behaviour [ 12 ]. Giluk and Postlethwaite also related individual characteristics and situational factors to cheating—individual characteristics such as gender, age, ability, personality, and extracurricular involvement; and situational factors such as honour codes, penalties, and risk of detection [ 13 ]. The study of Jereb et al. also revealed that specific individual characteristics pertaining to men and women influence plagiarism [ 14 ]. Newstead et al. suggested that gender differences (plagiarism is more frequent among boys), age differences (plagiarism is more frequent among younger students), and academic performance differences (plagiarism is more frequent among lower performers) are specific factors for plagiarism [ 15 ]. Gerdeman stated that the following five student characteristic variables are frequently related to the incidence of dishonest behaviour: academic achievement, age, social activities, study major, and gender [ 16 ].

One of the factors influencing plagiarism could be that students do not have a clear understanding of what constitutes plagiarism and how it can be avoided [ 17 , 18 ]. According to Hansen, students do not fully understand what constitutes plagiarism [ 19 ]. Park cites a genuine lack of understanding as one of the reasons for plagiarism. Some students plagiarise unintentionally, when they are not familiar with proper ways of quoting, paraphrasing, citing and referencing and/or when they are unclear about the meaning of ‘common knowledge’ and the expression ‘in their own words’ [ 11 ].

Furthermore, it is important to remember that, in our current day and age, information is easily accessed through new technologies. In addition, as Koul et al. have stated, the belief that we as people have greater ownership of information than we have paid for may influence attitudes towards plagiarism [ 4 ]. Many other authors have also stated that the Internet has increased the potential for plagiarism, since information is easily accessed through new technologies [ 14 , 20 , 21 , 22 ]. Indeed, the Internet grants easy access to an enormous amount of knowledge and learning materials. This provides an opportunity for students to easily cut, paste, download and plagiarise information [ 21 , 23 ]. Online resources are available 24/7 and enable a flood of information, which is also constantly updated. Given students' ease of access to both digital information and sophisticated digital technologies, several researchers have noted that students may be more likely to ignore academic ethics and to engage in plagiarism than would otherwise be the case [ 24 ].

In a study of the level of plagiarism in higher education, Tayraukham found that students with performance goals were more likely to indulge in plagiarism behaviours than students who wanted to achieve mastery of a particular subject [ 25 ]. Most of the students plagiarised in order to provide the right responses to study questions, with the ultimate goal of getting higher grades—rather than gaining expertise in their subjects of study. Anderman and Midgley observed that a relatively higher performance-oriented classroom climate increases cheating behaviour; while a higher mastery-oriented classroom climate decreases cheating behaviour [ 26 ]. Park also claimed that one of the reasons that students plagiarise is efficiency gain, that is, that students plagiarise in order to get a better grade and save time [ 11 ]. Songsriwittaya et al. stated that what motivates students to plagiarise is the goal of getting good grades and comparing their success with that of their peers [ 27 ]. The study of Ramzan et al. also revealed that the societal and family pressures of getting higher grades influence plagiarism [ 21 ]. Such pressures sometimes push students to indulge in unfair means such as plagiarism as a shortcut to performing better in exams or producing a certain number of publications. Engler et al. and Hard et al. tended to agree with this idea, stating that plagiarism arises out of social norms and peer relationships [ 28 , 29 ]. Park also stated that there are many calls on students’ time, including peer pressure for maintaining an active social life, commitment to college sports and performance activities, family responsibilities, and pressure to complete multiple work assignments in short amounts of time [ 11 ]. Šprajc et al. agreed that students are under an enormous amount of pressure from family, peers, and instructors, to compete for scholarships, admissions, and, of course, places in the job market [ 30 ]. 
This affects students’ time management and can lead to plagiarism. In addition to time pressures, Franklin-Stokes and Newstead found another six major reasons given by students to explain cheating behaviours: the desire to help a friend, a fear of failure, laziness, extenuating circumstances, the possibility of reaping a monetary reward, and because ‘everybody does it’ [ 31 ].

Another common reason for plagiarism is the poor preparation of lecture notes, which can lead to the inadequate referencing of texts [ 32 ]. Šprajc et al. found that too many assignments given within a short time frame push students to plagiarise [ 30 ]. Poor explanations, bad teaching, and dissatisfaction with course content can also drive students to plagiarise. Park highlighted students’ attitudes towards teachers and classes [ 11 ]. Some students cheat because they have negative attitudes towards assignments and tasks that teachers believe to be meaningful but the students do not [ 33 ]. Cheating tends to be more common in classes where the subject matter seems unimportant or uninteresting to students, or where the teacher seems uninterested or permissive [ 16 ].

Park mentioned students’ academic skills (researching and writing skills, knowing how to cite, etc.) as another reason for plagiarism [ 11 ]. New students and international students whose first language is not English need to transition to the research culture by understanding the necessity of doing research, and the practice and skills required to do so, in order to avoid unintentional plagiarism [ 21 ]. According to Park, to some students plagiarism is a tangible way of showing dissent and expressing a lack of respect for authority [ 11 ]. Some students deny to themselves that they are cheating or find ways of legitimising their behaviour by passing the blame on to others. Other factors influencing plagiarism are temptation and opportunity. It is both easier and more tempting for students to plagiarise since the Internet and Web search tools have made information readily accessible and faster to find and copy. In addition, some people believe that since the Internet is free for all and in the public domain, copying from the Internet requires no citation or acknowledgement of the source [ 34 ]. To some students, the benefits of plagiarising outweigh the risks, particularly if they think there is little or no chance of getting caught and little or no punishment if they are indeed caught [ 35 ].

Another factor influencing plagiarism could be higher education institutions’ attitudes towards plagiarism, that is, whether or not they have clear policies regarding plagiarism and its consequences. The effective communication of policies, increased student awareness of penalties, and enforcement of these penalties tend to reduce dishonest behaviour [ 36 ]. Ramzan et al. [ 21 ] mentioned the research of Razera et al., who found that Swedish students and teachers need training to understand and avoid plagiarism [ 37 ]. In order to deal with plagiarism, teachers want and need a clear set of policies regarding detection tools, and extensive training in the use of detection software and systems. According to Ramzan et al., Dawson and Overfield determined that students are aware that plagiarism is bad but that they are not clear on what constitutes plagiarism and how to avoid it [ 21 , 38 ]. In Dawson and Overfield’s study, students required teachers to also observe the rules set up to avoid plagiarism and to be kept consistently aware of plagiarism, in order to enforce the university’s resolve to control this academic misconduct.

Based on this literature review and our experience in higher education teaching, we determined that the following factors influence plagiarism: students’ individual factors, information-communication technologies (ICT) and the Web, regulation, students’ academic skills, teaching factors, different forms of pressure, student pride, and other reasons. The statements used in the instrument we developed and the results of our research are presented in the following sections.

Participants

The paper-and-pencil survey was carried out in the 2017/18 academic year at the University of Maribor in Slovenia and at the Frankfurt University of Applied Sciences in Germany. Students were verbally informed of the nature of the research and invited to participate freely. They were assured of anonymity. The study was approved by the Ethical Committee for Research in Organizational Sciences at the Faculty of Organizational Sciences, University of Maribor.

A sample of 191 students from Slovenia (SLO) (99 males (51.8%) and 92 females (48.2%)) and 294 students from Germany (GER) (115 males (39.1%) and 171 females (58.2%)) participated in this study. Slovene students’ ages ranged from 19 to 36 years, with a mean of 21 years and 1 month (M = 21.12, SD = 1.770), and German students’ ages ranged from 18 to 40 years, with a mean of 22 years and 10 months (M = 22.84, SD = 3.406). About half (49.2%) of the Slovene participants were social sciences students, 34.9% were technical sciences students, and 15.9% were natural sciences students. More than half (58.5%) of the German participants were social sciences students, 32% were technical sciences students and 2% were natural sciences students. More than half of the Slovene students (53.4%) attended blended learning, and 46.6% attended classic learning. The majority of German students (87.8%) attended classic learning, and 6.8% attended blended learning. More than half of the Slovene students (61.6%) were working at the time of the study, and 39.8% of all participants had scholarships. In Germany, more than half the students (65.0%) were also working at the time of the study, but only 10.2% of all the German participants had scholarships. More than two thirds (68.9%) of the Slovene students were highly motivated for study and 31.1% less so; 32.6% of the students spent 2 or fewer hours per day on the Internet, 41.6% spent between 2 and 5 hours, and 25.8% spent 5 or more hours. Similarly, more than two thirds (73.1%) of the German students were highly motivated for study and 23.8% less so; 33.3% of the students spent 2 or fewer hours per day on the Internet, 32.3% spent between 2 and 5 hours, and 27.9% spent 5 or more hours. The general data can be seen in S1 Table.

The questionnaire contained closed questions referring to: (i) general/individual data (gender, age, area of study, method of study, working status, scholarship, motivation for study, average time spent on the internet), and factors influencing plagiarism (ii) ICT and Web, (iii) regulation, (iv) academic skills, (v) teaching factors, (vi) pressure, (vii) pride, (viii) other reasons. The items in the groups (ii) to (viii) used a 5-point Likert scale from strongly disagree (1) to strongly agree (5), with larger values indicating stronger orientation.

The statements used in the survey were as follows:

  • 1.1 It is easy for me to copy/paste due to contemporary technology
  • 1.2 I do not know how to cite electronic information
  • 1.3 It is hard for me to keep track of information sources on the web
  • 1.4 I can easily access research material using the Internet
  • 1.5 Easy access to new technologies
  • 1.6 I can easily translate information from other languages
  • 1.7 I can easily combine information from multiple sources
  • 1.8 It is easy to share documents, information, data
  • 2.1 There is no teacher control on plagiarism
  • 2.2 There is no faculty regulation against plagiarism
  • 2.3 There is no university regulation against plagiarism
  • 2.4 There are no penalties
  • 2.5 There are no honour codes relating to plagiarism
  • 2.6 There are no electronic systems of control
  • 2.7 There is no systematic tracking of violators
  • 2.8 I will not get caught
  • 2.9 I am not aware of penalties
  • 2.10 I do not understand the consequences
  • 2.11 The penalties are minor
  • 2.12 The gains are higher than the losses
  • 3.1 I run out of time
  • 3.2 I am unable to cope with the workload
  • 3.3 I do not know how to cite
  • 3.4 I do not know how to find research materials
  • 3.5 I do not know how to research
  • 3.6 My reading comprehension skills are weak
  • 3.7 My writing skills are weak
  • 3.8 I sometimes have difficulty expressing my own ideas
  • 4.1 The tasks are too difficult
  • 4.2 Poor explanation—bad teaching
  • 4.3 Too many assignments in a short amount of time
  • 4.4 Plagiarism is not explained
  • 4.5 I am not satisfied with course content
  • 4.6 Teachers do not care
  • 4.7 Teachers do not read students' assignments
  • 5.1 Family pressure
  • 5.2 Peer pressure
  • 5.3 Under stress
  • 5.4 Faculty pressure
  • 5.5 Money pressure
  • 5.6 Afraid to fail
  • 5.7 Job pressure
  • 6.1 I do not want to look stupid in front of peers
  • 6.2 I do not want to look stupid in front of my professor
  • 6.3 I do not want to embarrass my family
  • 6.4 I do not want to embarrass myself
  • 6.5 I focus on how my competences will be judged relative to others
  • 6.6 I am focused on learning according to self-set standards
  • 6.7 I fear asking for help
  • 6.8 My fear of performing poorly motivates me to plagiarise
  • 6.9 Assigned academic work will not help me personally/professionally
  • 7.1 I do not want to work hard
  • 7.2 I do not want to learn anything, just pass
  • 7.3 My work is not good enough
  • 7.4 It is easier to plagiarise than to work
  • 7.5 To get a better/higher mark (score)
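
The numbering of the statements above encodes the factor group in its prefix (1 for ICT and Web, 2 for regulation, and so on). The following is a minimal sketch of how one respondent's answers could be averaged per group. It assumes that a group score is the plain mean of the answered statements in that group, which the text does not state explicitly; the function name `group_scores` and the example answers are hypothetical.

```python
from statistics import mean

# Factor groups as named in the questionnaire description; the mapping from
# statement prefix to group follows the numbering of the statements.
GROUPS = {
    1: "ICT and Web",
    2: "Regulation",
    3: "Academic skills",
    4: "Teaching factors",
    5: "Pressure",
    6: "Pride",
    7: "Other reasons",
}


def group_scores(responses):
    """Average one respondent's 1-5 Likert answers within each factor group.

    `responses` maps statement ids such as "1.4" to an integer from 1 to 5.
    Unanswered statements are simply absent from the mapping.
    Assumption: a group score is the unweighted mean of its answered items.
    """
    by_group = {}
    for statement_id, value in responses.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{statement_id}: {value} is outside the 1-5 scale")
        prefix = int(statement_id.split(".")[0])
        by_group.setdefault(GROUPS[prefix], []).append(value)
    return {name: mean(values) for name, values in by_group.items()}


# Hypothetical respondent (illustrative values only, not survey data).
example = {"1.1": 4, "1.4": 5, "2.8": 2, "2.11": 1, "5.3": 3}
scores = group_scores(example)
```

Groups with no answered statements are left out of the result rather than reported as zero, so missing answers do not pull a group average down.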

All statistical tests were performed with SPSS at the significance level of 0.05. Parametric tests (Independent–Samples t-Test and One-Way ANOVA) were selected for normal and near-normal distributions of the responses. Nonparametric tests (Mann-Whitney Test, Kruskal-Wallis Test, Friedman’s ANOVA) were used for significantly non-normal distributions. Chi-Square Test was used to investigate the independence between variables.
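
The choice between parametric and nonparametric tests described above can be sketched as follows. The analysis in the study was run in SPSS; this sketch uses SciPy instead. The text does not say how normality was judged, so the Shapiro-Wilk check here is an assumption, and the helper name `compare_two_groups` and the data are hypothetical.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level stated in the text


def compare_two_groups(a, b, alpha=ALPHA):
    """Pick a t-test or a Mann-Whitney test based on a normality check.

    Assumption: normality is judged with Shapiro-Wilk on each group; the
    source does not name the criterion that was actually used.
    """
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        result = stats.ttest_ind(a, b)
        name = "independent-samples t-test"
    else:
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "Mann-Whitney test"
    return name, result.statistic, result.pvalue


# Synthetic group averages (illustrative only, not survey data).
rng = np.random.default_rng(0)
group_a = rng.normal(3.7, 0.6, size=100)
group_b = rng.normal(3.5, 0.6, size=100)
test_name, statistic, p_value = compare_two_groups(group_a, group_b)
```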

The average values (and standard deviations) of the responses for each group of factors influencing plagiarism can be seen in Table 1 (descriptive statistics for all statements can be seen in S2 Table), shown separately for Slovene and German students. An Independent Samples t-test was conducted to compare the average values of the responses and thus evaluate for which statements these differed significantly between the Slovene and German students.

https://doi.org/10.1371/journal.pone.0202252.t001

According to the Friedman’s ANOVA (see Table 2 ), the Slovene students’ factors influencing plagiarism can be formed into four homogeneous subsets, where in each subset, the distributions of the average values for the responses are not significantly different. At the top of the list is the existence of ICT and the Web (group 1). The second subset consists of teaching factors (group 4). The third subset is composed of academic skills, other reasons, and pride, in order from highest to lowest (groups 3, 7 and 6). The fourth subset is composed of other reasons, pride, pressure, and regulation, respectively (groups 7, 6, 5 and 2).

https://doi.org/10.1371/journal.pone.0202252.t002

For the Slovene students, ICT and the Web were detected as the dominant factors influencing plagiarism and, as such, we investigated them in greater detail. A Friedman Test ( Chi-Square = 7.180, p = .066) confirmed that the distributions of the responses to the statements 1.1, 1.4, 1.5 and 1.8—those with the highest sample means—are not significantly different. Consequently, the average values (means) of the responses to the statements 1.1, 1.4, 1.5 and 1.8 are not significantly different. The average values of the responses for all the other statements (1.7, 1.6, 1.2, and 1.3 listed in the descending order of sample means) are significantly lower. A Mann-Whitney Test showed that there is no statistically significant difference between the distributions of the responses in the group of ICT and Web reasons considering gender (male, female) and motivation for study (lower, higher). For statement 1.2, a Kruskal-Wallis Test ( Chi-Square = 7.466, p = .024) confirmed that there are different distributions for the responses when the area of study is considered (technical sciences, social sciences, natural sciences).
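
The two nonparametric tests used in this paragraph can be sketched with SciPy as below. The Friedman test compares several statements answered by the same respondents, and the Kruskal-Wallis test compares one statement across independent groups. All data here are synthetic, and the group sizes are hypothetical; nothing in this block reproduces the survey results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic 1-5 Likert answers: 50 hypothetical respondents x 4 statements
# (standing in for statements such as 1.1, 1.4, 1.5 and 1.8).
answers = rng.integers(1, 6, size=(50, 4))

# Friedman test: related samples, one argument per statement (column).
friedman = stats.friedmanchisquare(*answers.T)

# Kruskal-Wallis test: one statement compared across independent groups.
# Hypothetical area-of-study labels with arbitrary group sizes.
area = np.array(["technical"] * 20 + ["social"] * 20 + ["natural"] * 10)
item = answers[:, 0]
samples = [item[area == label] for label in ("technical", "social", "natural")]
kruskal = stats.kruskal(*samples)

print(f"Friedman: chi-square = {friedman.statistic:.3f}, p = {friedman.pvalue:.3f}")
print(f"Kruskal-Wallis: H = {kruskal.statistic:.3f}, p = {kruskal.pvalue:.3f}")
```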

According to the Friedman’s ANOVA (see Table 3 ), the German students’ factors influencing plagiarism can be formed into five homogeneous subsets, where in each subset, the distributions of the average values for the responses are not significantly different. At the top of the list is the existence of ICT and the Web (group 1). The second subset is composed of pressure and pride, in order from highest to lowest (groups 5 and 6). The third subset consists of pride, teaching factors and other reasons, respectively (groups 6, 4 and 7). The fourth subset is composed of teaching factors, other reasons and academic skills, in order from highest to lowest (groups 4, 7 and 3). Finally, the last subset consists of regulation (group 2).

https://doi.org/10.1371/journal.pone.0202252.t003

As with the Slovene students, ICT and the Web were detected as the dominant factors influencing plagiarism for the German students. A Friedman Test (Chi-Square = 5.815, p = .055) confirmed that the distributions of the responses to the statements 1.4, 1.5 and 1.8 (those with the highest sample means) are not significantly different. Consequently, the average values (means) of the responses to the statements 1.4, 1.5 and 1.8 are not significantly different. The average values of the responses for all the other statements (1.1, 1.7, 1.6, 1.2, and 1.3, listed in descending order of sample means) are significantly lower. A Wilcoxon Signed Ranks Test also confirmed that the distributions of the responses to the statements 1.6 and 1.7 are not statistically significantly different (Z = -0.430, p = .667). The same holds for statements 1.2 and 1.3 (Z = -0.407, p = .684). A Mann-Whitney Test showed that there is no statistically significant difference between the distributions of the responses in the group of ICT and Web reasons considering gender (male, female), area of study (technical and social sciences; students of natural sciences were omitted due to the small sample size) and motivation for study.

ICT and Web reasons were detected as the dominant factors influencing plagiarism for Slovene and German students. As can be seen in Table 1, there are significant differences (t = 4.177, p < .001) between the Slovene and German students regarding this factor. It seems that the Slovene students (M = 3.69, SD = 0.56) attribute greater importance to the ICT and Web reasons than the German students (M = 3.47, SD = 0.55). There are also significant differences (t = 5.137, p < .001) between the Slovene and German students regarding regulation. It seems that the Slovene students (M = 2.35, SD = 0.63) attribute greater importance to regulation reasons than the German students (M = 2.05, SD = 0.61). Both, however, consider this factor to have the lowest impact on plagiarism overall. There are no significant differences (t = 1.939, p = .053) between the Slovene students (M = 2.56, SD = 0.67) and the German students (M = 2.44, SD = 0.68) regarding academic skills. The Slovene students (M = 2.87, SD = 0.68) attribute greater importance to teaching factors than the German students (M = 2.56, SD = 0.72). The differences are significant (t = 4.827, p < .001). There are significant differences (t = -3.522, p < .001) between the Slovene and German students regarding pressure, with the German students (M = 2.71, SD = 0.91) attributing greater importance to this reason than the Slovene students (M = 2.42, SD = 0.86). The same goes for pride. The German students (M = 2.67, SD = 0.80) attribute greater importance to pride reasons than the Slovene students (M = 2.43, SD = 0.84). The differences are significant (t = -3.032, p = .003). There are no significant differences (t = -0.836, p = .404) between the Slovene students (M = 2.47, SD = 0.82) and the German students (M = 2.54, SD = 0.94) regarding other factors influencing plagiarism.
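
An independent-samples t-test of this kind can be approximated from the reported summary statistics alone. The sketch below uses the ICT and Web means and standard deviations given above with SciPy's `ttest_ind_from_stats`. It assumes that all 191 Slovene and 294 German participants answered and that equal variances were assumed; the number of valid responses per factor is not stated in this section, and the reported means are rounded, so the result is only expected to land near the reported t value, not to match it exactly.

```python
from scipy import stats

# Reported summary statistics for the ICT and Web factor.
# Assumption: group sizes equal the full samples (191 and 294); the actual
# number of valid responses per factor is not given in this section.
result = stats.ttest_ind_from_stats(
    mean1=3.69, std1=0.56, nobs1=191,  # Slovene students
    mean2=3.47, std2=0.55, nobs2=294,  # German students
    equal_var=True,  # assumption: pooled-variance (Student's) t-test
)
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```

A positive t here means the first group (Slovene students) has the higher mean, which matches the sign convention of the values reported in the paragraph.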

We conducted an Independent Samples t-test to compare the average time (in hours) spent per day on the Internet by the Slovene students with that of the German students. The test was significant, t = -2.064, p = .004. The Slovene students on average spent less time on the Internet ( M = 3.52, SD = 2.23) than the German students ( M = 4.09, SD = 3.72).

The average values of the responses for individual statements according to gender (male, female) and the significances for the t-test of equality of means are shown in S3 Table for the Slovene students and in S4 Table for the German students. The average values of the responses for these statements are significantly different. They are higher for males than for females (except in the case of statement 3.8 for the Slovene students and 4.1 for the German students). Slovene and German male students think that they will not get caught and that the gains are higher than the losses. Both also think that teachers do not read students’ assignments.

The average values of the responses for individual statements according to area of study (technical sciences, social sciences, natural sciences) and the results for ANOVA for the Slovene students are shown in S5 Table . Gabriel's post hoc test was used to confirm the differences between groups. The significant difference between the students of technical sciences and the students of social sciences was confirmed for all statements listed in S5 Table . There were higher average values of responses for the students of technical sciences. The only significant difference between the students of technical sciences and the students of natural sciences was confirmed for statement 5.6 (there were higher average values of responses for the students of technical sciences). No other pairs of group means were significantly different.

The average values of the responses for individual statements according to area of study (technical sciences, social sciences) and the significances for the t-test of equality of means for German students are shown in S6 Table . For German students, only technical and social sciences were considered because of the low number of natural sciences students. The average values of responses for these statements are significantly different. They were higher for the students of technical sciences than for the students of social sciences.

The average values of the responses for individual statements according to the motivation of the students (lower, higher) and the significances for t-Test of equality of means are shown in S7 Table for the Slovene students and in S8 Table for the German students. The average values of the responses for these statements are significantly different. They were higher for students with lower motivation for both groups of students, except in the case of statements 2.1 and 6.6 for Slovene students.

We conducted an Independent Samples t-test to compare the average time (in hours) spent per day on the Internet by groups of low motivated students with groups of highly motivated students. For Slovene students, the test was not significant, t = -1.423, p = .156. For German students, the test was significant, t = 2.298, p = .024. Students with lower motivation for study ( M = 5.24, SD = 4.84) on average spent more time on the Internet than those with higher motivation for study ( M = 3.76, SD = 3.27).

The Chi-Square Test of Independence was used to determine whether there is an association between gender (male, female) and motivation for study (lower, higher). There was a significant association between gender and motivation for the Slovene students ( Chi-Square = 4.499, p = .034). Indeed, it was more likely for females to have a high motivation for study (76.9%) than for males to have a high motivation for study (61.6%). For the German students, the test was not significant ( Chi-Square = 0.731, p = .393).
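The test above can be sketched in a few lines. The snippet below is a minimal illustration, assuming a Pearson chi-square without continuity correction on a 2 x 2 table (gender by motivation), which has one degree of freedom. The cell counts in the example table are hypothetical, because the raw counts are not given in this passage; the two p-values, however, are computed directly from the reported chi-square statistics.

```python
import math

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    # A chi-square variable with 1 df is a squared standard normal variable.
    return math.erfc(math.sqrt(stat / 2))

# HYPOTHETICAL counts: rows = (male, female), columns = (lower, higher) motivation.
example_stat, example_dof = chi_square_statistic([[10, 20], [20, 10]])

# p-values recomputed from the reported statistics.
p_slovene = p_value_df1(4.499)
p_german = p_value_df1(0.731)
print(f"Slovene: p = {p_slovene:.3f}; German: p = {p_german:.3f}")
```

The recomputed p-values can be compared with the reported ones as a quick consistency check.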

In this study, we aimed to explore the factors that influence plagiarism among students. An international comparison between German and Slovene students was made. Our research draws on students from two universities from the two considered countries that cover all traditional subjects of study. In this regard the conclusions are representative and statistically relevant, although we of course cannot exclude the possibility of small deviations if other or more institutions were considered. Taken as a whole, there are no major differences between German and Slovene students when it comes to motivation for study and working habits. In both cases, more than two thirds of the students were highly motivated for study and more than 60% were working during their time of study. About 33% of the surveyed students spend on average two or fewer hours a day on the Internet, and about one quarter spend on average more than five hours a day on the Internet.

When it comes to explaining plagiarism in higher education, the German and Slovene students equally indicated the ease of use of information-communication technologies and the Web as the top cause for their behaviour. This is consistent with current contributions to the topic of plagiarism worldwide. Indeed, our findings reinforce the notion that new technologies and the Web have a strong influence on students and are the main driver behind plagiarism [20, 21, 22]. An academic moral panic has been caused by the arrival in higher education of a new generation of younger students [39], deemed to be ‘digital natives’ [40] and allegedly endowed with an inherent ability for using information-communication technologies (ICT). This younger generation is dubbed ‘Generation Me’ [41], and it is believed that their expectations, interactions and learning processes have been affected by ICT. Introna et al., Ma et al., and Yeo agree that the understanding of the concept of plagiarism through the use of ICT is the main contributor to it being a problem [42, 43, 44]. The effortless use of ICT such as the Internet has made it easy for students to retrieve information with a simple click of the mouse [45, 46].

The Slovene students in our study nominated the teaching factor as the second most important reason for plagiarism. This result is also found in other studies, namely those of Šprajc et al. [30] and Barnas [47]. Young people in Slovenia are, as in other Western societies, given a prolonged period of identity exploration and self-focus, i.e., freedom from institutional demands and obligations, competence, and freedom to decide for themselves [48, 49]. The results for the German students, however, contradict the finding that teaching factors are among the most important factors influencing plagiarism. Indeed, after ICT and Web reasons, the top two factors influencing plagiarism for the German students are pressure and pride, not teaching factors. Overall though, the findings for both the German students and the Slovene students are in line with, e.g., Koul et al., who suggest that factors influencing plagiarism may vary across cultures [4]. Among the German students, pressure and pride rank second and third in importance, which is consistent with Rothenberg, who stated that in Germany today ‘pride could be expressed for individual accomplishments’ [50]. As far as the Slovene students are concerned, Kondrič et al. presumed that there is a specific set of values in Slovenia, which perhaps intensifies the distinction between the collectivist culture of former socialist countries and the individualism of Western countries [51]. This might shed light on why the Slovene students consider teaching factors to be among the most important factors influencing plagiarism.

Furthermore, several studies have implied that individual characteristics, especially gender, play an important role when it comes to plagiarism [12, 13, 15, 16]. A number of studies from around the world have shown that men plagiarise more frequently than women do. For example, reviews of North American research into conventional plagiarism have indicated that male students cheat more often than female students [12]. The results we found are broadly in line with these findings. Since the average values of responses are significantly different for male and female students, gender seems to play an important role in terms of plagiarism.

Park pointed out that one reason for plagiarism is efficiency gain [ 11 ]. About 15 years after this statement, the study at hand is empirical evidence that efficiency gain due to different forms of pressure is still a factor that influences students’ behaviour in terms of plagiarism. Lack of knowledge and uncertainties about methodologies are additional factors that are frequently recognized as reasons for plagiarism [ 11 , 17 , 18 ]. The results at hand support these studies since the responses about e.g. academic skills demonstrate students’ lack of knowledge.

Another interesting finding of our study shows that students with a lower motivation for study spend more time on the Internet, which complements our finding that the Internet is one of the simplest solutions for studying. The German students showed a somewhat higher level of motivation to study than the Slovene students, but the difference is not statistically significant.

We would nevertheless like to draw attention to the perceived difference in the factors influencing plagiarism: teaching factors and academic skills for the Slovene students, and pride and pressure for the German students. The perceived difference between students is one of the social dimensions that can serve as a tool to promote true motivation for study and proper orientation without ethically disputable solutions (such as plagiarism). It therefore makes sense to guide and educate students in the use of information technology from the beginning of their education, building responsible individuals who do not treat technology and the Internet as a negative tool for studying and succeeding, but as an aid to solving problems and making decisions in the right way. The main aim of this research into Slovene and German students was to increase understanding of students’ attitudes towards plagiarism and, above all, to identify the reasons that lead students to plagiarise. On this basis, we want to show how the promotion of non-plagiarism can be developed in a way that is more acceptable and more understandable in each country and adequately controlled on a personal and institutional level.

Conclusions

In contrast to a number of preliminary studies, the major findings of this research paper indicate that new technologies and the Web have a strong and significant influence on plagiarism, whereas in this specific context gender and socialisation factors do not play a significant role. Since the majority of the students in our study believe that new technologies and the Web have a strong influence on plagiarism, we can assume that technological progress and globalisation have started breaking down national frontiers and crossing cultural boundaries. These findings have also created the impression that at universities the gender gap is not predominant in all areas as it might be in society.

Nevertheless, some minor results in our study indicate that there are still some differences between Slovene and German students. For example, it seems like in Slovenia, teaching factors have a greater influence on plagiarism than in Germany. Indeed, in Germany, the focus should rest on the implementation and publication of a code of ethics, and on training students to deal with pressure.

This research focuses on only two countries, Slovenia and Germany. Thus, the findings at hand are not necessarily generalizable, though they do manifest a certain trend in terms of the reasons why students resort to plagiarism. Furthermore, the results could be a starting point for additional comparative studies between different European regions. In particular, further research into the influence of digitalization and the Web on plagiarism, and the role of socialisation and gender factors on plagiarism, could contribute to the discourse on plagiarism in higher education institutions.

Understanding the reasons behind plagiarism and fostering awareness of the issue among students might help prevent future academic misconduct through increased support and guidance during students’ time studying at the university. In this sense, further reflection on preventive measures is required. Indeed, rather than focusing on the detection of plagiarism, focusing on preventive measures could have a positive effect on good scientific practice in the near future.

Supporting information

S1 Table. Frequency distributions of the study variables.

https://doi.org/10.1371/journal.pone.0202252.s001

S2 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by nationality and results of the t-Test.

https://doi.org/10.1371/journal.pone.0202252.s002

S3 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by gender and results of the t-Test (SLO).

https://doi.org/10.1371/journal.pone.0202252.s003

S4 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by gender and results of the t-Test (GER).

https://doi.org/10.1371/journal.pone.0202252.s004

S5 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by area of study and results of the One-Way ANOVA (SLO).

https://doi.org/10.1371/journal.pone.0202252.s005

S6 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by study area and results of the t-Test (GER).

https://doi.org/10.1371/journal.pone.0202252.s006

S7 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by motivation and results of the t-Test (SLO).

https://doi.org/10.1371/journal.pone.0202252.s007

S8 Table. Descriptive statistics for items referring to the factors influencing plagiarism, by motivation and results of the t-Test (GER).

https://doi.org/10.1371/journal.pone.0202252.s008

S1 File. Individual data.

https://doi.org/10.1371/journal.pone.0202252.s009

  • 1. Perrin R. Pocket guide to APA style. 3rd ed. Boston, MA: Wadsworth; 2009.
  • 5. Fishman T. We Know it When We See it is not Good Enough: Toward a Standard Definition of Plagiarism that Transcends Theft, Fraud, and Copyright. Paper presented at the 4th Asia Pacific Conference on Educational Integrity, NSW, Australia. 2009. Available from: http://www.bmartin.cc/pubs/09-4apcei/4apcei-Fishman.pdf
  • 6. Hall TE. Beyond culture. New York: Anchor Books; 1979.
  • 8. Schwinges RC. (Ed.). Examen, Titel, Promotionen. Akademisches und Staatliches Qualifikationswesen vom 13. bis zum 21. Jahrhundert [Examinations, Titles, Doctorates, Academic and Government Qualifications from the 13th to the 21st Century]. Basilea: Schwabe; 2007. https://doi.org/10.1093/acprof:osobl/9780199694-044.003.0009
  • 16. Gerdeman RD. Academic dishonesty and the community college, ERIC Digest, ED447840; 2000. Available from: https://www.ericdigests.org/2001-3/college.htm
  • 27. Songsriwittaya A, Kongsuwan S, Jitgarum K, Kaewkuekool S, Koul R. Engineering Students' Attitude towards Plagiarism a Survey Study. Korea: ICEE & ICEER; 2009.
  • 37. Razera D, Verhage H, Pargman TC, Ramberg R. Plagiarism awareness, perception, and attitudes among students and teachers in Swedish higher education-a case study. Paper Presented at the 4th International Plagiarism Conference-Towards an authentic future. Newcastle Upon Tyne, UK. 2010. Available from http://www.plagiarismadvice.org/researchpapers/item/plagiarism-awareness
  • 39. Bennett S, Maton K. Intellectual field or faith-based religion: moving on from the idea of ‘digital natives’. In Thomas M, editor. Deconstructing digital natives. London: Routledge; 2011. pp. 169–185.
  • 40. Prensky M. Digital wisdom and homo sapiens digital. In Thomas M, editor. Deconstructing digital natives. London: Routledge; 2011. pp. 15–29.
  • 42. Introna L, Hayes N, Blair L, Wood E. Cultural attitudes towards plagiarism. Lancaster: University of Lancaster; 2003.
  • 48. Puklek-Levpušček M, Zupančič M. Slovenia. In Arnett JJ, editor. International encyclopedia on adolescence. New York: Routledge; 2007. pp. 866–877. 10.1007/s10734-011-9481-4 .
  • 49. Zupančič M. Razvojno obdobje prehoda v odraslost—temeljne značilnosti. [Developmental period of transition to adulthood—basic characteristics]. In Puklek-Levpušček M, Zupančič M, editors. Študenti na prehodu v odraslost [Students in transition to adulthood]. Ljubljana: Znanstveno raziskovalni inštitut Filozofske fakultete; 2011. pp. 9–38.

  • Brief Communication
  • Published: 02 August 2023

Modern threats in academia: evaluating plagiarism and artificial intelligence detection scores of ChatGPT

  • Andrea Taloni 1 ,
  • Vincenzo Scorcia   ORCID: orcid.org/0000-0001-6826-7957 1 &
  • Giuseppe Giannaccare 1  

Eye, volume 38, pages 397–400 (2024)


Plagiarism and research integrity are sensitive issues in the academic setting, especially after the recent emergence of artificial intelligence (AI) and large language models (LLMs) such as GPT-4.0 [1]. As the popularity of ChatGPT increases, some authors have attempted to write abstracts and full-text articles using AI, obtaining essays that resemble genuine scientific papers [2, 3, 4]. Detection systems for AI-generated texts have recently been developed. Miller et al. performed AI detection on a large sample of abstracts belonging to articles published between 2020 and 2023, reporting a significant increase in AI-assisted writing [5].

We evaluated herein the plagiarism and AI-detection scores of GPT-4.0 when paraphrasing original scientific essays, and furthermore tested methods that could possibly evade AI detection.


1. OpenAI, https://openai.com/. Accessed June 2023.

2. Gao CA, Howard FM, Markov NS, Dyer EC, Ramesh S, Luo Y, et al. Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. npj Digital Med. 2023;6:1–5.

3. Else H. Abstracts written by ChatGPT fool scientists. Nature. 2023;613:423.

4. Májovský M, Černý M, Kasal M, Komarc M, Netuka D. Artificial intelligence can generate fraudulent but authentic-looking scientific medical articles: pandora’s box has been opened. J Med Internet Res. 2023;25:e46924. https://doi.org/10.2196/46924.

5. Miller LE, Bhattacharyya D, Miller VM, Bhattacharyya M. Recent trend in artificial intelligence-assisted biomedical publishing: a quantitative bibliometric analysis. Cureus. 2023;15:e39224. https://doi.org/10.7759/CUREUS.39224.

Author information

Authors and affiliations.

Department of Ophthalmology, University Magna Graecia of Catanzaro, Catanzaro, Italy

Andrea Taloni, Vincenzo Scorcia & Giuseppe Giannaccare


Contributions

Conceptualization, AT and GG; Methodology, AT, GG and VS; Validation, GG and VS; Formal Analysis, AT, and GG; Investigation, AT; Data Curation, AT; Writing—Original Draft Preparation, AT and GG; Writing—Review and Editing, AT, GG and VS; Visualization, AT, GG and VS; Supervision, GG and VS; Project Administration, AT, GG and VS. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Giuseppe Giannaccare .

Ethics declarations

Competing interests.

The authors declare no competing interests.


About this article

Cite this article.

Taloni, A., Scorcia, V. & Giannaccare, G. Modern threats in academia: evaluating plagiarism and artificial intelligence detection scores of ChatGPT. Eye 38, 397–400 (2024). https://doi.org/10.1038/s41433-023-02678-7


Received: 18 July 2023

Accepted: 18 July 2023

Published: 02 August 2023

Issue Date: February 2024

DOI: https://doi.org/10.1038/s41433-023-02678-7


Academic Plagiarism Detection: A Systematic Literature Review

ACM Comput. Surv., Vol. 52, No. 6, Article 112, Publication date: October 2019. DOI: https://doi.org/10.1145/3345317

This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of academic plagiarism, and computational plagiarism detection methods. We show that academic plagiarism detection is a highly active research field. Over the period we review, the field has seen major advances regarding the automated detection of strongly obfuscated and thus hard-to-identify forms of academic plagiarism. These improvements mainly originate from better semantic text analysis methods, the investigation of non-textual content features, and the application of machine learning. We identify a research gap in the lack of methodologically thorough performance evaluations of plagiarism detection systems. Concluding from our analysis, we see the integration of heterogeneous analysis methods for textual and non-textual content features using machine learning as the most promising area for future research contributions to improve the detection of academic plagiarism further.

ACM Reference format: Tomáš Foltýnek, Norman Meuschke, and Bela Gipp. 2019. Academic Plagiarism Detection: A Systematic Literature Review. ACM Comput. Surv. 52, 6, Article 112 (October 2019), 42 pages. https://doi.org/10.1145/3345317

INTRODUCTION

Academic plagiarism is one of the severest forms of research misconduct (a “cardinal sin”) [ 14 ] and has strong negative impacts on academia and the public. Plagiarized research papers impede the scientific process, e.g., by distorting the mechanisms for tracing and correcting results. If researchers expand or revise earlier findings in subsequent research, then papers that plagiarized the original paper remain unaffected. Wrong findings can spread and affect later research or practical applications [ 90 ]. For example, in medicine or pharmacology, meta-studies are an important tool to assess the efficacy and safety of medical drugs and treatments. Plagiarized research papers can skew meta-studies and thus jeopardize patient safety [ 65 ].

Furthermore, academic plagiarism wastes resources. For example, Wager [ 261 ] quotes a journal editor stating that 10% of the papers submitted to the respective journal suffered from plagiarism of an unacceptable extent. In Germany, the ongoing crowdsourcing project VroniPlag 1 has investigated more than 200 cases of alleged academic plagiarism (as of July 2019). Even in the best case, i.e., if the plagiarism is discovered, reviewing and punishing plagiarized research papers and grant applications still causes a high effort for the reviewers, affected institutions, and funding agencies. The cases reported in VroniPlag showed that investigations into plagiarism allegations often require hundreds of work hours from affected institutions.

If plagiarism remains undiscovered, then the negative effects are even more severe. Plagiarists can unduly receive research funds and career advancements as funding agencies may award grants for plagiarized ideas or accept plagiarized research papers as the outcomes of research projects. The artificial inflation of publication and citation counts through plagiarism can further aggravate the problem. Studies showed that some plagiarized papers are cited at least as often as the original [ 23 ]. This phenomenon is problematic, since citation counts are widely used indicators of research performance, e.g., for funding or hiring decisions.

From an educational perspective, academic plagiarism is detrimental to competence acquisition and assessment. Practicing is crucial to human learning. If students receive credit for work done by others, then an important extrinsic motivation for acquiring knowledge and competences is reduced. Likewise, the assessment of competence is distorted, which again can result in undue career benefits for plagiarists.

The problem of academic plagiarism is not new but has been present for centuries. However, the rapid and continuous advancement of information technology (IT), which offers convenient and instant access to vast amounts of information, has made plagiarizing easier than ever. At the same time, IT also facilitated the detection of academic plagiarism. As we present in this article, hundreds of researchers address the automated detection of academic plagiarism and publish hundreds of research papers a year.

The high intensity and rapid pace of research on academic plagiarism detection make it difficult for researchers to get an overview of the field. Published literature reviews alleviate the problem by summarizing previous research, critically examining contributions, explaining results, and clarifying alternative views [ 212 , 40 ]. Literature reviews are particularly helpful for young researchers and researchers who newly enter a field. Often, these two groups of researchers contribute new ideas that keep a field alive and advance the state of the art.

In 2013, we provided a first descriptive review of the state of the art in academic plagiarism detection [ 160 ]. Given the rapid development of the field, we see the need for a follow-up study to summarize the research since 2013. Therefore, this article provides a systematic qualitative literature review [ 187 ] that critically evaluates the capabilities of computational methods to detect plagiarism in academic documents and identifies current research trends and research gaps.

The literature review at hand answers the following research questions:

  • Did researchers propose conceptually new approaches for this task?
  • Which improvements to existing detection methods have been reported?
  • Which research gaps and trends for future research are observable in the literature?

To answer these questions, we organize the remainder of this article as follows. The section Methodology describes our procedure and criteria for data collection. The following section, Related Literature Reviews, summarizes the contributions of our review compared to topically related reviews published since 2013. The section Overview of the Research Field describes the major research areas in the field of academic plagiarism detection. The section Definition and Typology of Plagiarism introduces our definition and a three-layered model for addressing plagiarism (methods, systems, and policies). The section Review of Plagiarism Typologies synthesizes the classifications of plagiarism found in the literature into a technically oriented typology suitable for our review. The section Plagiarism Detection Methods is the core of this article. For each class of computational plagiarism detection methods, the section provides a description and an overview of research papers that employ the method in question. The section Plagiarism Detection Systems discusses the application of detection methods in plagiarism detection systems. The Discussion section summarizes the advances in plagiarism detection research and outlines open research questions.

METHODOLOGY

To collect the research papers included in our review, we performed a keyword-based automated search [212] using Google Scholar and Web of Science. We limited the search period to 2013 through 2018 (inclusive). However, papers that introduced a novel concept or approach often predate 2013. To ensure that our survey covers all relevant primary literature, we included such seminal papers regardless of their publication date.

Google Scholar indexes major computer science literature databases, including IEEE Xplore, ACM Digital Library, ScienceDirect, SpringerLink, and TandFonline, as well as grey literature. Fagan [ 68 ] provides an extensive list of “ recent studies [that] repeatedly find that Google Scholar's coverage meets or exceeds that of other search tools, no matter what is identified by target samples, including journals, articles, and citations ” [ 68 ]. Therefore, we consider Google Scholar as a meta-database that meets the search criteria recommended in the guidelines for conducting systematic literature reviews [ 40 , 137 ]. Using Google Scholar also addresses the “lack of conformity, especially in terms of searching facilities, across commonly used digital libraries,” which Brereton et al. [ 40 ] identified as a hindrance to systematic literature reviews in computer science.

Criticism of using Google Scholar for literature research includes that the system's relevance ranking assigns too much importance to citation count [ 68 ], i.e., the number of citations a paper receives. Moreover, Google Scholar covers predatory journals [ 31 ]. Most guidelines for systematic reviews, therefore, recommend using additional search tools despite the comprehensive coverage of Google Scholar [ 68 ]. Following this recommendation, we additionally queried Web of Science. Since we seek to cover the most influential papers on academic plagiarism detection, we consider a relevance ranking based on citation counts as an advantage rather than a disadvantage. Hence, we used the relevance ranking of Google Scholar and ranked search results from Web of Science by citation count. We excluded all papers (11) that appeared in venues mentioned in Beall's List of Predatory Journals and Publishers . 2
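The ranking and exclusion step described above can be illustrated with a minimal sketch: drop results from venues on an exclusion list, order the rest by citation count, and keep the top results. The record type, the field names, the sample records, and the venue names are all hypothetical and serve only to illustrate the idea; this is not the tooling used for the review.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchHit:
    title: str
    venue: str
    citations: int

def rank_and_filter(hits, excluded_venues, limit):
    """Drop hits from excluded venues, then keep the `limit` most-cited ones."""
    kept = [h for h in hits if h.venue not in excluded_venues]
    kept.sort(key=lambda h: h.citations, reverse=True)
    return kept[:limit]

# HYPOTHETICAL records and venue names, for illustration only.
hits = [
    SearchHit("Paper A", "Journal X", 12),
    SearchHit("Paper B", "Questionable Journal", 90),
    SearchHit("Paper C", "Conference Y", 45),
    SearchHit("Paper D", "Journal X", 3),
]
excluded = {"Questionable Journal"}
top = rank_and_filter(hits, excluded, limit=2)
print([h.title for h in top])
```

Filtering before ranking ensures that a highly cited paper from an excluded venue does not take up one of the reviewed positions.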

Our procedure for paper collection consisted of the five phases described hereafter. We reviewed the first 50 search results when using Google Scholar and the first 150 search results when using Web of Science.

In the first phase , we sought to include existing literature reviews on plagiarism detection for academic documents. Therefore, we queried Google Scholar using the following keywords: plagiarism detection literature review, similarity detection literature review, plagiarism detection state of art, similarity detection state of art, plagiarism detection survey, similarity detection survey .

In the second phase , we added topically related papers using the following rather general keywords: plagiarism, plagiarism detection, similarity detection, extrinsic plagiarism detection, external plagiarism detection, intrinsic plagiarism detection, internal plagiarism detection .

After reviewing the papers retrieved in the first and second phases, we defined the structure of our review and adjusted the scope of our data collection as follows:

  • We focused our search on plagiarism detection for text documents and hence excluded papers addressing other tasks, such as plagiarism detection for source code or images. We also excluded papers focusing on corpora development.
  • We excluded papers addressing policy and educational issues related to plagiarism detection to sharpen the focus of our review on computational detection methods.

Having made these adjustments to our search strategy, we started the third phase of the data collection. We queried Google Scholar with the following keywords related to specific sub-topics of plagiarism detection, which we had identified as important during the first and second phases: semantic analysis plagiarism detection, machine-learning plagiarism detection.

In the fourth phase, we sought to prevent selection bias from exclusively using Google Scholar by querying Web of Science using the keyword plagiarism detection.

In the fifth phase, we added to our dataset papers from the search period that are topically related to papers we had already collected. To do so, we included relevant references of collected papers and papers that publishers’ systems recommended as related to papers in our collection. Following this procedure, we included notebook papers of the annual PAN and SemEval workshops. To ensure the significance of research contributions, we excluded papers that were not referenced in the official overview papers of the PAN and SemEval workshops or reported results below the baseline provided by the workshop organizers. For the same reason, we excluded papers that do not report experimental evaluation results.

To ensure the consistency of paper processing, the first author read all papers in the final dataset and recorded the paper's key content in a mind map. All authors continuously reviewed, discussed, and updated the mind map. Additionally, we maintained a spreadsheet to record the key features of each paper (task, methods, improvements, dataset, results, etc.).

Table 1 and Table 2 list the numbers of papers retrieved and processed in each phase of the data collection.

Table 1.

1) Google Scholar: reviews 66 28 38 38
2) Google Scholar: related papers 143 54 89 23 104
3) Google Scholar: sub-topics 49 42 111
4) Web of Science 134 82 52 35 128
5) Processing stage 126 126 254

Table 2.

Papers identified by keyword-based automated search 128
Papers collected through references and automated recommendations 126
Inaccessible papers 3
Excluded papers 12
- Reviews and general papers 35
- Papers containing experiments (included in overview tables) 204
– Extrinsic PD 136
– Intrinsic PD 67
– Both extrinsic and intrinsic PD 1

Methodological Risks

The main risks for systematic literature reviews are incompleteness of the collected data and deficiencies in the selection, structure, and presentation of the content.

We addressed the risk of data incompleteness mainly by using two of the most comprehensive databases for academic literature—Google Scholar and Web of Science. To achieve the best possible coverage, we queried the two databases with keywords that we gradually refined in a multi-stage process, in which the results of each phase informed the next phase. By including all relevant references of papers that our keyword-based search had retrieved, we leveraged the knowledge of domain experts, i.e., the authors of research papers and literature reviews on the topic, to retrieve additional papers. We also included the content-based recommendations provided by the digital library systems of major publishers, such as Elsevier and ACM. We are confident that this multi-faceted and multi-stage approach to data collection yielded a set of papers that comprehensively reflects the state of the art in detecting academic plagiarism.

To mitigate the risk of subjectivity regarding the selection and presentation of content, we adhered to best practice guidelines for conducting systematic reviews and investigated the taxonomies and structure put forward in related reviews. We present the insights of the latter investigation in the following section.

RELATED LITERATURE REVIEWS

Table 3 lists related literature reviews in chronological order and categorized according to (i) the plagiarism detection (PD) tasks the review covers (PD for text documents, PD for source code, other PD tasks), (ii) whether the review includes descriptions or evaluations of productive plagiarism detection systems, and (iii) whether the review addresses policy issues related to plagiarism and academic integrity. All reviews are “narrative” according to the typology of Pare et al. [ 187 ]. Two of the reviews (References [ 61 ] and [ 48 ]) cover articles that appeared at venues included in Beall's List of Predatory Journals and Publishers .

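Table 3.

| Review | PD for text documents | PD for source code | Other PD tasks | PD systems | Policy issues |
|---|---|---|---|---|---|
| Meuschke and Gipp [ 160 ] | YES | NO | NO | YES | NO |
| Chong [ 47 ] | YES | NO | NO | NO | NO |
| Eisa et al. [ 61 ] | YES | NO | YES | NO | NO |
| Agarwal and Sharma [ 8 ] | YES | YES | NO | YES | NO |
| Chowdhury et al. [ 48 ] | YES | YES | NO | YES | NO |
| Kanjirangat and Gupta [ 251 ] | YES | YES | NO | YES | NO |
| Velasquez et al. [ 256 ] | YES | NO | NO | YES | YES |
| Hourrane and Benlahmar [ 114 ] | YES | NO | NO | NO | NO |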

Our previous review article [ 160 ] surveyed the state of the art in detecting academic plagiarism, presented plagiarism detection systems, and summarized evaluations of their detection effectiveness. We outlined the limitations of text-based plagiarism detection methods and suggested that future research should focus on semantic analysis approaches that also include non-textual document features, such as academic citations.

The main contribution of Chong [ 47 ] is an extensive experimental evaluation of text preprocessing methods as well as shallow and deep NLP techniques. However, the paper also provides a sizable state-of-the-art review of plagiarism detection methods for text documents.

Eisa et al. [ 61 ] defined a clear methodology and meticulously followed it but did not include a temporal dimension. Their well-written review provides comprehensive descriptions and a useful taxonomy of features and methods for plagiarism detection. The authors concluded that future research should consider non-textual document features, such as equations, figures, and tables.

Agarwal and Sharma [ 8 ] focused on source code PD but also gave a basic overview of plagiarism detection methods for text documents. Technologically, source code PD and PD for text are closely related, and many plagiarism detection methods for text can also be applied for source code PD [ 57 ].

Chowdhury et al. [ 48 ] provided a comprehensive list of available plagiarism detection systems.

Kanjirangat and Gupta [ 251 ] summarized plagiarism detection methods for text documents that participated in the PAN competitions and compared four plagiarism detection systems.

Velasquez et al. [ 256 ] proposed a new plagiarism detection system but also provided an extensive literature review that includes a typology of plagiarism and an overview of six plagiarism detection systems.

Hourrane and Benlahmar [ 114 ] described individual research papers in detail but did not provide an abstraction of the presented detection methods.

The literature review at hand extends and improves the reviews outlined in Table 3 as follows:

  • We include significantly more papers than other reviews.
  • Our literature survey is the first that analyses research contributions during a specific period to provide insights on the most recent research trends.
  • Our review is the first that adheres to the guidelines for conducting systematic literature surveys.
  • We introduce a three-layered conceptual model to describe and analyze the phenomenon of academic plagiarism comprehensively.

OVERVIEW OF THE RESEARCH FIELD

The papers we retrieved during our research fall into three broad categories: plagiarism detection methods, plagiarism detection systems , and plagiarism policies . Ordering these categories by the level of abstraction at which they address the problem of academic plagiarism yields the three-layered model shown in Figure 1 . We propose this model to structure and systematically analyze the large and heterogeneous body of literature on academic plagiarism.

Fig. 1.

Layer 1: Plagiarism detection methods subsumes research that addresses the automated identification of potential plagiarism instances. Papers falling into this layer typically present methods that analyze textual similarity at the lexical, syntactic, and semantic levels, as well as similarity of non-textual content elements, such as citations, figures, tables, and mathematical formulae. To this layer, we also assign papers that address the evaluation of plagiarism detection methods, e.g., by providing test collections and reporting on performance comparisons. The research contributions in Layer 1 are the focus of this survey.

Layer 2: Plagiarism detection systems encompasses applied research papers that address production-ready plagiarism detection systems, as opposed to the research prototypes that are typically presented in papers assigned to Layer 1. Production-ready systems implement the detection methods included in Layer 1, visually present detection results to the users and should be able to identify duly quoted text. Turnitin LLC is the market leader for plagiarism detection services. The company's plagiarism detection system Turnitin is most frequently cited in papers included in Layer 2 [ 116 , 191 , 256 ].

Layer 3: Plagiarism policies subsumes papers that research the prevention, detection, prosecution, and punishment of plagiarism at educational institutions. Typical papers in Layer 3 investigate students’ and teachers’ attitudes toward plagiarism (e.g., Reference [ 75 ]), analyze the prevalence of plagiarism at institutions (e.g., Reference [ 50 ]), or discuss the impact of institutional policies (e.g., Reference [ 183 ]).

The three layers of the model are interdependent and essential to analyze the phenomenon of academic plagiarism comprehensively. Plagiarism detection systems (Layer 2) depend on reliable detection methods (Layer 1), which in turn would be of little practical value without production-ready systems that employ them. Using plagiarism detection systems in practice would be futile without the presence of a policy framework (Layer 3) that governs the investigation, documentation, prosecution, and punishment of plagiarism. The insights derived from analyzing the use of plagiarism detection systems in practice (Layer 3) also inform the research and development efforts for improving plagiarism detection methods (Layer 1) and plagiarism detection systems (Layer 2).

Continued research in all three layers is necessary to keep pace with the behavior changes that are a typical reaction of plagiarists when being confronted with an increased risk of discovery due to better detection technology and stricter policies. For example, improved plagiarism detection capabilities led to a rise in contract cheating, i.e., paying ghostwriters to produce original works that the cheaters submit as their own [ 177 ]. Many researchers agree that counteracting these developments requires approaches that integrate plagiarism detection technology with plagiarism policies.

Originally, we intended to survey the research in all three layers. However, the extent of the research fields is too large to cover all of them comprehensively in one survey. Therefore, the current article surveys plagiarism detection methods and systems. A future survey will cover the research on plagiarism policies.

DEFINITION AND TYPOLOGY OF PLAGIARISM

In accordance with Fishman, we define academic plagiarism as the use of ideas, content, or structures without appropriately acknowledging the source to benefit in a setting where originality is expected [ 279 ]. We used a nearly identical definition in our previous survey [ 160 ], because it describes the full breadth of the phenomenon. The definition includes all forms of intellectual contributions in academic documents regardless of their presentation, e.g., text, figures, tables, and mathematical formulae, and their origin. Other definitions of academic plagiarism often include the notion of theft (e.g., References [ 13 , 38 , 116 , 146 , 188 , 274 , 252 ]), i.e., require intent and limit the scope to reusing the content of others. Our definition also includes self-plagiarism, unintentional plagiarism, and plagiarism with the consent of the original author.

Review of Plagiarism Typologies

Aside from a definition, a typology helps to structure the research and facilitates communication on a phenomenon [ 29 , 261 ]. Researchers proposed a variety of typologies for academic plagiarism. Walker [ 263 ] coined a typology from a plagiarist's point of view, which is still recognized by contemporary literature [ 51 ]. Walker's typology distinguishes between:

  • Sham paraphrasing ( presenting copied text as a paraphrase by leaving out quotations )
  • Illicit paraphrasing
  • Other plagiarism ( plagiarizing with the original author's consent )
  • Verbatim copying ( without reference )
  • Recycling ( self-plagiarism )
  • Ghostwriting
  • Purloining ( copying another student's assignment without consent )

All typologies we encountered in our research categorize verbatim copying as one form of academic plagiarism. Alfikri and Ayu Purwarianti [ 13 ] additionally distinguished as separate forms of academic plagiarism the partial copying of smaller text segments, two forms of paraphrasing that differ regarding whether the sentence structure changes, and translations. Velasquez et al. [ 256 ] distinguished verbatim copying and technical disguise, combined paraphrasing and translation into one form, and categorized the deliberate misuse of references as a separate form. Weber-Wulff [ 265 ] and Chowdhury and Bhattacharyya [ 48 ] likewise categorized referencing errors as a form of plagiarism. Many authors agreed on classifying idea plagiarism as a separate form of plagiarism [ 47 , 48 , 114 , 179 , 252 ]. Mozgovoy et al. [ 173 ] presented a typology that consolidates other classifications into five forms of academic plagiarism:

  • Verbatim copying
  • Hiding plagiarism instances by paraphrasing
  • Technical tricks exploiting weaknesses of current plagiarism detection systems
  • Deliberately inaccurate use of references
  • Tough plagiarism

“Tough plagiarism” subsumes the forms of plagiarism that are difficult to detect for both humans and computers, like idea plagiarism, structural plagiarism, and cross-language plagiarism [ 173 ].

The typology of Eisa et al. [ 61 ], which originated from a typology by Alzahrani et al. [ 21 ], distinguishes only two forms of plagiarism: literal plagiarism and intelligent plagiarism . Literal plagiarism encompasses near copies and modified copies, whereas intelligent plagiarism includes paraphrasing, summarization, translation, and idea plagiarism.

Our Typology of Plagiarism

Since we focus on reviewing plagiarism detection technology, we exclusively consider technical properties to derive a typology of academic plagiarism forms. From a technical perspective, several distinctions that are important from a policy perspective are irrelevant or at least less important. Technically irrelevant properties of plagiarism instances are whether:

  • the original author permitted the reuse of content;
  • the suspicious document and its potential source have the same author(s), i.e., whether similarities in the documents’ content may constitute self-plagiarism.

Properties of minor technical importance are:

  • how much of the content represents potential plagiarism;
  • whether a plagiarist uses one or multiple sources. Detecting compilation plagiarism (also referred to as shake-and-paste, patch-writing, remix, mosaic or mash-up) is impossible at the document level but requires an analysis on the level of paragraphs or sentences.

Both properties are of little technical importance, since similar methods are employed regardless of the extent of plagiarism and whether it may originate from one or multiple source documents.

Our typology of academic plagiarism derives from the generally accepted layers of natural language: lexis, syntax, and semantics. Ultimately, the goal of language is expressing ideas [ 96 ]. Therefore, we extend the classic three-layered language model to four layers and categorize plagiarism forms according to the language layer they affect. We order the resulting plagiarism forms increasingly by their level of obfuscation:

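  • Characters-preserving plagiarism
      • Literal plagiarism (copy and paste)
      • Possibly with mentioning the source
  • Syntax-preserving plagiarism
      • Technical disguise
      • Synonym substitution
  • Semantics-preserving plagiarism
      • Translation
      • Paraphrase (mosaic, clause quilts)
  • Idea-preserving plagiarism
      • Structural plagiarism
      • Using concepts and ideas only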

Characters-preserving plagiarism includes, aside from verbatim copying, plagiarism forms in which sources are mentioned, like “pawn sacrifice” and “cut and slide” [ 265 ].

Syntax-preserving plagiarism often results from employing simple substitution techniques, e.g., using regular expressions. Basic synonym substitution approaches operate in the same way; however, employing more sophisticated substitution methods has become typical.

Semantics-preserving plagiarism refers to sophisticated forms of obfuscation that involve changing both the words and the sentence structure but preserve the meaning of passages. In agreement with Velasquez et al. [ 256 ], we consider translation plagiarism as a semantics-preserving form of plagiarism, since a translation can be seen as the ultimate paraphrase. In the section devoted to semantics-based plagiarism detection methods, we will also show a significant overlap in the methods for paraphrase detection and cross-language plagiarism detection.

Idea-preserving plagiarism (also referred to as template plagiarism or boilerplate plagiarism) includes cases in which plagiarists use the concept or structure of a source and describe it entirely in their own words. This form of plagiarism is difficult to identify and even harder to prove.

Ghostwriting [ 47 , 114 ] describes the hiring of a third party to write genuine text [ 50 , 263 ]. It is the only form of plagiarism that is undetectable by comparing a suspicious document to a likely source. Currently, the only technical option for discovering potential ghostwriting is to compare stylometric features of a possibly ghost-written document with documents certainly written by the alleged author.

PLAGIARISM DETECTION APPROACHES

Conceptually, the task of detecting plagiarism in academic documents consists of locating the parts of a document that exhibit indicators of potential plagiarism and subsequently substantiating the suspicion through more in-depth analysis steps [ 218 ]. From a technical perspective, the literature distinguishes the following two general approaches to plagiarism detection.

The extrinsic plagiarism detection approach compares suspicious documents to a collection of documents assumed to be genuine (reference collection) and retrieves all documents that exhibit similarities above a threshold as potential sources [ 252 , 235 ].

The intrinsic plagiarism detection approach exclusively analyzes the input document, i.e., does not perform comparisons to documents in a reference collection. Intrinsic detection methods employ a process known as stylometry to examine linguistic features of a text [ 90 ]. The goal is to identify changes in writing style, which the approach considers as indicators for potential plagiarism [ 277 ]. Passages with linguistic differences can become the input for an extrinsic plagiarism analysis or be presented to human reviewers. Hereafter, we describe the extrinsic and intrinsic approaches to plagiarism detection in more detail.

Extrinsic Plagiarism Detection

The reference collection to which extrinsic plagiarism detection approaches compare the suspicious document is typically very large, e.g., a significant subset of the Internet for production-ready plagiarism detection systems. Therefore, pairwise comparisons of the input document to all documents in the reference collection are often computationally infeasible. To address this challenge, most extrinsic plagiarism detection approaches consist of two stages: candidate retrieval (also called source retrieval) and detailed analysis (also referred to as text alignment) [ 197 ]. The candidate retrieval stage efficiently limits the collection to a subset of potential source documents. The detailed analysis stage then performs elaborate pairwise document comparisons to identify parts of the source documents that are similar to parts of the suspicious document.

Candidate Retrieval.  Given a suspicious input document and a querying tool, e.g., a search engine or database interface, the task in the candidate retrieval stage is to retrieve from the reference collection all documents that share content with the input document [ 198 ]. Many plagiarism detection systems use the APIs of Web search engines instead of maintaining their own reference collections and querying tools.

Recall is the most important performance metric for the candidate retrieval stage of the extrinsic plagiarism detection process, since the subsequent detailed analysis cannot identify source documents missed in the first stage [ 105 ]. The number of queries issued is another typical metric to quantify the performance in the candidate retrieval stage. Keeping the number of queries low is particularly important if the candidate retrieval approach involves Web search engines, since such engines typically charge for issuing queries.
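As a minimal illustration of the recall metric described above, the following sketch computes the fraction of true source documents that a candidate retrieval run returned. The function name and the document identifiers are hypothetical and serve only as an example.

```python
def candidate_retrieval_recall(retrieved_ids, true_source_ids):
    """Fraction of the true source documents found by candidate retrieval."""
    true_sources = set(true_source_ids)
    if not true_sources:
        raise ValueError("recall is undefined without any true source documents")
    return len(true_sources & set(retrieved_ids)) / len(true_sources)


# Hypothetical run: three true sources exist, two of them were retrieved.
recall = candidate_retrieval_recall(["d1", "d4", "d7", "d9"], ["d1", "d2", "d7"])
```

In this example, the missed source `d2` cannot be recovered by the detailed analysis stage, which is why recall matters more than precision at this point of the process.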

Detailed Analysis.  The set of documents retrieved in the candidate retrieval stage is the input to the detailed analysis stage. Formally, the task in the detailed analysis stage is defined as follows. Let $d_q$ be a suspicious document. Let $D = \lbrace d_s \mid s = 1, \ldots, n \rbrace$ be a set of potential source documents. Determine whether a fragment $s_q \in d_q$ is similar to a fragment $s \in d_s$ ($d_s \in D$) and identify all such pairs of fragments $(s_q, s)$ [ 202 ]. Eventually, an expert should determine whether the identified pairs $(s_q, s)$ constitute legitimate content re-use, plagiarism, or false positives [ 29 ]. The detailed analysis typically consists of three steps [ 197 ]:

  • Seeding : Finding parts of the content in the input document (the seed) within a document of the reference collection
  • Extension : Extending each seed as far as possible to find the complete passage that may have been reused
  • Filtering : Excluding fragments that do not meet predefined criteria (e.g., that are too short), and handling of overlapping passages

The most common strategy for the extension step is the so-called rule-based approach. The approach merges seeds if they occur next to each other in both the suspicious and the source document and if the size of the gap between the passages is below a threshold [ 198 ].
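The three steps can be sketched as follows. This is a simplified illustration, not the implementation of any reviewed system: it assumes exact word n-gram matches as seeds, a greedy merge with the most recent passage only, and word offsets as positions. The names `find_seeds`, `extend_seeds`, `filter_passages`, `max_gap`, and `min_len` are hypothetical.

```python
def find_seeds(susp_words, src_words, n=3):
    """Seeding: exact word n-gram matches as (susp_start, src_start, length)."""
    index = {}
    for j in range(len(src_words) - n + 1):
        index.setdefault(tuple(src_words[j:j + n]), []).append(j)
    seeds = []
    for i in range(len(susp_words) - n + 1):
        for j in index.get(tuple(susp_words[i:i + n]), []):
            seeds.append((i, j, n))
    return seeds


def extend_seeds(seeds, max_gap=2):
    """Extension: merge a seed into the previous passage if the gap in both
    documents is at most max_gap words (rule-based approach, greedy variant).
    Passages are [susp_start, susp_end, src_start, src_end], ends exclusive."""
    passages = []
    for i, j, n in sorted(seeds):
        if passages:
            last = passages[-1]
            gap_susp = i - last[1]
            gap_src = j - last[3]
            if gap_susp <= max_gap and gap_src <= max_gap and j >= last[2]:
                last[1] = max(last[1], i + n)
                last[3] = max(last[3], j + n)
                continue
        passages.append([i, i + n, j, j + n])
    return passages


def filter_passages(passages, min_len=5):
    """Filtering: drop passages that are too short in the suspicious document."""
    return [p for p in passages if p[1] - p[0] >= min_len]


susp = "the quick brown fox leaps over the lazy dog".split()
src = "the quick brown fox jumps over the lazy dog".split()
seeds = find_seeds(susp, src)
merged = filter_passages(extend_seeds(seeds, max_gap=2))
split = extend_seeds(seeds, max_gap=0)
```

With `max_gap=2`, the single substituted word is bridged and one passage covering all nine words results; with `max_gap=0`, the seeds on either side of the substitution stay separate and would both be removed by the length filter.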

Paraphrase Identification is often a separate step within the detailed analysis stages of extrinsic plagiarism detection methods but also a research field on its own. The task in paraphrase identification is determining semantically equivalent sentences in a set of sentences [ 71 ]. SemEval is a well-known conference series that addresses paraphrase identification for tweets [ 9 , 222 ]. Identifying semantically equivalent tweets is more difficult than identifying semantically equivalent sentences in academic documents due to out-of-vocabulary words, abbreviations, and slang terms that are frequent in tweets [ 24 ]. Al-Samadi et al. [ 9 ] provided a thorough review of the research on paraphrase identification.

Intrinsic Plagiarism Detection

The concept of intrinsic plagiarism detection was introduced by Meyer zu Eissen and Stein [ 277 ]. Whereas extrinsic plagiarism detection methods search for similarities across documents, intrinsic plagiarism detection methods search for dissimilarities within a document. A crucial presumption of the intrinsic approach is that authors have different writing styles that allow identifying the authors. Juola provides a comprehensive overview of stylometric methods to analyze and quantify writing style [ 127 ].

Intrinsic plagiarism detection consists of two tasks [ 200 , 233 ]:

  • Style breach detection : Delineating passages with different writing styles
  • Author identification : Identifying the author of documents or passages

Author identification furthermore subsumes two specialized tasks:

  • Author clustering : Grouping documents or passages by authorship
  • Author verification : Deciding whether an input document was authored by the same person as a set of sample documents

Style Breach Detection.  Given a suspicious document, the goal of style-breach detection is identifying passages that exhibit different stylometric characteristics [ 233 ].

Most of the algorithms for style breach detection follow a three-step process [ 214 ]:

  • Text segmentation based on paragraphs, (overlapping) sentences, character or word n-grams
  • Feature space mapping , i.e., computing stylometric measures for segments
  • Clustering segments according to observed critical values
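A minimal sketch of this three-step process is shown below. It assumes paragraph-based segmentation, two simple illustrative stylometric measures (mean word length and mean sentence length), and, in place of a full clustering step, a z-score threshold that flags segments deviating strongly from the document average. All function names and the threshold value are hypothetical.

```python
import re
import statistics


def segment(text):
    """Step 1: split the document into paragraph segments."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def style_features(segment_text):
    """Step 2: map a segment to a small stylometric feature vector."""
    words = re.findall(r"[A-Za-z']+", segment_text)
    sentences = [s for s in re.split(r"[.!?]+", segment_text) if s.strip()]
    mean_word_length = sum(len(w) for w in words) / max(len(words), 1)
    mean_sentence_length = len(words) / max(len(sentences), 1)
    return (mean_word_length, mean_sentence_length)


def flag_breaches(vectors, threshold=1.5):
    """Step 3 (simplified): flag segments whose value for any feature lies
    more than `threshold` standard deviations from the document mean."""
    flagged = set()
    for column in zip(*vectors):
        mean = statistics.mean(column)
        spread = statistics.pstdev(column)
        if spread == 0:
            continue  # feature does not vary, so it cannot indicate a breach
        for idx, value in enumerate(column):
            if abs(value - mean) / spread > threshold:
                flagged.add(idx)
    return sorted(flagged)


def detect_style_breaches(text, threshold=1.5):
    """Run all three steps and return the suspicious segments."""
    segments = segment(text)
    vectors = [style_features(s) for s in segments]
    return [segments[i] for i in flag_breaches(vectors, threshold)]
```

Real systems use far richer feature sets and proper clustering or outlier detection; the threshold rule here only makes the role of the "critical values" concrete.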

Author Clustering typically follows the style breach detection stage and employs pairwise comparisons of passages identified in the previous stage to group them by author [ 247 ]. For each pair of passages, a similarity measure is computed that considers the results of the feature space mapping in the style-breach detection stage. Formally, for a given set of documents or passages $D$, the task is to find the decomposition of this set into classes $D_1, D_2, \ldots, D_n$, such that:

  • $D = \bigcup_{i = 1}^{n} D_i$
  • $D_i \cap D_j = \emptyset$ for each $i \ne j$
  • All documents of the same class have the same author.
  • For each pair of documents from different classes, the authors are different.
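The following sketch illustrates how pairwise similarities can yield such a decomposition. It assumes single-linkage grouping with a fixed similarity threshold, implemented with a union-find structure; this is one simple option chosen for illustration, not the method of a specific reviewed paper. The function name, the similarity function, and the threshold are hypothetical.

```python
def cluster_by_similarity(items, similarity, threshold):
    """Group item indices so that any two items whose pairwise similarity
    reaches the threshold end up in the same class (single linkage)."""
    parent = list(range(len(items)))

    def find(x):
        # Follow parent links to the class representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(len(items)):
        for b in range(a + 1, len(items)):
            if similarity(items[a], items[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for idx in range(len(items)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical one-dimensional style scores for five passages.
scores = [1.0, 1.1, 5.0, 5.2, 9.0]
classes = cluster_by_similarity(scores, lambda x, y: -abs(x - y), threshold=-0.5)
```

By construction, the resulting classes are pairwise disjoint and their union covers all input items, which corresponds to the first two conditions above. Whether the classes actually coincide with authors depends entirely on the quality of the similarity measure.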

Author Verification is typically defined as the prediction of whether two pieces of text were written by the same person. In practice, author verification is a one-class classification problem [ 234 ] that assumes all documents in a set have the same author. By comparing the writing style at the document level, outliers can be detected that may represent plagiarized documents. This method can reveal ghostwriting [ 127 ], unless the same ghost-writer authored all documents in the set.

Author Identification (also referred to as author classification) takes multiple document sets as input. Each set of documents must have been written verifiably by a single author. The task is assigning documents with unclear authorship to the stylistically most similar document set. Each authorship identification problem for which the set of candidate authors is known is easily transformable into multiple authorship verification problems [ 128 ]. An open-set variant of the author identification problem allows for a suspicious document with an author that is not included in any of the input sets [ 234 ].

Several other stylometry-based tasks, e.g., author profiling, exist. However, we limit the descriptions in the next section to methods whose main application is plagiarism detection. We recommend readers interested in related tasks to refer to the overview paper of PAN’17 [ 200 ].

PLAGIARISM DETECTION METHODS

We categorize plagiarism detection methods and structure their description according to our typology of plagiarism. Lexical detection methods exclusively consider the characters in a document. Syntax-based detection methods consider the sentence structure, i.e., the parts of speech and their relationships. Semantics-based detection methods compare the meaning of sentences, paragraphs, or documents. Idea-based detection methods go beyond the analysis of text in a document by considering non-textual content elements like citations, images, and mathematical content. Before presenting details on each class of detection methods, we describe preprocessing strategies that are relevant for all classes of detection methods.

Preprocessing

The initial preprocessing steps applied as part of plagiarism detection methods typically include document format conversions and information extraction. Before 2013, researchers described the extraction of text from binary document formats like PDF and DOC as well as from structured document formats like HTML and DOCX in more detail than in more recent years (e.g., Reference [ 49 ]). Most research papers on text-based plagiarism detection methods we review in this article do not describe any format conversion or text extraction procedures. We attribute this development to the technical maturity of text extraction approaches. For plagiarism detection approaches that analyze non-textual content elements, e.g., academic citations and references [ 90 , 91 , 161 , 191 ], images [ 162 ], and mathematical content [ 163 , 165 ], document format conversion and information extraction still present significant challenges.

Specific preprocessing operations heavily depend on the chosen approach. The aim is to remove noise while keeping the information required for the analysis. For text-based detection methods, typical preprocessing steps include lowercasing, punctuation removal, tokenization, segmentation, number removal or number replacement, named entity recognition, stop word removal, stemming or lemmatization, Part of Speech (PoS) tagging, and synset extension. Approaches employing synset extension typically use thesauri like WordNet [ 69 ] to assign the identifier of the class of synonymous words to which a word in the text belongs. The synonymous words can then be considered for similarity calculation. Detection methods operating on the lexical level usually perform chunking as a preprocessing step. Chunking groups text elements into sets of given lengths, e.g., word n-grams, line chunks, or phrasal constituents in a sentence [ 47 ].
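A few of these steps can be sketched with the Python standard library alone. The example below covers lowercasing, number replacement, punctuation removal, whitespace tokenization, stop word removal, and chunking into word n-grams. The tiny stop word list, the placeholder token `num`, and the function names are assumptions made for illustration; stemming, PoS tagging, and synset extension would require an NLP library and are omitted.

```python
import re

# Tiny illustrative stop word list; real systems use much larger lists.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to"})


def preprocess(text, remove_stop_words=True):
    """Lowercase, replace numbers, strip punctuation, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", "num", text)      # number replacement
    text = re.sub(r"[^\w\s]", " ", text)    # punctuation removal
    tokens = text.split()                   # whitespace tokenization
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def word_ngrams(tokens, n=3):
    """Chunking: overlapping word n-grams of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("In 2019, the 3 methods of detection are tested.")
chunks = word_ngrams(tokens, n=3)
```

Which steps to apply is a design choice that depends on the detection method; for example, a method relying on stop word n-grams would call `preprocess` with `remove_stop_words=False`.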

Some detection approaches, especially in intrinsic plagiarism detection, limit preprocessing to a minimum to avoid losing potentially useful information [ 9 , 67 ]. For example, intrinsic detection methods typically do not remove punctuation.

All preprocessing steps we described represent standard procedures in Natural Language Processing (NLP); hence, well-established, publicly available software libraries support these steps. The research papers we reviewed predominantly used the multilingual and multifunctional text processing pipelines Natural Language Toolkit (Python) or Stanford CoreNLP library (Java). Commonly applied syntax analysis tools include Penn Treebank, 3 Citar, 4 TreeTagger, 5 and Stanford parser. 6 Several papers present resources for Arabic [ 33 , 34 , 227 ] and Urdu [ 54 ] language processing.

Lexical Detection Methods

Lexical detection methods exclusively consider the characters in a text for similarity computation. The methods are best suited for identifying copy-and-paste plagiarism that exhibits little to no obfuscation. To detect obfuscated plagiarism, the lexical detection methods must be combined with more sophisticated NLP approaches [ 9 , 67 ]. Lexical detection methods are also well-suited to identify homoglyph substitutions, which are a common form of technical disguise. The only paper in our collection that addressed the identification of technically disguised plagiarism is Reference [ 19 ]. The authors used a list of confusable Unicode characters and applied approximate word n-gram matching using the normalized Hamming distance.
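To illustrate the idea, the sketch below maps a handful of Cyrillic homoglyphs to their Latin look-alikes and computes a normalized Hamming distance between two word n-grams of equal length. It is not the implementation of Reference [ 19 ]: the five-entry confusables table is a tiny assumed excerpt, and applying the distance position-wise to the words of an n-gram is an assumption about how such matching could work.

```python
# Assumed excerpt of a confusables list: Cyrillic letters that look like Latin ones.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
}
_TRANSLATION = str.maketrans(CONFUSABLES)


def normalize_homoglyphs(text):
    """Replace known confusable characters by their Latin counterparts."""
    return text.translate(_TRANSLATION)


def normalized_hamming(a, b):
    """Share of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


disguised = "pl\u0430gi\u0430rism d\u0435t\u0435ction"
restored = normalize_homoglyphs(disguised)
distance = normalized_hamming(
    ("methods", "for", "plagiarism", "detection"),
    ("methods", "of", "plagiarism", "detection"),
)
```

Two n-grams would then count as an approximate match if their distance stays below a chosen threshold, so that a few substituted words do not prevent detection.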

Lexical detection approaches typically fall into one of the three categories we describe in the following: n-gram comparisons, vector space models, and querying search engines .

N-gram Comparisons.  Comparing n-grams refers to determining the similarity of sequences of $n$ consecutive entities, which are typically characters or words and less frequently phrases or sentences. n-gram comparisons are widely applied for candidate retrieval or the seeding phase of the detailed analysis stage in extrinsic monolingual and cross-language detection approaches as well as in intrinsic detection.

Approaches using n-gram comparisons first split a document into (possibly overlapping) n-grams, which they use to create a set-based representation of the document or passage (“fingerprint”). To enable efficient retrieval, most approaches store fingerprints in index data structures. To speed up the comparison of individual fingerprints, some approaches hash or compress the n-grams that form the fingerprints. Hashing or compression reduces the lengths of the strings under comparison and allows performing computationally more efficient numerical comparisons. However, hashing introduces the risk of false positives due to hash collisions. Therefore, hashed or compressed fingerprinting is more commonly applied for the candidate retrieval stage, in which achieving high recall is more important than achieving high precision.
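The fingerprinting idea can be sketched as follows, assuming character n-grams, the Jaccard coefficient as set similarity, and a truncated SHA-1 digest as hash function. These choices, the function names, and the `bits` parameter are illustrative assumptions; the reviewed approaches use a variety of hash functions and similarity measures.

```python
import hashlib


def char_ngrams(text, n=5):
    """Set of overlapping character n-grams (the unhashed fingerprint)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def hashed_fingerprint(text, n=5, bits=16):
    """Fingerprint whose n-grams are reduced to `bits`-bit integers.
    Fewer bits mean cheaper comparisons but more hash collisions."""
    fingerprint = set()
    for gram in char_ngrams(text, n):
        digest = hashlib.sha1(gram.encode("utf-8")).digest()
        fingerprint.add(int.from_bytes(digest[:8], "big") % (1 << bits))
    return fingerprint


def jaccard(a, b):
    """Jaccard similarity of two fingerprints; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


source = "plagiarism detection methods compare documents"
copy = "plagiarism detection methods compare documents"
similarity = jaccard(hashed_fingerprint(source), hashed_fingerprint(copy))
```

Because distinct n-grams can map to the same integer, a hashed fingerprint can overestimate the overlap of two texts. This matches the observation above that hashing suits the recall-oriented candidate retrieval stage better than stages that require high precision.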

Fingerprinting is the most popular method for assessing local lexical similarity [ 104 ]. However, recent research has focused increasingly on detecting obfuscated plagiarism. Thus n-gram fingerprinting is often restricted to the preprocessing stage [ 20 ] or used as a feature for machine learning [ 7 ]. Character n-gram comparisons can be applied to cross-language plagiarism detection (CLPD) if the languages in question exhibit a high lexical similarity, e.g., English and Spanish [ 79 ].

Table 4 presents papers employing word n-grams; Table 5 lists papers using character n-grams, and Table 6 shows papers that employ hashing or compression for n-gram fingerprinting.

Extrinsic Document-level detection Stop words removed [ , , , , , , ]
Stop word n-grams [ ]
Candidate retrieval Stop words removed [ ]
All word n-grams and stop word n-grams [ ]
Detailed analysis All word n-grams [ , , ]
Stop words removed [ , ]
All word n-grams, stop word n-grams, and named entity n-grams [ ]
Numerous n-gram variations [ , ]
Context n-grams [ , ]
Paraphrase identification All word n-grams [ , ]
Combination with ESA [ ]
CLPD Stop words removed [ ]
Intrinsic Author identification Overlap in LZW dictionary [ ]
Author verification Word n-grams [ , , , , , , ]
Stop word n-grams [ , , ]
Extrinsic Document-level detection Pure character n-grams [ , ]
Overlap in LZW dictionary [ ]
Machine learning [ ]
Combined with Bloom filters [ ]
Detailed analysis Hashed character n-grams [ ]
Paraphrase identification Feature for machine learning [ ]
Cross-language PD Cross-language CNG [ , , , ]
Intrinsic Style-breach detection CNG as stylometric features [ , ],
Author identification Bit n-grams [ ]
Author verification CNG as stylometric features [ , , , , ], [ , , , , , , , , , , , ]
Author clustering CNG as stylometric features [ , , , , ]
Document-level detection Hashing [ , , , ]
Candidate retrieval Hashing [ , , , ]
Detailed analysis Hashing [ , , , ]
Document-level detection Compression [ ]
Author identification Compression [ , , , ]

Vector Space Models (VSM) are a classic retrieval approach that represents texts as high-dimensional vectors [ 249 ]. In plagiarism detection, words or word n-grams typically form the dimensions of the vector space and the components of a vector undergo term frequency–inverse document frequency (tf-idf) weighting [ 249 ]. Idf values are derived either from the suspicious document or from the corpus [ 205 , 238 ]. The similarity of vector representations, typically quantified using the cosine measure (i.e., the cosine of the angle between the vectors), is used as a proxy for the similarity of the documents the vectors represent.
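A minimal sketch of tf-idf weighting and the cosine measure may help to fix the idea. The smoothed idf variant and the function names are assumptions made for illustration; the reviewed papers use a variety of weighting schemes.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf (an assumption): log((1 + N) / (1 + df)) + 1 keeps weights positive.
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Documents with identical term distributions obtain a cosine of 1, and documents without any shared term obtain 0, which is why purely lexical VSM struggle with heavily reworded text.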

Most approaches employ predefined similarity thresholds to retrieve documents or passages for subsequent processing. Kanjirangat and Gupta [ 249 ] and Ravi et al. [ 208 ] follow a different approach. They divide the set of source documents into K clusters by first selecting K centroids and then assigning each document to the group whose centroid is most similar. The suspicious document is used as one of the centroids and the corresponding cluster is passed on to the subsequent processing stages.

VSM remain popular and well-performing approaches not only for detecting copy-and-paste plagiarism but also for identifying obfuscated plagiarism as part of a semantic analysis. VSM are also frequently applied in intrinsic plagiarism detection. A typical approach is to represent sentences as vectors of stylometric features to find outliers or to group stylistically similar sentences.

Table 7 presents papers that employ VSM for extrinsic plagiarism detection; Table 8 lists papers using VSM for intrinsic plagiarism detection.

Document-level detection sentence Combination of similarity metrics [ ]
Document-level detection sentence VSM as a bitmap; compressed for comparison [ ]
Document-level detection sentence Machine learning to set similarity thresholds [ ]
Document-level detection word Synonym replacement [ ]
Document-level detection word, sentence Fuzzy set of WordNet synonyms [ ]
Candidate retrieval word Vectors of word N-grams [ , , ],
Candidate retrieval word K-means clustering of vectors to find documents most similar to the input doc. [ , ]
Candidate retrieval word Z-order mapping of multidimensional vectors to scalar and subsequent filtering [ ]
Candidate retrieval word Topic-based segmentation; Re-ranking of results based on the proximity of terms [ ]
Detailed analysis sentence Pure VSM [ , , , ]
Detailed analysis sentence Adaptive adjustment of parameters to detect the type of obfuscation [ , ]
Detailed analysis sentence Hybrid similarity (Cosine+ Jaccard) [ ]
Detailed analysis word Pure VSM [ ]
Paraphrase identification sentence Semantic role annotation [ ]
Style-breach detection word Word frequencies [ ]
Style-breach detection word Vectors of lexical and syntactic features [ , ]
Style-breach detection sentence Vectors of word embeddings [ ]
Style-breach detection sentence Vectors of lexical features [ ]
Style-breach detection sliding window Vectors of lexical features [ ]
Author clustering document Vectors of lexical features [ , , , , ]
Author clustering document Word frequencies [ ]
Author clustering document Word embeddings [ ]
Author verification document Word frequencies [ ]
Author verification document Vectors of lexical features [ , , , ]
Author verification document Vectors of lexical and syntactic features [ , , , ]
Author verification document Vectors of syntactic features [ ]

Querying Web Search Engines.  Many detection methods employ Web search engines for candidate retrieval, i.e., for finding potential source documents in the initial stage of the detection process. The strategy for selecting the query terms from the suspicious document is crucial for the success of this approach. Table 9 gives an overview of the strategies for query term selection employed by papers in our collection.

Querying the words with the highest tf-idf value [ , , , , , ]
Querying the least frequent words [ , ]
Querying the least frequent strings [ ]
Querying the words with the highest tf-idf value as well as noun phrases [ , , ]
Querying the nouns and most frequent words [ ]
Querying the nouns and verbs [ ]
Querying the nouns, verbs, and adjectives [ , , , ]
Querying the nouns, facts (dates, names, etc.) as well as the most frequent words [ ]
Querying keywords and the longest sentence in a paragraph [ , ]
Comparing different querying heuristics [ ]
Incrementing passage length and passage selection heuristics [ ]
Query expansion by words from UMLS Meta-thesaurus [ ]
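As an illustration of the first strategy listed above (querying the words with the highest tf-idf value), the sketch below ranks the terms of a suspicious document against a reference corpus. The function name, the smoothed idf, and the toy data in the usage note are assumptions; the sketch does not call any search engine.

```python
import math
from collections import Counter


def top_tfidf_terms(suspicious, corpus, k=3):
    """Return the k terms of the suspicious document with the highest tf-idf score."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    tf = Counter(suspicious)
    # Smoothed idf (an assumption): terms unseen in the corpus get the highest weight.
    scores = {t: tf[t] * math.log((1 + n_docs) / (1 + df[t])) for t in tf}
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

For a corpus of three short documents that all contain "the", the suspicious token list `["the", "homoglyph", "method", "detects", "homoglyph", "substitution"]` yields rare, content-bearing words first, while "the" receives a score of zero and would not be sent as a query term.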

Intrinsic detection approaches can employ Web search engines to realize the General Impostors Method. This method transforms the one-class verification problem regarding an author's writing style into a two-class classification problem. The method extracts keywords from the suspicious document to retrieve a set of topically related documents from external sources, the so-called “impostors.” The method then quantifies the “average” writing style observable in impostor documents, i.e., the distribution of stylistic features to be expected. Subsequently, the method compares the stylometric features of passages from the suspicious document to the features of the “average” writing style in impostor documents. This way, the method distinguishes the stylistic features that are characteristic of an author from the features that are specific to the topic [ 135 ]. Koppel and Winter present the method in detail [ 146 ]. Detection approaches implementing the general impostors method achieved excellent results in the PAN competitions, e.g., winning the competition in 2013 and 2014 [ 128 , 232 ]. Table 10 presents papers using this method.

Author verification [ , , , , ]

Syntax-based Methods

Syntax-based detection methods typically operate on the sentence level and employ PoS tagging to determine the syntactic structure of sentences [ 99 , 245 ]. The syntactic information helps to address morphological ambiguity during the lemmatization or stemming step of preprocessing [ 117 ], or to reduce the workload of a subsequent semantic analysis, typically by exclusively comparing the pairs of words belonging to the same PoS class [ 102 ]. Many intrinsic detection methods use the frequency of PoS tags as a stylometric feature.

The method of Tschuggnall and Specht [ 245 ] relies solely on the syntactic structure of sentences. Table 11 presents an overview of papers using syntax-based methods.

Extrinsic PoS tagging Addressing morphological ambiguity [ , ]
Word comparisons within the same PoS class only [ , ]
Combined with stop-words [ ]
Comparing PoS sequences [ ]
Combination with PPM compression [ ]
Intrinsic PoS tags as stylometric features PoS frequency [ , , , , ]
PoS n-gram frequency [ , , , , , , , ]
PoS frequency, PoS n-gram frequency, starting PoS tag [ ]
Comparing syntactic trees Direct comparison [ , ]
Integrated syntactic graphs [ ]

Semantics-based Methods

Papers presenting semantics-based detection methods are the largest group in our collection. This finding reflects the importance of detecting obfuscated forms of academic plagiarism, for which semantics-based detection methods are the most promising approach [ 216 ]. Semantics-based methods operate on the hypothesis that the semantic similarity of two passages depends on the occurrence of similar semantic units in these passages. The semantic similarity of two units derives from their occurrence in similar contexts.

Many semantics-based methods use thesauri (e.g., WordNet or EuroVoc). Including semantic features, like synonyms, hypernyms, and hyponyms, in the analysis improves the performance of paraphrase identification [ 9 ]. Using a canonical synonym for each word helps detect synonym-replacement obfuscation and reduces the vector space dimension [ 206 ]. Sentence segmentation and text tokenization are crucial parameters for all semantics-based detection methods. Tokenization extracts the atomic units of the analysis, which are typically either words or phrases. Most papers in our collection use words as tokens.

Employing established semantic text analysis methods like Latent Semantic Analysis (LSA), Explicit Semantic Analysis (ESA), and word embeddings for extrinsic plagiarism detection is a popular and successful approach. This group of methods follows the idea of “distributional semantics,” i.e., terms co-occurring in similar contexts tend to convey a similar meaning. In the reverse conclusion, distributional semantics assumes that similar distributions of terms indicate semantically similar texts. The methods differ in the scope within which they consider co-occurring terms. Word embeddings consider only the immediately surrounding terms, LSA analyzes the entire document and ESA uses an external corpus.

Latent Semantic Analysis is a technique to reveal and compare the underlying semantic structure of texts [ 55 ]. To determine the similarity of term distributions in texts, LSA computes a matrix, in which rows represent terms, columns represent documents and the entries of the matrix typically represent log-weighted tf-idf values [ 46 ]. LSA then employs Singular Value Decomposition (SVD) or similar dimensionality reduction techniques to find a lower-rank approximation of the term-document matrix by reducing the number of rows (i.e., pruning less relevant terms) while maintaining the similarity distribution between columns (i.e., the text representations). The terms remaining after the dimensionality reduction are assumed to be most representative of the semantic meaning of the text. Hence, comparing the rank-reduced matrix-representations of texts allows computing the semantic similarity of the texts [ 46 ].
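In the common textbook formulation, this rank reduction is a truncated SVD. The symbols below ($X$ for the weighted term-document matrix, $k$ for the retained rank, $\hat{d}_j$ for the reduced representation of document $j$) are notation assumed here for illustration; the papers in the collection may define the decomposition differently.

$$X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad \hat{d}_j = \Sigma_k V_k^{\top} e_j, \qquad \operatorname{sim}(d_i, d_j) = \frac{\hat{d}_i^{\top} \hat{d}_j}{\lVert \hat{d}_i \rVert \, \lVert \hat{d}_j \rVert}$$

Here $U_k$ and $V_k$ hold the first $k$ left and right singular vectors, $\Sigma_k$ contains the $k$ largest singular values, and $e_j$ is the $j$-th standard basis vector, so each document is compared in $k$ dimensions rather than in one dimension per term.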

LSA can reveal similarities between texts that traditional vector space models cannot express [ 116 ]. The ability of LSA to address synonymy is beneficial for paraphrase identification. For example, Satyapanich et al. [ 222 ] considered two sentences as paraphrases if their LSA similarity is above a threshold. While LSA performs well in addressing synonymy, its ability to reflect polysemy is limited [ 55 ].

Ceska [ 46 ] first applied LSA for plagiarism detection. AlSallal et al. [ 15 ] proposed a novel weighting approach that assigns higher weights to the most common terms and used LSA as a stylometric feature for intrinsic plagiarism detection. Aldarmaki and Diab [ 11 ] used weighted matrix factorization—a method similar to LSA—for cross-language paraphrase identification. Table 12 lists other papers employing LSA for extrinsic and intrinsic plagiarism detection.

Extrinsic Document-level detection LSA with phrase tf-idf [ , ]
LSA in combination with other methods [ ]
Candidate retrieval LSA only [ ]
Paraphrase identification LSA only [ ]
LSA with machine learning [ , , , ]
Weighted matrix factorization [ ]
Intrinsic Document-level detection LSA with stylometric features [ ]
Author identification LSA with machine learning [ , ]
LSA at CNG level [ ]

Explicit Semantic Analysis is an approach to model the semantics of a text in a high-dimensional vector space of semantic concepts [ 82 ]. Semantic concepts are the topics in a man-made knowledge base corpus (typically Wikipedia or other encyclopedias). Each article in the knowledge base is an explicit description of the semantic content of the concept, i.e., the topic of the article [ 163 ]. ESA builds a “semantic interpreter” that allows representing texts as concept vectors whose components reflect the relevance of the text for each of the semantic concepts, i.e., knowledge base articles [ 82 ]. Applying vector similarity measures, such as the cosine metric, to the concept vectors then allows determining the texts’ semantic similarity.
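The following toy sketch illustrates the notion of a semantic interpreter. The three-article `KNOWLEDGE_BASE`, the idf variant, and all function names are hypothetical stand-ins; an actual ESA implementation builds the interpreter from a large corpus such as Wikipedia.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept name -> article text.
KNOWLEDGE_BASE = {
    "Plagiarism": "plagiarism copying text without citing source author",
    "Cooking": "cooking recipe food kitchen ingredients oven",
    "Astronomy": "astronomy star planet telescope galaxy orbit",
}


def build_interpreter(kb):
    """Map each term to its tf-idf weight in every concept article that contains it."""
    articles = {concept: Counter(text.split()) for concept, text in kb.items()}
    n = len(articles)
    df = Counter(term for tf in articles.values() for term in tf)
    interpreter = {}
    for concept, tf in articles.items():
        for term, count in tf.items():
            weight = count * math.log(1 + n / df[term])
            interpreter.setdefault(term, {})[concept] = weight
    return interpreter


def concept_vector(text, interpreter):
    """Represent a text as a vector over concepts by summing its terms' weights."""
    vec = Counter()
    for term in text.lower().split():
        for concept, weight in interpreter.get(term, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse concept vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In this toy setting, "copying text from a source" and "an author without citing" share no words, yet both map to the same concept and therefore obtain a high similarity.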

Table 13 shows detection methods that employed ESA depending on the corpus used to build the semantic interpreter. Constructing the semantic interpreter from multilingual corpora, such as Wikipedia, allows the application of ESA for cross-language plagiarism detection [ 78 ]. ESA also has applications beyond PD. For example, when applied to document classification, ESA achieved a precision above 95% [ 124 , 174 ].

Wikipedia (monolingual) [ , , ]
Wikipedia (cross-language) [ , ]
Wikipedia + FanFiction [ ]

The Information Retrieval-based semantic similarity approach proposed by Itoh [ 120 ] is a generalization of ESA. The method models a text passage as a set of words and employs a Web search engine to obtain a set of relevant documents for each word in the set. The method then computes the semantic similarity of the text passages as the similarity of the document sets obtained, typically using the Jaccard metric. Table 14 presents papers that also follow this approach.

Articles from Wikipedia [ ]
Synonyms from Farsnet [ ]

Word embeddings is another semantic analysis approach that is conceptually related to ESA. While ESA considers term occurrences in each document of the corpus, word embeddings exclusively analyze the words that surround the term in question. The idea is that terms appearing in proximity to a given term are more characteristic of the semantic concept represented by the term in question than more distant words. Therefore, terms that frequently co-occur in proximity within texts should also appear closer within the vector space [ 73 ]. In cross-language plagiarism detection, word embeddings outperformed other methods when syntactic weighting was employed [ 73 ]. Table 15 summarizes papers that employ word embeddings.

Extrinsic Candidate retrieval [ ]
Cross-language PD [ ]
Intrinsic Paraphrase identification [ , , , , ]
Style-breach detection [ ]
Author clustering [ ]

Word Alignment is a semantic analysis approach widely used for machine translation [ 240 ] and paraphrase identification. Words are aligned, i.e., marked as related, if they are semantically similar. Semantic similarity of two words is typically retrieved from an external database, like WordNet. The semantic similarity of two sentences is then computed as the proportion of aligned words. Word alignment approaches achieved the best performance for the paraphrase identification task at SemEval-2014 [ 240 ] and were among the top-performing approaches at SemEval-2015 [ 9 , 242 ].
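A minimal sketch of sentence similarity as the proportion of aligned words is given below. The `SYNONYMS` table is a hypothetical stand-in for a lexical database such as WordNet, and the greedy one-to-one alignment is a simplifying assumption rather than the procedure of any cited system.

```python
# Hypothetical stand-in for a lexical database such as WordNet:
# each word maps to a set of words treated as semantically similar.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "quick": {"fast", "rapid"},
    "fast": {"quick", "rapid"},
}


def related(a, b):
    """Two words are related if they are equal or listed as similar."""
    return a == b or b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set())


def alignment_similarity(sent_a, sent_b):
    """Proportion of words, over both sentences, that take part in an alignment."""
    words_a, words_b = sent_a.lower().split(), sent_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    unused_b = list(words_b)
    aligned = 0
    for word in words_a:
        for candidate in unused_b:
            if related(word, candidate):
                unused_b.remove(candidate)  # greedy one-to-one alignment
                aligned += 1
                break
    return 2 * aligned / (len(words_a) + len(words_b))
```

With this table, "the quick car" and "the fast automobile" are fully aligned although only one word is lexically identical.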

Cross-language alignment-based similarity analysis (CL-ASA) is a variation of the word alignment approach for cross-language semantic analysis. The approach uses a parallel corpus to compute the probability that a word $x$ in the suspicious document is a valid translation of the term $y$ in a potential source document, for all terms in the suspicious and the source documents. The sum of the translation probabilities yields the probability that the suspicious document is a translation of the source document [ 28 ]. Table 16 presents papers using word alignment and CL-ASA.

Word alignment only [ , ]
Word alignment-based modification of Jaccard and Levenshtein measure [ ]
Word alignment in combination with machine learning [ , , ]
CL-ASA [ , ]
Translation + word alignment [ ]

Graph-based Semantic Analysis. Knowledge graph analysis (KGA) represents a text as a weighted directed graph, in which the nodes represent the semantic concepts expressed by the words in the text and the edges represent the relations between these concepts [ 79 ]. The relations are typically obtained from publicly available corpora, such as BabelNet or WordNet. Determining the edge weights is the major challenge in KGA. Traditionally, edge weights were computed from analyzing the relations between concepts in WordNet [ 79 ]. Salvador et al. [ 79 ] improved the weighting procedure by using continuous skip-grams that additionally consider the context in which the concepts appear. Applying graph similarity measures yields a semantic similarity score for documents or parts thereof (typically sentences).

Inherent characteristics of KGA like word sense disambiguation, vocabulary expansion, and language independence are highly beneficial for plagiarism detection. Thanks to these characteristics, KGA is resistant to synonym replacements and syntactic changes. Using multilingual corpora allows the application of KGA for cross-language PD [ 79 ]. KGA achieves high detection effectiveness if the text is translated literally; for paraphrased translations, the results are worse [ 77 ].

The universal networking language approach proposed by Avishek and Bhattacharyyan [ 53 ] is conceptually similar to KGA. The method constructs a dependency graph for each sentence and then compares the lexical, syntactic, and semantic similarity separately. Kumar [ 147 ] used semantic graphs for the seeding phase of the detailed analysis stage. In those graphs, the nodes corresponded to all words in a document or passage. The edges represented the adjacency of the words. The edge weights expressed the semantic similarity of words based on the probability that the words occur in a 100-word window within a corpus of DBpedia articles. Overlapping passages in two documents were identified using the minimum weight bipartite clique cover.

Table 17 presents detection methods that employ graph-based semantic analysis.

Document-level detection Knowledge graph analysis [ ]
Detailed analysis Semantic graphs [ ]
Detailed analysis Word n-gram graphs for sentences [ ]
Paraphrase identification Knowledge graph analysis [ ]
Paraphrase identification Universal networking language [ ]
Cross-language plagiarism detection Knowledge graph analysis [ , , ]

Semantic Role Labeling (SRL) determines the semantic roles of terms in a sentence, e.g., the subject, object, events, and relations between these entities, based on roles defined in linguistic resources, such as PropBank or VerbNet. The goal is to extract “who” did “what” to “whom” “where” and “when” [ 188 ]. The first step in SRL is PoS tagging and syntax analysis to obtain the dependency tree of a sentence. Subsequently, the semantic annotation is performed [ 71 ].

Paul and Jamal [ 188 ] used SRL in combination with sentence ranking for document-level plagiarism detection. Hamza and Salim [ 182 ] employed SRL to extract arguments from sentences, which they used to quantify and compare the syntactic and semantic similarity of the sentences. Ferreira et al. [ 71 ] obtained the similarity of sentences by combining various features and measures using machine learning. Table 18 lists detection approaches that employ SRL.

Document-level detection [ , ]
Paraphrase identification [ ]

Monolingual plagiarism detection Citation-based PD [ , , , , ]
Math-based PD [ , ]
Image-based PD [ ]
Cross-lingual plagiarism detection CbPD [ ]

Idea-based Methods

Idea-based methods analyze non-textual content elements to identify obfuscated forms of academic plagiarism. The goal is to complement detection methods that analyze the lexical, syntactic, and semantic similarity of text to identify plagiarism instances that are hard to detect both for humans and for machines. Table 19 lists papers that proposed idea-based detection methods.

Citation-based plagiarism detection (CbPD) proposed by Gipp et al. [ 91 ] analyzes patterns of in-text citations in academic documents, i.e., identical citations occurring in proximity or in a similar order within two documents. The idea is that in-text citations encode semantic information language-independently. Thus, analyzing in-text citation patterns can indicate shared structural and semantic similarity among texts. Assessing semantic and structural similarity using citation patterns requires significantly less computational effort than approaches for semantic and syntactic text analysis [ 90 ]. Therefore, CbPD is applicable for the candidate retrieval and the detailed analysis stage [ 161 ] of monolingual [ 90 , 93 ] and cross-lingual [ 92 ] detection methods. For weakly obfuscated instances of plagiarism, CbPD achieved results comparable to those of lexical detection methods; for paraphrased and idea plagiarism, CbPD outperformed lexical detection methods in the experiments of Gipp et al. [ 90 , 93 ]. Moreover, the visualization of citation patterns was found to facilitate the inspection of the detection results by humans, especially for cases of structural and idea plagiarism [ 90 , 93 ]. Pertile et al. [ 191 ] confirmed the positive effect of combining citation and text analysis on the detection effectiveness and devised a hybrid approach using machine learning. CbPD can also alert a user when the in-text citations are inconsistent with the list of references. Such inconsistencies may arise by mistake or be introduced deliberately to obfuscate plagiarism.
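One simple way to quantify "identical citations in a similar order" is the longest common subsequence of the two citation sequences. The sketch below illustrates this idea only; the function names and the normalization by the shorter sequence are assumptions, and the sketch does not claim to reproduce the algorithms of Gipp et al.

```python
def longest_common_citation_sequence(cites_a, cites_b):
    """Length of the longest sequence of citations occurring in the same order in both documents."""
    rows, cols = len(cites_a), len(cites_b)
    # Standard dynamic-programming table for the longest common subsequence.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if cites_a[i - 1] == cites_b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def citation_order_similarity(cites_a, cites_b):
    """Shared citation order, normalized by the length of the shorter sequence."""
    shorter = min(len(cites_a), len(cites_b))
    if shorter == 0:
        return 0.0
    return longest_common_citation_sequence(cites_a, cites_b) / shorter
```

Because the comparison uses only citation identifiers, it is unaffected by paraphrasing or translation of the surrounding text.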

Meuschke et al. [ 163 ] proposed mathematics-based plagiarism detection (MathPD) as an extension of CbPD for documents in the Science, Technology, Engineering and Mathematics (STEM) fields. Mathematical expressions share many properties of academic citations, e.g., they are essential components of academic STEM documents, are language-independent, and contain rich semantic information. Furthermore, some disciplines, such as mathematics and physics, use academic citations sparsely [ 167 ]. Therefore, a citation-based analysis alone is less likely to reveal suspicious content similarity for these disciplines [ 163 ], [ 165 ]. Meuschke et al. showed that an exclusive math-based similarity analysis performed well for detecting confirmed cases of academic plagiarism in STEM documents [ 163 ]. Combining a math-based and a citation-based analysis further improved the detection performance for confirmed cases of plagiarism [ 165 ].

Image-based plagiarism detection analyzes graphical content elements. While a large variety of methods to retrieve similar images have been proposed [ 56 ], few studies investigated the application of content-based image retrieval approaches for academic plagiarism detection. Meuschke et al. [ 162 ] is the only such study we encountered during our data collection. The authors proposed a detection approach that integrates established image retrieval methods with novel similarity assessments for images that are tailored to plagiarism detection. The approach has been shown to retrieve both copied and altered figures.

Ensembles of Detection Methods

Each class of detection methods has characteristic strengths and weaknesses. Many authors showed that combining detection methods achieves better results than applying the methods individually [ 7 , 62 , 78 , 128 , 133 , 234 , 242 , 273 , 275 ]. By assembling the best-performing detection methods in PAN 2014, the organizers of the workshop created a meta-system that performed best overall [ 232 ].

In intrinsic plagiarism detection, combining feature analysis methods is a standard approach [ 233 ], since an author's writing style always comprises a multitude of stylometric features [ 127 ]. Many recent author verification methods employ machine learning to select the best-performing feature combination [ 234 ].

In general, there are three ways of combining plagiarism detection methods:

  • Using adaptive algorithms that determine the obfuscation strategy, choose the detection method, and set similarity thresholds accordingly
  • Using an ensemble of detection methods whose results are combined using static weights
  • Using machine learning to determine the best-performing combination of detection methods

The winning approach at PAN 2014 and 2015 [ 216 ] used an adaptive algorithm. After finding the seeds of overlapping passages, the authors extended the seeds using two different thresholds for the maximum gap. Based on the length of the passages, the algorithm automatically recognized different plagiarism forms and set the parameters for the VSM-based detection method accordingly.

The “linguistic knowledge approach” proposed by Abdi et al. [ 2 ] exemplifies an ensemble of detection methods. The method combines the analysis of syntactic and semantic sentence similarity using a linear combination of two similarity metrics: (i) the cosine similarity of semantic vectors and (ii) the similarity of syntactic word order vectors [ 2 ]. The authors showed that the method outperformed other contestants on the PAN-10 and PAN-11 corpora. Table 20 lists other ensembles of detection methods.
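A linear combination with static weights can be sketched as follows. The weight `alpha`, the particular word-order measure, and the function names are assumptions for illustration and are not taken from Abdi et al. [ 2 ].

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_order_similarity(r1, r2):
    """1 - ||r1 - r2|| / ||r1 + r2|| for two word-order vectors (an assumed measure)."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 if total == 0 else 1.0 - diff / total


def combined_similarity(sem_a, sem_b, order_a, order_b, alpha=0.8):
    """Static-weight ensemble: alpha weights semantics, (1 - alpha) weights word order."""
    semantic = cosine(sem_a, sem_b)
    syntactic = word_order_similarity(order_a, order_b)
    return alpha * semantic + (1 - alpha) * syntactic
```

The weight is fixed in advance here; the adaptive and machine-learning variants listed above would instead choose it per input or learn it from training data.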

Document-level detection Linguistic knowledge [ ]
Candidate retrieval Querying a Web search engine Combination of querying heuristics [ ]
Detailed analysis Vector space model Adaptive algorithm [ , , ]

Machine Learning approaches for plagiarism detection typically train a classification model that combines a given set of features. The trained model can then be used to classify other datasets. Support vector machine (SVM) is the most popular model type for plagiarism detection tasks. SVM uses statistical learning to find a separating hyperplane that maximizes the margin, i.e., the distance between the hyperplane and the nearest training data points. Choosing the hyperplane is the main challenge for correct data classification [ 66 ].

Machine-learning approaches are very successful in intrinsic plagiarism detection. Supervised machine-learning methods, specifically random forests, were the best-performing approach at the intrinsic detection task of the PAN 2015 competition [ 233 ]. The best-known method for author verification is unmasking [ 232 ], which uses an SVM classifier to distinguish the stylistic features of the suspicious document from a set of documents for which the author is known. The idea of unmasking is to train and run the classifier and then remove the most significant features of the classification model and rerun the classification. If the classification accuracy drops significantly, then the suspicious and known documents are likely from the same author; otherwise, they are likely written by different authors [ 232 ]. There is no consensus on the stylometric features that are most suitable for authorship identification [ 158 ]. Table 21 gives an overview of intrinsic detection methods that employ machine-learning techniques.

Style-breach detection Gradient Boosting Regression Trees Lexical, syntax [ ]
Author identification SVM Semantic (LSA) [ ]
Author clustering Recurrent ANN Lexical [ ],
SVM Lexical, syntax [ ]
Author verification Recurrent ANN Lexical [ ]
k-nearest neighbor Lexical [ ]
Lexical, syntax [ ]
Homotopy-based classification Lexical [ ]
Naïve Bayes Lexical [ ]
SVM Lexical, syntax [ , , , , ]
Equal error rate Lexical [ ]
Decision Tree Lexical [ ]
Random Forest Lexical, syntax [ , , ]
Genetic algorithm Lexical, syntax [ , ]
Multilayer perceptron Lexical, semantic (LSA) [ ]
Many Lexical [ , ]
Lexical, syntax [ ]

For extrinsic plagiarism detection, the application of machine learning has been studied for various components of the detection process [ 208 ]. Gharavi et al. [ 88 ] used machine learning to determine the suspiciousness thresholds for a vector space model. Zarrella et al. [ 273 ] won the SemEval competition in 2015 with their ensemble of seven algorithms, most of which used machine learning. While Hussain and Suryani [ 116 ] successfully used an SVM classifier for the candidate retrieval stage, Williams et al. [ 269 ] compared many supervised machine-learning methods and concluded that applying them for classifying and ranking Web search engine results did not improve candidate retrieval. Kanjirangat and Gupta [ 252 ] used a genetic algorithm to detect idea plagiarism. The method randomly chooses a set of sentences as chromosomes. The sentence sets that are most descriptive of the entire document are combined and form the next generation. In this way, the method gradually extracts the sentences that represent the idea of the document and can be used to retrieve similar documents.

Sánchez-Vega et al. [ 218 ] proposed a method termed rewriting index that evaluates the degree of membership of each sentence in the suspicious document to a possible source document. The method uses five different Turing machines to uncover verbatim copying as well as basic transformations on the word level (insertion, deletion, substitution). The output values of the Turing machines are used as the features to train a Naïve Bayes classifier and identify reused passages.

In the approach of Afzal et al. [ 5 ], the linear combination of supervised and unsupervised machine-learning methods outperformed each of the methods applied individually. In the experiments of Alfikri and Purwarianti [ 13 ], SVM classifiers outperformed Naïve Bayes classifiers. In the experiments of Subroto and Selamat [ 236 ], the best performing configuration was a hybrid model that combined SVM and an artificial neural network (ANN). El-Alfy et al. [ 62 ] found that an abductive network outperformed SVM. However, as shown in Table 22 , SVM is the most popular classifier for extrinsic plagiarism detection methods. Machine learning appears to be more beneficial when applied for the detailed analysis, as indicated by the fact that most extrinsic detection methods apply machine learning for that stage (cf. Table 22 ).

Table 22.

| Stage or task | Classifier | Features | References |
| --- | --- | --- | --- |
| Document-level detection | SVM | Semantic | [ , ] |
| | SVM, Naïve Bayes | Lexical, semantic | [ ] |
| | Decision tree, k-nearest neighbor | Syntax | [ ] |
| | Naïve Bayes, SVM, Decision tree | Lexical, syntax | [ ] |
| | Many | Semantic (CbPD) | [ ] |
| Candidate retrieval | SVM | Lexical | [ ] |
| | Linear discriminant analysis | Lexical, syntax | [ ] |
| | Genetic algorithm | Lexical, syntax | [ ] |
| Detailed analysis | Logical regression model | Lexical, syntax, semantic | [ ] |
| | Naïve Bayes | Lexical | [ ] |
| | Naïve Bayes, Decision Tree, Random Forest | Lexical | [ ] |
| | SVM | Lexical, semantic | [ ] |
| Paraphrase identification | SVM | Lexical | [ ] |
| | | Lexical, semantic | [ , ] |
| | | Lexical, syntax, semantic | [ , , ] |
| | | MT metrics | [ ] |
| | | ML with syntax and semantic features | [ ] |
| | k-nearest neighbor, SVM, artificial neural network | Lexical | [ ] |
| | SVM, Random forest, Gradient boosting | Lexical, syntax, semantic, MT metrics | [ ] |
| | SVM, MaxEnt | Lexical, syntax, semantic | [ ] |
| | Abductive networks | Lexical | [ ] |
| | Linear regression | Lexical, syntax, semantic | [ ] |
| | L2-regularized logistic regression | Lexical, syntax, semantic, ML | [ ] |
| | Ridge regression | Lexical, semantic | [ ] |
| | Gaussian process regression | Lexical, semantic | [ ] |
| | Isotonic regression | Semantic | [ ] |
| | Artificial neural network | Lexical, semantic | [ ] |
| | Deep neural network | Syntax, semantic | [ ] |
| | | Semantic | [ ] |
| | Decision Tree | Semantic | [ ] |
| | | Lexical, syntax, semantic | [ , ] |
| | Random Forest | Semantic, MT metrics | [ ] |
| | Many | Lexical, semantic | [ , ] |
| | | Lexical, syntax, semantic | [ , ] |
| Cross-language PD | Artificial neural networks | Semantic | [ ] |

Evaluation of Plagiarism Detection Methods

The availability of datasets for development and evaluation is essential for research on natural language processing and information retrieval. The PAN series of benchmark competitions is a comprehensive and well-established platform for the comparative evaluation of plagiarism detection methods and systems [ 197 ]. The PAN test datasets contain artificially created monolingual (English, Arabic, Persian) and, to a lesser extent, cross-language plagiarism instances (German and Spanish to English) with different levels of obfuscation. The papers included in this review that present lexical, syntactic, and semantic detection methods mostly use PAN datasets or the Microsoft Research Paraphrase corpus. Authors presenting idea-based detection methods that analyze non-textual content features or cross-language detection methods for non-European languages typically use self-created test collections, since the PAN datasets are not suitable for these tasks. A comprehensive review of corpus development initiatives is beyond the scope of this article.

Since plagiarism detection is an information retrieval task, precision, recall, and F-measure are typically employed to evaluate plagiarism detection methods. A notable use-case-specific extension of these general performance measures is the PlagDet metric. Potthast et al. introduced the metric to evaluate the performance of methods for the detailed analysis stage in external plagiarism detection [ 201 ]. A method may detect only a fragment of a plagiarism instance or report a coherent instance as multiple detections. To account for these possibilities, Potthast et al. included the granularity score as part of the PlagDet metric. The granularity score is the ratio of the number of detections a method reports to the true number of plagiarism instances.
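As a rough illustration of how these measures combine, the following Python sketch computes the F-measure and a PlagDet-style score. It assumes the commonly cited form of the metric, in which the F-measure is divided by log2(1 + granularity), and it uses the simplified, count-based granularity described above. The function names and the example numbers are hypothetical; the official PAN evaluation computes precision and recall at the character level and is not reproduced here.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def granularity(num_detections: int, num_detected_instances: int) -> float:
    """Simplified granularity: reported detections per detected instance.

    Assumption: both counts refer to plagiarism instances that were
    detected at least once, so the value is 1.0 when every instance is
    reported as exactly one detection and grows with fragmentation.
    """
    if num_detected_instances == 0:
        return 1.0
    return num_detections / num_detected_instances


def plagdet(precision: float, recall: float, gran: float) -> float:
    """PlagDet-style score: F1 discounted by the logarithm of granularity."""
    return f_measure(precision, recall) / math.log2(1 + gran)


if __name__ == "__main__":
    # Hypothetical run: 4 detections were reported for 2 detected instances.
    gran = granularity(num_detections=4, num_detected_instances=2)
    print(round(plagdet(precision=0.8, recall=0.6, gran=gran), 4))
```

With a granularity of 1, the score equals the plain F-measure; reporting one instance as several fragments lowers the score even if precision and recall stay the same.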

PLAGIARISM DETECTION SYSTEMS

Plagiarism detection systems implement (some of) the methods described in the previous sections. To be applicable in practice, the systems must address the tradeoff between detection performance and processing speed [ 102 ], i.e., find sources of plagiarism with reasonable computational costs.

Most systems are Web-based; some can run locally. The systems typically highlight the parts of a suspicious document that likely originate from another source as well as which source that is. Understanding how the source was changed is often left to the user. Providers of plagiarism detection systems, especially of commercial systems, rarely publish information on the detection methods they employ [ 85 , 256 ]. Thus, estimating to what extent plagiarism detection research influences practical applications is difficult.

Velásquez et al. [ 256 ] presented a text-matching software system and described its functionality, which included the recognition of quotes. The system achieved excellent results in the PAN 10 and PAN 11 competitions. The authors have since commercialized the system [ 195 ].

Academics and practitioners are naturally interested in which detection system achieves the best results. Weber-Wulff and her team performed the most methodologically sound investigation of this question in 2004, 2007, 2008, 2010, 2011, 2012, and 2013 [ 266 ]. In their latest benchmark evaluation, the group compared 15 systems using documents written in English and German.

Chowdhury and Bhattacharyya [ 48 ] provided an exhaustive list of currently available plagiarism detection systems. Unfortunately, the description of each system is short, and the authors did not provide performance comparisons. Pertile et al. [ 191 ] summarized the basic characteristics of 17 plagiarism detection systems. Kanjirangat and Gupta [ 251 ] compared four publicly available systems. They used four test documents that contained five forms of plagiarism (copy-and-paste, random obfuscation, translation to Hindi and back, summarization). All systems failed to identify plagiarism instances other than copy-and-paste and random obfuscation.

There is consensus in the literature that the inability of plagiarism detection systems to identify obfuscated plagiarism is currently their most severe limitation [ 88 , 251 , 266 ].

In summary, there is a lack of systematic and methodologically sound performance evaluations of plagiarism detection systems, since the benchmark comparisons of Weber-Wulff ended in 2013. This lack is problematic, since plagiarism detection systems are typically a key building block of plagiarism policies. Plagiarism detection methods and plagiarism policies are the subjects of extensive research. We argue that plagiarism detection systems should be researched just as extensively but are currently not.

In this section, we summarize the advancements in the research on methods to detect academic plagiarism that our review identified. Figure 2 depicts the suitability of the methods discussed in the previous sections for identifying the plagiarism forms presented in our typology. As shown in the Figure, n-gram comparisons are well-suited for detecting character-preserving plagiarism and partially suitable for identifying ghostwriting and syntax-preserving plagiarism. Stylometry is routinely applied for intrinsic plagiarism detection and can reveal ghostwriting and copy-and-paste plagiarism. Vector space models have a wide range of applications but appear not to be particularly beneficial for detecting idea plagiarism. Semantics-based methods are tailored to the detection of semantics-preserving plagiarism, yet also perform well for character-preserving and syntax-preserving forms of plagiarism. Non-textual feature analysis and machine learning are particularly beneficial for detecting strongly obfuscated forms of plagiarism, such as semantics-preserving and idea-preserving plagiarism. However, machine learning is a universal approach that also performs well for less strongly disguised forms of plagiarism.

Fig. 2.

The first observation of our literature survey is that ensembles of detection methods tend to outperform approaches based on a single method [ 93 , 161 ]. Chong experimented with numerous methods for preprocessing as well as with shallow and deep NLP techniques [ 47 ]. He tested the approaches on both small and large-scale corpora and concluded that a combination of string-matching and deep NLP techniques achieves better results than applying the techniques individually.

Machine-learning approaches represent the logical evolution of the idea to combine heterogeneous detection methods. Since our previous review in 2013, unsupervised and supervised machine-learning methods have found increasingly widespread adoption in plagiarism detection research and have significantly increased the performance of detection methods. Baroni et al. [ 27 ] provided a systematic comparison of vector-based similarity assessments. The authors were particularly interested in whether unsupervised count-based approaches like LSA achieve better results than supervised prediction-based approaches like Softmax. They concluded that the prediction-based methods outperformed their count-based counterparts in precision and recall while requiring similar computational effort. We expect that the research on applying machine learning for plagiarism detection will continue to grow significantly in the future.

Considering the heterogeneous forms of plagiarism (see the typology section), the static one-size-fits-all approach observable in most plagiarism detection methods before 2013 is increasingly being replaced by adaptive detection algorithms. Many recent detection methods first seek to identify the likely obfuscation method and then apply the appropriate detection algorithm [ 79 , 198 ], or at least dynamically adjust the parameters of the detection method [ 216 ].

Graph-based methods operating on the syntactic and semantic levels achieve results comparable to those of other semantics-based methods. Mohebbi and Talebpour [ 168 ] successfully employed graph-based methods to identify paraphrases. Franco-Salvador et al. [ 79 ] demonstrated the suitability of knowledge graph analysis for cross-language plagiarism detection.

Several researchers showed the benefit of analyzing non-textual content elements to improve the detection of strongly obfuscated forms of plagiarism. Gipp et al. demonstrated that analyzing in-text citation patterns achieves higher detection rates than lexical approaches for strongly obfuscated forms of academic plagiarism [ 90 , 92 – 94 ]. The approach is computationally modest and reduces the effort required of users for investigating the detection results. Pertile et al. [ 191 ] combined lexical and citation-based approaches to improve detection performance. Eisa et al. [ 61 ] strongly advocated for additional research on analyzing non-textual content features. The research by Meuschke et al. on analyzing images [ 162 ] and mathematical expressions [ 164 ] confirms that non-textual detection methods significantly enhance the detection capabilities. Following the trend of combining detection methods, we see the analysis of non-textual content features as a promising component of future integrated detection approaches.

Surprisingly many papers in our collection addressed plagiarism detection for Arabic and Persian texts (e.g., References [ 22 , 118 , 231 , 262 ]). The interest in plagiarism detection for the Arabic language led the organizers of the PAN competitions to develop an Arabic corpus for intrinsic plagiarism detection [ 34 ]. In 2015, the PAN organizers also introduced a shared task on plagiarism detection for Arabic texts [ 32 ], followed by a shared task for Persian texts one year later [ 22 ]. While these are promising steps toward improving plagiarism detection for Arabic, Wali et al. [ 262 ] noted that the availability of corpora and lexicons for Arabic is still insufficient when compared to other languages. This lack of resources and the complex linguistic features of the Arabic language cause plagiarism detection for Arabic to remain a significant research challenge [ 262 ].

For cross-language plagiarism detection methods, Ferrero et al. [ 74 ] introduced a five-class typology that still reflects the state of the art: cross-language character n-grams (CL-CNG), cross-language conceptual thesaurus-based similarity (CL-CTS), cross-language alignment-based similarity analysis (CL-ASA), cross-language explicit semantic analysis (CL-ESA), and translation with monolingual analysis (T+MA). Franco-Salvador et al. [ 80 ] showed that the performance of these methods varies depending on the language and corpus. The observation that the combination of detection methods improves the detection performance also holds for the cross-language scenario [ 80 ]. In the analysis of Ferrero et al. [ 74 ], the detection performance of the methods depended exclusively on the size of the chosen chunk, not on the language or the dataset. Translation with monolingual analysis is a widely used approach. For the cross-language detection task (Spanish–English) at the SemEval competition in 2016, most of the contestants machine-translated the Spanish sentences into English and then compared the sentences in English [ 7 ]. However, some authors do not consider this approach to be cross-language plagiarism detection but rather monolingual plagiarism detection with translation as a preprocessing step [ 80 ].
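To make the simplest class of this typology more concrete, the following Python sketch illustrates the basic idea behind CL-CNG: both texts are reduced to character n-gram counts and compared with cosine similarity, which can work for related languages that share many character sequences. The normalization steps, the choice of n = 3, and the function names are assumptions made for illustration and are not taken from the cited works.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams after lowercasing and stripping punctuation."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cl_cng_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Character n-gram similarity of two texts, regardless of language."""
    return cosine(char_ngrams(text_a, n), char_ngrams(text_b, n))


if __name__ == "__main__":
    # English and Spanish cognates share many character trigrams.
    print(cl_cng_similarity("information", "información"))
    print(cl_cng_similarity("information", "weather"))
```

Because the comparison relies only on surface character overlap, a sketch like this would not be expected to work for language pairs with different scripts or few cognates.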

For intrinsic plagiarism detection, authors predominantly use lexical and syntax-based text analysis methods. Widely analyzed lexical features include character n-grams, word frequencies, as well as the average lengths of words, sentences, and paragraphs [ 247 ]. The most common syntax-based features include PoS tag frequencies, PoS tag pair frequencies, and PoS structures [ 247 ]. At the PAN competitions, methods that analyzed lexical features and employed simple clustering algorithms achieved the best results [ 200 ].
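The lexical features named above are inexpensive to compute. The following Python sketch extracts a few of them (average word length, average sentence length in words, and the relative frequencies of the most common words) for one text segment. The regular-expression tokenization and the function name are simplifying assumptions; an intrinsic detection method would compute such features per segment and look for segments whose style deviates from the rest of the document.

```python
import re
from collections import Counter


def lexical_style_features(text: str, top_k: int = 5) -> dict:
    """Compute simple lexical style features for one text segment."""
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
            "relative_word_frequencies": {},
        }
    counts = Counter(words)
    return {
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "avg_sentence_length": len(words) / len(sentences),
        "relative_word_frequencies": {
            w: c / len(words) for w, c in counts.most_common(top_k)
        },
    }


if __name__ == "__main__":
    print(lexical_style_features("The cat sat. The dog ran fast."))
```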

For the author verification task, the most successful methods treated the problem as a binary classification task. They adopted the extrinsic verification paradigm by using texts from other authors to identify features that are characteristic of the writing style of the suspected author [ 233 ]. The general impostors method is a widely used and largely successful realization of this approach [ 135 , 146 , 159 , 224 ].
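The core loop of an impostors-style verification can be sketched in a few lines of Python. The sketch below is a simplified illustration under several assumptions, not the procedure of any cited work: documents are represented by character trigram counts, similarity is the cosine over a random half of the features, and the score is the share of rounds in which the known text of the suspected author is closer to the questioned text than every sampled impostor text. All names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def profile(text: str, n: int = 3) -> Counter:
    """Character n-gram counts of a whitespace-normalized, lowercased text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_on(features, a: Counter, b: Counter) -> float:
    """Cosine similarity restricted to the given feature subset."""
    dot = sum(a[f] * b[f] for f in features)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in features))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in features))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def impostors_score(questioned, known, impostors, rounds=100,
                    feature_share=0.5, impostors_per_round=None, seed=0):
    """Share of rounds in which `known` beats all sampled impostors."""
    rng = random.Random(seed)
    q, k = profile(questioned), profile(known)
    imps = [profile(t) for t in impostors]
    vocabulary = sorted(set(q) | set(k) | set().union(*imps))
    if not vocabulary or not imps:
        return 0.0
    sample_size = max(1, int(len(vocabulary) * feature_share))
    per_round = min(impostors_per_round or len(imps), len(imps))
    wins = 0
    for _ in range(rounds):
        features = rng.sample(vocabulary, sample_size)
        chosen = rng.sample(imps, per_round)
        sim_known = cosine_on(features, q, k)
        if all(sim_known > cosine_on(features, q, imp) for imp in chosen):
            wins += 1
    return wins / rounds


if __name__ == "__main__":
    questioned = "the quick brown fox jumps over the lazy dog near the river bank"
    others = [
        "completely unrelated words about quantum physics and thermodynamics",
        "stock markets fell sharply after the announcement yesterday",
    ]
    print(impostors_score(questioned, questioned, others))
```

A score close to 1 suggests that the questioned text is consistently closer to the suspected author than to the impostors; in practice, the decision threshold would be tuned on held-out data.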

From a practitioner's perspective, intrinsic detection methods exhibit several shortcomings. First, stylometric comparisons are inherently error-prone for documents collaboratively written by multiple authors [ 209 ]. This shortcoming is particularly critical, since most scientific publications have multiple authors [ 39 ]. Second, intrinsic methods are not well suited for detecting paraphrased plagiarism, i.e., instances in which authors illegitimately reused content from other sources that they presented in their own words. Third, the methods are generally not reliable enough for practical applications yet. Author identification methods achieve a precision of approximately 60%, author profiling methods of approximately 80% [ 200 ]. These values are sufficient for raising suspicion and encouraging further examination but not for proving plagiarism or ghostwriting. The availability of methods for automated author obfuscation aggravates the problem. The most effective methods can mislead the identification systems in almost half of the cases [ 199 ]. Fourth, intrinsic plagiarism detection approaches cannot point an examiner to the source document of potential plagiarism. If a stylistic analysis raised suspicion, then extrinsic detection methods or other search and retrieval approaches are necessary to discover the potential source document(s).

Other Applications of Plagiarism Detection Methods

Aside from extrinsic and intrinsic plagiarism detection, the methods described in this article have numerous other applications such as machine translation [ 67 ], author profiling for marketing applications [ 211 ], spam detection [ 248 ], law enforcement [ 127 , 211 ], identifying duplicate accounts in internet fora [ 4 ], identifying journalistic text reuse [ 47 ], patent analysis [ 1 ], event recognition based on tweet similarity [ 24 , 130 ], short answer scoring based on paraphrase identification [ 242 ], or native language identification [ 119 ].

In 2010, Mozgovoy et al. [ 173 ] proposed a roadmap for the future development of plagiarism detection systems. They suggested the inclusion of syntactic parsing, considering synonym thesauri, employing LSA to discover “tough plagiarism,” intrinsic plagiarism detection, and tracking citations and references. As our review of the literature shows, all these suggestions have been realized. Moreover, the field of plagiarism detection has made a significant leap in detection performance thanks to machine learning.

In 2015, Eisa et al. [ 61 ] praised the effort invested into improving text-based plagiarism detection but noted a critical lack of “techniques capable of identifying plagiarized figures, tables, equations and scanned documents or images.” While Meuschke et al. [ 163 , 165 ] proposed initial approaches that addressed these suggestions and achieved promising results, most of the research still addresses text-based plagiarism detection only.

A generally observable trend is that approaches that integrate different detection methods—often with the help of machine learning—achieve better results. In line with this observation, we see a large potential for the future improvement of plagiarism detection methods in integrating non-textual analysis approaches with the many well-performing approaches for the analysis of lexical, syntactic, and semantic text similarity.

To summarize the contributions of this article, we refer to the four questions Kitchenham et al. [ 138 ] suggested to assess the quality of literature reviews:

  • “Are the review's inclusion and exclusion criteria described and appropriate?
  • Is the literature search likely to have covered all relevant studies?
  • Did the reviewers assess the quality/validity of the included studies?
  • Were the basic data/studies adequately described?”

We believe that the answers to these four questions are positive for our survey. Our article summarizes previous research and identifies research gaps to be addressed in the future. We are confident that this review will help researchers newly entering the field of academic plagiarism detection to get oriented and will help experienced researchers to identify related works. We hope that our findings will aid in the development of more effective and efficient plagiarism detection methods and systems that will then facilitate the implementation of plagiarism policies.

  • Assad Abbas, Limin Zhang, and Samee U. Khan. 2014. A literature review on the state-of-the-art in patent analysis. World Pat. Inf. 37 (2014), 3–13. DOI: 10.1016/j.wpi.2013.12.006
  • Asad Abdi, Norisma Idris, Rasim M. Alguliyev, and Ramiz M. Aliguliyev. 2015. PDLK: Plagiarism detection using linguistic knowledge. Expert Syst. Appl. 42, 22 (2015), 8936–8946. DOI: 10.1016/j.eswa.2015.07.048
  • Samira Abnar, Mostafa Dehghani, Hamed Zamani, and Azadeh Shakery. 2014. Expanded n-grams for semantic text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Sadia Afroz, Aylin Caliskan Islam, Ariel Stolerman, Rachel Greenstadt, and Damon McCoy. 2014. Doppelgänger finder: Taking stylometry to the underground. In Proceedings of the 2014 IEEE Symposium on Security and Privacy . 212–226.
  • Naveed Afzal, Yanshan Wang, and Hongfang Liu. 2016. MayoNLP at SemEval-2016 Task 1: Semantic textual similarity based on lexical semantic net and deep learning semantic model. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 674–679.
  • Basant Agarwal, Heri Ramampiaro, Helge Langseth, and Massimiliano Ruocco. 2018. A deep network model for paraphrase detection in short text messages. Inf. Process. Manag. 54, 6 (2018), 922–937. DOI: 10.1016/j.ipm.2018.06.005
  • Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 497–511.
  • Mayank Agrawal and Dilip Kumar Sharma. 2016. A state of art on source code plagiarism detection. In Proceedings of the 2016 2nd International Conference on Next Generation Computing Technologies (NGCT’16) . 236–241. DOI: 10.1109/NGCT.2016.7877421
  • Mohammad Al-Smadi, Zain Jaradat, Mahmoud Al-Ayyoub, and Yaser Jararweh. 2017. Paraphrase identification and semantic text similarity analysis in arabic news tweets using lexical, syntactic, and semantic features. Inf. Process. Manag. 53, 3 (2017), 640–652. DOI: 10.1016/j.ipm.2017.01.002
  • Houda Alberts. 2017. Author clustering with the aid of a simple distance measure—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Hanan Aldarmaki and Mona Diab. 2016. GWU NLP at SemEval-2016 Shared Task 1: Matrix factorization for crosslingual STS. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 663–667.
  • Mahmoud Alewiwi, Cengiz Orencik, and Erkay Savas. 2016. Efficient top-k similarity document search utilizing distributed file systems and cosine similarity. Cluster Comput. 19, 1 (2016), 109–126. DOI: 10.1007/s10586-015-0506-0
  • Zakiy Firdaus Alfikri and Ayu Purwarianti. 2014. Detailed analysis of extrinsic plagiarism detection system using machine learning approach (naive bayes and svm). Indones. J. Electr. Eng. Comput. Sci. 12, 11 (2014), 7884–7894.
  • Muna Alsallal, Rahat Iqbal, Saad Amin, and Anne James. 2013. Intrinsic plagiarism detection using latent semantic indexing and stylometry. In Proceedings of the 2013 6th International Conference on Developments in eSystems Engineering . 145–150. DOI: 10.1109/DeSE.2013.34
  • Muna AlSallal, Rahat Iqbal, Saad Amin, Anne James, and Vasile Palade. 2016. An integrated machine learning approach for extrinsic plagiarism detection. In Proceedings of the 2016 9th International Conference on Developments in eSystems Engineering (DeSE’16) . 203–208. DOI: 10.1109/DeSE.2016.1
  • Muna AlSallal, Rahat Iqbal, Vasile Palade, Saad Amin, and Victor Chang. 2019. An integrated approach for intrinsic plagiarism detection. Fut. Gener. Comput. Syst. 96 (2019), 700–712. DOI: 10.1016/j.future.2017.11.023
  • Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, and Luis Villaseñor-Pineda. 2018. Semantically-informed distance and similarity measures for paraphrase plagiarism identification. J. Intell. Fuzzy Syst. 34, 5 (2018), 2983–2990.
  • Faisal Alvi, Mark Stevenson, and Paul Clough. 2014. Hashing and merging heuristics for text reuse detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 939–946.
  • Faisal Alvi, Mark Stevenson, and Paul Clough. 2017. Plagiarism detection in texts obfuscated with homoglyphs. In Advances in Information Retrieval . 669–675.
  • Salha Alzahrani. 2015. Arabic plagiarism detection using word correlation in N-Grams with K-Overlapping approach—Working notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Salha M. Alzahrani, Naomie Salim, and Ajith Abraham. 2012. Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans. Syst. Man, Cybern. C Appl. Rev. 42, 2 (2012), 133–149.
  • Habibollah Asghari, Salar Mohtaj, Omid Fatemi, Heshaam Faili, Paolo Rosso, and Martin Potthast. 2016. Algorithms and corpora for persian plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 61.
  • Duygu Ataman, Jose G. C. De Souza, Marco Turchi, and Matteo Negri. 2016. FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual semantic similarity measurement using quality estimation features and compositional bilingual word embeddings. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 570–576.
  • Farzindar Atefeh and Wael Khreich. 2015. A survey of techniques for event detection in twitter. Comput. Intell. 31, 1 (2015), 132–164. DOI: 10.1111/coin.12017
  • Douglas Bagnall. 2015. Author identification using multi-headed recurrent neural networks—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Douglas Bagnall. 2016. Authorship clustering using multi-headed recurrent neural networks—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . 238–247.
  • Alberto Barrón-Cedeño, Parth Gupta, and Paolo Rosso. 2013. Methods for cross-language plagiarism detection. Knowl.-Based Syst. 50 (2013), 211–217. DOI: 10.1016/j.knosys.2013.06.018
  • Alberto Barrón-Cedeño, Marta Vila, M. Antònia Martí, and Paolo Rosso. 2013. Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection. Comput. Linguist. 39, 4 (2013), 917–947. DOI: 10.1162/COLI_a_00153
  • Alberto Bartoli, Alex Dagri, Andrea De Lorenzo, Eric Medvet, and Fabiano Tarlao. 2015. An author verification approach based on differential features—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Jeffrey Beall. 2016. Best practices for scholarly authors in the age of predatory journals. Ann. R. Coll. Surg. Engl. 98, 2 (2016), 77–79.
  • Imene Bensalem, Imene Boukhalfa, Paolo Rosso, Lahsen Abouenour, Kareem Darwish, and Salim Chikhi. 2015. Overview of the AraPlagDet PAN@FIRE2015 shared task on arabic plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Imene Bensalem, Salim Chikhi, and Paolo Rosso. 2013. Building arabic corpora from wikisource. In Proceedings of the 2013 ACS International Conference on Computer Systems and Applications (AICCSA’13) . 1–2. DOI: 10.1109/AICCSA.2013.6616474
  • Imene Bensalem, Paolo Rosso, and Salim Chikhi. 2013. A new corpus for the evaluation of arabic intrinsic plagiarism detection. In Information Access Evaluation: Multilinguality, Multimodality, and Visualization . 53–58.
  • Imene Bensalem, Paolo Rosso, and Salim Chikhi. 2014. Intrinsic plagiarism detection using n-gram classes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14) . 1459–1464.
  • Ergun Bicici. 2016. RTM at SemEval-2016 Task 1: Predicting semantic similarity with referential translation machines and related statistics. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 758–764.
  • Victoria Bobicev. 2013. Authorship detection with PPM—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Hadj Ahmed Bouarara, Amine Rahmani, Reda Mohamed Hamou, and Abdelmalek Amine. 2014. Machine learning tool and meta-heuristic based on genetic algorithms for plagiarism detection over mail service. In Proceedings of the 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS’14) . 157–162. DOI: 10.1109/ICIS.2014.6912125
  • Barry Bozeman, Daniel Fay, and Catherine P. Slade. 2013. Research collaboration in universities and academic entrepreneurship: The-state-of-the-art. J. Technol. Transf. 38, 1 (2013), 1–67. DOI: 10.1007/s10961-012-9281-8
  • Pearl Brereton, Barbara A. Kitchenham, David Budgen, Mark Turner, and Mohamed Khalil. 2007. Lessons from applying the systematic literature review process within the software engineering domain. J. Syst. Softw. 80, 4 (2007), 571–583. DOI: 10.1016/j.jss.2006.07.009
  • Tomáš Brychcín and Lukáš Svoboda. 2016. UWB at SemEval-2016 Task 1: Semantic textual similarity using lexical, syntactic, and semantic information. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 588–594.
  • Davide Buscaldi, Joseph Le Roux, Jorge J. García Flores, and Adrian Popescu. 2013. LIPN-CORE: Semantic text similarity using n-grams, wordnet, syntactic analysis, ESA and information retrieval based features. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics . 63.
  • Esteban Castillo, Ofelia Cervantes, Darnes Vilariño, David Pinto, and Saul León. 2014. Unsupervised method for the authorship identification task—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Daniel Castro, Yaritza Adame, María Pelaez, and Rafael Muñoz. 2015. Authorship verification, combining linguistic features and different similarity functions—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Daniele Cerra, Mihai Datcu, and Peter Reinartz. 2014. Authorship analysis based on data compression. Pattern Recogn. Lett. 42 (2014), 79–84. DOI: 10.1016/j.patrec.2014.01.019
  • Zdenek Ceska. 2008. Plagiarism detection based on singular value decomposition. In Advances in Natural Language Processing . Springer, 108–119.
  • Man Yan Miranda Chong. 2013. A study on plagiarism detection and plagiarism direction identification using natural language processing techniques. Ph.D. Thesis. University of Wolverhampton.
  • Hussain A. Chowdhury and Dhruba K. Bhattacharyya. 2016. Plagiarism: Taxonomy, tools and detection techniques. In Proceedings of the 19th National Convention on Knowledge, Library and Information Networking (NACLIN’16) .
  • Daniela Chudá, Jozef Lačný, Maroš Maršalek, Pavel Michalko, and Ján Súkeník. 2013. Plagiarism detection in slovak texts on the web. In Proceedings of the Conference on Plagiarism across Europe and Beyond . 249–260.
  • Guy J. Curtis and Joseph Clare. 2017. How prevalent is contract cheating and to what extent are students repeat offenders? J. Acad. Ethics 15, 2 (2017), 115–124. DOI: 10.1007/s10805-017-9278-x
  • Guy J. Curtis and Lucia Vardanega. 2016. Is plagiarism changing over time? A 10-year time-lag study with three points of measurement. High. Educ. Res. Dev. 35, 6 (2016), 1167–1179. DOI: 10.1080/07294360.2016.1161602
  • Michiel van Dam. 2013. A basic character n-gram approach to authorship verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Avishek Dan and Pushpak Bhattacharyya. 2013. Cfilt-core: Semantic textual similarity using universal networking language. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics (*SEM’13) . 216–220.
  • Ali Daud, Wahab Khan, and Dunren Che. 2017. Urdu language processing: a survey. Artif. Intell. Rev. 47, 3 (2017), 279–311. DOI: 10.1007/s10462-016-9482-x
  • Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 6 (1990), 391. DOI: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
  • T. Dharani and I. Laurence Aroquiaraj. 2013. A survey on content based image retrieval. In Proceedings of the 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering . 485–490. DOI: 10.1109/ICPRIME.2013.6496719
  • Michal Ďuračík, Emil Kršák, and Patrik Hrkút. 2017. Current trends in source code analysis, plagiarism detection and issues of analysis big datasets. Proc. Eng. 192 (2017), 136–141. DOI: 10.1016/j.proeng.2017.06.024
  • Nava Ehsan and Azadeh Shakery. 2016. Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information. Inf. Process. Manag. 52, 6 (2016), 1004–1017. DOI: 10.1016/j.ipm.2016.04.006
  • Nava Ehsan and Azadeh Shakery. 2016. A pairwise document analysis approach for monolingual plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 145–148.
  • Nava Ehsan, Frank Wm. Tompa, and Azadeh Shakery. 2016. Using a dictionary and n-gram alignment to improve fine-grained cross-language plagiarism detection. In Proceedings of the 2016 ACM Symposium on Document Engineering (DocEng’16) . 59–68. DOI: 10.1145/2960811.2960817
  • Taiseer Abdalla Elfadil Eisa, Naomie Salim, and Salha Alzahrani. 2015. Existing plagiarism detection techniques: A systematic mapping of the scholarly literature. Online Inf. Rev. 39, 3 (2015), 383–400.
  • El-Sayed M. El-Alfy, Radwan E. Abdel-Aal, Wasfi G. Al-Khatib, and Faisal Alvi. 2015. Boosting paraphrase detection through textual similarity metrics with abductive networks. Appl. Soft Comput. 26 (2015), 444–453. DOI: 10.1016/j.asoc.2014.10.021
  • Victoria Elizalde. 2013. Using statistic and semantic analysis to detect plagiarism—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Victoria Elizalde. 2014. Using noun phrases and tf-idf for plagiarized document retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Erik von Elm, Greta Poglia, Bernhard Walder, and Martin R. Tramèr. 2004. Different patterns of duplicate publication: An analysis of articles used in systematic reviews. JAMA 291, 8 (2004), 974–980. DOI: 10.1001/jama.291.8.974
  • Fezeh Esteki and Faramarz Safi Esfahani. 2016. A plagiarism detection approach based on SVM for persian texts. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 149–153.
  • Asli Eyecioglu and Bill Keller. 2015. Twitter paraphrase identification with simple overlap features and SVMs. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 64–69.
  • Jody Condit Fagan. 2017. An evidence-based review of academic web search engines, 2014–2016: Implications for librarians’ practice and research agenda. Inf. Technol. Libr. 36, 2 (2017), 7.
  • Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database (Language, Speech, and Communication) . The MIT Press.
  • Vanessa Wei Feng and Graeme Hirst. 2013. Authorship verification with entity coherence and other rich linguistic features—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Rafael Ferreira, George D. C. Cavalcanti, Fred Freitas, Rafael Dueire Lins, Steven J. Simske, and Marcelo Riss. 2018. Combining sentence similarities measures to identify paraphrases. Comput. Speech Lang. 47 (2018), 59–73. DOI: 10.1016/j.csl.2017.07.002
  • Jérémy Ferrero, Frédéric Agnes, Laurent Besacier, and Didier Schwab. 2017. CompiLIG at SemEval-2017 Task 1: Cross-language plagiarism detection methods for semantic textual similarity. arXiv:1704.01346 .
  • Jérémy Ferrero, Frédéric Agnes, Laurent Besacier, and Didier Schwab. 2017. Using word embedding for cross-language plagiarism detection. arXiv:1702.03082 .
  • Jérémy Ferrero, Laurent Besacier, Didier Schwab, and Frédéric Agnes. 2017. Deep investigation of cross-language plagiarism detection methods. arXiv:1705.08828 .
  • Tomáš Foltýnek and Irene Glendinning. 2015. Impact of policies for plagiarism in higher education across europe: Results of the project. Acta Univ. Agric. Silvic. Mendel. Brun. 63, 1 (2015), 207–216.
  • Marc Franco-Salvador, Parth Gupta, and Paolo Rosso. 2013. Cross-language plagiarism detection using a multilingual semantic network. In Advances in Information Retrieval . 710–713.
  • Marc Franco-Salvador, Parth Gupta, and Paolo Rosso. 2014. Knowledge graphs as context models: Improving the detection of cross-language plagiarism with paraphrasing. In Bridging Between Information Retrieval and Databases: PROMISE Winter School 2013 , Nicola Ferro (ed.). Springer-Verlag, Berlin, 227–236. DOI: 10.1007/978-3-642-54798-0_12
  • Marc Franco-Salvador, Parth Gupta, Paolo Rosso, and Rafael E. Banchs. 2016. Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language. Knowl.-Based Syst. 111 (2016), 87–99. DOI: 10.1016/j.knosys.2016.08.004
  • Marc Franco-Salvador, Paolo Rosso, and Manuel Montes-y-Gómez. 2016. A systematic study of knowledge graph analysis for cross-language plagiarism detection. Inf. Process. Manag. 52, 4 (2016), 550–570. DOI: 10.1016/j.ipm.2015.12.004
  • Marc Franco-Salvador, Paolo Rosso, and Roberto Navigli. 2014. A knowledge-based representation for cross-language document retrieval and categorization. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics . 414–423.
  • Jordan Fréry, Christine Largeron, and Mihaela Juganaru-Mathieu. 2014. UJM at CLEF in Author Identification—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing semantic relatedness using wikipedia-based explicit semantic analysis. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’07) . 1606–1611.
  • Jean-Gabriel Ganascia, Pierre Glaudes, and Andrea Del Lungo. 2014. Automatic detection of reuses and citations in literary texts. Lit. Linguist. Comput. 29, 3 (2014), 412–421. DOI: 10.1093/llc/fqu020
  • Yasmany García-Mondeja, Daniel Castro-Castro, Vania Lavielle-Castro, and Rafael Muñoz. 2017. Discovering author groups using a b-compact graph-based clustering—notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Urvashi Garg and Vishal Goyal. 2016. Maulik: A plagiarism detection tool for hindi documents. Ind. J. Sci. Technol. 9, 12 (2016).
  • Shahabeddin Geravand and Mahmood Ahmadi. 2014. An efficient and scalable plagiarism checking system using bloom filters. Comput. Electr. Eng. 40, 6 (2014), 1789–1800.
  • M. R. Ghaeini. 2013. Intrinsic author identification using modified weighted KNN—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Erfaneh Gharavi, Kayvan Bijari, Kiarash Zahirnia, and Hadi Veisi. 2016. A deep learning approach to persian plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 154–159.
  • Lee Gillam. 2013. Guess again and see if they line up: Surrey's runs at plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Bela Gipp. 2014. Citation-based Plagiarism Detection - Detecting Disguised and Cross-language Plagiarism Using Citation Pattern Analysis . Springer Vieweg Research. Retrieved from http://www.springer.com/978-3-658-06393-1 .
  • Bela Gipp and Norman Meuschke. 2011. Citation pattern matching algorithms for citation-based plagiarism detection: Greedy citation tiling, citation chunking and longest common citation sequence. In Proceedings of the 11th ACM Symposium on Document Engineering . 249–258. DOI: 10.1145/2034691.2034741
  • Bela Gipp, Norman Meuschke, and Joeran Beel. 2011. Comparative evaluation of text- and citation-based plagiarism detection approaches using guttenplag. In Proceedings of the 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’11) . 255–258. DOI: 10.1145/1998076.1998124
  • Bela Gipp, Norman Meuschke, and Corinna Breitinger. 2014. Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus. J. Assoc. Inf. Sci. Technol. 65, 8 (2014), 1527–1540. DOI: 10.1002/asi.23228
  • Bela Gipp, Norman Meuschke, Corinna Breitinger, Jim Pitman, and Andreas Nürnberger. 2014. Web-based demonstration of semantic similarity detection using citation pattern visualization for a cross language plagiarism case. In Proceedings of the International Conference on Enterprise Information Systems (ICEIS’14) . 677–683. DOI: 10.5220/0004985406770683
  • Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto, and Paolo Rosso. 2018. A resource-light method for cross-lingual semantic textual similarity. Knowl.-Based Syst. 143 (2018), 1–9. DOI: 10.1016/j.knosys.2017.11.041
  • Lila Gleitman and Anna Papafragou. 2005. Language and thought. In The Cambridge Handbook of Thinking and Reasoning , Keith J. Holyoak and Robert G. Morrison (eds.). Cambridge University Press, 633–661.
  • Demetrios G. Glinos. 2014. A hybrid architecture for plagiarism detection—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 958–965.
  • Helena Gómez-Adorno, Yuridiana Alemán, Darnes Vilariño Ayala, Miguel A Sanchez-Perez, David Pinto, and Grigori Sidorov. 2017. Author clustering using hierarchical clustering analysis—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Helena Gómez-Adorno, Grigori Sidorov, David Pinto, and Ilia Markov. 2015. A graph based authorship identification approach—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Philipp Gross and Pashutan Modaresi. 2014. Plagiarism alignment detection by merging context seeds—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Deepa Gupta, Vani Kanjirangat, and L. M. Leema. 2016. Plagiarism detection in text documents using sentence bounded stop word n-grams. J. Eng. Sci. Technol. 11, 10 (2016), 1403–1420.
  • Deepa Gupta, Vani Kanjirangat, and Charan Kamal Singh. 2014. Using natural language processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection. In Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI’14) . 2694–2699. DOI: 10.1109/ICACCI.2014.6968314
  • Josue Gutierrez, Jose Casillas, Paola Ledesma, Gibran Fuentes, and Ivan Meza. 2015. Homotopy based classification for author verification task—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yaakov HaCohen-Kerner and Aharon Tayeb. 2017. Rapid detection of similar peer-reviewed scientific papers via constant number of randomized fingerprints. Inf. Process. Manag. 53, 1 (2017), 70–86. DOI: 10.1016/j.ipm.2016.06.007
  • Matthias Hagen, Martin Potthast, and Benno Stein. 2015. Source retrieval for plagiarism detection from large web corpora: Recent approaches. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Osama Haggag and Samhaa El-Beltagy. 2013. Plagiarism candidate retrieval using selective query formulation and discriminative query scoring. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Oren Halvani and Lukas Graner. 2017. Author clustering based on compression-based dissimilarity scores—notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Oren Halvani and Martin Steinebach. 2014. VEBAV - A simple, scalable and fast authorship verification scheme—notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Oren Halvani, Martin Steinebach, and Ralf Zimmermann. 2013. Authorship verification via k-nearest neighbor estimation—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Oren Halvani and Christian Winter. 2015. A generic authorship verification scheme based on equal error rates—notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Christian Hänig, Robert Remus, and Xose De La Puente. 2015. Exb themis: Extensive feature extraction from word alignments for semantic textual similarity. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 264–268.
  • Sarah Harvey. 2014. Author verification using PPM with parts of speech tagging—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Hua He, John Wieting, Kevin Gimpel, Jinfeng Rao, and Jimmy Lin. 2016. UMD-TTIC-UW at SemEval-2016 Task 1: Attention-based multi-perspective convolutional neural networks for textual similarity measurement. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 1103–1108.
  • Oumaima Hourrane and El Habib Benlahmar. 2017. Survey of plagiarism detection approaches and big data techniques related to plagiarism candidate retrieval. In Proceedings of the 2nd International Conference on Big Data, Cloud and Applications (BDCA’17) . 15:1–15:6. DOI: 10.1145/3090354.3090369
  • Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Šuster, and Malvina Nissim. 2015. GLAD: Groningen lightweight authorship detection—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Syed Fawad Hussain and Asif Suryani. 2015. On retrieving intelligently plagiarized documents using semantic similarity. Eng. Appl. Artif. Intell. 45 (2015), 246–258. DOI: 10.1016/j.engappai.2015.07.011
  • Ashraf S. Hussein. 2015. A plagiarism detection system for arabic documents. In Intelligent Systems 2014 , D. Filev, J. Jabłkowski, J. Kacprzyk, M. Krawczak, I. Popchev, L. Rutkowski, V. Sgurev, E. Sotirova, P. Szynkarczyk, and S. Zadrozny (Eds.). Springer International Publishing, 541–552.
  • Ashraf S. Hussein. 2015. Arabic document similarity analysis using n-grams and singular value decomposition. In Proceedings of the 2015 IEEE 9th International Conference on Research Challenges in Information Science (RCIS’15) . 445–455. DOI: 10.1109/RCIS.2015.7128906
  • Radu Tudor Ionescu, Marius Popescu, and Aoife Cahill. 2014. Can characters reveal your native language? A language-independent approach to native language identification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14) . 1363–1373.
  • Hideo Itoh. 2016. RICOH at SemEval-2016 Task 1: IR-based semantic textual similarity estimation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 691–695.
  • Magdalena Jankowska, Vlado Kešelj, and Evangelos Milios. 2013. Proximity based one-class classification with common n-gram dissimilarity for authorship verification task—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Magdalena Jankowska, Vlado Kešelj, and Evangelos Milios. 2014. Ensembles of proximity-based one-class classifiers for author verification—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Arun Jayapal and Binayak Goswami. 2013. Vector space model and overlap metric for author identification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Zhuoren Jiang, Miao Chen, and Xiaozhong Liu. 2014. Semantic annotation with rescoredESA: Rescoring concept features generated from explicit semantic analysis. In Proceedings of the 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR’14) . 25–27. DOI: 10.1145/2663712.2666192
  • M. A. C. Jiffriya, M. A. C. Akmal Jahan, and Roshan G. Ragel. 2014. Plagiarism detection on electronic text based assignments using vector space model. In Proceedings of the 7th International Conference on Information and Automation for Sustainability . 1–5. DOI: 10.1109/ICIAFS.2014.7069593
  • M. A. C. Jiffriya, M. A. C. Akmal Jahan, Roshan G. Ragel, and Sampath Deegalla. 2013. AntiPlag: Plagiarism detection on electronic submissions of text based assignments. In Proceedings of the 2013 IEEE 8th International Conference on Industrial and Information Systems . 376–380. DOI: 10.1109/ICIInfS.2013.6732013
  • Patrick Juola. 2017. Detecting contract cheating via stylometric methods. In Proceedings of the Conference on Plagiarism across Europe and Beyond . 187–198. Retrieved from https://plagiarism.pefka.mendelu.cz/files/proceedings17.pdf .
  • Patrick Juola and Efstathios Stamatatos. 2013. Overview of the author identification task at PAN 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Rune Borge Kalleberg. 2015. Towards detecting textual plagiarism using machine learning methods. University of Agder. Retrieved from https://brage.bibsys.no/xmlui/bitstream/handle/11250/299460/RuneBorgeKalleberg.pdf?sequence=1 .
  • Rafael-Michael Karampatsis. 2015. CDTDS: Predicting paraphrases in twitter via support vector regression. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 75–79.
  • Daniel Karaś, Martyna Śpiewak, and Piotr Sobecki. 2017. OPI-JSA at CLEF 2017: Author clustering and style breach detection—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Roman Kern. 2013. Grammar checker features for author identification and author profiling—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Imtiaz H. Khan, Muazzam A. Siddiqui, Kamal M. Jambi, Muhammad Imran, and Abobakr A. Bagais. 2014. Query optimization in Arabic plagiarism detection: An empirical study. Int. J. Intell. Syst. Appl. 7, 1 (2014), 73.
  • Jamal Ahmad Khan. 2017. Style breach detection: An unsupervised detection model—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Mahmoud Khonji and Youssef Iraqi. 2014. A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF)—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Khadijeh Khoshnavataher, Vahid Zarrabi, Salar Mohtaj, and Habibollah Asghari. 2015. Developing monolingual persian corpus for extrinsic plagiarism detection using artificial obfuscation—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Barbara Kitchenham. 2004. Procedures for performing systematic reviews. Keele University Technical Report TR/SE-0401. Keele University. 33.
  • Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. 2009. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 51, 1 (2009), 7–15. DOI: 10.1016/j.infsof.2008.09.009
  • Mirco Kocher. 2016. UniNE at CLEF 2016: Author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Mirco Kocher and Jacques Savoy. 2015. UniNE at CLEF 2015: Author identification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Mirco Kocher and Jacques Savoy. 2017. UniNE at CLEF 2017: Author clustering—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Leilei Kong, Yong Han, Zhongyuan Han, Haihao Yu, Qibo Wang, Tinglei Zhang, and Haoliang Qi. 2014. Source retrieval based on learning to rank and text alignment based on plagiarism type recognition for plagiarism detection—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Leilei Kong, Zhimao Lu, Yong Han, Haoliang Qi, Zhongyuan Han, Qibo Wang, Zhenyuan Hao, and Jing Zhang. 2015. Source retrieval and text alignment corpus construction for plagiarism detection—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Leilei Kong, Zhimao Lu, Haoliang Qi, and Zhongyuan Han. 2014. Detecting high obfuscation plagiarism: Exploring multi-features fusion via machine learning. Int. J. u-and e-Serv. Sci. Technol. 7, 4 (2014), 385–396.
  • Leilei Kong, Haoliang Qi, Cuixia Du, Mingxing Wang, and Zhongyuan Han. 2013. Approaches for source retrieval and text alignment of plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Moshe Koppel and Yaron Winter. 2014. Determining if two documents are written by the same author. J. Assoc. Inf. Sci. Technol. 65, 1 (2014), 178–187.
  • Niraj Kumar. 2014. A graph based automatic plagiarism detection technique to handle artificial word reordering and paraphrasing. In Computational Linguistics and Intelligent Text Processing . 481–494.
  • Marcin Kuta and Jacek Kitowski. 2014. Optimisation of character n-gram profiles method for intrinsic plagiarism detection. In Artificial Intelligence and Soft Computing . 500–511.
  • Mikhail Kuznetsov, Anastasia Motrenko, Rita Kuznetsova, and Vadim Strijov. 2016. Methods for intrinsic plagiarism detection and author diarization. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) . 912–919. Retrieved from http://ceur-ws.org/Vol-1609/ .
  • Robert Layton, Paul Watters, and Richard Dazeley. 2013. Local n-grams for author identification—notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Paola Ledesma, Gibran Fuentes, Gabriela Jasso, Angel Toledo, and Ivan Meza. 2013. Distance learning for author verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Taemin Lee, Jeongmin Chae, Kinam Park, and Soonyoung Jung. 2013. CopyCaptor: Plagiarized source retrieval system using global word frequency and local feedback—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Chi-kiu Lo, Cyril Goutte, and Michel Simard. 2016. CNRC at SemEval-2016 task 1: Experiments in crosslingual semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 668–673.
  • Tara C. Long, Mounir Errami, Angela C. George, Zhaohui Sun, and Harold R. Garner. 2009. Responding to possible plagiarism. Science 323, 5919 (2009), 1293–1294. DOI: 10.1126/science.1167408
  • Ahmed Magooda, Ashraf Y. Mahgoub, Mohsen Rashwan, Magda B. Fayek, and Hazem Raafat. 2015. RDI System for extrinsic plagiarism detection (RDI_RED)—Working Notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Peyman Mahdavi, Zahra Siadati, and Farzin Yaghmaee. 2014. Automatic external persian plagiarism detection using vector space model. In Proceedings of the 2014 4th International eConference on Computer and Knowledge Engineering (ICCKE’14) . 697–702.
  • Ashraf Y. Mahgoub, Ahmed Magooda, Mohsen Rashwan, Magda B. Fayek, and Hazem Raafat. 2015. RDI System for intrinsic plagiarism detection (RDI_RID)—Working Notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15) .
  • Promita Maitra, Souvick Ghosh, and Dipankar Das. 2015. Authorship verification - an approach based on random forest—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Cristhian Mayor, Josue Gutierrez, Angel Toledo, Rodrigo Martinez, Paola Ledesma, Gibran Fuentes, and Ivan Meza. 2014. A single author style representation for the author verification task—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Norman Meuschke and Bela Gipp. 2013. State-of-the-art in detecting academic plagiarism. Int. J. Educ. Integr. 9, 1 (2013), 50–71.
  • Norman Meuschke and Bela Gipp. 2014. Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries . 197–200.
  • Norman Meuschke, Christopher Gondek, Daniel Seebacher, Corinna Breitinger, Daniel A. Keim, and Bela Gipp. 2018. An adaptive image-based plagiarism detection approach. In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’18) . DOI: 10.1145/3197026.3197042
  • Norman Meuschke, Moritz Schubotz, Felix Hamborg, Tomáš Skopal, and Bela Gipp. 2017. Analyzing mathematical content to detect academic plagiarism. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM’17) . 2211–2214. DOI: 10.1145/3132847.3133144
  • Norman Meuschke, Nicolas Siebeck, Moritz Schubotz, and Bela Gipp. 2017. Analyzing semantic concept patterns to detect academic plagiarism. In Proceedings of the 6th International Workshop on Mining Scientific Publications (WOSP’17) . 46–53. DOI: 10.1145/3127526.3127535
  • Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Kramer, and Bela Gipp. 2019. Improving academic plagiarism detection for STEM documents by analyzing mathematical content and citations. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL’19) .
  • Pashutan Modaresi and Philipp Gross. 2014. A language independent author verifier using fuzzy c-means clustering—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • H. F. Moed, W. J. M. Burger, J. G. Frankfort, and A. F. J. Van Raan. 1985. The application of bibliometric indicators: Important field- and time-dependent factors to be considered. Scientometrics 8, 3–4 (1985), 177–203. DOI: 10.1007/BF02016935
  • Majid Mohebbi and Alireza Talebpour. 2016. Texts semantic similarity detection based graph approach. Int. Arab J. Inf. Technol. 13, 2 (2016), 246–251.
  • Mozhgan Momtaz, Kayvan Bijari, Mostafa Salehi, and Hadi Veisi. 2016. Graph-based approach to text alignment for plagiarism detection in persian documents. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16) . 176–179.
  • Erwan Moreau, Arun Jayapal, and Carl Vogel. 2014. Author verification: exploring a large set of parameters using a genetic algorithm—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Erwan Moreau, Arun Jayapal, Gerard Lynch, and Carl Vogel. 2015. Author verification: Basic stacked generalization applied to predictions from a set of heterogeneous learners—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Erwan Moreau and Carl Vogel. 2013. Style-based distance features for author verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Maxim Mozgovoy, Tuomo Kakkonen, and Georgina Cosma. 2010. Automatic student plagiarism detection: Future perspectives. J. Educ. Comput. Res. 43, 4 (2010), 511–531.
  • Aibek Musaev, De Wang, Saajan Shridhar, and Calton Pu. 2015. Fast text classification using randomized explicit semantic analysis. In Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration . 364–371. DOI: 10.1109/IRI.2015.62
  • El Moatez Billah Nagoudi, Ahmed Khorsi, Hadda Cherroun, and Didier Schwab. 2018. 2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents. Cybern. Inf. Technol. 18, 1 (2018), 124–138. DOI: 10.2478/cait-2018-0011
  • Rao Muhammad Adeel Nawab, Mark Stevenson, and Paul Clough. 2017. An IR-based approach utilizing query expansion for plagiarism detection in MEDLINE. IEEE/ACM Trans. Comput. Biol. Bioinforma. 14, 4 (2017), 796–804. DOI: 10.1109/TCBB.2016.2542803
  • Philip M. Newton. 2018. How common is commercial contract cheating in higher education and is it increasing? A Systematic Review. Front. Educ. 3 (2018). DOI: 10.3389/feduc.2018.00067
  • Le Thanh Nguyen, Nguyen Xuan Toan, and Dinh Dien. 2016. Vietnamese plagiarism detection method. In Proceedings of the 7th Symposium on Information and Communication Technology (SoICT’16) . 44–51. DOI: 10.1145/3011077.3011109
  • Gabriel Oberreuter and Juan D. Velásquez. 2013. Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Exp. Syst. Appl. 40, 9 (2013), 3756–3763.
  • Milan Ojsteršek, Janez Brezovnik, Mojca Kotar, Marko Ferme, Goran Hrovat, Albin Bregant, and Mladen Borovič. 2014. Establishing of a slovenian open access infrastructure: A technical point of view. Program 48, 4 (2014), 394–412. DOI: 10.1108/PROG-02-2014-0005
  • Adeva Oktoveri, Agung Toto Wibowo, and Ari Moesriami Barmawi. 2014. Non-relevant document reduction in anti-plagiarism using asymmetric similarity and AVL tree index. In Proceedings of the 2014 5th International Conference on Intelligent and Advanced Systems (ICIAS’14) . 1–5. DOI: 10.1109/ICIAS.2014.6869547
  • Ahmed Hamza Osman and Naomie Salim. 2013. An improved semantic plagiarism detection scheme based on Chi-squared automatic interaction detection. In Proceedings of the 2013 International Conference on Computing, Electrical and Electronic Engineering (ICCEEE’13) . 640–647. DOI: 10.1109/ICCEEE.2013.6634015
  • Caleb Owens and Fiona A. White. 2013. A 5‐year systematic strategy to reduce plagiarism among first‐year psychology university students. Aust. J. Psychol. 65, 1 (2013), 14–21. DOI: 10.1111/ajpy.12005
  • María Leonor Pacheco, Kelwin Fernandes, and Aldo Porco. 2015. Random forest with increased generalization: A universal background approach for authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yurii Palkovskii and Alexei Belov. 2013. Using hybrid similarity methods for plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Yurii Palkovskii and Alexei Belov. 2014. Developing high-resolution universal multi-type n-gram plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 984–989.
  • Guy Paré, Marie-Claude Trudel, Mirou Jaana, and Spyros Kitsiou. 2015. Synthesizing information systems knowledge: A typology of literature reviews. Inf. Manag. 52, 2 (2015), 183–199. DOI: 10.1016/j.im.2014.08.008
  • Merin Paul and Sangeetha Jamal. 2015. An Improved SRL based plagiarism detection technique using sentence ranking. Procedia Comput. Sci. 46 (2015), 223–230. DOI: 10.1016/j.procs.2015.02.015
  • Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing . 425–430.
  • Jian Peng, Kim-Kwang Raymond Choo, and Helen Ashman. 2016. Bit-level n-gram based forensic authorship analysis on social media: Identifying individuals from linguistic profiles. J. Netw. Comput. Appl. 70 (2016), 171–182. DOI: 10.1016/j.jnca.2016.04.001
  • Solange de L. Pertile, Viviane P. Moreira, and Paolo Rosso. 2015. Comparing and combining content‐ and citation‐based approaches for plagiarism detection. J. Assoc. Inf. Sci. Technol. 67, 10 (2015), 2511–2526. DOI: 10.1002/asi.23593
  • Solange de L. Pertile, Paolo Rosso, and Viviane P. Moreira. 2013. Counting co-occurrences in citations to identify plagiarised text fragments. In Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages . 150–154.
  • Timo Petmanson. 2013. Authorship identification using correlations of frequent features—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. 2013. Align, disambiguate and walk: A unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) . 1341–1351.
  • Gaspar Pizarro V. and Juan D. Velásquez. 2017. Docode 5: Building a real-world plagiarism detection system. Eng. Appl. Artif. Intell. 64 (Jun. 2017), 261–271. DOI: 10.1016/j.engappai.2017.06.001
  • Juan-Pablo Posadas-Durán, Grigori Sidorov, Ildar Batyrshin, and Elibeth Mirasol-Meléndez. 2015. Author verification using syntactic n-grams—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Martin Potthast, Tim Gollub, Matthias Hagen, Martin Tippmann, Johannes Kiesel, Paolo Rosso, Efstathios Stamatatos, and Benno Stein. 2013. Overview of the 5th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Martin Potthast, Matthias Hagen, Anna Beyer, Matthias Busse, Martin Tippmann, Paolo Rosso, and Benno Stein. 2014. Overview of the 6th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Martin Potthast, Matthias Hagen, and Benno Stein. 2016. Author Obfuscation: Attacking the state of the art in authorship verification. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. 2017. Overview of PAN’17: Author identification, author profiling, and author obfuscation. In Proceedings of the 7th International Conference of the CLEF Initiative . DOI: 10.1007/978-3-319-65813-1_25
  • Martin Potthast, Benno Stein, Alberto Barrón-Cedeño, and Paolo Rosso. 2010. An Evaluation framework for plagiarism detection. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (COLING’10) . 997–1005.
  • Martin Potthast, Benno Stein, Andreas Eiselt, Alberto Barrón-Cedeño, and Paolo Rosso. 2009. Overview of the 1st international competition on plagiarism detection. In Proceedings of the SEPLN 09 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN’09) . 1–9.
  • Amit Prakash and Sujan Kumar Saha. 2014. Experiments on document chunking and query formation for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Piotr Przybyła, Nhung T. H. Nguyen, Matthew Shardlow, Georgios Kontonatsios, and Sophia Ananiadou. 2016. NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 614–620.
  • Javad Rafiei, Salar Mohtaj, Vahid Zarrabi, and Habibollah Asghari. 2015. Source retrieval plagiarism detection based on noun phrase and keyword phrase extraction—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Shima Rakian, Esfahani Faramarz Safi, and Hamid Rastegari. 2015. A Persian fuzzy plagiarism detection approach. J. Inf. Syst. Telecommun. 3, 3 (2015), 182–190.
  • N Riya Ravi and Deepa Gupta. 2015. Efficient paragraph based chunking and download filtering for plagiarism source retrieval—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • N. Riya Ravi, Vani Kanjirangat, and Deepa Gupta. 2016. Exploration of fuzzy C means clustering algorithm in external plagiarism detection system. In Intelligent Systems Technologies and Applications . Springer, 127–138.
  • Andi Rexha, Stefan Klampfl, Mark Kröll, and Roman Kern. 2015. Towards authorship attribution for bibliometrics using stylometric features. In Proceedings of the Conference on Computational Linguistics and Bibliometrics co-located with the International Conference on Scientometrics and Informetrics (CLBib@ ISSI) . 44–49.
  • Diego Antonio Rodríguez Torrejón and José Manuel Martín Ramos. 2014. CoReMo 2.3 Plagiarism detector text alignment module—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall, and Benno Stein. 2016. Overview of PAN’16. In Experimental IR Meets Multilinguality, Multimodality, and Interaction . 332–350.
  • Frantz Rowe. 2014. What literature review is not: Diversity, boundaries and recommendations. Eur. J. Inf. Syst. 23, 3 (2014), 241–255. DOI: 10.1057/ejis.2014.7
  • Barbara Rychalska, Katarzyna Pakulska, Krystyna Chodorowska, Wojciech Walczak, and Piotr Andruszkiewicz. 2016. Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 602–608.
  • Kamil Safin and Rita Kuznetsova. 2017. Style breach detection with neural sentence embeddings—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Anuj Saini and Aayushi Verma. 2016. Anuj@ DPIL-FIRE2016: a novel paraphrase detection method in hindi language using machine learning. In Proceedings of the Forum for Information Retrieval Evaluation . 141–152.
  • Miguel A. Sanchez-Perez, Alexander Gelbukh, and Grigori Sidorov. 2015. Dynamically adjustable approach through obfuscation type recognition—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Miguel A Sanchez-Perez, Grigori Sidorov, and Alexander F Gelbukh. 2014. A winning approach to text alignment for text reuse detection at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) . 1004–1011.
  • Fernando Sánchez-Vega, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, and Paolo Rosso. 2013. Determining and characterizing the reused text for plagiarism detection. J. Assoc. Inf. Sci. Technol. 65, 5 (2013), 1804–1813. DOI: 10.1016/j.eswa.2012.09.021
  • Yunita Sari and Mark Stevenson. 2015. A machine learning-based intrinsic method for cross-topic and cross-genre authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Yunita Sari and Mark Stevenson. 2016. Exploring word embeddings and character n-grams for author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Satyam, Anand, Arnav Kumar Dawn, and Sujan Kumar Saha. 2014. Statistical analysis approach to author identification using latent semantic analysis—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Taneeya Satyapanich, Hang Gao, and Tim Finin. 2015. Ebiquity: Paraphrase and semantic similarity in twitter using skipgrams. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 51–55.
  • Andreas Schmidt, Reinhold Becker, Daniel Kimmig, Robert Senger, and Steffen Scholz. 2014. A concept for plagiarism detection based on compressed bitmaps. In Proceedings of the 6th International Conference on Advances in Databases, Knowledge, and Data Applications . 30–34.
  • Shachar Seidman. 2013. Authorship verification using the impostors method—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Prasha Shrestha, Suraj Maharjan, and Thamar Solorio. 2014. Machine translation evaluation metric for text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Prasha Shrestha and Thamar Solorio. 2013. Using a variety of n-grams for the detection of different kinds of plagiarism. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Muazzam Ahmed Siddiqui, Imtiaz Hussain Khan, Kamal Mansoor Jambi, Salma Omar Elhaj, and Abobakr Bagais. 2014. Developing an arabic plagiarism detection corpus. Comput. Sci. Inf. Technol. 4, 2014 (2014), 261–269. DOI: 10.5121/csit.2014.41221
  • L. Sindhu and Sumam Mary Idicula. 2015. Fingerprinting based detection system for identifying plagiarism in malayalam text documents. In Proceedings of the 2015 International Conference on Computing and Network Communications (CoCoNet’15) . 553–558. DOI: 10.1109/CoCoNet.2015.7411242
  • Abdul Sittar, Hafiz Rizwan Iqbal, and Rao Muhammad Adeel Nawab. 2016. Author diarization using cluster-distance approach. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) . 1000–1007.
  • Sidik Soleman and Ayu Purwarianti. 2014. Experiments on the Indonesian plagiarism detection using latent semantic analysis. In Proceedings of the 2014 2nd International Conference on Information and Communication Technology (ICoICT’14) . 413–418. DOI: 10.1109/ICoICT.2014.6914098
  • Hussein Soori, Michal Prilepok, Jan Platos, Eshetie Berhan, and Vaclav Snasel. 2014. Text similarity based on data compression in Arabic. In AETA 2013: Recent Advances in Electrical Engineering and Related Sciences . Springer, 211–220.
  • Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Martin Potthast, Benno Stein, Patrick Juola, Miguel A. Sanchez-Perez, and Alberto Barrón-Cedeño. 2014. Overview of the author identification task at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Efstathios Stamatatos, Martin Potthast, Francisco Rangel, Paolo Rosso, and Benno Stein. 2015. Overview of the PAN/CLEF 2015 Evaluation Lab. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the 6th International Conference of the CLEF Initiative (CLEF’15) . 518–538. DOI: 10.1007/978-3-319-24027-5_49
  • Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Patrick Juola, Aurelio López-López, Martin Potthast, and Benno Stein. 2015. Overview of the author identification task at PAN 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Benno Stein, Sven zu Eissen, and Martin Potthast. 2007. Strategies for retrieving plagiarized documents. In Proceedings of the 30th Annual International ACM SIGIR Conference . 825–826. DOI: 10.1145/1277741.1277928
  • Imam Much Ibnu Subroto and Ali Selamat. 2014. Plagiarism detection through internet using hybrid artificial neural network and support vectors machine. Telecommun. Comput. Electron. Control. 12, 1 (2014), 209–218.
  • Šimon Suchomel and Michal Brandejs. 2014. Heterogeneous queries for synoptic and phrasal search—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Šimon Suchomel and Michal Brandejs. 2015. Improving synoptic querying for source retrieval. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15) .
  • Šimon Suchomel, Jan Kasprzak, and Michal Brandejs. 2013. Diverse queries and feature type selection for plagiarism discovery—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. DLS@CU: Sentence similarity from word alignment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14) . 241–246.
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to basics for monolingual alignment: Exploiting word similarity and contextual evidence. Trans. Assoc. Comput. Linguist. 2 (2014), 219–230.
  • M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence similarity from word alignment and semantic vector composition. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 148–153.
  • Junfeng Tian and Man Lan. 2016. ECNU at SemEval-2016 Task 1: Leveraging word embedding from macro and micro views to boost performance for semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16) . 621–627.
  • Diego A. Rodríguez Torrejón and José Manuel Martín Ramos. 2013. Text alignment module in CoReMo 2.1 plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Michael Tschuggnall and Günther Specht. 2013. Detecting plagiarism in text documents through grammar-analysis of authors. Datenbanksysteme für Business, Technologie und Web (BTW) 2028 , Volker Markl, Gunter Saake, Kai-Uwe Sattler, Gregor Hackenbroich, Bernhard Mitschang, Theo Härder, and Veit Köppen (Eds.). Gesellschaft für Informatik e.V., 241–259.
  • Michael Tschuggnall and Günther Specht. 2013. Using grammar-profiles to intrinsically expose plagiarism in text documents. In Natural Language Processing and Information Systems . 297–302.
  • Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast. 2017. Overview of the author identification task at PAN-2017: Style breach detection and author clustering. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17) .
  • Alper Kursat Uysal and Serkan Gunal. 2014. Text classification using genetic algorithm oriented latent semantic features. Exp. Syst. Appl. 41, 13 (2014), 5938–5947. DOI: 10.1016/j.eswa.2014.03.041
  • Vani Kanjirangat and Deepa Gupta. 2014. Using K-means cluster based techniques in external plagiarism detection. In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics (IC3I’14) . 1268–1273. DOI: 10.1109/IC3I.2014.7019659
  • Vani Kanjirangat and Deepa Gupta. 2015. Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system. In Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI’15) . 1578–1584. DOI: 10.1109/ICACCI.2015.7275838
  • Vani Kanjirangat and Deepa Gupta. 2016. Study on extrinsic text plagiarism detection techniques and tools. J. Eng. Sci. Technol. Rev. 9, 5 (2016), 9–23.
  • Vani Kanjirangat and Deepa Gupta. 2017. Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm. Exp. Syst. Appl. 73 (2017), 11–26. DOI: 10.1016/j.eswa.2016.12.022
  • Vani Kanjirangat and Deepa Gupta. 2017. Identifying document-level text plagiarism: A two-phase approach. J. Eng. Sci. Technol. 12, 12 (2017), 3226–3250.
  • Vani Kanjirangat and Deepa Gupta. 2017. Text plagiarism classification using syntax based linguistic features. Exp. Syst. Appl. 88 (2017), 448–464. DOI: 10.1016/j.eswa.2017.07.006
  • Anna Vartapetiance and Lee Gillam. 2013. A textual modus operandi: surrey's simple system for author identification—notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Juan D Velásquez, Yerko Covacevich, Francisco Molina, Edison Marrese-Taylor, Cristián Rodríguez, and Felipe Bravo-Marquez. 2016. DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources. Inf. Fus. 27 (2016), 64–75. DOI: 10.1016/j.inffus.2015.05.006
  • Ondřej Veselý, Tomáš Foltýnek, and Jiří Rybička. 2013. Source retrieval via naïve approach and passage selection heuristics—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Darnes Vilariño, David Pinto, Helena Gómez, Saúl León, and Esteban Castillo. 2013. Lexical-syntactic and graph-based features for authorship verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Ngoc Phuoc An Vo, Octavian Popescu, and Tommaso Caselli. 2014. FBK-TR: SVM for semantic relatedeness and corpus patterns for RTE. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14) . 289–293.
  • Hai Hieu Vu, Jeanne Villaneau, Farida Saïd, and Pierre-François Marteau. 2014. Sentence similarity by combining explicit semantic analysis and overlapping N-grams. In Text, Speech and Dialogue . 201–208.
  • Elizabeth Wager. 2014. Defining and responding to plagiarism. Learn. Publ. 27, 1 (2014), 33–42. DOI: 10.1087/20140105
  • Wafa Wali, Bilel Gargouri, and Abdelmajid Ben Hamadou. 2015. Supervised learning to measure the semantic similarity between arabic sentences. In Computational Collective Intelligence . 158–167.
  • John Walker. 1998. Student plagiarism in universities: What are we doing about it? High. Educ. Res. Dev. 17, 1 (1998), 89–106. DOI: 10.1080/0729436980170105
  • Shuai Wang, Haoliang Qi, Leilei Kong, and Cuixia Nu. 2013. Combination of VSM and jaccard coefficient for external plagiarism detection. In Proceedings of the 2013 International Conference on Machine Learning and Cybernetics . 1880–1885. DOI: 10.1109/ICMLC.2013.6890902
  • Debora Weber-Wulff. 2014. False feathers: A Perspective on Academic Plagiarism . Springer, Berlin.
  • Debora Weber-Wulff, Christopher Möller, Jannis Touras, and Elin Zincke. 2013. Plagiarism Detection Software Test 2013. Retrieved from http://plagiat.htw-berlin.de/wp-content/uploads/Testbericht-2013-color.pdf .
  • Agung Toto Wibowo, Kadek W. Sudarmadi, and Ari M. Barmawi. 2013. Comparison between fingerprint and winnowing algorithm to detect plagiarism fraud on Bahasa Indonesia documents. In Proceedings of the 2013 International Conference of Information and Communication Technology (ICoICT’13) . 128–133. DOI: 10.1109/ICoICT.2013.6574560
  • Kyle Williams, Hung-Hsuan Chen, Sagnik Ray Chowdhury, and C. Lee Giles. 2013. Unsupervised ranking for plagiarism source retrieval—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13) .
  • Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Classifying and ranking search engine results as potential sources of plagiarism. In Proceedings of the 2014 ACM Symposium on Document Engineering (DocEng’14) . 97–106. DOI: 10.1145/2644866.2644879
  • Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Supervised ranking for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013. A lightweight and high performance monolingual word aligner. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) . 702–707.
  • Takeru Yokoi. 2015. Sentence-based plagiarism detection for japanese document based on common nouns and part-of-speech structure. In Intelligent Software Methodologies, Tools and Techniques . 297–308.
  • Guido Zarrella, John Henderson, Elizabeth M. Merkhofer, and Laura Strickhart. 2015. MITRE: Seven systems for semantic similarity in tweets. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 12–17.
  • Chunxia Zhang, Xindong Wu, Zhendong Niu, and Wei Ding. 2014. Authorship identification from unstructured texts. Knowl.-Based Syst. 66 (2014), 99–111. DOI: 10.1016/j.knosys.2014.04.025
  • Jiang Zhao and Man Lan. 2015. Ecnu: Leveraging word embeddings to boost performance for paraphrase in twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15) . 34–39.
  • Valentin Zmiycharov, Dimitar Alexandrov, Hristo Georgiev, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, and Preslav Nakov. 2016. Experiments in authorship-link ranking and complete author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16) .
  • Sven Meyer Zu Eissen and Benno Stein. 2006. Intrinsic plagiarism detection. In Proceedings of the European Conference on Information Retrieval . 565–569.
  • Denis Zubarev and Ilya Sochenkov. 2014. Using sentence similarity measure for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14) .
  • Teddi Fishman. 2009. “We know it when we see it” is not good enough: Toward a standard definition of plagiarism that transcends theft, fraud, and copyright. In Proceedings of the 4th Asia Pacific Conference on Educational Integrity (4APCEI’09) . 5.
  • 1 http://de.vroniplag.wikia.com .
  • 2 https://beallslist.weebly.com/standalone-journals.html .
  • 3 https://www.ldc.upenn.edu/ .
  • 4 http://github.com/danieldk/citar .
  • 5 http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/ .
  • 6 http://nlp.stanford.edu/software/lex-parser.shtml .
  • 7 https://publications.europa.eu/en/web/eu-vocabularies/th-dataset/-/resource/dataset/eurovoc .
  • 8 https://babelnet.org/ .
  • 9 https://wiki.dbpedia.org/ .
  • 10 http://verbs.colorado.edu/~mpalmer/projects/ace.html .
  • 11 http://verbs.colorado.edu/~mpalmer/projects/verbnet.html .
  • 12 https://pan.webis.de/data.html .
  • 13 https://www.microsoft.com/en-us/download/details.aspx?id=52398 .

This work was supported by the EU ESF grant CZ.02.2.69/0.0/0.0/16_027/0007953 “MENDELU international development.”

Authors’ addresses: T. Foltýnek, Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czechia; email: [email protected] ; N. Meuschke and B. Gipp, Data & Knowledge Engineering Group, University of Wuppertal, School of Electrical, Information and Media Engineering, Rainer-Gruenter-Str. 21, D-42119 Wuppertal, Germany; emails: [email protected] , [email protected] , [email protected] , [email protected] .

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

This work is licensed under a Creative Commons Attribution-Share Alike International 4.0 License .

©2019 Copyright held by the owner/author(s). 0360-0300/2019/10-ART112 $15.00 DOI: https://doi.org/10.1145/3345317

Publication History: Received March 2019; revised August 2019; accepted August 2019

Perceptions of and Attitudes toward Plagiarism and Factors Contributing to Plagiarism: a Review of Studies

  • Published: 30 March 2017
  • Volume 15, pages 167–195 (2017)

  • Fauzilah Md Husain,
  • Ghayth Kamel Shaker Al-Shaibani &
  • Omer Hassan Ali Mahfoodh

The abundance of information technology and electronic resources for academic materials has contributed to the attention given to research on plagiarism from various perspectives. Among the issues that have attracted researchers’ attention are perceptions of plagiarism and attitudes toward plagiarism. This article presents a critical review of studies that have been conducted to examine staff’s and students’ perceptions of and attitudes toward plagiarism. It also presents a review of studies that have focused on factors contributing to plagiarism. Our review of studies reveals that most of the studies on perceptions of plagiarism and attitudes toward plagiarism lack an in-depth analysis of the relationship between the perceptions of plagiarism and other contextual, sociocultural and institutional variables, or the relationship between attitudes toward plagiarism and students’ perceptions of various forms of plagiarism. Although our review shows that various factors can contribute to plagiarism, there is no taxonomy that can account for all these factors. Some suggestions for future research are provided in this review article.

Akbulut, Y., Şendağ, S., Birinci, G., Kılıçer, K., Şahin, M. C., & Odabaşı, H. F. (2008). Exploring the types and reasons of internet-triggered academic dishonesty among Turkish undergraduate students: development of internet-triggered academic dishonesty scale (ITADS). Computers & Education, 51 (1), 463–473.

Alam, L. S. (2004). Is plagiarism more prevalent in some form of assessment than others? In C. R. Atkinson, D. J.-D. McBeath & R. Phillips (Eds.), Beyond the comfort zone: Proceedings of the 21st ASCILITE Conference (pp. 48–57).

Amiri, F., & Razmjoo, S. A. (2016). On Iranian EFL undergraduate students’ perceptions of plagiarism. Journal of Academic Ethics, 14 (2), 115–131.

Ashworth, P., Bannister, P., Thorne, P., & Students on the Qualitative Research Methods Course Unit. (1997). Guilty in whose eyes? University students’ perceptions of cheating and plagiarism in academic work and assessment. Studies in Higher Education, 22 (2), 187–203.

Babaii, E., & Nejadghanbar, H. (2016). Plagiarism among Iranian graduate students of language studies: perspectives and causes. Ethics and Behavior , 1–19. doi: 10.1080/10508422.2016.113886 .

Baetz, M., Zivcakova, L., Wood, E., Nosko, A., De Pasquale, D., & Archer, K. (2011). Encouraging active classroom discussion of academic integrity and misconduct in higher education business contexts. Journal of Academic Ethics, 9 (3), 217–234.

Barnett, R., & Cox, A. (2005). “At least they are learning something”: the hazy line between collaboration and collusion. Assessment and Evaluation in Higher Education, 30 (2), 107–122.

Batane, T. (2010). Turning to Turnitin to fight plagiarism among university students. Educational Technology & Society, 13 (2), 1–12.

Bennett, K. K., Behrendt, L. S., & Boothby, J. L. (2011). Instructor perceptions of plagiarism: are we finding common ground? Teaching of Psychology, 38 (1), 29–35. doi: 10.1177/0098628310390851 .

Bertram Gallant, T. (2008). Academic integrity in the twenty-first century: a teaching and learning imperative . San Francisco: Jossey-Bass.

Biber, D. (2006). University language: A corpus-based study of spoken and written registers (Vol. 23): John Benjamins Publishing.

Bloch, J. (2012). Plagiarism, intellectual property and the teaching of L2 writing . Bristol: Multilingual Matters.

Breen, L., & Maassen, M. (2005). Reducing the incidence of plagiarism in an undergraduate course: the role of education. Issues in Educational Research, 15 (1), 1–16.

Briggs, R. (2009). Shameless! Reconceiving the problem of student plagiarism. Angelaki: Journal of Theoretical Humanities, 14 (1), 65–75.

Brimble, M., & Stevenson-Clarke, P. (2005). Perceptions of the prevalence and seriousness of academic dishonesty in Australian universities. The Australian Educational Researcher, 32 (3), 19–44.

Bruce, I. (2008). Academic writing and genre: A systematic analysis . London: Continuum.

Carroll, J. (2002). A handbook for deterring plagiarism in higher education . Oxford Brookes University: Oxford: Oxford Centre for Staff and Learning Development.

Chen, Y., & Chou, C. (2016). Are we on the same page? College students’ and faculty’s perception of student plagiarism in Taiwan. Ethics and Behavior , 1–21. doi: 10.1080/10508422.2015.1123630 .

Chien, S. C. (2014). Cultural constructions of plagiarism in student writing: teachers' perceptions and responses. Research in the Teaching of English, 49 (2), 120–140.

Chien, S. C. (2016). Taiwanese college students’ perceptions of plagiarism: cultural and educational considerations. Ethics and Behavior , 1–22. doi: 10.1080/10508422.2015.1136219 .

Comas-Forgas, R., & Sureda-Negre, J. (2010). Academic plagiarism: explanatory factors from students’ perspective. Journal of Academic Ethics, 8 (3), 217–232.

Curtis, G. J., Gouldthorp, B., Thomas, E. F., O'Brien, G. M., & Correia, H. M. (2013). Online academic-integrity mastery training may improve students' awareness of, and attitudes toward, plagiarism. Psychology Learning and Teaching, 12 (3), 282–289. doi: 10.2304/plat.2013.12.3.282 .

Davis, M. (2012). International postgraduate students' experiences of plagiarism education in the UK: student, tutor and expert perspectives. International Journal for Educational Integrity, 8 (2), 21–33.

Decoo, W. (2002). Crisis on campus: confronting academic misconduct : MIT Press.

Devlin, M., & Gray, K. (2007). In their own words: a qualitative study of the reasons Australian university students plagiarize. Higher Education Research & Development, 26 (2), 181–198.

Divan, A., Bowman, M., & Seabourne, A. (2015). Reducing unintentional plagiarism amongst international students in the biological sciences: an embedded academic writing development programme. Journal of Further and Higher Education, 39 (3), 358–378.

Ehrich, J., Howard, S. J., Mu, C., & Bokosmaty, S. (2014). A comparison of Chinese and Australian university students' attitudes towards plagiarism. Studies in Higher Education . doi: 10.1080/03075079.2014.927850 .

Fishman, T. (2009). “We know it when we see it” is not good enough: toward a standard definition of plagiarism that transcends theft, fraud, and copyright. Paper presented at the 4th Asia Pacific Conference on Educational Integrity, University of Wollongong, Australia, 28–30 September.

Flint, A., Clegg, S., & Macdonald, R. (2006). Exploring staff perceptions of student plagiarism. Journal of Further and Higher Education, 30 (02), 145–156.

Flowerdew, J., & Li, Y. (2007). Plagiarism and second language writing in an electronic age. Annual Review of Applied Linguistics, 27 , 161–183.

Ford, P. J., & Hughes, C. (2012). Academic integrity and plagiarism: perceptions and experience of staff and students in a school of dentistry: a situational analysis of staff and student perspectives. European Journal of Dental Education, 16 (1), e180–e186. doi: 10.1111/j.1600-0579.2011.00695.x .

Fowler Jr., F. J. (2013). Survey research methods . Thousand Oaks: Sage Publications.

Franklyn-Stokes, A., & Newstead, S. E. (1995). Undergraduate cheating: who does what and why? Studies in Higher Education, 20 (2), 159–172.

Ghajarzadeh, M., Norouzi-Javidan, A., Hassanpour, K., Aramesh, K., & Emami-Razavi, S. H. (2012). Attitude toward plagiarism among Iranian medical faculty members. Acta Medica Iranica, 50 (11), 778–781.

Gomez, G. (2012). Do easily copied internet media in the library lead to plagiarism? In International Conference on Science and the Internet, Düsseldorf, Germany (pp. 131–143). Retrieved from http://nfgwin.uni-duesseldorf.de/sites/default/files/Gomez.pdf

Gomez, M. S. S., Shivanna, L., & Shivanna, B. S. (2014). Assessment of the attitude towards plagiarism among dental postgraduate students and faculty members in Bapuji dental college and hospital, Davangere–a cross sectional survey. Journal of Dental and Medical Sciences, 13 (5), 1–6.

Gullifer, J. M., & Tyson, G. A. (2010). Exploring university students' perceptions of plagiarism: a focus group study. Studies in Higher Education, 35 (4), 463–481.

Gullifer, J. M., & Tyson, G. A. (2014). Who has read the policy on plagiarism? Unpacking students' understanding of plagiarism. Studies in Higher Education, 39 (7), 1202–1218.

Harris, R. A. (2001). The plagiarism handbook: Strategies for preventing, detecting, and dealing with plagiarism . Los Angeles: Pyrczak.

Hofstede, G., Hofstede, G. J., & Minkov, M. (1991). Cultures and organizations: Software of the mind (Vol. 2). London: McGraw-Hill.

Howard, R. M. (1995). Plagiarisms, authorships, and the academic death penalty. College English, 57 (7), 788–806.

Hu, G., & Lei, J. (2012). Investigating Chinese university students’ knowledge of and attitudes toward plagiarism from an integrated perspective. Language Learning, 62 (3), 813–850.

Hu, G., & Lei, J. (2015). Chinese university students’ perceptions of plagiarism. Ethics and Behavior, 25 (3), 233–255.

Hyland, K. (2003). Second language writing : Cambridge University Press.

Hyland, K. (2009). Academic discourse: English in a global context : A&C Black.

Kasprzak, J., & Nixon, M. (2004). Cheating in cyberspace: maintaining quality in higher education. Association for the Advancement of Computering in Education, 12 (1), 85–99.

Kayaoğlu, M. N., Erbay, Ş., Flitner, C., & Saltaş, D. (2015). Examining students’ perceptions of plagiarism: a cross-cultural study at tertiary level. Journal of Further and Higher Education . doi: 10.1080/0309877x.2015.1014320 .

Koul, R., Clariana, R. B., Jitgarun, K., & Songsriwittaya, A. (2009). The influence of achievement goal orientation on plagiarism. Learning and Individual Differences, 19 (4), 506–512.

Kuntz, J. R., & Butler, C. (2014). Exploring individual and contextual antecedents of attitudes toward the acceptability of cheating and plagiarism. Ethics & Behavior, 24 (6), 478–494.

Kutz, E., Rhodes, W., Sutherland, S., & Zamel, V. (2011). Addressing plagiarism in a digital age. Human Architecture, 9 (3), 15–35.

Lau, G. K., Yuen, A. H., & Park, J. (2013). Toward an analytical model of ethical decision making in plagiarism. Ethics & Behavior, 23 (5), 360–377.

Lei, J., & Hu, G. (2015). Chinese university EFL teachers’ perceptions of plagiarism. Higher Education, 70 (3), 551–565. doi: 10.1007/s10734-014-9855-5 .

Leonard, M., Schwieder, D., Buhler, A., Bennett, D. B., & Royster, M. (2015). Perceptions of plagiarism by STEM graduate students: a case study. Science and Engineering Ethics . doi: 10.1007/s11948-014-9604-2 .

Lim, V. K., & See, S. K. (2001). Attitudes toward, and intentions to report academic cheating among students in Singapore. Ethics & Behaviour, 11 (3), 261–274.

Louw, D. A. (1998). Human development . Cape Town: Pearson South Africa.

Marcus, S., & Beck, S. (2011). Faculty perceptions of plagiarism at Queensborough community college. Community & Junior College Libraries, 17 (2), 63–73.

Marshall, S. & Garry, M. (2005). How well do students really understand plagiarism? In Goss, H. (Ed.), Proceedings of the 22nd annual conference of the Australasian Society for Computers in Learning in Tertiary Education (ASCILITE) (pp. 457–467). Brisbane, Australia, 4–7 December.

Marshall, S., & Garry, M. (2006). NESB and ESB students’ attitudes and perceptions of plagiarism. International Journal of Educational Integrity, 2 (1), 26–37.

Mavrinac, M., Brumini, G., Bilić-Zulle, L., & Petrovecki, M. (2010). Construction and validation of attitudes toward plagiarism questionnaire. Croatian Medical Journal, 51 (3), 195–201.

Maxwell, A. J., Curtis, G. J., & Vardanega, L. (2008). Does culture influence understanding and perceived seriousness of plagiarism? International Journal for Educational Integrity, 4 (2), 25–40.

McCabe, D. L. (2005). Cheating among college and university students: a North American perspective. The International Journal for Educational Integrity, 4 (1). doi: 10.21913/IJEI.v1i1.14

Michalec, M., & Welsh, T. S. (2007). Quantity and authorship of GIS articles in library and information science literature, 1990-2005. Science & Technology Libraries, 27 (3), 65–77.

Mu, C. (2010). "I only cited some of his words": the dilemma of EFL students and their perceptions of plagiarism in academic writing. Journal of Asia TEFL, 7 (4), 103–130.

Murtaza, G., Zafar, S., Bashir, I., & Hussain, I. (2013). Evaluation of student's perception and behavior towards plagiarism in Pakistani universities. Acta Bioethica, 19 (1), 125–130. doi: 10.4067/s1726-569x2013000100013 .

Oppenheim, A. N. (1966). Questionnaire design and attitude measurement (pp. 133–142). London: Heinemann.

Paltridge, B. (2002). Thesis and dissertation writing: an examination of published advice and actual practice. English for Specific Purposes, 21 (2), 125–143.

Paltridge, B., & Starfield, S. (2007). Thesis and dissertation writing in a second language: A handbook for supervisors : Routledge.

Park, C. (2003). In other (people's) words: plagiarism by university students--literature and lessons. Assessment & Evaluation in Higher Education, 28 (5), 471–488.

Pecorari, D., & Petrić, B. (2014). Plagiarism in second-language writing. Language Teaching, 47 (3), 269–302.

Pennycook, A. (1996). Borrowing others’ words: text, ownership, memory, and plagiarism. TESOL Quarterly, 30 (2), 201–230.

Pennycook, A. (2001). Critical applied linguistics: A critical introduction . Mahwah: Erlbaum.

Perry, B. (2010). Exploring academic misconduct: some insights into student behaviour. Active Learning in Higher Education, 11 (2), 97–108.

Phillips, M. R., & Horton, V. (2000). Cybercheating: has morality evaporated in business education? International Journal of Educational Management, 14 (4), 150–155.

Pickard, J. (2006). Staff and student attitudes to plagiarism at university college Northampton. Assessment and Evaluation in Higher Education, 31 (2), 215–232. doi: 10.1080/02602930500262528 .

Powell, L. (2012). Understanding plagiarism: developing a model of plagiarising behavior. Paper presented at the International Integrity & Plagiarism Conference, Newcastle Upon Tyne, United Kingdom. Retrieved from https://pdfs.semanticscholar.org/0903/10b04ade5540c672c5b0db66e868bd805644.pdf .

Pupovac, V., Bilic-Zulle, L., Mavrinac, M., & Petrovecki, M. (2010). Attitudes toward plagiarism among pharmacy and medical biochemistry students-cross-sectional survey study. Biochemia Medica, 20 , 307–3013.

Quah, C. H., Stewart, N., & Lee, J. W. C. (2012). Attitudes of business students’ toward plagiarism. Journal of Academic Ethics, 10 (3), 185–199.

Reingold, R., & Baratz, L. (2011). An institutional code of ethics--a response to attitude of Israeli Teachers' education college students towards academic plagiarism. US-China Education Review, 8 (5), 589–598.

Remler, D. K., & Pema, E. (2009). Why do institutions of higher education reward research while selling education? NBER Working Paper No. 14974 . Retrieved from http://www.nber.org/papers/w14974 .

Rettinger, D. A., & Kramer, Y. (2009). Situational and personal causes of student cheating. Research in Higher Education, 50 (3), 293–313.

Roberts, Tim S., ed. Student plagiarism in an online world : Problems and Solutions : Problems and Solutions. IGI Global, 2007.

Robinson-Zañartu, C., Peña, E. D., Cook-Morales, V., Peña, A. M., Afshani, R., & Nguyen, L. (2005). Academic crime and punishment: faculty members' perceptions of and responses to plagiarism. School Psychology Quarterly, 20 (3), 318–337. doi: 10.1521/scpq.2005.20.3.318 .

Ryan, G., Bonanno, H., Krass, I., Scouller, K., & Smith, L. (2009). Undergraduate and postgraduate pharmacy students' perceptions of plagiarism and academic honesty. American Journal of Pharmaceutical Education, 73 (6), 105.

Scollon, R. (1995). Plagiarism and ideology: identity in intercultural discourse. Language in Society, 24 (01), 1–28.

Shirazi, B., Jafarey, A. M., & Moazam, F. (2010). Plagiarism and the medical fraternity: a study of knowledge and attitudes. Journal of the Pakistan Medical Association, 60 (4), 269–273.

Silver, M. (2012). Voice and stance across disciplines in academic discourse. In K. Hyland & C. S. Guinda (Eds.), Stance and voice in written academic genres (pp. 202–217). New York: Palgrave Macmillan.

Chapter   Google Scholar  

Sisti, D. A. (2007). How do high school students justify internet plagiarism? Ethics & Behavior, 17 (3), 215–231.

Smith, M., Ghazali, N., & Fatimah Noor Minhad, S. (2007). Attitudes towards plagiarism among undergraduate accounting students: Malaysian evidence. Asian Review of Accounting, 15 (2), 122–146.

Song-Turner, H. (2008). Plagiarism: academic dishonesty or "blind spot" of multicultural education? Australian Universities' Review, 50 (2), 39–50.

Sowden, C. (2005). Plagiarism and the culture of multilingual students in higher education abroad. ELT Journal, 59 (3), 226–233.

Sterngold, A. (2004). Confronting plagiarism: how conventional teaching invites cyber-cheating. Change: The Magazine of Higher Learning, 36 (3), 16–21.

Sutherland-Smith, W. (2005). Pandora's box: academic perceptions of student plagiarism in writing. Journal of English for Academic Purposes, 4 (1), 83–95. doi: 10.1016/j.jeap.2004.07.007 .

Sutherland-Smith, W. (2008). Plagiarism, the Internet, and student learning: improving academic integrity : Routledge.

Swales, J. (1990). Genre analysis: English in academic and research settings . Cambridge: Cambridge University Press.

Wager, E. (2014). Defining and responding to plagiarism. Learned Publishing, 27 (1), 33–42.

Walker, J. (2010). Measuring plagiarism: researching what students do, not what they say they do. Studies in Higher Education, 35 (1), 41–59.

Weber-Wulff, D. (2014). False feathers: A perspective on academic plagiarism . Springer Science & Business.

Wheeler, G. (2009). Plagiarism in the Japanese universities: truly a cultural matter? Journal of Second Language Writing, 18 (1), 17–29.

Yang, S. C. (2012). Attitudes and behaviors related to academic dishonesty: a survey of Taiwanese graduate students. Ethics & Behavior, 22 (3), 218–237.

Yeo, S. (2007). First-year university science and engineering students’ understanding of plagiarism. High Education Research & Development, 26 (2), 199–216.

Zivcakova, L., & Wood, E. (2015). Examining instructional interventions: encouraging academic integrity through active learning approaches. In Exploring Learning & Teaching in Higher Education (pp. 191–205): Springer.

Download references

Author information

Authors and affiliations

School of Languages, Literacies and Translation, Universiti Sains Malaysia, 11800, Penang, Malaysia

Fauzilah Md Husain, Ghayth Kamel Shaker Al-Shaibani & Omer Hassan Ali Mahfoodh


Corresponding authors

Correspondence to Fauzilah Md Husain or Omer Hassan Ali Mahfoodh.

Ethics declarations

This work was supported by Universiti Sains Malaysia, Malaysia [Grant number 304/PBAHASA/6313161].


About this article

Husain, F.M., Al-Shaibani, G.K.S. & Mahfoodh, O.H.A. Perceptions of and Attitudes toward Plagiarism and Factors Contributing to Plagiarism: a Review of Studies. J Acad Ethics 15, 167–195 (2017). https://doi.org/10.1007/s10805-017-9274-1


Published: 30 March 2017

Issue Date: June 2017

DOI: https://doi.org/10.1007/s10805-017-9274-1


  • Academic misconduct
  • Perceptions
  • Higher education

Insight into modern-day plagiarism

The science of pseudo research.

Sharma, Hunny a,* ; Verma, Swati b

a Department of Public Health Dentistry, Triveni Institute of Dental Sciences, Hospital and Research Centre, Bilaspur, Chhattisgarh, India

b Department of Public Health Dentistry, Rungta College of Dental Sciences and Research, Bhilai, Chhattisgarh, India

Address for correspondence: Dr. Hunny Sharma, Department of Public Health Dentistry, Triveni Institute of Dental Sciences, Hospital and Research Centre, Raipur Road Near New High Court Building Village: Bodri Vidya Sthali, Bilaspur - 495 001, Chhattisgarh, India. E-mail: [email protected]

Received September 11, 2019

Received in revised form September 17, 2019

Accepted October 07, 2019

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.

In today's world, with a rapid surge in biomedical publications, maintaining the research integrity of articles is of prime importance. Submitted work is expected to be the genuine work of the submitting authors. The easy availability of digitally published biomedical papers, together with the pressure to publish for academic and professional advancement, has led numerous novice scientists and students into the unethical practice of plagiarizing others’ work to get the job done quickly. Plagiarists continually look for new and easy ways to plagiarize, which has given rise to the different forms of plagiarism seen today. Hence, this narrative review intends to help young and upcoming researchers understand plagiarism, its types, the reasons plagiarists engage in it, and possible ways to detect and prevent it.

INTRODUCTION

In the biomedical sector, where conducting research and publishing it in respected indexed journals is the highest reward for scholars and professional research scientists, the ease of access to published research via the internet has helped plagiarized research to develop and thrive [ 1 ]. Because the Medical Council of India and the Dental Council of India award credit points for publications, the number of publications and their credit points have come to be treated as measures of a researcher's success relative to other researchers [ 2 ]. Those who fail to publish their research remain at a disadvantage when seeking academic advancement in the biomedical sector.

Publication is considered the ultimate goal of a researcher, whereas unpublished research struggles to gain recognition and remains effectively invisible to the scientific community [ 3 ]. Research integrity relies not only on appropriate methodology and conduct of the research but also on proper documentation, reporting, and publication. Unethical methods that some authors use to alter these steps are called misconduct, and one such form of misconduct is plagiarism. Plagiarism not only floods the biomedical literature with copy-pasted work but also compromises the validity and reliability of that literature [ 4 5 ].

A plagiarist may not only copy and paraphrase words or short phrases but may go as far as copying an entire work without giving the original author due credit [ 6 7 8 ]. Because the pseudoscience of plagiarism keeps evolving, a zero-tolerance policy toward plagiarism is now needed. No author should be exempt from punishment and penalties, regardless of whether the plagiarism was intentional.

With this narrative review, the authors intend to help young and upcoming researchers understand plagiarism, its types, the reasons plagiarists commit it, reliable detection methods, and remedies to prevent it.

WHAT IS PLAGIARISM IN BIOMEDICAL PUBLICATIONS

The word plagiarism is derived from the Latin word “Plagium,” meaning manstealing or kidnapping. In biomedical publication, plagiarism means stealing the work or writings of another researcher and presenting them as one's own. It can be either unintentional or intentional [ 9 ]. The World Association of Medical Editors states that the term plagiarism implies “appropriation of the language, ideas, or thoughts of another without crediting their true source, and representing them as one's original work [ 10 ].” The Committee on Publication Ethics (COPE), in turn, has defined plagiarism as “the unreferenced use of others’ published and unpublished ideas [ 11 ].” An author's work can be said to be plagiarized when six or more consecutive words are copied, or when seven to eleven words overlap within a set of thirty letters [ 9 ].
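The six-consecutive-words criterion can be illustrated with a minimal Python sketch. It finds the longest run of consecutive words that two texts share and compares its length with a threshold. The function names, the tokenization rule, and the `min_words` parameter are assumptions made for illustration, and the second criterion (seven to eleven overlapping words within thirty letters) is not implemented here.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def longest_shared_run(source, candidate):
    """Return the longest run of consecutive words that occurs in both texts."""
    a, b = tokenize(source), tokenize(candidate)
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word of both texts.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def exceeds_word_threshold(source, candidate, min_words=6):
    """True when the texts share at least min_words consecutive words."""
    return len(longest_shared_run(source, candidate)) >= min_words
```

A shared run of this length only signals text that deserves a closer look; whether it amounts to plagiarism still depends on quotation marks, citation, and context.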

CLASSIFICATION OF PLAGIARISM

Although many forms of plagiarism exist, plagiarism can broadly be classified by the author's intent to plagiarize and by the extent of plagiarized material used to fabricate the biomedical literature, as follows.

BASED ON INTENT OF AUTHOR TO PLAGIARIZE

Unintentional plagiarism

Unintentional plagiarism refers to improper paraphrasing or citation that is not deliberate. In such instances, the authors are genuinely unaware of the proper referencing style and citation principles to be followed when writing scholarly manuscripts for publication [ 12 13 14 ].

Intentional plagiarism

Deliberately copying another author's writing or work without giving credit and presenting it as one's own original work is intentional plagiarism [ 12 15 ].

BASED ON THE EXTENT TO WHICH AUTHORS PLAGIARIZE

Direct plagiarism

This type of plagiarism is committed with a definite intention to plagiarize; here, the author copies another author's text word for word to create his/her work without giving credit or using quotation marks [ 12 16 ].

Mosaic plagiarism (patchwork plagiarism)

This type of plagiarism is described as borrowing phrases from another author's original work without using quotation marks, or simply replacing the other author's words with synonyms, while keeping the language and meaning essentially the same as in the original work [ 13 17 ].

THE MOST COMMON TYPE OF PLAGIARISM IN BIOMEDICAL PUBLICATIONS

Secondary source plagiarism

This type of plagiarism occurs when a researcher uses a secondary source but purposefully cites only the primary sources referenced within it, e.g., citing primary sources taken from a meta-analysis. On the one hand, this fails to give appropriate credit to the authors of the secondary source; on the other hand, it gives a false impression of the amount of review that went into the research [ 16 18 ].

Invalid source plagiarism

This type of plagiarism occurs when researchers quote or reference an inaccurate or incorrect source. Misleading and nonexistent sources are cited to lengthen the reference list and to hide inadequate research [ 18 ].

Duplication or self-plagiarism

In this type of plagiarism, authors reuse data, text, or even results from their own published studies or presented papers and publish them again without properly citing the earlier work, or purposefully omit the citation, in order to show increased productivity [ 19 ].

Paraphrasing

This type of plagiarism is also known as intellectual theft, as it involves taking the published work of other researchers and changing the words or using synonyms to make it look like original research. Some writers purposely avoid quoting the original author's work in order to avoid being caught stealing the original idea or concept [ 16 18 ].

Repetitive research plagiarism

It is a type of self-plagiarism that involves repeating or reusing data or entire text from a study with similar methodology and results without properly attributing or citing it. This type of plagiarism gives a false impression of increased productivity [ 18 ].

Replication plagiarism

This is serious misconduct on the author's part and a direct violation of research ethics. In simple terms, replication is the submission of a research paper to more than one journal, resulting in publication of the same paper more than once. Such practice can lead to immediate retraction of the article from the journals [ 18 20 ].

Verbatim plagiarism

It is also a type of intellectual theft, as the author copy-pastes the work or writing of another author without properly crediting them. In biomedical publication, it can happen in two ways. In the first, the plagiarist cites the source of the original paper but does not indicate that the text is a direct quote; in general, text taken directly should be enclosed in quotation marks. In the second, the plagiarist does not cite the source at all, thus depriving the original researcher of deserved credit [ 18 21 22 ].

Translational plagiarism

This type of plagiarism occurs when a research manuscript published by the original researcher in one language (e.g., English) is translated by the same or another author, using Google Translate or other computerized translation methods, and published in another language [ 23 24 ].

Complete plagiarism or stealing

This is an extreme type of intellectual theft, in which the plagiarist takes the research, unpublished manuscript, or other work of another researcher and submits it as his/her own [ 16 18 ].

WHAT ALL CAN BE PLAGIARIZED IN BIOMEDICAL LITERATURE

In today's digital world, plagiarism has reached extreme levels, and almost anything can be plagiarized. Plagiarists copy everything from basic elements such as a research title, ideas, concepts, and hypotheses to text, methodology, data, tables, graphs, and even figures. In some instances, plagiarists have been caught copying images and graphic art from the internet without crediting them [ 25 ].

WHY DOES PLAGIARISM OCCUR IN BIOMEDICAL LITERATURE?

Plagiarism is widespread in the internet era because of poor language skills and easy access, through the open access movement, to biomedical literature that can readily be copy-pasted. Inexperienced researchers and students who are under pressure to “publish or perish” may resort to plagiarism because they lack knowledge of publication ethics in biomedical research. Plagiarism may also result from ignorance or sheer unawareness of the fact that plagiarism detection software is readily available and that journal editors can detect copy-pasting. Previous publication of a manuscript in an unreviewed predatory journal may also make an inexperienced researcher overconfident that no one is going to check. Some novice researchers and students become involved in plagiarism unwittingly, through sheer lack of knowledge and awareness of biomedical research ethics [ 25 ].

CURRENT SCENARIO OF PLAGIARISM IN SCIENTIFIC LITERATURE

It is extremely important to understand the current state of plagiarism throughout the world, so that every author and researcher realizes how the corrupt practice of plagiarism is damaging biomedical science.

A survey conducted by Nogueira et al. in 2017 reported that, out of 72 retracted articles from 44 journals, plagiarism was the main reason for retraction in 13 articles, i.e., 18.1% of the total. In addition, overlap of significant information was found in nine articles, i.e., 13.6% [ 26 ].

An analysis of retracted Malaysian papers by Aspura et al. in 2018 identified 125 retractions between 2009 and June 2017, of which 33 had a clearly defined reason. Of these 33 retractions, 12 (9.6% of all 125 retractions) were due to duplicate publication, whereas plagiarism and self-plagiarism accounted for 6 (4.8%) and 4 (3.2%), respectively [ 27 ].

Another descriptive study, conducted by Campos-Varela and Ruano-Raviña in 2017, found 1082 retracted publications indexed in PubMed between January 1, 2013, and December 31, 2016. The data showed the ugly side of scientific misconduct, with plagiarism being the main reason for retraction in 354, i.e., 32.7%, of the retractions [ 28 ].

The news from India is also not encouraging and shows the hideous side of plagiarism in biomedical literature. In a viewpoint published in 2017, Misra et al. report that they identified 46 retractions from India between January 1, 2010, and July 4, 2017, in the MEDLINE database. The most prevalent reason for retraction was duplication of text, figures, or tables, in 41.3% of articles, whereas duplicate publication led to the retraction of 15.2% of articles [ 29 ].

A striking example of internet misuse was reported by Eysenbach in a case report of cyber-plagiarism in the Journal of the Royal College of Surgeons of Edinburgh. The plagiarism report generated by a software tool revealed that more than one-third (36%) of the suspected article consisted of phrases copied directly from multiple websites without credit to the websites' authors. The plagiarism was so extensive that the author copied even subjective opinions, along with general sentences, from these websites [ 30 ].

It is therefore obvious from the above-mentioned incidents that the web also serves many young researchers as a source of copy-pasted, plagiarized text. Plagiarism is not limited to any particular country or biomedical field but occurs in almost all academic fields.

CAN PLAGIARISM BE AVOIDED?

Education and training regarding responsible submission of research

A study conducted by Landau in 2002 reported that plagiarism results from students’ inadequate knowledge of proper citation techniques. Proper education and training in identifying plagiarism and in paraphrasing appropriately led to better detection of plagiarism. Moreover, when students were taught to identify plagiarized text and to paraphrase, they were less likely to plagiarize [ 31 ].

A procedural plagiarism training program conducted by Newton in 2014 likewise reported that students in the intervention group performed better than those in the control group in patchwriting and paraphrasing exercises [ 32 ].

USE OF PLAGIARISM DETECTION SOFTWARE

One of the most suitable methods of detecting plagiarism in academic papers and manuscripts is plagiarism detection software. Journal editors can use such software during initial screening to assess the extent of similarity and reject plagiarized manuscripts early, preventing them from entering the formal peer-review process. Authors can also benefit by checking their own manuscripts for possible plagiarism, so that the manuscripts are not rejected by journals. Commonly used plagiarism detection software includes iThenticate, Plagiarism checker X, eTBlast, Turnitin, CitePlag, Plagium, Plagiarism, and Plagiarism Detect [ 33 ].
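To give a rough idea of what an "extent of similarity" figure can mean, the following Python sketch computes the percentage of a manuscript's three-word sequences (shingles) that also appear in a single source text. This is an illustrative measure only: it is not the algorithm used by any of the tools named above, and the function names and the shingle size of three are assumptions.

```python
import re


def word_shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(manuscript, source, size=3):
    """Percentage of the manuscript's shingles that also occur in the source."""
    manuscript_shingles = word_shingles(manuscript, size)
    if not manuscript_shingles:
        # Text shorter than one shingle: nothing to compare.
        return 0.0
    shared = manuscript_shingles & word_shingles(source, size)
    return 100.0 * len(shared) / len(manuscript_shingles)
```

A real screening tool compares a manuscript against a large corpus rather than one source, so a figure produced this way is best read as a prompt for editorial judgment, not as a verdict.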

PUNISHMENT AND PENALTIES FOR PLAGIARISTS WHEN FOUND GUILTY

In India, copyright lasts for the lifetime of the creator and for 60 years thereafter, i.e., from the day the material originates until 60 years after the original creator's death. Although copyright has no distinctive role in plagiarism, it comes into effect automatically as soon as the matter is written or published. Copyright protection is conferred on original works, where originality means that the work has not been copied from any other source [ 34 ].

As per Section 17 of the Indian Copyright Act, “the author or creator of the work is the first owner of the copyright.” However, a particular section of the Act (Section 57), also known as “Moral Rights” or “authors’ special rights,” can be used to deal with plagiarism. This section defines two moral rights of the author: the right of paternity and the right of integrity [ 34 ].

The right of paternity means that an author has the right to claim authorship of a work and to prevent all others from claiming authorship of it. However, the heart of the section is the “Right of integrity,” which empowers the author to prevent distortion, mutilation, or other alteration of his/her work, or any other action in relation to the work, that would be prejudicial to his/her honor or reputation. Hence, under these two sections of copyright law, the author can seek punishment for the copyright infringer or claim authorship of the plagiarized work [ 34 ].

Plagiarists usually engage in verbatim plagiarism to create their work, using source text or quotes without proper citation and quotation marks. This brings them within the scope of copyright infringement law, so the original author may seek punishment for the infringer or claim authorship of the plagiarized work in question under the provisions described above.

As per Part III (Section 4) of the University Grants Commission (Promotion of Academic Integrity and Prevention of Plagiarism in Higher Educational Institutions [HEIs]) Regulations, 2018, every HEI should establish an Institutional Academic Integrity Panel. Under the rules, a person found guilty should be penalized according to the severity of the plagiarism. The rules define four levels, numbered zero to three, with the following penalties. Level 0 (minor similarities) covers similarities up to 10% and carries no penalty. Level 1 covers similarities above 10% and up to 40%; the student shall be asked to submit a revised script within a stipulated period not exceeding 6 months. Level 2 covers similarities above 40% and up to 60%; the student shall be debarred from submitting a revised script for 1 year. Level 3 covers similarities above 60%; the student's registration for the program shall be canceled [ 35 36 ].
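The level boundaries above can be written as a small lookup, sketched here in Python. The function name and the penalty wording in the dictionary are illustrative, and the sketch assumes that each band includes its upper bound, so that exactly 40% falls under Level 1 and exactly 60% under Level 2.

```python
# Penalty summaries paraphrased from the paragraph above (illustrative wording).
UGC_PENALTIES = {
    0: "No penalty",
    1: "Submit a revised script within a stipulated period not exceeding 6 months",
    2: "Debarred from submitting a revised script for 1 year",
    3: "Registration for the program canceled",
}


def ugc_penalty_level(similarity_percent):
    """Map a similarity percentage to a penalty level from 0 to 3.

    Assumption: each band includes its upper bound (10, 40, 60).
    """
    if not 0 <= similarity_percent <= 100:
        raise ValueError("similarity must be between 0 and 100 percent")
    if similarity_percent <= 10:
        return 0
    if similarity_percent <= 40:
        return 1
    if similarity_percent <= 60:
        return 2
    return 3
```

For example, `UGC_PENALTIES[ugc_penalty_level(45)]` looks up the Level 2 penalty for a script with 45% similarity.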

When self-plagiarism is suspected in a submitted manuscript, journal editors can follow the COPE guidelines to resolve the dilemma of when to propose revision and when to reject the manuscript. According to the guidelines, when self-plagiarism is suspected and the author has cited the previous publication, the editors or reviewers should propose revision of the manuscript with the plagiarized part corrected. Some overlap in the methodology section can be tolerated, but the final decision on whether to allow it rests with the editor. If the previous work has not been cited, the submitting author should be notified, and a major revision citing the original article should be requested. Revision should not be proposed when a significant portion of the text is self-plagiarized or when the manuscript contains already published data and methodology [ 37 ].

When authors are involved in an obvious violation of copyright transfer and the publication of plagiarized material, journals and publishing companies should punish the plagiarist with penalties ranging from suspension and retraction of the published article to blacklisting of the author [ 38 ].

It is well known that many journals are predatory or are published in languages other than English. Therefore, the level of plagiarism that we see may be only the tip of the iceberg. Many plagiarists may use the content of published and translated articles to fabricate their own work without carrying out any research of their own. Despite these incidents and considerable awareness of plagiarism among institutional review board members and journal editors, much confusion remains about who can be declared a plagiarist, when, and under which conditions. Educational institutions, governments, and policymakers should commit to a zero-tolerance policy on plagiarism and should issue standardized, strict guidelines on who is considered to have committed plagiarism, when, and on what basis. In addition, further plagiarism detection methods need to be developed for early detection, along with ways of dealing with detected cases. Penalties and punishments should be specified according to the severity of the plagiarism, together with who is authorized to impose them. A forum should be set up at the national and international levels to list the names of authors proven guilty of plagiarism, making it difficult for them to publish for a certain period of time in cases of less serious plagiarism. Last but not least, educational efforts should be made at the grassroots level to promote research integrity and ethics among both upcoming and established researchers. Genuine researchers working in good faith for the advancement of biomedical science will drive a huge leap in scientific progress and thereby improve the quality of the biomedical literature.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.


Bioethics ; Editorial policies ; Medical writing ; Plagiarism ; Scientific misconduct

  • Original article
  • Open access
  • Published: 07 January 2021

Plagiarism in articles published in journals indexed in the Scientific Periodicals Electronic Library (SPELL): a comparative analysis between 2013 and 2018

  • Marcelo Krokoscz   ORCID: orcid.org/0000-0002-6869-864X 1  

International Journal for Educational Integrity, volume 17, Article number: 1 (2021)

This study analyzes the possible occurrence of plagiarism and self-plagiarism in a sample of articles published in the Scientific Periodicals Electronic Library (SPELL), an open database that indexes business journals in Brazil. The author compared one sample obtained in 2013 ( n  = 47 articles) and another selected from 2018 ( n  = 118 articles). In both samples, we verified the guidelines that each of the journals provided to authors regarding plagiarism and the adoption of software to detect textual similarities. In the analysis conducted in 2013, it was found that only one journal (2%) mentioned the word “plagiarism” in its policies, although the majority of the directives required guarantees that no type of violation of authors’ rights was contained in the manuscript. In the analysis conducted in 2013, it was determined that there were literal reproductions in 31 published articles (65.9%), and no relevant similarities with other publications were encountered in 16 articles (34.1%). In the 2018 analysis, 69 of the publications (58%) included observations and guidelines related to plagiarism and self-plagiarism. In the analysis conducted in 2018, it was found that similarities (plagiarism and self-plagiarism) occurred in 52 articles (44%), and no relevant evidence of plagiarism or self-plagiarism was found in 66 (56%) manuscripts. Although a reduction in the index of the occurrence of plagiarism was observed, as was an increase in the instructions on the prevention of plagiarism by authors, practices directed at guiding authors by means of directives concerning the importance of preventing plagiarism in manuscripts submitted for publication can be recommended.

Introduction

It has been reported in the literature that studies marred by a lack of scientific integrity due to scientific misconduct such as plagiarism or redundant publication (self-plagiarism) and works containing gift or ghost authorship are a recurring problem, which has intensified as of late (Amos 2014 ; Associação Nacional de Pesquisa e Pós-Graduação e Pesquisa em Administração (ANPAD) 2017 ; Committee On Publication Ethics (COPE) 2011 ; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) 2011 ; Council of Science Editors (CSE) 2018 ; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) 2011 ; Koocher and Keith-Spiegel 2010 ; Van Nordeen 2011 ).

In January 2011, the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Coordination for the Improvement of Higher Education Graduates - CAPES) Footnote 1 recommended that all Brazilian institutions of higher education create “policies of awareness and information concerning intellectual property, adopting specific procedures seeking to limit the practice of plagiarism in the preparation of theses, monographs, articles and other texts on the part of students and other members of their communities” (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior 2011). In the same year, the main research support agencies in Brazil presented policies aimed at restraining the occurrence of fraud and misconduct in scientific publications, citing the fabrication or invention of data, the falsification of results, and authorship fraud (plagiarism) among the types of fraud and misconduct (Conselho Nacional de Desenvolvimento Científico Tecnológico 2011; Fundação de Amparo à Pesquisa do Estado de São Paulo 2011).

These measures were aligned with those which institutions of higher education around the world were practicing and were in conformity with the codes of research integrity of international organizations, such as the following: the U.S. Department of Health and Human Services ( 2005 ), the Australian government ( 2007 ), and the Research Councils UK ( 2017 ). International entities, including CSE beginning in 1957 and COPE since 1997, have given support to science editors with the goal of creating and implementing a culture of ethics and good practices in scientific research activities.

In Brazil, the Associação Nacional de Pesquisa e Pós-Graduação e Pesquisa em Administração - ANPAD (National Association of Research and Graduate Studies and Research in Administration) had its manual “Boas Práticas da Publicação Científica: um manual para autores, revisores, editores e integrantes de corpos editoriais” (Good Practices in Scientific Publishing: a manual for authors, reviewers, editors and members of editorial committees) approved during the II Fórum de Editores Científicos de Administração e Contabilidade (II Forum of Scientific Editors in Administration and Accounting), held in 2010. In addition, in 2011, the Associação Brasileira dos Editores Científicos - ABEC (Brazilian Association of Scientific Editors - ABEC) held the Encontro Nacional de Editores Científicos (National Meeting of Scientific Editors), with the theme “Integrity and Ethics in Scientific Publishing”. Among its objectives, the association sought “to develop and refine the publication of technical-scientific periodicals and refine the communication and dissemination of information”. In February 2015, ABEC signed an agreement with iThenticate®, a software for detecting plagiarism in articles submitted to periodicals for publication, enabling the employment of this tool by its members. In 2017, ABEC, in partnership with CSE, published the “Diretrizes do CSE para Promover Integridade em Publicações em Periódicos Científicos” (Policies of the CSE for Promoting Integrity in Scientific Journals) in Portuguese.

All these organizations agree that misconduct in scientific research manifests itself fundamentally via three practices condemned by researchers: fabricating research data; falsifying results; and authorship fraud, that is, the undue appropriation of another author’s content without the due attribution of credit. Furthermore, condemnable practices such as redundancy in publications (self-plagiarism) are considered in the same category as the sloppy handling of research subjects or piracy.

Focusing more closely on the object of this study, plagiarism can be defined as “signing or otherwise presenting oneself as the author of an artistic or scholarly work belonging to another person. To imitate someone else’s work” (Ferreira 1986, p. 249). According to Brazil’s law concerning the rights of an author (Law 9610/98), the practice, which is considered forgery, is characterized as the unauthorized reproduction of a work, meriting the penalties outlined in Article 184 of the Penal Code. However, in the Brazilian academic environment, the problem is understood to be academic misconduct or dishonest intellectual practice, which can also manifest itself through self-plagiarism or the purchase of academic works produced by others. These forms of plagiarism go beyond the juridical notion of plagiarism because they do not involve the incorrect use of someone else’s work. Self-plagiarism, for instance, is not addressed by the law because it is a situation in which authors reuse their own works; i.e., there is no offense in relation to others’ rights. Therefore, it falls beyond legal issues and is considered essentially an ethical problem, since a redundant publication (self-plagiarism) “disrupts scientific publishing by over-emphasizing results, increasing journal publication costs, and artificially inflating journal impact, among other consequences” (Eaton and Crossman 2018).

Table 1 presents the most common types of plagiarism in the international academic context according to the literature and the practices in some teaching institutions. It is interesting that types 1 and 3 describe forms of plagiarism that can be considered misappropriation from a legislative standpoint. However, types 4 and 7 are kinds of plagiarism that do not harm authorship rights but are considered scientific misconduct and, consequently, ethically unacceptable practices.

Despite the increasing interest in academic plagiarism on the part of institutions involved in teaching and research, the subject can still be considered to have arisen relatively recently in Brazil, and little original work on the topic has been produced; however, it is being increasingly studied in the academic community (Demo 2011; Krokoscz 2011; 2012a, b). For example, in a search for the keywords “plagiarism” and “plagio” Footnote 2 in the SPELL platform, among 48 thousand documents, only two publications on the topic were found: Veludo-de-Oliveira et al. (2014) and Costa et al. (2017). Nevertheless, beyond these, through other platforms, Brazilian discussions related to business plagiarism can also be found in Andrade (2011), Barbastefano and Souza (2007), Barros and Duque (2015), Fachini and Domingues (2008), Innarelli (2011), Valente et al. (2010), Neumann (2018), Silva and Domingues (2008), and Tomazelli (2011).

In summary, although these studies contribute to deepening the subject, there have been only incipient discussions over the last 8 years. Nevertheless, in an article published in the Revista da Associação dos Docentes da USP (Journal of the Association of Professors of the University of São Paulo), researchers Luiz Henrique Lopes dos Santos and Erney Plessmann de Camargo, faculty members at the University of São Paulo (USP), recognized that concerns regarding plagiarism are becoming increasingly important and that knowledge about the subject is scant. Luiz Menna-Barreto, another researcher who was interviewed, considered that the climate of “productivism” (measurable professorial productivity), which has characterized the academic scenario in recent years, could be a contributing factor (Biondi 2011). In addition, an article published in Nature showed that, among researchers, plagiarism ranked third among the practices of academic dishonesty in the judgment of peer reviewers (Koocher and Keith-Spiegel 2010). Indeed, the problem has attained international importance and has been identified as one of the reasons for the increase in retracted articles (Van Nordeen 2011, p. 27). That study revealed that, among the retractions of articles indexed in the Web of Science and in PubMed, 44% corresponded to problems of scientific misconduct, including plagiarism and self-plagiarism, and the other 56% to research errors and nonreproducible results, among other problems. Carver et al. (2011) also emphasize that plagiarism has significantly contributed to the increase in the number of retractions; and for Masic (2014, p. 145), “the biggest reason for retractions in the last thirty years is plagiarism and self-plagiarism.”

According to the website Retraction Watch, launched in 2010 with the aim of monitoring the indices of the occurrence and motives of the retraction of scientific articles in publications, in the field of life sciences, in 2013, there were 203 retractions related to plagiarism involving text, image, data or articles. In 2018, the database of the website catalogued 182 retractions for the same reasons (Retraction Watch 2019 ).

Another study reported further evidence of plagiarism in scientific publications in the field of biomedicine indexed in PubMed for the period from 2008 to 2012. The study found that 35% of the retractions in the sample studied could be attributed to plagiarism or self-plagiarism. In addition, the study identified the 20 countries with the greatest numbers of works retracted as a result of plagiarism and self-plagiarism. Brazil was included among them, with 44.4% of the articles by its authors being retracted for these reasons (Amos 2014).

Although the proportion, in percentages, of works retracted is low, it must be remembered that there is no standard minimum acceptable index for such practices in the academic world.

In addition, it is still unclear whether the numbers of retractions that have been verified are related to an increase in the frequency of plagiarism-related practices in recent years or result from increasing the identification of such instances because of the rigor in editing and whistle-blowing processes, internet visibility and the use of technological resources such as software that detects textual similarities.

Considering this scenario, the main objective of this study was to analyze the possible occurrence of plagiarism and self-plagiarism in a nonrandom sample of articles published in learned journals in the field of administration indexed in the Scientific Periodicals Electronic Library (SPELL) information database, a repository of scholarly studies that offers free access to technical and scientific information in the area of business ( www.spell.org.br ). In addition, the study sought to compare the results obtained with those reported in a similar study in 2013 and to analyze the guidelines that each of the journals composing the sample provided to authors regarding plagiarism.

The study is justified as a consequence of the increasing attention given to the problem of plagiarism by important Brazilian institutions concerned with research, such as the Coordination for the Improvement of Higher Education Graduates (CAPES), the National Council of Scientific and Technological Development (CNPq) and the Foundation for the Support to Research of the State of São Paulo (FAPESP), requiring that this issue be addressed.

The positions held by these institutions regarding the need to disseminate guidelines and take action to address plagiarism and other types of scientific misconduct were first put forth in 2011, when CAPES issued a document containing recommendations for all public and private universities in Brazil to adopt procedures to address academic plagiarism (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior 2011). It is important to note that the initiative taken by CAPES followed a request by the Brazilian Bar Association (OAB) in the state of Ceará, which recommended, inter alia, that all institutions of higher education in Brazil should “use software to search for similarity in the Internet [ … ], adopt policies of awareness and information about intellectual property, aiming at suppressing plagiarism in the academic community” (OAB 2010). Footnote 3 Since then, some measures have been implemented to address plagiarism. For instance, since 2013, FAPESP, one of the major public agencies financing research in the state of São Paulo, has kept a “shame page” on its institutional website on which it publishes a list of researchers and projects in which scientific misconduct was found (Fundação de Amparo à Pesquisa do Estado de São Paulo 2014). In 2017, FAPESP started to refuse projects from research institutions that did not have an office of academic integrity (Alves 2017).

In addition, despite the repercussions in the national media of recent cases of plagiarism by Brazilian researchers uncovered in learned journals, the small number of studies on the subject conducted and submitted for publication by Brazilian authors has concentrated on higher education. However, it is known that some of the major reasons for the rejection of scholarly articles submitted for publication are problems of a methodological nature, a lack of theoretical depth, or difficulties in referencing (i.e., the correct identification of the sources consulted), among other issues (Job et al. 2009).

Nevertheless, it is important to learn which measures related to the verification and prevention of plagiarism have been adopted by scientific journal editors in relation to the articles submitted for publication. Likewise, there are no diagnostic evaluations that can provide evidence of the extent to which the submissions of researchers do or do not possess plagiarized sections. Obviously, the scope of this study excludes “exposing” authors or learned journals. It seeks to contribute to the identification and discussion of the question insufficiently addressed in the Brazilian scientific literature. Consequently, it is hoped that the findings of this investigation will contribute to improving the procedures for elaborating and submitting research reports for publication.

Methodology

The articles analyzed in the study were obtained from the SPELL database, a repository of scholarly articles in the field of business. The main reason to choose this database for the analysis is its free access to full-text technical and scientific information.

In 2013, using the bibliographic search for published articles cited in the SPELL database, 546 articles published in 47 different journals were identified. After 5 years, a new survey of articles published from 08/2013 to 08/2018 was performed. In this period, 121 journals were identified, and three of them were disregarded because they were no longer published (Desafio: Revista de Economia e Administração (published until 2010 and then continued as Desafio Online) (ISSN 1678–1821); RAC-Eletrônica (ISSN 1981–5700), published until January 2009; and Revista de Estudos de Administração – Rea (ISSN 1518–3645), published until December 2009), resulting in the identification of 28,259 published articles.

A random sample corresponding to one article from each journal in both periods was selected. This was done by means of attributing an identification number (ID) to each article in the database. The ID of the first article and that of the last one published were verified, and a number was drawn using the website www.random.org . After the number was drawn, the selected article was downloaded and input to the plagiarism detection software iThenticate®. All the articles selected and input to the plagiarism detector were then classified in a control spreadsheet, consisting of the following information: the Qualis/Capes identifier, article title, DOI or permanent link, authors, and publication date.
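The draw described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study drew its numbers with www.random.org, whereas the sketch swaps in Python's pseudo-random generator, and the journal names, ID ranges and function names are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple


def draw_article_id(first_id: int, last_id: int, rng: random.Random) -> int:
    """Draw one article ID uniformly from the inclusive range [first_id, last_id]."""
    if first_id > last_id:
        raise ValueError("first_id must not exceed last_id")
    return rng.randint(first_id, last_id)


def draw_sample(
    journal_id_ranges: Dict[str, Tuple[int, int]],
    seed: Optional[int] = None,
) -> Dict[str, int]:
    """Select one article per journal, given each journal's first and last article ID."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return {
        journal: draw_article_id(first_id, last_id, rng)
        for journal, (first_id, last_id) in journal_id_ranges.items()
    }


# Hypothetical ID ranges, for illustration only (not taken from SPELL).
example_ranges = {"Journal A": (1001, 1350), "Journal B": (2200, 2415)}
example_sample = draw_sample(example_ranges, seed=42)
```

Recording the seed alongside the control spreadsheet would let another researcher reproduce the same selection, which a draw on an external website does not allow.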

The articles drawn were input to iThenticate® software in the two phases of the research. The software operates by creating a search for similarities between the submitted text and texts that have been published on the internet, including in publications with restricted access, such as in the case of publishers (Elsevier, Springer, Nature, Taylor & Francis, and Wiley-Blackwell), indexers, and databases (EBSCOHost; Emerald Journals; Proquest; Pub-Med/Medline, and Cengage Learning), among other scholarly journals, and its own software database, thus consolidating a repertory for comparison with some 142 million documents (IThenticate® 2019 ).

Findings and discussion

Initially, the analysis was conducted using the policies and instructions for authors and/or submission manuals provided by the journals to authors interested in submitting their work for publication. The intention was to verify the existence or lack of guidelines related to plagiarism or self-plagiarism in publications seeking to clarify these issues for authors beforehand. This guidance is part of the flowchart concerning what to do in cases of the suspicion of plagiarism and redundancy in scholarly manuscripts that can be found in the document elaborated by the Committee On Publication Ethics (COPE) and that is aimed at editors of scholarly journals. The text notes that “the instructions to authors should include a definition of plagiarism and state the journal’s policy on it” (Committee On Publication Ethics 2016 ; 2018 ).

In the analysis conducted in 2013, in which data were analyzed but not published, it was found that only one journal (2%) among the 47 analyzed journals mentioned the word “plagiarism” in its policies, although the majority of the directives required guarantees on the part of authors that no type of violation of authors’ rights was contained in the submitted work. However, we also observed that one of the publications studied cited a directive related to redundancy (self-plagiarism) in its submission guidelines, although it used a different term to refer to the subject: overlapping of publication (Ebape 2014).

According to Eaton and Crossman ( 2018 ), self-plagiarism is a sub-category of plagiarism and is considered to be complex and polemical. The study and debate of self-plagiarism have received growing interest from editors with the objective to establish clear and specific guidelines about the issue to authors during the process of submitting scientific work in social science areas. One of the topics that has demanded attention is defining the percentage of a previously written text that an author can reuse, considering that some parts of the work, such as the description of the methods, do not usually vary substantially, which justifies their reproduction. Several authors have considered that up to 30% of a previous text could be reused, but this does not serve as a fixed rule since it depends on the area of study and the guidelines of each periodical (Bird and Sivilotti 2008 ; Roig 2015 ; Samuelson 1994 ).

Usually, the publication of two articles with considerable overlap is not acceptable, even if they are published in different academic periodicals. Various publications that have a unique data collection should only be permitted under the following guidelines: (a) if it is impossible to write a single article within the maximum number of 30 pages, and (b) if the articles present distinct approaches and purposes. The editor should be advised of a submission when the article has, in some form, already been published online.

Periodicals were also found that established directives in relation to the originality of the work, whether in Brazil or abroad, clarifying that they considered work that had been presented in preliminary versions in scholarly events acceptable for publication. Some journals encouraged and authorized authors to publish and disseminate their work in online vehicles such as institutional repositories or on personal pages, considering that this could have a positive effect on the visibility and increased probability of the work being cited. For example, “Authors have permission and are encouraged to publish and disseminate their work online (e.g. in institutional repositories or on their own personal pages) at any time before or during the editorial process, since this could generate productive alterations, as well as increase the impact and the citing of the published work [ …]” (Revista de Gestão, Finanças e Contabilidade 2014 ).

In relation to what was learned about plagiarism and self-plagiarism in the analyses conducted in 2013, it was determined that there was word-for-word plagiarism (copying verbatim from a source without any acknowledgement) in 31 published articles (65.9%), and no relevant similarities with other publications were encountered in 16 articles (34.1%).

Table  2 presents the list of the periodicals analyzed with the numbers of articles that were published by the time the similarity analysis was conducted. In this stage of the investigation, we only identified whether there were instances of plagiarism and self-plagiarism.

The column “Qualis” refers to a scale established by the Brazilian Ministry of Education that is used to classify the level of qualification of periodicals that publish scientific work in postgraduate programs in Brazil. During the time of this study, the evaluation strata adopted by this program varied from the highest quality, A1, to A2, B1, B2, B3, B4, B5, and C (zero) (BRASIL 2016). As the data in Table 2 show, plagiarism/self-plagiarism occurs in both more qualified (A2) and less qualified (B5) periodicals.

The types of plagiarism mostly found were those copying the sentences of a source or paragraphs of other sources verbatim without the use of quotation marks or indenting the text and lacking any indication of the original document or source. Furthermore, we discovered cases of self-plagiarism (redundancy), that is, works by the same author that had already been published in other periodicals or event annals.

The present work did not analyze the extent of the occurrences of self-plagiarism. The observations conducted identified the following: copies of entire articles that the same authors had previously presented at scientific events and published in conference proceedings, and parts of texts published in other studies and reused without proper citation.

The software did not allow us to identify the occurrence of indirect plagiarism (paraphrasing), i.e., when the original source is rewritten but no source is credited through an indirect quote (indication of authorship within the text) and no reference is given to the source in the form of detailed identification at the end of the work. The use of a reference to the source and quoting the author are two essential conditions for avoiding the inappropriate use of a reproduced source.

In the 2018 analyses, the website of each of the 118 journals selected for this research and indexed in the database was visited. Initially, we identified the existence of directions or guidelines related to ethics or good research practices on the principal page. Then, a second step was searching for information connected to these topics in the section “about the journal.” In these sections, we searched for “plágio or plagiarism.” If this information was not encountered on these pages, analysis of the sections containing information, directives or instructions to authors followed.
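The page-by-page search described above can be illustrated with a minimal Python sketch. It assumes that the text of each journal page has already been saved as a string; the section names, sample texts and function names are hypothetical, and accents are stripped so that “plágio” and “plagio” are treated alike.

```python
import unicodedata

# Search terms from the procedure above, in accent-stripped, lowercase form.
PLAGIARISM_TERMS = ("plagio", "plagiarism")


def normalize(text):
    """Lowercase the text and remove accents so that 'PLÁGIO' matches 'plagio'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower()


def mentions_plagiarism(page_text):
    """Return True if the page text contains any of the search terms."""
    normalized = normalize(page_text)
    return any(term in normalized for term in PLAGIARISM_TERMS)


def first_section_with_mention(sections):
    """Check sections in the order they were visited and return the first
    section name whose text mentions plagiarism, or None if none does."""
    for name, text in sections:
        if mentions_plagiarism(text):
            return name
    return None


# Hypothetical journal pages, in the order used in the study:
# principal page, "about the journal", then instructions to authors.
pages = [
    ("principal page", "Bem-vindo à revista."),
    ("about the journal", "Foco e escopo da revista."),
    ("instructions to authors", "Casos de plágio ou autoplágio serão rejeitados."),
]
print(first_section_with_mention(pages))
```

A substring match of this kind only shows that the word appears somewhere on the page; whether the journal actually defines plagiarism or states a policy on it still requires reading the passage.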

It was found that on the websites of the 118 periodicals analyzed, 69 of them (58%) have on some page or document observations and instructions related to plagiarism and self-plagiarism, which corresponds to a significant increase in relation to what was observed in the study conducted in 2013. However, it was ascertained that some journals, such as Revista de Gestão – REGE (ISSN 2177–8736), recommended that authors follow the directives of scientific integrity such as those established by COPE, though no description of those directives concerning plagiarism was offered. Other journals, such as the International Journal of Professional Business Review (e-ISSN: 2525–3654), opted for a single page concerning good conduct or policies regarding ethics in research, clearly stating the following: “Originality and Plagiarism: The authors should insure that they have written entirely original works, and if the authors have used the work and/or words of others that this has been appropriately cited or quoted. Plagiarism in all its forms constitutes unethical publishing behavior and is unacceptable.” Still, other journals, such as the Revista de Administração IMED – RAIMED (ISSN 2237–7956), Revista de Ciências da Administração – RCA (ISSN 1516–3865) and the Revista Pensamento Contemporâneo em Administração (ISSN 1982–2596), provided a link to the document “Boas Práticas da Publicação Científica: um manual para autores, revisores, editores e integrantes de corpos editoriais” (Good Practices of Scientific Publishing: a handbook for authors, reviewers, editors and members of editorial councils) on their websites (Associação Nacional de Pesquisa e Pós-Graduação e Pesquisa em Administração (ANPAD) 2017 ).
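As a quick arithmetic check, the proportions reported for the two samples can be recomputed from the counts given in this article. The sketch below is illustrative only; the dictionary layout and function name are assumptions, and percentages are rounded to whole numbers (the article reports the 2013 similarity figure as 65.9%).

```python
def share(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


# Counts as reported in this article for the two samples.
counts = {
    "2013": {"journals": 47, "policy_mentions": 1, "articles": 47, "with_similarity": 31},
    "2018": {"journals": 118, "policy_mentions": 69, "articles": 118, "with_similarity": 52},
}

for year, c in counts.items():
    policy = share(c["policy_mentions"], c["journals"])
    similarity = share(c["with_similarity"], c["articles"])
    print(f"{year}: journals mentioning plagiarism {policy:.0f}%, "
          f"articles with similarities {similarity:.0f}%")
```

Seen side by side, the share of journals with guidance rises from about 2% to about 58%, while the share of sampled articles with similarities falls from about 66% to about 44%.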

To interpret the reports generated by the software, it is important to note that sections of text with similarities are highlighted in color. Each color corresponds to a different source, and a superscript number in each section gives direct access to the source with similar text. This in turn allows a more precise analysis, such as examining whether the text comes from the same author, whether it was published before or after the manuscript under examination, the type of document, and other information.

From this type of analysis, by including additional documents, it is possible to affirm the occurrence of plagiarism or self-plagiarism. It is for this reason that the software detection service is offered as a verifier of similarities and not of plagiarism because not every similarity corresponds to an author’s fraud. The following are three examples extracted from similarity reports generated by the iThenticate® software. The examples were classified in three categories: low, medium and high incidences of plagiarism. The parameter used for each category represents the portion of paragraphs copied in relation to the manuscript.

Although there are no defined guidelines establishing the level of the seriousness of plagiarism regarding the amount reproduced, in the guidelines provided by Committee On Publication Ethics ( 2018 ) about “What to do if you suspect plagiarism”, it is recommended that one consider reporting it in the following cases: “a) Unattributed use of large portions of text and/or data; b) Minor copying of short phrases only (e.g. in discussion of research paper from non-native language speaker). No misattribution of data.” When large portions of text are identified, COPE recommends that editors contact the corresponding author and document the evidence of plagiarism. In the case of a satisfactory reply addressing an honest error, unclear journal instructions or a very junior researcher, the editor can reject the manuscript or ask for a revision in the hope of obtaining improvements. Conversely, if the author’s explanation is unsatisfactory, the manuscript must be rejected without the option of requesting a revision.

The first case (Fig.  1 ) was considered of “low incidence” because the similarities without attribution of credit appear only sporadically in some passages of the manuscript.

Fig. 1 Low incidence/reviewed with QUALIS A2/2013. Source: iThenticate®

Figure  2 presents a case of “medium incidence” because the text reveals sections reproduced inadequately in different parts of the manuscript, but only on some pages of the entire manuscript.

Fig. 2 Medium incidence/reviewed with QUALIS B2/2018. Source: iThenticate®

The third example (Fig.  3 ) was considered a case of “high incidence” because it is possible to observe textual reproductions without the attribution of credit in different paragraphs on various pages, as well as differences in the provenance of the original sources copied (different colors).

Fig. 3 High incidence/reviewed with QUALIS B1/2018. Source: iThenticate®

A repeated observation refers to the quantity of identical terms in the same sequence of a sentence, which could indicate plagiarism. It is important to mention that the identification of patterns of similarity by software may not indicate plagiarism if the reproduced texts were correctly quoted and referenced. Therefore, it is not possible to categorically affirm that there is a predetermined amount of identical words between texts that determines plagiarism since this conclusion depends on analysis.

Some authors support a threshold of seven consecutive identical words as the parameter for judging a sequence to be a verbatim copy (Saraiva and Carrieri 2009). This principle was adopted considering that “the chances of a human creating a sentence identical to another already created diminishes exponentially in relation to the number of words the sentence contains” (footnote 4). The authors demonstrated this by conducting the following experiment: they placed the sentence between quotation marks and searched for it on Google (www.google.com.br), using the equivalent terms in Portuguese. The results are presented in Table 3.
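The seven-word criterion can be illustrated with a short script. The following is only a minimal sketch, assuming that a "sequence" means a run of consecutive words compared without regard to case or punctuation; the function names `word_ngrams` and `shared_sequences` and the two candidate sentences are hypothetical and were invented for this illustration. It does not describe how iThenticate® or any other commercial tool works, and a match found this way is not plagiarism if the passage is correctly quoted and referenced.

```python
import re


def word_ngrams(text, n=7):
    """Return the set of all runs of n consecutive words in text (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_sequences(candidate, source, n=7):
    """Return the n-word sequences that appear in both texts."""
    return word_ngrams(candidate, n) & word_ngrams(source, n)


# Source sentence: part of the passage quoted in the text (a free translation).
source = ("the chances of a human creating a sentence identical to another "
          "already created diminishes exponentially")

# Hypothetical candidate sentences, invented for this illustration.
copied = ("Some argue that the chances of a human creating a sentence "
          "identical to another are small.")
reworded = "It is unlikely that two people independently write the same long sentence."

print(len(shared_sequences(copied, source)))    # shared seven-word runs: 5
print(len(shared_sequences(reworded, source)))  # shared seven-word runs: 0
```

The first candidate repeats eleven consecutive words of the source, which yields five overlapping seven-word runs; the reworded sentence shares none. Whether any such match amounts to plagiarism still depends on human analysis of quotation and attribution.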

This experiment makes sense from the perspective of “the ‘uniqueness of utterance principle’, supported in linguistics, which states that when we produce a text (spoken or written) we make lexico-grammatical choices that create a sequence which is not repeated identically in other situations.” (Abreu 2016, p. 5). Also, Wager (2014) has summarized some ideas regarding the extent of copy and attribution of plagiarism:

“The most blatant forms of plagiarism involve the copying of entire papers or chapters which are republished as the work of the plagiarist. Such cases usually involve not only plagiarism but also breach of copyright. Whole articles or chapters may also be plagiarized in translation.” (Wager 2014, p. 35)

Nevertheless, these criteria cannot be considered inflexible because, first, it is acceptable to literally reproduce any quantity of text as long as the source is cited; and, second, in the specific case of plagiarism called “apt phrase,” even two words can characterize plagiarism. That is the case of expressions created by authors to designate specific theoretical discoveries or statements in their area of research, such as the following: “I think, therefore I exist” (René Descartes), “somatic marker” (Antonio Damásio), and “knowledge conversion” (Ikujiro Nonaka & Hirotaka Takeuchi). However, according to the Committee On Publication Ethics (2009), when only small parts of a text are plagiarized, the editor may consider, in the interest of the readers and the plagiarized author, correcting the text rather than retracting it.

In the analyses conducted in 2018, it was found that similarities (plagiarism and self-plagiarism) occurred in 52 articles (44%), and there was no relevant evidence of plagiarism or self-plagiarism found in 66 (56%) manuscripts (Table 4 ).

Comparing the results of the similarity reports in the two periods studied (2013 versus 2018), it is possible to confirm a reduction of 21.9% in the index of the occurrence of plagiarism and self-plagiarism. This is a relevant volume for a five-year period, although 44% is an elevated index for fraud by authors when taking into account the parameters appearing in the literature (Amos 2014 ). When weighing the fact that the SPELL database included a total of 28,259 articles published in the 2013–2018 period, the percentage of observed fraud by authors was 0.18%, which represents a highly noteworthy number compared to the study conducted by Amos ( 2014 ). From a sample of 0.02% of the retracted articles in the PubMed database in the period from 2008 to 2012, that study deemed 35% included plagiarism or self-plagiarism.
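The 2018 shares reported above can be rechecked from the counts given in the text. The sketch below rests on two assumptions that are inferred here rather than stated in this passage: that the 2018 sample is the sum of the two groups (52 + 66 = 118 articles), and that the 0.18% figure is the 52 flagged articles divided by the 28,259 articles in the SPELL database. The variable names are illustrative only.

```python
# Counts taken from the text for the 2018 analysis.
with_similarity = 52       # articles with plagiarism or self-plagiarism
without_similarity = 66    # articles with no relevant evidence
spell_total = 28_259       # SPELL articles published in 2013-2018

# Assumption: the sample is the sum of the two groups.
sample_size = with_similarity + without_similarity

share_of_sample = 100 * with_similarity / sample_size
share_without = 100 * without_similarity / sample_size
share_of_database = 100 * with_similarity / spell_total

print(f"Sample size: {sample_size}")
print(f"With similarity: {share_of_sample:.0f}% of the sample")
print(f"Without similarity: {share_without:.0f}% of the sample")
print(f"With similarity: {share_of_database:.2f}% of all SPELL articles")
```

Under these assumptions the computed values round to the 44%, 56%, and 0.18% reported in the text. The 21.9% reduction cannot be rechecked in the same way because the 2013 index is not given in this passage.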

Notably, 16 articles (14%) were determined to have evidence of self-plagiarism, or rather they were manuscripts that had been published in the form of theses. They were indexed in open-access repositories, had been presented at scientific events and appeared in their proceedings, or even were published in other journals. Self-plagiarism, or redundancy, is considered a fraudulent practice in the international and Brazilian contexts. COPE warns that published articles should be retracted if, among other reasons, “they have clear evidence that the findings are unreliable, either as a result of misconduct (e.g., data fabrication) or honest error (e.g. miscalculation or experimental error); the findings have previously been published elsewhere without proper cross-referencing, permission or justification (i.e. cases of redundant publication); it constitutes plagiarism; it reports unethical research” (Committee On Publication Ethics 2009 ).

Still, it is necessary to recognize that there is a certain degree of controversy related to self-plagiarism. First, definitions concerning the undue appropriation of published works refer to the presentation as one’s own of someone else’s work. Therefore, considering the copying of one’s own work (self-plagiarism) as fraud cannot be accepted either conceptually or juridically. Regarding the system for attributing scientific credibility that considers the number of publications as a form of ascertaining scientific productivity, it might make sense to characterize self-plagiarism as redundancy. Thus, decreasing self-reproduction can be a way of preventing a single work from being presented as several works, giving a false notion of productivity.

It is fitting to discuss at what point plagiarism is considered a problem by editors and researchers because, if it is not a concern, its absence from the mechanisms of control and punishment would be justified. Nevertheless, the directives of COPE for editors clearly recommend that mechanisms for the detection of plagiarism be adopted and that reviewers be supported and encouraged to verify the occurrence of plagiarism (Committee On Publication Ethics 2011).

Although the occurrence of plagiarism and self-plagiarism is not well known, it can be questioned whether the absence of editorial guidelines concerning these issues in the policy directives given to authors influences the numerical results. Only one periodical was observed to set forth specific directives concerning plagiarism, which suggests that editors do not treat this problem as a concern when defining the requirements that authors must meet. Nevertheless, plagiarism is a problem that exists in the academic world, and its occurrence has been measured among researchers in different fields and countries, with clear indications that its frequency is increasing.

COPE itself offers two flowcharts showing possible actions when plagiarism is suspected in manuscripts and in articles already published to help editors. These guidelines vary depending on the seriousness of the plagiarism, the degree of intentionality, and the extent of the responsibility of the author because there are works that contain a few sentences or many segments of literally and improperly reproduced material, cases in which the sources used were not correctly identified due to the researcher’s technical failure, and differences between the plagiarism occurring in a manuscript by a novice researcher and that of a senior investigator.

It is well known that the objective of research work is to contribute to human development; therefore, the greater the visibility a scientific discovery has, the greater the number of people that are able to obtain the resulting benefits. Thus, it is possible to note in the publication directives that it is considered acceptable to publish work previously presented at conferences or divulged in repositories.

The results obtained in this study contribute to the understanding of plagiarism in the context of scientific publications in the area of business in Brazil. Although a reduction in the indices of the occurrence of plagiarism was observed in published articles, as was an increase in the support regarding the prevention of plagiarism by authors in the editorial requirements of periodicals, evidence of the problem remains a concern due to its impact on the reputations of researchers and journals. Nevertheless, it is possible to argue that these indices result from bad faith on the part of researchers less than might be thought. Indeed, it is often found that plagiarism can occur accidentally due to technical difficulties or ignorance of the practices involved in attributing sources. This thinking supports the idea that no scientist should risk having his name and reputation exposed publicly due to a manuscript with fraudulent textual segments since it is currently extremely easy to determine textual similarity using specialized software. Hence, the verification of such occurrences generally reveals carelessness, a lack of concern, or unpreparedness in relation to the matter. Similarly, just as it is not a question of simply attributing the responsibility of plagiarism to the researcher, one must consider the portion of responsibility of others involved in the process of the production and publication of scientific knowledge, such as the editors and the financing agencies.

Consequently, it can be recommended that the editors of the periodicals studied adopt practices directed at informing authors of the importance of preventing plagiarism in the manuscripts submitted for publication via directives. In addition, this action has been recommended by diverse institutions related to scientific production and should be increased by augmenting the capacity of reviewers such that they evaluate the articles submitted for publication, verify the occurrence of plagiarism, and adopt the use of plagiarism detection software as a standard procedure for periodicals. In this way, many works that are published today and are accused of plagiarism can be identified in the submission process, and their authors can be advised to make appropriate preventive corrections.

In conclusion, plagiarism is a problem that must be considered not from the perspective of finding culprits, but rather as a challenge to be overcome that requires collective and committed work on the part of all those involved in the research process, including researchers, editors, research institutions, and financing agencies, among others. However, the first and most fundamental step is the recognition that the problem exists and requires a response and a position from all those involved. This was clearly demonstrated in the present study.

It is recommended that similar studies be conducted using other databases with indices or other types of scientific publications and in different areas of study. It is additionally recommended that the results of these studies be compared with those of similar studies conducted in other contexts, always with the essential objective of contributing to the improvement of the actions for combating plagiarism and consequently strengthening the credibility of science in Brazil and other countries.

Availability of data and materials

The data and materials are not available to readers because they are sensitive content that may embarrass the authors of the manuscripts in which plagiarism and/or self-plagiarism were found. However, they can be made available for controlled access by editors and reviewers. Although the data and materials from the selected sample are unavailable, the study can be reproduced because the underlying articles are openly accessible in the Scientific Periodicals Electronic Library (SPELL).

“CAPES is a public institution, linked to the Ministry of Education, responsible for graduate education in Brazil (Master and PhD courses). Its role includes evaluation of such courses, access and communication of scientific production, investment on preparation of high level human resources (as professors and researchers) and promotion of international and scientific information.” (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior 2009 ).

“Plágio” is the term in Portuguese that corresponds to “plagiarism” in English. Since the platform contains articles principally in Portuguese and some others also in English, the search was done in both languages using the two key words.

Free translation of the following passage: “utilizem softwares de busca de similaridade na internet […] adotem políticas de conscientização e informação sobre a propriedade intelectual, visando coibir o plágio na comunidade acadêmica” (OAB 2010 ).

Free translation of the following quote: “chance de um ser humano criar uma frase idêntica a outra já criada diminui exponencialmente com o número de palavras que a frase contém”

Abbreviations

ABEC: Brazilian Association of Scientific Editors

ANPAD: National Association of Research and Graduate Studies and Research in Administration

CAPES: Coordination for the Improvement of Higher Education Graduates

CNPq: National Council of Scientific and Technological Development

COPE: Committee on Publication Ethics

CSE: Council of Science Editors

FAPESP: Foundation for the Support to Research of the State of São Paulo

OAB: Brazilian Bar Association

SPELL: Scientific Periodicals Electronic Library

USP: São Paulo University

Abreu, BB (2016) Investigating plagiarism in the academic context. 2016. 220 f. Tese (Doutorado) - Curso de Programa de Pós-Graduação em Inglês, Estudos Linguísticos e Literários, Universidade Federal de Santa Catarina, Florianopolis

Alves, G (2017) Fapesp bloqueará verba de instituição que não adotar medidas antiplágio . Jornal Folha de São Paulo. Retrieved from https://www1.folha.uol.com.br/ciencia/2017/04/1878564-fapesp-bloqueara-verba-de-instituicao-que-nao-adotar-medidas-antiplagio.shtml?origin=folha . Accessed 04 Oct 2020

Amos KA (2014) The ethics of scholarly publishing: exploring differences in plagiarism and duplicate publication across nations. J Med Libr Assoc 102(2):87–91

Andrade JX (2011) Má conduta na pesquisa em ciências contábeis. 2011, p 125 Tese (Doutorado em Ciências Contábeis) – Universidade de São Paulo

Associação Nacional de Pesquisa e Pós-Graduação e Pesquisa em Administração (ANPAD) (2017) Boas Práticas da Publicação Científica : um manual para autores, revisores, editores e integrantes de corpos editoriais 2017. Retrieved from http://www.anpad.org.br/~anpad/diversos/2017/2017_Boas_Praticas.pdf . Accessed 04 Jul 2019

Australian Government (2007) Australian code for the responsible conduct of research 2007. Retrieved from https://www.nhmrc.gov.au/about-us/publications/australian-code-responsible-conduct-research-2007 . Accessed 05 Jul 2019

Badge, J, Scott, J (2009) Dealing with plagiarism in the digital age What is electronic detection of plagiarism? Retrieved from http://evidencenet.pbworks.com/w/page/19383480/Dealing%20with%20plagiarism%20in%20the%20digital%20age . Accessed 5 Jul 2019

Barbastefano, RG & Souza, CG (2007) Percepção do conceito de plágio acadêmico entre alunos de engenharia de produção e ações para sua redução. Revista Produção Online , Florianópolis, 7 (4)

Barros, TD & Duque, APO (2015) O cenário do plágio acadêmico sob a ótica informacional de pesquisadores brasileiros na BDTD e no ENANPAD. In: CONVIBRA 2015, Rio de Janeiro. Anais... Rio de Janeiro. Retrieved from http://www.convibra.com.br/upload/paper/2015/31/2015_31_11898.pdf . Accessed 06 Jul 2019

Biondi, A (2011) Plágio na produção acadêmica, vespeiro intocado. Ou não? Revista Adusp, São Paulo, 50 (90)

Bird SB, Sivilotti MLA (2008) Self-plagiarism, recycling fraud, and the intent to mislead. J Med Toxicol 4:69

Brasil (2016) Ministério Da Educação. Plataforma sucupira: Qualis. Retrieved from https://sucupira.capes.gov.br/sucupira/public/index.jsf;jsessionid=tVIr+CEzblCSaOT2ls0tR+yd.sucupira-205 #. Accessed 03 Oct 2020

Carver JD et al (2011) Ethical considerations in scientific writing. Indian J Sex transm Dis 32(2):124–128

Committee On Publication Ethics (2009). Retractions: Guidance from the Committee on Publication Ethics . Retrieved from https://publicationethics.org/files/u661/Retractions_COPE_gline_final_3_Sept_09__2_.pdf . Accessed 05 Jul 2019

Committee On Publication Ethics (2011) Code of conduct and best practice guidelines for journal editors. Retrieved from http://publicationethics.org/files/Code_of_conduct_for_journal_editors_Mar11.pdf . Accessed 05 Jul 2019

Committee On Publication Ethics (2016) What to do if you suspect redundant (duplicate) publication (b) Suspected redundant publication in a published manuscript. Retrieved from https://publicationethics.org/files/redundant%20publication%20A_0.pdf . Accessed 05 Jul 2019

Committee On Publication Ethics (2018) What to do if you suspect plagiarism (a) Suspected plagiarism in a submitted manuscript. Retrieved from https://publicationethics.org/files/plagiarism%20A.pdf . Accessed 05 Jul 2019

Concordia University (2019) What is Plagiarism . Retrieved from https://www.concordia.ca/students/academic-integrity/plagiarism.html . Accessed 05 Jul 2019

Conselho Nacional de Desenvolvimento Científico Tecnológico (2011) Relatório da Comissão de Integridade de Pesquisa do CNPq. Retrieved from http://www.cnpq.br/documents/10157/a8927840-2b8f-43b9-8962-5a2ccfa74dda . Accessed 07 Jul 2019

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (2009) What are CAPES's main activities. Retrieved from http://www.periodicos.capes.gov.br/?option=com_pnews&component=Clipping&view=pnewsclipping&cid=57&mn=0 . Accessed 19 October 2020

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (2011). Orientações Capes: combate ao plágio. Retrieved from http://www.capes.gov.br/images/stories/download/diversos/OrientacoesCapes_CombateAoPlagio.pdf . Accessed 05 Jul 2019

Costa FJ, Socorro CTS, Muzzio H (2017) Uma Reflexão sobre Autoria Acadêmica. Teoria e Prática em Administração 7(1):1–25

Council of Science Editors (2018) CSE’s White Paper on Promoting Integrity in Scientific Journal Publications. Retrieved from https://www.councilscienceeditors.org/wp-content/uploads/CSE-White-Paper_2018-update-050618.pdf . Accessed 06 Jul 2019

Demo P (2011) Remix, pastiche, plágio: autorias da nova geração. Meta: Avaliação 3(8):125–144

Eaton SE, Crossman K (2018) Self-plagiarism research literature in the social sciences: a scoping review. Interchange 49:285–311. https://doi.org/10.1007/s10780-018-9333-6

Cadernos Ebape (2014). Diretrizes para autores . Retrieved from http://bibliotecadigital.fgv.br/ojs/index.php/cadernosebape/pages/view/normas . Accessed 15 Jan 2014

Fachini GJ, Domingues MJCS (2008) Percepção do plágio acadêmico entre alunos de programas de pós-graduação em administração e contabilidade. Anais dos Seminários em Administração São Paulo, SP, Brasil, XI

Ferreira ABH (1986) Novo dicionário da Língua Portuguesa, 2nd edn. Nova Fronteira, Rio de Janeiro

Fundação de Amparo à Pesquisa do Estado de São Paulo (2011) Código de boas práticas científicas . Retrieved from http://www.fapesp.br/boaspraticas/codigo_050911.pdf . Accessed 05 Jul 2019

Fundação de Amparo à Pesquisa do Estado de São Paulo (2014). Sumário de casos . Retrieved from https://fapesp.br/8577/sumarios-de-casos . Accessed 04 Oct 2020

Garcia, GR (2013) Fraude y plagio academic en los ambientes virtuales de aprendizaje 2013. Retrieved from https://portafolis.urv.cat/artefact/file/download.php?file=12835&view=3272 . Accessed 05 Jul 2019

Georgetown University (2019) Examples of Plagiarism . Retrieved from https://honorcouncil.georgetown.edu/system/what-is-plagiarism/x . Accessed 05 Jul 2019

Harris R (2001) The plagiarism handbook. Pyrczak Publishing, Los Angeles

Innarelli, PB (2011) Fatores antecedentes na atitude de alunos de graduação frente ao plágio . (Dissertação de Mestrado). Universidade Metodista de São Paulo, São Bernardo do Campo, SP, Brasil

IThenticate® (2019) Prevent Plagiarism in Published Works . Retrieved from http://www.ithenticate.com/ . Accessed 04 Jul 2019

Job I, Mattos AM, Trindade A (2009) Processo de revisão pelos pares: por que são rejeitados os manuscritos submetidos a um periódico científico. Movimento, Porto Alegre 15(3): 1-17.

Koocher GP, Keith-Spiegel P (2010) Peers nip misconduct in the bud. Nature 466(7305):438–440

Krokoscz M (2011) Abordagem do plágio nas três melhores universidades de cada um dos cinco continentes e do Brasil. Rev Brasileira de Educação 16(48):745–818

Krokoscz M (2012a) A literature review of scientific research and reflections on plagiarism in Brazil since 1990. In 5th International Plagiarism Conference 16-18 July, Newcastle UK.

Krokoscz M (2012b) Autoria e Plágio: um guia para estudantes, professores, pesquisadores e editores. Atlas, São Paulo

Loui MC (2002) Seven ways to plagiarise. Sci Eng Ethics 8(4):529-539.

Martin B (1994) Plagiarism: a misplaced emphasis. J Inf Ethics 3(2):36–47

Masic I (2014) Plagiarism in scientific research and publications and how to prevent it. Materia Socio Medica 26(2):141

Massachusetts Institute Of Technology (2018) Academic Integrity: Incorporating the Words and Ideas of Others. Retrieved from http://integrity.mit.edu/sites/default/files/documents/AcademicIntegrityHandbook2018-color.pdf . Accessed 05 Jul 2019

Neumann, E (2018) Relação entre os fatores antecedentes e a atitude de plágio em estudantes de administração . (Dissertação de Mestrado). Universidade Regional de Blumenau, Blumenau, SC, Brasil

OAB (2010) Combate ao Plágio - Comissão Nacional de Relações Institucionais do Conselho Federal da OAB. Retrieved from http://www2.ib.usp.br/files/doc_%20plagio_OAB.pdf . Accessed 04 Oct 2020

Research Councils UK (2017) RCUK Policy and Code of Conduct on the Governance of Good research Conduct . Retrieved from https://www.ukri.org/files/legacy/reviews/grc/rcuk-grp-policy-and-guidelines-updated-apr-17-2-pdf/ . Accessed 5 Jul 2019.

Retraction Watch (2019) The Retraction Watch Database. Version 1.0.5.5. 2019. Retrieved from http://retractiondatabase.org/RetractionSearch.aspx . Accessed 04 Jul 2019

Revista Brasileira de Marketing (2014) Diretrizes para autores. Retrieved from http://www.revistabrasileiramarketing.org/ojs-2.2.4/index.php/remark/about/submissions#onlineSubmissions . Accessed 16 Jan 2014

Revista de Administração e Contabilidade da Unisinos (2014) Diretrizes para autores . Retrieved from http://revistas.unisinos.br/index.php/base/about/submissions#onlineSubmissions . Accessed 14 Jan 2014

Revista de Gestão, Finanças e Contabilidade (2014) Diretrizes para autores . Retrieved from https://www.revistas.uneb.br/index.php/financ/about/submissions#authorGuidelines . Accessed 14 Jan 2014

Roig, M (2015) Avoiding plagiarism, self-plagiarism, and other questionable writing practices: a guide to ethical writing. Retrieved from https://ori.hhs.gov/sites/default/files/plagiarism.pdf . Accessed 28 Sep 2012

Samuelson P (1994) Self-plagiarism or fair use? Commun ACM 37(8):21–25

Saraiva EV, Carrieri AP (2009) Citações e não citações na produção acadêmica de estratégia no Brasil: uma reflexão crítica. Rev de Administração - RAUSP 44(2):158–166

Silva AKL, Domingues MJCS (2008) Plágio no meio acadêmico: de que forma alunos de pós-graduação compreendem o tema. Perspect Contemp 3(2):117–135

Stanford University (2019) Sample plagiarism . Retrieved from https://communitystandards.stanford.edu/policies-and-guidance/what-plagiarism/sample-plagiarism-cases . Accessed 05 Jul 2019

The University of Hong Kong (2019) What is plagiarism . Retrieved from http://www.rss.hku.hk/plagiarism/page2s.htm . Accessed 05 Jul 2019

Tomazelli, KG (2011) Desonestidade acadêmica e profissional: avaliação das percepções de estudantes de Administração e Contabilidade. (Trabalho de conclusão de curso). Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brasil

U.S. Department Of Health And Human Services (2005) Public health service policies on research misconduct. Final rule. Federal register 70(94):28369-28400.

Universiteit Ghent (2019) Education and Examination Code Academic Year 2019–2020 . Retrieved from https://www.ugent.be/student/en/class-exam-exchange-intern/class-exam/education-examination-code/oeren20192020.pdf/at_download/file . Accessed 20 Apr 2013

University of Cambridge (2019a) University-wide statement on plagiarism . Retrieved from http://www.admin.cam.ac.uk/univ/plagiarism/students/statement.html . Accessed 05 Jul 2019

University of Cambridge (2019b) Collusion . Retrieved from https://www.plagiarism.admin.cam.ac.uk/what-plagiarism/collusion . Accessed 05 Jul 2019

University of Cape Town (2019) Avoiding plagiarism: a guide for students. Retrieved from https://www.uct.ac.za/sites/default/files/image_tool/images/328/about/policies/Guide_StudentGuideOnAvoidingPlagiarism.pdf . Accessed 05 Jul 2019

University of Oxford (2019) What is plagiarism . Retrieved from http://www.ox.ac.uk/students/academic/goodpractice/about/ . Accessed 05 Jul 2019

University of Pretoria (2019) What is plagiarism . Retrieved from https://www.up.ac.za/students/article/2745913/what-is-plagiarism . Accessed 20 Apr 2013

Valente, NTZ et al. (2010) Reasons that lead undergraduate students in the business administration course to misuse ready papers taken from the internet. Anais do CONTECSI - International Conference on Information Systems and Technology Management, 7. São Paulo, SP, Brasil

Van Noorden R (2011) Science publishing: the trouble with retractions. Nature 478:26–28.

Veludo-De-Oliveira TM et al (2014) Cola, plágio. Ram, Rev Adm. Mackenzie 15(1):73–97

Wager E (2014) Defining and responding to plagiarism. Learned Publishing 27(1):33–42

Acknowledgements

I would like to express my sincere gratitude to Prof. Fredric Michael Litto, with whom I have had the opportunity to share ideas and reflections on plagiarism and academic integrity. I am also grateful for his contribution in translating this manuscript into English. I would also like to thank Talita Fonseca for her support in collecting data.

Code availability

Not applicable.

This research did not receive any external funding. Fundação Escola de Comércio Álvares Penteado (FECAP) provided funding for openly publishing the manuscript.

Author information

Authors and affiliations.

Fundação Escola de Comércio Álvares Penteado/FECAP, Av. Liberdade, 532, São Paulo, SP, 01502-001, Brazil

Marcelo Krokoscz

Contributions

The research was done by a single author. Collaborators were thanked in the corresponding section. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Marcelo Krokoscz .

Ethics declarations

Competing interests.

Although the data analysis was carried out using iThenticate®, a commercial software product for detecting textual similarities, the author declares that he has no conflict of interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Krokoscz, M. Plagiarism in articles published in journals indexed in the Scientific Periodicals Electronic Library (SPELL): a comparative analysis between 2013 and 2018. Int J Educ Integr 17 , 1 (2021). https://doi.org/10.1007/s40979-020-00063-5

Received : 25 May 2020

Accepted : 09 November 2020

Published : 07 January 2021

DOI : https://doi.org/10.1007/s40979-020-00063-5

  • Self-plagiarism
  • Authors’ guidelines
  • Academic integrity

International Journal for Educational Integrity

ISSN: 1833-2595

A publication of the Harvard College Writing Program.

Harvard Guide to Using Sources 

What Constitutes Plagiarism?

In academic writing, it is considered plagiarism to draw any idea or any language from someone else without adequately crediting that source in your paper. It doesn't matter whether the source is a published author, another student, a website without clear authorship, a website that sells academic papers, or any other person: Taking credit for anyone else's work is stealing, and it is unacceptable in all academic situations, whether you do it intentionally or by accident.

The ease with which you can find information of all kinds online means that you need to be extra vigilant about keeping track of where you are getting information and ideas and about giving proper credit to the authors of the sources you use. If you cut and paste from an electronic document into your notes and forget to clearly label the document in your notes, or if you draw information from a series of websites without taking careful notes, you may end up taking credit for ideas that aren't yours, whether you mean to or not.

It's important to remember that every website is a document with an author, and therefore every website must be cited properly in your paper. For example, while it may seem obvious to you that an idea drawn from Professor Steven Pinker's book The Language Instinct should only appear in your paper if you include a clear citation, it might be less clear that information you glean about language acquisition from the Stanford Encyclopedia of Philosophy website warrants a similar citation. Even though the authorship of this encyclopedia entry is less obvious than it might be if it were a print article (you need to scroll down the page to see the author's name, and if you don't do so you might mistakenly think an author isn't listed), you are still responsible for citing this material correctly. Similarly, if you consult a website that has no clear authorship, you are still responsible for citing the website as a source for your paper. The kind of source you use, or the absence of an author linked to that source, does not change the fact that you always need to cite your sources (see Evaluating Web Sources).

Verbatim Plagiarism

If you copy language word for word from another source and use that language in your paper, you are plagiarizing verbatim. Even if you write down your own ideas in your own words and place them around text that you've drawn directly from a source, you must give credit to the author of the source material, either by placing the source material in quotation marks and providing a clear citation, or by paraphrasing the source material and providing a clear citation.

The passage below comes from Ellora Derenoncourt’s article, “Can You Move to Opportunity? Evidence from the Great Migration.”

Here is the article citation in APA style:

Derenoncourt, E. (2022). Can you move to opportunity? Evidence from the Great Migration. The American Economic Review, 112(2), 369–408. https://doi.org/10.1257/aer.20200002

Source material

Why did urban Black populations in the North increase so dramatically between 1940 and 1970? After a period of reduced mobility during the Great Depression, Black out-migration from the South resumed at an accelerated pace after 1940. Wartime jobs in the defense industry and in naval shipyards led to substantial Black migration to California and other Pacific states for the first time since the Migration began. Migration continued apace to midwestern cities in the 1950s and 1960s, as the booming automobile industry attracted millions more Black southerners to the North, particularly to cities like Detroit or Cleveland. Of the six million Black migrants who left the South during the Great Migration, four million of them migrated between 1940 and 1970 alone.

Plagiarized version

While this student has written her own sentence introducing the topic, she has copied the italicized sentences directly from the source material. She has left out two sentences from Derenoncourt’s paragraph, but has reproduced the rest verbatim:

But things changed mid-century. After a period of reduced mobility during the Great Depression, Black out-migration from the South resumed at an accelerated pace after 1940. Wartime jobs in the defense industry and in naval shipyards led to substantial Black migration to California and other Pacific states for the first time since the Migration began. Migration continued apace to midwestern cities in the 1950s and 1960s, as the booming automobile industry attracted millions more Black southerners to the North, particularly to cities like Detroit or Cleveland.

Acceptable version #1: Paraphrase with citation

In this version the student has paraphrased Derenoncourt’s passage, making it clear that these ideas come from a source by introducing the section with a clear signal phrase ("as Derenoncourt explains…") and citing the publication date, as APA style requires.

But things changed mid-century. In fact, as Derenoncourt (2022) explains, the wartime increase in jobs in both defense and naval shipyards marked the first time during the Great Migration that Black southerners went to California and other west coast states. After the war, the increase in jobs in the car industry led to Black southerners choosing cities in the midwest, including Detroit and Cleveland.

Acceptable version #2: Direct quotation with citation or direct quotation and paraphrase with citation

If you quote directly from an author and cite the quoted material, you are giving credit to the author. But you should keep in mind that quoting long passages of text is only the best option if the particular language used by the author is important to your paper. Social scientists and STEM scholars rarely quote in their writing, paraphrasing their sources instead. If you are writing in the humanities, you should make sure that you only quote directly when you think it is important for your readers to see the original language.

In the example below, the student quotes part of the passage and paraphrases the rest.

But things changed mid-century. In fact, as Derenoncourt (2022) explains, “after a period of reduced mobility during the Great Depression, Black out-migration from the South resumed at an accelerated pace after 1940” (p. 379). Derenoncourt notes that after the war, the increase in jobs in the car industry led to Black southerners choosing cities in the midwest, including Detroit and Cleveland.

Mosaic Plagiarism

If you copy bits and pieces from a source (or several sources), changing a few words here and there without either adequately paraphrasing or quoting directly, the result is mosaic plagiarism. Even if you don't intend to copy the source, you may end up with this type of plagiarism as a result of careless note-taking and confusion over where your source's ideas end and your own ideas begin. You may think that you've paraphrased sufficiently or quoted relevant passages, but if you haven't taken careful notes along the way, or if you've cut and pasted from your sources, you can lose track of the boundaries between your own ideas and those of your sources. It's not enough to have good intentions and to cite some of the material you use. You are responsible for making clear distinctions between your ideas and the ideas of the scholars who have informed your work. If you keep track of the ideas that come from your sources and have a clear understanding of how your own ideas differ from those ideas, and you follow the correct citation style, you will avoid mosaic plagiarism.

Indeed, of the more than 3500 hours of instruction during medical school, an average of less than 60 hours are devoted to all of bioethics, health law and health economics combined. Most of the instruction is during the preclinical courses, leaving very little instructional time when students are experiencing bioethical or legal challenges during their hands-on, clinical training. More than 60 percent of the instructors in bioethics, health law, and health economics have not published since 1990 on the topic they are teaching.

--Persad, G.C., Elder, L., Sedig, L., Flores, L., & Emanuel, E. (2008). The current state of medical school education in bioethics, health law, and health economics. Journal of Law, Medicine, and Ethics 36, 89-94.

Students can absorb the educational messages in medical dramas when they view them for entertainment. In fact, even though they were not created specifically for education, these programs can be seen as an entertainment-education tool [43, 44]. In entertainment-education shows, viewers are exposed to educational content in entertainment contexts, using visual language that is easy to understand and triggers emotional engagement [45]. The enhanced emotional engagement and cognitive development [5] and moral imagination make students more sensitive to training [22].

--Cambra-Badii, I., Moyano, E., Ortega, I., Josep-E Baños, & Sentí, M. (2021). TV medical dramas: Health sciences students’ viewing habits and potential for teaching issues related to bioethics and professionalism. BMC Medical Education, 21, 1-11. doi: https://doi.org/10.1186/s12909-021-02947-7

Paragraph #1.

All of the ideas in this paragraph after the first sentence are drawn directly from Persad. But because the student has placed the citation mid-paragraph, the final two sentences wrongly appear to be the student’s own idea:

In order to advocate for the use of medical television shows in the medical education system, it is also important to look at the current bioethical curriculum. In the more than 3500 hours of training that students undergo in medical school, only about 60 hours are focused on bioethics, health law, and health economics (Persad et al, 2008). It is also problematic that students receive this training before they actually have spent time treating patients in the clinical setting. Most of these hours are taught by instructors without current publications in the field.

Paragraph #2.

All of the italicized ideas in this paragraph are either paraphrased or taken verbatim from Cambra-Badii, et al., but the student does not cite the source at all. As a result, readers will assume that the student has come up with these ideas himself:

Students can absorb the educational messages in medical dramas when they view them for entertainment. It doesn’t matter if the shows were designed for medical students; they can still be a tool for education. In these hybrid entertainment-education shows, viewers are exposed to educational content that triggers an emotional reaction. By allowing for this emotional, cognitive, and moral engagement, the shows make students more sensitive to training. There may be further applications to this type of education: the role of entertainment as a way of encouraging students to consider ethical situations could be extended to other professions, including law or even education.

The student has come up with the final idea in the paragraph (that this type of ethical training could apply to other professions), but because nothing in the paragraph is cited, it reads as if it is part of a whole paragraph of his own ideas, rather than the point that he is building to after using the ideas from the article without crediting the authors.

Acceptable version

In the first paragraph, the student uses signal phrases in nearly every sentence to reference the authors (“According to Persad et al.,” “As the researchers argue,” “They also note”), which makes it clear throughout the paragraph that all of the paragraph’s information has been drawn from Persad et al. The student also uses a clear APA in-text citation to point the reader to the original article. In the second paragraph, the student paraphrases and cites the source’s ideas and creates a clear boundary between those ideas and his own, which appear in the final paragraph.

In order to advocate for the use of medical television shows in the medical education system, it is also important to look at the current bioethical curriculum. According to Persad et al. (2008), only about one percent of teaching time throughout the four years of medical school is spent on ethics. As the researchers argue, this presents a problem because the students are being taught about ethical issues before they have a chance to experience those issues themselves. They also note that more than sixty percent of instructors teaching bioethics to medical students have no recent publications in the subject.

The research suggests that medical dramas may be a promising source for discussions of medical ethics. Cambra-Badii et al. (2021) explain that even when watched for entertainment, medical shows can help viewers engage emotionally with the characters and may prime them to be more receptive to training in medical ethics. There may be further applications to this type of education: the role of entertainment as a way of encouraging students to consider ethical situations could be extended to other professions, including law or even education.

Inadequate Paraphrase

When you paraphrase, your task is to distill the source's ideas in your own words. It's not enough to change a few words here and there and leave the rest; instead, you must completely restate the ideas in the passage in your own words. If your own language is too close to the original, then you are plagiarizing, even if you do provide a citation.

In order to make sure that you are using your own words, it's a good idea to put away the source material while you write your paraphrase of it. This way, you will force yourself to distill the point you think the author is making and articulate it in a new way. Once you have done this, you should look back at the original and make sure that you have represented the source’s ideas accurately and that you have not used the same words or sentence structure. If you do want to use some of the author's words for emphasis or clarity, you must put those words in quotation marks and provide a citation.

The passage below comes from Michael Sandel’s article, “The Case Against Perfection.” Here’s the article citation in MLA style:

Sandel, Michael. “The Case Against Perfection.” The Atlantic, April 2004, https://www.theatlantic.com/magazine/archive/2004/04/the-case-against-pe... .

Though there is much to be said for this argument, I do not think the main problem with enhancement and genetic engineering is that they undermine effort and erode human agency. The deeper danger is that they represent a kind of hyperagency—a Promethean aspiration to remake nature, including human nature, to serve our purposes and satisfy our desires. The problem is not the drift to mechanism but the drive to mastery. And what the drive to mastery misses and may even destroy is an appreciation of the gifted character of human powers and achievements.

The version below is an inadequate paraphrase because the student has only cut or replaced a few words: “I do not think the main problem” became “the main problem is not”; “deeper danger” became “bigger problem”; “aspiration” became “desire”; “the gifted character of human powers and achievements” became “the gifts that make our achievements possible.”

The main problem with enhancement and genetic engineering is not that they undermine effort and erode human agency. The bigger problem is that they represent a kind of hyperagency—a Promethean desire to remake nature, including human nature, to serve our purposes and satisfy our desires. The problem is not the drift to mechanism but the drive to mastery. And what the drive to mastery misses and may even destroy is an appreciation of the gifts that make our achievements possible (Sandel).

Acceptable version #1: Adequate paraphrase with citation

In this version, the student communicates Sandel’s ideas but does not borrow language from Sandel. Because the student uses Sandel’s name in the first sentence and has consulted an online version of the article without page numbers, there is no need for a parenthetical citation.

Michael Sandel disagrees with the argument that genetic engineering is a problem because it replaces the need for humans to work hard and make their own choices. Instead, he argues that we should be more concerned that the decision to use genetic enhancement is motivated by a desire to take control of nature and bend it to our will instead of appreciating its gifts.

Acceptable version #2: Direct quotation with citation

In this version, the student uses Sandel’s words in quotation marks and provides a clear MLA in-text citation. In cases where you are going to talk about the exact language that an author uses, it is acceptable to quote longer passages of text. If you are not going to discuss the exact language, you should paraphrase rather than quoting extensively.

The author argues that he does “not think the main problem with enhancement and genetic engineering is that they undermine effort and erode human agency,” but rather that “they represent a kind of hyperagency—a Promethean aspiration to remake nature, including human nature, to serve our purposes and satisfy our desires. The problem is not the drift to mechanism but the drive to mastery. And what the drive to mastery misses and may even destroy is an appreciation of the gifted character of human powers and achievements” (Sandel).

Uncited Paraphrase

When you use your own language to describe someone else's idea, that idea still belongs to the author of the original material. Therefore, it's not enough to paraphrase the source material responsibly; you also need to cite the source, even if you have changed the wording significantly. As with quoting, when you paraphrase you are offering your reader a glimpse of someone else's work on your chosen topic, and you should also provide enough information for your reader to trace that work back to its original form. The rule of thumb here is simple: Whenever you use ideas that you did not think up yourself, you need to give credit to the source in which you found them, whether you quote directly from that material or provide a responsible paraphrase.

The passage below comes from C. Thi Nguyen’s article, “Echo Chambers and Epistemic Bubbles.”

Here’s the citation for the article, in APA style:

Nguyen, C. (2020). Echo chambers and epistemic bubbles. Episteme, 17(2), 141-161. doi:10.1017/epi.2018.32

Epistemic bubbles can easily form accidentally. But the most plausible explanation for the particular features of echo chambers is something more malicious. Echo chambers are excellent tools to maintain, reinforce, and expand power through epistemic control. Thus, it is likely (though not necessary) that echo chambers are set up intentionally, or at least maintained, for this functionality (Nguyen, 2020).

The student who wrote the paraphrase below has drawn these ideas directly from Nguyen’s article but has not credited the author. Although she paraphrased adequately, she is still responsible for citing Nguyen as the source of this information.

Echo chambers and epistemic bubbles have different origins. While epistemic bubbles can be created organically, it’s more likely that echo chambers will be formed by those who wish to keep or even grow their control over the information that people hear and understand.

In this version, the student eliminates any possible ambiguity about the source of the ideas in the paragraph. By using a signal phrase to name the author whenever the source of the ideas could be unclear, the student clearly attributes these ideas to Nguyen.

According to Nguyen (2020), echo chambers and epistemic bubbles have different origins. Nguyen argues that while epistemic bubbles can be created organically, it’s more likely that echo chambers will be formed by those who wish to keep or even grow their control over the information that people hear and understand.

Uncited Quotation

When you put source material in quotation marks in your essay, you are telling your reader that you have drawn that material from somewhere else. But it's not enough to indicate that the material in quotation marks is not the product of your own thinking or experimentation: You must also credit the author of that material and provide a trail for your reader to follow back to the original document. This way, your reader will know who did the original work and will also be able to go back and consult that work if they are interested in learning more about the topic. Citations should always go directly after quotations.

The passage below comes from Deirdre Mask’s nonfiction book, The Address Book: What Street Addresses Reveal About Identity, Race, Wealth, and Power.

Here is the MLA citation for the book:

Mask, Deirdre. The Address Book: What Street Addresses Reveal About Identity, Race, Wealth, and Power. St. Martin’s Griffin, 2021.

In New York, even addresses are for sale. The city allows a developer, for the bargain price of $11,000 (as of 2019), to apply to change the street address to something more attractive.

It’s not enough for the student to indicate that these words come from a source; the source must be cited:

After all, “in New York, even addresses are for sale. The city allows a developer, for the bargain price of $11,000 (as of 2019), to apply to change the street address to something more attractive.”

Here, the student has cited the source of the quotation using an MLA in-text citation:

After all, “in New York, even addresses are for sale. The city allows a developer, for the bargain price of $11,000 (as of 2019), to apply to change the street address to something more attractive” (Mask 229).

Using Material from Another Student's Work

In some courses you will be allowed or encouraged to form study groups, to work together in class generating ideas, or to collaborate on your thinking in other ways. Even in those cases, it's imperative that you understand whether all of your writing must be done independently, or whether group authorship is permitted. Most often, even in courses that allow some collaborative discussion, the writing or calculations that you do must be your own. This doesn't mean that you shouldn't collect feedback on your writing from a classmate or a writing tutor; rather, it means that the argument you make (and the ideas you rely on to make it) should either be your own or you should give credit to the source of those ideas.

So what does this mean for the ideas that emerge from class discussion or peer review exercises? Unlike the ideas that your professor offers in lecture (you should always cite these), ideas that come up in the course of class discussion or peer review are collaborative, and often not just the product of one individual's thinking. If, however, you see a clear moment in discussion when a particular student comes up with an idea, you should cite that student. In any case, when your work is informed by class discussions, it's courteous and collegial to include a discursive footnote in your paper that lets your readers know about that discussion. So, for example, if you were writing a paper about the narrator in Tim O'Brien's The Things They Carried and you came up with your idea during a discussion in class, you might place a footnote in your paper that states the following: "I am indebted to the members of my Expos 20 section for sparking my thoughts about the role of the narrator as Greek Chorus in Tim O'Brien's The Things They Carried."

It is important to note that collaboration policies can vary by course, even within the same department, and you are responsible for familiarizing yourself with each course's expectation about collaboration. Collaboration policies are often stated in the syllabus, but if you are not sure whether it is appropriate to collaborate on work for any course, you should always consult your instructor.

  • Open access
  • Published: 07 May 2020

Factors affecting plagiarism among students at Jazan University

  • Hanaa A. Elshafei 1,3 &
  • Tamanna M. Jahangir 2,3

Bulletin of the National Research Centre volume 44, Article number: 71 (2020)

Plagiarism has been described over the past decades as a multi-layer dishonesty phenomenon emerging in higher education. A number of research papers have described a host of factors such as gender, socialization, productivity benefit, study motivation, methodological uncertainty, or easy access to electronic information through the Internet and new technologies as the driving forces for plagiarism.

The effects of plagiarism are pervasive, and no one is exempt. Neither unfamiliarity nor ignorance excuses a person from the ethical and legal consequences of plagiarism. Such misconduct threatens student integrity and academic and professional reputation, and it can bring legal ramifications and financial penalties.

Methodology

The goal of the study is to investigate students’ propensity to use the Internet to plagiarize, the factors affecting that tendency, and their reasons for plagiarizing.

In this research, we analyze the perception of plagiarism and academic misconduct among students at Jazan University, examine the major factors behind dishonesty, and survey students’ views on laws against plagiarism and misconduct.

Examination of the responses of the students to various plagiarism situations showed misunderstandings and misconceptions about many forms of plagiarism.

Our study emphasizes that a central problem in our society is that students, or budding innovators, are pressured into academic dishonesty in order to perform better. To maintain a safe environment, academic misconduct, theft, and plagiarism must be reduced to a minimum.

Introduction

Plagiarism is a major concern within the academic world. It ranges from the unreferenced use of others’ published and unpublished ideas, including those in research grant applications, to the publication of a full paper under “new” authorship, sometimes in a different language. It can occur at any stage of planning, research, writing, or publishing, and it applies to both print and digital versions. Koul et al. (2009) describe plagiarism as a form of cheating and stealing in which one person takes credit for the intellectual work of another. According to Fishman (2009), plagiarism happens when someone (1) uses words, ideas, or work products (2) attributable to another identifiable person or source (3) without attributing the work to the source from which it was obtained (4) in a situation where there is a reasonable expectation of original authorship (5) in order to obtain some benefit, credit, or gain, which need not be financial. Pecorari (2012) estimates that plagiarism accounts for a considerable portion of all serious deviations from good research practice.

While numerous research organizations have their own definitions of plagiarism, little work has so far been done to clarify and justify them (Gert and Stefan 2015). There are also many different opinions on how to interpret plagiarism and on what makes it inexcusable, such as the way it distorts credit. According to the Oxford English Dictionary online (2017), plagiarism is the practice of taking the work or ideas of someone else and passing them off as one’s own.

This definition, however, conveys only part of the picture, and more detailed criteria are needed to decide whether a given act counts as plagiarism (Demirdover 2019). According to a study in the USA, five types of plagiarism occur: direct, mosaic, self, paraphrase, and accidental. All of them raise ethical issues:

Direct plagiarism: Without citing or pointing out the source, the entire text or part of the documents are copied word for word. This is one of the most common plagiarism types.

Mosaic plagiarism: The plagiarizer borrows phrases, without citation.

Self-plagiarism: The author uses his/her own earlier work without crediting it.

Paraphrasing/rephrasing: This is similar to direct plagiarism, except in this case the plagiarizer rearranges the words of the text or sometimes rephrases them according to their contents.

Accidental: Unintentional direct, mosaic, or paraphrase plagiarism, without citation.

Many of those teaching in higher education regard plagiarism in the classroom as a form of dishonesty (Jereb et al. 2018). According to a report by a plagiarism organization, “studies indicate that approximately 30 percent of all students may be plagiarizing on every written assignment they complete.” Up to 55% of college presidents say plagiarism in students’ papers has increased in the last decade. Plagiarism carries severe disciplinary and financial consequences, and repeated acts of plagiarism will lead to dismissal from the college.

Academic misconduct undermines the student’s knowledge and skills; at the same time, it weakens the instructor’s ability to assess how well the student is performing in the course (Ryan et al. 2009). The principle of academic integrity is of supreme importance in all university programs, but it becomes particularly important in professional degree programs such as pharmacy, medicine, dentistry, and nursing, because graduates of these programs must hold high ethical standards: their work affects human well-being directly (Neill 2008).

Although plagiarism occurs at all academic levels, in this study we focus on student misconduct and plagiarism. Why do students use the words or ideas of someone else and pass them off as their own, and what factors affect this behavior?

Material and methods

The paper-and-pencil survey was conducted during the academic year 2017–2018 at Jazan University in Saudi Arabia. Students were informed verbally about the nature of the research and were invited to participate freely. A total of 381 students took part: 209 males (54.8%) and 172 females (45.2%). The ages of the students ranged from 24 to 35 years.

A simple questionnaire with a set of 10 questions, provided in both English and Arabic, was used to evaluate students’ knowledge of plagiarism and examination misconduct. The initial questions in the survey tool (questionnaire) covered gender, age, area of study, field of specialization, and average time spent on the Internet. Additional questions followed.

To make the questionnaire understandable to students and leave no room for confusion, we translated it into Arabic. Participation was entirely voluntary, random, and anonymous. Most response categories were yes/no/uncertain or disagree/agree/not sure. Hard copies of the survey tool were distributed among students from Jazan University’s various colleges and were collected after 10 min.

A total of 450 questionnaires were distributed, of which 381 were returned and included in this study, giving an overall response rate of 84.6%; 45.2% of respondents were female and 54.8% male. No unusual pattern was found in the responses of male and female students. 81.5% of participants fell in the 24–35 age group. Approximately half (48%) of the participants were students of pharmacy, 22% were students of dentistry, 20% were students of nursing, and 10% were students of science.
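The percentages in this paragraph follow from the raw counts given in the text (450 questionnaires distributed, 381 returned, 209 male and 172 female respondents). The short Python sketch below recomputes them. The variable and function names are illustrative assumptions, not part of the study, and the recomputed values may differ slightly from the reported figures.

```python
# Counts as reported in the text; all names here are illustrative only.
distributed = 450   # questionnaires handed out
returned = 381      # questionnaires returned and analyzed
males = 209
females = 172


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


response_rate = percent(returned, distributed)
male_share = percent(males, returned)
female_share = percent(females, returned)

print(f"Response rate: {response_rate:.2f}%")
print(f"Male share:    {male_share:.2f}%")
print(f"Female share:  {female_share:.2f}%")
```

Recomputing shares from counts in this way is a quick consistency check when reading survey results, since it shows whether the subgroup counts add up to the sample size and whether the stated percentages match them.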

More than 55% of students were highly motivated to do research, and 45% were less so; 23.5% of students spend 2 or fewer hours per day on the Internet, 36.5% spend 2 to 5 h, and 40% spend 5 or more hours per day. The general information can be seen in Table 1.

50.6% of participants agreed that they use the Internet to complete their tasks and notes. 40.15% of respondents accepted that academic misconduct was caused by social pressure. Approximately 40% resort to plagiarism as a last resort fairly frequently. 56.9% of students strongly agreed with strict enforcement of laws against such misconduct. However, only 32.5% said academic dishonesty is acceptable until one is caught. To our surprise, 62.7% had no knowledge of the punishments for plagiarism and academic misconduct.

Only 24.7% admitted that they knew people who were intentionally involved in plagiarism, and 52% of students supported the idea of implementing strict rules to help future generations (Fig. 1 ).

Fig. 1 The results of the survey in terms of percentage of response among students of Jazan University

Plagiarism and academic misconduct have become widespread in today’s competitive world (Bradshaw and Lowenstein 1990). Interventions aimed at curbing plagiarism do not always outweigh individuals’ incentive to steal, so misconduct is not always reduced. Scanlan (2006) reported that student academic misconduct is a growing problem for colleges and universities, including those responsible for preparing health professionals. While the introduction of honor codes has had a positive impact on this issue, further reduction in student cheating and plagiarism can only be accomplished through a comprehensive strategy that supports an organizational culture of academic integrity. A questionnaire was issued to postgraduate students from Sweden’s medical faculties who attended a research ethics course during the 2008/2009 academic year, and 58% replied. Less than one third of the respondents reported that they had heard of scientific dishonesty in the previous 12 months (Nilstun et al. 2010). It is therefore our moral and ethical duty to make our students aware of the harm caused by such actions and to inform them about the legally sanctioned consequences of academic misconduct.

Our main objective should be to provide our students with quality education and knowledge as well as good qualifications in examinations and research, but not at the expense of our integrity and dignity, which are called into question by plagiarism and other academic dishonesty. Examination of the responses of Jazan University students to various plagiarism situations showed misunderstandings and misconceptions about many forms of plagiarism.

Our study emphasizes that in our society, the problem is that it pressures students or budding innovators to get involved in academic dishonesty to perform better.

Following the rules will help, but we also need to prepare our students to withstand failure. A healthy atmosphere should be provided not only at school but also at home, so that our students can perform without fear of failure (Ryan 2009). Academic misconduct, stealing, and plagiarism should be reduced to a minimum. To prevent plagiarism and keep it in check, we recommend proper education and the use of technology. This should serve as a wake-up call to transnational higher education regarding plagiarism (Palmer et al. 2019).

The key results of this research paper indicate that new technologies and the Internet have a clear and important impact on plagiarism. Since most students in our study agree that new technologies and the Web have a strong influence on plagiarism, we may conclude that technological advances and globalization have started to break down national borders and cross cultural boundaries.

In addition, the results could be a starting point for more research into the impact of digitalization and the Internet on plagiarism and into the role of socialization in plagiarism, and they may contribute to the plagiarism debate in institutions of higher education.

The variables that students rated as the most important causes of plagiarism relate to time management issues and social pressure, in addition to unclear and incomplete policies regulating plagiarism.

Existing documents, procedures, and regulations do not explicitly describe the disciplinary mechanisms for plagiarizing students. Understanding the reasons behind plagiarism and promoting understanding among students of the problem may help prevent future academic misconduct through improved support and guidance during the time students study at the university. In the near future, focusing on preventive measures could have a positive impact on good scientific practice.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Bradshaw MJ, Lowenstein AJ (1990) Perspectives on academic dishonesty. Nurse Educ 15(5):10–15


Demirdover C (2019) Plagiarism. Turkish Journal of Plastic Surgery 27(1):1

Fishman T (2009) We know it when we see it is not good enough: toward a standard definition of plagiarism that transcends theft, fraud, and copyright. Paper presented at the 4th Asia Pacific Conference on Educational Integrity (4APCEI), NSW:1–5

Gert H, Stefan E (2015) Plagiarism in research. Medicine, Health Care and Philosophy 18(1):91–101

Jereb E, Perc M, Lämmlein B, Jerebic J, Urh M, Podbregar I (2018) Factors influencing plagiarism in higher education: a comparison of German and Slovene students. PLoS One 13(8):e0202252


Koul R, Clariana RB, Jitgarun K, Songsriwittaya A (2009) The influence of achievement goal orientation on plagiarism. Learn Individ Differ 19:506–512

Neill US (2008) Publish or perish, but at what cost? J Clin Invest 118(7):2368

Nilstun T, Löfmark R, Lundqvist A (2010) Scientific dishonesty-questionnaire to doctoral students in Sweden. J Med Ethics 36(5):315–318

Palmer A, Pegrum M, Oakley G (2019) A wake-up call? Issues with plagiarism in transnational higher education. Ethics Behav 29(1):23–50

Pecorari D (2012) Textual plagiarism: how should it be regarded? Office of Research Integrity Newsletter 20(3):3–10

Ryan G, Bonanno H, Krass I, Scouller K, Smith L (2009) Undergraduate and postgraduate pharmacy students’ perceptions of plagiarism and academic honesty. Am J Pharm Educ 73(6):105

Scanlan CL (2006) Strategies to promote a climate of academic integrity and minimize student cheating and plagiarism. J Allied Health 35(3):179–185


Acknowledgements

We thank all participants and academic staff who helped us conduct this survey, and we appreciate the time and effort they gave to answering our questionnaires and providing the needed data.

Not applicable

Author information

Authors and affiliations.

Microbial Chemistry Department, National Research Centre, Cairo, Egypt

Hanaa A. Elshafei

Leslie Groves Hospital, Helensburgh road, Dunedin, NZ, New Zealand

Tamanna M. Jahangir

Jazan University, Jazan, Kingdom of Saudi Arabia

Hanaa A. Elshafei & Tamanna M. Jahangir


Contributions

HA participated in distributing and collecting questionnaires and monitoring data, and prepared the table. TT analyzed and interpreted the data and prepared the figure. Both authors contributed to writing the manuscript. Both authors read and approved the final manuscript.

Authors’ information

Corresponding author.

Correspondence to Hanaa A. Elshafei.

Ethics declarations

Competing interest.

No conflict of interest exists.

Ethics approval and consent to participate

Consent for publication.

The authors of this manuscript have read the final version of this article and have agreed to its publication.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article.

Elshafei, H.A., Jahangir, T.M. Factors affecting plagiarism among students at Jazan University. Bull Natl Res Cent 44, 71 (2020). https://doi.org/10.1186/s42269-020-00313-z


Received: 14 January 2020

Accepted: 02 April 2020

Published: 07 May 2020

DOI: https://doi.org/10.1186/s42269-020-00313-z


  • Intellectual contribution
  • Academic misconduct
  • Jazan University


The 5 Types of Plagiarism | Explanations & Examples

Published on January 10, 2022 by Raimo Streefkerk. Revised on November 21, 2023 by Jack Caulfield.

Plagiarism comes in many forms, some more severe than others—from rephrasing someone’s ideas without acknowledgement to stealing a whole essay. These are the five most common types of plagiarism:

  • Global plagiarism means passing off an entire text by someone else as your own work.
  • Verbatim plagiarism means directly copying someone else’s words.
  • Paraphrasing plagiarism means rephrasing someone else’s ideas to present them as your own.
  • Patchwork plagiarism means stitching together parts of different sources to create your text.
  • Self-plagiarism means recycling your own past work.

Types of plagiarism

Except for global plagiarism, these types of plagiarism are often accidental, resulting from failure to understand how to properly quote, paraphrase, and cite your sources. If you’re concerned about accidental plagiarism, a plagiarism checker, like the one from Scribbr, can help.

Table of contents

  • Global plagiarism: Plagiarizing an entire text
  • Verbatim plagiarism: Copying words directly
  • Paraphrasing plagiarism: Rephrasing ideas
  • Patchwork plagiarism: Stitching together sources
  • Self-plagiarism: Plagiarizing your own work
  • Frequently asked questions about plagiarism

Global plagiarism means taking an entire text by someone else and passing it off as your own.

For example, if you get someone else to write an essay or assignment for you, or if you find a text online and submit it as your own work, you are committing global plagiarism.

Because it involves deliberately and directly lying about the authorship of a work, this is the most serious type of plagiarism, and it can have severe consequences.

Avoiding this kind of plagiarism is straightforward: just write your own essays!


Verbatim plagiarism, also called direct plagiarism, means copying and pasting someone else’s words into your own work without attribution.

This could be text that’s completely identical to the original or slightly altered. If the structure and the majority of the words are the same as in the original, this counts as verbatim plagiarism, even if you delete or change a couple of words.

In academic writing, you can and should refer to the words of others. To avoid verbatim plagiarism, you just need to quote the original source by putting the copied text in quotation marks and including an in-text citation. You can use the free Scribbr Citation Generator to create correctly formatted citations in MLA or APA Style.


Most plagiarism checkers can easily detect verbatim plagiarism.
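To make the idea of detecting near-identical text concrete, here is a minimal Python sketch that compares overlapping word sequences ("shingles") between a submission and a source. It is only an illustration: the function names, the shingle length, and the sample sentences are assumptions, and commercial checkers do not publish their methods in this form.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(submission: str, source: str, n: int = 3) -> float:
    """Fraction of the submission's n-word sequences that also occur in the source."""
    sub = shingles(submission, n)
    if not sub:
        return 0.0
    return len(sub & shingles(source, n)) / len(sub)


# Hypothetical inputs: one lightly altered copy and one genuine rewrite.
source = "Thus the past came to occupy a prominent place in Romanticism."
copied = "Thus the past came to occupy a very prominent place in Romanticism."
rewritten = "Romantic thinkers were fascinated with the past."

print(overlap_ratio(copied, source))     # high: most sequences survive the edit
print(overlap_ratio(rewritten, source))  # low: no three-word sequence is shared
```

Inserting or changing a single word breaks only the few sequences that span it, which is why deleting or swapping a couple of words still leaves a high overlap.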

Example of verbatim plagiarism

Direct plagiarism detected by Turnitin

Paraphrasing means putting a piece of text into your own words. Paraphrasing without citation is the most common type of plagiarism.

Paraphrasing, like quoting, is a legitimate way to incorporate the ideas of others into your writing. It only becomes plagiarism when you rewrite a source’s points as if they were your own. To avoid plagiarism when paraphrasing, cite your sources just as you would when quoting.

If you translate a piece of text from another language without citation, this is also a type of paraphrasing plagiarism. Translated text should always be cited; you’re still using someone else’s ideas, even if they’re in a different language.

Example of paraphrasing

Original (Doorman, 2003)
“Thus the past came to occupy a prominent place in Romanticism. The Romantic thinkers, however, had little affinity with historical schemes such as Condorcet’s. A linear and rational progression in history was the last thing they considered important. For them, the richness of the past lay in its otherness and strangeness rather than in what predictably preceded the here and now, in a distant era like the Middle Ages or antiquity rather than in the cursed, prosaic Enlightenment that preceded it. Such remote, distinct periods were usually manifestations of a golden age that had ended, but to which one could return with the aid of the imagination …”

Paraphrase without citation (plagiarism)
Romantic thinkers were fascinated with the past, but they rarely adopted a linear viewpoint on historical progress. Rather than the rational Enlightenment period, Romanticism is imaginatively preoccupied with the more distant and thus more enchantingly alien past: the Middle Ages and the ancient world.

Paraphrase with citation (acceptable)
Romantic thinkers were fascinated with the past, but they rarely adopted a linear viewpoint on historical progress. Rather than the rational Enlightenment period, Romanticism is imaginatively preoccupied with the more distant and thus more enchantingly alien past: the Middle Ages and the ancient world (Doorman, 2003, p. 45).

Patchwork plagiarism, also called mosaic plagiarism, means copying phrases, passages, and ideas from different sources and putting them together to create a new text.

This can involve slightly rephrasing passages while keeping many of the same words and the same basic structure as the original, and inserting your own words here and there to stitch the plagiarized text together. Make sure to cite your sources whenever you quote or paraphrase to avoid plagiarism.

This type of plagiarism requires more effort and is more insidious than just copying and pasting from one source, but plagiarism checkers like Turnitin can still easily detect it.

Example of patchwork plagiarism

Patchwork plagiarism detected by Turnitin

Self-plagiarism means reusing work that you’ve previously submitted or published. It amounts to academic dishonesty to present a paper or a piece of data as brand new when you’ve already gotten credit for the work.

The most serious form of self-plagiarism is to turn in a paper you already submitted for a grade to another class. Unless you have explicit permission to do so, this is always considered self-plagiarism.

Self-plagiarism can also occur when you reuse ideas, phrases or data from your previous assignments. Reworking old ideas and passages is not plagiarism as long as you have permission to do so and you cite your previous work to make their origins clear.

Scribbr’s Self-Plagiarism Checker

Online plagiarism scanners don’t have access to internal university databases and therefore can’t check your document for self-plagiarism.

Using Scribbr’s Self-Plagiarism Checker, you can upload your previous work and compare it to your current document. The checker will scan the texts for similarities and flag any passages where you might have self-plagiarized.
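As a rough illustration of comparing a previous document with a current one, the sketch below uses `difflib.SequenceMatcher` from Python's standard library to find runs of words that appear in both. This is not how Scribbr's checker is implemented; the function name, the six-word threshold, and the sample texts are assumptions chosen for the example, and the comparison is case- and punctuation-sensitive to keep it short.

```python
from difflib import SequenceMatcher


def shared_passages(previous: str, current: str, min_words: int = 6) -> list:
    """Return word runs of at least min_words that appear in both documents."""
    prev_words = previous.split()
    curr_words = current.split()
    # autojunk=False keeps frequent words from being ignored in longer texts.
    matcher = SequenceMatcher(None, prev_words, curr_words, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passages.append(" ".join(prev_words[block.a:block.a + block.size]))
    return passages


# Hypothetical texts: an earlier assignment and a new draft that reuses a sentence.
previous = ("The survey data were collected from 200 students in the spring term "
            "and analysed with descriptive statistics.")
current = ("In this follow-up, the survey data were collected from 200 students "
           "in the spring term before new interviews began.")

for passage in shared_passages(previous, current):
    print("Reused passage:", passage)
```

Any flagged passage is only a prompt to check whether the earlier work has been cited or whether reuse was permitted.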

Global plagiarism means taking an entire work written by someone else and passing it off as your own. This can mean getting someone else to write an essay or assignment for you, or submitting a text you found online as your own work.

Global plagiarism is the most serious type of plagiarism because it involves deliberately and directly lying about the authorship of a work. It can have severe consequences.

To ensure you aren’t accidentally plagiarizing, consider running your work through a plagiarism checker tool prior to submission. These tools work by using advanced database software to scan for matches between your text and existing texts.

Scribbr’s Plagiarism Checker takes less than 10 minutes and can help you turn in your paper with confidence.

Verbatim plagiarism means copying text from a source and pasting it directly into your own document without giving proper credit.

Even if you delete a few words or replace them with synonyms, it still counts as verbatim plagiarism.

To use an author’s exact words, quote the original source by putting the copied text in quotation marks and including an in-text citation.

If you’re worried about plagiarism, consider running your work through a plagiarism checker tool prior to submission. These tools work by using advanced database software to scan for matches between your text and existing texts.

Paraphrasing without crediting the original author is a form of plagiarism, because you’re presenting someone else’s ideas as if they were your own.

However, paraphrasing is not plagiarism if you correctly cite the source. This means including an in-text citation and a full reference, formatted according to your required citation style.

As well as citing, make sure that any paraphrased text is completely rewritten in your own words.

Patchwork plagiarism (aka mosaic plagiarism) means copying phrases, passages, or ideas from various existing sources and combining them to create a new text. While this type of plagiarism is more insidious than simply copy-pasting directly from a source, plagiarism checkers like Turnitin’s can still easily detect it.

To avoid plagiarism in any form, remember to cite your sources. Also consider running your work through a plagiarism checker tool prior to submission. These tools work by using advanced database software to scan for matches between your text and existing texts.

Yes, reusing your own work without acknowledgment is considered self-plagiarism. This can range from re-submitting an entire assignment to reusing passages or data from something you’ve turned in previously without citing them.

Self-plagiarism often has the same consequences as other types of plagiarism. If you want to reuse content you wrote in the past, make sure to check your university’s policy or consult your professor.

Cite this Scribbr article


Streefkerk, R. (2023, November 21). The 5 Types of Plagiarism | Explanations & Examples. Scribbr. Retrieved August 15, 2024, from https://www.scribbr.com/plagiarism/types-of-plagiarism/


  • Ann Med Health Sci Res
  • v.4(Suppl 3); Sep-Oct 2014

Knowing and Avoiding Plagiarism During Scientific Writing

P Mohan Kumar

Department of Periodontics, St. Joseph Dental College, Duggirala, Eluru, Andhra Pradesh, India

N Swapna Priya

1 Department of Dental Surgery, S.V Medical College, Tirupati, Andhra Pradesh, India

SVVS Musalaiah

Plagiarism has become more common in both the dental and medical communities. Most writers do not realize that plagiarism is a serious problem. Plagiarism can range from simple dishonesty (minor copy-paste or any discrepancy) to a more serious problem (major discrepancy or duplication of a manuscript) when authors cut, copy, and paste from the original source without giving it adequate credit. A search of databases such as PubMed/MedLine returns a lot of information on plagiarism. However, how to avoid plagiarism remains a topic of current interest to all researchers. It is time for every young researcher to learn the ethical guidelines for writing scientific publications. By using one's own ideas, one can write the paper completely without looking at the original source. Specific words from the source can be added by quoting and citing them, which not only supports your work and amplifies your ideas but also avoids plagiarism. All authors, reviewers, and editors of scientific journals must know about plagiarism and how to avoid it by following ethical guidelines and using plagiarism detection software during scientific writing.

Introduction

Medical and dental writing includes the presentation of different scientific documents, such as research-related topics, case presentations, and review articles, which help in educating the general public and promoting health-related information. Hence, in addition to language skills and the ability to interpret data, all medical and dental writers should be familiar with searching the literature and with understanding and presenting their ideas or thoughts in the form of articles submitted to the many available scientific journals.[1, 2]

Because of the lack of education on plagiarism in educational institutions and among members of journal offices, some plagiarized articles are being allowed into publication.

In simple words, plagiarism is the use of others' ideas or work without giving credit to the original authors. In other words, it is taking credit for others' work, whether intentionally or unintentionally.[3]

The main root cause of plagiarism among dental and medical writers is the competitive stress among them and the ready availability of others' information in electronic media.[4, 5, 6] As plagiarism is an unethical publication practice, it has to be avoided at the first stage itself.[7]

When dental or medical writers want to publish a scientific paper, they have to be very specific, accurate, and honest about the concept of the research. First, the author has to take sufficient time to read and thoroughly understand the main source of the article, and then organize it into his or her own ideas or thoughts. Before submitting the manuscript to the journal office, the author has to rewrite the article in his or her own words without looking at the original source and, when in doubt, seek the help of a guide or instructor.[4, 7, 8]

This article reviews plagiarism at different levels, its consequences, guidelines for avoiding plagiarism, and the benefits of avoiding it.

Plagiarism Definition

The word plagiarism is derived from the Latin plagiare, which means “to kidnap.”[3]

Office of research integrity definition

The Office of Research Integrity describes plagiarism as “theft or misappropriation of intellectual property and the substantial unattributed textual copying of another's work. It does not include authorship or credit disputes. The theft or misappropriation of intellectual property includes the unauthorized use of ideas or unique methods obtained by a privileged communication, such as a grant or manuscript review. Substantial unattributed textual copying of another's work means the unattributed verbatim or nearly verbatim copying of sentences and paragraphs which materially mislead the ordinary reader regarding the contributions of the author.”

Committee on publication ethics definition

In 1999, the Committee on Publication Ethics (COPE) defined plagiarism as, “plagiarism ranges from the unreferenced use of others’ published and unpublished ideas, including research grant applications to submission under “new” authorship of a complete paper, sometimes in a different language. It may occur at any stage of planning, research, writing, or publication: It applies to print and electronic versions.”

According to the Oxford English Dictionary, Plagiarism is defined as - “the action or practice of taking someone else's work, idea, etc., and passing it off as one's own; literary theft.”

The World Association of Medical Editors defines plagiarism as “the use of others' published and unpublished ideas or words (or other intellectual property) without attribution or permission, and presenting them as new and original rather than derived from an existing source.”[3, 4]

Plagiarism is defined as the appropriation or imitation of the language, ideas and thoughts of another author and representation of them as one's original work. (The Random House Dictionary of the English Language - unabridged).

Academic dishonesty has spread from students in the classroom to presenters in scientific sessions and even to the reviewers and editors of unauthorized journal offices.[9, 10]

The following are some of the common possible causes of the increase in plagiarism. Increased competition or laziness among students writing dissertations, and professional over-ambition, competition, or a “publish or perish” attitude toward promotion among young authors, can result in plagiarism. Reviewers and editors of scientific journals are also responsible for preventing plagiarism by using plagiarism detection software before publishing research.[2, 11, 12, 13]

Source and Method of Data Collection

The availability of internet facilities and free online journals is the main source of today's plagiarism among students, faculty, and researchers of any profession.[5, 6, 14, 15, 16]

Advances in technology for converting text into electronic form, rising levels of competition, and the “publish or perish” attitude are some of the important factors that make students, staff, and researchers of educational institutions prone to plagiarism.[15, 16, 17]

Data collection

An online search on “plagiarism” was performed using the PubMed/MedLine databases. In MedLine, each reference to the medical literature is indexed under a controlled vocabulary called Medical Subject Headings (MeSH). These MeSH terms act as a key for searching the medical and dental literature. The MedLine/PubMed databases were therefore used to search for literature that is available online throughout the world. Initially, 1121 references on the term “plagiarism” were obtained from the PubMed/MedLine databases up to the date of the search. A total of 893 articles were published on plagiarism under MeSH.

After filtering based on the selection criteria, 35 articles were included in this review. Articles related to dental and medical scientific writing were included. Searching, filtering, and selecting the articles took 6 months. The sequence of data collection is shown in Chart 1.


A flow chart diagram showing the steps used for selecting the articles included in this review
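The selection counts reported above can be put in proportion with a few lines of Python. This is a minimal sketch that uses only the three figures given in the text (1121, 893, and 35); the helper name and the stage labels are illustrative, and treating each stage as a subset of the initial search results is an assumption, since the text does not state it explicitly.

```python
def retained_share(count: int, initial: int) -> float:
    """Share of the initial search hits still present at a later stage."""
    return count / initial


# Counts as reported in the text; the subset relationship between
# stages is an assumption of this sketch.
stages = [
    ("PubMed/MedLine hits for 'plagiarism'", 1121),
    ("Articles indexed under MeSH", 893),
    ("Articles included in the review", 35),
]

initial = stages[0][1]
for label, count in stages:
    print(f"{label}: {count} ({retained_share(count, initial):.1%} of initial hits)")
```

Read this way, only about 3% of the initial hits met the selection criteria for the review.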

As there is insufficient literature on this subject, it is time to educate all professions, through journals and educational institutions, on how to avoid plagiarism in order to prevent the publication of diluted research.

Common Types of Plagiarism

Plagiarism can be of various types. Plagiarism may be intentional or unintentional.

Intentional plagiarism

“Buying, borrowing, or cut-copy-paste,” or otherwise using someone else's work partly or completely without giving adequate credit to the original author, constitutes intentional plagiarism.[7, 8, 9]

Unintentional plagiarism

Using someone else's work with incorrect paraphrasing or improper citation constitutes unintentional plagiarism.[1, 7, 8, 9]

According to COPE, various types of plagiarism can be distinguished based on factors such as extent (minor or major plagiarism), originality of the copied material, type of material plagiarized, whether sources are referenced, and the author's intention. The following are the most common forms of plagiarism seen in medical and dental publications:

  • Plagiarism of ideas: When an author “uses the ideas or thoughts of some others and presents as his own”[3] without giving adequate credit to the original authors, the result is plagiarism of ideas. For example, postgraduate students may use ideas from previously published articles in their dissertation work.
  • Plagiarism of text/direct plagiarism/word-for-word plagiarism: According to Roig, this kind of plagiarism is defined as “copying a portion of text from another source without giving credit to its author and without enclosing the borrowed text in quotation marks.”[1, 3, 9] For example, many young authors do not know how to write and give credit to the original work they have drawn on. They simply cut and paste from the original source and create an article without giving sufficient credit to the authors who did the original work.
  • Mosaic plagiarism (patchwork plagiarism): When an author fails to write in his or her own words and “uses the same words or phrases or paragraphs of the original source” without giving adequate credit, the result is mosaic plagiarism.[3, 7] For example, an author may borrow words or sentences from the original source and patch them into his or her own article.
  • Self-plagiarism: “Stealing or borrowing some amount of work” from one's own previously published articles is self-plagiarism.[1, 3, 7, 8] For example, an author may reuse part of his or her own work and publish the article in different journals.

Penalties for Plagiarism

Since plagiarism can range from simple dishonesty to a serious problem, the penalty depends on its severity. Penalties range from formal disciplinary action (apology letters, retraction of the published article) to criminal charges (suspension and prosecution of authors).[1, 17, 18, 19, 20, 21]

Example: “A practicing psychiatrist and radio and television broadcaster in London had to step down as director of the Center for Public Engagement in Mental Sciences in the institute where he was employed and was suspended from practice for 3 months by the General Medical Council.”[ 22 , 23 ]

Detection of Plagiarism

All ethical medical and dental writers must check for unintentional text duplication by using plagiarism detection software before submitting to any journal office. Reviewers should also use plagiarism detection tools to prevent false publication practices, whether intentional or unintentional. When a manuscript passes from the reviewers to the editors without the copied text or ideas having been identified, the editor of the journal should decide the fate of the article based on the extent of plagiarism found by powerful plagiarism detection software. The following are a few plagiarism detection tools that help in screening submitted articles for matching text.[24, 25, 26, 27, 28, 29]

  • Cross Check™
  • http://www.ithenticate.com
  • https://turnitin.com/static/index
  • Viper ( http://www.scanmyessay.com/plagiarism - free software)
  • Software like eTBLAST
  • SafeAssign™
  • WCopyFind™
  • http://www.checkforplagiarism.net
  • http://www.grammarly.com
  • Sometimes simple Google Search also helps in detecting plagiarism.

Guidelines to Publish a Quality Paper without Plagiarism

Many students and authors still do not know the proper way to cite sources. In order to produce a quality paper, every author should follow these guidelines.[3, 22, 30, 31, 32, 33]

A few good rules for avoiding a charge of plagiarism are:

  • Take sufficient time to complete your work
  • Understand the whole concept and write the new ideas in your own words
  • Avoid “copy-paste”
  • Use a few appropriate and accurate sources
  • Learn how and when to quote, and avoid patchwork
  • Always cite what is new or in doubt, but not common knowledge
  • Follow the author guidelines of the biomedical journal
  • Cite references accurately
  • Always acknowledge and give sufficient credit to the original sources
  • Avoid writing several articles of the same type and submitting them to different journals at the same time
  • Consult a translator or native speaker before sending the final proof of the manuscript to a scientific journal
  • Use anti-plagiarism tools, such as Cross Check, to detect any accidental plagiarism
  • Enclose a covering letter to the editor disclosing any unintentional overlap.

Benefits of Avoiding Plagiarism

When writing a good scientific paper, one should check for any plagiarized material, which helps in avoiding misrepresentation of a hypothesis and scientific misconduct. Table 1 enumerates the key messages given by the authors on knowing and avoiding plagiarism during scientific writing. Every young author should thus learn how to write or present an article or research work in his or her own words by following the rules of good scientific writing. With the help of anti-plagiarism tools, duplicate manuscripts can be avoided at the journal office. This fosters respect for and truthfulness toward science and paves the way for quality papers to be published. Lastly, when editors reject plagiarized articles at the journal office, it also encourages every author to think of newer concepts.[23, 26, 33, 34, 35]

Summary of the key messages on plagiarism given by the authors of the articles included in this review


Summary and Conclusion

To publish a good scientific paper, one has to make an honest effort to read the original sources thoroughly and then set down one's own ideas in one's own words, with proper paraphrasing and citation, and with quotation marks wherever necessary, to avoid plagiarism.

As technology advances, dental and medical writers also need to come up with new ideas, concepts, techniques and hypotheses, which in turn helps advance the dental and medical fields.

Source of Support: Nil.

Conflict of Interest: None declared.

Artificial intelligence in academic writing: Insights from journal publishers' guidelines

  • August 2024
  • Perspectives in Clinical Research (ahead of print)

Himel Mondal, All India Institute of Medical Sciences Deoghar

Shaikat Mondal, Raiganj Government Medical College and Hospital

Joshil Kumar, Government Medical College Keonjhar Odisha


American Psychological Association

In-Text Citations

In scholarly writing, it is essential to acknowledge how others contributed to your work. By following the principles of proper citation, writers ensure that readers understand their contribution in the context of the existing literature—how they are building on, critically examining, or otherwise engaging the work that has come before.

APA Style provides guidelines to help writers determine the appropriate level of citation and how to avoid plagiarism and self-plagiarism.

We also provide specific guidance for in-text citation, including formats for interviews, classroom and intranet sources, and personal communications; in-text citations in general; and paraphrases and direct quotations.


COMMENTS

  1. What is plagiarism and how to avoid it?

    Self-plagiarism: "Publication of one's own data that have already been published is not acceptable since it distorts scientific record."[1] Self-plagiarized publications do not contribute to scientific work; they just increase the number of papers published without justification in scientific research.[8] The authors benefit in the form of an increased number of published papers.[8] Self ...

  2. Full article: The case for academic plagiarism education: A PESA

    Recent research testing tools for plagiarism detection 'show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic' (Foltýnek et al., 2020). There are now more than twenty major PDS on the market.

  3. Plagiarism detection and prevention: a primer for researchers

    Creative thinking and plagiarism. Plagiarism is often revealed in works of novice non-Anglophone authors who are exposed to a conservative educational environment that encourages copying and memorizing and rejects creative thinking [12, 13]. The gaps in training on research methodology, ethical writing, and acceptable editing support are also viewed as barriers to targeting influential journals ...

  4. Factors influencing plagiarism in higher education: A comparison of

    Over the past decades, plagiarism has been classified as a multi-layer phenomenon of dishonesty that occurs in higher education. A number of research papers have identified a host of factors such as gender, socialisation, efficiency gain, motivation for study, methodological uncertainties or easy access to electronic information via the Internet and new technologies, as reasons driving plagiarism.

  5. (PDF) Plagiarism in research

    An intellectual product of one's own. It is no accident that plagiarism is discussed in relation to research, although it is also clearly relevant in relation to music, literature, art, and ...

  6. Modern threats in academia: evaluating plagiarism and ...

    Plagiarism and research integrity are sensitive issues in the academic setting, especially after the recent offspring of artificial intelligence (AI) and large language models (LLMs) such as GPT-4.0.

  7. Full article: Plagiarism awareness efforts, students' ethical judgment

    Ethics research has frequently reported that unintentional plagiarism is more prevalent among HE students and that the root cause is limited or no awareness of the nuances of academic ethics concerning plagiarism, which results in poor ethical judgments (Farahian, Avarzamani, and Rezaee 2022; Ruedy and Schweitzer 2010).

  8. Plagiarism in research

    Plagiarism is a well-known and growing issue in the academic world. It is estimated to make up a substantial part of the total number of serious deviations from good research practice (Titus et al. 2008; Vitse and Poland 2012). For some journals it is indeed a serious problem, with up to a third of the published papers containing plagiarism (Zhang 2010; Baždaric et al. 2012; Butler 2010).

  9. Plagiarism in Project Studies

    Plagiarism is not only an issue in conferences and low-quality publications. The Office of Research Integrity (ORI) argues that [p]lagiarism is "one of the most frequent areas of concern for journal editors (Wagner, 2011) with a third of retracted journal articles being due to plagiarism or self-plagiarism (Fang et al., 2012)."

  10. What Is Plagiarism, How to Identify It, and How to Educate to Avoid It

    ABSTRACT. Plagiarism is a continuing and growing concern in higher education and in academic publishing. Educating to avoid plagiarism requires ongoing efforts at all levels and clear policies that explain the several types of plagiarism and potential consequences when it is found.

  11. Academic Plagiarism Detection: A Systematic Literature Review

    This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of ...

  12. Similarity and Plagiarism in Scholarly Journal Submissions: Bringing

    Defining plagiarism and its prevalence in manuscripts. To begin with, plagiarism may be defined as "when somebody presents the published or unpublished work of others, including ideas, scholarly text, images, research design and data, as new and original rather than crediting the existing source of it."[13] The common types of plagiarism, including direct, mosaic, paraphrasing, intentional ...

  13. Perceptions of and Attitudes toward Plagiarism and Factors ...

    The abundance of information technology and electronic resources for academic materials has contributed to the attention given to research on plagiarism from various perspectives. Among the issues that have attracted researchers' attention are perceptions of plagiarism and attitudes toward plagiarism. This article presents a critical review of studies that have been conducted to examine ...

  14. Insight into modern-day plagiarism

    The word plagiarism is derived from the Latin word "Plagium," meaning manstealing or kidnapping. In terms of biomedical publication, the word plagiarism means stealing the work or the writings of another researcher and presenting it as one's own. It can be both unintentional and intentional [9]. The World Association of Medical Editors states that ...

  15. Plagiarism in articles published in journals indexed in the Scientific

    This study analyzes the possible occurrence of plagiarism and self-plagiarism in a sample of articles published in the Scientific Periodicals Electronic Library (SPELL), an open database that indexes business journals in Brazil. The author compared one sample obtained in 2013 (n = 47 articles) and another selected from 2018 (n = 118 articles). In both samples, we verified the guidelines that ...

  16. What Constitutes Plagiarism?

    The research suggests that medical dramas may be a promising source for discussions of medical ethics. Cambra-Badii et al. (2021) explain that even when watched for entertainment, medical shows can help viewers engage emotionally with the characters and may prime them to be more receptive to training in medical ethics.

  17. Plagiarism in Scientific Research and Publications and How to Prevent

    Plagiarism can be avoided by following a few simple steps when writing a paper (1, 6): Paraphrasing: when you find information that is useful for your research, read it and write it in your own words. Quoting: a very efficient way to avoid plagiarism.

  18. PDF Plagiarism: A Global Phenomenon

    This article aims to collate seminal works on plagiarism which concentrate on the reasons for and types of plagiarism, and on the role of educational institutions in minimizing it. Keywords: academic writing, plagiarism, reason, types, institution role. DOI: 10.7176/JEP/12-3-08. Publication date: January 31st 2021.

  19. Why do students plagiarise? Informing higher education teaching and

    ABSTRACT. Several interventions have been implemented across higher education institutions with the aim of reducing the prevalence of plagiarism internationally, yet research dedicated to understanding the situational and contextual factors that contribute to plagiarism in an Australian context has been minimal.

  20. What Is Plagiarism?

    The accuracy depends on the plagiarism checker you use. Per our in-depth research, Scribbr is the most accurate plagiarism checker. Many free plagiarism checkers fail to detect all plagiarism or falsely flag text as plagiarism. Plagiarism checkers work by using advanced database software to scan for matches between your text and existing texts.

  21. Factors affecting plagiarism among students at Jazan University

    Plagiarism has been described over the past decades as a multi-layer dishonesty phenomenon emerging in higher education. A number of research papers have described a host of factors such as gender, socialization, productivity benefit, study motivation, methodological uncertainty, or easy access to electronic information through the Internet and new technologies as the driving forces for ...

  22. The 5 Types of Plagiarism

    Table of contents. Global plagiarism: Plagiarizing an entire text. Verbatim plagiarism: Copying words directly. Paraphrasing plagiarism: Rephrasing ideas. Patchwork plagiarism: Stitching together sources. Self-plagiarism: Plagiarizing your own work. Frequently asked questions about plagiarism.

  23. Knowing and Avoiding Plagiarism During Scientific Writing

    A total of 893 articles are published on plagiarism under MeSH. After filtering and based on the selection criteria, 35 articles were included in this review. ... Thus, every young author tries to learn how to write or present an article or research work in his own words by following the rules of good scientific writing. With the help of anti ...

  24. (PDF) Artificial intelligence in academic writing: Insights from

    Plagiarism, intentional or unintentional or both, is considered as misconduct, and it is disgraceful to authors. It may also warrant future legal complications. Furthermore, it may reduce the ...

  25. In-text citations

    APA Style provides guidelines to help writers determine the appropriate level of citation and how to avoid plagiarism and self-plagiarism. We also provide specific guidance for in-text citation, including formats for interviews, classroom and intranet sources, and personal communications; in-text citations in general; and paraphrases and direct quotations.