Five Ways to Defeat Automated Plagiarism Detection

Increasingly, unethical authors and predatory publishers are learning new tricks to make it more difficult to detect plagiarism in their writings and published articles. Here are five methods they are using to defeat automated plagiarism detection programs.

1. PDF files are made up of layers. One layer is the visual layer, and another is the text layer. It is possible to alter the unseen text layer in a PDF file by changing all the letters to mojibake. For example, when you highlight the text in the article below, press Control + C to copy it, and paste it into Notepad, the result is only garbage characters:

Copy the text, paste garbage characters.

This makes automated reading of the text impossible. Unfortunately for the authors, it also means their papers cannot be properly indexed by Google, making them impossible to find using the search engine.
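On the detection side, a garbled text layer can often be flagged before a file ever reaches the plagiarism checker. The following is only a minimal sketch, assuming the text layer has already been extracted to a string by some PDF tool and that the article is supposed to be in English. The function names, the small word list, and the threshold are illustrative assumptions, not part of any real detection product.

```python
import re

# Assumption: a tiny list of very common English function words.
# Readable English prose normally contains many of these.
COMMON_WORDS = {
    "the", "of", "and", "a", "to", "in", "is", "that", "it",
    "for", "as", "with", "was", "on", "are", "by", "this", "be",
}

def common_word_ratio(text: str) -> float:
    """Return the fraction of letter-only tokens that are common English words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in COMMON_WORDS)
    return hits / len(tokens)

def looks_like_garbage(text: str, threshold: float = 0.1) -> bool:
    """Flag extracted text whose share of common words is below the threshold.

    Empty text is also flagged, since it has no readable words at all.
    """
    return common_word_ratio(text) < threshold
```

A file flagged this way would still need a human look, since a table-heavy page or a non-English article can also score low.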

2. By coincidence, some letters in the Latin character set look the same as letters in other character sets. For example, the Latin letter e looks almost exactly like the Cyrillic letter ie:[1]

Latin e =     e

Cyrillic ie = е

Other examples:

Latin a =     a

Cyrillic a =  а

Latin o =     o

Cyrillic o =  о

There are additional matches between Latin letters and letters in the Greek character set. To exploit these similarities to defeat plagiarism detection, someone would use a “find and replace” function to replace Latin letters with similar-looking letters from other character sets. While some plagiarism detection programs are designed to handle this hack, not all are, so the trick may succeed in some systems.
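One way a checker might catch this substitution is to look for single words that mix scripts, since an ordinary English word should not contain Cyrillic or Greek letters. This is a rough sketch using Python's standard unicodedata module. The function names are made up for illustration, and the approach assumes that the first word of a character's Unicode name identifies its script, which holds for Latin, Cyrillic, and Greek letters.

```python
import re
import unicodedata

# Scripts whose letters are commonly confused with one another.
CONFUSABLE_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}

def scripts_in(word: str) -> set:
    """Return the confusable scripts used by the letters of a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            script = unicodedata.name(ch, "").split(" ")[0]
            if script in CONFUSABLE_SCRIPTS:
                scripts.add(script)
    return scripts

def mixed_script_words(text: str) -> list:
    """Return the words that combine letters from more than one script."""
    return [w for w in re.findall(r"\w+", text) if len(scripts_in(w)) > 1]

# "\u0430" is the Cyrillic a, which looks like the Latin a.
suspect = mixed_script_words("pl\u0430gi\u0430rism detection")
```

Words written entirely in one script are left alone, so a legitimate Russian or Greek quotation would not be flagged by this check.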

3. Another trick is to use the find-and-replace feature to convert all spaces in a document to a character from a foreign character set, and then use find-and-replace again to change the color of that character to white, so it appears as a space again. Example:

Original text:

 Colorado is a U.S. state that encompasses most of the Southern Rocky Mountains as well as the northeastern portion of the Colorado Plateau and the western edge of the Great Plains. Colorado is part of the Western United States, the Southwestern United States, and the Mountain States. [2]

Text with spaces changed to another [Chinese] character:

Colorado画is画a画U.S.画state画that画encompasses画most画of画the画Southern画Rocky画Mountains画as画well画as画the画northeastern画portion画of画the画Colorado画Plateau画and画the画western画edge画of画the画Great画Plains.画Colorado画is画part画of画the画Western画United画States,画the画Southwestern画United画States,画and画the画Mountain画States.

Text with foreign character changed to white:

ColoradoisU.S.statethatencompassesmostof

theSouthernRockyMountainsaswellas the

northeasternportionoftheColorado

Plateauandthewesternedgeofthe

GreatPlains.画 ColoradoispartoftheWestern

UnitedStates,theSouthwesternUnitedStates,

andthe画 MountainStates.

This technique does leave some spacing problems that could be taken care of manually. You can select the text above to see the hidden characters.
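Because the color of a character is lost once the text is extracted, a checker could undo this trick by treating a lone Chinese character wedged between Latin letters as a disguised space. The sketch below is only an illustration under that assumption. The function name is hypothetical, and it only covers the common CJK ideograph range, so a different filler character would need a different pattern.

```python
import re

# A single CJK ideograph that directly follows a Latin letter or basic
# punctuation and is directly followed by a Latin letter.
_CJK_BETWEEN_LATIN = re.compile(r"(?<=[A-Za-z.,;:])[\u4e00-\u9fff](?=[A-Za-z])")

def restore_spaces(text: str) -> tuple:
    """Replace suspected filler characters with spaces.

    Returns the cleaned text and the number of replacements made; a large
    count in an English document is itself a warning sign.
    """
    return _CJK_BETWEEN_LATIN.subn(" ", text)

# "\u753b" is the character used in the Colorado example above.
cleaned, count = restore_spaces("Colorado\u753bis\u753ba\u753bU.S.\u753bstate")
```

The cleaned text can then be passed to the plagiarism checker as usual.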

4. The next method is called thesaural substitution. This simply means changing words in the text to words with the same meaning. This can be done using a manual or automated process. For example, re-using the Colorado paragraph above, we might take the original text and massage it into a similar text:

Original: Colorado is a U.S. state that encompasses most of the Southern Rocky Mountains as well as the northeastern portion of the Colorado Plateau and the western edge of the Great Plains. Colorado is part of the Western United States, the Southwestern United States, and the Mountain States. [2]

Edited: Colorado is a United States state that includes most of the Southern Rocky Mountains and also the northeastern section of the Colorado Plateau and the western part of the Great Plains. Therefore, Colorado is a component of the Western United States, the Mountain States, and the Southwestern United States.
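Synonym swaps break exact phrase matching, but most of the vocabulary usually survives. As a rough illustration of why, the sketch below computes the Jaccard similarity of the word sets of two passages. This is a generic textbook measure chosen for the example, not the method any particular detection program is known to use, and the function names are made up.

```python
import re

def word_set(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Return |A ∩ B| / |A ∪ B| for the word sets of two texts."""
    a, b = word_set(text_a), word_set(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two short passages that share two of their four distinct words.
score = jaccard_similarity("a b c", "a b d")  # 2 shared / 4 total = 0.5
```

Applied to the original and edited Colorado paragraphs above, the score stays high because only a handful of words were swapped, which is why lightly reworded text can still be caught by software that compares vocabulary rather than exact strings.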

5. The last trick is to find an article that is written only in a foreign language and then translate it to English, either with an automatic translator or manually. Because the plagiarism detection software compares the submitted English text against its database, and the source exists only in its original language, the software is unlikely to flag the article as plagiarized.

Conclusion:

Authors who commit plagiarism want to hide evidence of their plagiarism. Predatory publishers who knowingly publish articles containing plagiarism want to prevent the plagiarism from being detected. In both cases, they sometimes use tricks to avoid detection by automated plagiarism detection programs.

The tricks can prevent open-access articles from being properly indexed in search engines. They can facilitate the publication of work that should never be published. Not all plagiarism detection software is the same, and we hope that software developers are able to defeat plagiarists’ tricks.

References

1. Gillam, L., Marinuzzi, J., Ioannou, P. “TurnItOff: defeating plagiarism detection systems.” In: 11th Higher Education Academy-ICS Annual Conference, University of Durham, 24–26 Aug 2010, UK.

2. Text taken from the Wikipedia article for Colorado.

13 Responses to Five Ways to Defeat Automated Plagiarism Detection

  1. Yurii Chinenov says:

    #3 – personally saw it.

  2. Frank Lu says:

    Nothing beats eyes and gut instinct. Automated software can only do so much. If the file cannot be processed by software, just reject it. No need for an explanation. If your eyes see some bad things, reject it. Again, no need for an explanation. We are here to protect the integrity of the journal and the entire peer review system, not to serve the wishes of disingenuous writers.

    Some of these guys will holler like crazy, even threatening to sue or curse that they will not submit an article to the journal again, but they ALL go out in a whimper. Honestly, we don’t need to have them in journals that we volunteer our time to serve.

  3. Nils says:

    Yet another good reason for legit journals to favor submissions in LaTeX.

  5. deborah says:

    all of these tricks also make the text inaccessible to people using screenreaders.

  6. Guria says:

    Thank you so much. Dishonest people never stop finding ways to cheat.

  7. Ferdinand says:

    Today I encountered this site, and guess what, just a few hours later I discovered a plagiarized but slightly rewritten article. The plagiarizing article is in http://www.uap-bd.edu/jcit_papers/vol-1_no-1/JCIT-100715.pdf and the plagiarized article is http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1024422&url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D1024422
    Text is somewhat rewritten, and figures are rearranged but very similar.

    The plagiarizing article is in the Journal of Cases on Information Technology (JCIT), published by IGI Global. I often get spam from this publisher inviting me to submit an article to their journals, so I guess it is a predatory publisher. Not yet on your list, though.

  8. Eu mesma says:

    If I find a text in ABC language, and google translate it to XYZ language, ( citing the scientific sources etc) do i get caught? I have work, kids and all that can’t afford the time for my master thesis! But I do speak 4 languages ;) Let me know please!!!

  9. Yes, for somebody cheating is a way of life(((

  10. coco says:

    how to change pdf to mojibake letters?

  11. Charuka says:

    is there any automated process as you describe in 3rd point?
