March 15, 2016 9:46 pm

Creating a Simple Plagiarism Checker in Node.js

We are going to create a module that any application needing plagiarism checking can use. For example, a website that publishes articles on web development could use it to check whether a submitted article has been copied from another source. The module returns JSON, so it can be hooked into different routes and application logic to respond differently depending on the plagiarism score.

The plagiarism checker has only two dependencies: cheerio and proxyneedle. Cheerio is used to traverse the DOM of the Bing search results page with jQuery-style selectors (to check whether the input text already exists on the Web), and proxyneedle is used to make requests to Bing through different proxies, which reduces the chance of being denied access.

Package.json

Our module first requires its dependencies:

Run.js

Then, we export our main function, which will be available to any code that uses a require statement on the run.js file.

The exported function accepts a text string to check for plagiarism, a settings object, and a callback that is called with any errors and the results of the plagiarism check.

Next, we parse the text into an array of sentence excerpts using the private parseText function. If that function returns something, we call the queryBing function with an object whose properties serve as its initialization data. We also pass it the callback. That is all that the exported function does.

The parseText function checks whether the text contains more characters than the limit in the settings passed to our module, or the default of 1000. If the text exceeds the limit, it stops the plagiarism check.

Otherwise, it divides the text into an array of sentences and further divides those into an array of sentence excerpts, none of which can contain more characters than the limit in the settings, or the default of 14. Finally, it adds a plus sign and quotation marks around each sentence excerpt to tell Bing that we need an exact match.
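The splitting described above can be sketched roughly as follows. This is not the module's original code, only a minimal illustration. The setting names maxTextLength and maxExcerptLength are assumptions, as is the choice to pack whole words into each excerpt and to return null when the text is too long.

```javascript
// A sketch of parseText. Setting names are hypothetical; only the default
// values (1000 and 14) come from the article.
function parseText(text, settings = {}) {
  const maxTextLength = settings.maxTextLength || 1000;
  const maxExcerptLength = settings.maxExcerptLength || 14;

  // Stop the check when the text is longer than allowed.
  if (text.length > maxTextLength) return null;

  // Split into sentences on dots and drop empty pieces.
  const sentences = text.split('.').map((s) => s.trim()).filter(Boolean);

  // Pack whole words into excerpts no longer than maxExcerptLength.
  // A single word longer than the limit becomes its own excerpt.
  const excerpts = [];
  for (const sentence of sentences) {
    let current = '';
    for (const word of sentence.split(/\s+/)) {
      const candidate = current ? current + ' ' + word : word;
      if (current && candidate.length > maxExcerptLength) {
        excerpts.push(current);
        current = word;
      } else {
        current = candidate;
      }
    }
    if (current) excerpts.push(current);
  }

  // Plus sign and quotes ask the search engine for an exact match.
  return excerpts.map((excerpt) => '+"' + excerpt + '"');
}

const sampleExcerpts = parseText('Node is fun. Bing finds text', { maxExcerptLength: 20 });
console.log(sampleExcerpts);
```

With a limit of 20 characters each of the two sentences fits into one excerpt. With the default of 14 the second sentence would be split into two excerpts.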

The queryBing function sets up the necessary variables from the object passed to it. Then, it uses proxyNeedle to make a request to Bing and calls parseGooglePage when the response arrives.
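As a small illustration of what such a request might target, here is a sketch that turns one sentence excerpt into a Bing search URL. The helper name buildBingUrl is hypothetical and the original module may build its URL differently. The request itself (made with proxyneedle in the article) is left out.

```javascript
// Hypothetical helper: builds the search URL for one quoted excerpt.
// encodeURIComponent escapes the plus sign, quotes, and spaces so the
// excerpt survives as a single query parameter.
function buildBingUrl(excerpt) {
  return 'https://www.bing.com/search?q=' + encodeURIComponent(excerpt);
}

const sampleUrl = buildBingUrl('+"Node is fun"');
console.log(sampleUrl);
```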

parseGooglePage takes the Bing response body as a parameter. It loads the body into cheerio and checks whether Bing returned any results. If there are results, the sentence excerpt at the current index is considered plagiarized; if there are none, it is considered unique. So, when there are results, the function increments the number of duplicates and adds the sentence excerpt to the array of duplicated strings. Finally, it increments the index (of the sentence excerpt to look up next in the array) and either continues recursively if there are more sentence excerpts to check or returns the results.
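The counting and recursion described above could look roughly like the sketch below. It is an illustration under assumptions, not the original code. The function name checkExcerpts, the result field names, and the injected hasResults function (which stands in for the Bing request plus the cheerio check) are all hypothetical.

```javascript
// Walks the excerpts one by one. hasResults(excerpt, cb) is a stand-in for
// "request Bing and see whether the page lists any results".
function checkExcerpts(excerpts, hasResults, callback) {
  const result = { total: excerpts.length, duplicates: 0, duplicatedStrings: [] };

  function next(index) {
    // No more excerpts: hand the results back.
    if (index >= excerpts.length) return callback(null, result);

    hasResults(excerpts[index], (err, found) => {
      if (err) return callback(err);
      if (found) {
        // Results exist, so this excerpt counts as a duplicate.
        result.duplicates += 1;
        result.duplicatedStrings.push(excerpts[index]);
      }
      next(index + 1); // move on to the following excerpt
    });
  }

  next(0);
}

// Fake lookup for the demo: pretend only excerpts containing "copied" are found.
const fakeLookup = (excerpt, cb) => cb(null, excerpt.includes('copied'));

let checkResult = null;
checkExcerpts(['+"copied words"', '+"fresh words"'], fakeLookup, (err, result) => {
  checkResult = result;
});
console.log(JSON.stringify(checkResult));
```

Passing the lookup in as a parameter keeps the sketch runnable without network access. In the article that role is played by the Bing request and the response parsing.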

To use the Plagiarism Checker in your own module, you would require it and call it like this (the settings are optional):
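A call with the shape described in this article (text, settings, callback) might look like the sketch below. To keep it runnable on its own, a stand-in function replaces the real module. In a project it would be loaded with require on the run.js file instead. The setting name maxTextLength and the fields of the result object are assumptions, not the module's documented output.

```javascript
// Stand-in for the real module so this sketch runs by itself.
// In a project: const checkPlagiarism = require('./run.js');
function checkPlagiarism(text, settings, callback) {
  if (typeof text !== 'string' || text.length === 0) {
    return callback(new Error('No text to check'));
  }
  // Hypothetical result shape; the real module's JSON may differ.
  callback(null, { duplicates: 0, duplicatedStrings: [] });
}

// Turns the callback arguments into the JSON string a route could send back.
function toResponseBody(err, result) {
  if (err) return JSON.stringify({ error: err.message });
  return JSON.stringify(result);
}

let okBody = null;
let errorBody = null;

// Call with custom settings (hypothetical setting name).
checkPlagiarism('Some article text.', { maxTextLength: 500 }, (err, result) => {
  okBody = toResponseBody(err, result);
});

// Call with an empty settings object and invalid input to show the error path.
checkPlagiarism('', {}, (err, result) => {
  errorBody = toResponseBody(err, result);
});

console.log(okBody);
console.log(errorBody);
```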

Figure: Sending the returned JSON data from the Plagiarism Checker as a response

This is just a Plagiarism Checker made mostly for illustrative purposes. It makes a new request to Bing for each sentence separated by a dot (.) in the user’s text, so use it wisely.

Author Ivan Dimov

Ivan is a student of IT, a freelance web designer/developer and a tech writer. He deals with both front-end and back-end stuff. Whenever he is not in front of an Internet-enabled device, he is probably reading a book or traveling. You can find more about him at: http://www.dimoff.biz.


  • Please, how do I run this demo app on a Windows 10 PC? I have already installed Node.js but I don’t know how to run it. Thanks

    • Ivan Dimov

      type:
      node app.js
      (in Terminal and when you are in the directory where the file is)