How Plagiarism Detection Works at HackerRank

Written By HackerRank | March 16, 2023

Preventing plagiarism in online assessments has always been important. But the widespread availability of AI tools has reinforced the need for plagiarism prevention strategies that give all developers an equal shot at landing job opportunities that match their unique skill sets and professional aspirations.

HackerRank’s mission is to accelerate the world’s innovation by focusing hiring decisions on skill, not pedigree. We do so by giving all developers the opportunity to showcase their skills in a fair and equitable testing environment. The integrity of the questions that comprise these coding tests is critical for developers and employers to feel confident in their fairness and efficacy.

We’ve found that a proactive plagiarism prevention and detection policy is the best approach for combating plagiarism, ensuring the efficacy of our tests, and providing a fair way for all developers to demonstrate their skills.

HackerRank’s Plagiarism Strategy

Assessment integrity at HackerRank has three core pillars: proctoring tools, plagiarism detection, and DMCA takedowns.

Proctoring Tools

One important component of assessment integrity is building systems with the right proctoring capabilities. Our approach to proctoring is to capture a variety of behavioral signals through tab proctoring, copy-paste tracking, image proctoring, and image analysis.

The purpose of proctoring is twofold. First, proctoring tools help prevent plagiarism by acting as a deterrent: candidates who know that proctoring is in place are less likely to plagiarize. Second, proctoring tools record data points that support plagiarism detection.

Plagiarism Detection

In addition to proctoring tools, the integrity of an assessment also relies on plagiarism detection: the ability to flag when a candidate likely received outside help.

The current industry standard for plagiarism detection relies heavily on MOSS (Measure of Software Similarity) code similarity. Not only does this approach often produce higher false positive rates, but it is also unreliable at detecting plagiarism that originates from conversational AI or large language models. That’s because conversational AI can generate original code that doesn’t closely match any existing source, which lets it circumvent similarity tests.
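To illustrate why similarity tests struggle with freshly generated code, here is a minimal sketch of token-based similarity scoring. It is a simplified stand-in, not MOSS itself (MOSS uses winnowing over hashed k-grams) and not HackerRank’s system. The function names, the keyword list, and the k-gram size of 5 are illustrative assumptions. The sketch normalizes identifiers so that simply renaming variables still scores as a match, while a structurally different solution that solves the same problem scores low.

```python
import re

# Hypothetical keyword list: tokens that are NOT replaced by the ID placeholder.
KEYWORDS = {"def", "return", "for", "in", "if", "else", "while"}


def tokenize(source: str) -> list[str]:
    # Split into identifiers, integer literals, and single non-space characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def normalize(tokens: list[str]) -> list[str]:
    # Replace every non-keyword identifier with "ID" so renaming doesn't hide copying.
    return [
        "ID" if re.fullmatch(r"[A-Za-z_]\w*", t) and t not in KEYWORDS else t
        for t in tokens
    ]


def kgrams(tokens: list[str], k: int) -> set[tuple[str, ...]]:
    # All contiguous runs of k tokens.
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    # Jaccard similarity of the two submissions' normalized k-gram sets.
    ga = kgrams(normalize(tokenize(a)), k)
    gb = kgrams(normalize(tokenize(b)), k)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


original = """
def total(nums):
    s = 0
    for x in nums:
        s += x
    return s
"""

# Same structure, different names: caught by the similarity test.
renamed = """
def add_up(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""

# Same behavior, different structure: slips past the similarity test.
rewritten = """
def total(nums):
    return sum(nums)
"""

print(similarity(original, renamed))
print(similarity(original, rewritten))
```

The renamed copy scores 1.0 because its normalized token stream is identical to the original, while the rewritten version shares only a couple of k-grams. Code produced by a conversational AI tends to resemble the second case, which is why the article describes similarity alone as unreliable for that kind of plagiarism.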

Instead, HackerRank uses a machine learning-based plagiarism detection model that characterizes coding patterns and checks for plagiarism based on a number of signals. The model also learns from past data points to continuously improve its confidence levels.

The result is a new ML-based detection system that is three times more accurate at detecting plagiarism than traditional code similarity approaches and can also detect the use of external tools such as conversational AI. This dramatically reduces the number of false positive plagiarism flags and helps ensure all developers are judged in a fair and equitable testing environment.

DMCA Takedowns

The Digital Millennium Copyright Act (DMCA) is a United States copyright law that provides a legal framework for how copyright owners, online service providers, and users engage with copyrighted content. A DMCA takedown is when a copyright holder requests a website or online community to remove content that they believe infringes on their intellectual property.

DMCA isn’t a perfect system, and we recognize there are some drawbacks to pursuing a takedown policy. However, we’ve found that a proactive DMCA policy is necessary to minimize the spread of leaked questions, combat plagiarism, and provide a fair way for all developers to demonstrate their skills.

Accordingly, our DMCA approach centers on:

Ensuring a fair hiring opportunity for every developer by reducing plagiarism and upholding question integrity.
Conducting an intensive manual review process to validate claims, with particular care taken to protect open source and developer communities from mistaken requests.

Through this review process, we identify and request the takedown of content we believe to be question leaks. Reducing the number of leaked questions reduces the opportunity for candidates to plagiarize by using leaked solutions.

What Does Our Plagiarism Flag Mean for You?

If our detection system identifies a potential case of plagiarism, it issues a plagiarism flag, which indicates that the candidate may have copied their code or solution. We recommend that hiring teams manually review the flagged code rather than auto-rejecting the candidate, so that a false positive doesn’t disqualify an honest candidate. Ultimately, how to respond to a plagiarism flag is up to each hiring team, and specific policies will vary by employer.

Frequently Asked Questions

Can Your Plagiarism Detection System Detect Code From ChatGPT?

Yes. Our AI-enabled plagiarism detection system feeds several proctoring and user-generated signals into an advanced machine learning algorithm to flag suspicious behavior during an assessment. By analyzing how the candidate’s code evolves over successive iterations, the model can detect whether they copied and pasted code from an external source. However, it can’t identify which source the candidate used to obtain or create the code.
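As a rough illustration of how code iterations can reveal pasting, the sketch below computes one hypothetical signal: a large jump in code length between two editor snapshots taken close together in time. This is not HackerRank’s model. The `Snapshot` type, the `large_jumps` function, and the thresholds (200 characters within 2 seconds) are all assumptions chosen for illustration, and a real system would combine many such signals rather than rely on one.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    t: float   # hypothetical: seconds since the attempt started
    code: str  # hypothetical: full editor contents at time t


def large_jumps(snapshots: list[Snapshot], min_chars: int = 200,
                max_seconds: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) times where the code grew by at least
    min_chars characters within max_seconds (assumed thresholds)."""
    flagged = []
    for prev, cur in zip(snapshots, snapshots[1:]):
        grew = len(cur.code) - len(prev.code)
        if grew >= min_chars and (cur.t - prev.t) <= max_seconds:
            flagged.append((prev.t, cur.t))
    return flagged


# Gradual typing, then a sudden 360-character insertion one second later.
history = [
    Snapshot(0.0, ""),
    Snapshot(30.0, "def f(x):\n"),
    Snapshot(60.0, "def f(x):\n    return x\n"),
    Snapshot(61.0, "def f(x):\n    return x\n" + "# pasted\n" * 40),
]

print(large_jumps(history))  # → [(60.0, 61.0)]
```

A signal like this only says that a large block of code appeared at once; it cannot say where that code came from, which matches the limitation described in the answer above.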

Does Your Plagiarism Detection System Automatically Fail Candidates?

No. Our detection system identifies potential cases of plagiarism and empowers hiring teams to decide if it’s an actual case of plagiarism.

I Still Have Questions About Plagiarism. Who Should I Contact?

If you’re a customer looking for support on plagiarism and its impact on your business, you can contact your customer success manager or our team at support@hackerrank.com.