25% of technical assessments show signs of plagiarism.
While it’s impossible for companies to fully prevent plagiarism—at least without massively degrading the candidate experience—plagiarism detection is critical to ensuring assessment integrity. It’s important that developers have a fair shot at showcasing their skills, and that hiring teams have confidence in the test results.
And the standard plagiarism detection method used by, well, everyone, is MOSS code similarity.
MOSS Code Similarity
MOSS (Measure of Software Similarity) is a coding plagiarism detection system developed at Stanford University in the mid-1990s. It operates by analyzing the structural pattern of the code to identify similarity, even when identifiers or comments have been changed, or lines of code rearranged. MOSS is incredibly effective at finding similarities, not just direct matches, and that effectiveness has made it the de facto standard for plagiarism detection.
That doesn’t mean MOSS is flawless, however. Finding similarity doesn’t necessarily translate to finding plagiarism, and MOSS has a reputation for throwing out false positives, particularly when faced with simpler coding challenges. In our own internal research, we’ve found false positive rates as high as 70%.
AI changes the game
While not perfect, MOSS has been a “good enough” standard for years. Until the advent of generative AI tools like ChatGPT.
ChatGPT has proven effective at solving easy to medium difficult assessment questions. And with just a bit of prodding, it’s also effective at evading MOSS code similarity. Let’s see it in action:
Step 1: We asked ChatGPT to answer a question and it did so, returning a solution as well as a brief explanation of the rationale.
Step 2: Next, we directly asked ChatGPT to help escape MOSS code similarity check, and it refused.
Step 3: However, with some creative prompting, ChatGPT will offer unique approaches. And the way that ChatGPT’s transformer-based model works, it generates distinct answers every time, giving it a huge advantage in bypassing code similarity detection.
Here are three different prompts and three totally different approaches. Note that ChatGPT transforms many variable names from the initial solution to evade code similarity checks.
Step 4: The moment of truth! When we submitted the revised answer through plagiarism detection, it passed cleanly.
What’s the implication?
Basically, MOSS code similarity checks can be easily bypassed with ChatGPT.
Time to panic?
If MOSS code similarity can be bypassed, does that mean that technical assessments can no longer be trusted?
On one hand, it’s easier for candidates to bypass the standard plagiarism check that the entire industry has relied upon. So, yes, there is a risk to assessment integrity.
On the other hand, plagiarism detection has always been a compromise between effectiveness and candidate experience. MOSS is not intrusive, but its high false positive rates render it less definitive than it could be. Ultimately, it’s not really detecting plagiarism. It’s detecting patterns in the code that could be plagiarism.
Move over, MOSS
What happens now?
Plagiarism detection gets rethought for the AI era. Expect companies to scramble for better versions of MOSS, more complex questions, different question types, and more to make up the difference.
At HackerRank, we’ve taken a different approach. While we’re always improving our question library and assessment experience, we’ve completely rethought plagiarism detection. Rather than relying on any single point of analysis like MOSS Code Similarity, we built an AI model that looks at dozens of signals, including aspects of the candidate’s coding behavior.
Our advanced new AI-powered plagiarism detection system boasts a massive reduction in false positives, and a 93% accuracy rate. In real-world conditions, our system repeatedly detects ChatGPT-generated solutions, even when those results are typed in manually, and even when they easily pass MOSS Code Similarity.
What happens when the example shown above gets submitted through our new system? It gets flagged for suspicious activity.
Clicking into that suspicious activity reveals that our model identified the plagiarism due to coding behaviors.
What’s more, hiring managers can replay the answer keystroke by keystroke to confirm the suspicious activity.
There’s nothing even close to it on the market, and what’s more, it’s a learning model, which means it will only get more accurate over time.