AI Writing Detection in SimCheck Similarity Reports

Last updated: April 3, 2023

SimCheck is launching a preview of its AI writing detection capabilities. This feature may help instructors identify when AI writing tools, such as ChatGPT, have been used to write any part of a student’s submitted assignment. The AI writing detection capability will only be available in SimCheck through the end of 2023.

An AI detection percentage will be added to the SimCheck Similarity Report. The AI detection results will not be visible to students; only instructors and administrators will be able to see them. Additionally, SimCheck can only process submissions written in long-form English for AI detection.

If you plan to use the score as part of assessment in your course, be mindful that AI technology is constantly evolving along with the tools that attempt to detect it. There are many ways to address the use of AI in your course.

Frequently Asked Questions (FAQ)

How do SimCheck’s AI writing detection capabilities work?

SimCheck has added an AI writing indicator to the Similarity Report. It shows the overall percentage of the document that AI writing tools, such as ChatGPT, may have generated. The indicator also links to a report that highlights the text segments our model predicts were written by AI. Please note that only instructors and administrators can see the indicator.

Although SimCheck is confident in its model, it does not make a determination of misconduct. Rather, it provides data that educators can use to make an informed decision based on their academic and institutional policies. For this reason, we must emphasize that instructors should not use the percentage on the AI writing indicator as the sole basis for action or as a definitive grading measure.

When a paper is submitted to SimCheck, the submission is first broken into segments of text that are roughly a few hundred words long (about five to ten sentences). These segments overlap with one another so that each sentence is captured in context.
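
The segmentation step can be pictured as a sliding window over sentences. The Python below is a minimal sketch, not SimCheck’s implementation: the naive regex sentence splitter and the default `window` and `stride` values (8 and 4 sentences) are illustrative assumptions. Consecutive windows share `window - stride` sentences, which lets most sentences appear in more than one segment.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def overlapping_segments(sentences: list[str], window: int = 8,
                         stride: int = 4) -> list[list[int]]:
    """Group sentence indices into windows of up to `window` sentences.

    Each window starts `stride` sentences after the previous one, so
    consecutive windows share `window - stride` sentences.
    """
    if not 0 < stride <= window:
        raise ValueError("require 0 < stride <= window")
    segments: list[list[int]] = []
    start = 0
    while start < len(sentences):
        end = min(start + window, len(sentences))
        segments.append(list(range(start, end)))
        if end == len(sentences):
            break  # the last window reached the end of the document
        start += stride
    return segments


if __name__ == "__main__":
    demo = " ".join(f"Sentence number {i} ends here." for i in range(1, 11))
    for segment in overlapping_segments(split_sentences(demo), window=4, stride=2):
        print(segment)
```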

The segments are run against the AI detection model, and each sentence is given a score between 0 and 1 indicating whether it was written by a human or by AI. If the model determines that a sentence was not generated by AI, the sentence receives a score of 0. If the model determines that the entire sentence was generated by AI, it receives a score of 1.
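
One way to turn overlapping segment results into a single score per sentence is to average every score a sentence receives across the segments that contain it. This is an assumption for illustration; the document does not say how SimCheck combines overlapping scores. `score_segment` is a hypothetical stand-in for the detection model: it takes a segment’s sentences and returns one score between 0 and 1 per sentence.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical model interface: receives the sentences of one segment and
# returns one score in [0, 1] per sentence (0 = human, 1 = AI).
SegmentScorer = Callable[[Sequence[str]], Sequence[float]]


def sentence_scores(sentences: Sequence[str],
                    segments: Sequence[Sequence[int]],
                    score_segment: SegmentScorer) -> list[float]:
    """Average each sentence's scores over every segment that contains it."""
    collected: list[list[float]] = [[] for _ in sentences]
    for segment in segments:
        scores = score_segment([sentences[i] for i in segment])
        if len(scores) != len(segment):
            raise ValueError("scorer must return one score per sentence")
        for index, score in zip(segment, scores):
            if not 0.0 <= score <= 1.0:
                raise ValueError("scores must lie between 0 and 1")
            collected[index].append(score)
    if any(not bucket for bucket in collected):
        raise ValueError("every sentence must appear in at least one segment")
    return [mean(bucket) for bucket in collected]


if __name__ == "__main__":
    # Toy stand-in for the model: first sentence of each segment scores 1.0.
    def toy_scorer(segment: Sequence[str]) -> list[float]:
        return [1.0] + [0.0] * (len(segment) - 1)

    print(sentence_scores(["A.", "B.", "C."], [[0, 1], [1, 2]], toy_scorer))
    # prints [1.0, 0.5, 0.0]
```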

Using the average scores of all the segments within the document, the model then generates an overall prediction of how much of the text in the submission was generated by AI. This prediction is made with 98% confidence, based on data that was collected and verified in SimCheck’s AI Innovation lab. For example, when SimCheck reports that 40% of the overall text was AI-generated, it is 98% confident that this is the case.
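
A minimal sketch of this aggregation step, assuming each segment is summarized by the mean of its sentence scores and that segment averages are combined with an unweighted mean (the document does not specify any weighting). The function name is hypothetical, and the sketch reports a percentage only; it does not model the 98% confidence calibration.

```python
from statistics import mean


def overall_ai_percentage(segment_scores: list[list[float]]) -> float:
    """Combine per-sentence scores into one document-level percentage.

    Assumes each segment is summarized by the mean of its sentence scores
    and that segment averages are combined with an unweighted mean.
    """
    if not segment_scores:
        return 0.0
    if any(not scores for scores in segment_scores):
        raise ValueError("segments must not be empty")
    segment_averages = [mean(scores) for scores in segment_scores]
    return round(100 * mean(segment_averages), 1)


if __name__ == "__main__":
    # Two hypothetical segments: one fully AI-scored, one mixed.
    print(overall_ai_percentage([[1.0, 1.0, 1.0], [0.0, 0.5, 1.0]]))
    # prints 75.0
```

Weighting segments by their length would be an equally plausible design; the unweighted mean is used here only because it mirrors the wording “average scores of all the segments.”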

Currently, SimCheck’s AI writing detection model is trained to detect content from the GPT-3 and GPT-3.5 language models, on which ChatGPT is based.

Will students be able to see the results?

The AI writing detection indicator and report are not visible to students.

What does the percentage in the AI writing detection indicator mean?

The percentage indicates the amount of qualifying text within the submission that SimCheck’s AI writing detection model determines was generated by AI (with 98% confidence, based on data that was carefully collected and verified in a controlled lab environment). Qualifying text includes only prose sentences: we analyze only blocks of text written in standard grammatical sentences, and we exclude other types of writing such as lists, bullet points, or other non-sentence structures.

This percentage is not necessarily the percentage of the entire submission. If text within the submission is not considered long-form prose text, it will not be included.
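
To see why the indicator can differ from the share of the whole submission, consider a hypothetical 1,000-word submission made up of 800 words of prose and a 200-word bullet list. The sketch below assumes the amount of text is measured in words and that only prose blocks qualify; `Block` and its fields are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical unit of a submission, measured in words."""
    words: int
    is_prose: bool     # True if the block counts as long-form prose
    ai_words: int = 0  # words the model flagged as AI-generated


def indicator_percentage(blocks: list[Block]) -> float:
    """Percentage of qualifying (prose) words flagged as AI."""
    prose_words = sum(b.words for b in blocks if b.is_prose)
    if prose_words == 0:
        return 0.0
    flagged = sum(b.ai_words for b in blocks if b.is_prose)
    return 100 * flagged / prose_words


def share_of_whole(blocks: list[Block]) -> float:
    """Percentage of all words in the submission that were flagged."""
    total_words = sum(b.words for b in blocks)
    if total_words == 0:
        return 0.0
    flagged = sum(b.ai_words for b in blocks if b.is_prose)
    return 100 * flagged / total_words


if __name__ == "__main__":
    submission = [Block(words=800, is_prose=True, ai_words=400),
                  Block(words=200, is_prose=False)]  # e.g. a bullet list
    print(indicator_percentage(submission))  # prints 50.0
    print(share_of_whole(submission))        # prints 40.0
```

With 400 of the 800 prose words flagged, the indicator reads 50%, even though the flagged words make up only 40% of the whole submission.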

The percentage shown sometimes doesn’t match the amount of text highlighted. Why is that?

Unlike the Similarity Report, the AI writing percentage does not necessarily correlate with the amount of text in the submission. SimCheck’s AI writing detection model only looks for prose sentences contained in long-form writing, meaning the individual sentences that make up a longer piece of written work, such as an essay, a dissertation, or an article. The model does not detect AI-generated text in forms such as poetry, scripts, or code, nor in short-form or unconventional writing such as bullet points, tables, or short exam answers.
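
The distinction between qualifying prose and other text can be roughly approximated with a line-level heuristic. The sketch below is hypothetical and far cruder than a trained model: it treats bulleted or numbered lines, table-like lines containing `|` or tabs, and lines shorter than `min_words` (an assumed threshold of 8 words) as non-qualifying, and it requires prose lines to end in sentence punctuation. It makes no attempt to recognize poetry, scripts, or code.

```python
import re

# Lines that start with a bullet symbol or a list number such as "1." or "2)".
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")


def is_long_form_prose(line: str, min_words: int = 8) -> bool:
    """Rough guess at whether a line is long-form prose (hypothetical heuristic)."""
    stripped = line.strip()
    if not stripped:
        return False
    if BULLET.match(stripped):
        return False  # list item
    if "|" in stripped or "\t" in stripped:
        return False  # looks like a table row
    if len(stripped.split()) < min_words:
        return False  # too short, e.g. a heading or a short exam answer
    return stripped.endswith((".", "!", "?"))


if __name__ == "__main__":
    lines = [
        "The committee reviewed every proposal carefully before reaching its final decision.",
        "- Reviewed every proposal carefully before reaching a decision.",
        "Short answer here.",
    ]
    for line in lines:
        print(is_long_form_prose(line), line)
```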

What is the accuracy of SimCheck’s AI writing indicator?

SimCheck only flags text as AI-written when it is 98% sure that the text was written by AI. This means, however, that the tool will likely miss up to 15% of AI-written text, while keeping its false positive rate (incorrectly identifying fully human-written text as AI-generated) below 1%.
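
The two error rates mentioned here can be computed from labeled examples. The helper below is a generic sketch of the standard definitions, not SimCheck’s evaluation code: the false positive rate is the share of human-written items flagged as AI, and the miss rate is the share of AI-written items that go unflagged. The function name is hypothetical, and the labels in the usage lines are made up for illustration only.

```python
from typing import Sequence


def error_rates(flagged: Sequence[bool], is_ai: Sequence[bool]) -> tuple[float, float]:
    """Return (false_positive_rate, miss_rate) for labeled items.

    false_positive_rate: share of human-written items flagged as AI.
    miss_rate: share of AI-written items that were not flagged.
    """
    if len(flagged) != len(is_ai):
        raise ValueError("flagged and is_ai must have the same length")
    human = [f for f, ai in zip(flagged, is_ai) if not ai]
    ai_items = [f for f, ai in zip(flagged, is_ai) if ai]
    if not human or not ai_items:
        raise ValueError("need at least one human-written and one AI-written item")
    false_positive_rate = sum(human) / len(human)
    miss_rate = sum(not f for f in ai_items) / len(ai_items)
    return false_positive_rate, miss_rate


if __name__ == "__main__":
    # Made-up labels for illustration only; not SimCheck data.
    flagged = [False, False, True, True, False]
    is_ai = [False, False, False, True, True]
    print(error_rates(flagged, is_ai))
```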

For example, if SimCheck identifies that 50% of a document was written by AI, it is 98% sure that at least 50% was written by AI (with a false positive rate of less than 1%), but the document could contain as much as 65% AI writing.
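
In this example, the upper bound is the reported figure plus 15 percentage points, capped at 100%. The function below reproduces that arithmetic as a minimal sketch; `plausible_ai_range` and `max_missed_points` are hypothetical names, and treating the miss rate as percentage points added to the reported value is an assumption chosen to match the 50% to 65% example.

```python
def plausible_ai_range(reported_pct: float,
                       max_missed_points: float = 15.0) -> tuple[float, float]:
    """Return (lower, upper) bounds on the share of AI writing, in percent.

    Treats the reported figure as the lower bound and assumes the detector
    may under-report by up to `max_missed_points` percentage points.
    """
    if not 0.0 <= reported_pct <= 100.0:
        raise ValueError("reported_pct must be between 0 and 100")
    upper = min(100.0, reported_pct + max_missed_points)
    return float(reported_pct), upper


if __name__ == "__main__":
    print(plausible_ai_range(50))  # prints (50.0, 65.0)
```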

What is the difference between the Similarity score and the AI writing detection percentage? Are the two completely separate or do they influence each other?

The Similarity score and the AI writing detection percentage are completely independent and do not influence each other. The Similarity score indicates the percentage of matching text found in the submitted document when it is compared to SimCheck’s comprehensive collection of content for similarity checking.

The AI writing detection percentage, on the other hand, shows the overall percentage of text in a submission that SimCheck’s AI writing detection model predicts was generated by AI writing tools.

Which AI writing models can SimCheck’s technology detect?

The first iteration of SimCheck’s AI writing detection capabilities has been trained to detect text generated by models including GPT-3, GPT-3.5, and their variants. It can also detect text from AI writing tools that are based on these models, such as ChatGPT. SimCheck plans to expand its detection capabilities to other models in the future.