Through a summer course as part of the Inspirit program, a program hosted by Stanford students, I wrote a research paper. I picked a research topic relating to NLP and plagiarism, as cheating is a problem close to my heart. I used a pretrained LLM based on BERT and two plagiarism datasets, MRPC and CPC.

One problem I ran into with the project was that the definitions of plagiarism were diverse. What does it mean to plagiarize? Verbatim copying? Paraphrasing? Taking ideas and rewriting them? The two datasets had slightly different definitions, leading to poor generalization. Another problem was that even a simple pair like "John hates Claire" and "Claire hates John" would be flagged as plagiarism! This happens because an LLM is trained by learning which words appear in which contexts. Two proper nouns show up in roughly the same spots in sentences, so the model never really learns that they are very different.

I found the experience very rewarding, and it made me realize just how much work goes into research papers. I appreciate all the papers I read a lot more. I also found it exciting how much work is left to do in plagiarism detection. I think it will become increasingly problematic as AI tools become more powerful.
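To make the "John hates Claire" confusion concrete, here is a deliberately simplified toy sketch (not the actual BERT pipeline from the paper): if a model's similarity score effectively ignores word order, as a plain bag-of-words comparison does, the two sentences look identical even though they mean opposite things. The function name `bow_cosine` is my own illustration, not anything from the project.

```python
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors (word order ignored)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm

# Same words, opposite meaning, yet maximal similarity (~1.0):
print(bow_cosine("John hates Claire", "Claire hates John"))
# Completely unrelated sentences score 0.0:
print(bow_cosine("John hates Claire", "dogs chase cats"))
```

A real BERT model does encode word position, but a classifier fine-tuned mostly on lexical-overlap cues can end up behaving much like this sketch on swapped-argument pairs.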