A Technology Wishlist from the Sage Research Integrity Team

By Jen Rymont

Interest in research integrity and ethics in academic publishing has skyrocketed in recent years. With the rise of paper mills, AI, and a news story almost every week about data manipulation scandals, it’s no surprise that the academic community is increasingly focused on these issues. This newfound attention has led to a rise in research integrity startups: companies creating research integrity tools in the hopes of combating these modern threats.

At Sage, our Research Integrity (RI) team recognizes the critical need for advanced RI technology to manage a growing caseload effectively. As part of our work with the STM Integrity Hub, we’ve worked with startups to harness the power of technology to address these challenges in academic publishing. We are actively seeking tools that can enhance our investigative efforts and proactively address problematic content before it even reaches our desks.

So, we decided to explore all the possibilities: what’s on our technology wish list? What kind of tool would get us excited, be a game changer, and seamlessly fit into the work we do every day?

Here’s what tops our list:

  1. An integrated AI detection tool in submission systems: As AI-written content becomes increasingly common in journal submissions, the academic community has largely come to accept it as part of our future [see also here]. However, research integrity experts are grappling with how AI will be part of this future and how we can ensure its ethical and transparent use. Currently, our team and editorial staff rely on experience to identify AI-written content. This is time-consuming, and as AI tools become more advanced, it becomes harder to differentiate AI-generated text from human writing. Although AI detection tools exist, embedding one to automatically screen submissions for suspicious or duplicated text would not only save time but would also act as a natural deterrent to unethical use of AI in academic work. (While this type of tool would be incredibly useful, human nuance is still needed to make the final decision, both to avoid snap judgements and to avoid unfairly targeting non-native English speakers; read more about the potential drawbacks of relying on AI in Christa’s blog post.)

  2. An integrated tool for detecting dummy emails in submission systems: One of the biggest indicators of paper mill activity is emails that bounce back, or email addresses that appear inconsistent or unusual. However, these suspicious or fake email addresses are often only identified after a complaint about a published paper has been filed, by which point the damage to the academic record has already been done. We would love a tool capable of detecting suspicious emails at the point of submission. Key indicators might include whether the email is active, whether it can receive mail, and whether the address or its domain was recently created (a rough sketch of what such checks might look like follows this list).

  3. A conflict of interest database: One of the most time-consuming tasks in peer review or research integrity work is vetting author information. This includes verifying an author's credentials for authorship and searching for suitable reviewers during post-publication review. Our current approach, which relies on online searches, is inefficient: it is hard to sift through search results quickly, and search algorithms are not designed for research integrity work. A database where you can quickly look up an author and their relationships to others, highlighting potential conflicts of interest, would enhance both our efficiency and efficacy. This kind of database could also offer detailed insights into researcher networks and funding affiliations.

  4. A database for grant numbers and institutional review board approvals: In our work, we frequently encounter the issue of paper mills recycling metadata across multiple publications. A system that allows us to verify registration codes and identification numbers (such as ethics approval numbers and grant numbers) upon submission would help us identify when a unique identifier is being recycled and confirm the authenticity of an article’s metadata; a simple illustration of this kind of duplicate-identifier lookup also appears after the list. This tool could also be useful for institutions, ensuring that no one is fraudulently claiming funding or approval using their name.

  5. An embedded tool for detecting unidentifiable institutions: During investigations into paper mills or articles containing fraudulent data, we frequently encounter researchers claiming affiliations that are not searchable, either citing departments that don’t exist or using the names of obscure or fictitious institutions. Given that most institutions and their departments have some sort of online presence, an embedded tool that automatically verifies an author’s affiliation would be invaluable for our author checks. While tools that verify institutional affiliation do exist, it is very labor-intensive to check this manually for every author. An embedded tool would give us a clearer picture of potentially problematic authors or papers, allowing us to focus our verification work more effectively and save time.
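
To make item 2 more concrete, here is a minimal sketch of the kind of checks such a tool might run at the point of submission. It is not a description of any existing product: it assumes the third-party Python packages dnspython and python-whois, and the disposable-domain list, 90-day threshold, and flag wording are purely illustrative.

```python
# Illustrative sketch only: a few heuristic checks on a submission email address.
# Assumes third-party packages dnspython (dns.resolver) and python-whois (whois).
from datetime import datetime

import dns.resolver   # pip install dnspython
import whois          # pip install python-whois

# Toy blocklist; a real deployment would use a maintained list of disposable providers.
DISPOSABLE_DOMAINS = {"mailinator.com", "guerrillamail.com"}


def email_flags(address: str, max_domain_age_days: int = 90) -> list[str]:
    """Return reasons the address looks suspicious (empty list if none)."""
    flags = []
    domain = address.rsplit("@", 1)[-1].lower()

    if domain in DISPOSABLE_DOMAINS:
        flags.append("disposable email provider")

    # No MX record usually means the domain cannot receive mail at all.
    try:
        dns.resolver.resolve(domain, "MX")
    except Exception:
        flags.append("domain has no MX record (may not receive mail)")

    # A very recently registered domain is another warning sign.
    try:
        created = whois.whois(domain).creation_date
        if isinstance(created, list):
            created = created[0]
        if created and (datetime.now() - created).days < max_domain_age_days:
            flags.append("email domain registered very recently")
    except Exception:
        pass  # WHOIS lookups often fail; treat as inconclusive rather than suspicious

    return flags


print(email_flags("author@mailinator.com"))
```

A production version would of course combine these signals with editorial judgement rather than rejecting submissions automatically.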
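For item 4, the core of the wished-for database is a duplicate-identifier lookup. The sketch below is a toy, in-memory version under our own assumptions: a hypothetical IdentifierRegistry that records which manuscripts have claimed each grant or ethics-approval number and flags reuse by an unrelated submission.

```python
# Illustrative sketch of a duplicate-identifier lookup; storage and matching rules are toy.
from collections import defaultdict


class IdentifierRegistry:
    """Maps identifiers (grant numbers, ethics approvals) to the manuscripts claiming them."""

    def __init__(self):
        self._seen = defaultdict(set)  # normalized identifier -> set of manuscript IDs

    def register(self, manuscript_id: str, identifiers: list[str]) -> list[str]:
        """Record a submission's identifiers; return any already claimed by another manuscript."""
        recycled = []
        for ident in identifiers:
            key = ident.strip().upper()
            if self._seen[key] - {manuscript_id}:
                recycled.append(ident)
            self._seen[key].add(manuscript_id)
        return recycled


registry = IdentifierRegistry()
registry.register("MS-001", ["IRB-2021-0457", "Grant 1912345"])
print(registry.register("MS-002", ["IRB-2021-0457"]))  # -> ['IRB-2021-0457']
```

A real system would sit on a shared database fed by publishers, funders, and institutions, so that recycled approvals and grant numbers surface at submission rather than after publication.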

We hope this information provides a clearer understanding of the current problems we face in research integrity and inspires the creation of new technologies. As previously mentioned, most of our checks are manual and at the mercy of Google’s search algorithms. Developing tools embedded in submission systems would help stop problematic papers from entering the academic sphere at the point of submission and reduce the time spent on investigations and retractions after the fact.

Works Cited

“‘The situation has become appalling’: fake scientific papers push research credibility to crisis point.” Robin McKie. The Guardian. Feb 3, 2024. https://www.theguardian.com/science/2024/feb/03/the-situation-has-become-appalling-fake-scientific-papers-push-research-credibility-to-crisis-point

“Harvard cancer institute moves to retract six studies, correct 31 others amid data manipulation claims.” Matt Egan. CNN Business. Jan 22, 2024. https://www.cnn.com/2024/01/22/business/harvard-dana-farber-cancer-institute-data-manipulation-claims/index.html

“Superconductivity scandal: the inside story of deception in a rising star’s physics lab.” Dan Garisto. Nature. March 8, 2024. https://www.nature.com/articles/d41586-024-00716-2

“USC neuroscientist faces scrutiny following allegations of data manipulation.” Corinne Purtill and Melody Petersen. Los Angeles Times. Nov 23, 2023. https://www.latimes.com/science/story/2023-11-24/usc-neuroscientist-faces-scrutiny-after-data-allegations

“Papers and peer reviews with evidence of ChatGPT writing.” Retraction Watch. https://retractionwatch.com/papers-and-peer-reviews-with-evidence-of-chatgpt-writing/

“AI in academia: It's time to stop being a salamander.” Benjamin Koelkebeck. Columbia Missourian. April 13, 2024. https://www.columbiamissourian.com/news/higher_education/ai-in-academia-its-time-to-stop-being-a-salamander/article_bb708aea-d25f-11ee-89ea-bf096bbb6bab.html

“The Spread of Artificial Intelligence in Academia.” Karuna Namala and Amber Guerrero. The Corsair. May 6, 2024. https://www.thecorsaironline.com/corsair/2024/4/29/the-spread-of-artificial-intelligence-in-academia

About the Author