
I am no expert on the use of generative AI; all I can claim is that I learn by trying it out. Here, for better or worse, is one small lesson I learned inadvertently by playing with ChatGPT. The examples below do not, of course, have any statistical relevance, but I hope they are revealing – they certainly were an eye-opener for me.

I was prompted by an interesting post that asked the question “Can generative AI add anything to academic peer review?” At first sight, the answer seemed to me to be little or nothing.

Uploading a paper

Out of curiosity, I decided to upload a paper and see what results I got. So I went to the free ChatGPT site (GPT-3.5, which I was aware is not up to date) and asked it to review a paper, just as I would ask a human reviewer. I keyed in “is the following text worth publishing? I want to carry out a peer review on it”, but inadvertently forgot to add the text of the paper. I was astonished to get a lengthy reply, even though I hadn’t given it the paper! This is clearly better than a human could ever manage.

Here is the start of the response I was given:

It goes on for another five points. Now, all these recommendations are sensible, but of course they make no specific reference to the paper I had (not) submitted. It is clear that a generative AI tool of this kind has a built-in question-answering capability quite separate from any concept-based AI algorithm that might be used to assess the uploaded paper. Such an ability is very similar to the question-answering tools Google has been steadily creating over the past few years on the Google interface (see “Google: Mind finally united with Brain”).

After this surprise discovery, I uploaded the abstract of an article that had already been reviewed by a human. I picked a (published) life science article, entitled “How do eubacterial organisms manage aggregation-prone proteome?” There are plenty of examples of public review; I found one on the F1000 platform (which does open peer review). Again, I got detailed feedback, but it went beyond what I had submitted. Given that I had only submitted an abstract, why was ChatGPT stating “the introduction provides context for the research”, when there is no introduction? Why is there a reference to “the discussion section” when there isn’t one?

As before, ChatGPT responded with some helpful (but unspecific) procedural tips: for example, if you are going to write an academic article, it should have an introduction, a methodology section, some results and a discussion.
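The mismatch described above (a review that talks about sections that were never submitted) is the kind of thing that could be checked mechanically. What follows is only a rough sketch, not anything ChatGPT or any review platform actually does: the list of section names, the function names and the simple "the introduction"-style matching are all my own assumptions, and a real check would need to be far more careful.

```python
import re

# Hypothetical list of section names a review might mention (an assumption,
# not taken from any tool or journal guideline).
SECTION_NAMES = ("introduction", "methods", "methodology",
                 "results", "discussion", "conclusions")


def sections_in_submission(text):
    """Return section names that appear as standalone heading lines."""
    found = set()
    for line in text.splitlines():
        heading = line.strip().lower().rstrip(":")
        # Drop leading numbering such as "1." or "2.3".
        heading = re.sub(r"^\d+(\.\d+)*\.?\s*", "", heading)
        # Treat "results and discussion" as two section names.
        for part in re.split(r"\s+and\s+|\s*&\s*", heading):
            if part in SECTION_NAMES:
                found.add(part)
    return found


def sections_cited_in_review(review):
    """Return section names the review refers to, e.g. 'the introduction'."""
    lowered = review.lower()
    return {name for name in SECTION_NAMES
            if re.search(r"\bthe " + name + r"\b", lowered)}


def unsupported_sections(submission, review):
    """Sections the review mentions that the submission does not contain."""
    cited = sections_cited_in_review(review)
    return sorted(cited - sections_in_submission(submission))


if __name__ == "__main__":
    abstract_only = ("we found that the aggregation propensity of "
                     "proteomes increases with GC content")
    review = ("The introduction provides context for the research. "
              "The discussion section could be clearer.")
    print(unsupported_sections(abstract_only, review))
    # prints ['discussion', 'introduction']
```

A check like this says nothing about whether the review is any good; it only flags feedback that cannot have come from the text that was actually supplied.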

What about the article itself? Again, using only the abstract, ChatGPT stated “this article addresses an important and relevant topic in molecular biology and genomics”. Was that genuine, or was it simply based on the assertions of the authors, who state, as article authors always should, that their article has a novel deduction or discovery?

I’m no molecular biologist, but the system certainly identified the key assertion. I’m not so sure that it had (by implication) compared it with similar assertions in other papers, and hence concluded that the paper states something new, because it didn’t show its working.

Equally, I’m not sure that ChatGPT quite understood the assertion. ChatGPT states “contrary to previous expectations, the study suggests that eubacterial proteomes with lower GC content are more prone to protein aggregation”. The original text states “we found that the aggregation propensity of proteomes increases with GC content”.

Human peer reviewers identified the key statement as: “genomes containing a poor percentage of GC pairs lead to proteomes more prone to aggregation than those encoded by GC-rich genomes”. So the machine has identified the key assertion of the article; yet I’m not sure that it really evaluates it, apart from the bland statement that this article is “important and relevant”.

ChatGPT result (using ChatGPT-3.5) Sept 2023

Checking the full text

When I uploaded the complete article with all the text, including “results and discussion”, “conclusions”, and so on, I got some detailed (and itemised) recommendations – in fact, much better than many human reviews, which often tend to ramble. The results warrant a detailed comparison by a subject-matter expert (which I am not).

Peer Review of Manuscript: “How do eubacterial organisms manage aggregation-prone proteome?”

Summary: The manuscript investigates the relationship between eubacterial genome GC content and protein aggregation propensity, as well as chaperone-dependence. The study uses machine learning models to classify proteomes based on these factors. Contrary to previous assumptions, the research suggests that lower GC content is associated with increased protein aggregation and decreased chaperone dependence.

General Assessment: The manuscript addresses a relevant and intriguing question in molecular biology and genomics. It presents a well-structured research study that challenges established hypotheses and offers new insights. However, there are some points to consider for improvement:

1. Clarity and Organization: The manuscript is generally well-organized, with a logical flow from introduction to methods, results, and conclusions. It could benefit from clearer headings and subheadings to enhance readability.

2. Introduction and Background: While the introduction provides context for the research, it could be more comprehensive by including a broader review of relevant literature and prior studies in the field. A clearer rationale for the study and a description of the knowledge gap would enhance the introduction.

Identifying discredited claims

For a final test, I tried uploading a paper containing an assertion that is widely discredited. I tried a humanities paper, for a change, and I found one of the earliest (if not the earliest) claims that Francis Bacon was the author of Shakespeare’s plays. This paper is actually in the form of a letter, published in 1856, and digitised as part of the Google Books project.

The letter was written by William Henry Smith in 1856, and is entitled “Was Lord Bacon the Author of Shakespeare’s Plays? A Letter to Lord Ellesmere”. I asked the same question of ChatGPT: is this article worth publishing? And I got the following, remarkable (and accurate) response:

The manuscript you’ve provided is a historical document written by William Henry Smith in 1856. It appears to be a letter addressed to Lord Ellesmere, discussing the authorship of William Shakespeare’s plays and suggesting that Francis Bacon might be the true author. The text presents arguments and evidence to support this theory.

To assess whether this manuscript is worth publishing in a contemporary context, several factors need to be considered:

Relevance: The manuscript addresses a historical debate about Shakespearean authorship, which has been largely settled in favor of William Shakespeare as the author of the plays. The argument for Francis Bacon as the author is considered a minority viewpoint with limited scholarly support. In the context of current Shakespearean scholarship, this argument may not be considered relevant or credible.

Credibility: The manuscript presents an alternative theory about the authorship of Shakespeare’s plays, but it lacks strong, widely accepted evidence to support this claim. Contemporary scholars have extensively researched and analysed Shakespeare’s works and have found substantial evidence linking them to William Shakespeare. Therefore, the manuscript may not be considered credible in the context of modern scholarship.

Historical Interest: While the argument presented in the manuscript may not hold up to contemporary scholarly scrutiny, it can still be of historical interest as it reflects the debate and theories that were circulating in the mid-19th century. It provides insight into the evolution of Shakespearean authorship theories.

Academic Contribution: If the goal is to contribute to academic scholarship on Shakespearean authorship, contemporary scholars would expect a more rigorous analysis of evidence and engagement with current research on the topic. The manuscript appears to rely on conjecture and speculation rather than empirical evidence.

Readability and Style: The manuscript’s language and style are consistent with the period in which it was written, which may make it less accessible to modern readers. Publishing it as a historical document may require contextualization and explanation for contemporary audiences.

In summary, while the manuscript may have historical value as a reflection of 19th-century debates on Shakespearean authorship, it is unlikely to be considered a credible or relevant contribution to contemporary Shakespearean scholarship. Its publication today would likely be more for historical and archival purposes rather than for advancing our understanding of Shakespeare’s works.

So ChatGPT passed this test with flying colours. It could only have reached this conclusion by first finding the article in a corpus, and then having some awareness of subsequent discussion of the article (there is an article in Wikipedia on “The Baconian Theory of Shakespeare Authorship”, for example).


What conclusion can be drawn? First, generative AI has been designed always to provide an answer, even if you haven’t asked the question properly. Secondly, until and unless it reveals what it has or has not compared, our use of generative AI will always be based on trust rather than on evidence. On the other hand, in some respects, such as giving sensible advice and spotting discredited (and out-of-date) claims, ChatGPT is remarkable. It is clear there is a place for it in the academic process; the next question is how to integrate it sensibly.