Letter from Peter Wilmshurst to UCL President & Provost Prof Spence

Following the publication of our letter in the BMJ “Time to retract Lancet paper on tissue engineered trachea transplants” (doi: https://doi.org/10.1136/bmj.o498, published 02 March 2022), Peter Wilmshurst has written to UCL President & Provost Prof Spence. I reproduce his letter with his authorization. It is also cross-posted (or shortly will be) on Leonid Schneider’s blog.

4 April 2022,

Dear Professor Spence

On 18 May 2021 I wrote to you “to enquire what University College London (UCL) has done and plans to do about the fact that Professor Martin Birchall is a co-author of an article (Macchiarini P, Jungebluth P, Go T, et al. Clinical transplantation of a tissue-engineered airway. Lancet 2008;372:2023-30.) that has not been retracted despite being fraudulent.” At that time I presumed that you had enough integrity to realise that a fraudulent medical research article that was resulting in harm to patients should be retracted and enough common sense to realise that a cover up would harm the reputation of UCL. Since then the BMJ has published a letter from colleagues and I calling for retraction of the paper. The BMJ editors and lawyers checked the supporting evidence before publication. I am attaching a copy of the letter.

On 9 July 2021 UCL informed me that Professor Pillay had been appointed to investigate my allegations under UCL’s Research Misconduct Procedure. On 2 November 2021, UCL informed me that Pillay had decided: “The allegation is to be dismissed on the grounds of the substance of the concerns having been considered previously for the following reasons: As you know UCL has conducted a Special Inquiry into Regenerative Medicine at UCL and the Inquiry report was published in September 2017 which made a number of recommendations. The paper in question has been scrutinised by the Inquiry as well as two internal reviews at UCL, a House of Commons Select Committee as well as reviews by the Lancet itself. After careful consideration, I do not consider that you have submitted any new substantial evidence that alters the substance of the allegations that have already been addressed by all these various reviews.”

The new substantial evidence is the correspondence from Professor Castells in early 2018, in which he said the airway collapsed three weeks after the transplantation and it needed to be stented. That means that the main claim of the paper that the graft “had a normal appearance and mechanical properties at 4 months” was false. In addition, Castells said that the claimed improvement in lung function was also untrue. Therefore Pillay’s claim that I had not submitted any new substantial evidence is spurious, at least as far as UCL is concerned. Before 2018, the integrity of the 2008 Lancet paper had been doubted and the ethical basis had been questioned by distinguished surgeons, who pointed out that it was unethical to subject a patient to high risk experimental surgery without prior research demonstrating efficacy in an animal model, particularly when alternative conventional surgery was eminently feasible. The 2018 correspondence from Castells alters the substance of the allegations because it provides incontrovertible proof that the main claims in the paper are false and, because there is no possibility that this could be the result of inadvertent error, the correspondence is conclusive proof of fraud. UCL confirmed that the House of Commons Select Committee that Pillay referred to is the Science & Technology Committee.

There are a number of things that show that the Science & Technology Committee was very concerned that the 2008 Lancet paper has not been retracted. For example:

1. Sir Norman Lamb, who was then Chair of the Science & Technology Committee, added signposts in the previous Committee Report on Regenerative Medicine (2017) to alert readers to incorrect information about the 2008 Lancet paper. The subsequent Report on Research Integrity (2018) refers to the 2008 paper and some of the subsequent follow up research by Professor Birchall at UCL (i.e. RegenVOX). It says “misconduct processes have revealed that the research on using stem cells to support artificial trachea transplants is not reliable, and is based on exaggerated patient outcomes (see Box 2). The ‘RegenVOX’ clinical trial of stem cell-based tissue-engineered laryngeal implants referred to above is now listed as ‘withdrawn’ on the Clinicaltrials.gov website. Having explored the issue of correcting the research record with our witnesses, we resolved to find a way of flagging the now contested evidence that the Committee received to readers of its report. We have arranged for a note to be attached at the relevant places in the online report with a forward reference to this inquiry. Our intention is to help readers of that earlier report to find further relevant information, not to alter the formal record of our predecessor’s work.”

2. After Sir Norman became aware of the email correspondence from Castells in 2018, he sent letters to the Lancet asking the journal to consider retraction of the 2008 paper. Our BMJ letter quotes from Sir Norman’s letter dated 7 March 2019 and UCL has previously been sent a copy of the letter. Sir Norman clearly believed that the emails from Castells were compelling substantial new evidence.

3. Sir Norman also appeared on BBC Television’s Newsnight programme questioning the Lancet’s failure to retract the paper. The link is https://www.youtube.com/watch?v=CzygeoqvjoM The first part of the clip deals with Shauna Davison who died the day after her discharge from Great Ormond Street Hospital when her trachea collapsed and she suffered asphyxia. The second part shows Sir Norman being interviewed on the Newsnight programme.

4. The Medical Research Council has a timeline of “Leading research for better healthcare”. It originally had two major advances for the year 2008. One was “2008 First stem cell-based windpipe transplant conducted”. In early 2019, Sir Norman criticised the MRC for refusing to remove that entry when the Lancet 2008 paper was known to be false. In May 2019, the MRC removed the 2008 paper from the timeline. Below are links to the current timeline https://www.ukri.org/about-us/mrc/who-we-are/timeline/2000-to-present-day/ and the one that was on the MRC website in early 2019

5. Sir Norman is no longer a Member of Parliament, but he has seen our BMJ letter and amongst his other comments, he said that the failure to retract the paper “beggars belief”.

A question remains whether correspondence from Castells was new substantial evidence as far as UCL was concerned. Pillay maintains that the correspondence from Castells does not add to the scrutiny by the Special Inquiry into Regenerative Medicine at UCL and two internal reviews. The Report of the Special Inquiry was published in September 2017. That was before the date on which the fraud was confirmed by the correspondence in May 2018. So obviously the Report does not mention the correspondence between Castells and the Lancet. However, it does raise other concerns about Birchall. For example, it points out that the cell preparation in Bristol took place in a building not licensed under the Human Tissue (Quality and Safety for Human Application) Regulations 2007. The Regulator made a decision not to prosecute the Bristol team for the breaches of the regulations. In addition there is evidence that the four day incubation of the donor trachea with so-called “stem cells” started in Bristol, because the trachea was transported to Barcelona on 10th June, only two days before the operations. Accordingly the trachea should have been classed as an Investigative Medicinal Product in the UK, requiring regulatory approval from the MHRA, but no approval was obtained. Birchall’s attitude to regulations designed to protect patients is illustrated by his statement about the 2008 Lancet paper in an interview to Vogel (Science, 19 April 2013, volume 340, pages 266-8) “We ran rough-shod over regulations – with permission.” In fact there was no permission. He also said “It wasn’t done to the highest possible standards.”

UCL has refused my Freedom of Information requests for the reports of the two internal reviews that Pillay referred to. The reason given by UCL is “Whilst recognising that there is a strong public interest in this area of research, there is also a need for a safe space away from external influence in which allegations of research misconduct can be reviewed and decisions taken. If there was an expectation that these discussions would be disclosed to the public this would inhibit free and frank discussion and would lead to poorer decision-making. For this particular process, the need to ensure robust decision making is considered significant to maintain the integrity and effectiveness of the process itself.” While the Information Commissioner’s Office is considering my appeal against UCL’s decision, I made an FOI request for UCL to answer the following questions:

1. Was one of the internal reviews that Professor Pillay referred to titled “Allegation of research misconduct against Professor Martin Birchall, Professor Paolo Macchiarini and Professor Alexander Seifalian. Report of the Screening Panel”, which resulted from allegations made by Professor Pierre Delaere in January 2015?
2. Was one of the internal reviews that Professor Pillay referred to titled “Allegations of research misconduct against Professor Martin Birchall from Professor Patricia Murray. Report of the Screening Panel” with the report dated December 2018?
3. If one or both of the two reports mentioned in questions 1 and 2 were not the reports of the internal reviews that Professor Pillay was referring to, what were the titles of the reports, when were they completed, who made the complaints that resulted in the internal reviews and when did UCL receive those complaints?

UCL has refused to answer those questions for essentially the same reason that it refused to provide the reports of the internal reviews. UCL said “Whilst recognising that there is a strong public interest in this area of research, there is also a need for a safe space away from external influence in which allegations of research misconduct can be reviewed and decisions taken. UCL relies on individuals coming forward with complaints of academic misconduct, which they may be less likely to do if they thought the fact they had made a complaint might be made public.”

The reason given by UCL for being unwilling to provide a “yes / no” response to questions 1 and 2 is incomprehensible, because I already know the names of those internal reviews, I have copies of both reports and one of the reports is available on the internet. If these two internal reviews are the ones that Pillay was relying on, it calls into question his judgement and his integrity, because neither considered the evidence from Castells. In addition, subsequent events show that the two internal reviews provided false reassurance about the integrity of UCL employees, which raises additional concerns about the rigor of UCL’s internal review processes. Therefore it is worth considering the findings of the two internal reviews that I believe Pillay was referring to.

In his complaint, Professor Delaere alleged misconduct by Birchall and Professor Seifalian, who were at the time employed by UCL, and by Macchiarini, who had left his honorary professorship at UCL by the time the report was produced in late 2015. That was before the proof of fraud from Castells became available in 2018. So the report did not consider that evidence. The three UCL professors had been co-authors of a 2011 Lancet paper (Macchiarini P, et al. Tracheobronchial transplantation with a stem-cell-seeded bioartificial nanocomposite: a proof-of-concept study. Lancet 2011;378(9808):1997-2004). In addition, Birchall was named as senior author and Seifalian was a co-author of a 2012 Lancet paper (Elliot MJ, et al. Stem-cell based, tissue engineered tracheal replacement in a child: a 2-year follow-up study. Lancet 2012;380(9846):994-1000). In paragraph 17 of the report of the 2015 UCL internal review (screening) panel it was “noted that the published report on the 2011 synthetic tracheal transplant case, which had included Professor Seifalian as a co-author, was one of six published articles that had been reviewed by four surgeons at the Karolinska University Hospital and cited by them in their allegation of scientific misconduct against Professor Macchiarini on the grounds that the results published by him as the lead author did not appear to correlate with the patients’ actual clinical outcomes. However, the Panel noted that no reference had been made to Professor Seifalian in this allegation, and it determined that there was no prima facie evidence to suggest that Professor Seifalian could be held to account for any of the major inconsistencies or inconsistent and omitted clinical information that had been highlighted by the Karolinska surgeons in their report.”

The 2011 Lancet paper has now been retracted because it was fraudulent. Professor Seifalian manufactured at Royal Free Hospital / UCL some of the plastic trachea that were supposedly “seeded with the recipients’ stem cells” before they were implanted by Macchiarini when he was working at the Karolinska Institute – the plastic tracheas were not made to GMP (Good Manufacturing Practice) standards. Professor Seifalian was dismissed from UCL on 15 July 2016 for misconduct during his collaboration with Macchiarini. The 2015 screening panel “determined that there was no prima facie evidence that any research misconduct . . . . had taken place, but that there was nevertheless some substance to (Delaere’s) claim that there was a misleading element within the 2012 Lancet published report which had included Professor Birchall and Professor Seifalian as co-authors – namely with regard to the two figures within the report . . . . These figures had in the Panel’s view not given sufficient emphasis to the presence and possible contribution of the stent and omentum tissue wrap in the recovery of the child patient. Furthermore, the Panel felt that none of the evidence presented by Professor Birchall in this published report in fact serve to demonstrate that the addition of stem cells to the transplanted tracheal scaffold used in the patient case concerned played any therapeutic role in the functioning of the trachea and that none of the effects that were demonstrated in these published reports could be directly linked to the beneficial effects of stem cells.”

In addition, the 2015 screening panel “felt that Professor Birchall should be urged to give greater consideration to the need for clearer and more representative presentation of information and evidence in his published reports in order to support his assertions, to allow transparent and complete judgement by the scientific community, and to avoid exposure to further allegations of research misconduct, for example the presentation of misleading information, that might jeopardise his future research efforts and subject both himself and UCL to reputational risk. To this end, the Panel felt that Professor Birchall would be well advised to seek to check some of his assertions and the way that these were presented in his published reports with other senior colleagues and collaborators outside the co-authorship of his publications.”

If Macchiarini was solely responsible for the false claims about airway transplantation in the 2008 paper and Birchall, Macchiarini’s co-principal investigator, was blameless, how is it that the 2012 paper made misleading claims about tracheal transplantation when Birchall was its senior author and Macchiarini was not even a co-author? The complaint from Professor Murray raised further concerns about publications by Birchall, but they were unrelated to the 2008 Lancet paper. Murray’s complaint was also before the information from Castells was known. Although the internal review screening panel’s report was produced after Castells’s correspondence with the Lancet, the screening panel’s report does not mention either the 2008 Lancet paper or Castells’s correspondence.

Murray alleged use of the same images in two separate publications, which had different methods, and deliberate misuse of research findings to support an application for ethics approval. Birchall admitted six images in one paper should not have been used because they related to animal experiments in a different paper. Birchall blamed this on a mistake by a former UCL PhD student and “the scientist overseeing the publication”. Birchall also admitted inaccuracies in a PhD thesis and errors in a research ethics committee application.

In addition, I understand that UCL refused to investigate more serious allegations and said that University College Hospital and Great Ormond Street Hospital should investigate those. One of the more serious allegations was that Birchall and Professor Lowdell knew from work undertaken by their PhD student that freeze-thawing the trachea significantly weakened the structural integrity, making it more likely to collapse. But this information was omitted from all papers and applications for ethics approval from UCL. Failure to take this into account was the reason that Shauna Davison’s trachea collapsed on the day after she was discharged from Great Ormond Street Hospital and, as a result, this 15 year old child died from asphyxia.

From these documents, I do not gain the impression of an aberrant medical researcher. Rather I see a departmental culture of dishonesty and poor practice that UCL is trying hard to conceal. Therefore it is difficult to escape the conclusion that the reason UCL will not provide answers to my FOI questions is that those internal reviews did not consider the 2018 correspondence between Castells and the Lancet. If I am correct, disclosure of the information will confirm that Pillay has fabricated a spurious reason to avoid investigating the research fraud involving Birchall. I believe that if all the facts came to light, UCL would have to explain:

1. Why it employed Birchall and gave an honorary contract to Macchiarini on the basis of their fraudulent Lancet paper.

2. How enthusiasm for bogus science was used to justify lethal experimental surgery on young patients at hospitals associated with UCL.

3. How large amounts of publicly funded grants were taken by UCL for research predicated on

I would like to know what UCL is going to do about this scandal and about the apparent attempt at cover-up by Pillay.

Yours sincerely

Peter Wilmshurst

University Responsibility for the Adjudication of Research Misconduct, by Stefan Franzen

Stefan Franzen is a Professor of Chemistry at North Carolina State University. He is also a whistle-blower in a case of research misconduct that, eventually, after 10 years, led to the retraction of a 2004 Science article entitled “RNA-Mediated Metal-Metal Bond Formation in the Synthesis of Hexagonal Palladium Nanoparticles”.

What he learnt about research misconduct, he learnt the hard way.

Yet, whilst his personal experience of this specific controversy informs and nourishes the narrative, University Responsibility for the Adjudication of Research Misconduct is an academic book that has a much broader scope and ambition, as illustrated by the table of contents:

  1. Evolution in a Test Tube
  2. The Clash Between Scientific Skepticism and Ethics Regulations
  3. Scientific Discoveries: Real and Imagined
  4. The Corporate University
  5. The Institutional Pressure to Become a Professor-Entrepreneur
  6. The Short Path from Wishful Thinking to Scientific Fraud
  7. University Administration of Scientific Ethics
  8. Behind the Façade of Self-Correcting Science
  9. The Origin of the Modern Research Misconduct System
  10. Sunshine Laws and the Smokescreen of Confidentiality
  11. The Legal Repercussions of Institutional Conflict of Interest
  12. Bursting the Science Bubble

I encourage you to read the book. Here I want to discuss a specific point, which Stefan Franzen considers in particular in Chapter 10: the issue of the confidentiality of integrity investigations. In short, Franzen argues that confidentiality is bad for the whistle-blower, bad for the person(s) whose work is questioned, but convenient for institutions that may want to use it as a smokescreen to limit damages to their reputation.

Let’s start with a quote of the first sentence of Chapter 10:

The contradiction between the confidentiality practiced by adjudicating institutions and the public nature of academic science causes disruption of every aspect of research misconduct investigations. Prior to adjudication, allegations would best be kept from public view, but this can rarely be achieved in a collaborative research setting. The difficult problem of reigning in rumors or protecting informants and respondents from repercussions to their careers is often ignored by university administrators, even though the purpose of the NSF OIG [National Science Foundation Office of Inspector General] confidentiality regulation is to protect the individuals involved.

Later Franzen notes that in most cases the number of people who are in a position to file an allegation is small (e.g. collaborators or competitors, who may have expressed concerns in the past) and that they are therefore easy to identify. He shows how, in his case, confidentiality was used to prevent him (or others with relevant expertise) from accessing relevant elements of the investigation (e.g. lab books or data that could have settled the case very quickly), but did not protect him as a whistle-blower: “In the hexagon case, everyone in the academic departments of both universities involved and many in the university administration knew who was involved in the case from the beginning.” (In France, Rémy Mosseri, who is the integrity lead for the CNRS, says that ~50% of the integrity cases that are reported to him are “collaborations that ended sourly”.)

Franzen also considers the case of accusations against more junior scientists (e.g. PhD students or post-doctoral researchers), where confidentiality could indeed serve to protect a vulnerable researcher, but where it also often serves to protect the supervisors from scrutiny in cases where mentoring problems may have contributed to the situation.

One further problem (alluded to in the first sentence of the chapter cited above) is the tension between correction of the scientific record and the determination of eventual sanctions. What is the priority and focus of integrity investigations? Is it to clarify and eventually correct the science, or is it to determine the seriousness of wrongdoings and propose appropriate punishments? Do these two goals go hand in hand, or, to the contrary, would prioritising one or the other lead to rather different procedures, in particular when it comes to openness versus confidentiality? It is my personal impression (from my reading and from the cases I am or have been involved in) that correction of science is not the priority in such investigations and that confidentiality indeed hinders correction of science too.

What is your experience (anonymous replies allowed)?

Down the rabbit hole of the Limit Of Detection (LOD)

This is a guest post by Gaëlle Charron, Maîtresse de conférences at Université de Paris.

In a post about SERS sensing hosted on this blog, I complained about LODs often being reported below the concentration range in which the sensor displays a linear signal vs. concentration response. Wolfgang Parak reacted here: he thinks this is not an analytical error. This is a good discussion to have, one that I have meant to formalise for years in order to introduce the concept to my students. Wolfgang gave me the decisive incentive, and for that I thank him. In the following, I will go into full tutorial mode for that reason. Feel free to skip some parts if you feel offended, or to reuse the material if you find it useful.

Wolfgang stated the following definition for the LOD:

I thought the typical definition of the LOD is the concentration in which the detection signal is at least three times bigger than the noise in the signal. This is an “all or nothing” response. At the LOD you can tell that “there is something”, but you can’t necessarily tell how much. The range of the linear response is much harder to achieve. 

I agree. This is actually the recommended IUPAC definition of the LOD.

The limit of detection (LOD), expressed as the concentration cLOD or the quantity qLOD is derived from the smallest measure yLOD that can be detected with reasonable certainty for a given analytical procedure. The value of yLOD is given by the equation

$$y_{LOD} = y_B + k\, s_B$$

where yB is the mean of the blank measures, sB the standard deviation of the blank measures and k is a numerical factor chosen according to the desired confidence level (note that the original notations have been modified to be in line with the ones used below).

Generally, a value of 3 is chosen for k; it corresponds to a 93% confidence level. But more on that later.
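As a minimal sketch of how this definition translates into a calculation, the snippet below estimates yB and sB from replicate blank readings and applies yLOD = yB + k sB with k = 3. The readings are made-up absorbance values and the function name `detection_threshold` is hypothetical. Converting yLOD into a concentration cLOD would additionally require a calibration curve, which is not covered here.

```python
from statistics import mean, stdev


def detection_threshold(blank_readings, k=3.0):
    """Return (yB, sB, yLOD), with yLOD = yB + k * sB."""
    y_b = mean(blank_readings)
    s_b = stdev(blank_readings)  # sample standard deviation (n - 1 denominator)
    return y_b, s_b, y_b + k * s_b


# Hypothetical replicate readings of the blank cuvette (absorbance units).
blank = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.008, 0.011, 0.010, 0.012]

y_b, s_b, y_lod = detection_threshold(blank)
print(f"yB = {y_b:.4f}, sB = {s_b:.4f}, yLOD = {y_lod:.4f}")
```

With only ten replicates the estimate of sB is itself quite uncertain, so the resulting threshold should be read as indicative rather than exact.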

Let’s dive into the statistics that underpin this definition, or skip it if you want. Say you acquire several measurements of a blank sample and of analyte samples, for instance by recording replicate absorption readings of a spectrophotometric cuvette filled either with pure water or solutions of the analyte. Let’s focus on the blank sample. Because of the natural dispersion of measurements, you will not get the same readings each time. Actually, if you have acquired a large number of measurements, the frequency distribution of the readings will have a bell shape, that of a Gaussian distribution characterised by the mean signal of the blank, yB, and its standard deviation sB (if you have acquired fewer measurements, say 10, it will look a lot more like ASCII art).

If you perform an extra measurement, there is a 15.9 % chance that it will give a reading above yB + sB because of the properties of the Gaussian distribution:

[Figure: Gaussian distribution of the blank readings with its standard deviation bands, adapted from the Wikimedia file Standard_deviation_diagram.svg]

There is only a 2.2% chance it will give a reading above yB + 2sB (point P). Therefore if you blindfold yourself, pick a cuvette on the cuvette rack, somehow manage to perform a measurement without ruining your shoes and obtain a reading of yB + 2sB, the odds that it is the blank sample are only 2.2%. In other words there is a 97.8% chance that what you did measure while blindfolded was not the blank sample but an analyte sample. yB + 2sB would be a nice cut-off value to avoid claiming the presence of an analyte when in fact it is absent, namely to avoid reporting a false positive. When the signal is above this value, you have a 97.8% probability of being right when claiming it is an analyte sample.
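The tail probabilities quoted above can be checked numerically. The sketch below assumes purely Gaussian noise and uses the complementary error function from the Python standard library; the helper name `upper_tail` is hypothetical. The exact one-sided tail beyond two standard deviations comes out at about 2.3%, slightly above the 2.2% quoted in the text, a difference that presumably comes from reading rounded bands off the diagram.

```python
import math


def upper_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Chance that a single blank reading lands above yB + z * sB.
for z in (1, 2):
    print(f"P(reading > yB + {z} sB) = {100 * upper_tail(z):.2f} %")
```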

Let’s temporarily pick yB+2sB as a cut-off value to discriminate between the blank sample and the analyte samples. A reading below that value is assigned to the blank cuvette, a reading above is assigned to an analyte cuvette. This will efficiently avoid false positives (with 97.8% confidence) but will lead to plenty of false negatives.

Indeed, say one of the analyte samples has a true mean signal of exactly yB + 2sB. Upon acquiring lots of replicate measurements of that sample, you will also get a Gaussian frequency distribution of the readings. For the sake of simplicity for the moment, let’s assume that it will have the same width as that of the blank sample, i.e. the same standard deviation. Let’s pick one of those measurements at random. There is a 50% chance that it is greater than yB + 2sB. Upon applying the yB+2sB criterion, that measurement would have been correctly assigned to an analyte sample. Let’s draw again. This time the value is lower than yB+2sB; the chance of that was also 50%. Applying the yB+2sB criterion, one would incorrectly assign that measurement to the blank sample. One would therefore be wrong and report a false negative with 50% probability. That yB+2sB criterion is not so good after all.

Ideally, one would like to avoid both false positives and false negatives efficiently. It is then better to pick a cut-off signal value further away from the mean of the blank sample. Let’s put that new cut-off twice as far as previously, at yB + 4sB. Let’s also put an analyte cuvette with a true mean reading of exactly yB + 4sB on the cuvette rack, along with the blank cuvette, and let’s put the blindfold back on. You pick one of the 2 cuvettes, press measure and you get a signal of yB + 2sB, below the cut-off of yB + 4sB. In claiming that the mystery cuvette is not that of the analyte sample when in fact it is (false negative), you have a probability of being wrong of only 2.2% because that reading of yB + 2sB is 2 standard deviations away from the true mean of yB + 4sB of the analyte sample. In claiming that it is not the blank when in fact it is (false positive), you have a 2.2% chance of being wrong because that reading of yB + 2sB is 2 standard deviations away from the true mean of yB of the blank sample. At point P exactly, the signal is as likely to arise from the blank as from the analyte. But as soon as you diverge from P, one cuvette assignment becomes markedly more likely than the other. P is therefore called the decision point.

How can you put it to use? Let’s take a huge cuvette rack with 200 grooves in it. And let’s put 100 cuvettes of the blank sample and 100 cuvettes of an analyte sample with a true mean signal of yB+4sB in it, in a random order. With the blindfold on, let’s measure these cuvettes and sort them out according to the readings: less than yB+2sB, the cuvette goes to the “blank” rack; more than yB+2sB, the cuvette goes onto the analyte rack. Once the blindfold is off, we will find ourselves with a blank rack with 2 or 3 analyte cuvettes misplaced (false negatives) and an analyte rack with 2 or 3 blank cuvettes misplaced (false positives). Not bad, is it?
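The blindfolded sorting experiment can be mimicked with a small simulation. This is only a sketch under the same assumptions as the text (Gaussian noise, identical standard deviation for both samples); readings are expressed in units of sB with yB set to 0, and the function name, seed and defaults are arbitrary choices made for illustration.

```python
import random


def sort_cuvettes(n_each=100, analyte_offset=4.0, cutoff=2.0, seed=0):
    """Simulate one reading per cuvette, in units of sB with yB set to 0.

    Returns (false_positives, false_negatives).
    """
    rng = random.Random(seed)
    blank_readings = [rng.gauss(0.0, 1.0) for _ in range(n_each)]
    analyte_readings = [rng.gauss(analyte_offset, 1.0) for _ in range(n_each)]
    # Blank cuvettes that end up on the analyte rack.
    false_positives = sum(r > cutoff for r in blank_readings)
    # Analyte cuvettes that end up on the blank rack.
    false_negatives = sum(r <= cutoff for r in analyte_readings)
    return false_positives, false_negatives


fp, fn = sort_cuvettes()
print(f"blank cuvettes on the analyte rack: {fp}")
print(f"analyte cuvettes on the blank rack: {fn}")
```

Changing the seed gives a different draw each time; over many runs the two counts should average a little over 2 per 100 cuvettes, in line with the 2 or 3 misplacements expected above.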

In general, a cut-off of yB + 3sB is chosen instead of my personal pick of yB+4sB. The decision point P is therefore 1.5 sB away from the blank and from the analyte sample having the smallest signal that can be confidently distinguished from that of the blank. At that point P, the probability of reporting the absence of the analyte when it is in fact present is about 7% (have a look at this standard normal distribution quantile table). The probability of reporting the presence of the analyte when it is in fact absent is also 7%. Overall, in choosing a mean signal of yB+3sB as the smallest measure that can be distinguished from the blank, one takes a 7% risk of being wrong when linking the signal to the presence or absence of the analyte. The presence or the absence of the analyte, that is the all or nothing that Wolfgang was referring to.
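To see how the choice of k maps onto the risk of a wrong call, here is a short sketch that places the decision point halfway between the blank and a sample with mean signal yB + k sB, and evaluates the Gaussian tail beyond it. It assumes equal standard deviations for both samples, and the function name is hypothetical.

```python
from statistics import NormalDist


def error_rate_at_decision_point(k):
    """Risk of a wrong call when the cut-off sits halfway between yB and yB + k*sB.

    Assumes Gaussian noise with the same standard deviation for both samples.
    """
    return 1.0 - NormalDist().cdf(k / 2.0)


for k in (2, 3, 4):
    risk = error_rate_at_decision_point(k)
    print(f"k = {k}: error rate = {100 * risk:.1f} %, confidence = {100 * (1 - risk):.1f} %")
```

For k = 3 this gives an error rate just under 7%, which is where the 93% confidence level comes from.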

How should that limit of detection be practically determined? The definition we have explored above relies on the knowledge of the true mean value of the measurement of the blank yB, of its true standard deviation sB and likewise of the true mean of the measurement of the LOD sample yLOD. And when I say true mean, I mean enough measurements for the frequency distribution to actually look like a bell and not like a quick and dirty Lego tower. The above definition also relies on the assumption that the measurements of both blank and LOD samples will display the same standard deviation. If the standard deviation of the LOD sample, sLOD, is smaller than sB, then at point P the rate of reporting a false negative will be smaller than 7%, so it would not impair the efficiency of the discrimination. However, if sLOD > sB, then at point P the rate of false negatives would be higher than 7%. Higher to the point of being unacceptable? That depends on how much greater sLOD is and also on the application. But as a safety margin, it would be better not to assume anything about sLOD and just to measure it by recording plenty of readings.

Back to the practicals. In a spectrophotometry example, one would have to perform repeated measurements of the blank cuvette. One would then have to prepare cuvettes of analyte solutions at different concentrations and make repeated readings of them to try and pinpoint by trial and error the concentration giving rise to a mean signal of exactly yB + 3sB (for a confidence level of 93%). One would have to accurately measure its standard deviation to ascertain the actual confidence level. That concentration would then be the LOD. There, one should reward all these efforts with a nice cup of tea.
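The effect of unequal standard deviations discussed above can be illustrated numerically. The sketch below keeps the decision point at yB + 1.5 sB and computes the false-negative rate for a sample with mean yB + 3sB whose standard deviation is a chosen multiple of sB. The ratios tried are arbitrary illustrative values, the function name is hypothetical, and Gaussian noise is assumed throughout.

```python
from statistics import NormalDist


def false_negative_rate(sd_ratio, k=3.0):
    """False-negative rate at the decision point yB + (k/2)*sB.

    sd_ratio is sLOD / sB; readings are in units of sB with yB set to 0.
    """
    decision_point = k / 2.0
    lod_sample = NormalDist(mu=k, sigma=sd_ratio)
    # Probability that a reading of the LOD sample falls below the decision point.
    return lod_sample.cdf(decision_point)


for ratio in (0.5, 1.0, 2.0):
    rate = false_negative_rate(ratio)
    print(f"sLOD = {ratio} sB: false-negative rate = {100 * rate:.1f} %")
```

Under these assumptions, doubling the standard deviation of the LOD sample pushes the false-negative rate above 20%, which shows why sLOD is worth measuring rather than assuming.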

So how is the LOD usually determined? In many, many instances, in most instances actually, it is not done in this way. I took a good look at how it was done in 5 of the papers that were highlighted in the review which was the object of my initial post, 5 random picks out of the 9 mercury detection papers that reported LOD below the lower limit of the concentration range in which a linear response was observed. I don’t think it useful to name names here. I did find problematic data treatment in 4 of them, and one minor problem in the last one. The mere fact that I could so easily find 4 papers with major analytical problems in them in a table that contains 15 references speaks volumes about our collective issue with good analytical practices.

  • Paper 1: The shown data do not display any error bars or any signature of replicate measurements, and no measurement of the blank. Replicate measurements were acquired only for the upper value of the investigated concentration range (100 ppb). Extrapolating the relative standard deviation of the upper concentration value to the lower concentration value of the investigated range (1 ppb), I estimated the signal that would significantly differ from that of the 1 ppb sample. That signal falls within the range in which the signal vs. concentration plot is linear (1 ppb to 100 ppb). I used that linear relationship to graphically determine a LOD: my estimate was 100 times higher than that stated (10 ppb vs. 0.1 ppb). This is of course an upper estimate of the LOD since it is derived from the lowest end of the investigated concentration range and not the blank. But the first concentration to give a signal significantly higher than that of the 1 ppb sample is 10 ppb (with 93% confidence) and the signal varies linearly with concentration on the 1-100 ppb range. Yet it is claimed that a blank sample can be distinguished with the same level of confidence from a 0.1 ppb sample. This is odd and would deserve thorough double checking by simply performing measurements of the blank.
  • Paper 2: A logarithmic concentration range was investigated, with replicate measurements (and error bars) for each of the concentrations. On the upper half of the investigated concentration range, the signal vs. logarithm of the concentration plot was linear. On the lower half, the signal dependence made a plateau. However, there were significant differences between the signals of contiguous concentrations, even in that non-linear section. The authors used these significant differences to reach an upper estimate of the LOD. They looked for the smallest pair of contiguous concentrations that would give signals being apart by more than three standard deviations (the standard deviations of the signals were similar for these low concentrations). This is exactly in the spirit of the IUPAC definition and it proves Wolfgang right: yes you can detect the presence of an analyte even out of the range where the signal dependence on the concentration is linear. You cannot say how much analyte there is but you can say with 93% confidence that it is not the blank sample. However, in this pair of contiguous concentrations that gave signals being apart by more than three standard deviations (10⁻⁷ and 10⁻⁸ M), the authors picked the lower concentration of the two as the LOD estimate. In my opinion they should have picked the greater concentration (10⁻⁷ M) of that pair since the lower one (10⁻⁸ M) is not statistically different from the next lowest contiguous concentration (10⁻⁹ M). But that is a detail, the methodology looks sound to me.
  • Paper 3: Again, a logarithmic concentration range is investigated here. The signal is plotted against the logarithm of the concentration (from 2 ppt to 1 ppb), with error bars on the data points, and fitted to a linear model. Obviously, the blank cannot be represented as a data point on that plot since log(0) is undefined. However, the text states that the LOD has been inferred from the standard deviation of the blank. A blank was therefore measured but the details of those measurements, its mean and SD values, are not reported (a horizontal line with a shaded envelope could have been added onto the graph to display yB and sB respectively). From the claimed LOD (0.8 ppt) and drawn error bars of the non-blank samples, I graphically searched for the signal value that was used as an estimate of the mean signal of the blank. It coincides with the intercept of the calibration curve with the drawn y-axis. Therefore I suspect that the mean blank signal yB was inferred by extrapolation of the calibration model. I also suspect that the standard error of that calibration model (inferred from the residuals) was used as an estimate of the SD of the blank.
  • Paper 4: Another logarithmic concentration range here. The text states that the signal of the “background” was measured but there is no hint of those measurements (as a line with envelope for instance) on the signal vs. logarithm of the concentration plot. The stated method for estimating the LOD seems to diverge from the IUPAC definition inasmuch as it does not refer to the standard deviation of the blank: “The LOD was calculated without Hg²⁺ giving SERS signal at least three times higher than background.” The claimed LOD is 0.45 ppt. Yet on the data, the signals recorded for the standards at 0.5 and 1 ppt do not appear significantly distinct and the signal vs. concentration plot over the range 0.5 ppt – 5 ppb has a sigmoidal shape. So I would doubt that the blank signal is significantly different from that of a 0.45 ppt sample.
  • Paper 5: A linear concentration range was explored here (1.1-61.1 nM). There are error bars on the data points. A blank sample does not seem to have been measured: no mention of it in the text and no display on the graph. Using the error bar and the mean of the lowest investigated concentration, I determined the value of a signal that would be three standard deviations higher. It fell into the signals obtained for the investigated concentration range. I then graphically read an estimate of the corresponding concentration: it was about 10 times higher than the claimed LOD (10 nM vs. 0.8 nM). I could not figure out how the claimed LOD was derived.

To summarize:

  • Yes, Wolfgang has a point. One can have the capacity to say that an analyte is present with fair confidence (to detect it) outside of the range where you can measure it (the Paper 2 example).
  • But I also have a point: for this to happen, you need to acquire data outside of the range in which you have established a calibration model. You cannot say anything about a concentration range in which you have not actually performed measurements. And this is unfortunately often the case (Papers 1, 3, 4 and 5).
  • The quality of these blank measurements matters a lot as well. If you do not acquire enough readings, your estimate of the mean (yB) is likely to diverge from the true mean because your frequency distribution won’t be smooth enough to see the true mean with precision. The more measurements you make, the smoother the frequency distribution, the closer your estimate gets to the true mean. For instance, if you estimate the mean from 3 measurements, you know with 95% confidence that the true mean lies within ± 4.30 ⨯ sB/√3 ≈ ± 2.48 ⨯ sB about your estimate of the mean. If you record 10 measurements, that interval shrinks to ± 2.26 ⨯ sB/√10 ≈ ± 0.71 ⨯ sB.
  • Often, people infer the properties of the blank sample through extrapolating the linear portion of the signal vs. concentration (or logarithm of concentration). This can go very wrong. I will spend some time on it in the last section of this really, really long post.
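The confidence intervals quoted in the third point above can be reproduced in a few lines. This is a minimal sketch: the critical values are the usual two-sided 95% Student t values from a standard table, hard-coded here to avoid any dependency, and the function name is illustrative.

```python
from math import sqrt

# Two-sided 95% Student t critical values from a standard table,
# keyed by degrees of freedom (number of readings minus one).
T_95 = {2: 4.303, 4: 2.776, 9: 2.262, 29: 2.045}

def ci_half_width(n_readings: int, sd: float = 1.0) -> float:
    """Half-width of the 95% confidence interval of the mean.

    With sd = 1.0 the result is expressed in units of sB.
    Only the numbers of readings listed in T_95 are supported.
    """
    return T_95[n_readings - 1] * sd / sqrt(n_readings)

for n in (3, 5, 10, 30):
    print(f"{n:2d} readings: true mean within ± {ci_half_width(n):.2f} sB of the estimate")
```

The interval shrinks quickly at first and more slowly afterwards, which gives a rough idea of how many blank readings are worth recording.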

Why should we not infer the properties of a blank sample by extrapolation of a linear signal vs. concentration plot?

Let’s say that we have acquired measurement data for several analyte solutions spanning a linear concentration range. And we plot it. On this graph, I have expanded the linear regression line towards the y-axis. If the regression model is valid down to the zero concentration, the mean signal of the blank sample will be y0. If the data show homoscedasticity, namely a homogeneous variance throughout the calibration range, we can give as an estimate of the standard deviation of the blank the same standard deviation as that observed for the data, s0.

Then we acquire repeated measurements of a blank sample, enough to have a smooth frequency distribution (let’s say 30). We calculate the mean yB and standard deviation sB. There are three possible mathematical options: yB is either equal, smaller or greater than y0. Let’s look at those three cases.

Case n°1: the two means yB and y0 do not differ significantly (as ascertained for instance by a t-test with yB, sB, y0 and s0 as arguments). This is a trivial case: the blank should be included in the calibration model. (The LOD would then fall within the linear calibration range. Yep, I can be irritating.)

Case n°2: the mean blank signal inferred from the measurements, yB, is significantly smaller than y0 (as ascertained by a one-tailed t-test). In this case, the true LOD is smaller than the one estimated from a y0 + 3s0 signal. It could also be smaller than the lower end of the linear calibration range (data point (c1,y1)). This would occur if yB + 3sB < y1. I could not recall any example of such a situation but that hardly counts as solid proof that it is impossible. At any rate, this proves the necessity of exploring concentrations outside the linear calibration range because the sensor could then have a use, a detection one and not a quantification one, outside the initially explored range.

Case n°3: the mean blank signal inferred from the measurements, yB, is significantly greater than y0 (as ascertained by a one-tailed t-test). An estimation of the LOD based on a y0 + 3s0 signal would then be wrong. What one can do is to check whether y1 and yB are significantly different. If not, then the true LOD needs to be looked for within the linear calibration range (there, I am still trying to push my point). If y1 and yB do differ significantly, the true LOD needs to be looked for between 0 and c1, by acquiring more data. This type of non-linearity is what you could observe in a protein assay for instance, when the fraction of proteins consumed by adsorption on the sidewalls of the cuvettes becomes non-negligible compared to the total amount in solution.
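The three cases can be sorted out with a short test. The sketch below rests on simplifying assumptions that are not part of the discussion above: y0 is treated as if it were the mean of n0 readings with standard deviation s0 (the true standard error of a regression intercept depends on the calibration design), a single two-sided test replaces the separate one-tailed tests, and a normal approximation stands in for the t distribution, which is only reasonable for about 30 readings or more. All names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def blank_vs_intercept(yB: float, sB: float, nB: int,
                       y0: float, s0: float, n0: int,
                       alpha: float = 0.05) -> int:
    """Return 1, 2 or 3 according to the three cases described in the text.

    1: yB and y0 do not differ significantly
    2: yB is significantly smaller than y0
    3: yB is significantly greater than y0
    """
    standard_error = sqrt(sB**2 / nB + s0**2 / n0)   # unequal-variance (Welch-type) form
    z = (yB - y0) / standard_error
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal approximation, large samples only
    if abs(z) <= z_crit:
        return 1
    return 2 if z < 0 else 3

# Hypothetical numbers, for illustration only.
print(blank_vs_intercept(yB=10.0, sB=1.0, nB=30, y0=10.1, s0=1.0, n0=30))  # case 1
print(blank_vs_intercept(yB=8.0, sB=1.0, nB=30, y0=10.0, s0=1.0, n0=30))   # case 2
print(blank_vs_intercept(yB=12.0, sB=1.0, nB=30, y0=10.0, s0=1.0, n0=30))  # case 3
```

For small numbers of readings, a proper t-test with the appropriate degrees of freedom should be used instead of the normal approximation.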

I wrote most of this stuff with the help of Statistics and chemometrics for analytical chemistry, by Miller & Miller. This handy little book has been used and copied so much in my group that its pages are coming apart, the definitive experimental demonstration that it is worth reading.

And with this, I sincerely hope I have not bored you to death. Thank you for your attention.

Editor’s note. Did you know that you can cite blog posts in the scientific literature? For example, you could cite this one as follows: Charron, Gaëlle. “Down the rabbit hole of the Limit Of Detection (LOD)”. Rapha-z-lab, 14/12/2021. https://raphazlab.wordpress.com/?p=5422

What’s a limit of detection anyway? Wolfgang Parak responds to Gaëlle Charron’s blog post

In her guest post (Sensing by Surface Enhanced Raman Scattering (SERS) : to the Moon and back down to Earth again) published last week, Gaëlle criticised SERS articles that reported a limit of detection (LOD) below the limit of the linear range:

The range onto which the sensor responds linearly, onto which the signal vs. concentration calibration model will be built, is 0.5-1000 nM. Yet a LOD of 0.18 nM is claimed. What happens to the signal dependence on the concentration below 0.5 nM is either too noisy or too flat to enter the calibration model or it has simply not been tested. Yet, the sensor is claimed to be operational at a concentration within this uncharted territory. Out of the 13 entries in the table dealing with mercury sensing, 9 display a LOD below the lower limit of the sensitivity range. Error is not incidental here, it is the norm.

Wolfgang does not agree that this is an error and he wrote to us. Here is his letter.

Dear Gaëlle and Raphaël

I am not sure if I understand one of your arguments. In my point of view the LOD, the limit of detection, can be lower than the lowest value of the linear range. I thought the typical definition of the LOD is the concentration in which the detection signal is at least three times bigger than the noise in the signal. This is an “all or nothing” response. At the LOD you can tell that “there is something”, but you can’t necessarily tell how much. The range of the linear response is much harder to achieve. Here the signal needs to go linearly with the concentration. I think you can have the case where for example at 1 nM you see a signal (3 times higher than noise), but for example at 2 nM the signal would not be twice. You see it, but due to noise and sensor response properties you could not really quantify how high the concentration is. You could in this case for example have from 10 nM to 500 nM a linear response, where the response really follows in a linear behavior to the concentration. Thus, for my understanding in general the LOD can be lower than the lower limit of the linear range.

I am not 100% sure about this, but this is how I understood the definition.

Best wishes, Wolfgang

What do you think? Can a LOD be below the linear sensitivity range of a sensor?

Guest Post: Sensing by Surface Enhanced Raman Scattering (SERS) : to the Moon and back down to earth again

This is a guest post by Gaëlle Charron, Maîtresse de conférences at Université de Paris.

I was about to submit a paper about the detection of atomic ions by SERS the other day. The paper had been in the pipeline for months. I went through a last survey of the recent literature to check for fresh references that it would have been unfair to leave out. That is when I bumped into a 22-page review about just that: Examples in the detection of heavy metal ions based on surface-enhanced Raman scattering spectroscopy.

It sucks, I thought, as cold sweat was pouring down my neck. I have been working on this “novel” idea for 9 years now. Revisiting the dyes developed as colorimetric indicators for the detection of metal ions through a SERS angle. SERS sensors exploiting not the absorption properties of the indicators, but their vibrational signatures. The first time I thought about it sometime in 2012, I was excited as a blinking Christmas tree. The literature about colorimetric quantification of metal ions, mainly from the 40’s, 50’s and 60’s was rich, reliable and pretty informative. Lots of options for commercial indicators, lots of experimental details, many of them put to use in classic lab courses. And above all, lots of thermodynamic constants to use to emulate the chemical system. It felt like I could do nano properly, to the quantitative standards of textbook chemistry. Obviously, many people would see the same opportunity, as every undergraduate chemistry student will have played with these complexometric indicators at one point or another in her curriculum. About a month after my initial epiphany, I discovered that Luis Liz-Marzàn had already killed the game. An ACS Nano paper about ultrasensitive chloride detection, in the pM range. And now a full review was out, with its 80 examples of metal ion detection by SERS. I was late.

Feeling moody, I dived into the review. A general introduction about how deleterious and ubiquitous metal contamination is, about how heavy metals are usually quantified and about the limitations of those methods. The classic primer about SERS and its accepted mechanisms. And then, metal target by metal target, examples of dedicated SERS sensors.

In the general introduction, a sentence caught my attention. More and more researchers have used SERS technology to detect and quantitatively or semiquantitatively analyse heavy metal ions in various environments. That sentence cites a paper of mine about the setting-up of a SERS sensor of Zn²⁺ in pure water, i.e. in the simplest of matrices, in a lab environment. Like in the other cited references associated with that sentence, my team did not use SERS technology to quantify a metal contaminant in the environment. We just examined the possibility of quantifying that contaminant with SERS. Much like examining the effectiveness of a drug to treat a disease does not mean that the drug is used to cure the disease.

Why does it make a big difference? For one, for the sake of accuracy. Then because there are many shortcomings to developing any new chemical analysis method. Is the sensitivity appropriate for the concentration range in which the target analyte will likely be encountered? Are the readouts true enough, precise enough? How much time, effort and money does it take to produce a readout? How likely is the method to work every day of the week when we switch the spectrometer on or open the fridge to reach for a nanoparticle batch? All of the above compared to the standard analytical methods? All of the above when analysing a typical specimen of the targeted samples and matrices?

The authors of the review acknowledged, at least partially, those potential pitfalls. Summary tables of examples of detection were given for each metal ion with ranges of linear response to analyte concentration, LODs and comments. The latter listed the following adjectives: sensitive, accurate, anti-interference, reliable, complicated, selective, low sensitivity, simple, rapid, low reproducibility. Yet, at no point in the review is the practicality of the reviewed SERS sensors discussed in comparison with the standard methods used by people actually performing chemical analysis of contaminants. It seems like those people were never consulted. Like the mad nano-scientists and the end-users were never put in the same room with a tea trolley.

The simplest illustration of this appears in the reported LODs and sensitivity ranges, for instance in the case of mercury quantification. The maximum concentration in drinking water set by the US Environmental Protection Agency is 10 nM. The concentration range for mercury in drinking water is the same as in rain, with an average of about 125 pM. Naturally occurring mercury concentration in groundwater is less than 2.5 nM. Yet Table 2 of the review lists a sensitivity range of 10 fM to 100 pM, fully irrelevant to flagging an abnormal concentration in drinking water, rain water or groundwater, or several sensitivity ranges unsuitable to assess the safety of drinking water (9.97 pM-4.99 nM, 4.99 pM-2.49 nM).
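The mismatch is easy to see once everything is put in the same unit. Here is a small sketch that uses only the figures quoted above, converted to nM. The variable and function names are made up for the illustration.

```python
# Reference concentrations quoted in the text, in nM.
REFERENCE_LEVELS_NM = {
    "EPA drinking-water maximum": 10.0,
    "natural groundwater upper bound": 2.5,
    "average in rain and drinking water": 0.125,
}

# Sensitivity ranges quoted from Table 2 of the review, converted to nM.
SENSOR_RANGES_NM = [
    (10e-6, 0.1),      # 10 fM to 100 pM
    (9.97e-3, 4.99),   # 9.97 pM to 4.99 nM
    (4.99e-3, 2.49),   # 4.99 pM to 2.49 nM
]

def covers(sensor_range: tuple, level_nM: float) -> bool:
    """True if the concentration lies inside the sensor's linear range."""
    low, high = sensor_range
    return low <= level_nM <= high

for low, high in SENSOR_RANGES_NM:
    for name, level in REFERENCE_LEVELS_NM.items():
        verdict = "covered" if covers((low, high), level) else "outside"
        print(f"range {low:g}-{high:g} nM, {name} ({level} nM): {verdict}")
```

None of the three quoted ranges reaches the 10 nM regulatory limit, which is the concentration a drinking-water sensor would most need to bracket.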

The focus of the description of those sensors is on the chemical schemes put to use, many of which sound rather overhyped.

Wang et al. created a dual signal amplification strategy based on antigen-antibody reaction to recognize copper ions (Figure 4c) [78]. Specifically, they started with decorating the multiple antibiotic resistance regulator (MarR) that worked as bridging molecules and 4-MBA served as a Raman reporter on the surface of AuNPs, and then Cu²⁺ ions generated disulfide bonds between the two MarR dimers by oxidizing cysteine residues, which induced the formation of the MarR tetramers, leading to the aggregation of AuNPs and the reinforcement of the SERS signal of 4-MBA. In the meantime, another substrate, AgNPs capped with anti-Histag antibodies combined with MarR (C-terminal His tag) to constitute dual hot spots and the reticulation of AuNP-AgNP heterodimers. The dramatic signal enhancement allowed the detection limit to reach 0.18 nM with a linear response in the range of 0.5-1000 nM.

Take a deep breath. And a drink. The smart mouth contest goes on and on.

Without quite a full chemical legitimacy I would say. In the previous example, you might have felt an itch. Let’s rewind and replay.

The dramatic signal enhancement allowed the detection limit to reach 0.18 nM with a linear response in the range of 0.5-1000 nM.

The range onto which the sensor responds linearly, onto which the signal vs. concentration calibration model will be built, is 0.5-1000 nM. Yet a LOD of 0.18 nM is claimed. What happens to the signal dependence on the concentration below 0.5 nM is either too noisy or too flat to enter the calibration model or it has simply not been tested. Yet, the sensor is claimed to be operational at a concentration within this uncharted territory. Out of the 13 entries in the table dealing with mercury sensing, 9 display a LOD below the lower limit of the sensitivity range. Error is not incidental here, it is the norm.

Also, as an undergraduate, I had learnt that an indicator abruptly changes speciation at an analyte concentration on the order of the dissociation constant of the indicator-analyte complex. A sensitivity in the pM range would call for a 10⁻¹² M dissociation constant, a magnitude that is only encountered with chelating ligands with many binding atoms and/or at high pH, conditions that were not discussed, and very seldom met in the reviewed examples. But that may be the object of a full discussion in itself: does anyone understand how the sensing actually occurs, I mean beside our fantasized chemical sketches?
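The link between dissociation constant and working range can be illustrated with the simplest possible model. The sketch below assumes a 1:1 indicator-analyte complex and an indicator present in such small amounts that the free analyte concentration is close to the total one. Both are simplifications introduced here, not claims about any of the reviewed sensors.

```python
def fraction_bound(free_analyte_M: float, kd_M: float) -> float:
    """Fraction of indicator bound to the analyte for a 1:1 complex.

    Follows from Kd = [indicator][analyte] / [complex].
    """
    return free_analyte_M / (kd_M + free_analyte_M)

analyte_M = 1e-12  # 1 pM of analyte
for kd_M in (1e-12, 1e-9, 1e-6):
    bound = fraction_bound(analyte_M, kd_M)
    print(f"Kd = {kd_M:.0e} M -> fraction of indicator bound at 1 pM: {bound:.1e}")
```

In this model half of the indicator is bound when the analyte concentration equals Kd. With a nanomolar Kd, only about one indicator molecule in a thousand would respond to a picomolar analyte concentration.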

I still went through the full review. The conclusion nearly had a point, too bad they did not think it was worth an actual discussion.

At present, SERS technology basically stays as a laboratory test, which still has a big challenge for the quantitative testing on-site and actual complex samples, so it cannot be regarded as one of the conventional detection assays.

(Damn right it isn’t. I have been chasing my own tail for 9 years.)

Closing the paper print, an image came to my mind. That of a father of three kids going to a car dealership, looking for a vehicle that could accommodate three child safety seats. To which the dealer presents a half-assembled Lamborghini.

–          Preliminary tests indicate that it can go to 200 km/hr in 10 s. It will have a DVD screen on the passenger side.

–          I doubt the 3 car seats will fit at the back.

–          Leather seats are included.

–          It is half-finished. Plus it has breadcrumbs all over and a large crack in the windshield.

–          Yeah. You might want to fix that before you drive with the kids.

Lamborghini Model SERS – Credit to Nathanaël Lévy (13 years old)

15 years of PLOS ONE and some stripes

PLOS ONE has an interesting blog post celebrating 15 years since that adventure began, with interviews of former members of the editorial team: Damian Pattinson, Ginny Barbour, Matt Hodgkinson, Iratxe Puebla and Joerg Heber. During that period PLOS ONE published a quarter of a million articles containing probably ~ one million figures, so I was amazed to see that the one figure included in that blog post comes from our paper on the interpretation of stripy nanoparticles images (Stirling et al., Critical Assessment of the Evidence for Striped Nanoparticles). To be fair, the reason for this choice has nothing to do with the merits of our article. Here is the relevant excerpt from the blog post:

In some cases, difficult editorial situations led to innovative solutions that further advanced PLOS’ mission. Iratxe Puebla, Associate Director at ASAPbio, remembers a time where PLOS ONE’s publication criteria and Open Access publishing model drove knowledge forward:

PLOS ONE was created to remove barriers: for authors to publish their work (related to scope or perceived impact) and for readers to access and reuse scientific content. Looking back at the many initiatives and papers I was involved with during my time at PLOS, there is one article that exemplifies this goal of facilitating openness.

In 2014 PLOS ONE handled a paper that reported a re-analysis of previous publications reporting the creation of “striped nanoparticles”. The authors completed a re-analysis and critique of those findings and wished to publish their work in a journal so that it would be part of the scientific record, on the same ground as the original articles. The authors had had trouble getting earlier critiques published in journals, and decided to submit the paper to PLOS ONE. This is where the first barrier went down: PLOS ONE would not reject the manuscript because it reported a re-analysis or because it relied on previously available data, the evaluation would focus on the rigor of the methodology and the validity of the conclusions.

The paper underwent a thorough peer review process and was accepted. But then we encountered a dilemma: the re-analysis required comparisons to images in the original publications where the journals owned copyright. Should we ask for permission to publish the images under a single-use license or request to republish them under the CC BY license used by PLOS ONE? While the former would have been the traditional (and easier) approach, we chose to pursue the latter. Why? Because the journal wanted to make all its content be available for reuse, for both humans and machines, without having to check individual figures in individual articles for the permitted uses. The PLOS ONE team worked with the authors and the publishers of the original articles, and we were pleased that they agreed to have the images republished under the CC BY license. As a result, the full article, including all images, is available for reuse without license-related barriers.

Finding that resolution was not so easy and we were quite frustrated at the time, as this post (How can we trust scientific publishers with our work if they won’t play fair? Julian Stirling) illustrates.

And the stripy nanoparticles saga tested PLOS ONE’s publishing platforms in other ways too. As the bottom of this post (Identity theft: a new low in the stripy nanoparticles controversy) recalls, it was too easy to create profiles & comment on papers with a false identity. A fake “Dr Wei Chen” and a fake “Dr Gustav Dhror” left 10s of comments on our article.

Do striped nanoparticles exist? Figure 3 from Stirling J, Lekkas I, Sweetman A, Djuranovic P, Guo Q, Pauw B, et al. (2014) Critical Assessment of the Evidence for Striped Nanoparticles. PLOS ONE 9(11): e108482. https://doi.org/10.1371/journal.pone.0108482

Allegations of research improprieties at Spherical Nucleic Acids company Exicure

Update 2 (14/12/2021): a class action has been filed by Bragar Eagel & Squire, P.C on behalf of stockholders

The complaint filed in this class action alleges that throughout the Class Period, Defendants made materially false and/or misleading statements, as well as failed to disclose material adverse facts about the Company’s business, operations, and prospects. Specifically, Defendants failed to disclose to investors: (1) that there had been certain improprieties in Exicure’s preclinical program for the treatment of Friedreich’s ataxia; (2) that, as a result, there was a material risk that data from the preclinical program would not support continued clinical development; and (3) that, as a result of the foregoing, Defendants’ positive statements about the Company’s business, operations, and prospects were materially misleading and/or lacked a reasonable basis.

Update 1 (14/12/2021): Exicure has filed an 8-K form (whatever that is). We learn that the Board of Directors of the Company (the “Board”) appointed Brian C. Bock, the Company’s Chief Financial Officer, as the Company’s Chief Executive Officer, replacing David Giljohann, effective December 10, 2021. He will serve as Chief Technology Officer of the Company through January 30, 2022, at which time he will separate from the Company and resigned as a member of the Board, effective December 10, 2021. So the scientist who developed the technology in Chad Mirkin’s group and who was one of the founders is exiting the company. Parts of the research program are winding down: On December 10, 2021, the Company announced its commitment to a plan to wind down the Company’s immuno-oncology program for cavrotolimod (AST-008) and the Company’s XCUR-FXN preclinical program for the treatment of Friedreich’s ataxia. […] This plan will implement a reduction in force where the Company will eliminate approximately 50% of the Company’s existing workforce on a staggered basis through January 2022 as well as other cost-cutting measures. […] Additionally, the Company is evaluating its facilities and contractual relationships utilized in the cavrotolimod and XCUR-FXN programs and the associated contractual obligations to determine the appropriate course of action and any associated charges to wind down the ongoing clinical trials. We also learn more about the results of the audit: the Audit Committee and the Company investigated statements made by Dr. Grant Corbett, the Company’s former Group Leader of Neuroscience. Dr. Corbett voluntarily resigned from the Company on November 8, 2021. As part of his resignation, he claimed that when he was employed by the Company, he intentionally misreported certain raw data related to the research and development of XCUR-FXN. […] The investigation revealed that: (1) beginning in the autumn of 2020, Dr. Corbett misreported raw data from certain research and development experiments related to XCUR-FXN; (2) Dr. Corbett misreported the results of at least three different experiments that were conducted through at least February 2021; (3) the misreported data related solely to efficacy rather than safety of XCUR-FXN; (4) the misreported data was included in various public presentations and SEC filings from as early as January 7, 2021 through as late as August 12, 2021; (5) Dr. Corbett acted alone in misreporting the data, without the assistance or knowledge of anyone else at the Company, including Company management and other research and development employees and did not inform anyone at the Company of his actions until his resignation in November 2021; (6) Company management reasonably relied on Dr. Corbett’s analysis when making public statements that included Dr. Corbett’s misreported data; and (7) no other Company program was impacted by Dr. Corbett’s misreporting of the XCUR-FXN data. […] The Board and the Audit Committee also intend to enhance the Company’s policies and procedures regarding data management and integrity.

Original post 22/11/2021

Last Tuesday, Twitter user @NanoSkeptic drew my attention to a report by the company Exicure that develops biomedical applications of Spherical Nucleic Acids. Exicure was founded in 2011 by David Giljohann, Chad Mirkin and Shad Thaxton.


The report mentions allegations of research improprieties:

on November 9, 2021, the Audit Committee of our Board of Directors was notified of a claim made by a former Company senior researcher regarding alleged improprieties that researcher claims to have committed with respect to our XCUR-FXN preclinical program for the treatment of Friedreich’s ataxia. The Audit Committee has retained external counsel to conduct an internal investigation of the claim. We are currently unable to predict the timing or outcome of the investigation. We are unable to determine the potential impact of the asserted claim on our research and development activities or the timing of completion of our current research and development of our XCUR-FXN preclinical program for the treatment of FA, as the investigation of the asserted claim remains ongoing. In connection with the ongoing investigation, securities class actions and other lawsuits may be filed against us, certain current and former directors, and certain current and former officers. Any future investigations or lawsuits may also adversely affect our business, financial condition, results of operations and cash flow.

Since Exicure’s Friedreich’s ataxia research is not published in scientific journals, there isn’t a lot more to say at this point. I have raised concerns about Exicure for some years…


It is interesting to note that when allegations of research improprieties are made about the research programme of a listed company, they have to be made public immediately (even if the research results have not been shared publicly), whereas when something similar happens, say, in a university lab, everything can stay secret for years until investigations are completed (even if the erroneous/fraudulent results have been published).

Guest post: Rewarding Reproducibility and Correction in Science

This is a guest post by Jan-Philipp Günther, Max Planck Institute for Intelligent Systems.

During my PhD, I was involved in multiple scientific discussions that focused on the reproducibility of scientific results. At first, we examined these results, which had been published in high-ranking journals, out of curiosity and to understand them further, but then discovered that the measurements were not reproducible. After determining the sources of errors (measurement artefacts), we published our results. This cost us a lot of time and energy and was mainly rewarded by positive feedback from colleagues at conferences, but I think that many scientists who try to understand and reproduce published results are not willing to make their findings public – and even if they try to do so, they might face resistance from publishers. I was also amazed to learn of other papers that several senior scientists in the community know to be irreproducible, but where this insight never appeared in print. As scientists, we know that mistakes can and will happen and that science is merely the process of advancing our understanding. Hence, publications that over time are no longer considered correct are a part of science, but repeating experiments and correcting the literature should be accepted and encouraged to an even greater extent. These corrections can save time and money and in rare cases (e.g., medical sciences) even lives. In the current era of bibliometrics, the pressure on scientists to regularly produce high impact papers and the pressure on journals to publish the most spectacular results as fast as possible have led to an increasing occurrence of errors and in some cases even fraud (see this project as an example to address “Scientific Misconduct and the Attempt of a Counterattack”). Additionally, the current system rewards irreproducible publications with more citations, where in most cases the citing article does not even mention the replication failure.
This makes me, as a young scientist, believe that action is needed to bring these problems to the awareness of scientists and publishers. During a recent scientific meeting with many fruitful discussions, I had the idea of using dedicated awards to encourage and reward scientists who revisit experiments, especially if repeating experiments reveals new insights or is able to correct the literature. Some ideas for awards are listed below, which will hopefully start a discussion on this topic within the community.

Award for reproducing published experiments

An award could be dedicated to scientists who spend time reproducing challenging experiments. This might encourage scientists to test published results, or to make their findings public if they have already run the experiments but did not find the time to publish their results. These studies should be awarded independent of the outcome. The focus should be on the effort invested, the challenges and the impact of the results. The award should also consider the contribution of young scientists who did the actual experiments, since this can in some cases be extremely time consuming and risky.

Advancing scientific reproducibility award

This award might be handed to individuals or organizations that foster the progression of scientific reproducibility and corrections through continuous effort or single innovations. Examples might be the establishment of an online tool for scientific exchange, a journal with generous correction policies, or a group of scientists who fought for a long time to correct a certain part of the literature against resistance. It is of course necessary to exclude ongoing scientific discussions, since it might be impossible to determine which side is correct, or it might be impossible to find an impartial committee or reviewers for the award.

Self-correction award

Quite often the original authors of papers gain additional insights or discover a mistake, but the stigma of correcting their own work or retracting the original publication is too big. Although many scientists seem to feel this way, it is not what I found in the vast majority of discussions I had with colleagues, who are in favor of self-correction and retraction and do not regard this as a stigma. Until the opinion that self-correction is a noble (and necessary) act has reached the majority of scientists, awards for such self-corrections could be implemented. The award should of course only be handed out if the candidates are willing to accept it.

Award for anonymous whistleblowers

Awards for whistleblowers might already exist, but this award should be dedicated to anonymous whistleblowers whose reports lead to scientific corrections. Especially in the case of scientific misconduct, it might be impossible for one of the authors to correct erroneous publications without the support of all coauthors. In this case the authors should be encouraged to alert other scientists anonymously with a public announcement. If this announcement leads to a correction of the scientific literature, the whistleblower can be nominated. The prize money should be donated to open science foundations, since it of course cannot be handed out to the awardee publicly. Hence, it is not the awardee who benefits, but rather the scientific community.

Best PubPeer comment

PubPeer.com has the potential to become a popular and valuable tool for scientific correction, but this might strongly depend on the culture of the scientific exchange. High quality and respectful comments could be encouraged with an award. This could also be implemented with a dedicated title, symbol or name tag on the website. Maybe only signed comments should be considered.

It would be beneficial if an independent institution could be founded to handle the awarding process. Application or nomination for the awards should be open to everyone. I hope this will lead to further discussion on the topic of scientific correction and will maybe someday help scientists to make the right decisions for the benefit of us all. The utopian dream of smooth, perfect (self-)correction in science may never be achieved, but we will hopefully be able to foster a culture where reproducing experiments will be honored and where mistakes can be addressed in a respectful dialogue. Please let me know your comments and concerns, and please feel free to develop these ideas further.

Automatically generated nano bullshit

Nanotechnology is a fast-growing technology that plays an important role in many areas of biotechnology, from biomedical engineering to biotechnology production. It is capable of producing new types of products for many applications. There are different nano-systems such as liposomes, metallic nanoparticles, and so on, and it can even produce new components to carry out its tasks. Nanoparticles are characterized by many features that make them suitable for biomedical engineering and in biomedical production. The development of antimicrobials in nanoparticle systems is considered to have changed the way in which the development of medical products was handled. In this paper, we reviewed some antimicrobial nanoparticle preparations that have been successfully applied on a wide range of biomedical applications in different domains.

I could go on and on and on about how great nanotechnology is, especially given that I did not write the above. No human did. It was generated automatically with Transformer (GPT-2). I gave a few words of input here and there. Transformer completed the rest (all the bits highlighted below were generated automatically).

Unfortunately, it turns out that there are people who use this, and other tools, to generate texts that look like scientific articles, and indeed go through peer review and are published. Guillaume Cabanac and Cyril Labbé, two computer scientists (and collaborators on the NanoBubbles project), have made it part of their job to detect those fake articles… and they found a lot of them! They are crowdsourcing the human assessment of problematic papers via this site. Check it out and add comments on PubPeer. It is mind-boggling but also quite funny. Here is my review of “Application of some nanoparticles in the field of veterinary medicine”:

This article was detected by the problematic paper screener as potentially including “tortured phrases” because of the presence of the expressions “attractive reverberation” “medication conveyance” “nucleic corrosive”. It has been cited 23 times. The journal is published by Taylor and Francis. To provide Open Access, it charges a fee of US$800 which is to be paid by the authors, or on their behalf by their funders or institution, directly to the Faculty of Veterinary Medicine, Cairo University.

Reading the article, there are a number of other tortured phrases. For example, in the abstract: “surface zone to mass extent” instead of surface to mass ratio, “wound recuperating” instead of wound healing, “contrarily charged” instead of negatively charged, “decidedly charged” instead of positively charged, “Immune-invigorating edifices” instead of immune stimulating constructs, “the normal of genuine populace”, “pulmonary publicity to” instead of pulmonary exposure to, and “creature” in many expressions where “animal” would be expected (“creature stress”, “creature wellbeing”, “creature generation industry”).
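To make the idea of screening for tortured phrases concrete, here is a minimal sketch in Python. It is only an illustration of the principle (looking up known odd expressions and reporting the term one would have expected); it is not the code of the Problematic Paper Screener, and the small lookup table, the function name `find_tortured_phrases` and the sample sentence are my own assumptions, built from the examples quoted above.

```python
import re

# Tortured phrase -> expected term, taken from the examples quoted in this post.
# This tiny table is illustrative only; it is NOT the screener's actual list.
TORTURED_PHRASES = {
    "surface zone to mass extent": "surface to mass ratio",
    "wound recuperating": "wound healing",
    "contrarily charged": "negatively charged",
    "decidedly charged": "positively charged",
    "pulmonary publicity to": "pulmonary exposure to",
}


def find_tortured_phrases(text):
    """Return a list of (phrase, expected term, count) for each phrase found."""
    lowered = text.lower()
    hits = []
    for phrase, expected in TORTURED_PHRASES.items():
        # Whole-phrase, case-insensitive match (text is lowered beforehand).
        pattern = r"\b" + re.escape(phrase) + r"\b"
        count = len(re.findall(pattern, lowered))
        if count:
            hits.append((phrase, expected, count))
    return hits


# Hypothetical sample sentence, written for this example.
sample = (
    "Nanoparticles improve wound recuperating. "
    "Contrarily charged particles bind decidedly charged ones."
)
for phrase, expected, count in find_tortured_phrases(sample):
    print(f'"{phrase}" found {count}x; expected "{expected}"')
```

A lookup like this can only flag candidates; as with the screener, a human still has to read the article and judge whether the phrases are really a sign of a problem.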

More generally, most sentences are nonsensical. Consider for example: “It is not handiest in a situation to overcome the difficulties experiencing the customary cure, yet in addition, it allows the comprehension of assorted physiological and obsessive techniques.” or “The curiosity of newly synthetic atoms can provide us a new helpful medicinal drugs with the intention to treat diseases and guard creatures from viral or bacterial diseases and improve wound recuperating.”

This raises a few important questions. We already knew that there was a lot of bullshit in the bionano scientific literature. But how much of it is automatically generated? Why are these articles published? Why do they go through peer review and why is it so difficult to remove them? Why don’t publishers seem that bothered to have published hundreds of nonsensical papers? And, maybe most importantly, why is this vacuous, vague and hyperbolic style of writing so familiar and unsurprising?

Science in the castle

Small meetings are often the best meetings, and that proved true once more last week at the Interdisciplinary symposium Schloss Ringberg 2021 organised by Peer Fischer and his team at the Micro, Nano, and Molecular Systems Group at the Max Planck Institute for Intelligent Systems, Stuttgart. They had invited me because of a shared interest in the uptake and fate of nanomaterials in cells, but also because of a shared experience with the difficulties of challenging published results (in their case, related to papers claiming molecular and enzyme swimmers; see e.g. https://science.org/doi/10.1126/science.abe8322). The informal discussions with young and experienced scientists often revolved around the questions of why people commit sloppy science, fraud and misconduct, but also why it is so difficult to correct the scientific record and what we can do about it (in other words, the exact focus of NanoBubbles). More on this soon as I expect a guest post coming…

The meeting took place in the pre-Alps, near Munich in a medieval-looking castle which was actually built in the 20th century and now belongs to the Max Planck Society. A special place with a captivating history and wonderful staff. I hope to come back one day, amongst other reasons because it seems I have not yet discovered all the interesting rooms of the castle: