This is a guest post by Philip Moriarty, Professor of Physics at the University of Nottingham
A few days ago, Raphael highlighted the kerfuffle that our paper, Critical assessment of the evidence for striped nanoparticles, has generated over at PubPeer and elsewhere on the internet. (This excellent post from Neuroskeptic is particularly worth reading – more on this below). At one point the intense interest in the paper and associated comments thread ‘broke’ PubPeer — the site had difficulty dealing with the traffic, leading to this alert:
This thread is generating unprecedented interest in PubPeer. Please bear with us as we deal with the traffic. https://t.co/lwWdMrDzAN
— PubPeer (@PubPeer) January 7, 2014
At the time of writing, there are seventy-eight comments on the paper, quite a few of which are rather technical and dig down into the minutiae of the many flaws in the striped nanoparticle ‘oeuvre’ of Francesco Stellacci and co-workers. It is, however, now getting very difficult to follow the thread over at PubPeer, partly because of the myriad comments labelled “Unregistered Submission” – it has been suggested that PubPeer consider modifying their comment labelling system – but mostly because of the rather circular nature of the arguments and the inability to incorporate figures/images directly into a comments thread to facilitate discussion and explanation. The ease of incorporating images, figures, and, indeed, video in a blog post means that a WordPress site such as Raphael’s is a rather more attractive proposition when making particular scientific/technical points about Stellacci et al.’s data acquisition/analysis protocols. That’s why the following discussion is posted here, rather than at PubPeer.
Unwarranted assumptions about unReg?
Julian Stirling, the lead author of the “Critical assessment…” paper, and I have spent a considerable amount of time and effort over the last week addressing the comments of one particular “Unregistered Submission” at PubPeer who, although categorically stating right from the off that (s)he was in no way connected with Stellacci and co-workers, nonetheless has remarkably in-depth knowledge of a number of key papers (and their associated supplementary information) from the Stellacci group.
It is important to note that although our critique of Stellacci et al.’s data has, to the best of our knowledge, attracted the greatest number of comments for any paper at PubPeer to date, this is not indicative of widespread debate about our criticism of the striped nanoparticle papers (which now number close to thirty). Instead, the majority of comments at PubPeer are very supportive of the arguments in our “Critical assessment…” paper. It is only a particular commenter, who does not wish to log into the PubPeer site and is therefore labelled “Unregistered Submission” every time they post (I’ll call them unReg from now on), that is challenging our critique.
We have dealt repeatedly, and forensically, with a series of comments from unReg over at PubPeer. However, although unReg has made a couple of extremely important admissions (which I’ll come to below), they continue to argue, on entirely unphysical grounds, that the stripes observed by Stellacci et al. in many cases are not the result of artefacts and improper data acquisition/analysis protocols.
unReg’s persistence in attempting to explain away artefacts could be due to a couple of things: (i) we are being subjected to a debating approach somewhat akin to the Gish gallop. (My sincere thanks to a colleague – not at Nottingham, nor, indeed, in the UK – who has been following the thread at PubPeer and suggested this to us by e-mail. Julian also recently raised it in a comment elsewhere at Raphael’s blog which is well worth reading); and/or (ii) our assumption throughout that unReg is familiar with the basic ideas and protocols of experimental science, at least at undergraduate level, may be wrong.
Because we have no idea of unReg’s scientific background – despite a couple of commenters at PubPeer explicitly asking unReg to clarify this point – we assumed that they had a reasonable understanding of basic aspects of experimental physics such as noise reduction, treatment of experimental uncertainties, accuracy vs precision etc… But Julian and I realised yesterday afternoon that perhaps the reason we and unReg keep ‘speaking past’ each other is because unReg may well not have a very strong or extensive background in experimental science. Their suggestion at one point in the PubPeer comments thread that “the absence of evidence is not evidence of absence” is a rather remarkable statement for an experimentalist to make. We therefore suspect that the central reason why unReg is not following our arguments is their lack of experience with, and absence of training in, basic experimental science.
As such, I thought it might be a useful exercise – both for unReg and any students who might be following the debate – to adopt a slightly more tutorial approach in the discussion of the issues with the stripy nanoparticle data so as to complement the very technical discussion given in our paper and at PubPeer. Let’s start by looking at a selection of stripy nanoparticle images ‘through the ages’ (well, over the last decade or so).
The Evolution of Stripes: From feedback loop ringing to CSI image analysis protocols
The images labelled 1 – 12 below represent the majority of the types of striped nanoparticle image published to date. (I had hoped to put together a 4 × 4 or 4 × 5 matrix of images but, due to image re-use throughout Stellacci et al.’s work, there aren’t enough separate papers to do that.)
Putting the images side by side like this is very instructive. Note the distinct variation in the ‘visibility’ of the stripes. Stellacci and co-workers will claim that this is because the terminating ligands are not the same on every particle. That’s certainly one interpretation. Note, however, that images 1, 2, 4, and 11 each have the same type of octanethiol-mercaptopropionic acid (2:1) termination and that we have shown, through an analysis of the raw data, that images #1 and #11 result from a scanning tunnelling microscopy artefact known as feedback loop ringing (see p.73 of this scanning probe microscopy manual).
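For readers new to scanning probe microscopy, a toy model may help show why feedback loop ringing produces stripe-like ripples. The sketch below is a minimal illustration under stated assumptions, not a model of any real STM controller: the tip height is driven by a discrete integral controller (the hypothetical `scan_step` function) that only sees the height error after a loop delay of a few pixels, and the ‘surface’ is a single step, as at the edge of a particle. With a modest gain the tip settles smoothly; with the gain turned up, the tip overshoots and oscillates, leaving a periodic ripple in the recorded height that has nothing to do with the surface. The gains, delay and step position are arbitrary illustrative values.

```python
import numpy as np


def scan_step(gain, delay=5, n=200, step_at=20, height=1.0):
    """Toy 1D line scan over a step edge with a delayed integral controller.

    Hypothetical model: z[i+1] = z[i] + gain * (h[i] - z[i - delay]).
    Returns the true surface h and the recorded tip height z.
    """
    h = np.where(np.arange(n) >= step_at, height, 0.0)  # flat surface with one step
    z = np.zeros(n)
    for i in range(n - 1):
        # the controller only sees the tip height from `delay` pixels earlier
        lagged = z[i - delay] if i >= delay else 0.0
        z[i + 1] = z[i] + gain * (h[i] - lagged)
    return h, z


def count_crossings(signal, level):
    """Number of times `signal` crosses `level` (a simple ringing detector)."""
    above = signal > level
    return int(np.count_nonzero(above[1:] != above[:-1]))


for gain in (0.02, 0.25):
    h, z = scan_step(gain)
    print(f"gain={gain}: max height={z.max():.2f}, "
          f"crossings after step={count_crossings(z[20:], 1.0)}")
```

In this toy model the ripple period is set by the controller gain and loop delay, not by anything on the surface; in a real instrument the spatial period of ringing also depends on how fast the tip is scanned.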
A key question which has been raised repeatedly (see, for example, Peer 7’s comment in this sub-thread) is just why neither Stellacci et al. nor any other group (including those selected by Francesco Stellacci to independently verify his results) has reproduced the type of exceptionally high contrast images of stripes seen in images #1, #2, #3 and #11 in any of the studies carried out in 2013. This question still hangs in the air at PubPeer…
Moreover, the inclusion of Image #5 above is not a mistake on my part – I’ll leave it to the reader to identify just where the stripes are supposed to lie in this image. Images #10 and #12 similarly represent a challenge for the eagle-eyed reader, while Image #4 warrants its own extended discussion below because it forms a cornerstone of unReg’s argument that the stripes are real. Far from supporting the stripes hypothesis, however, Stellacci et al.’s own analysis of Image #4 contradicts their previous measurements and arguments (see “Fourier analysis or should we use a ruler instead?” below).
What is exceptionally important to note is that, as we show in considerable detail in “Critical assessment…”, a variety of artefacts and improper data acquisition/analysis protocols – and not just feedback loop ringing – are responsible for the variety of striped images seen above. For those with no experience in scanning probe microscopy, this may seem like a remarkable claim at first glance, particularly given that those striped nanoparticle images have led to over thirty papers in some of the most prestigious journals in nanoscience (and, more broadly, in science in general). However, we justify each of our claims in extensive detail in Stirling et al. The key effects are as follows:
- Feedback loop ringing (see, for example, Fig. 3 of “Critical assessment…”. Note that the nanoparticles in that figure are entirely ligand-free).
- The “CSI” effect. We know from access to (some of) the raw data that a very common approach to STM imaging in the Stellacci group (up until ~ 2012) was to image very large areas with relatively low pixel densities and then rely on offline zooming into areas no more than a few tens of pixels across to “resolve” stripes. This ‘CSI’ approach to STM is unheard of in the scanning probe community because if we want to get higher resolution images, we simply reduce the scan area. The Stellacci et al. method can be used to generate stripes on entirely unfunctionalised particles, as shown here.
- Observer bias. The eye is remarkably adept at picking patterns out of uncorrelated noise. Fig. 9 in Stirling et al. demonstrates this effect for ‘striped’ nanoparticles. I have referred to this post from my erstwhile colleague Peter Coles repeatedly throughout the debate at PubPeer. I recommend that anyone involved in image interpretation read Coles’ post.
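The ‘CSI’ point can be made concrete. A minimal sketch, assuming simple bilinear interpolation as the offline zoom (the software and interpolation actually used are not specified here): every pixel in the zoomed image is a weighted average of at most four measured pixels, so the zoom reproduces the data at the original grid points, never exceeds the measured range, and cannot contain genuine structure finer than the original pixel spacing. `zoom_bilinear` is a hypothetical helper written for this illustration.

```python
import numpy as np


def zoom_bilinear(img, factor):
    """'Zoom' into an existing image by bilinear interpolation (integer factor)."""
    rows, cols = img.shape
    r_new = np.linspace(0, rows - 1, (rows - 1) * factor + 1)
    c_new = np.linspace(0, cols - 1, (cols - 1) * factor + 1)
    # Linear interpolation along each row, then along each column of the result:
    # every output pixel is a weighted average of at most four measured pixels.
    along_rows = np.array([np.interp(c_new, np.arange(cols), row) for row in img])
    return np.array([np.interp(r_new, np.arange(rows), col) for col in along_rows.T]).T


rng = np.random.default_rng(0)
coarse = rng.standard_normal((16, 16))   # 16 x 16 pixels of pure noise: no stripes at all
zoomed = zoom_bilinear(coarse, 8)        # many more pixels, but no new information
print(zoomed.shape)
```

Displayed as an image, `zoomed` shows smooth blobs and bands a few original pixels wide; in a noisy scan such interpolated pixel-to-pixel noise can easily be read as ‘stripes’.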
Fourier analysis or should we use a ruler instead?
I love Fourier analysis. Indeed, about the only ‘Eureka!’ moment I had as an undergraduate was when I realised that the Heisenberg uncertainty principle is nothing more than a Fourier transform. (Those readers who are unfamiliar with Fourier analysis and might like a brief overview could perhaps refer to this Sixty Symbols video, or, for much more (mathematical) detail, this set of notes I wrote for an undergraduate module a number of years ago).
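For anyone who would like to see the ‘uncertainty principle as a Fourier transform’ point numerically, here is a small sketch (my own illustration using NumPy’s FFT, not something from the paper): take a Gaussian wavefunction, Fourier transform it, and compute the standard deviations of |ψ(x)|² and |ψ̃(k)|². Their product comes out at 1/2, the minimum allowed by Δx Δk ≥ 1/2; multiplying by ħ (since p = ħk) gives the familiar Δx Δp ≥ ħ/2. The grid size and Gaussian width are arbitrary choices.

```python
import numpy as np


def spread(values, weights):
    """Standard deviation of `values` under (unnormalised) probability `weights`."""
    p = weights / weights.sum()
    mean = np.sum(p * values)
    return np.sqrt(np.sum(p * (values - mean) ** 2))


n, length, sigma = 4096, 200.0, 2.0
dx = length / n
x = (np.arange(n) - n // 2) * dx
psi = np.exp(-x**2 / (4 * sigma**2))        # |psi|^2 has standard deviation sigma

k = 2 * np.pi * np.fft.fftfreq(n, d=dx)     # angular wavenumber of each FFT bin
psi_k = np.fft.fft(psi)

delta_x = spread(x, np.abs(psi) ** 2)
delta_k = spread(k, np.abs(psi_k) ** 2)
print(delta_x * delta_k)  # close to 0.5
```

Narrowing the Gaussian in x (smaller `sigma`) broadens it in k by exactly the compensating factor, which is the whole content of the principle for this wavefunction.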
In “Critical assessment…” we show, via a Fourier approach, that the measurements of stripe spacing in papers published by Stellacci et al. in the period from 2006 to 2009 – and subsequently used to claim that the stripes do not arise from feedback loop ringing – are seriously in error. We are confident in our results here because of a clear peak in our Fourier space data (see Figures S1 and S2 of the paper).
Fabio Biscarini and co-workers, in collaboration with Stellacci et al., have attempted to use Fourier analysis to calculate the ‘periodicity’ of the nanoparticle stripes. They use Fourier transforms of the raw images, averaged in the slow scan direction. No peak is visible in these Fourier space data, even when plotted on a logarithmic scale in an attempt to increase contrast/visibility. Instead, the Fourier space data just show a decay with a couple of plateaus in it. They claim – erroneously, for reasons we cover below – that the corner where the second plateau meets the continuing decay (called a “shoulder” by Biscarini et al.) indicates the stripe spacing. To locate these shoulders they apply a fitting method.
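To make the procedure concrete, here is a minimal sketch of a line-by-line power spectral density: Fourier transform each scan line (taken here to be the rows, i.e. the fast scan direction), square the modulus, and average over the slow scan direction. This is my reading of the description above, not Biscarini et al.’s code, and it omits their fitting step entirely. When a stripe periodicity really is present, the result has a clear peak at that spatial frequency. The function names, pixel size and synthetic test image are hypothetical.

```python
import numpy as np


def line_averaged_psd(img, pixel_size=1.0):
    """|FFT|^2 of each row (fast scan), averaged over rows (slow scan)."""
    power = np.abs(np.fft.rfft(img, axis=1)) ** 2
    freqs = np.fft.rfftfreq(img.shape[1], d=pixel_size)  # cycles per unit length
    return freqs, power.mean(axis=0)


def dominant_period(img, pixel_size=1.0):
    """Period of the strongest non-zero spatial frequency in the PSD."""
    freqs, psd = line_averaged_psd(img, pixel_size)
    peak = 1 + np.argmax(psd[1:])  # skip the DC (zero-frequency) term
    return 1.0 / freqs[peak]


# Synthetic test: stripes with an 8-pixel period plus noise, 0.1 nm per pixel.
rng = np.random.default_rng(1)
x = np.arange(64)
striped = np.tile(np.cos(2 * np.pi * x / 8), (64, 1)) + 0.5 * rng.standard_normal((64, 64))
print(dominant_period(striped, pixel_size=0.1))  # about 0.8 (nm)
```

With genuine stripes the peak sits far above the noise floor, so no multi-parameter fit is needed to locate it.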
We describe in detail in “Critical assessment…” that not only is the fitting strategy used to extract the spatial frequencies highly questionable – a seven free-parameter fit to selectively ‘edited’ data is always going to be somewhat lacking in credibility – but that the error bars on the spatial frequencies extracted are underestimated by a very large amount.
Moreover, Biscarini et al. claim the following in the conclusions of their paper:
“The analysis of STM images has shown that mixed-ligand NPs exhibit a spatially correlated architecture with a periodicity of ∼1 nm that is independent of the imaging conditions and can be reproduced in four different laboratories using three different STM microscopes. This PSD [power spectral density; i.e. the modulus squared of the Fourier transform] analysis also shows…”
Note that the clear, and entirely misleading, implication here is that the power spectral density (PSD – a way of representing the Fourier space data) analysis employed by Biscarini et al. can identify “spatially correlated architecture”. Fig. 10 of our “Critical assessment…” paper demonstrates that this is not at all the case: the shoulders can equally well arise from random speckling.
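The point that a shoulder does not imply order is easy to check numerically. A minimal sketch, assuming a simple line-averaged PSD (FFT along each scan line, averaged over lines) and a deliberately disordered test image: Gaussian-blurred random noise, i.e. random speckle with no stripes at all. The PSD is flat at low spatial frequency and then rolls off, and the position of that roll-off is set purely by the speckle size (the blur width σ). The helper names and the 1% threshold used to locate the roll-off are my own illustrative choices, not the method of Fig. 10.

```python
import numpy as np


def line_averaged_psd(img):
    """|FFT|^2 along each row (fast scan), averaged over rows (slow scan)."""
    power = np.abs(np.fft.rfft(img, axis=1)) ** 2
    return np.fft.rfftfreq(img.shape[1]), power.mean(axis=0)


def random_speckle(n, sigma, rng):
    """White noise blurred by a Gaussian of width sigma pixels: random, no stripes."""
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    blur = np.exp(-2 * np.pi**2 * sigma**2 * (fx**2 + fy**2))  # Gaussian transfer function
    noise = rng.standard_normal((n, n))
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * blur))


def rolloff_frequency(freqs, psd, fraction=0.01):
    """First frequency at which the PSD drops below `fraction` of its lowest non-DC value."""
    below = np.nonzero(psd[1:] < fraction * psd[1])[0]
    return freqs[1 + below[0]]


rng = np.random.default_rng(2)
for sigma in (2.0, 4.0):
    freqs, psd = line_averaged_psd(random_speckle(256, sigma, rng))
    print(f"speckle size {sigma} px -> roll-off near "
          f"{rolloff_frequency(freqs, psd):.3f} cycles/px")
```

Doubling the speckle size roughly halves the roll-off frequency, even though the image contains no stripes at any scale.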
This unconventional approach to Fourier analysis is not even internally consistent with measurements of stripe spacings as identified by Stellacci and co-workers. Anyone can show this using a pen, a ruler, and a print-out of the images of stripes shown in Fig. 3 of Ong et al. It’s essential to note that Ong et al. claim that they measure a spacing of 1.2 nm between the ‘stripes’; this 1.2 nm figure is very important in terms of consistency with the data in earlier papers. Indeed, over at PubPeer, unReg uses it as a central argument of the case for stripes:
“… the extracted characteristic length from the respective fittings results in a characteristic length for the stripes of 1.22 +/- 0.08. This is close to the 1.06 +/-0.13 length for the stripes of the images in 2004 (Figure 3a in Biscarini et al.). Instead, for the homoligand particles, the number is much lower: 0.76 +/- 0.5 [(sic). unReg means ‘+/- 0.05’ here. The unit is nm] , as expected. So the characteristic lengths of the high resolution striped nanoparticles of 2013 and the low resolution striped nanoparticles of 2004 match within statistical error, ***which is strong evidence that the stripe features are real.***”
Notwithstanding the issue that the PSD analysis is entirely insensitive to the morphology of the ligands (i.e. it cannot distinguish between stripes and a random morphology), and can be abused to give a wide range of results, there’s a rather simpler and even more damaging inconsistency here.
A number of researchers in the group here at Nottingham have repeated the ‘analysis’ in Ong et al. Take a look at the figure below. (Thanks to Adam Sweetman for putting this figure together). We have repeated the measurements of the stripe spacing for Fig. 3 of Ong et al. and we consistently find that, instead of a spacing of 1.2 nm, the separation of the ‘stripes’ using the arrows placed on the image by Ong et al. themselves has a mean value of 1.6 nm (± 0.1 nm). What is also interesting to note is that the placement of the arrows “to guide the eye” does not particularly agree with a placement based on the “centre of mass” of the features identified as stripes. In that case, the separation is far from regular.
We would ask that readers of Raphael’s blog – if you’ve got this far into this incredibly long post! – repeat the measurement to convince yourself that the quoted 1.2 nm value does not stand up to scrutiny.
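For anyone taking up that invitation, the arithmetic is simple enough to script. A minimal sketch, assuming you have read off the positions of the features marked by the arrows along a line across them (in pixels, or in millimetres on a printout) and measured the image’s scale bar in the same units. `stripe_spacing` is a hypothetical helper, and the numbers in the example are made up purely to show the usage; they are not measurements from Ong et al.

```python
import numpy as np


def stripe_spacing(positions, scale_bar_length_nm, scale_bar_length_measured):
    """Mean and sample standard deviation of the gaps between marked features, in nm.

    `positions` and `scale_bar_length_measured` must be in the same units
    (pixels, or mm on a printout). At least three positions are needed.
    """
    nm_per_unit = scale_bar_length_nm / scale_bar_length_measured
    gaps = np.diff(np.sort(np.asarray(positions, dtype=float))) * nm_per_unit
    return gaps.mean(), gaps.std(ddof=1)


# Made-up example: features marked at 0, 10 and 30 mm on a printout
# whose 5 nm scale bar is 50 mm long.
mean, spread = stripe_spacing([0, 10, 30], scale_bar_length_nm=5.0,
                              scale_bar_length_measured=50.0)
print(f"spacing = {mean:.2f} ± {spread:.2f} nm")
```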
So, not only does the PSD analysis carried out by Biscarini et al. not recover the real space value for the stripe spacing (leaving aside the question of just how those stripes were identified), but there is a significant difference between the stripe spacing claimed in the 2004 Nature Materials paper and that in the 2013 papers. Both of these points severely undermine the case for stripy nanoparticles. Moreover, the inability of Ong et al. to report the correct spacing for the stripes from simple measurements of their own STM images raises significant questions about the reliability of the other data in their paper.
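To put a rough number on ‘significant’, one common yardstick is the difference between two values divided by their combined uncertainty. A minimal sketch, assuming that each quoted ± is an independent one-standard-deviation uncertainty (the quoted values do not say what the ± represents, so treat the results as indicative only):

```python
import math


def discrepancy_in_sigma(a, da, b, db):
    """|a - b| in units of the combined uncertainty sqrt(da^2 + db^2)."""
    return abs(a - b) / math.sqrt(da**2 + db**2)


# Values quoted in this post (nm)
psd_2013 = (1.22, 0.08)   # PSD "characteristic length", 2013 images, as quoted by unReg
psd_2004 = (1.06, 0.13)   # PSD value for the 2004 images, as quoted by unReg
ruler_2013 = (1.6, 0.1)   # real-space spacing measured at Nottingham, Fig. 3 of Ong et al.

print(round(discrepancy_in_sigma(*psd_2013, *psd_2004), 2))    # 1.05
print(round(discrepancy_in_sigma(*ruler_2013, *psd_2013), 2))  # 2.97
print(round(discrepancy_in_sigma(*ruler_2013, *psd_2004), 2))  # 3.29
```

On these assumptions the two PSD values agree at about the 1σ level, as unReg says, but the real-space spacing sits roughly 3σ away from both of them.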
As the title of this post says, whither stripes?
Reducing noise pollution
A very common technique in experimental science to increase the signal-to-noise ratio (SNR) is signal averaging. I have spent many long hours at synchrotron beamlines while we repeatedly scanned the same energy window watching as a peak gradually appeared from out of the noise. But averaging is of course not restricted to synchrotron spectroscopy – practically every area of science, including SPM, can benefit from the advantages of simply summing a signal over the course of time.
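The underlying statistics fit in a few lines. A toy sketch with made-up numbers: averaging N scans with independent noise reduces the noise by a factor of √N, so the SNR grows as √N while a genuine, reproducible peak stays put. The weak Gaussian ‘peak’, its width and the noise level are arbitrary choices.

```python
import numpy as np


def average_scans(n_scans, noise=1.0, seed=0):
    """Average n_scans noisy repeats of the same weak spectrum.

    Returns the true (noise-free) spectrum and the average of the noisy scans.
    """
    rng = np.random.default_rng(seed)
    energy = np.linspace(-5.0, 5.0, 501)
    true_peak = 0.3 * np.exp(-energy**2 / 0.5)   # weak peak, buried in single-scan noise
    scans = true_peak + noise * rng.standard_normal((n_scans, energy.size))
    return true_peak, scans.mean(axis=0)


for n in (1, 10, 100, 1000):
    true_peak, avg = average_scans(n)
    residual = np.std(avg - true_peak)           # noise left after averaging
    print(f"N={n:5d}: residual noise {residual:.3f} (1/sqrt(N) = {1/np.sqrt(n):.3f})")
```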
A particularly frustrating aspect of the discussion at PubPeer, however, has been unReg’s continued assertion that even though summing of consecutive images of the same area gives rise to completely smooth particles (see Fig. 5(k) of “Critical assessment…”), this does not mean that there is no signal from stripes present in the scans. This claim has puzzled not just Julian and myself, but a number of other commenters at PubPeer, including Peer 7:
“If a feature can not be reproduced in two successive equivalent experiments then the feature does not exist because the experiment is not reproducible. Otherwise how do you chose between two experiments with one showing the feature and the other not showing it? Which is the correct one ? Please explain to me.
Furthermore, if a too high noise is the cause of the lack of reproducibility than the signal to noise ratio is too low and once again the experiment has to be discarded and/or improved to increase this S/N. Repeating experiments is a good way to do this and if the signal does not come out of the noise when the number of experiment increases than it does not exist.
This is Experimental Science 101 and may (should) seem obvious to everyone here…”
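Peer 7’s point can be illustrated with a toy model. A minimal sketch, assuming (hypothetically) two kinds of ‘stripes’ in a series of noisy scans of the same area: genuine stripes that are identical in every scan, and stripe-like features whose phase changes randomly from scan to scan, as one would expect from noise or a scan-dependent artefact. After summing many scans, the genuine stripes keep their full contrast while the non-reproducible ones fade towards zero. The period, noise level and amplitude estimate are illustrative choices.

```python
import numpy as np


def stripe_amplitude(img, period):
    """Amplitude of the cosine component with the given period (pixels) along rows.

    Assumes `period` divides the image width exactly.
    """
    spectrum = np.abs(np.fft.rfft(img, axis=1)).mean(axis=0)
    return 2 * spectrum[img.shape[1] // period] / img.shape[1]


def summed_scans(n_scans, reproducible, size=64, period=8, seed=0):
    """Average n_scans noisy frames; stripes are fixed, or get a random phase per frame."""
    rng = np.random.default_rng(seed)
    x = np.arange(size)
    total = np.zeros((size, size))
    for _ in range(n_scans):
        phase = 0.0 if reproducible else rng.uniform(0, 2 * np.pi)
        stripes = np.tile(np.cos(2 * np.pi * x / period + phase), (size, 1))
        total += stripes + rng.standard_normal((size, size))
    return total / n_scans


for reproducible in (True, False):
    avg = summed_scans(64, reproducible)
    print(f"reproducible={reproducible}: stripe amplitude after 64 scans = "
          f"{stripe_amplitude(avg, 8):.2f}")
```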
I’ve put together a short video of a LabVIEW demo I wrote for my first year undergrad tutees to show how effective signal averaging can be. I thought it might help to clear up any misconceptions…
The Radon test
There is yet another problem, however, with the data from Ong et al. which we analysed in the previous section. This one is equally fundamental. While Ong et al. have drawn arrows to “guide the eye” to features they identify as stripes (and we’ve followed their guidelines when attempting to identify those ‘stripes’ ourselves), those stripes really do not stand up tall and proud like their counterparts ten years ago (compare images #1 and #4, or compare #4 and #11 in that montage above).
Julian and I have stressed to unReg a number of times that it is not enough to “eyeball” images and pull out what you think are patterns. Particularly when the images are as noisy as those in Stellacci et al.’s recent papers, it is essential to try to adopt a more quantitative, or at least less subjective, approach. In principle, Fourier transforms should be able to help with this, but only if they are applied robustly. If spacings identified in real space (as measured using a pen and ruler on a printout of an image) don’t agree with the spacings measured by Fourier analysis – as for the data of Ong et al. discussed above – then this really should sound warning bells.
One method of improving objectivity in stripe detection is to use a Radon transform (which for reasons I won’t go into here – but Julian may well in a future post! – is closely related to the Fourier transform). Without swamping you in mathematical detail, the Radon transform sums the intensity of an image along a family of parallel straight lines at a particular angle, and repeats this for every angle. (It’s important in, for example, computerised tomography.) In a nutshell, lines in an image will show up as peaks in the Radon transform.
So what does it look like in practice, and when applied to stripy nanoparticle images? (All of the analysis and coding associated with the discussion below are courtesy of Julian yet again). Well, let’s start with a simulated stripy nanoparticle image where the stripes are clearly visible – that’s shown on the left below and its Radon transform is on the right.
Note the series of peaks appearing at an angle of ~ 160°. This corresponds to the angular orientation of the stripes. The Radon transform does a good job of detecting the presence of stripes and, moreover, objectively yields the angular orientation of the stripes.
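For readers who want to try this themselves, here is a minimal sketch of a discrete Radon transform in plain NumPy. It is not Julian’s code, and its angle convention is my own: θ is the direction of the projection axis, so stripes produce a strongly oscillating projection at the θ that points across them (perpendicular to the stripes); other implementations, such as scikit-image’s `radon`, may report different angles. Each pixel’s intensity is added to the bin for its coordinate t = x cos θ + y sin θ, so each bin holds a sum along one line. The variance of each projection serves as a simple, objective ‘stripiness’ score; the synthetic particle is hypothetical.

```python
import numpy as np


def radon_projections(img, angles_deg):
    """Pixel-driven discrete Radon transform.

    Row i holds the projection at angles_deg[i]: each pixel's intensity is added to the
    1-pixel-wide bin containing t = x*cos(theta) + y*sin(theta) (origin at image centre).
    """
    rows, cols = img.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    x -= (cols - 1) / 2
    y -= (rows - 1) / 2
    t_max = np.ceil(np.hypot(x, y).max())
    edges = np.arange(-t_max, t_max + 1.0)                  # bin edges, 1 pixel apart
    projections = []
    for theta in np.deg2rad(angles_deg):
        t = x * np.cos(theta) + y * np.sin(theta)
        sums, _ = np.histogram(t, bins=edges, weights=img)  # line sums at this angle
        projections.append(sums)
    return np.array(projections)


def stripe_score(img, angles_deg):
    """Variance of each projection. `img` should be zero outside the particle and
    have zero mean inside it, so that only intensity variations contribute."""
    return radon_projections(img, angles_deg).var(axis=1)


# Hypothetical test particle: stripes with an 8-pixel period whose normal points at
# 30 degrees, inside a circular mask of radius 28 pixels.
n = 64
yy, xx = np.mgrid[0:n, 0:n] - (n - 1) / 2
normal = np.deg2rad(30.0)
stripes = np.cos(2 * np.pi / 8 * (xx * np.cos(normal) + yy * np.sin(normal)))
inside = np.hypot(xx, yy) < 28
particle = np.where(inside, stripes - stripes[inside].mean(), 0.0)

angles = np.arange(180)
scores = stripe_score(particle, angles)
print("strongest projection at", angles[np.argmax(scores)], "degrees")
```

Feeding in an image without stripes gives no angle at which the score stands out in this way.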
What happens when we feed the purportedly striped image from Ong et al. (i.e. Image #4 in the montage) into the Radon transform? The data are below. Note the absence of any peaks at angles anywhere near the angular orientation which Ong et al. assigned to the stripes (i.e. ~ 60°; see image on lower left below)…
If anyone’s still left reading out there at this point, I’d like to close this exceptionally lengthy post by quoting from Neuroskeptic’s fascinating and extremely important “Science is Interpretation” piece over at the Discover magazine blogging site:
“The idea that new science requires new data might be called hyperempiricism. This is a popular stance among journal editors (perhaps because it makes copyright disputes less likely). Hyperempiricism also appeals to scientists when their work is being critiqued; it allows them to say to critics, “go away until you get some data of your own”, even when the dispute is not about the data, but about how it should be interpreted.”
Meanwhile, back at PubPeer, unReg has suggested that we should “… go back to the lab and do more work”.