Jan 10

Adventures in data visualization

If you happen to be a fan of data-driven political analysis, you are probably well aware of the ongoing challenge of effectively and accurately visualizing maps of US voting patterns.  The debate over how to do this has been going on for decades but was nicely summarized in a 2016 article by the NYTimes Upshot section (https://www.nytimes.com/interactive/2016/11/01/upshot/many-ways-to-map-election-results.html).

Recently, Randall Munroe of XKCD fame came up with a version that is really nicely done:


2016 Election Map

I was alerted to it by high praise for the approach from Vox:



Of particular note to me was that a cartoonist (albeit one with a strong science background) was the person who found the elegant solution to the density/population tradeoff across the US while also capturing the important mixing aspects (blue and red voters in every state).  That isn’t meant as a knock on the professional political and data scientists who hadn’t come up with this approach, but more of a note on how hard data visualization really is and how the best, most creative, most effective solutions might therefore come from surprising sources.

https://xkcd.com/1939/



Dec 12

AlphaZero Beats Chess In 4 (!?) hours

Google’s DeepMind group updated their game-learning algorithm, now called AlphaZero, and mastered chess.  I’ve seen the game play and it elegantly destroyed the previous top computer chess-playing algorithm (computers have been better than humans for about a decade now), Stockfish.  Part of what is intriguing about their claim is that the new algorithm learns entirely from self-play with no human data involved — plus the learning process is apparently stunningly fast this way.

Something is weird to me about the training time they are reporting, though.  Key things we know about how AlphaZero works:

  1. Deep convolutional networks and reinforcement learning. This is a classifier-based approach that is going to act the most like a human player.  One way to think about this is if you could take a chess board and classify it as a win for white, win for black, or draw (with a perfect classifier algorithm).  Then to move, you simply look at the position after each of your possible moves and pick the one that is the best case for your side (win or draw).
  2. Based on the paper, the classifier isn’t perfect (yet). They describe using a Monte-Carlo tree search (MCTS), which is how you would use a pretty good classifier.  Standard chess algorithms rely more on Alpha-Beta (AB) tree search.  The key difference is that AB search is “broader” and searches every possible move, response move, next move, etc. as far as it can.  The challenge for games like chess (and even more for Go) is that the number of positions to evaluate explodes exponentially.  With AB search, the faster the computer, the deeper you can look and the better the algorithm plays.  Stockfish, the previous world champ, was searching 70m moves/sec for the match with AlphaZero.  MCTS lets you search smarter and more selectively, checking only the moves that the current position makes likely to be good (which is what you need the classifier for).  AlphaZero is reported as searching only 80k moves/sec, about a thousand times fewer than Stockfish.
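The classifier-driven move selection in (1) can be sketched in a few lines.  Everything here is illustrative: `evaluate` stands in for the trained network (reduced to a trivial stub), and the position representation is a toy of my own invention.

```python
# Illustrative sketch of classifier-based move selection (not AlphaZero's
# actual code): score the position after each legal move with a learned
# evaluator, and pick the move that is best for the side to move.

def evaluate(position):
    """Stand-in for a trained classifier: estimated value of `position`
    from the perspective of the player about to move (+1 win, -1 loss)."""
    # A real system would run a deep network here; this stub just reads a
    # value stored in the toy position dict.
    return position["material_balance"]

def best_move(position, legal_moves, apply_move):
    """Pick the move whose resulting position scores best for us.
    After our move the opponent is to move, so we negate their value."""
    return max(legal_moves,
               key=lambda m: -evaluate(apply_move(position, m)))
```

With a perfect evaluator this one-ply lookup plays perfectly; MCTS is what you layer on top when the evaluator is only “pretty good.”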

That all makes sense.  In fact, this approach is one we were thinking about in the early 90s when I was in graduate school at CMU talking about (and playing) chess with Fernand Gobet and Peter Jansen.  Fernand was a former chess professional, retrained as a Ph.D. in Psychology and doing a post-doc with Herb Simon.  Peter was a Computer Science Ph.D. (iirc) working on chess algorithms.  We hypothesized that it might be possible to play chess by using the patterns of pieces to predict the next move.  However, it was mostly idle speculation since we didn’t have anything like the classifier algorithms used by DeepMind.  Our idea was that expert humans have a classifier built from experience that is sufficiently well-trained that they can do a selective search (like MCTS) of only a few hundred positions and manage to play as well as an AB algorithm that looked at billions of positions.  It looks like AlphaZero is basically this on steroids – a better classifier, more search and world champion level performance.

The weird part to me is how fast it learned without human game data.  When we were thinking about this, we were going to use a big database of grandmaster games as input to the pattern learner (a pattern learner/chunker was Fernand’s project with Herb).  AlphaZero is reported as generating its own database of games to learn from by ‘playing itself’.  In the Methods section, the number of training games is listed at 44m.  That looks way too small to me.  If you are picking moves randomly, there are more than 9m positions after 3 moves and several hundred billion positions after 4 moves.  AlphaZero’s training is described as running in up to 700k ‘batches’, but even if each of those batches contained 44m simulated games, there are nowhere near enough games to explore even a decent fraction of the first 10 or so moves.
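A quick back-of-the-envelope on that explosion, assuming an average branching factor of about 30 (a common rough figure for chess; the true value varies by position):

```python
# With ~30 legal moves per position, the number of distinct move
# sequences grows as 30**plies (a ply is one half-move).

BRANCHING = 30  # assumed average number of legal moves per position

def sequences(plies):
    """Approximate number of move sequences after `plies` half-moves."""
    return BRANCHING ** plies

# 6 plies (3 full moves) already dwarfs a 44-million-game training set:
print(sequences(6))           # 729,000,000 sequences
print(sequences(6) / 44e6)    # roughly 17 possible sequences per training game
```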

Now if I were training a classifier as good as AlphaZero, what I would do is train it against Stockfish’s engine (the previous strongest player on the planet) for at least the first few million games, then later turn it loose on playing itself to try to refine the classifier further.  You could still claim that you trained it “without human data,” but you couldn’t claim you trained it ‘tabula rasa’ with nothing but the rules of chess wired in.  Since the paper does make the tabula rasa claim, it doesn’t seem that they did that.

Alternately, their learning algorithm may be much faster early on than I’d expect.  If it effectively pruned the search space of all the non-useful early moves quickly, perhaps it could self-generate good training examples.  I still don’t understand how this would work, though, since you theoretically can’t evaluate good/bad moves until the final move that delivers checkmate.  A chess beginner who has learned some ‘standard opening theory’ will make a bunch of good moves, then blunder and lose.  Learning from that game embeds a ‘credit assignment’ problem of identifying the blunder without penalizing the rating of the early correct moves.  That kind of error is going to happen at very high rates in semi-random games. Why doesn’t it require billions or trillions of games to solve the credit assignment problem?
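The credit-assignment worry can be made concrete with a toy simulation (all numbers invented): if every position in a game is labeled with the final outcome, roughly how a self-play value target works, then a late blunder poisons the labels on the early, good moves, and only averaging over many games recovers the true value.

```python
import random

random.seed(0)

def play_game(blunder_rate):
    """A toy 'game' is 10 positions; suppose the early positions are
    objectively winning, but a late blunder flips the outcome to a loss."""
    outcome = -1 if random.random() < blunder_rate else +1
    # Every position in the game inherits the final outcome as its label.
    return [(pos, outcome) for pos in range(10)]

# Average label attached to the (good) opening position, 40% blunder rate:
labels = [game[0][1] for game in (play_game(0.4) for _ in range(10_000))]
avg = sum(labels) / len(labels)
print(avg)  # close to 0.2, far below the position's true value of +1
```

The mislabeled games don’t vanish; they only get outvoted, which is exactly why the 44m figure seems so surprisingly small.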

Humans learn chess in many fewer than billions of games.  A big part of that comes from directed (or deliberate) practice with a teacher.  The teacher can just be a friend who is a little better at chess, so that the student’s games are guided towards the set of moves likely to be good, and then our own human machine-learning algorithm (implicit learning) can extract the patterns and build the classifier.  The numbers reported on AlphaZero sound to me like it should have had a teacher.  Or there are some extra details about the learning algorithm I wish I knew.

But what I’d really like is access to the machine-learning algorithm to see how it behaves under constraints.  If our old hypothesis about how humans play chess is correct, we should be able to use the classifier, reduce the number of look-ahead evaluations to a few hundred (or thousand), and it should play a lot more like a human than Stockfish does.

Links to the AlphaZero reporting paper:




Nov 07

Cognitive Symmetry and Trust

A chain of speculative scientific reasoning from our work into really big social/society questions:

  1. Skill learning is a thing. If we practice something, we get better at it, and the learning curve goes on for a long time: 10,000 hours or more.  Because we can keep getting better for so many hours, nobody can really be a top-notch expert at everything (there isn’t time).  This is, therefore, among the many reasons why, for group-level social functioning, it is much better to specialize and have multi-person projects done by teams of people specializing in component steps (for tasks that are repeated regularly).  The economic benefits of specialization are massive and straightforward.
  2. However, getting people to work well in teams is hard. In most areas requiring cooperation, there is the possibility of ‘defecting’ instead of cooperating on progress – to borrow terms from the Prisoner’s Dilemma formalism.  That powerful little bit of game theory points out that in almost every one-time 2-person interaction, it’s always better to ‘defect,’ that is, to act in your own self-interest and screw over the other player.
  3. Yet, people don’t. In general, people are more altruistic than overly simple game-theory math would predict.  Ideas for why that model is wrong include (a) extending the model to repeated interactions where we can track our history with other players and therefore cooperation is rewarded by building a reputation; (b) that humans are genetically prewired for altruism (e.g., perhaps by getting internal extra reward from cooperating/helping); or (c) that social groups function by incorporating ‘punishers’ who provide extra negative feedback for the non-cooperators to reduce non-cooperation.
  4. These three alternatives aren’t mutually exclusive, but further consideration of the (3a) theory raises some interesting questions about cognitive capacity. We interact a lot with a lot of different people in our daily lives.  Is it possible to track and remember everything about our interactions in order to make optimal cooperate/defect decisions?  Herb Simon argued (Science, 1980) that we can’t possibly do this, working along the same lines as his ‘bounded rationality’ reasoning that won him the Nobel Prize in Economics.  His conclusion was that (3b) was more likely and showed that if there was a gene for altruism (he called it ‘docility’), it would breed into the population pretty effectively.
  5. No such gene has yet been identified, and I have spent some time thinking about alternate approaches based on potential cognitive mechanisms for dealing with the information overload of tracking everybody’s reputation. One really interesting heuristic I ran across is the Symmetry Hypothesis, which I have slightly recast for simplicity.  This idea is a hack to the PD where you can reason very simply as follows: if the person I am interacting with is just like me and reasons exactly as I do, then no matter what I decide, they are going to do the same, and in that case I can safely cooperate because the other player will too.  And if I defect, they will also (potentially allowing group social gains through competition, which is a separate set of ideas).
  6. Symmetry would apply in cases where the people you often interact with are cognitively homogeneous, that is, where everybody thinks ‘we all think alike,’ and where ‘we’ can be any social group (family, neighborhood, community, church, club, etc.).   If this is driving some decent fraction of altruistic behavior, you’d see strong tendencies for high levels of in-group trust (compared with out-group trust), particularly in groups that push people towards thinking similarly.  You clearly do see those things, but their existence doesn’t actually test the hypothesis – there are many theories that predict in-group/out-group formation, that these groups affect trust, and that people who identify with a group start to think similarly.  Of note, though, this idea is a little pessimistic because it suggests that groupthink leads to better trust, and that social groups should tend to treat novel, independent thinkers poorly.
  7. Testing the theory would require data examining how important ‘thinks like me’ is to altruistic behavior and/or how important cognitive homogeneity is to existing strong social groups/identity. This is a potential area of social science research a bit outside our expertise here in the lab.
  8. But if true, the learning-related question (back to our work) is whether a tendency to rely on symmetry can be learned from our environment. I suspect yes, that feedback from successful social interactions would quickly reinforce and strengthen dependency on this heuristic.  I think that this could cause social groups to become more cognitively homogeneous in order to be more effectively cohesive.  Cognitively homogeneous groups would have higher trust, cooperate better and be more productive than non-homogeneous groups, out-competing them.  This could very well create a kind of cultural learning that would persist and look a lot like a genetic factor.  But if it was learned (rather than prewired), that would suggest we could increase trust and altruism beyond what we currently see in the world by learning to allow more diverse cognitive approaches and/or learning to better trust out-groups.
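The defect-dominates logic of (2) and the symmetry hack of (5) can both be checked against the textbook Prisoner’s Dilemma payoffs (the standard T=5, R=3, P=1, S=0 values, used here purely for illustration):

```python
# One-shot Prisoner's Dilemma with textbook payoffs:
# T=5 (temptation), R=3 (mutual cooperation), P=1 (mutual defection),
# S=0 (sucker's payoff).  PAYOFF[(me, them)] is my score.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_response(their_move):
    """My payoff-maximizing move given the opponent's move."""
    return max("CD", key=lambda mine: PAYOFF[(mine, their_move)])

# Defection dominates: it is the best response to either opponent move...
assert best_response("C") == "D" and best_response("D") == "D"

# ...but under the symmetry assumption (they will choose whatever I
# choose), the comparison is C-vs-C against D-vs-D, and cooperation wins:
symmetric_choice = max("CD", key=lambda m: PAYOFF[(m, m)])
print(symmetric_choice)  # C
```

The hack works by collapsing the four cells of the game down to the diagonal, which is exactly the ‘missing step’ of non-rational reasoning discussed below.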


I was moved to re-iterate this chain of ideas because it came up yet again when conversation in the lab drifted into politics.  Our internal debates usually center on how different groups treat their out-groups and why.  Yesterday, the discussion started with the observation that people we didn’t agree with often seemed to be driven by fear/distrust/hate of those in their out-groups.  However, it was not clear, if you didn’t feel that way, whether you had managed to see all of humanity as your in-group or had instead found/constructed an in-group that avoided negative perception of its out-groups.  We did not come to a conclusion.

FWIW, this line of thinking depends heavily on the Symmetry idea, which I discovered roughly 10 years ago via Brad DeLong’s blog (http://delong.typepad.com/sdj/2007/02/the_symmetry_ar.html).  According to the discussion there, it is also described as the Symmetry Fallacy and not positively viewed among real decision scientists.  I have recast it slightly differently here and suspect that among the annoying elements is that I’m using an underspecified model of bounded rationality.  That is, for me to trust you because you think like me, I’m assuming both of us have slightly non-rational decision processes that for unspecified reasons come to the same conclusion that we are going to trust each other.  Maybe there’s a style issue where a cognitive psychologist can accept a ‘missing step’ like this in thinking (we deal with lots of missing steps in cognitive processing) where a more logic/math approach considers that anathema.

Aug 30

Evidence and conclusions

I think this should be the last note on this topic for a while, but since it’s topical: a new piece of data popped up related to possible sources of gender outcome differences in STEM-related fields.


The new piece of data was reported in the NY Times Upshot section, titled “Evidence of a Toxic Environment for Women in Economics”


The core finding is that on an anonymous forum used by economists (grad students, post-docs, professors) there are a lot of negatively gendered associations in posts where the author makes explicitly gendered reference.  In general, posts about men are more professional and posts about women are not (body-based, sex-related, and generally sexist).  Note that “more” here is computed as a likelihood ratio, which is mathematically well defined even though the effect size is not trivially extracted from it.  Because we always like to review primary sources, I dug up the source, which has a few curious features.
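To illustrate what a likelihood ratio means in this context, here is a minimal sketch with entirely invented counts (the thesis’s actual analysis is a logistic regression over real forum posts, not this toy):

```python
# Toy word-association likelihood ratios.  All counts below are made up
# purely for illustration.

counts = {
    # word: (appearances in posts about women, in posts about men)
    "hot":     (120, 15),
    "adviser": (40, 140),
}
totals = (2000, 6000)  # total posts about women, about men (invented)

def likelihood_ratio(word):
    """P(word | post about a woman) / P(word | post about a man)."""
    f, m = counts[word]
    return (f / totals[0]) / (m / totals[1])

print(round(likelihood_ratio("hot"), 1))      # 24.0: strongly female-associated
print(round(likelihood_ratio("adviser"), 1))  # 0.9: mildly male-associated
```

A ratio far from 1 flags a strongly gender-associated word; note that the ratio alone doesn’t tell you how common the word is, which is part of why the effect size is hard to extract.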

First, it’s an undergraduate thesis, not yet peer-reviewed, which is an unusual source.  That said, it is a sophisticated analysis (mainly based on logistic regression) that looks to be done in a careful, appropriate, and accurate way.  The method looks strong, but it is complex enough that an error could be hiding in some of the key definitions and terms.

Paper link:


Of note, the paper seems to be a careful mathematical analysis of something “everybody knows,” which is that anonymous forums frequently include high rates of stereotyped bias against women and minorities.  But be very careful with results that confirm what you already know; that’s how confirmation bias works.  In addition, economists may not be an effective proxy for all STEM fields.  I don’t know of a similar analysis for Psychology, for example.

But as an exercise, let’s consider the possibility that the analysis is done correctly and truly captures the fact that there are some people in economics who treat women significantly differently than they treat men, i.e., that their implicit bias affects the working environment.  So we have 3 data points to consider.

  1. There are fewer women in STEM fields than men
  2. There are biological differences between men and women (like height)
  3. There are environmental differences that may affect women’s ability to work in some STEM fields

The goal, as a pure-minded scientist, is to understand the cause of (1).  Why are there fewer women in STEM?  The far-too-frequent inference error is when people (like David Brooks) take (2) as evidence that (1) is caused by biological differences.  That’s simply an incorrect inference.

It turns out to be helpful to know about (3), but only because it should reduce your certainty that (2) implies (1).  It’s still critical to realize that we do not know that environment causes (1) either.  All we know is that we have multiple hypotheses consistent with the data, so we don’t know the cause.

What we do know is that (3) is objectively bad socially.  Even if (2) meant there were either mean or distributional differences between men and women, the normal distribution means there are still women on the upper tail and if (3) keeps them out of STEM, that hurts everybody.

The googler’s memo assumes (2) and reinforces (3), which is clearly and objectively a fireable offense.


Aug 11

See the problem yet?

The entirely predictable backlash against Google for firing the sexist manifesto author has begun.  Among the notable contributors is the NY Times Editorial page in the form of David Brooks.  In support of his position that the Google CEO should resign, he’s even gone so far as to dig up some evolutionary psych types to assert that men and women do, indeed, differ and that the author was therefore on safe scientific ground.

The logical errors are consistent and depressing.  Nobody is arguing that men and women don’t differ on anything. The question is whether they differ on google-relevant work skills.  Consider the following 2 fact-based statements:

  1. Men are taller than women
  2. Women are more inclined towards interpersonal interactions

There is data to support both statements in aggregate and statistically across the two groups.  The first statement is clearly a core biological difference with a genetic basis (but irrelevant to work skills at Google).  However, the fact that (1) has a biological basis does not mean the second statement does.  The alternative hypothesis is that (2) has arisen from social and cultural conditions, not from having XX or XY chromosomes (or estrogen/androgen).  The question is between these two statements:

2a. Women are more inclined towards interpersonal interactions because of genetic differences

2b. Women are more inclined towards interpersonal interactions because they have learned to be

And while statement 2 is largely consistent with observations (e.g., survey data on preferences), we have no idea at all which of 2a or 2b is true (or even if the truth is a blend of both).  Just because an evo psych scientist can tell a story about how this could have evolved does not make it true either; it just means 2a is plausible (evo psych cannot establish causation).  It’s unambiguous that 2b is also plausible.  There simply isn’t data that clearly distinguishes one from the other, and anybody who tells you otherwise is simply exhibiting confirmation bias.

And anybody even considering that the firing was unjust needs to read the definitive take-down of the manifesto by a former Google engineer, Yonatan Zunger, who does not even need to consider the science, just the engineering:


His core point is that software engineering at Google is necessarily a team-based activity, which (a) means the social skills the manifesto author attributes to women are actually highly valuable and (b) means the manifesto author no longer has any prospect of being part of a team at Google, because he has asserted that many colleagues are inferior (and many others will be unhappy they’d even have to argue the point with him).  Given that any of these skills is likely present in a normal distribution across the population, and that Google selectively hires from the high end for both men and women, even if his pseudoscience were correct, you’d still have to fire him to maintain good relationships with the women you’ve already hired from the very high end of the skill distribution.
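The tail argument can be put in numbers.  As a purely hypothetical sketch, suppose some work-relevant skill is normally distributed in both groups with equal spread, and one group’s mean is shifted down by 0.1 standard deviations (an invented, small difference):

```python
from statistics import NormalDist

# Hypothetical skill distributions: same spread, small mean shift.
group_a = NormalDist(mu=0.0, sigma=1.0)
group_b = NormalDist(mu=-0.1, sigma=1.0)

cutoff = 2.0  # selective hiring: only ~the top 2% of group A qualifies

frac_a = 1 - group_a.cdf(cutoff)
frac_b = 1 - group_b.cdf(cutoff)

print(round(frac_a, 4))           # 0.0228
print(round(frac_b, 4))           # 0.0179
print(round(frac_b / frac_a, 2))  # 0.79: both groups are well represented
```

Even under the (unproven) mean-difference assumption, the pool above a selective cutoff contains plenty of people from both groups, which is the point about Google’s hires.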

It’s kind of amazing how bad so many people are at basic logic once a topic touches their implicit bias.  It’s also depressing.

Aug 08

Anti-diversity “science”

Somebody at Google wrote a memo/manifesto arguing against diversity (mainly gender), caused something of a ruckus and got himself fired.  The author was clearly either trying to get terminated (as a martyr) or simply not very bright.  A particularly articulate explanation of why it is necessary to fire somebody who did what he did is here (TL;DR the memo author doesn’t seem to understand very important things about engineering or being part of a company that has engineering teams):



There are a number of interesting things about the whole episode, but one that we’ve had pop up in the lab recently in discussion is when it is possible for science to be ‘dangerous.’  The memo provides a convenient example since an early section attempts to wrap assumptions of biologically-driven gender differences in a thin veneer of science.  It’s a particularly poorly done argument, which I think makes it easier to see the overall inference flaws.

The argument is something like:

Men and women differ on X due to innate, biological and immutable differences (feel free to throw “evolutionary” in there as well if you’d like).  Men and women also differ on Y, which therefore must also be due to innate and immutable biological causes.

That there exists some value of X that make the first statement true (e.g., the number of X chromosomes) is not really worth arguing about. It should be obvious that you can’t assert the second statement regardless.  I usually frame it as a reminder to consider the alternate hypothesis, which we can state here as “Men and women differ on Z, which is due to cultural and environmental differences.”  Cross-cultural studies of gender differences make it unambiguously clear that there are values of Z that make this third statement true as well.

So what do we do about the middle statements, for values of Y for which we do not know if they are based mainly on nature or nurture?  Well, for one thing, we don’t make policy statements based on them.

For another, though, we’d like to do science that tackles difficult and thorny issues like nature vs nurture, individual differences, stability of personality measures, effects of education, culture and environment.  But how do we do science, which is often messy and even unstable on the cutting edge, when there are ideologically minded individuals waiting to seize on preliminary findings to drive a political agenda?

I don’t actually know. And that bothers me a fair amount.

If you doubt the danger inherent here, consider that you can make a pretty good case that the current president of the US is largely in place due to exactly this kind of bad, dangerous science.  The alt-right, which probably moved the needle enough to swing the very close election, is a big fan of genetic theories of IQ, especially ones that support the assumption that the privileged deserve all their advantages.  So they are highly invested in the discredited book The Bell Curve and the type of argument that got Larry Summers fired from Harvard (the ‘fat tails’ hypothesis of gender differences).  These ideas are generally focused through the lens of Ayn Rand’s Objectivism, which asserts the moral necessity of rule of the privileged over the masses — which is, fwiw, pretty well reflected in the googler’s memo as well.


Aug 01

In Memoriam Howard Eichenbaum

Howard Eichenbaum was a great scientist in the field of memory.  He passed away unexpectedly last week at the age of 69.  His research was directly on the boundary between basic neuroscience and cognitive neuroscience, making the connections from neurobiological studies done with rats to how human memory works.  He was particularly well known as a teacher and mentor.  His passing was noted to the MDRS community by his good friend Neal Cohen, which elicited a remarkable outpouring of affection and kind words about Howard.  Neal also shared this obituary:

Howard B. Eichenbaum

Howard B. Eichenbaum, a William Fairfield Warren Distinguished Professor in the Department of Psychological and Brain Sciences at Boston University, and an internationally recognized figure in advancing our understanding of the fundamental nature and brain mechanisms of memory, died in Boston on July 21, 2017 following recent spine surgery at age 69.

Eichenbaum’s contributions to the field of memory research were profound, in helping us to better understand how memory works and how it is organized in the brain. His contributions come from his extensive empirical findings, including the discovery of “time cells” in the hippocampus; his integrative approach, committed to synthesizing results across species, across methods, and across levels of analysis; his important theoretical advances, concerning multiple memory systems of the brain; his creative long-term editorship of the journal Hippocampus, even while serving on the editorial boards of 10 other journals; his mentorship, guidance, and encouragement of scores of undergraduates, graduate students, postdoctoral fellows, and junior faculty who went on to have their own significant impact on the field; and his remarkable history of service and leadership.

Eichenbaum joined the Boston University faculty in 1996 after obtaining a BS in cell biology and a PhD in psychology at the University of Michigan, then holding faculty positions at Wellesley College (1977-1991), University of North Carolina at Chapel Hill (1991-1993), and SUNY Stony Brook (1993-1996).  At the time of his death, Eichenbaum’s other roles at Boston University included serving as founding Director of the Center for Memory and Brain and of the Cognitive Neurobiology Laboratory, after having earlier founded both the Undergraduate Program for Neuroscience and the Graduate Program for Neuroscience.

His contributions have been formally recognized with multiple honors, including being named a Fellow of the American Association for the Advancement of Science, the American Academy of Arts and Sciences, and the Association for Psychological Science; appointment to the Council of the Society for Neuroscience and the NIMH National Advisory Mental Health Council; and election to Chair, Section on Neuroscience, American Association for the Advancement of Science.

Eichenbaum’s non-science pursuits included coaching his two sons’ Little League baseball teams for many years, taking his sons around the country on their “baseball-parks-of-America tour” – a quest to catch a game at every Major League Baseball Park in America that spanned 6 summers across 15 years, kayaking in the waters off Chatham, MA, and rooting passionately for his Boston Red Sox and University of Michigan teams. He is survived by his beloved wife of 35 years, Karen J. Shedlack; two sons, both pursuing graduate studies, Alexander E. Eichenbaum and Adam S. Eichenbaum; 100-year-old mother, Edith (Kahn) Eichenbaum; brother, Jerold Eichenbaum; sister, Miriam Eichenbaum Drop; nephews Michael Eichenbaum and Dylan Drop; and niece, Tali Eichenbaum.

Jun 26

Confirmation bias

Mistaking data consistent with your hypothesis for data establishing your hypothesis is a surprisingly common mistake, even for highly trained, experienced scientists.  The subjective experience is common: you develop and carry around a theory on some topic and over the course of your day, you run into evidence (anecdotes, or other scientific findings) that would be predicted by your theory.  So you think, “hey, that fits too, my theory must be right.”

That’s fine for informal theory development, but when you want to really test your hypothesis, it doesn’t work.  After you identify data consistent with your theory, you need to figure out what theory that data actually rules out.  The problem is that it is often the case that competing theories make similar predictions, so data fitting your theory doesn’t prove the alternate theory wrong.  This is why the process of doing science generally requires collecting new data that is carefully designed to discriminate between alternate theories.  This requires both (a) figuring out the alternate theories and (b) designing a good experiment — and so it can be hard.

An interesting example that has been popping up again in popular discussion is the question of the genetic contribution to IQ.  This is an area where everybody thinks they have a strong theory and it’s remarkable how poorly almost everybody does at considering alternates.

For example, you think genes contribute strongly to IQ and you notice that apparently IQ-related success seems to run in families, so you think “Aha! My theory is supported.”  Maybe you’ll even correctly spot the ‘null hypothesis’ that “IQ doesn’t run in families” as being disconfirmed by these data.  But the real alternate is “environmental factors determine IQ more than genes” and since families largely share environments, your data don’t discriminate.

Note that this doesn’t mean we know which theory is correct.  Both theories are consistent with the observed data and no strong conclusions can be drawn.  I imagine this is very frustrating to non-scientists because you’d rather have a clear answer than a perpetual state of uncertainty.  Scientists have to live with uncertainty all the time, and it can make it tricky to talk about your work — you want to make simple statements, but you don’t want to overstate your confidence.

The topic has come up because Charles Murray is once again in the news, happily going around talking about his theory that (a) genes are a major/primary determinant of IQ and (b) these genes vary substantially across race.  If you know the actual science being done around this area, you know that neither of those statements are established to the point of ruling out any plausible alternatives.  Even setting aside the question of “what is measured in an IQ test” we know for sure that genes have an effect, education has an effect, and there is increasing evidence of other non-education environmental effects (lead, stress, nutrition).  Nobody knows the relative importance of these effects — and really careful thinkers are also well aware the relative values change across samples within the population (e.g., nutrition effects don’t count for much across a sample that is all well-nourished across the lifespan).

But if somebody like Murray presents the messy data, knowing that a lot of racist listeners are going to simply hear confirmation bias, and then make no effort to argue for non-racist intervention policy to improve the environment (that is entirely supported by the data he presents), then the rest of us can confidently rule out the hypothesis that Murray isn’t a racist.

If you are really interested in the science in this area, there are much better people to be paying attention to:



Mar 06

Explaining neuroscience

I ran across this link referenced by its title:

A neuroscientist explains a concept at five different levels


I was initially worried it would annoy me, but eventually decided to take a look at it anyway, figuring it would be interesting at the level of thinking about your audience when describing a complex scientific concept.  On one hand, parts of it are better than expected.  On the other, some of the interactions were a bit odd (the college & graduate student — but maybe it was hard to edit it for the time allowed).

Overall, it’s a good micro example of choosing your language to be appropriate to what you expect your audience to know.  I also noted that for the two youngest explainees, the scientist presented things to get a nice ‘wow’ response, which is probably a good memory aid (and generally good in teaching or explaining).  For the other audiences, he did a lot more listening, which seems natural since there are a lot of different possible backgrounds for undergraduates and graduate students.

It was advertised as similar in spirit to the Feynman descriptions of how everyday things work.  I don’t think it reaches that level, but those are pretty extraordinary, so it’s not really a fair standard.

Jul 27

Surgical skill

One potential application for our basic studies of skill learning is understanding the development of skill in performing surgery.  So I was intrigued when I happened to stumble across the following report of factors predicting successful surgical outcomes:

Surgeon specialization and operative mortality in United States: retrospective analysis

BMJ 2016; 354 doi: http://dx.doi.org/10.1136/bmj.i3571 (Published 21 July 2016) Cite this as: BMJ 2016;354:i3571


The take home message was that ‘specialization’ may be as important as repetitions in successful skill performance.  The following clipped paragraph captures many of the things I find fascinating here:

At the same time, the degree to which a surgeon specializes in a specific procedure may be as important as the number of times that he or she performs it.[11-14] A surgeon who specializes in one operation may have better outcomes owing to muscle memory built from repetition, higher attention and faster recall as a result of less switching between different procedures, and knowledge transfer of outcomes for the same procedure performed in different patients.[8,15-18] If this specialization hypothesis holds true, a surgeon performing 20 procedures of which all 20 are valve replacements (denoting 100% specialization in the procedure) would have lower operative mortality rates than a surgeon who performs 100 operations of which 40 are valve replacements (denoting 40% specialization in the procedure). In contrast, the volume-outcomes hypothesis would suggest that selecting the surgeon who performs 40 valve replacements would lead to superior outcomes for patients. To the best of our knowledge, no study has described a statistical association between a surgeon’s degree of specialization in a specific procedure and patients’ mortality.


Of note, this is not what we’d expect from our lab work.  We find reps to be the most important factor.  Is there something we are missing in our work that captures some factor based on ‘specialization’?  Or is the study being influenced by other variables?

Note the assumption in the text that specialization leads to better “muscle memory.”  We think that should derive from reps, but have often wondered about interference.  Interference among procedures would produce a specialization effect, but we’ve never observed interference among sequences learned together.  Another possibility is something like “rust” — perhaps less specialized surgeons have longer intervals between performances (we have seen something like that in our forgetting curves).
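The ‘rust’ idea can be sketched with an invented decay model (not fitted to any data, ours or theirs): each rep adds a fixed skill increment, and skill decays exponentially across the gap before the next rep.

```python
import math

# Illustrative forgetting model (all parameters invented): skill gains a
# fixed increment per rep and decays exponentially between reps.

def skill_after(schedule, gain=1.0, tau=30.0):
    """`schedule` is the list of gaps (in days) between successive reps;
    `tau` is the assumed decay time constant."""
    skill = gain  # first rep
    for gap in schedule:
        skill = skill * math.exp(-gap / tau) + gain
    return skill

# Specialist: 20 reps, 7 days apart.  Generalist: 40 reps, 60 days apart.
specialist = skill_after([7] * 19)
generalist = skill_after([60] * 39)
print(specialist > generalist)  # True: shorter gaps beat more total reps
```

Under this toy model, rep spacing dominates rep count, which would make a specialization effect show up even if reps are all that matter mechanistically.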

It’s also possible that these ‘unspecialized’ surgeons are simply performing too many surgeries and having fatigue or other state effects.  You wouldn’t be able to spot those effects without the analysis done here because practice would be improving performance and hiding any costs of doing too many.

Or maybe it is something else entirely…
