May 16

Replicability advocate John Ioannidis might be a bad actor

When you publish a finding titled “Why most published research findings are false,” your report is likely to have two major effects. The first is to encourage scientists to perform their research carefully and rigorously to ensure robust, reliable conclusions. The second is to provide a touchpoint for a general anti-science agenda, supporting those who want to push dangerous, self-interested ideas and need to be able to say “don’t listen to science or scientists, listen to me.”

Like a lot of psychology departments, ours assumed the research was driven by the first goal and has done a lot of self-study and introspection to see what we can do to improve research processes. I have frequently been somewhat bemused by this, as we have always had a strong commitment to rigorous science in my lab, and my impression is that this is true of all of my colleagues here at Northwestern.

I have been more concerned about the persistent apparent misunderstanding of the phrase “fails to replicate.” We all know from basic statistics that this does not mean “false.” When a study does not reach the standard p < .05 threshold for rejecting the null hypothesis, it means the study didn’t work. Technically, it means that the effect the study tried to measure was not robustly larger than the error in measurement. A “false” hypothesis means the effect size is truly zero. “Fails to replicate” doesn’t mean we are sure the effect was zero, only that it is probably smaller than we hypothesized when the study was designed. A study with the “power” to detect an effect size of 0.4 won’t reliably find an effect size of 0.2, even though a 0.2 effect is not zero and is often meaningful. And power calculations are both probabilistic (80% power means 20% of rigorous studies don’t work) and require precise estimates of both the magnitude and variance of your measures, which are based on previous studies and may be imprecise, especially in a new, relatively unstudied research area.
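To make the power argument concrete, here is a minimal Python sketch that is not taken from any particular study. It assumes a simple two-group design, a two-sided test at p < .05, and treats the effect sizes above as standardized (Cohen’s d) values. It also uses the normal approximation rather than an exact t-test power calculation, so the numbers are approximate, and the function names are hypothetical. It sizes a study for d = 0.4 at 80% power, then asks how often that same study would detect a true effect of 0.2.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison
    (normal approximation to the t-test; d is a standardized effect size)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

def power_two_sample(d, n, alpha=0.05):
    """Approximate probability of reaching p < alpha when the true effect is d."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n / 2)  # expected z-statistic under the true effect
    return STD_NORMAL.cdf(shift - z_alpha) + STD_NORMAL.cdf(-shift - z_alpha)

n = n_per_group(0.4)                        # study designed around an expected d = 0.4
print(n)                                    # 99
print(round(power_two_sample(0.4, n), 2))   # 0.8: one in five such studies still "fails"
print(round(power_two_sample(0.2, n), 2))   # 0.29: a real but smaller effect is usually missed
```

Under these assumptions, a study built to detect d = 0.4 has only about a 29% chance of reaching significance when the true effect is 0.2, so a real, nonzero effect will usually “fail to replicate.”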

None of these points about effect sizes and power is controversial or revolutionary. It’s basic statistics that all scientists learn in their first stats class. But if you conflate ‘fails to replicate’ with ‘false,’ as in the title of Ioannidis’s paper, you risk misleading a large segment of the non-scientist community that is not trained in these ideas. Maybe it was just an accident, or a slightly sensationalized title to draw attention to the issue. Or maybe not.

Which is why this report from Buzzfeed (excellently sourced, you can check their links) about a recent report from Stanford with Ioannidis as a co-author is of particular interest. It is a paper claiming COVID-19 is not as dangerous as previously thought because many more people have been exposed to it (i.e., the asymptomatic rate is potentially 50x higher than previously thought). That would be very important science, if true, so we’d want it to meet very rigorous standards. But…

  • One of the co-authors was so concerned about weak methodology that she refused to be associated with the paper. The conclusion depends on a test for the presence of COVID antibodies that has a very high false positive rate (potentially dramatically overestimating the number of asymptomatic cases). Furthermore, she was so concerned about the paper’s impact that she filed a complaint with the Stanford research compliance office.
  • The manuscript was released to the public as a preprint, leading to headlines in news media all over the world starting on April 17th, before it had received any peer review at all.
  • Ioannidis appeared on Fox News a few days after the non-peer-reviewed preprint was released, telling their audience that the COVID virus was much less dangerous than previously thought. His arguments were then echoed around the world by those arguing to lift movement and travel restrictions.
  • The founder of the airline JetBlue was found to have supported the research through a directed donation to Stanford. He was unacknowledged on the manuscript but was in constant email contact with the authors throughout the scientific and publication process.
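The concern about false positives in the first point above is basic base-rate arithmetic: when true exposure is rare, even a small false positive rate can produce about as many positive tests as real cases. The minimal Python sketch below illustrates this with made-up prevalence, sensitivity, and specificity values (they are not the Stanford study’s figures). It also applies the standard Rogan-Gladen correction, which only works if the test’s specificity is actually known.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Share of people who test positive, counting both true and false positives."""
    true_pos = true_prev * sensitivity
    false_pos = (1 - true_prev) * (1 - specificity)
    return true_pos + false_pos

def rogan_gladen(observed, sensitivity, specificity):
    """Rogan-Gladen correction: estimate true prevalence from the observed positive rate."""
    return (observed + specificity - 1) / (sensitivity + specificity - 1)

# Hypothetical numbers for illustration only (not taken from any real study).
observed = apparent_prevalence(true_prev=0.01, sensitivity=0.80, specificity=0.98)
print(round(observed, 4))                             # 0.0278: nearly 3x the true 1%
print(round(rogan_gladen(observed, 0.80, 0.98), 4))   # 0.01: right only if specificity is known
print(round(rogan_gladen(observed, 0.80, 0.995), 4))  # 0.0287: wrong if specificity is overestimated
```

Under these hypothetical numbers, the raw positive rate is nearly three times the true rate, and if the assumed specificity is even slightly too optimistic, the “corrected” estimate remains nearly three times too high.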

Taken together, these points are, of course, a textbook ‘worst case scenario’ for non-rigorous science with the potential for a large and highly damaging impact. Ioannidis is quoted in the article as describing the results as preliminary but “the best we can do,” and as saying that his work is driven by data, not politics (“I’m just a scientist”).

As a scientist with long experience and training in drawing conclusions from data, looking at this and other peculiarities, I’m going to propose another hypothesis: the concern about the Replicability Crisis in psychology (and science broadly) is at least partly being driven by people with an anti-science agenda who want to de-emphasize the value of science in effective, accurate public policy.

When you promote this agenda, even in a well-meaning manner to promote improved practices, you may be accidentally furthering the cause of people who want to, for example, sell you hydroxychloroquine (snake oil) or claim drinking bleach will cure you of anything.

Instead, you can simply continue to do your science rigorously. Replicate findings that you think are important (we do a lot of “replicate and extend” in my lab; it’s a pretty standard technique). Don’t rely on splashy, unexpected findings in a new research domain or methodology; we describe those as “cool if true” and wait for the second study. Think about what your science will mean to people outside the scientific community as well as within it. And if somebody asks you, suggest that journalists use phrases like “preliminary studies suggest” for that cool new result instead of “scientists say” (or worse, “a new study proves”).

Feb 20

Illinois Primary Day March 17

If you are attending the meeting of the Cognitive Neuroscience Society in Boston this year, you will be out of town for the IL primary that Tuesday. If you are registered in Evanston, you can vote early at the Civic Center starting on March 2.

If you are unfamiliar with voting in Evanston/Chicago, the election process is handled through Cook County, and the county’s election website is where you can get information about what will be on your ballot before heading to the polls.

One of the interesting things about local elections is that there may be a handful of offices and candidates you don’t know much about. For example, judges are elected and unless you happen to know somebody, it can be unclear how to use your vote well. There are both nonpartisan and partisan organizations that will provide information, endorsements and recommendations so that you can inform yourself easily and rapidly before voting (not linked here).

Voting is a straightforward and quick process here, especially early voting. Highly recommended.

Jan 06

Pushing the upper bound on cognitive performance

Over the holidays, I discovered a new chess competition variant being played and streamed by Chess.com: Puzzle Battle.  Chess puzzles are a well-known training device that also has something of a micro sub-community within the chess world devoted to the artistry of creating positions with a difficult-to-spot but winning move.  They are highly recommended to people trying to improve at chess.

A new approach to this idea has popped up on all the various online chess sites: speed-solving these puzzles, usually ones that aren’t as hard as the original “chess studies.”  These get called things like ‘puzzle rush,’ ‘tactics trainer’ or, the latest, ‘Puzzle Battle.’  In a battle, two top-level players solve as many puzzles as possible in 3 minutes, or until they make 3 mistakes, where a mistake is any incorrect move in the winning line.

The top performers are routinely solving 50 puzzles in 3 minutes.  This has to be seen to be believed, e.g., in the linked clip from the quarterfinals of a tournament currently underway.  In fact, if you aren’t very, very experienced in chess, there’s a very good chance you’ll have no idea what is going on.  If so, slow the video way down…

What is happening is that they are getting a new chess position with up to 30 pieces on the board.  Somewhere is a winning move for the side to move.  The position must be parsed, analyzed, and the correct winning move selected and implemented.  Sometimes the puzzle requires a sequence of 3-5 accurate moves to prove the win.  The best solvers are doing 50 of these in a row at a rate of 3.6s each.

At least for me, this pushed my understanding of the upper bound of human cognitive performance out another step or two.

Potentially of note, in this linked battle (spoiler alert), an untitled puzzle specialist is beating one of the top-10 GMs in the world.  Apparently, the database of puzzles on Chess.com only has about 20,000 examples, and some of the competitors have memorized a significant fraction of the db.  The super-GM is certainly better at chess than his opponent, but he is solving more of the puzzles on the fly, so he tends to score in the 40s.  Then there’s the question of how one memorizes the winning lines of 20k chess puzzles so that they can be found, retrieved, and executed in 3.6s.

Oct 25

Answers to some semi-frequently asked questions about memory

Hello,
We are working on a group speech for a school project and need to reach out to an expert in the field of psychology. We have a few questions about the topic of brain capacity and we would greatly appreciate it if you would take a look at them and get back to us.
Here are our questions:
Does exercise affect how well the brain functions?
Can certain exercises be done to improve brain capacity? If so, what are some examples?
How much information can the brain hold?
Where did the myth of only 10 percent brain capacity come from?
What general methods do you think work best to fix brain injuries?

Thank you!

The above email arrived today and reflects questions that come up fairly often for me as a memory researcher.  I decided to answer them here on the blog so they are available for future reference (I might actually have done this before; at some point I should look for and collect similar posts).

General answers:

  • Physical exercise probably does not have much immediate impact on brain function, other than that being fatigued after exercising might very temporarily slow function.  Longer-term, cardiovascular fitness appears to be important in healthy aging.  Maintaining physical fitness through middle age and late life looks to be very helpful in keeping your brain working well.
  • Cognitive exercises also seem to help healthy aging.  It is less clear how well cognitive exercise helps younger people.  Most things that keep you cognitively active result in learning, and for younger people the information learned is probably the most important thing.  Being cognitively active mainly means doing interesting things that make you think.  Staying active when you are older probably helps with something like ‘capacity,’ while doing this when you are young makes you smarter in a more direct way.
  • The brain appears to have the capacity to hold everything you are capable of learning over the course of your life.  It doesn’t appear to work that way — we forget things a lot — but this is due to problems storing everything permanently more than running out of room to store them.  Storing memories is slower than experiencing things, so a lot of your experiences don’t end up in your long-term memory.
  • The brain is made of neurons.  Neurons are electrically active when they are functioning.  You definitely don’t want them all firing at the same time — that would cause a seizure, as in epilepsy.  If the 10% idea has any basis at all, it’s related to about how many neurons could be firing at about the same time.  There is no hidden, reserve unused capacity in your brain.
  • The major problem with brain injuries is that neurons don’t grow back.  Almost all treatment and rehabilitation is training new processes to work with the remaining uninjured parts of the brain, and learning to work around the damaged parts.  This can work surprisingly well to recover function, but you can never really recover or replace lost brain tissue, unfortunately.

Aug 29

Saying farewell to Dr. Cohen

Picture after last lab lunch with Dr. Cohen before he moves on to UPenn

Mar 25

Recreational lockpicking

On the theme of demonstrations of exceptional skills via youtube, I recently ran across the channel of the LockPickingLawyer (https://www.youtube.com/channel/UCm9K6rby98W8JigLoZOh6FQ/about). He posts videos of picking various kinds of locks together with evaluations on how effective the locks are as security devices. I found this to be highly interesting for a variety of reasons.

First, this seems to be a wonderful example of a highly implicit skill. The mechanical interaction between the tool and the internal elements of the lock cannot even be seen. You could try to explain to me how to do this, but there’s absolutely no question you’d need lots of practice to carry out the procedure successfully. And yet, people obviously not only learn the skill, but get very, very good at it.

Second, this skill is even more pointless than learning to yo-yo (or speed-solve a Rubik’s cube).  Locks are peculiar security devices in that they are a minor deterrent at most to actual criminals.  In most circumstances there’s a brute force way around a lock (bolt cutter, break a window) if somebody is determined to break in.  Probably the most likely case of somebody picking a lock is a locksmith helping you with a door when you’ve lost or misplaced the key.  And locksmiths have access to tools that make the perceptual-motor skill relatively less critical.

But if you read the comments on the Lock Picking Lawyer’s videos, you’ll quickly discover this is a hobby that seems to have a reasonably sized interest base.  It appears to be called Lock Sport (http://locklab.com/) where people compete on speed or challenge themselves with increasing difficulty in a way reminiscent of puzzle-solvers (there’s a robust puzzle solving community on youtube as well, but puzzle solving seems like a very explicit process).

I’ve never met anybody who is into this, at least not that I know of.  But if I were picking locks for fun, I don’t think I’d talk about it much with people outside the community.  People would likely think you were some kind of aspiring criminal.

Which makes it a great example of a skill that some people get really good at, that takes many, many hours of practice, and that has no particular external value once achieved.

So why do people get good at it?  I can hazard a couple of guesses.  In the video comments, some people report the process of practicing to be calming in a way that is reminiscent of ‘flow’ states, which we have thought might be related to dopamine.  Relatedly, the process of picking a lock probably produces a real, substantial RPE (reward prediction error) feeling, where you struggle with the task for a long time, then suddenly get an unexpected payoff of success.
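For readers who haven’t met reward prediction error before, the idea can be sketched with the classic Rescorla-Wagner (delta-rule) update, in which the error is simply the reward received minus the reward expected. This is a textbook illustration, not a model anyone has fit to lockpicking; the starting expectation, learning rate, and number of failed attempts are arbitrary assumptions. A long run of failures drives the expectation toward zero, so the eventual success produces a prediction error close to its maximum.

```python
def prediction_errors(outcomes, v0=0.5, alpha=0.2):
    """Rescorla-Wagner style update: delta = reward - expectation, then nudge the expectation."""
    v = v0                      # current expectation of reward (hypothetical starting value)
    deltas = []
    for r in outcomes:
        delta = r - v           # reward prediction error for this attempt
        deltas.append(delta)
        v += alpha * delta      # expectation moves toward what actually happened
    return deltas

# Twenty failed attempts (reward 0), then the lock finally opens (reward 1).
attempts = [0] * 20 + [1]
errs = prediction_errors(attempts)
print(round(errs[0], 2))   # -0.5: the first failure is mildly disappointing
print(round(errs[-1], 2))  # 0.99: the eventual success is almost entirely unexpected
```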

Honestly, lockpicking looks like it might be a fun thing to learn.  But I think I’m not going to go buy tools and try it because I don’t want people to judge me.

Jan 04

Implicit Comics

Many people have emailed me the following comic:

https://www.smbc-comics.com/comic/conscious

Yes, I appreciate it.  No, I didn’t have anything to do with the author/artist.

Happy New Year!

P.S. I wish I could embed it more directly here, but I don’t want to deprive the author of links/traffic. It’s about implicit memory.

Nov 12

Lessons Learned on Scientific Field Work in Esports

Scene: Mandalay Bay Esports Stadium, a 12,000-seat arena in Las Vegas, Nevada. Two young men sip water and wipe their sweaty hands as they wait for a cue that it’s time to perform for the filled stadium seats and the two hundred thousand viewers online. Digital versions of themselves pose in sponsored gear on loop on the supersized screens above them. The cue is given – their faces harden. They don padded headsets to drown out the crowd. Their hands become a blur. The next 8 minutes will determine the Evo World Champion in Super Smash Bros Melee and the winner of $8,000 for a weekend of competitive gaming.

To perform at the level of a world champion, these players must perform at the peak of human cognition: based on knowledge they’ve accrued over a decade of play, they make high-level strategic and predictive gameplay choices and carry them out through sensorimotor action combinations timed to individual frames. Think ultra-high-speed chess with extreme, precise motor demands and you’ve begun to grasp esports.

As a scientist, I’ve made it my goal to understand the cognitive, physiological, and social components that ultimately determine who gets up on that stage. The psychological study of expertise has made clear that both practice and talent support cognitive and sensorimotor skill acquisition and performance. However, to understand the complete picture, we must also look at emotion regulation skills that allow for performing under pressure and the role of identity and stereotypes in granting access to the game at all. My work aims to take each of these components into account to find out what makes a Top 100 player.

Over the summer, I attended four national tournaments to work with 20 of the Top 100 players in the Super Smash Bros Melee competitive scene. My protocol involved pre-tournament surveys, cortisol sampling (a 45-minute commitment each morning for participants), a wrist-based heart rate monitor to be worn for the whole tournament, a 45-minute interview on skill and social support, and a 20-minute cognitive testing session. My plan is to combine the variance from each of these measures to explain both performance in that tournament and overall Top 100 ranking. Many of these measures had to be taken in the field, a setting often neglected in traditional psychology graduate student training. My pilot testing taught me three important lessons about field work, which I explore below.

  • No matter how organized you are, things will go wrong and you will be stressed.

Be prepared to fail and MacGyver your way through travel and data collection.

For data collection, you can prepare for the worst by having access to everything in both digital and hard copy and keeping a checklist of all materials. Important tidbits like laptop chargers and labels tend to take a mental backseat to the primary study materials like questionnaires, but they’re just as important. Compartmentalizing helps too: I had a carry-on dedicated to study materials and organized it so that I could look into it and immediately know what I was missing.

But even when you’ve prepared, some things will be out of your control. Different tournament venues provide different testing environments, so I had variation in screen visibility, ambient noise, and subject comfort for every cognitive test and interview. Despite a lengthy verbal explanation and multiple printouts on how to do saliva sampling to measure daily cortisol rhythms, subjects still took their samples at the wrong time or not at all. A heart rate monitor is still MIA from a subject who forgot to return it. Despite my plans, every single aspect of the study was challenged by the environment and the subjects themselves, factors outside of my control.

Even things within your control will go wrong. The stress of travel and data collection will compromise your executive functioning, especially if you’re working solo to save grant money. I had to retest my very first subject because I forgot to unmute the computer, even though audio cues are integral to the reaction time task. That’s not to mention the multiple small, silly mistakes I made from fatigue by the end of the month-long data collection marathon. Externalizing everything possible will unburden your working memory and, as a result, bolster your mental energy. When things go wrong, a master sheet of notes will help explain funky gaps in the data and help you decide which data to discard. Every detail belongs on paper so it won’t weigh you down: who was retested, the order of presentation, the testing environment, who wore what color watch, even which real names match up with which gamertags.

In field work, so many factors are outside of your control that you cannot achieve perfection. Prepare the best you can to collect high-quality data, but also forgive yourself for the mistakes and issues that will pop up along the way. My perfectionist tendencies in data collection only compounded my stress. Do maximum prep work so you can fall back on it when things go wrong or you feel overwhelmed.

  • Know the unique demands of your population.

In an ideal situation, you’ll be doing field research within your own community. Being a community member means you’re familiar with domain-specific skills and can measure them better in your work; you’ll have a network from which to recruit, split rooms, and socialize; and you’ll have a gut sense of ethics as far as what is exploitative versus uplifting to the community. But even if you aren’t a member of the community you’re studying, it’s worth it to research the unique demands within it. Otherwise you risk compromising your science or even failing to collect data at all.

For example, knowing the schedules of your participants (and potential participants) is crucial. At a military base or boarding school, for instance, subjects may not have 45 minutes after awakening to complete a saliva sampling procedure. Similarly, I had many subjects wake up 20 minutes before competing, meaning I couldn’t measure their cortisol awakening response. Schedules are important for recruitment too: I don’t approach people who are competing in 10 minutes, but I am comfortable recruiting someone I know is done for the day and just hanging out. (Also, as a rule of thumb, I never asked people to participate right after they lost.)

I also knew walking in that I would have trouble with subjects showing up. Although I had many people do surveys beforehand and commit to times to meet, actually scheduling someone at a tournament is impossible because their schedule will vary with their performance and a thousand other factors, like who wants to hang out right then. Plus Smashers tend to party at national tournaments, meaning lots of oversleeping, hangovers, and lost materials. Communicating with my subjects via text instead of email was a good way to keep up with ever-changing availability. Furthermore, my regular attendance at national tournaments meant more opportunities to fill in cells for each person rather than absolutely needing to complete everything in one weekend.

Finally, sexual assault is a problem in the Smash scene. As a woman, I was prepared to navigate interactions differently with players who have a history of assaulting women: testing in public places rather than hotel rooms, limiting the information they know about me, that sort of thing. The only way I could know whom to approach with caution was by being embedded in the community’s network.

  • Your social identity intersects with and determines your ability to do research and the content and quality of the data you do collect.

Most gaming scenes, especially for competitive and fighting games, are majority male. I do not exaggerate when I say that women make up less than 5% of Smash tournament attendees and about 2% of competitors. My lay theory on that percentage difference is that although many of us wish to be respected and feared top players, competitors and commentators have an especially bright spotlight of the male gaze put upon them, making them the most vulnerable to sexism and abuse – so a lot of us engage with the community by organizing tournaments and players, making art, or creating photo and video content. Whatever role we occupy, it’s safe to say that in gaming spaces, women are The Other, not the standard player, and with that minority status comes stereotypes. Unavoidable stereotypes about my gender changed the way I thought, felt, and worked on a daily basis, energy that could have gone to another hour on the ground at a tournament, another dinner out to network, or another reminder text for cortisol samples.

A huge amount of my energy during my summer tournament run was spent on first-impression management. Every morning was a struggle to handle the girl gamer stereotypes I may be slotted into, to balance impressions of my competence with my femininity. When I wore a skirt and makeup, I worried that people labeled me an “egirl” with the accompanying stereotypes of social climbing, promiscuity, and two-facedness. And indeed, in those clothes, I had better success cold-recruiting players and making friends but I was assumed to have no game competence – no one wanted to talk about the game, no one probed my scientific theories, but they were happy to feel out my social network and ask about my plans for the afterparty. If I presented more masculine, a suit jacket and loose shirt, many top players hardly gave me a second glance after I introduced myself, but people listened longer when I talked about my theories and directed conversations in more academic directions. (I always relied on the authority of a suit jacket in formal research presentations at events.) In the end, no outfit could reconcile the competing identities of scientist, dry and intelligent and unsexual and masculine thus competent, with a gamer girl, comfortable in her body and approachable and granted access to important figures of the community. My attempts to dress and act in between these two extremes only led to assumed incompetence of both social and research domains in my first meetings with people.

Even beyond first impressions, I spent so much energy navigating conversations to establish who I am, to research subjects and community members alike. Sometimes it was more useful to talk about my Smash commentary first, to establish my in-group gamer status, and in subsequent conversations bring up my research. Other times, people were clearly engaged in the science but assumed I knew nothing about the game, an assumption that had to be corrected over multiple interactions filled with jokes and references to in-depth game knowledge or my commentary gigs before they would even talk to me in an interview on a natural (rather than dumbed-down) level.

Besides my appearance, I managed my social impression in other ways. Players take a video game sexism survey as part of my initial questionnaires, and I am under no illusion that my status as a female researcher and gamer leaves their willingness to report their true thoughts unaffected. I do what I can to manage that: I recruited online rather than face-to-face as much as possible. And although I have never concealed my gender, the ambiguity of being a woman with a man’s name has afforded me much within the Smash scene.

If it isn’t obvious already, I’ll explicitly state it: every worry I’m expressing is directly linked to the integrity of my research. When I ask them about practice habits, I need them to speak to me using the highly domain-specific knowledge that may set them apart from one another – every Top 100 player is going to say they practice a certain number of hours, but how many use a CPU 20XX Fox set to approach for their chain-grab practice versus studying frame data on a computer before picking up the controller? When I ask them in surveys and in person about the role of women in the Smash community, I want them to be as comfortable as possible telling me the truth rather than sugar-coating their answers because they want to seem progressive to a scientist or because they want to date me. To even access members of a special population like the Top 100 requires networking and trust-building, all dependent on the above impressions. In the end, the identities that subjects assign to me are inextricable from the answers they give me and from accessing their expertise. Thus, it’s not something I can ignore.

In the end, my identity as a woman, scientist, and gamer all intersect to create a unique set of circumstances that, in the field and in the data, affect my research profoundly. I have a unique, diverse, and valuable perspective compared to the average gamer and games researcher, one reason I continue on despite challenges. Arguably, I have better access to top-level players to study. In other ways, I am permanently an out-group member who may never learn the truth of gender-gaming stereotypes or high-level game-specific knowledge on my own. The lesson here is that every scientist in search of truth, in the lab or otherwise, needs to remember how their identity is affecting their access to and interpretation of truth in specific ways, every step of the way.

Field work requires a balance of priorities: the integrity of the research, reducing noise that comes from uncontrollable circumstances, and your wellbeing as a researcher. Hopefully, these lessons I’ve learned are valuable to other researchers hoping to dive into field work themselves.

Related readings:

  • Taylor, T.L. (2012). Raising the stakes: E-sports and the professionalization of computer gaming. Cambridge, MA: MIT Press.
  • Taylor, N. (2018). I’d rather be a cyborg than a gamerbro: How masculinity mediates research on digital play. MedieKultur: Journal of Media and Communication Research, 34(64), 21.
  • Consalvo, M. (2012). Confronting toxic gamer culture: A challenge for feminist game studies scholars. Ada: A Journal of Gender, New Media, and Technology, (1).

Oct 11

Implicit/Machine learning gender bias

I ran across a headline recently “Amazon scraps secret AI recruiting tool that showed bias against women” that I realized provides a nice example of a few points we’ve been discussing in the lab.

First, I have found myself explaining on a few recent occasions that it is reasonable to think of implicit learning (IL) as the brain’s machine learning (ML) algorithm.  ML is a super-hot topic in AI and data science research, so this might be a useful analogy to help people understand what we mean by studying IL.  We characterize IL as the statistical extraction of patterns in the environment and the shaping of cognitive processing to respond to these patterns as efficiently and effectively as possible.  And that’s the form of most of the really cool ML results popping up all over the place: automatic statistical extraction from large datasets that provides better prediction or performance than qualitative analysis had.

Of course, we don’t know the form of the brain’s version of ML (there are a lot of different computational variations of ML) and we’re certainly bringing less computational power to our cognitive processing problems than a vast array of Google Tensor computing nodes.  But perhaps the analogy helps frame the important questions.

As for Amazon, the gender bias result they obtained is completely unsurprising once you realize how they approached the problem.  They trained an ML algorithm on previously successful job applicants to try to predict which new applicants would be most successful.  However, since the industry and the training data were largely male-dominated, the ML algorithm picked up being male as a predictor of success.  ML algorithms make no effort to distinguish correlation from causality, so they will generally be extremely vulnerable to bias.
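A toy example shows how this can happen without anyone programming in a preference. The Python sketch below is entirely hypothetical and is not Amazon’s system: it generates synthetic “hiring history” in which skill is distributed identically for men and women, but past success labels gave men an unfair bonus, and then fits an ordinary logistic regression by gradient descent. The applicant mix, bonus size, and function names are all assumptions made for illustration.

```python
import math
import random

def make_biased_history(n, seed=0):
    """Hypothetical hiring records as (features, label) pairs.

    Skill has the same distribution for everyone, but the historical
    'successful hire' label also adds a bonus for male applicants.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_male = 1.0 if rng.random() < 0.8 else 0.0          # male-dominated applicant pool
        skill = rng.gauss(0.0, 1.0)                           # identical skill distribution
        score = skill + 1.0 * is_male + rng.gauss(0.0, 0.5)  # biased past judgments
        rows.append(([skill, is_male], 1.0 if score > 0.5 else 0.0))
    return rows

def train_logistic(rows, lr=1.0, epochs=300):
    """Fit logistic regression with plain batch gradient descent on log loss."""
    w = [0.0, 0.0]  # weights for [skill, is_male]
    b = 0.0         # intercept
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (skill, is_male), y in rows:
            z = w[0] * skill + w[1] * is_male + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_w[0] += err * skill
            grad_w[1] += err * is_male
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b

weights, bias = train_logistic(make_biased_history(1000))
print("weight on skill:  ", round(weights[0], 2))
print("weight on is_male:", round(weights[1], 2))  # positive: the model learned the bias
```

Nothing in the code expresses a preference for men; the positive weight on `is_male` comes entirely from the correlation baked into the historical labels.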

But people are also vulnerable to bias, probably by basically the same mechanism.  If you work in a tech industry that is male dominated, IL will cause you to nonconsciously acquire a tendency to see successful people as more likely to be male.  Then male job applicants will look closer to that category and you’ll end up with an intuitive hunch that men are more likely to be successful — without knowing you are doing it or intending any bias at all against women.

An important consequence is that people exhibiting this bias are not intentionally misogynistic (note also that women are vulnerable to the same bias). Another is that there’s no simple cognitive control mechanism to make it go away. People rely on their intuition and gut instincts, and you can’t simply tell them to stop, because setting their intuition aside feels uncomfortable and unfamiliar. The only obvious solution is a systematic, intentional approach to changing the statistics through measures like affirmative action. A diverse environment will eventually break your IL-learned bias (how long this takes and what might accelerate it is where we should be looking to science), but it will never happen overnight; it will be an ongoing process that is highly likely to be uncomfortable at first.

In theory, it should be a lot quicker to fix the ML approach. You ought to be able to re-run the ML algorithm on an unbiased dataset that contains equal numbers of successful men and women. I’m sure the Amazon engineers know that, but the fact that they abandoned the project instead suggests that the dataset must have been very biased to begin with. ML needs a lot of data. If you restrict the input to equal numbers of successful men and women, the successful portion of the dataset can be no larger than twice the number of successful women. If the hiring process was biased in the past, that won’t be enough data (and prior bias would be a likely reason you’d want to tackle the issue with AI in the first place). They’d need to hire a whole lot more women, both successful and unsuccessful, for the ML to work, and then retrain the algorithm. But we knew that was the way out of bias before we even had ML algorithms to rediscover it.
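The arithmetic of why balancing shrinks the dataset can be sketched in a few lines of Python. This uses simple downsampling, which is only one way to balance a dataset (reweighting or collecting more data are alternatives). It equalizes men and women within the successful and unsuccessful groups separately. The record format, function names and counts are all invented for illustration.

```python
import random

def make_records(n, male, successful):
    """Create n hypothetical applicant records with the given attributes."""
    return [{"male": male, "successful": successful} for _ in range(n)]

def balance_by_gender(records, seed=0):
    """Downsample so that, within each outcome (successful / unsuccessful),
    men and women appear in equal numbers. The smaller group sets the size."""
    rng = random.Random(seed)
    balanced = []
    for outcome in (True, False):
        men = [r for r in records if r["successful"] == outcome and r["male"]]
        women = [r for r in records if r["successful"] == outcome and not r["male"]]
        k = min(len(men), len(women))
        balanced += rng.sample(men, k) + rng.sample(women, k)
    return balanced

# Invented counts for a historically male-dominated applicant pool.
records = (
    make_records(900, male=True, successful=True)
    + make_records(100, male=False, successful=True)
    + make_records(600, male=True, successful=False)
    + make_records(40, male=False, successful=False)
)
balanced = balance_by_gender(records)
print(len(records), "->", len(balanced))  # 1640 -> 280
```

With only 100 successful women in the pool, the balanced successful set tops out at 200 records, and most of the original data is thrown away. The only way to grow the balanced set is to have more women in both outcome groups.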

Original article via Reuters: https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

Jul 02

Real-World Adventures in Science, part 1: Aconcagua

On December 19, 2017, I found myself with a decision to make. I was standing on the side of Mount Aconcagua in Argentina at an altitude of 19,700 feet, about three thousand feet below the summit of the highest mountain in the Western Hemisphere. It was a snowy whiteout and the winds were extremely strong: this was Aconcagua’s infamous Viento Blanco (Spanish for “White Wind”). In addition to the inclement weather, I no longer had a functioning tent or stove, as neither had survived the strong winds. I had been forced to rely on the kindness of other climbers headed for the summit for overnight shelter and for melting ice into water. The decision was: attempt the summit in somewhat dangerous weather conditions, or turn back down the mountain now?

Planning for this trip had started many months earlier. My climbing colleagues and I had even been sleeping in low-oxygen tents for four months as part of our training, to pre-acclimatize to high-altitude conditions. That wasn’t a fun experience, and the feeling of chronic sleep deprivation was only partly offset by some solid longitudinal data on changes in the climbing team’s blood oxygen levels following overnight hypoxic simulations. Then there was the travel to Chile, the bus to Argentina, and the days of climbing up the mountain to this point.

Turning around meant not being able to follow through on all that invested effort. The Air Force had even built a social media campaign around the climb. But attempting a dangerous summit carries substantial risks. As Paul pointed out later, we learn by trial and error in the lab, but high-altitude mountaineering is often not a place where you can learn from a really bad choice; you might not get another chance after a life-or-death decision.

Our lab’s theoretical model is that decision making draws on a mix of deliberate processing and intuition. We had a climbing plan, we trained for a wide variety of situations, and we tried to think everything through in advance. But then there are intuitions in the moment, where you have a gut sense of whether the danger is reasonable or unreasonable. Decisions are supported by a mix of these explicit and implicit processes, and we are interested in how real circumstances affect the way those processes are mixed.

An idea we had was that low oxygen, high altitude climbing might asymmetrically impair explicit decision-making compared to implicit decision-making processes.  As a proof-of-concept, I tried to explore this idea with some concrete data by bringing tools for cognitive assessment on the climb.

An Android tablet was used to administer cognitive tasks to our climbing team at different camps up the mountain, and cognitive data were successfully collected at elevations over 18,000 feet! Working memory, the explicit ability to hold things in mind, was indexed using a digit span task. In contrast, a simple reaction time test was used as a stand-in for assessing implicit processing. We won’t interpret the cognitive data here because the data collection was not systematic. However, this effort highlighted the need to develop an accurate index of the implicit system, which is fundamental if multiple memory systems research is to progress in this domain.
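The post doesn’t say how the tablet app scored the digit span task. As a rough illustration of what “indexing working memory with a digit span” means, here is a minimal Python sketch. It assumes a forward span scored as the length of the longest sequence recalled exactly. The function name and the session data are hypothetical.

```python
def digit_span(trials):
    """Score a forward digit span task.

    `trials` is a list of (presented, recalled) digit strings. Returns the
    length of the longest sequence recalled exactly, or 0 if none were.
    This 'longest correct sequence' rule is one common convention; testing
    apps may use others (e.g. stopping after two failures at one length).
    """
    correct_lengths = [len(presented) for presented, recalled in trials
                       if presented == recalled]
    return max(correct_lengths, default=0)

# Hypothetical session: sequences grow by one digit until recall breaks down.
session = [
    ("472", "472"),
    ("3918", "3918"),
    ("52847", "52874"),   # two digits swapped: counted as an error
    ("61394", "61394"),
    ("850273", "805273"),
]
print(digit_span(session))  # 5
```

Because recall must be exact, a single transposition fails the trial. This all-or-nothing rule is part of what makes digit span a measure of explicit, effortful maintenance.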

Heart rate and GPS data were also recorded throughout the excursion using a Garmin Fenix 3 to complement the cognitive assessment. This wearable platform made it possible to collect critical physiological and contextual data about the climbing team for improved safety monitoring and novel data analytics. The Android tablet and Garmin Fenix 3 were supplemented with an external USB battery pack and a portable solar charger for multi-day data collection. Hardware testing on the mountain found that battery life was a limiting but manageable factor in this endeavor.

This proof-of-concept venture demonstrates the feasibility of collecting real-world cognitive and physiological data during high-altitude mountaineering. Back on the mountain, our climbing team decided not to push through the harsh weather for the summit of Aconcagua. That decision depended on the interaction of multiple memory systems, and a bad decision could have prevented us from returning safely to base camp. It is imperative to understand how multiple memory systems interact during decision making in extreme environments, because this could be the difference between life and death.

[In the lab, studies of decision-making are done with highly artificial tasks in tightly controlled situations (is this sine-wave grating an A or a B?). However, our theory of how multiple memory systems contribute to decision making is supposed to apply to complex, real-world, high-leverage decisions. Getting data on how well that works means venturing out into messy, uncontrolled contexts and struggling to collect data with definitions of the independent and dependent variables that we hope make sense. Kevin’s story is the first in what we expect will be a series of adventures, lessons learned and maybe even ideas for formal controlled research inspired by case-study, semi-anecdotal data in the wild. – PJR]
