Scroogeball

During my childhood the Minnesota Twins played in an arena nobody liked named after a politician of a similar reputation, the Hubert H. Humphrey Metrodome. The stadium, one of the last indoor multi-use stadiums to be built, forced supporters to spend hours indoors underneath the fluorescent white roof during the precious few months of good weather. This fact, along with the stadium’s other deficiencies and their lack of success following two World Series wins in 1987 and 1991, brought the team nearly to the brink of extinction in 2001, the same year the hero of those championship teams, Kirby Pucket, was inducted into the Baseball Hall of Fame.

All this made them the consummate underdogs, especially compared to their playoff nemesis the Yankees. George Steinbrenner seemed to be motivated each year to buy the best team he could. The Twins’ owner Carl Pohlad seemed to have the opposite desire in spite of being worth a couple more billion than his New York counterpart. In 2001 the Yankees spent $109,791,89 on team payroll, over four times the Twins’ total. That year the Twins payroll was dead last in baseball at $24,350,000, almost ten million dollars less than the next lowest team, the Oakland A’s.

Oakland, specifically the teams of this era, would become famous underdogs via the work of its general manager Billy Beane. At the turn of the millennium, he led the team to multiple 100+ win seasons on a fraction of the payroll that teams like the Yankees did to accomplish the same thing. This process, described in Michael Lewis’ book Moneyball and later in a film of the same name, was presented as a departure from the typical methods of player development and deployment. Because of their budget, the team was forced to find undervalued players to fill specific roles that, when placed together, seemed able to compete with the Yankees at a fraction of the payroll.

It’s an appealing story, especially for a fan of another small market team. But the fact that the very team chronicled in the book lost to my Twins that year, whose team lacked such a novel philosophy, could suggest it’s perhaps not as unique or effective as its outsized reputation.

Show me the Moneyball

Before he was general manager of the Oakland Athletics, Billy Beane was himself a consensus top prospect. In his book, Lewis describes how Beane and outfielder Darryl Strawberry competed for the attention of top scouts for the New York Mets, who had the first pick in the 1980 draft. Eventually, Strawberry was taken first overall by the Mets, who then used one of their later picks in the first round on Beane at 23rd. Beane’s major league career would be over before the decade was out. Strawberry, by contrast, played in the majors for 16 years until 1999, just three years before the action of Moneyball takes place and long after Beane had transitioned from the dugout to the front office.

This experience, especially the high praise lavished on him by scouts contrasted with his middling performance in the majors, left Beane with a skepticism of the traditional methods of prospect evaluation. For as much as this would define Beane’s approach to building his teams, it also reveals what would become a common trope for Lewis’ work: a fascination with those who appear to buck conventional wisdom, seeing a unique opportunity where others have not even thought to look. Like the financial titans he would skewer in The Big Short a few years later, the old fashioned world of the scouts Lewis portrays in his book seems parochial and out of touch, almost waiting to be shown up by someone with a fresh perspective and the commitment to their vision.

While this does make for a useful narrative arc for a book, I found myself wondering whether this was a useful rubric for understanding baseball, which more so than most sports is about big sample sizes and regression to the mean, especially over the course of a season or career. This statistical note is important to point out here because of what would become the executive summary of the Moneyball approach once it became the kind of bestseller that executives have to at least pretend to have read: Beane and the A’s pioneered a new way of running a baseball team by relying on statistics, rather than old fashioned gut feeling, to assess players better and succeed with a lower payroll.

For Lewis, Beane’s approach was the culmination of what Bill James had been begging teams to do for years. James sought to apply rigorous analysis to the game of baseball. This came to be known as “Sabermetrics”, so named for the Society of American Baseball Research (SABR). From humble origins, James created an audience for independent baseball writing and analysis that came not from watching a single team through the course of a season but by analyzing the entire league or in some cases the entire statistical record of baseball. He relished in challenging the consensus, from what makes players good to what makes a good team great.

Take the traditional focus on batting average as the indicator of a player’s offensive capabilities. For James, tallying up just the percentage missed two key facts:

Players can also get on base (and therefore score a run) via a walk, but these are not counted towards a player’s batting average
Extra base hits (doubles, triples, home runs) are far more valuable than mere singles, but they all count the same in a player’s batting average.

James, and Beane’s team at the Oakland A’s, were far more interested in an offensive player’s on-base percentage (OBP) and slugging percentage (SLG) than the more traditional batting average. These could be added together to calculate a new number, On Base plus Slugging or OPS, that would not only factor in how often the player reached on a walk but also capture how often they hit for extra bases.

This is on the simpler end of statistical analysis in baseball, especially nowadays, in part because the game itself lends itself so well to statistical analysis. It comprises a large number of discrete events that all begin and end from the same state: a pitcher throws the ball to a hitter. Only the team hitting can score runs and only the team in the field can record outs. There’s no clock, or there wasn’t then. Only the act of recording the 27th out or, more rarely, scoring the winning run as the home team, can bring the game to an end. Over the years many improbable things have happened that buck the expected outcome, but we only know how improbable they are precisely because there is so much data that demonstrates their improbability.

While Sabermetrics was controversial for some, it’s become an inescapable part of the modern game. Contact by a hitter can be measured in terms of its exit velocity and angle off the bat. Individual pitches can be assessed on how often it induces swings and misses or weak contact. A player’s season or career can be assessed in terms of how they performed in a particular situation but also how much they contributed to a team’s winning relative to the average available player, usually summarized as Wins Above Replacement (WAR). When Bill James started he had to manually compile the statistics he was interested in because they were often unavailable to the public and in some cases not even tracked by major league teams. Now this kind of information is available to anyone on a site run by Major League Baseball.

While statistics were one half of the Moneyball story, the other had to do with the labor market in baseball itself. If most teams were looking for players with high batting averages, they would be likely to undervalue a player with a low batting average who nevertheless managed to get on base often. For Beane and his front office team, who included other devotees of Bill James who had backgrounds in finance instead of on the field, this was an opportunity.

And this is precisely what they did, except the best exemplars of their method came not from their offense but for pitching. The three best starting pitchers for the Athletics in 2002 were Barry Zito, Tim Hudson, and Mark Mulder. They had two important things in common: all were relatively early in their career and all made less than a million dollars in 2002, in part because they were not eligible for free agency, which meant they couldn’t negotiate better contracts with other teams. That year each looked poised to become dominant starting pitchers, and for now the Athletics had them at severely below-market rate.

The performances of Zito, Mulder, Hudson for Oakland in 2002 at such low salary gave Beane the budgetary flexibility to replace the most talented players they had lost in free agency. The two highest paid players on their roster that season, David Justice and Jermaine Dye, were not uncut gems discovered by the team but established players. Justice was a good hitter but past his prime, who Oakland wanted because of his approach at the plate. Dye, on the other hand, was entering the prime of his career and would have led the team in OPS the year before had it not been for Jason Giambi, who had been granted free agency and signed with the Yankees after almost ten years in the Athletics organization. It just so happens that the combined salaries of Dye and Justice match almost exactly the going rate for top notch starting pitching that same year.

In 2002 Pedro Martinez led the entire league in ERA+, meaning he had the best ERA relative to the entire league average. He was just about to enter his highest-earning years as a player, earning $14 million in the 2002 season. Randy Johnson came after him, also at or around the peak of his earnings. If you watch highlights of either of these pitchers at the peak of their power, it would be easy to argue they are still underpaid. Even though neither of them competed in the World Series that year, both were among the leaders of their respective rosters in WAR, with Johnson followed by two more pitchers Curt Schilling and Byung-Hyun Kim.

That is how important good pitching is to success in baseball. Good relief pitchers protect even slim leads and convert them to wins. Good starters can set the team up to win lots of games in the regular season and form the bedrock of a successful postseason run. Pitching on short rest, it’s possible to only use your best pitchers in the playoffs, which is exactly what the Twins did when they played in the 2002 American League Division Series. Because Zito pitched in the last game of the regular season, he was unavailable for the first game of the series, which the Twins won behind an excellent bullpen showing by Johan Santana, J.C. Romero, and Eddie Guardado. Good pitchers at below the market rate is exactly what the 2002 Athletics had done and that was key to them tying the Yankees for 103 wins in the regular season. That is why it’s all the more remarkable that the team to beat them in the playoffs was the Minnesota Twins.

I would argue it was the moves Beane made to secure this trifecta of pitchers that had the biggest impact on their success. This is borne out by the WAR statistic as well, as they finished first, second, and fourth on the team in this category. The player that splits them in this stat, Miguel Tejada, features in the book as the antithesis of the kind of player Beane is after. That Lewis spends almost no time at all on these pitchers compared to the ups and downs of signing Scott Hatterberg or the drafting of Nick Swisher in the draft that year shows I think another blind spot of his when it comes to understanding baseball. To fill this spot in, we have to go back to the basics of the baseball labor market, which have been distorted in important ways since the earliest days of the game.

The Reserve Clause

From the very beginning of organized professional baseball, team owners wanted the best players as their exclusive employees for the lowest possible salary for the greatest number of seasons possible. Initially this happened through a reserve clause in the player’s contract that prevented him from negotiating new contracts with another team who might also want his services. Buttressing this legalese was an even more sacred gentleman’s agreement between the owners to not poach players from other teams. In 1890 the conflict came to ahead in the form of a player-owned and led league called, appropriately, the Player’s League, which competed for a single season before folding, taking with it the best chance to do away with the reserve clause until a few decades into the next century. (For a detailed history of the Player’s League, see this excellent essay).

Even when the highest salaries were measured in thousands and not millions, the basic rules of the labor market for baseball players functioned this way. As the years went on, the dynamic swung even more in the owner’s favor through open collusion to limit the number of teams and thus reduce the available roster spots in the major leagues, including specific protection from antitrust laws granted to them by a 1922 Supreme Court decision. With the advent of farm systems the notion of team control became even stronger, as player’s signed minor league contracts in the hopes of working their way up to the major league team, but only one team. Unless they were traded, in which case they’d move from Cedar Rapids to Waterloo and back again for a few thousand bucks whenever the manager wanted to trade them. This is what the A’s did with Jason Giambi and what the Twins did with one of the key players in their 2002 campaign, as we will see later.

I have simply had my heart broken too many times by the team to say anything positive about the Yankees, but when looked at in this light is there really something all that nefarious about Steinbrenner’s approach? Is it really so awful to simply outbid other teams by paying a player what they would ask for in exchange for instead of what the front office would prefer? Only very rich people can own a major league baseball team. And yet if a player leaves a beloved fanbase in exchange for more pay in a bigger market, he is often reviled as greedy when that anger might be better directed at the home team’s front office for not working harder to keep them.

In 2000 a group convened by then-commissioner Bud Selig published its findings under the title The Report of the Independent Members of the Commissioner’s Blue Ribbon Panel on Baseball Economics. They had been charged by the commissioner to investigate whether disparities in revenue and therefore payroll were destroying the competitive balance within Major League Baseball. The group consisted of conservative commentator George Will, former Federal Reserve Chairman Paul Volcker, Yale economist Richard Levin, and longtime Democratic Party Senator George Mitchell, who in just a little over a year would be appointed to lead the investigation into the 9/11 terrorist attacks. They concluded that indeed the economic divide between baseball franchises was perhaps the only one that demanded redistribution from the richer to the poorer.

Their recommendations included instituting a luxury tax on payrolls above a certain threshold and a draft where poor-performing teams could draft players from the organizations of more successful teams. Some of these were implemented in some form and some were not, but what I found most interesting is the assumption they state before plunging into their analysis of the game: “This report assumes that, year in and year out, player salaries and other costs of operating an MLB franchise ultimately will be borne by the fans of the game”. This means, they declare, that the interests of the fans are paramount, but it also means the capital of the owners must be preserved.

Lewis devotes a section of Moneyball to reveling in the living rebuke of complaints about competitive balance that is Billy Beane’s Oakland Athletics. Year after year they seem to compete or outcompete teams with much higher payrolls through their novel method and savvy approach to team and player development. And he is, as is often the case in his books, half-right: many teams who found themselves uncompetitive during this era had spent a lot of money on players who did not perform, or perhaps had too little revenue to justify big spending but no plan to win otherwise. But Lewis presents the unconventional approach of the A’s as perhaps the only solution to teams in their situation when in fact it is one of many options. Also, we now know from hindsight that it was not effective at winning the team a World Series or even an American League championship.

That Lewis went on to become a chronicler of America’s various financial crises from The Big Short to his forthcoming book about the collapse of crypto exchange FTX makes it all the better to examine the underlying financial and economic assumptions in Moneyball: that many professional players are overvalued relative to their contributions, that teams waste money by paying them high salaries when they could get by with less, and that small market teams were simply foolish for trying to develop players that could compete with better teams. Lewis was fascinated by Beane’s approach because it fit the same mold of unlikely heroes overcoming against all odds that he would go on to shoehorn into the history of the 2008 financial collapse with the Big Short.

This narrative arc is as pat as it is appealing, which is why as a used book store employee I have to assume Lewis has sold more nonfiction books than just about anybody. And it’s not that he’s wrong to say that Beane’s approach to running the team, which covered not just scouting and development but in-game decisions, wasn’t interesting or a departure from the norm or not worth writing about. But when you’re always interested in the guy who thinks he’s solved something lots of other people are also looking at too, it’s a recipe for tunnel vision. Which brings us to the team that would end the Moneyball season: the Minnesota Twins.

Scroogeball

In 2002 the Minnesota Twins had the lowest payroll in baseball, a few million dollars behind the Oakland Athletics. While Beane and the Athletics front office was putting together their unorthodox 2002 draft list, the biggest question both within and without the organization was over the very existence of the team.

In November 2001 the owners of Major League Baseball voted to eliminate two teams from the league. Though the teams were not named, it was widely speculated that the teams on the chopping block were the Twins and Montreal Expos. If I close my eyes I can still summon the hatred I felt as a ten year old for the men in suits, including Selig, who threatened to dismantle the Twins. When I visited the Hall of Fame with my dad last year to watch two more Twins be inducted into the Hall of Fame I, a fully grown adult, instinctively flipped off Selig’s bronze plaque before even realizing I was surrounded by kids.

The contraction was thought to be a way to eliminate the problem of too many non competitive teams. It was also speculated that the move was an attempt to gain leverage in the upcoming negotiations with the Player’s Association, with the vote coming mere hours before the existing agreement was set to expire.

Twins’ owner Carl Pohlad was eager to be bought out by his fellow owners after years of trying to get public funding for a new stadium to replace the dilapidated Metrodome, which even at its construction was a poor fit for baseball having been designed primarily for football. All through the winter of 2001 the league’s lawyers fought to overturn an injunction preventing the team from being disbanded issued by Hennepin County District Judge Harry Crump. For all its myriad faults, it was the Metrodome that may have saved the franchise. Crump ruled that the Twins must remain extant at least until their lease with the stadium expired in 2002.

The injunction halted negotiations over contraction plans, and eventually it was abandoned. This meant that the Twins would indeed play the 2002 season, but under a shroud of uncertainty. Just like the A’s were looking to avoid investing in the long term payouts of high school prospects in their draft picks, the owner of the Twins was hardly likely to invest heavily in a team he had just tried unsuccessfully to disband.

The three highest paid players on the 2002 Minnesota Twins were pitchers: starters Brad Radke, Rick Reed, and Eric Milton. 2002 was also the first year Johan Santana would see consistent time as a starting pitcher and he would lead the team in ERA+ and Fielding-independent Pitching (FIP), which tries to measure the pitcher’s performance independent of any impact that the players behind them have in turning hit balls into outs. Because of the rules of team control he was unable to leave the franchise and negotiate a better deal elsewhere, earning just over $200,000 that year. Though their payroll was low, the bulk of it lay in their starting rotation. This gave them the depth to win the division and ultimately a five game playoff series against the Athletics, though like Oakland they received a boost from young pitchers still under team control.

The highest paid position player for the Twins was All Star centerfielder Torii Hunter, a top prospect drafted almost ten years earlier. He was arguably the best defensive centerfielder in the league and would become one of the primary run producers along with fellow outfielder Jacque Jones and David Ortiz, one of the few players to be inducted into the Hall of Fame as a designated hitter for his performance with the Boston Red Sox.

Hunter is an interesting figure in this story because he embodies two things that would run counter to the Moneyball thesis. First, he was drafted as a top prospect right out of high school. This is the exact kind of thing Beane is celebrated for not doing in Moneyball, as he considers them both too risky and too far removed from contributing to the major league team. Lewis recounts Beane’s glee upon learning that so many other teams are fighting for high school outfielder Denard Span that he will be able to draft his preferred choice, infielder Nick Swisher, after all. “Eight of the first nine teams select high schoolers,” Lewis writes. “The worst teams in baseball, the teams that can least afford for their draft to go wrong, have walked into the casino, ignored the odds, and made straight for the craps table” (p.112). In Lewis’ treatment, picking players with greater upside but more risk and a delayed payoff is nonsensical. But if you look at the entire draft that year, you see it includes both players who never made it to the major leagues alongside Prince Fielder and future Hall of Fame pitcher Zack Greinke.

Secondly, Hunter’s best asset as a player was his superior fielding ability, which has always presented challenges for statistical analysis. Consider two plays: a flyout hit to almost exactly where the outfielder was already positioned and a leaping catch against the wall ,robbing the batter of a home run. Anyone who has watched a game knows that these two plays are vastly different, both in the skills required and the effect on the game’s outcome. But as far as the scorecard, and therefore the stats are concerned, these are the same. For many years the only notable fielding stat was errors, which measure what a player failed to do. Think of how confusing it would be if hitters were measured by the percentage of at bats they didn’t get a hit or pitchers by the amount of batters they didn’t strike out. An outfielder with many more errors than average was probably a poor fielder, but there was no way to identify great fielders statistically. But this was Hunter’s greatest and most consistent attribute as a player, both to the team in terms of wins and to fans in terms of entertainment (consult this or any other highlight reel if you don’t believe me).

This is what makes the Moneyball narrative a bit grating, especially with the benefit of hindsight. Lewis simply cannot help but valorize people who zig when others zag, despite the fact that sometimes it just makes sense to zag with everyone else. Teams commit to the long term payoff of a younger prospect instead of the more reliable performance of college players not because they are unaware of the risk but because of it. The Twins drafted and kept Hunter in the organization through his early, undervalued years and gave him his first big contracts as an established major leaguer. Many things could have gone wrong in that process, of course, but through Hunter’s own hard work and the patience of the front office, they didn’t, which meant Hunter was there to help the Twins beat the A’s in the first round of the playoffs.

Many of the high school players drafted ahead of the A’s picks in Moneyball never reached the major leagues, but if you look up many of the players that Beane gushes over in Moneyball, most of them never made it there either. And the fact that the A’s impressive regular season win totals during the years leading up to and following the seasons chronicled in Moneyball never translated to success in the playoffs, including their 2002 loss to the Twins, suggests that Beane did not unlock a secret recipe for success as Lewis portrays it.

If anyone deserved to have a bestseller made about an improbable team written about them, it should have been the Twins. Their defeat of the A’s is to this day their most recent playoff series win. Though they won another playoff game a couple years later, they currently have the longest active playoff losing streak in baseball. But the secret wasn’t a unique player analysis model. It was one of the oldest models in the book: labor arbitrage.

Conclusion

The 2002 MLB season is memorable to me personally because it very well might have been the last season of the Twins’ existence, but its inclusion in Moneyball has made even casual fans or people shopping at bookstores aware of it as well. And the fact that the team he praised for its low-budget success managed to lose to the only team with an even lower payroll is a fitting coda to an early version of Lewis’ renegade upstart stories.

That the A’s did not win a World Series using Beane’s method does not mean it wasn’t notable or interesting. That they lost to the Twins doesn’t mean Minnesota’s longtime General Manager Terry Ryan had the real secret to winning on a low budget either. Ryan made some successful trades for players along standard baseball lines, namely trading older players to teams who needed immediate contributions for prospects who would benefit losing teams in the long run. Cristian Guzman and Eric Milton were key members of the 2002 team acquired through a trade with the Yankees for Chuck Knoblauch a few years earlier. Ryan traded pitching prospect Jared Camp, who never made it to the majors, for Johan Santana in 1999. A.J. Pierzynski was traded to the Giants following the 2003 season for future shutdown closer Joe Nathan and ace starting pitcher Francisco Liriano. Those were great trades, but there were ones that did not work out so well too. That’s the nature of the game. They were not saved by the eschewing of old dogmas or the embrace of a new system, even though that inarguably makes for a better book.

Ryan also relied on the scouting of Mike Radcliff, who passed away just last year and was the paradigmatic old school scout of the kind Lewis spends the early chapters of Moneyball skewering. In this year’s draft the Twins lucked into the fifth pick and selected high schooler Walker Jenkins, who was the subject of one of Radcliff’s last scouting reports (he gave him the highest possible marks) and is currently one of the highest rated prospects in baseball. Radcliffe found many great players for the Twins and likely recommended some that didn’t pan out. That’s the nature of scouting. If I had to guess, he likely relied on a mix of statistical analysis and experience, which in the end is all anybody making these decisions has to go on in the moment.

Look through the 2002 statistics and you will be hard pressed to find the Twins at the head of any category, either individually or as a team. For that matter, you would have hard time finding the Athletics at the top of many categories either, despite their notable obsession with stats like on base percentage. The fun of examining these things in hindsight is that you know for certain what happened, so all that’s left is to test different theories as to why.

Did the Angels win the World Series because they struck out the least of any team? Probably not considering the Kansas City Royals had the next fewest strikeouts and they did not make the playoffs.

Did the Giants reach the World Series because they had Barry Bonds, who led the league in most offensive categories as well as overall WAR? That’s more persuasive considering he is one of the best hitters over the last few decades, performance enhancements notwithstanding, but comparing the individual stat leaders to the outcome of their team’s season introduces plenty of noise along with whatever signal is being transmitted.

Teams have won lots of games with low payrolls and teams have emptied their pockets completely and failed to make it to the postseason. This year, Oakland has the lowest payroll in baseball to go with one of the worst records, with many speculating they will be moved to Las Vegas. The fans have had to organize months-long protests against ownership to keep the team where it’s been for decades. The teams with the second and fourth lowest payrolls, Baltimore and Tampa Bay, have been competing for the best record in the most competitive division for much of the season. On the other end, both New York teams had the two highest payrolls and neither are likely to make the playoffs, along with fellow big spenders the Padres and Angels.

Baseball, like many things in life, would be easier if there was one weird trick to accomplishing what you want. Those running teams have not stopped looking for a competitive edge, but their options are constrained by the reality facing each team, whether that be the need to trade the few good players on your crappy team for future value or the short term payoff of filling a roster hole in exchange for watching a prospect become great for another team. If the Moneyball system really was the silver bullet it’s made out to be, then every team would be doing it, but they aren’t for one simple reason: you still have to play the game to find out who wins.

Checking the COMPAS: An Open Records Analysis

In 2004 Tim Brennan, David Wells, and Jack Alexander authored a report for the National Institute of Corrections titled Enhancing Prison Classification Systems: The Emerging Role of Management Information Systems. The report was commissioned through a contract with Northpointe Institute for Public Management of Traverse City, Michigan, founded by Wells and Brennan in 1989.

The report’s goals were to explore how then-recent advances in networked computing technology could improve the efficiency of classifying those in prison by risk level. “Basically,” they write in their executive summary, “current methods of prison classification are underutilizing this information technology infrastructure. The vast memory and analytical power of today’s hardware and software offer great potential for improving classification decisionmaking” (page xix).

Brennan et al. describe the work of classifying prisoners as “knowledge work”, as it involves prison staff compiling data from various sources and analyzing it using “implicit mental models and explicit algorithms”. Networked computers could improve productivity of those classifying prisoners by automating portions of the data collection process. They could also allow for more rapid classification of prisoners by prison staff, identifying potential trends or factors that might predict a person’s likelihood to commit new criminal offenses more quickly and accurate than human evaluators, or so they claimed.

These technologies have advanced much further in the intervening years, and Northpointe has offered their services under contract to multiple states, including the Wisconsin of Department of Corrections (WIDOC). They entered into a contract to provide their Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) tool to the WIDOC in 2015 and have remained under contract until at least FY2022. I requested records related to this contract and, after a little under a year, I received them. I will provide these for download in full at the end of this post, but because the records were quite extensive and this is an often overlooked aspect of the US prison system (imo), I thought it would be useful to provide a preliminary summary and analysis.

This issue is also of interest to me in part because it resembles at least in broad strokes the kind of work that I do for libraries, namely sorting, classifying, and describing all manner of published material from zines published in the 1980s to monographic series written by subject matter experts for a specialized, technical audience. This is done according to a dizzying array of description standards and metadata schema (each with their own acronym, of course), all of which must be processed and manipulated by the vendor-provided software where much of this work takes place. Because of this, I appreciate the vast improvements in efficiency that networked computers represent when it comes to classifying, storing, and retrieving information. However, as my e-mail inbox can attest, these technologies are not without their problems.

In my work the potential downside of misclassifying or misdescribing something is often minor. Perhaps a student does not find all the relevant books held by the library on a subject for their class assignment or a researcher is unable to find a specific work they are looking for even though it is held by our institution. There are more serious issues when it comes to the description of marginalized groups in libraries, such as the classification of literature from various parts of Africa alongside the literature of those countries that colonized them (See Classifying African Literary Authors). There are other examples related to other groups, but these hardly invalidate the inherent convenience of computerized library catalogs compared to their printed predecessors. In the case of prisoner classification systems, the risks are much greater: people may be kept in prison longer or required to take intensive psychological treatment programs that aren’t appropriate.

Northpointe acknowledges these risks themselves in their original bid for the contract. One of the risks of classifying prisoners, according to Northpointe, is the risk of “prisoner ‘deterioration’ and ‘prisonization’”. Though they do not define these terms specifically, the text which follows gives a telling suggestion:

These risks have serious consequences for both the institution (idleness, discipline problems) and the community (high recidivism, more alienated offenders). These risks are more likely if prisoners are simply “warehoused” and if the prison fails to match the inmates to needed programs to prepare for reentry to the community.
(Attachment B, page 133)

There is also the risk of litigation stemming from “custody classification errors” or the public relations problems stemming from “high profile crimes” being linked to early release or parole decisions. Conversely, “if agency policies and procedures adopt overly restrictive classifications styles,” they write, “more systematic ‘over-classification’ errors occur … escalating overcrowding”. Combine enough of these errors and it can lead to a “loss of public trust in agencies ability to distinguish high risk from low risk offenders” as well as a “failure to rehabilitate”, which produces “on-going cost escalations” and, again, the potential for costly litigation (ibid., page 133).

So how are these decisions made? Northpointe is rather unambiguous in its assessment of their competition: most systems in prisons and jails “were not developed statistically and have minimal or unknown levels of predictive accuracy”. Their COMPAS system, by contrast, was developed to improve on these efforts so that more information could be gleaned about a particular prisoner as soon as they enter prison custody. Because they market the service to both courts as a way to assess risk after arrest but before trial as well as to prison officials assessing whether someone can be released prior to the completion of their sentence (e.g. on parole), accurate predictions made using existing or minimal additional data is of the utmost importance. “The aim,” they write, “was to use data that was currently available at the earliest stage for new incoming prisoners. These data include the offender’s criminal history and selected demographic and other criminogenic factors” (ibid., page 135).

Through an analysis of “a large [Michigan Department of Corrections] database”, Northpointe claims to have found a number of these “criminogenic” factors that were statistically correlated to “the commission of new infractions”:

Age at first arrest
Age at assessment
Number of probation revocations
High school graduation
Criminal thinking
Educational vocational resources
Number of mental health commitments (ibid., page 135)

Unfortunately they do not provide a definition of “criminal thinking” in this document, though we will return to the subject later. “Age at first arrest” is also problematic, as being arrested is not the same as being found guilty and yet there seems to be no distinction made here. Furthermore, whether someone is arrested is often the decision of police officers rather than any kind of automatic process.

Nevertheless, “using both training and validation samples,” they write “and two separate statistical methods (Logistic regression and Random Forests), the above six [sic] factors formed an ‘optimal sub-set’ of factors for predicting new infractions.” That they misstate the number of factors when describing their own model in their own document makes me hesitant to take their numbers at face value. In any event, according to Northpointe’s submission their analysis found that both the logistic regression and random forest models had an accuracy of around 70%, meaning it correctly predicted whether additional disciplinary infractions occurred based on these factors (ibid. page 136).

In studying to be a librarian I took a class on data mining. One of the most important lessons our instructor hammered into us throughout the course is that creating the dataset will often comprise over 90% of the work of any given machine learning project. Computers asked to develop a model to predict the likelihood of an outcome will always give you an answer. They will never tell you that there is not enough data or that you are analyzing the wrong kind of problem with a particular method. A computer cannot tell the difference between data representing weather patterns and crop yields or that conveying the results of a psychological questionnaire. It is up to the people both creating the data and evaluating the results to determine how reliable a given prediction is for a particular context.

One of the projects we did in that class involved analyzing genetic sequencing data in an effort to identify potential connections between specific genes and specific attributes in a given animal. For this assignment we were paired with another person in the class and asked to prepare and then analyze a genetic dataset and provide a short impromptu presentation of our findings for the class, mainly to demonstrate we’d chosen an appropriate analysis type and prepared the dataset correctly. I happened to be paired with a woman pursuing her PhD in Animal Sciences. When I said that I found these datasets a little confusing because I lacked a background in genetics or biology, she informed me that when researchers in her field investigated some of the predictions made by models like the one we were practicing with they often found that the predicted connections were often faint or nonexistent.

As interesting as this class was, I am humble enough to recognize the limitations of my knowledge in this area. Luckily because Northpointe has successfully implemented versions of its COMPAS software in other states, their work has attracted the attention of experts in this area. In 2007 Skeem and Louden of University of California-Davis conducted an independent assessment of the COMPAS tool then in place and found it significantly lacking.

“The strengths of the COMPAS”, they write in their conclusion, “are that it appears relatively easy for professionals to apply, looks like it assesses criminogenic needs, possesses mostly homogenous scales, and generates reports that describe how high an offender’s score is on those scales relative to other offenders in that jurisdiction. In short, we can reliably assess something that looks like criminogenic needs and recidivism risk with the COMPAS. The problem is that there is little evidence that this is what the COMPAS actually assesses” [emphasis in original] (Skeem & Louden, page 29).

In addition to critiquing specific factors added to COMPAS, Skeem and Louden also cast doubt on whether COMPAS does actually predict recidivism. “In our view, the reader must wonder why the COMPAS produces no single “risk” score that can be evaluated by independent investigators. Instead, the authors create various ‘Risk Scales’ that change from evaluation to evaluation, and often combine parts of the COMPAS with other variables” (ibid., page 29). In addition to this lack of data, the authors also highlight the fact that there is no evidence COMPAS actually adjusts to changes in criminogenic factors over time. This is particularly important for use in prisons if, for example, completion of treatment programs is considered a factor for someone being granted parole. Because of these issues, Skeem and Louden state that they cannot recommend the COMPAS for application to individual offenders within the California Department of Corrections (ibid., page 6).

From what I can tell, that is precisely how it is being used by WIDOC.

A paper published by Northpointe staff in 2009 responded to these critiques to defend their product and its application in a response paper. It is no longer available on their website, but thanks to the Wayback Machine I was able to download a copy. It opens with a telling acknowledgement: “most of the evidence for the reliability and validity of COMPAS is found in the results of in-house research studies conducted by Northpointe across a variety of jurisdictions and states” (page 2). That is to say, the evidence purporting to show the efficacy of their tools is based on internal data not shared with anyone other than perhaps the agency with which they are under contract. Later on in their response the authors highlight the fact that peer-reviewed papers on COMPAS have subsequently been published, but the citations given for both of these were authored by the same people who authored this response paper. Neither share the underlying data used in their analysis.

They claim that because agency personnel do have access to this data that their analysis “are often subjected to a more thorough vetting than that provided by the editors or peer-reviewed journals” (ibid., page 2). However, it’s important to remember that these agencies are contracting with companies like Northpointe precisely because they do not have the ability or desire to develop their own tools for this kind of analysis. For example, I found no independent analysis by WIDOC staff in the records responsive my request, which did include specific mention of any meeting minutes or deliberations related to the bidding process.

Herein lies the limits of technological “efficiencies” in addressing inherently social problems like crime, punishment, and justice. Vague variables like “criminal thinking” provide ample room for clinical and correctional professionals to conclude that, for example, someone describing the effect of larger social forces on the circumstances of their crime is demonstrating a lack of remorse or unwillingness to accept responsibility for their actions. This is not hypothetical, as we will see later.

Consider one of the features COMPAS claims to aid in analyzing: an “inmate’s behavioral adaptation to prison” to determine if, for example, they could be moved to a less restrictive prison or become eligible for things like work release. Northpointe lists the factors their system uses to determine these “behavioral adaptation ratings”:

Cooperation with staff
Respect vs. Disrespect to staff
Completion of work tasks
Program successes vs. failures
Defiant
Aggressive to staff
Tries to Con staff [sic]
Troublemaker with other inmates
Victimizes weaker inmates
Quick Temper Etc.

Using this scale, they found “floor officers can reliably assess an inmate on several key behavior dimensions within minutes using this short checklist (e.g. less than 4 minutes)”. They caution that whoever is performing this analysis should know the inmate well enough to provide “a reasonably fair assessment” of their adjustment. These criteria are further refined in order to classify inmates into a number of “behavioral classes”. “The results are very encouraging and we found that the ‘inmate classes’ were validly linked both to prospective disciplinary levels, criminal history patterns, and also to several main criminogenic factors (e.g. criminal personality, criminal attitudes” (Attachment B, page 139).

I think it’s important to focus on a couple aspects of these adaptation criteria. Firstly, the emphasis on how quickly they can be completed by staff. There is little attention paid to establishing any guidance for how long prison staff should know a particular prisoner in order to make these assessments. This is apparently left to the institution or staff themselves to decide. There is precious little discussion of how prison staff supervising those completing these analyses can check the work of their subordinates, though naturally Northpointe does include costs for “training the trainers” in their bid (approximately $11,000 in the most recent contract renewal).

Secondly, the criteria are almost exclusively focused on interactions towards prison staff rather than the thoughts, emotions, behaviors, or actions of the prisoner themselves. When I would visit WIDOC prison I witnessed numerous staff members get visibly angry to the point of shouting at other prison visitors, including the elderly and small children, for very minor issues including moving beyond a taped line on the floor while waiting to be processed or failure to notify the prison in writing in advance that they would be using a wheelchair. Anecdotes are anecdotes, but personally these are not the kind of people I would want making snap judgments about my behavior (“less than 4 minutes”) that could determine whether I spend another year or more in prison.

One might argue that these are implementation problems as opposed to methodological flaws, but the emphasis on staff interaction shows that these criteria have little to do with characteristics of the prisoner and more to do with the attitudes of staff towards prisoners. Of course these factors are not completely absent, but as these bidding materials show even when they are included it is not without issue. In their bid Norhtpointe included some sample COMPAS reentry narratives and bar charts to demonstrate how the tool can be used to evaluate an individual’s risk. Here is a sample:

Bar chart showing sample from COMPAS system showing the re-entry risk factors for a sample prisoner. Provided by Northpointe as part of its bid materials.

These are accompanied by a narrative assessment that is meant to elaborate on what some of these factors mean, though confusingly they do not map exactly onto what is being shown in the bar chart. For example, while criminal history (both personal and familial), mental health, substance abuse, and ReEntry Vocation/education are present in both the narrative and the bar chart, all of the factors shown in the chart under Personality/Attitudes are reduced to a single section in the re-entry narrative as “Cognitive Behavioral/Psychological Score”, which in this example shows a score of 10 or “highly probable”. The section of the sample narrative assessment it where a “Cognitive Behavioral/Psychological Statement” could be is literally left blank.

There are training materials for how these COMPAS scales should be completed by prison staff, but one of the supposed benefits of the COMPAS software is that it can be customized to fit a variety of criminal justice settings, from pre-trial release to probation and parole decisions. They suggest that a subset of criteria be used to “triage” all offenders within a probation agents purview, with the full scale used for only “higher risk offenders” (“Meaning and Treatment Implications of COMPAS Score”, page 4).

This slide deck also sheds some light on what is meant by “criminal thinking”, which is apparently determined using the questionnaire shown below:

Sample questionnaire used by COMPAS to assess a person's criminal thinking. It includes three sections describing how the scale is measured, notes and treatment implications, and sample scale items

Consider statements like “A hungry person has a right to steal” or “The law doesn’t help average people”. If I were asked for my reaction to these statements I would almost certainly strongly agree. Apparently this means I may be in need of “cognitive restructuring”. If you want a definition of what that means you will have to file an open records request of your own.

Here we should return to where I opened, with Brennan et al. expressing a desire to use networked computers to improve the efficiency and effectiveness of classifying those in criminal custody. The desire to reduce such an inherently complex question as “will someone convicted or accused of a crime commit another crime in the future?” to a set of numbers s implicitly linked to a desire to outsource more and more of this work to computers. After all, computers are indeed better able to evaluate a set of numerical variables to predict a given outcome than a human would be if they attempted to do the same calculations by hand. Appeals for more data by policymakers are usually a request for more numbers to be analyzed, as opposed to non-numerical kinds of data such as oral or written personal histories or the notes from a psychological evaluation conducted by a licensed therapist. Furthermore, those analyzing the data often have more incentives to keep people incarcerated (or at least disincentives for release) than they do moving the opposite direction, and this will invariably color decisions at either the institutional or systemic level.

In spite of these problems, the contract with the WIDOC has been lucrative. In their request to extend their current contract, Northpointe, now called equivant, gives a cost for the licenses and hosting fees for COMPAS at over $930,000. The total cost, which includes project management, technical support, and training, comes in at over $1 million ((“Exhibit_A_-_WI_DOC_Contract_Renewal_Price_Proposal_FY22”). As is common with government contracting, this is far above the cost submitted with the initial bid. In their original cost proposal, Northpointe estimated that the cost over the life of the contract (up to 7 years) to be somewhere between 2 and 3 million dollars in total (“Northpointe_Cost_Proposal-Options_1_2_with_Notes.pdf”).

This brings me to my final point regarding what ultimately led me to request these documents in the first place. Services like COMPAS purport to improve the efficiency and effectiveness of prison operations, but in reality they often reinforce existing systemic issues while also providing plausible deniability in the form of a seemingly objective numerical rubric by which incarcerated people are evaluated. It is much easier for a DOC official to justify keeping someone in prison for any reason if they can point to a score on a chart to demonstrate instead of defend the decision solely on their own words and judgment. I’ve heard from others with loved ones in the WIDOC that they have faced many hurdles trying to request copies of COMPAS evaluations regardless of whether the person incarcerated has given their personal permission.

These systems have a clear impact on the persistence of mass incarceration because of their use in determining when and if someone is released. While the war on drugs and over zealous prosecution have been correctly highlighted as leading to mass incarceration, an often overlooked factor is the length of sentences and the difficulty of being released on parole or other forms of supervised release. Interrogating how systems like COMPAS are used by prison and jail administrators can hopefully address this issue and I hope that by making these records available I can aid in that effort.

There is much more to be found by going through these documents, certainly more than I could hope to cover in one post. There are materials from other vendors who also sought this bid, additional training and sample materials from Northpointe, and specific documents related to how these services operate in women’s prison or those for juveniles. They can be browsed or downloaded using the link below.

COMPAS-ORR

“All experiences are real”: A Deep History of Havana Syndrome (Part I)

Every Tuesday in my fifth grade science class our teacher asked us to share any recent animal sightings. If you claimed to have seen something that you didn’t recognize, she’d invite you to look through a small library of wildlife guides to see if you might be able to identify it.

For someone like me, who did not particularly like science class but did like books, this was a great arrangement. I could spend 15-20 minutes of a 40 minute class period looking through books with color illustrations of wildlife instead of filling out worksheets. All I needed to do was come up with an animal that matched the made-up description I had given her at the beginning of class.

Over time, my reports of increasingly exotic birds in and around Minnehaha Creek did start to raise suspicions, and eventually she determined that my reported findings were not of scientific use to the class.

Though my teacher eventually did get wise to this arrangement, medical doctors examining US diplomats do not seem as confident expressing skepticism. One person familiar with how the incidents are now being treated said in a December 2021 Washington Post article that “the fundamental statement we use is that all symptoms are real, all experiences are real”.

As Bartholomew and Baloh lay out in their book on the subject, the most plausible explanation for the Havana Syndrome phenomenon is mass hysteria. Their book, however, focuses on the chronology of Havana Syndrome events and the history of mass hysteria. Informative as it was, I could not shake the feeling that something deeper was informing how and why this was unfolding in the way that it did.

For starters, there was the assumption that old Cold War enemies must be somehow involved, even if it defied the logic of diplomacy and the laws of physics. Though easy to dismiss as just another invocation of one of America’s Official Enemies, it’s an interesting detail given that this new mass psychogenic illness seems to primarily affect US government personnel and their families working overseas.

There was also the repeated complaints from those afflicted that their illness was not being taken more seriously. This has won them many champions in Congress, who in the midst of a pandemic passed the HAVANA Act, which was designed to make sure that those being attacked by these mysterious symptoms could receive treatment for their injuries. The bill was passed unanimously in both the House and Senate and signed into law on October 9, 2021.

Mass hysteria is both an individual psychological ailment and a contagion that can only exist in groups. It often occurs during a time of heightened stress, manifesting itself in individual cases but depending on certain group dynamics to spread. To understand Havana Syndrome then requires understanding what was going on among US officials stationed in embassies in late 2016 and early 2017 that would cause such a reaction.

Trouble in Foggy Bottom

On February 11, 2017, the New York Times ran an article with the headline ‘A Sense of Dread’ for Civil Servants Shaken by Trump Transition based on interviews with employees of many different agencies.

In a grim preview of his rehabilitation into a #Resistance hero, one EPA official said that while some had “bristled” at the industry-friendly regulations enacted by George W. Bush, “at no point did they feel the alarm they do now.”

On May 17, 2017 Acting Attorney General Rod Rosenstein appointed Robert Mueller as Special Counsel to investigate possible coordination between the Trump campaign and Russia in the 2016 election. This was the culmination of speculation that had begun before his victory, but its immediate impetus was Trump’s firing of FBI director James Comey on May 9.

It’s not necessary for me to recount the story of his investigation because it was everywhere. In the end, as often happens, most of the people who followed this investigation closely saw more or less what they wanted to see. At no point was this clearer than upon the release of Mueller’s much-anticipated report. To those waiting for a very Mueller Christmas, the report, along with the various indictments that came out of the investigation, vindicated their suspicions at the precise moment that Trump himself was lauding Mueller’s findings as a total acquittal.

On November 24, 2017, the New York Times ran an article headlined Diplomats Sound the Alarm as They Are Pushed Out in Droves. It described a simmering conflict between then-Secretary of State Rex Tillerson and senior staff in the State Department. Even before he had been confirmed, Tillerson’s staff fired a number of senior foreign service officers and froze most new hiring. Tillerson made no secret of his belief that the department was too large and hoped to eliminate almost 2,000 positions. The article closes with a quote from Dana Shell Smith, former US ambassador to Qatar:

“These people either do not believe the U.S. should be a world leader, or they’re utterly incompetent,” she said. “Either way, having so many vacancies in essential places is a disaster waiting to happen.”

The head of the American Foreign Service Association (AFSA) Barbara Stephenson offered a similar assessment. In the December 2017 issue of the Foreign Service Journal, the official publication of the AFSA, she titled her statement for that issue as Time to Ask Why:

[T]he need to make the case for the Foreign Service with fellow Americans and our elected representatives has taken on a new urgency. The cover of the Time magazine that arrived as I was writing this column jarred me with its graphic of wrecking balls and warning of “dismantling government as we know it.”

While I do my best, as principal advocate for our institution and as a seasoned American diplomat, to model responsible, civil discourse, there is simply no denying the warning signs that point to mounting threats to our institution—and to the global leadership that depends on us.

By firing long-time employees and leaving many high level positions unfilled, the new administration seemed to be cutting off the head of an institution that operates on what Stephenson referred to in her 2017 testimony as the “up-or-out principle”. This means, roughly speaking, that employees must continue to advance in the organization or face dismissal.

That this is the tone of public comments among State Department personnel suggests that there was a lot of tension internally. Conflict between the State Department staff and those heading the agency had crossed from a political disagreement into an existential struggle.

Havana Syndrome Emerges

On May 23, less than a week after Mueller was first appointed, two Cuban officials stationed at the recently reopened Cuban embassy were asked to leave the US, an act of diplomatic retaliation. Unlike the Mueller investigation, this act was not widely publicized. In fact, the State Department only revealed it a few months later at a briefing on August 9.

That same day, the Associated Press published an article where anonymous officials suggested that the diplomats had been “exposed to an advanced device that operated outside the range of audible sound and had been deployed either inside or outside their residences”. This is the first mention of any kind of device being responsible for Havana Syndrome.

In addition to these mysterious events, US-Russia relations were concerned with a more overt form of diplomatic tit-for-tat. On August 2, 2017 Trump signed additional sanctions against Russia, Iran, and North Korea into law. Not long after, Putin announced that the US would have to reduce the size of its diplomatic mission in Russia. When asked about this at his New Jersey golf club, Trump told the reporters

“I want to thank him because we’re trying to cut down our payroll, and as far as I’m concerned, I’m very thankful that he let go of a large number of people because now we have a smaller payroll,”

The remark drew swift and stern condemnation from members of the foreign service. Ironically most of the positions eliminated were likely held by Russian nationals, but to former ambassadors like Nicholas Burns, the remark “justified mistreatment of US diplomats by Putin”.

The State Department held its first dedicated press briefing on the subject of Havana Syndrome on September 29, 2017. At this briefing they announced that all non-emergency personnel assigned to the US embassy in Havana would be returning to the United States with their families. The State Department said that this was in response to “attacks” instead of “incidents” as they had been described before.

Among the effects of these mysterious attacks were “ear complaints, hearing loss, dizziness, tinnitus, balance problems, visual complaints, headache, fatigue, cognitive issues, and difficulty sleeping.” When asked if anybody else at the hotel where some of these attacks supposedly occurred had experienced similar symptoms, one of the unnamed State Department officials gave an interesting answer:

[W]e’re not aware of any hotel staff or other individuals who have been attacked or suffered these systems [sic] beyond the U.S. Government personnel at the hotel. And in terms of our Cuban staff at the embassy, we’re not aware of any incidents involving them or attacks involving them. The victims that we’re aware of are the 21 U.S. Government personnel.

The State Department used the fact that this only seemed to affect US government personnel to conclude that they must be the result of some kind of attack. However, this could also be explained by mass hysteria, since symptoms often travel within particular social groups, in this case US officials.

It’s worth mentioning now that “Havana Syndrome” is a misnomer. A syndrome refers to a group of symptoms that all occur together which suggest the presence of a disease. This is not what we see among Havana Syndrome patients. Not only do the potential exposures vary in location and duration, but the symptoms themselves are not consistent from person to person. This also suggests mass hysteria.

In December 2017 Marc Polymeropoulos took a trip to Moscow. According to an interview with The World, he was there to meet the ambassador and embassy, routine for a longtime CIA intelligence officer. Then, it struck:

it was on the night of Dec. 5, I woke up to a start. I had vertigo. I had a terrible headache, tinnitus, which is ringing in my ears — something really, really traumatic had happened to me. I had been in Afghanistan, and Iraq, and other places. I served over three years after 9/11 in war zones. I’ve been shot at. I put myself in harm’s way. But this was the scariest moment of my life. And so I knew something terrible had happened. I made it through about 10 days with the symptoms on and off. I came back to the United States and then the symptoms got particularly awful. And about March, April of 2018, to the point where I couldn’t work anymore. And after really seeing numerous doctors and undergoing just this incredible journey of trying to find out what happened, I, you know, I couldn’t drive for a while, I lost my long-distance vision. And so, ultimately, I had to retire from the CIA in July of 2019.

Later in the interview, Marc speculates that Russia must have been involved. Unfortunately he does not explain how whatever attacked him in Moscow followed him back and in fact got worse after returning to the United States.

A CIA Officer Visits Moscow, Returns With Mysterious, Crippling Headaches : NPR — *Marc Polymeropoulos in Moscow 2017 (NPR)*

The Havana Syndrome phenomenon then spread to other geopolitical hotbeds. In June 2018, the New York Times reported that diplomatic staff in Guangzhou, China were being sent home following similar reports as those from the diplomats in Cuba, including Mark Lenzi. From their article:

Mr. Lenzi said that over the past year he and his wife had experienced similar physical symptoms, including headaches, sleeplessness and nausea, and on three or four occasions they heard odd noises, though they did not put them together until the disclosures last month.

The footnotes of the Mueller report show that Lenzi was actually interviewed as part of the investigation on January 30, 2018, not long after he first started showing symptoms (p.133). The subject of their interview was Konstantin Kilimnik, whom Lenzi worked with at the International Republican Institute’s (IRI) Moscow office in the early 2000s. At the time, according to Lenzi, Kilimnik was fired from the IRI because of his close ties to Russian intelligence, though another IRI official seems to have remembered it differently.

In addition to his foreign service positions, Lenzi was a deputy spokesman for John McCain’s 2008 presidential campaign and then worked for the New Hampshire Republican Party. In 2016, he told a local New Hampshire news station that despite his party affiliation he would be voting for Hillary Clinton:

Lenzi, 42, said that after working on NATO-related issues, he was dismayed to read Trump’s criticisms of NATO. Trump told the New York Times this week that as president, he would condition U.S. support of NATO allies on whether “they fulfill their obligation to us.”

“It’s not just that Trump is pro-authoritarian. He goes against what I was trying to with the International Republican Institute. There is a palpable fear in these countries about him becoming president.”

“As Trump went after mentors of mine personally – John McCain and Lindsey Graham — and opposes the principle I’ve worked for overseas.”

More recently, Lenzi has filed a lawsuit against the State Department over workplace discrimination after seeking accommodations under the Americans with Disabilities Act. The accommodations, which include reduced workload and the ability to use tinted sunglasses, were granted, but he alleges that the State Department began reassigning him to jobs below his qualifications after being afflicted by Havana Syndrome.

The filing provides a more detailed glimpse into the internal process of evaluating these claims and demonstrates the power of social networks in spreading psychogenic illness. Beginning in November 2017, while working as part of diplomatic security, Lenzi says he and his family started experiencing “headaches, lightheadedness, nausea, nosebleeds, sleeplessness, and memory loss,” which he reported to his superiors. In April, unbeknownst to Lenzi, another employee, whom Lenzi described as his “closest American neighbor” (it’s unclear if they were literally neighbors), was sent home from Guangzhou to be evaluated at the University of Pennsylvania.

Not long after the State Department sent out a technician to evaluate the living quarters of the employee who had just been evacuated. As Lenzi was working in diplomatic security, this technician needed to go through Lenzi to obtain the necessary equipment. According to Lenzi, the person evaluating the room was not really doing a thorough job and so he reported his concerns to his supervisor. His supervisor apparently did not share Lenzi’s concerns and told Lenzi that he was “being too emotional about an equipment issue.”

*Mark Lenzi maintains emotional control when discussing equipment issue with 60 Minutes CBS News*

In May 2018 the Consul General held a town hall for employees and informed them that the technician had found nothing out of the ordinary. Lenzi was upset. A few days after this meeting, he got in touch with his former “closest American neighbor”. From the filing:

On or about May 26, 2018, Mr. Lenzi contacted his former neighbor, who had been medevac’d to University Pennsylvania Hospital in Philadelphia. Mr. Lenzi informed her of the numerous symptoms he and his family had been experiencing over the past six months. After Mr. Lenzi described their short-term memory loss, Mr. Lenzi’s former neighbor stopped him and said that Mr. Lenzi needed to get himself and his family out of their apartment “right now.” ” She went on to say that she had pleaded with the State Department on three different occasions to inform and get her American consulate neighbors out of the Tower 7 Apartment Complex in Guangzhou, but that each time the State Department did nothing.

One day later he sent an email warning his diplomatic colleagues of these incidents. Eventually he and his family also left to be examined at the University of Pennsylvania’s Center for Brain Injury and Repair, which has examined multiple State Department employees claiming to suffer from Havana Syndrome.

About a year after he left China, doctors at the University of Pennsylvania published a study titled “Neuroimaging Findings in US Government Personnel With Possible Exposure to Directional Phenomena in Havana, Cuba” in the Journal of American Medicine. The article does not mention any of the symptoms experienced by the cohort. Instead they compared brain images to find “potential differences in brain tissue volume, microstructure, and functional connectivity in government personnel compared with individuals not exposed to directional phenomena.”

This study was widely cited as being proof that employees had suffered “brain damage” and therefore could not be suffering from any kind of mass hysteria. However, the study had a number of important limitations:

the analysis involved a small sample with high heterogeneity, compounded by clinical (as opposed to research) neuroimaging acquisition. In the absence of a common clinical severity score due to varied symptomatology, the cohort cannot be subdivided. … Additionally, it cannot be determined whether the differences among the patients are due to individual differences between patients or differences in level and degree of exposure to an uncharacterized directional phenomenon.

Not only did the symptoms from patients not match, but the samples involved were both small and highly varied. Therefore this analysis is only capable of showing differences between the group of diplomats studied and the reference cases as two separate groups. Nothing detected in the analysis could explain specific differences in brain matter between individual diplomats, to say nothing of whether or how an “uncharacterized directional phenomenon” might have caused them.

Though it contains lots of descriptions of his experiences with Havana Syndrome, the substance of Lenzi’s suit is employment-related. One of his chief complaints is being denied job placements overseas after reporting his symptoms. Recall that his symptoms began at the end of November 2017, about two months from when the US evacuated many of its employees from the embassy in Havana and at the same time the New York Times was reporting about conflict between career foreign service employees and Secretary of State Rex Tillerson over unfilled positions. Not long after, the head of the American Foreign Service Association warned in their official publication of the dire threats facing the country and their members in particular.

Early in their book, Baloh and Bartholomew dismiss a common misconception with respect to mass hysteria or mass psychogenic illness. Those suffering from these ailments are likely not self-consciously faking their symptoms or experiences. Instead the illness is itself a reaction to someone’s surroundings, both social and physical. In the late 19th century there were reports of vague illnesses associated with telegraph lines and steam engines as the devices came to play a larger and larger role in people’s lives. Often how the disease is described and spread can reveal deeper issues that might not be clear even to those reporting the symptoms.

In this two-part essay I argue that Havana Syndrome is itself symptomatic of a crisis of strategy and identity that most acutely affects people at the highest levels of US foreign policy. Though in some ways precipitated by the chaos of the Trump years, it has its roots in the inconsistencies and convenient fictions that formed the foundation of US policy and informed the structure of the US state itself during the Cold War.

Building on the massive mobilization during World War II, the US underwent a significant reorganization of its defense and intelligence apparatus based on the belief that they were engaged in an ideological death struggle. Assumptions made about the USSR and Russia in these early days would develop into institutional reflexes for US diplomats and intelligence operatives. These would form the basis of not just US policy but the creation of organizations like NATO.

When their opponents in this existential struggle voted themselves out of existence without firing a single ICBM (with lots of American help to be sure), both the US and NATO faced a strategic vacuum that was ultimately filled by the threat of terrorism. Though this was sufficient grounds for another massive reorganization of the national security state and a dramatic expansion of the US military’s presence overseas, these foundational fictions remained at the heart of US policy and persist unconfronted up until the present. Though these national security types may be the ones showing symptoms, as these fictions run aground against the changing geopolitical reality of the 21st century, nearly all Americans will have to confront them one way or another.

World War II and the National Security State

Though the US did have various informal and often ad-hoc intelligence operations during the 19th and early part of the 20th century, the modern national security state as we know it today is largely a product of World War II. In many cases, changes made in the immediate aftermath of the war were codifying changes that had either been discussed or implemented in a preliminary way during the war.

Prior to the signing of the National Security Act, the US had a Department of War and a Department of the Navy, both having been established in the first decades after the American Revolution. The State Department had also existed since the country’s founding, and had been considered the lead department for all peacetime foreign relations matters.

Now, the Department of War would be split into separate Air Force and Army departments, joining the Department of the Navy under a new structure called the National Military Establishment. This new group was to be headed by a single cabinet-level Secretary of Defense, replacing the Secretary of War. In 1949 an amendment to the Act was signed which officially created the Department of Defense from this National Military Establishment. Many activities which the US military had begun as part of the wartime effort were also slotted into this newly reorganized military structure.

Truman signing the 1949 Amendment to the National Security Act

The US did not have a governmental agency primarily responsible for foreign intelligence work until President Roosevelt established the Office of Strategic Services in 1942. At war’s end, Truman established the National Intelligence Authority via executive order to oversee the Central Intelligence Group. This National Intelligence Authority consisted of the Secretaries for War, Navy, State, and the president’s chief of staff. With the signing of the National Security Act this group was renamed the National Security Council (NSC). This group would, among other things, be nominally tasked with overseeing the operations of the newly-created CIA.

Though the exact membership of the NSC has changed as new departments are created and presidential priorities change, this is still the structure which still forms the foundation of presidential authority over the US military and intelligence capabilities. It should also be noted that in addition to these bureaucratic links between diplomacy and intelligence, at this time the State Department and CIA were being run by two brothers, Allen and John Foster Dulles.

In addition to reorganizing the institutions of government themselves, the National Security Act realigned US politics around the need for permanent defense mobilization against any existing or potential threats. While there was no shortage of bureaucratic jostling for position within this structure, the foreign intelligence, diplomacy, and military capabilities of the US were now firmly and irrevocably fused together under the banner of US national security.

These organizational changes mirrored or perhaps precipitated ideological changes in America. As Douglas T. Stuart, a historian who teaches at the Army War College, wrote in a study of the National Security Act:

Over time, the concept of national security displaced national interest as the leitmotif of American foreign policy, and it became increasingly difficult for U.S. policymakers to calculate American interests unless they were framed, and justified, by reference to national security … In accordance with the logic of national security, policymakers were predisposed toward worst case scenarios and tended to favor military instruments of power and influence. (p. 303)
Stuart, Douglas T. Ministry of Fear: The 1947 National Security Act in Historical and Institutional Context
International Studies Perspectives, Volume 4, Issue 3, August 2003, Pages 293–313, https://doi.org/10.1111/1528-3577.403006

Though Stuart frames this result as an unintentional consequence, considering it alongside the ideological origins of the containment doctrine suggests this may as well have been the purpose of this reorganization all along. By laying the groundwork for an extensive system of intelligence collection and covert activity, and by creating procedures for consultation between the civilian and military agencies involved in national security planning, the National Security Act gave the United States a framework designed for anti-Soviet containment.

Birth of Containment

George Kennan was working as deputy chief of mission at the US embassy in Moscow. It was from this post that he wrote his famous Long Telegram in 1946. The telegram, addressed to the Secretary of State but widely circulated throughout the government, lays out Kennan’s view of the Soviet state at the end of World War II. Following his assessment of the Soviet ideology and state structure, he writes:

It should not be thought from above that Soviet party line is necessarily disingenuous and insincere on part of all those who put it forward. Many of them are too ignorant of outside world and mentally too dependent to question self-hypnotism, and who have no difficulty making themselves believe what they find it comforting and convenient to believe. … The very disrespect of Russians for objective truth–indeed, their disbelief in its existence–leads them to view all stated facts as instruments for furtherance of one ulterior purpose or another. There is good reason to suspect that this Government is actually a conspiracy within a conspiracy; and I for one am reluctant to believe that Stalin himself receives anything like an objective picture of outside world. Here there is ample scope for the type of subtle intrigue at which Russians are past masters.

The picture Kennan paints of the Soviet Union is of an intensely paranoid and unpredictable adversary run by men so blinded by ideology that they may not even realize the depths of their self-deception. Officials operating within the Soviet government are encouraged to ignore any information which would contradict the official ideology. Therefore it is impossible to expect that the average Soviet citizen has anything approaching an understanding of the world around them. It is within this hall of mirrors that, according to Kennan, the Soviets have ample room to advance their own anti-American agenda as only they know how. Elsewhere in the telegram Kennan says that his assessment of the Soviet state should not be taken as an assessment of the Russian people, but I find this difficult to rectify with his allusions to Russian mastery of “subtle intrigue” and Russian “disrespect for objective truth.”

According to an article written by his biographer C. Ben Wright, earlier that year he told the audience at the Air War College that “with probably ten good hits with atomic bombs you could, without any great loss of life or loss of the prestige or reputation of the United States as a well-meaning and humane people, practically cripple Russia’s war-making potential.”

Even for the heady days of total US nuclear superiority, the idea of dropping ten nuclear bombs on a country “without any great loss of life” strikes me as basically insane. Nevertheless, Kennan would often contend that people citing his own words while arguing for aggressive action were misinterpreting his true intentions.

If Kennan’s stance was paradoxical, he would argue that with Russia this simply comes with the territory. Writing in a letter, he declared “in the consideration of Russian matters, [whenever] there is a question as to whether this or that, the answer is usually ‘both.’

In the same article Wright cites an unpublished essay titled “The United States and Russia” that gives some concrete suggestions as to what he may have meant in his own terms :

Don’t act chummy with them.
Don’t assume a community of aims with them which does not really exist.
Don’t make fatuous gestures of good will.
Make no requests of the Russians unless we are prepared to make them feel our displeasure in a practical way in case the request is not granted.
Take up matters on a normal level and insist that Russians take full responsibility for their actions on that level.
Do not encourage high-level exchanges of views with the Russians unless the initiative comes at least 50 percent from their side.
Do not be afraid to use heavy weapons for what seem to us to be minor matters.
Do not be afraid of unpleasantness and public airing of differences. Coordinate . . . all activities of our government relating to R ussia and all private American activities of this sort which the government can influence.
Strengthen and support our representation in Russia

Just as the US was fusing together its new intelligence and military agencies, Kennan was creating an image of the USSR and Russia that would inform how the inhabitants of this new national security apparatus saw their Cold War adversary. Attempts at normal diplomatic relations were futile, and instead the US policy should be to contain and isolate the Soviets (and indeed any communist or socialist government) lest the contagion spread.

While the US was molding its own bureaucratic and political ideology into its Cold War form, the transnational arm of the anti-Soviet coalition was also taking shape: NATO.

Diplomatic Origins of NATO

It’s easy to see the breakdown of the Allied powers as inevitable given the ideological differences between the parties, but the events themselves are interesting because the legal and political structures which emerged as a result of this fallout did much to shape the events of the ensuing decades.

The North Atlantic Treaty Organization (NATO) officially came into existence with the ratification of the North Atlantic Treaty in 1949. The basis of the alliance took shape during secret negotiations between the US, UK, and Canada held at the Pentagon earlier in March of the same year. (See Wiebes and Zieman 1983).

On April 1 the State Department produced minutes of these meetings which represented a preliminary agreement for what would become NATO. The most important aspect from a geostrategic point of view is the concept of mutual defense. This principle would form the basis of NATO’s Article 5, also known as the mutual assistance pledge. Then as now, which countries should be party to this agreement was the subject of intense debate, though establishing it as a regional alliance was important legally because such arrangements are allowed under Chapter VIII of the UN Charter.

Secretary of State Dean Acheson signs North Atlantic Treaty with President Harry Truman and Vice President Alben Barkley.

On April 1, 1954, almost five years after the North Atlantic Treaty was ratified by the United States, the Soviet Union presented a proposal for them to join the alliance:

Plainly enough, given the proper conditions, the North Atlantic Treaty Organization could lose its aggressive character, that is, if all the big powers which belonged to the anti-Hitler coalition became its participants. In view of this the Soviet Government, guided by the unchanged principles of its foreign policy of peace and desirous of relaxing the tension in international relations, states its readiness to join with the interested governments in examining the matter of having the Soviet Union participate in the North Atlantic Treaty.

In addition to perceiving the existence of NATO itself as a de facto anti-Soviet alliance, the immediate impetus for this request was the establishment of the European Defense Community via a treaty signed on May 27, 1952 but which was never ratified. This proposed arrangement was to include a newly rearmed West Germany, which had become a highly contentious issue in the years following the war.

The Bundeswehr would not come into existence until 1955, but its formation was the product of many years of effort among officials in West Germany, including former members of the Nazi Wehrmacht like Hasso von Manteuffel. I will admit to being a partisan of the Soviet view on these matters, but even setting that aside I find it difficult to see in this proposal the kind of irrationality or paradoxical thinking that Kennan presents in his writings. Is there really no community of aims among the former anti-Nazi alliance? Nevertheless, this request was rejected and the Cold War counterpoint to NATO, the Warsaw Pact, was established a few years later.

That NATO is an international alliance conceived of initially through negotiations at the Pentagon between Anglo countries is significant for understanding its real purpose. One of NATO’s most important functions during the Cold War was as a mouthpiece which could claim to speak for an alliance of many countries. NATO would prove essential for US rhetoric during the Cold War as a way of contrasting liberal capitalist democracies with the authoritarian Soviet communism. Over time, speaking for “the international community” or later the “rules-based international order” would become common even in cases where it was clear that the US was in the driver’s seat.

*Detail from 1956 travel map of Western Europe published by ESSO*

As fascinating as all this diplomatic and covert action was, the ultimate backbone of US strategy post-World War II was the nuclear bomb. Though the term “mutually assured destruction” has become a shorthand for Cold War nuclear policy, it is notable that US military strategists were aware of the thorny political issues raised by nuclear weapons even before the Soviets had successfully tested a weapon of their own in 1949.

Nuclear Strategy and the American Cold War Ethos

A 1946 paper titled The Absolute Weapon: Atomic Power and World Order was one of the earliest attempts at formulating US nuclear policy. Nuclear weapons caused destruction at such an epic scale that it was challenging many of the underlying assumptions of military strategy. Even from this early date, though, it was clear that the biggest challenges presented by nuclear weapons would be political, not technological.

The paper identified two political dilemmas posed by nuclear weapons. The first was that the primary political tool for arriving for any agreement between sovereign states is a voluntary treaty. Even if a treaty limiting atomic weapons could be negotiated and signed, the immense power wielded by nuclear armed states meant that the incentive to break the agreement would only grow as more states signed on.

The second dilemma was that the growth in military air power meant attacks could happen more quickly, so the prospect of any kind of negotiated solution to the threat of atomic attack among any international bodies seemed slim to none.

The political response to these twin dilemmas posed by nuclear weapons among political leaders was ably summarized in the report:

After a few early flights of fancy, most of the political analysts lapsed into a discreet silence on the subject. It was quickly apparent that they had been handed one of the toughest problems which the members of their guild had ever had to face. … Each sortie into some promising opening either ended up against a solid wall or led to another tangle of seemingly insoluble problems. No clue could be found to a simple formula which would offer repose to men’s minds while opening up new vistas of unruffled prosperity. In fact there was reason to believe that nothing of the sort would ever be found and that the job was one of arduous and patient examination of a whole mosaic of related problems extending indefinitely into the future.
p.3

Indeed no straightforward political solution to the problem of nuclear weapons has yet been found. Though some at the time believed no country could summon the scientific knowledge and technical resources required to create such a sophisticated device, the political benefits to having one all but guaranteed that total US nuclear superiority would not last for long. The work of nuclear strategists and the national security state was developing sufficient countermeasures such that any use of nuclear weapons on one side would all but guarantee that the attacker would be also annihilated by a nuclear counterattack.

In order to reap the geopolitical benefits of this power, the holders of nuclear weapons must convince their opponents that they would indeed use it when faced with an existential threat. In a word: credibility.

It was against the backdrop of this uneasy standoff that the complex politics of the Cold War played out. The main concern of US intelligence, military planning, and diplomacy would be to be under constant vigilance against any technological or strategic development that might give the USSR enough of an advantage to encourage them to actually fire their weapons.

The authors of the report cited above believed that once other countries reached nuclear parity, it was far more likely that nuclear weapons would be used either at the very beginning of a war or after a few years of sustained fighting, probably by whichever side was losing. They declared it would be “foreign to human nature” for war to erupt between nuclear powers that would not involve nuclear weapons because if the conflict was great enough so as to lead to the outbreak of war, then it would be unlikely for war to occur in the first place.

Though the US and USSR never did fight a head-on conventional war, looking back on the 20th century we can see that almost the exact opposite of what Brodie predicted happened. Instead of one single moment of annihilation, the Cold War was fought all over the world, with those in the decolonizing world doing the majority of the fighting and dying. Governments were overthrown covertly out of US embassies and weapons of the non-nuclear variety circulated around the world, all the while the threat of nuclear weapons waiting in the offing in case things really got serious. These wars were fought with covert arming and training of militias and aggressive propaganda efforts, often more intensely concerned with domestic rather than foreign opinion.

Intelligence work has been murky since the days of the Venetian Council of Ten, but now with weapons which could destroy the world many times over with just a few sorties or launches from submarines, the stakes of spook paranoia would reach dizzying heights. Over time, it became clear to anybody either already at the top of US national security as well as those who sought to rise up the ranks that there was rarely much to gain from deflating potential threats.

As the national security state solidified, this constant need for newer threats lurking around every corner became a matter of bureaucratic survival. Disagreements over strategy and tactics between different factions within it would reverberate through US politics, though largely unacknowledged by and often unknown to the general public.

On all sides, though, national security became the prime directive of the US state. Wars overseas were justified in the name of warding off a seemingly omnipresent enemy that no matter how much the US spent was always about to pull ahead if they weren’t ten steps ahead already. The imagery and rhetoric used to convince the American people became more sophisticated alongside similar refinements in the tools for effecting US policy overseas in both covert and overt ways.

What had once been fixed distinctions between wartime and peacetime were increasingly becoming blurred. Whether we were at war or peace became less dependent on formal declarations and more a matter of assessing the levels of violence in a given place and time.

The US may have “won” the Cold War, but the strategies and tactics used to achieve that victory would have profound effect on both domestic and international politics up to the present. These effects and their connection to this string of bizarre incidents known as Havana Syndrome will be the subject of part 2.

Orion's Bastard

Category: other writing

Scroogeball

Show me the Moneyball

The Reserve Clause

Scroogeball

Conclusion

Checking the COMPAS: An Open Records Analysis

“All experiences are real”: A Deep History of Havana Syndrome (Part I)

Trouble in Foggy Bottom

Havana Syndrome Emerges

World War II and the National Security State

Birth of Containment

Diplomatic Origins of NATO

Nuclear Strategy and the American Cold War Ethos

Menu