When Africanus asked who, in Hannibal’s opinion, was the greatest general, Hannibal named Alexander… as to whom he would rank second, Hannibal selected Pyrrhus… asking whom Hannibal considered third, he named himself without hesitation. Then Scipio broke into a laugh and said, “What would you say if you had defeated me?”
Like Hannibal, I wanted to rank powerful leaders in the history of warfare. Unlike Hannibal, I sought to use data to determine a general’s abilities, rather than specific accounts of generals’ achievements. The result is a system for ranking every prominent commander in military history.
Inspired by baseball sabermetrics, I opted to use a system of Wins Above Replacement (WAR). WAR is often used as an estimate of a baseball player’s contributions to his team. It calculates the total wins added (or subtracted) by the player compared to a replacement-level player. For example, a baseball player with 5 WAR contributed 5 additional wins to his team, compared to the average contributions of a high-level minor league player. WAR is far from perfect, but provides a way to compare players based on one statistic.
I adopted WAR to estimate a given military tactician’s contributions beyond or below an average general. My model, which I explain below, provides an estimate for the performance of an average general in any given circumstances. I can then evaluate a general’s quality based on how much they exceeded or fell short of a replacement general in the same circumstances (assuming a replacement general would perform at an average level). In other words, I would find the generals’ WAR, in war.
My first challenge was constructing a reliable dataset. Since I was unable to find a comprehensive dataset of historical battles, I decided to build my own. I used Wikipedia’s lists of battles as a starting point. While not comprehensive, Wikipedia’s lists include 3,580 unique battles and 6,619 generals, which provided a sufficient sample to create a model. I then developed a function that could scrape key information for each battle, including all of the commanders involved in the battle, the total forces available to those commanders, and the outcome of the battle. The resulting dataset provided a large sample of battles to create a baseline (replacement-level) performance, against which I would compare the performance of individual generals.
Sample of battle data on Wikipedia, before scraping
Sample of battle data scraped and processed into dataframe
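To make the processing step concrete, here is a minimal sketch of how a free-text strength field might be turned into counts per force type. It is not the scraper used for this project: the function name `parse_strength`, the keyword table, and the sample string are all assumptions made up for illustration, and real Wikipedia infoboxes are far messier than this.

```python
import re

# The five force types named in the article.
FORCE_TYPES = ("infantry", "cavalry", "artillery", "air_force", "navy")

# Hypothetical keyword table: which unit words count toward which force type.
UNIT_KEYWORDS = {
    "infantry": "infantry",
    "cavalry": "cavalry",
    "guns": "artillery",
    "cannons": "artillery",
    "aircraft": "air_force",
    "ships": "navy",
}

# Matches a count followed by a unit word, e.g. "10,000 infantry".
COUNT_PATTERN = re.compile(r"(\d[\d,]*)\s+([a-z]+)", re.IGNORECASE)


def parse_strength(text):
    """Turn a free-text strength field into counts per force type."""
    forces = {force_type: 0 for force_type in FORCE_TYPES}
    for number, unit in COUNT_PATTERN.findall(text):
        force_type = UNIT_KEYWORDS.get(unit.lower())
        if force_type is not None:  # ignore unit words we do not recognize
            forces[force_type] += int(number.replace(",", ""))
    return forces


# Made-up input for illustration, not real battle data.
print(parse_strength("10,000 infantry, 2,500 cavalry, 40 guns"))
```

Anything the keyword table does not recognize is silently dropped here, which is one reason a pipeline like this still needs manual cleaning.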
I then constructed a linear model from that sample of battles. For each battle, I separated the combatants’ forces into infantry, cavalry, artillery, air force, and navy. I could then weight a general’s numerical advantage or disadvantage compared to their adversary, and better isolate the general’s ability as a tactician. The resulting model was surprisingly conservative in its weights, suggesting that raw soldier quantities have a relatively small effect compared to other factors such as terrain or technology, which further research could investigate in more detail. In this project, however, the results potentially inflate the importance of a commander’s tactical acuity compared with other factors.
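For readers who want a feel for what a model like this does, here is a rough sketch of a linear win-probability estimate built from force advantages. The weights below are hypothetical placeholders, not the coefficients my model produced, and the fitting step is not shown. The function names and the relative-advantage formula are also assumptions for illustration.

```python
FORCE_TYPES = ("infantry", "cavalry", "artillery", "air_force", "navy")

# Hypothetical weights chosen only for illustration; the article does not list
# the fitted coefficients. Small values mirror the "conservative" weights it describes.
WEIGHTS = {
    "infantry": 0.10,
    "cavalry": 0.10,
    "artillery": 0.05,
    "air_force": 0.05,
    "navy": 0.05,
}


def force_advantage(own, opponent):
    """Relative advantage in [-1, 1]; 0 if the sides are equal or neither fields this force."""
    total = own + opponent
    return 0.0 if total == 0 else (own - opponent) / total


def replacement_win_probability(own_forces, opponent_forces):
    """Estimated chance that an average general wins with these forces."""
    score = 0.5  # even forces imply an even chance
    for force_type in FORCE_TYPES:
        advantage = force_advantage(
            own_forces.get(force_type, 0), opponent_forces.get(force_type, 0)
        )
        score += WEIGHTS[force_type] * advantage
    return min(1.0, max(0.0, score))  # keep the result a valid probability


# Made-up armies: one side has three times the infantry of the other.
larger = {"infantry": 3000}
smaller = {"infantry": 1000}
print(round(replacement_win_probability(larger, smaller), 2))
print(round(replacement_win_probability(smaller, larger), 2))
```

With weights this small, even a three-to-one infantry advantage only nudges the estimate a few points away from 50%, which is the kind of conservative behavior described above.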
I was ready to rank each general and delve into the results. I did so by isolating each general’s battles, and assigning a WAR score to their performance in each battle. For instance, French Emperor Napoleon gained 0.49 WAR for his victory at the Battle of Borodino. Since French troops slightly outnumbered the forces of the Russian Empire, the model gives a replacement general in Napoleon’s position a 51% chance of victory. The WAR system assigns Napoleon 1 win for his victory, but subtracts the chance a replacement general would have won anyway. Thus, Napoleon gains 0.49 wins above replacement.
The system uses a similar methodology to handle defeats. For instance, Russian general Mikhail Kutuzov, one of Napoleon’s adversaries in the Battle of Borodino, was attributed -0.49 WAR from the confrontation. By suffering defeat, he achieved -1 win, but there’s a 51% chance a replacement general would have lost anyway, so 0.51 is added back for a net of -0.49.
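The Borodino arithmetic can be written as a tiny function. This is a sketch of the calculation as described in this piece, not the project's actual code, and the name `battle_war` is made up. It expresses the score as actual wins minus the wins a replacement general would be expected to earn, which gives the same numbers as the descriptions above. Kutuzov's 49% figure assumes the two sides' win probabilities sum to one. Draws are left out because the description above only covers victories and defeats.

```python
def battle_war(won, replacement_win_probability):
    """WAR for one battle: actual wins (1 or 0) minus the replacement general's expected wins."""
    if not 0.0 <= replacement_win_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    actual_wins = 1.0 if won else 0.0
    return actual_wins - replacement_win_probability


# Figures from the Borodino example in the text.
napoleon = battle_war(won=True, replacement_win_probability=0.51)
# Assumption: Kutuzov's replacement chance is 1 - 0.51 = 0.49.
kutuzov = battle_war(won=False, replacement_win_probability=0.49)

print(f"Napoleon: {napoleon:+.2f}")  # prints "Napoleon: +0.49"
print(f"Kutuzov: {kutuzov:+.2f}")    # prints "Kutuzov: -0.49"
```

Under that assumption the two commanders' scores for a single battle always cancel out.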
Among all generals, Napoleon had the highest WAR (16.679) by a large margin. In fact, the next highest performer, Julius Caesar (7.445 WAR), had less than half the WAR accumulated by Napoleon across his battles. Napoleon benefited from the large number of battles in which he led forces. Among his 43 listed battles, he won 38 and lost only 5. Napoleon overcame difficult odds in 17 of his victories, and commanded at a disadvantage in all 5 of his losses. No other general came close to Napoleon in total battles. While Napoleon commanded forces in 43 battles, the next most prolific general was Robert E. Lee, with 27 battles (the average battle count was 1.5). Napoleon’s large battle count allowed him more opportunities to demonstrate his tactical prowess. Alexander the Great, despite winning all 9 of his battles, accumulated less WAR largely because of his shorter and less prolific career.
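Because total WAR is a running sum, the number of battles matters as much as the quality of each one. The sketch below shows one way per-battle scores could be rolled up into totals and per-battle averages. The function name `summarize_generals` and the sample records are hypothetical, and the generals are deliberately left anonymous rather than presented as real data.

```python
from collections import defaultdict


def summarize_generals(records):
    """Roll up (general, battle WAR) pairs into battle count, total WAR, and WAR per battle."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for general, war in records:
        totals[general] += war
        counts[general] += 1
    return {
        general: {
            "battles": counts[general],
            "total_war": totals[general],
            "war_per_battle": totals[general] / counts[general],
        }
        for general in totals
    }


# Made-up records: "General A" fights often, "General B" fights once but does better per battle.
records = [
    ("General A", 0.3),
    ("General A", 0.3),
    ("General A", 0.3),
    ("General A", 0.3),
    ("General B", 0.6),
]

for general, stats in summarize_generals(records).items():
    print(general, stats["battles"], round(stats["total_war"], 2), round(stats["war_per_battle"], 2))
```

In this toy example the prolific general ends up with twice the total WAR while being only half as effective per battle, the same dynamic that separates a long career from a short, undefeated one.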
However, outside of Napoleon’s outlying success, the generals’ WARs largely adhere to a normal distribution. This suggests his success is attributable to command talent, rather than an anomaly in the model’s findings. In fact, Napoleon’s total WAR was nearly 23 standard deviations above the mean WAR accumulated by generals in the dataset.
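A "standard deviations above the mean" figure is a simple calculation once every general has a total WAR. The sketch below shows the general idea. It assumes the population standard deviation and uses made-up numbers, so it does not reproduce the figure quoted above. The function name is hypothetical.

```python
import statistics


def standard_deviations_above_mean(value, population):
    """How many population standard deviations `value` sits above the population mean."""
    mean = statistics.mean(population)
    spread = statistics.pstdev(population)
    if spread == 0:
        raise ValueError("population has no spread")
    return (value - mean) / spread


# Made-up totals for illustration: a tight cluster around zero.
cluster = [-1.0, 1.0]
print(standard_deviations_above_mean(3.0, cluster))
```

Whether the outlier itself is included in the population changes the answer, so that choice should be stated alongside any number like this.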
There were also generals who had surprisingly low total WAR despite a reputation as master tacticians. Robert E. Lee, commander of the Confederate States Army, finished with a negative WAR (-1.89), suggesting an average general would have had more success than Lee leading the Confederacy’s armies. Lee was saddled with considerable disadvantages, including a large deficit in the size of his military and available resources. Still, his reputation as an adept tactician is likely undeserved, and his WAR supports the historians who have criticized his overall strategy and handling of key battles, such as ordering the disastrous ‘Pickett’s Charge’ on the last day of the Battle of Gettysburg. In the words of University of South Carolina professor Thomas Connelly, “One ponders whether the South may not have fared better had it possessed no Robert E. Lee.”
German field marshal Erwin Rommel, nicknamed the ‘Desert Fox’ for his successes in North Africa during World War II, also performed poorly in this model, finishing with -1.953 WAR. This finding disputes the praise Rommel has received as a tactician from modern generals, including Norman Schwarzkopf and Ariel Sharon. However, like Lee, Rommel has been the subject of considerable historical debate. In particular, critics have attributed much of his reputation as a tactical genius to both German and Allied propaganda. British generals reportedly exaggerated Rommel’s tactical abilities in order to minimize disapproval regarding their defeats.
Modern generals performed relatively poorly in the model. American general George S. Patton, described by historian Terry Brighton as “among the greatest generals of [World War II],” accumulated only 0.9 WAR. The failure of modern generals to perform well in WAR may be attributable to changes in warfare which have prevented individual generals from participating in a large number of battles.
Among post-World War II generals, Israeli commanders stood out. Israeli military leader Moshe Dayan finished with 2.109 WAR (60th overall), an impressive amount for a modern general but relatively modest compared to pre-20th Century tacticians. Similarly, former Israeli Prime Minister Ariel Sharon accumulated 2.171 WAR (58th overall) for his battlefield successes in the Suez Crisis, Six-Day War, and Yom Kippur War.
Finally, I compared Hannibal’s assessment of the top generals of all time to my model. According to WAR, Hannibal underrated his own abilities: of all generals up to that date, Hannibal had the highest WAR at 5.519 (6th overall). Alexander the Great, whom Hannibal named the top general, was just behind Hannibal’s mark with 4.391 WAR (10th overall). However, Alexander died after fighting just 9 battles, winning all of them. Hannibal had 17 battles to accumulate value, winning 13, losing 2, and drawing 2. Thus, I agree with Hannibal’s assessment that Alexander was the more able tactician, even though Hannibal provided more total value. Alexander demonstrated his ability to win battles, and likely would have continued to win had he not succumbed to illness.
My findings starkly diverge from Hannibal’s assessment regarding Pyrrhus of Epirus, a Greek general and early Roman rival. My model credits Pyrrhus with only 3 battles and -0.53 WAR. Although Hannibal attributes innovative military tactics to Pyrrhus, I am deeply skeptical of his overall tactical acumen, even before considering his inability to prevent catastrophic casualties to his armies during his victories.
This project and the resulting visualizations hopefully provide a fun and interesting way to explore and compare generals’ relative success. WAR provides a useful paradigm for empirically comparing generals, although future research could improve this model by expanding the dataset or considering other factors, such as strength of opponent. Please play around with the visualization, and if you’re looking for a specific general, just type the URL ‘https://ethanarsht.github.io/military_rankings/***.html’, where *** is the general’s name, exactly as it appears on Wikipedia.
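If you would rather generate those links than type them, something like the following sketch could work. The helper name `general_page_url` is made up, and percent-encoding spaces and special characters (as a browser would) is an assumption about how the pages are served, so treat the output as a starting point rather than a guaranteed link.

```python
from urllib.parse import quote

# URL pattern quoted in the article.
BASE_URL = "https://ethanarsht.github.io/military_rankings/"


def general_page_url(wikipedia_name):
    """Build a page URL from a general's name as it appears on Wikipedia."""
    # Assumption: characters such as spaces are percent-encoded, as a browser would do.
    return f"{BASE_URL}{quote(wikipedia_name)}.html"


print(general_page_url("Julius Caesar"))
```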
UPDATE 12/11: Based on the feedback of many people whose work I respect quite a bit, I wanted to explicitly lay out a few caveats to the above analysis. First, this piece is intended as a fun thought experiment, not a definitive ranking, or a scholarly contribution to the field of military history. I believe some of the results from this project, especially Lee and Rommel, provide interesting data points for broader discussions of their tactical abilities. In no way do I claim my analysis provides the full picture, or anything close to it.
Furthermore, since I rely heavily on Wikipedia for data and the categorization of that data, there are holes and inconsistencies in my inputs. Given my personal lack of resources, it is impractical for me to conduct a project of this scale while checking the precision of each and every data point.
Finally, I must reiterate that my ranking is of a general’s tactical value added, not their overall strategic abilities, or who would win in a hypothetical head-to-head with equalized troops and equipment.
Again, I believe the vast majority of readers interpreted this piece as I intended: a thought experiment with interesting results and entertaining interactivity. Thank you to everyone who has read the piece and/or provided feedback.
UPDATE 12/8: By popular demand, the visualization now includes average WAR added per battle. Just hover over a general’s dot and ‘WAR per battle’ appears in the popup window.
Additionally, some people have had trouble navigating the mess of a GitHub repository associated with the project. I have put two important spreadsheets in this Google Drive: one with all of the troop number data, and another with the WAR outcomes for each battle.
UPDATE 12/6: I wanted to respond to a few reasonable and persistent strands of constructive criticism I’ve received in the past couple of days.
- Missing data! A number of people have accurately pointed out missing battles/generals in the data, particularly concerning the Mongols, including Genghis Khan and Subutai. This is a major problem, and stems from my reliance on Wikipedia’s lists of battles. This is something I should have caught sooner, and I plan on updating the dataset to include a larger number of battles. However, handling this data requires a great deal of manual data entry/cleaning, and it will take me considerable time before I can add a major update to the dataset.
- Strategy versus tactics: People have argued that a general was under/overrated because of the eventual outcome of their campaigns. I am very specifically concerned with a general’s tactical acuity, and not their strategic decision-making. So Napoleon shouldn’t lose credit for his disastrous Russia campaign, nor should George Washington gain credit for his strategic approach to the American Revolutionary War.
- Wins Above Replacement versus Wins Above Average versus Wins Probability Added: Those familiar with baseball sabermetrics have been quick to point out that my model doesn’t mirror the Wins Above Replacement approach in all ways, since baseball’s WAR uses a generic, top-tier, minor-league player as its baseline. I simply assigned the average quality as my replacement-level. Possibly not completely accurate, but I think it’s quite clear what I’m using as a baseline in the above methodology.
Thank you to those of you who have provided constructive criticism.
Please do not hesitate to provide feedback on Twitter (@ethanarsht) or email@example.com. Code and data available here: https://github.com/ethanarsht/military_rankings. I apologize for the mess.