This is the first post in a series that explores and visualizes the Commonwealth War Graves Commission (CWGC) dataset. In previous posts I discussed how I collected the data from the CWGC website; now I finally get to explore it.
Deaths By Year
Perhaps the most obvious trend to explore in this data is the number of deaths over time. First I extracted the year from the date of death and then calculated the number of deaths per year for each war.
    # Load packages (assumed: revalue() from plyr, setDT() from data.table,
    # kable_styling() and %>% from kableExtra)
    library(plyr)
    library(data.table)
    library(kableExtra)

    # Load data file
    load("All_cwgc_graves_with_served_and_branch.rda")

    # Add in year of DoD
    final_cwgc$Year <- format(as.Date(final_cwgc$DoD_1, format = "%Y-%m-%d"), "%Y")

    # Rename factors
    final_cwgc$War <- revalue(final_cwgc$War, c("1" = "WW1", "2" = "WW2"))

    # Number of deaths by year and by war ordered asc.
    final_cwgc_year <- setDT(final_cwgc)[, .(Deaths = .N), by = .(Year, War)][order(Year, War)]

    # Remove NA
    final_cwgc_year <- final_cwgc_year[complete.cases(final_cwgc_year$Year), ]

    # View output
    knitr::kable(final_cwgc_year, align = "c") %>% kable_styling()
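| Year | War | Deaths |
|------|-----|--------|
| 1914 | WW1 | 42100  |
| 1915 | WW1 | 151600 |
| 1916 | WW1 | 237100 |
| 1917 | WW1 | 295125 |
| 1918 | WW1 | 286765 |
| 1919 | WW1 | 38097  |
| 1920 | WW1 | 14479  |
| 1921 | WW1 | 5879   |
| 1939 | WW2 | 6595   |
| 1940 | WW2 | 69215  |
| 1941 | WW2 | 85466  |
| 1942 | WW2 | 111949 |
| 1943 | WW2 | 113945 |
| 1944 | WW2 | 164984 |
| 1945 | WW2 | 76548  |
| 1946 | WW2 | 18703  |
| 1947 | WW2 | 12240  |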
Table 1: Number of CWGC Dead by Year
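The "Remove NA" step in the code is needed because the year is derived from the date string. As a minimal sketch (the sample values below are made up for illustration, not taken from the CWGC file), an empty or missing date simply produces an NA year:

```r
# Hypothetical example values; the real column is final_cwgc$DoD_1
dod <- c("1916-07-01", "1944-06-06", "", NA)

# Same extraction as in the post: parse as a Date, then keep only the year
year <- format(as.Date(dod, format = "%Y-%m-%d"), "%Y")
year

# Dates that could not be parsed end up as NA and would form their own group
sum(is.na(year))
```

Without the `complete.cases()` filter, those rows would appear in the summary as an extra group with a missing year.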
Strictly speaking, the CWGC data also commemorates a number of individuals who were not from the Commonwealth countries (i.e. not from South Africa, Australia, Canada, India, New Zealand or the United Kingdom), so these numbers are a little inflated. What is immediately apparent is that the data show deaths after the cessation of hostilities: in the years after WW1 (1919-1921) and WW2 (1946-1947) there were 90,473 deaths, which presumably represent deaths from injuries suffered during the war.
The Times printed its daily Roll of Honour until well into 1919, as men continued to succumb to their wounds. In almost every street there were blind men, turning their sightless faces to the light, the maimed and disabled, standing with the arm of a jacket or a trouser leg flapping empty or hobbling on crutches down the street, and scarred or disfigured ex-servicemen – the French called them ‘men with broken faces’ and sculptors made metal masks to cover their ravaged features.
Hanson, N. (2019), Unknown Soldiers: The Story of the Missing of the Great War, Lume Books.
    # War years
    war_years <- c('1914', '1915', '1916', '1917', '1918',
                   '1939', '1940', '1941', '1942', '1943', '1944', '1945')

    # Number of deaths after war end
    post_war_deaths <- final_cwgc[!(final_cwgc$Year %in% war_years), ]
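One thing worth knowing about this filter: `%in%` never returns NA, so a record with a missing year counts as "not a war year" and is kept. The sketch below uses a small made-up vector to show the behaviour, and also sums the post-war rows of Table 1. Those rows add up to 89,398, slightly fewer than the 90,473 quoted above; records with a missing year are one possible explanation for the gap, although I have not verified this against the data.

```r
# Post-war rows copied from Table 1
post_war <- c(`1919` = 38097, `1920` = 14479, `1921` = 5879,
              `1946` = 18703, `1947` = 12240)
sum(post_war)  # 89398

war_years <- c('1914', '1915', '1916', '1917', '1918',
               '1939', '1940', '1941', '1942', '1943', '1944', '1945')

# Hypothetical years, including one missing value
year <- c("1917", "1919", NA, "1946")

# The filter used in the post keeps the NA row
keep_as_in_post <- !(year %in% war_years)

# A stricter variant that also drops rows with no year
keep_strict <- !is.na(year) & !(year %in% war_years)

keep_as_in_post
keep_strict
```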
If the data were available, it would be interesting to analyse the ‘excess’ deaths in the years after both wars. There must have been countless lives cut short that are not recorded in the CWGC data, caused by ill health, substance abuse and suicide due to the trauma of the war. Plotting the CWGC war dead for each year is a simple enough matter using the plotly package.
    library(plotly)

    # Legend
    l <- list(bordercolor = "#D3D3D3", borderwidth = 2, orientation = "h",
              xanchor = "center", y = 0.8, x = 0.7)

    # Deaths by year in WWI
    p1 <- plot_ly(final_cwgc_year[final_cwgc_year$War == "WW1", ],
                  x = ~Year, y = ~Deaths, color = ~War, type = "bar", name = ~War) %>%
      layout(yaxis = list(title = "CWGC Deaths"), xaxis = list(title = ""), legend = l) %>%
      config(displayModeBar = FALSE)

    # Deaths by year in WWII
    p2 <- plot_ly(final_cwgc_year[final_cwgc_year$War == "WW2", ],
                  x = ~Year, y = ~Deaths, color = ~War, type = "bar", name = ~War) %>%
      layout(yaxis = list(title = "CWGC Deaths"), xaxis = list(title = ""), legend = l) %>%
      config(displayModeBar = FALSE)

    # Place the two plots side by side on a shared y-axis
    p <- subplot(p1, p2, shareY = TRUE)
    p
Figure 1: Commonwealth Deaths by Year for WW1 and WW2
WWI had a far greater number of losses over a shorter time period than WWII. Among the war years, 1914 had the fewest deaths in WWI: there were only five months of action, which mostly involved the small British Expeditionary Force, later known as the ‘Old Contemptibles’. The worst year in WWI was 1917, which included losses from the Battle of Arras and Vimy Ridge, the Third Battle of Ypres (better known as Passchendaele), the Battle of Messines Ridge and the Battle of Cambrai.
There were relatively few CWGC WWII deaths in 1939, which makes sense considering this was the period of the so-called ‘Phoney War’, also known by the Germans as the Sitzkrieg. Following the invasion of France in 1940, the losses gradually increased and peaked in 1944 with the Allied invasion of Europe and the subsequent push to Berlin. At this level of granularity - where the data is summarised by year - it is difficult to comment much further.
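To put rough numbers on the comparison between the two wars, the yearly counts in Table 1 can be totalled per war and each year expressed as a share of its war's total. This is only a sketch built from the table values, not from the underlying records, and the column name `Share` is my own.

```r
# Yearly counts copied from Table 1
deaths <- data.frame(
  Year   = c(1914:1921, 1939:1947),
  War    = rep(c("WW1", "WW2"), times = c(8, 9)),
  Deaths = c(42100, 151600, 237100, 295125, 286765, 38097, 14479, 5879,
             6595, 69215, 85466, 111949, 113945, 164984, 76548, 18703, 12240)
)

# Total deaths per war
totals <- tapply(deaths$Deaths, deaths$War, sum)
totals

# Each year's percentage share of its own war's total
war_total <- as.numeric(totals[as.character(deaths$War)])
deaths$Share <- round(100 * deaths$Deaths / war_total, 1)

# The five worst years across both wars
deaths[order(-deaths$Deaths)[1:5], ]
```

On these figures the WW1 rows total 1,071,145 and the WW2 rows 659,645.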
Deaths By Year and Branch
Taking the previous plot and drilling down by branch of service will hopefully give a clearer picture. I removed the ‘Miscellaneous’ category since it comprises such a small fraction of the overall total. I also split out the plots for each war.
    # Remove 'miscellaneous' category
    final_cwgc <- final_cwgc[final_cwgc$Branch != "Miscellaneous", ]

    # Number of deaths by year, war and branch
    final_cwgc_branch <- setDT(final_cwgc)[, .(Deaths = .N), by = .(Year, War, Branch)]

    # Re-order factor levels
    final_cwgc_branch$Branch <- factor(final_cwgc_branch$Branch,
                                       levels = c("Army", "Air+force", "Civilian+War+Dead+1939",
                                                  "Navy", "Merchant+navy"))

    # Rename factor levels
    levels(final_cwgc_branch$Branch) <- c("Army", "Air Force", "Civilian", "Navy", "Merchant Navy")

    # Legend
    l <- list(bordercolor = "#D3D3D3", borderwidth = 2, xanchor = "center", y = 1, x = 0.8)

    # Grouped bar by Branch for WWI
    plot_ly(final_cwgc_branch[final_cwgc_branch$War == "WW1", ],
            x = ~Year, y = ~Deaths, type = "bar", color = ~Branch) %>%
      layout(yaxis = list(title = "Deaths (Log Scale)", type = "log"),
             xaxis = list(title = ""), legend = l) %>%
      config(displayModeBar = FALSE)
Figure 2: Commonwealth Deaths in WW1 by Service Branch
The losses for the Army dwarf those of the other service branches, which makes the smaller branches difficult to distinguish from one another on a linear axis, so the solution is to use a log scale.
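A quick illustration of why the log scale helps, using made-up counts rather than CWGC figures: on a linear axis the smaller branches are a tiny fraction of the tallest bar, whereas their base-10 logarithms are much closer together.

```r
# Hypothetical yearly counts for three branches (illustration only)
counts <- c(Army = 250000, Navy = 8000, `Air Force` = 400)

# Bar height relative to the tallest bar on a linear axis
linear_height <- round(counts / max(counts), 3)
linear_height

# Position on a log10 axis
log_height <- round(log10(counts), 2)
log_height
```

The trade-off is that bar lengths on a log axis are no longer proportional to the counts, and a category with zero deaths cannot be drawn at all.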
    # Legend
    l <- list(bordercolor = "#D3D3D3", borderwidth = 2, xanchor = "center", y = 0.8, x = 0.8)

    # Grouped bar by Branch for WWII
    plot_ly(final_cwgc_branch[final_cwgc_branch$War == "WW2", ],
            x = ~Year, y = ~Deaths, type = "bar", color = ~Branch) %>%
      layout(yaxis = list(title = "Deaths (Log Scale)", type = "log"),
             xaxis = list(title = ""), legend = l) %>%
      config(displayModeBar = FALSE)
Figure 3: Commonwealth Deaths in WW2 by Service Branch
The grouped bar chart for WWII is a little clearer but still requires a log scale as before. The majority of the losses were sustained by the Army, but the chart also shows the increasing losses of the Air Force as the strategic bombing campaigns progressed.
An alternative to the bar chart is the filled area chart. It takes a little more effort to code and is helpful when comparing the relative percentages for each of the service branches, but does not show the absolute totals. The filled area chart has the same problem as the grouped bar chart when one of the categories is very dominant (e.g. Army in WWI). I made the two plots share the same legend colors for comparison purposes. The plot for WWII shows the greater contribution of the Air Force and Navy (relative to WW1) and has an additional category for Civilian deaths.
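As a minimal sketch of how such a chart can be built (this is not the exact code behind the figures, and the counts below are made up), the first step is to convert each branch's count into a percentage of its year's total; the percentages can then be stacked with plotly's `stackgroup` option. The data frame `toy` and the column `Pct` are names I have invented for the example.

```r
# Hypothetical counts (illustration only, not CWGC figures)
toy <- data.frame(
  Year   = rep(c("1939", "1940", "1941"), each = 3),
  Branch = rep(c("Army", "Navy", "Air Force"), times = 3),
  Deaths = c(50, 30, 20, 400, 100, 100, 300, 150, 250)
)

# Each branch's percentage of its year's total
year_total <- ave(toy$Deaths, toy$Year, FUN = sum)
toy$Pct <- 100 * toy$Deaths / year_total
toy

# Stacked filled area chart; only built if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(toy, x = ~Year, y = ~Pct, color = ~Branch,
                       type = "scatter", mode = "none", stackgroup = "one")
  p <- plotly::layout(p, yaxis = list(title = "Share of deaths (%)"),
                      xaxis = list(title = ""))
  # print(p) displays the chart in an interactive session
}
```

Because every year sums to 100%, the chart shows how the mix of branches shifts over time but says nothing about the absolute totals.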
    # Casualties by month
    # Add in year-month of DoD
    final_cwgc$YearMon <- format(as.Date(final_cwgc$DoD_1, format = "%Y-%m-%d"), "%Y-%m")

    # Remove 'miscellaneous' category
    final_cwgc <- final_cwgc[final_cwgc$Branch != "Miscellaneous", ]

    # Filter out deaths post-war and split by war
    # (& binds tighter than |, so the parentheses only make the grouping explicit)
    cwgc_month <- final_cwgc[(final_cwgc$DoD_1 <= "1918-11-30") |
                               (final_cwgc$DoD_1 >= "1939-08-30" & final_cwgc$DoD_1 <= "1945-09-30"), ]

    # Summarise by month
    cwgc_month <- setDT(cwgc_month)[, .(Num = .N), by = .(War, YearMon)]

    # Legend
    l <- list(bordercolor = "#D3D3D3", borderwidth = 2, xanchor = "center",
              orientation = "h", y = 0.6, x = 0.7)

    # Bar
    p1 <- plot_ly(cwgc_month[cwgc_month$War == "WW1", ], type = "bar",
                  y = ~YearMon, orientation = "h", x = ~Num, name = ~War) %>%
      layout(yaxis = list(title = ""), xaxis = list(title = "")) %>%
      config(displayModeBar = FALSE)

    p2 <- plot_ly(cwgc_month[cwgc_month$War == "WW2", ], type = "bar",
                  y = ~YearMon, orientation = "h", x = ~Num, name = ~War) %>%
      layout(yaxis = list(title = ""), xaxis = list(title = "")) %>%
      config(displayModeBar = FALSE)

    # Stack the two plots vertically on a shared x-axis
    p <- subplot(p1, p2, shareX = TRUE, nrows = 2)
    p
Figure 6: Commonwealth Deaths by Year & Month for WW1 and WW2
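The date filter behind this plot mixes `|` and `&` in a single condition. In R, `&` is evaluated before `|`, so the condition reads as "up to the end of November 1918, or between the end of August 1939 and the end of September 1945". The sketch below checks this on a few made-up dates; it assumes the dates are stored as `Date` objects, which may differ from how `DoD_1` is stored in the real data.

```r
# Hypothetical dates of death, stored as Date objects
dod <- as.Date(c("1916-07-01", "1919-02-10", "1944-06-06", "1946-01-15"))

ww1_end   <- as.Date("1918-11-30")
ww2_start <- as.Date("1939-08-30")
ww2_end   <- as.Date("1945-09-30")

# Without parentheses, & is still applied before |
implicit <- dod <= ww1_end | dod >= ww2_start & dod <= ww2_end

# The same condition with the grouping written out
explicit <- (dod <= ww1_end) | (dod >= ww2_start & dod <= ww2_end)

identical(implicit, explicit)
dod[explicit]
```

Only the 1916 and 1944 dates pass the filter; the two post-war dates are dropped.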
Considering both wars, only one month in WW2 - June 1944, which marked the invasion of France - makes it into the worst 20 months overall.
    # Order by number of dead
    cwgc_month <- cwgc_month[order(-Num), ]

    # Rename columns
    colnames(cwgc_month) <- c("War", "Month", "Deaths")

    # View output
    knitr::kable(head(cwgc_month, 20), align = "c") %>% kable_styling()
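| War | Month   | Deaths |
|-----|---------|--------|
| WW1 | 1916-07 | 60806  |
| WW1 | 1917-04 | 45611  |
| WW1 | 1918-10 | 44437  |
| WW1 | 1917-10 | 39594  |
| WW1 | 1918-03 | 39532  |
| WW1 | 1918-04 | 39017  |
| WW1 | 1916-09 | 38545  |
| WW1 | 1918-09 | 37306  |
| WW1 | 1918-08 | 31502  |
| WW1 | 1917-08 | 28017  |
| WW1 | 1917-05 | 27906  |
| WW1 | 1915-05 | 27612  |
| WW1 | 1917-11 | 26297  |
| WW1 | 1918-11 | 25699  |
| WW1 | 1916-10 | 24551  |
| WW1 | 1917-09 | 24369  |
| WW1 | 1916-08 | 23727  |
| WW1 | 1917-07 | 23592  |
| WW1 | 1915-09 | 22212  |
| WW2 | 1944-06 | 22134  |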
Table 2: Twenty Worst Months For Commonwealth Deaths in WW1 and WW2
A box plot is a handy way to compare the distribution of the monthly deaths in each war. The vertical line inside the box represents the median value. Only a handful of months in WW2 had more deaths than the median value for WW1. The data point far out to the right in WW1 is the first month of the Battle of the Somme, July 1916.
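For readers who want to reproduce this kind of comparison, here is a minimal sketch using base R graphics rather than plotly, on made-up monthly totals (the data frame `toy_month` is hypothetical and much smaller than the real monthly summary). It computes the median per war, counts the WW2 months above the WW1 median, and draws a horizontal box plot.

```r
# Hypothetical monthly totals (illustration only, not CWGC figures)
toy_month <- data.frame(
  War    = rep(c("WW1", "WW2"), each = 6),
  Deaths = c(9000, 15000, 21000, 24000, 27000, 60000,
             3000, 5000, 7000, 8000, 9000, 23000)
)

# Median monthly deaths per war
med <- tapply(toy_month$Deaths, toy_month$War, median)
med

# Number of WW2 months above the WW1 median
n_above <- sum(toy_month$Deaths[toy_month$War == "WW2"] > med[["WW1"]])
n_above

# Horizontal box plot, one box per war
boxplot(Deaths ~ War, data = toy_month, horizontal = TRUE,
        xlab = "Deaths per month", ylab = "")
```

With `horizontal = TRUE` the median appears as a vertical line inside each box, and an extreme month shows up as a point far to the right.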
This post has made the first steps in exploring the CWGC data by plotting the deaths by year, month and service branch. The next post will drill down to further explore the Commonwealth deaths over time.