Analysing the CWGC Data - Part 1

Introduction

This is the first post in a series that explores and visualizes the Commonwealth War Graves Commission (CWGC) dataset. In previous posts, I spent some time discussing how I collected the data from the CWGC website and now I finally get to explore the data.

Deaths By Year

Perhaps the most obvious trend to explore in this data is the number of deaths over time. First I extracted the year from the date of death and then calculated the number of deaths per year for each war.

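# Load the packages used throughout this post
library(plyr)        # revalue()
library(data.table)  # setDT()
library(tidyr)       # pivot_wider()
library(plotly)      # plot_ly(), subplot()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# Load data file
load("All_cwgc_graves_with_served_and_branch.rda")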

# Add in year of DoD
final_cwgc$Year <- format(as.Date(final_cwgc$DoD_1, format = "%Y-%m-%d"), '%Y')

# Rename factors
final_cwgc$War <- revalue(final_cwgc$War, c("1" = "WW1",
                                            "2" = "WW2"))

# Number of deaths by year and by war ordered asc.
final_cwgc_year <- setDT(final_cwgc)[, .(Deaths = .N), 
                                     by = .(Year, War)][order(Year, War)]

# Remove NA
final_cwgc_year <- final_cwgc_year[complete.cases(final_cwgc_year$Year),]

# View output
knitr::kable(final_cwgc_year, align = 'c') %>% 
       kable_styling()
Year War Deaths
1914 WW1 42100
1915 WW1 151600
1916 WW1 237100
1917 WW1 295125
1918 WW1 286765
1919 WW1 38097
1920 WW1 14479
1921 WW1 5879
1939 WW2 6595
1940 WW2 69215
1941 WW2 85466
1942 WW2 111949
1943 WW2 113945
1944 WW2 164984
1945 WW2 76548
1946 WW2 18703
1947 WW2 12240
Table 1: Number of CWGC Dead by Year

Strictly speaking, the CWGC data also commemorates a number of individuals who were not from Commonwealth countries (i.e. not from South Africa, Australia, Canada, India, New Zealand or the United Kingdom), so these numbers are a little inflated. What is immediately apparent is that the data include deaths after the cessation of hostilities: in the years after WW1 (1919-1921) and WW2 (1946-1947) there were 90,473 deaths, many of which presumably resulted from wounds or illness suffered during the war.

The Times printed its daily Roll of Honour until well into 1919, as men continued to succumb to their wounds. In almost every street there were blind men, turning their sightless faces to the light, the maimed and disabled, standing with the arm of a jacket or a trouser leg flapping empty or hobbling on crutches down the street, and scarred or disfigured ex-servicemen – the French called them ‘men with broken faces’ and sculptors made metal masks to cover their ravaged features.

Hanson, N. (2019), Unknown Soldiers: The Story of the Missing of the Great War, Lume Books.

# War years
war_years <- c('1914', '1915', '1916', '1917', '1918', 
              '1939', '1940', '1941', '1942', '1943', '1944', '1945')

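# Deaths recorded outside the war years
# Note: rows with a missing Year also pass this filter, because NA %in% war_years is FALSE
post_war_deaths <- final_cwgc[!(final_cwgc$Year %in% war_years),]
nrow(post_war_deaths)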
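
As a rough cross-check, the post-war rows of Table 1 can be added up directly. The sketch below hard-codes those five yearly totals and gets 89,398, a little below the 90,473 quoted above. One possible reason for the gap (an assumption, not something verified against the full dataset) is that records with a missing year also pass a negated `%in%` filter, because `NA %in% x` evaluates to FALSE.

```r
# Yearly totals copied from Table 1 for the post-war years
post_war <- c("1919" = 38097, "1920" = 14479, "1921" = 5879,
              "1946" = 18703, "1947" = 12240)
post_war_total <- sum(post_war)
post_war_total  # 89398

# A missing year is never matched by %in%, so negating the test keeps it
war_years <- c("1914", "1915", "1916", "1917", "1918",
               "1939", "1940", "1941", "1942", "1943", "1944", "1945")
na_is_kept <- !(NA_character_ %in% war_years)
na_is_kept  # TRUE
```

If records without a date should not count as post-war deaths, adding `!is.na(Year)` to the filter would exclude them.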

If the data were available, it would be interesting to analyse the ‘excess’ deaths in the years after both wars. There must have been countless lives cut short that are not recorded in the CWGC data, caused by ill health, substance abuse and suicide due to the trauma of the war. Plotting the CWGC war dead for each year is a simple enough matter using the plotly package.

# Legend
l <- list(bordercolor = "#D3D3D3",
      borderwidth = 2,
      orientation = 'h',
      xanchor = 'center',
      y = 0.8,
      x = 0.7) 

# Deaths by year in WWI
p1 <- plot_ly(final_cwgc_year[final_cwgc_year$War == "WW1",],
        x = ~Year,
        y = ~Deaths,
        color = ~War,
        type = "bar",
        name = ~War) %>%
layout(yaxis = list(title = "CWGC Deaths"),
       xaxis = list(title = ""),
       legend = l) %>%
config(displayModeBar = F)

# Deaths by year in WWII
p2 <- plot_ly(final_cwgc_year[final_cwgc_year$War == "WW2",],
        x = ~Year,
        y = ~Deaths,
        color = ~War,
        type = "bar",
        name = ~War) %>%
layout(yaxis = list(title = "CWGC Deaths"),
       xaxis = list(title = ""),
       legend = l) %>%
config(displayModeBar = F)

p <- subplot(p1, p2, shareY = TRUE)
p
Figure 1: Commonwealth Deaths by Year for WW1 and WW2

WWI had a far greater number of losses over a shorter time period than WWII. The year of the war with the fewest deaths was 1914, which saw only five months of action, mostly involving the small British Expeditionary Force, later known as the ‘Old Contemptibles’. The worst year in WWI was 1917, which included losses from the Battle of Arras and Vimy Ridge, the Third Battle of Ypres (better known as Passchendaele), the Battle of Messines Ridge and the Battle of Cambrai.

There were relatively few CWGC WWII deaths in 1939, which makes sense considering this was the period of the so-called ‘Phoney War’, known to the Germans as the Sitzkrieg. Following the German invasion of France in 1940, the losses increased each year and peaked in 1944 with the Allied invasion of Europe and the subsequent push into Germany. At this level of granularity, where the data is summarised by year, it is difficult to comment much further.
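
The comparisons above can be checked against Table 1 with a few lines of base R. This is only a sketch that re-types the published yearly totals into a hypothetical `table1` data frame rather than reading the full dataset.

```r
# Yearly totals copied from Table 1
table1 <- data.frame(
  Year   = c(1914:1921, 1939:1947),
  War    = rep(c("WW1", "WW2"), times = c(8, 9)),
  Deaths = c(42100, 151600, 237100, 295125, 286765, 38097, 14479, 5879,
             6595, 69215, 85466, 111949, 113945, 164984, 76548, 18703, 12240)
)

# Total deaths recorded for each war
totals <- tapply(table1$Deaths, table1$War, sum)
totals  # WW1: 1071145, WW2: 659645

# Year with the most deaths in each war
peak <- sapply(split(table1, table1$War),
               function(d) d$Year[which.max(d$Deaths)])
peak    # WW1: 1917, WW2: 1944
```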

Deaths By Year and Branch

Taking the previous plot and drilling down by branch of service will hopefully give a clearer picture. I removed the ‘Miscellaneous’ category since it comprises such a small fraction of the overall total. I also split out the plots for each war.

# Remove 'miscellaneous' category
final_cwgc <- final_cwgc[final_cwgc$Branch != "Miscellaneous",]

# Number of deaths by year, war and branch (groups appear in first-seen order)
final_cwgc_branch <- setDT(final_cwgc)[, .(Deaths = .N), 
                                       by = .(Year, War, Branch)]

# Re-order factor levels
final_cwgc_branch$Branch <- factor(final_cwgc_branch$Branch,
                               levels = c("Army",
                                          "Air+force",
                                          "Civilian+War+Dead+1939",
                                          "Navy",
                                          "Merchant+navy"))

# Rename factor levels
levels(final_cwgc_branch$Branch) <- c("Army",
                                      "Air Force",
                                      "Civilian",
                                      "Navy",
                                      "Merchant Navy")
# Legend
l <- list(bordercolor = "#D3D3D3",
      borderwidth = 2,
      xanchor = 'center',
      y = 1,
      x = 0.8) 

# Grouped bar by Branch for WWI
plot_ly(final_cwgc_branch[final_cwgc_branch$War == 'WW1',],
        x = ~Year,
        y = ~Deaths,
        type = "bar",
        color = ~Branch) %>%
layout(yaxis = list(title = "Deaths (Log Scale)",
                    type = "log"),
       xaxis = list(title = ""),
       legend = l) %>%
config(displayModeBar = F)
Figure 2: Commonwealth Deaths in WW1 by Service Branch

The losses for the Army dwarf those of the other service branches, making the smaller branches difficult to distinguish from one another, so the solution is to use a log scale.
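
To see why a log scale helps, here is a minimal sketch using made-up counts (the numbers are hypothetical and chosen only to mimic one dominant category). It compares each bar's height relative to the tallest bar on a linear axis and on a log10 axis, assuming the log axis starts at 1.

```r
# Hypothetical yearly counts, chosen only to illustrate the scale problem
deaths <- c(Army = 250000, Navy = 5000, AirForce = 500)

# Bar height as a fraction of the tallest bar on a linear axis
linear_height <- deaths / max(deaths)

# On a log10 axis starting at 1, height is proportional to log10(count)
log_height <- log10(deaths) / log10(max(deaths))

round(linear_height, 3)  # Army 1.000, Navy 0.020, AirForce 0.002
round(log_height, 3)     # Army 1.000, Navy 0.685, AirForce 0.500
```

On the linear axis the smallest bar is 0.2% of the tallest and effectively invisible, while on the log axis it is half the height. The trade-off is that bar heights no longer compare proportionally.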

# Legend
l <- list(bordercolor = "#D3D3D3",
      borderwidth = 2,
      xanchor = 'center',
      y = 0.8,
      x = 0.8) 

# Grouped bar by Branch for WWII
plot_ly(final_cwgc_branch[final_cwgc_branch$War == 'WW2',],
        x = ~Year,
        y = ~Deaths,
        type = "bar",
        color = ~Branch) %>%
layout(yaxis = list(title = "Deaths (Log Scale)",
                    type = "log"),
       xaxis = list(title = ""),
       legend = l) %>%
config(displayModeBar = F)
Figure 3: Commonwealth Deaths in WW2 by Service Branch

The grouped bar chart for WWII is a little clearer but still requires a log scale as before. The majority of the losses were sustained by the Army, but the chart also shows the increasing losses of the Air Force as the strategic bombing campaigns progressed.

An alternative to the bar chart is the filled area chart. It takes a little more effort to code and is helpful when comparing the relative percentages for each of the service branches, but does not show the absolute totals. The filled area chart has the same problem as the grouped bar chart when one of the categories is very dominant (e.g. Army in WWI). I made the two plots share the same legend colors for comparison purposes. The plot for WWII shows the greater contribution of the Air Force and Navy (relative to WW1) and has an additional category for Civilian deaths.
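
The percentage view comes from plotly's `groupnorm = 'percent'` setting, which rescales each stack so that it sums to 100. The sketch below reproduces that arithmetic in base R on made-up counts (the `counts` matrix is hypothetical), which also shows why the absolute totals are lost.

```r
# Hypothetical counts; rows are years, columns are service branches
counts <- matrix(c(900,  60,  40,
                   700, 100, 200),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Year A", "Year B"),
                                 c("Army", "Navy", "AirForce")))

# Share of each branch within its year, in percent
shares <- 100 * prop.table(counts, margin = 1)
shares           # Year A: 90, 6, 4; Year B: 70, 10, 20
rowSums(shares)  # every year sums to 100
```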

# Filled area
final_cwgc_branch <- pivot_wider(final_cwgc_branch, 
                                 names_from = Branch,
                                 values_from = Deaths)

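# Rename variables by name rather than position, since the column
# order produced by pivot_wider depends on the order of the rows
colnames(final_cwgc_branch)[colnames(final_cwgc_branch) == 'Air Force'] <- 'AirForce'
colnames(final_cwgc_branch)[colnames(final_cwgc_branch) == 'Merchant Navy'] <- 'MerchantNavy'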

# Legend
l <- list(bordercolor = "#D3D3D3",
      borderwidth = 2,
      xanchor = 'center',
      orientation = 'h',
      y = 1.1,
      x = 0.5) 

# WWI
plot_ly(final_cwgc_branch[final_cwgc_branch$War == 'WW1',],
        x = ~Year,
        y = ~Navy,
        name = 'Navy',
        type = 'scatter',
        mode = 'none',
        stackgroup = 'one',
        fillcolor = '#2ca02c',
        groupnorm = 'percent') %>%
  
add_trace(y = ~AirForce, name = 'Air Force', fillcolor = '#ff7f0e') %>%
add_trace(y = ~MerchantNavy, name = 'Merchant Navy', fillcolor = '#d62728') %>%
add_trace(y = ~Army, name = 'Army', fillcolor = '#9467bd') %>% 
layout(yaxis = list(title = ""),
       xaxis = list(title = ""),
       legend = l) %>%
config(displayModeBar = F)
Figure 4: CWGC WWI Percentage Deaths by Service Branch

plot_ly(final_cwgc_branch[final_cwgc_branch$War == 'WW2',],
        x = ~Year,
        y = ~Navy,
        name = 'Navy',
        type = 'scatter',
        mode = 'none',
        stackgroup = 'one',
        fillcolor = '#2ca02c',
        groupnorm = 'percent')  %>%

add_trace(y = ~Civilian, name = 'Civilian', fillcolor = '#1f77b4') %>%
add_trace(y = ~AirForce, name = 'Air Force', fillcolor = '#ff7f0e') %>%
add_trace(y = ~MerchantNavy, name = 'Merchant Navy', fillcolor = '#d62728') %>%
add_trace(y = ~Army, name = 'Army', fillcolor = '#9467bd') %>% 
layout(yaxis = list(title = ""),
       xaxis = list(title = ""),
       legend = l) %>%
config(displayModeBar = F)
Figure 5: CWGC WWII Percentage Deaths by Service Branch

Deaths By Month

Plotting the deaths by month gives a little more detail, albeit at the risk of looking a bit cluttered.
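
As a small illustration of the year-month key used for this grouping, the sketch below formats a few made-up dates (hypothetical values in the same `%Y-%m-%d` layout as `DoD_1`) and counts them per month in base R.

```r
# Hypothetical dates of death, including one missing value
dod <- c("1916-07-01", "1916-07-14", "1917-04-09", NA)

# Reduce each date to a "YYYY-MM" key; missing dates stay missing
year_mon <- format(as.Date(dod, format = "%Y-%m-%d"), "%Y-%m")
year_mon

# Count records per month; table() drops missing values by default
monthly <- table(year_mon)
monthly
```

Because the key is a zero-padded string, sorting it alphabetically also sorts it chronologically.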

# Casualties by month
# Add in year-month of DoD
final_cwgc$YearMon <- format(as.Date(final_cwgc$DoD_1, format = "%Y-%m-%d"),
                             '%Y-%m')

# Remove 'miscellaneous' category
final_cwgc <- final_cwgc[final_cwgc$Branch != "Miscellaneous",]

# Keep only deaths within the wartime window of each war
cwgc_month <- final_cwgc[(final_cwgc$DoD_1 <= "1918-11-30") |
                           (final_cwgc$DoD_1 >= "1939-08-30" &
                            final_cwgc$DoD_1 <= "1945-09-30"),]

# Summarise by month
cwgc_month <- setDT(cwgc_month)[, .(Num = .N), by = .(War, YearMon)]

# Legend
l <- list(bordercolor = "#D3D3D3",
      borderwidth = 2,
      xanchor = 'center',
      orientation = 'h',
      y = 0.6,
      x = 0.7) 

# Bar
p1 <- plot_ly(cwgc_month[cwgc_month$War == "WW1",],
        type = "bar",
        y = ~YearMon,
        orientation = 'h',
        x = ~Num,
        name = ~War) %>%
layout(yaxis = list(title = ""),
       xaxis = list(title = "")) %>%
config(displayModeBar = F)

p2 <- plot_ly(cwgc_month[cwgc_month$War == "WW2",],
        type = "bar",
        y = ~YearMon,
        orientation = 'h',
        x = ~Num,
        name = ~War) %>%
layout(yaxis = list(title = ""),
       xaxis = list(title = "")) %>%
config(displayModeBar = F)

p <- subplot(p1, p2, shareX = TRUE, nrows = 2)
p
Figure 6: Commonwealth Deaths by Year & Month for WW1 and WW2

Considering both wars, only one month in WW2 - June 1944, which marked the invasion of France - makes it into the worst 20 months overall.

# Order by number of dead
cwgc_month <- cwgc_month[order(-Num),]

# Rename columns
colnames(cwgc_month) <- c("War", "Month", "Deaths")

# View output
knitr::kable(head(cwgc_month, 20), align = 'c') %>% 
       kable_styling()
War Month Deaths
WW1 1916-07 60806
WW1 1917-04 45611
WW1 1918-10 44437
WW1 1917-10 39594
WW1 1918-03 39532
WW1 1918-04 39017
WW1 1916-09 38545
WW1 1918-09 37306
WW1 1918-08 31502
WW1 1917-08 28017
WW1 1917-05 27906
WW1 1915-05 27612
WW1 1917-11 26297
WW1 1918-11 25699
WW1 1916-10 24551
WW1 1917-09 24369
WW1 1916-08 23727
WW1 1917-07 23592
WW1 1915-09 22212
WW2 1944-06 22134
Table 2: Twenty Worst Months For Commonwealth Deaths in WW1 and WW2
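
Table 2 is small enough to re-type and summarise directly. The sketch below (with a hypothetical `top20` data frame holding the published values) confirms where the single WW2 month ranks and shows how the twenty months are spread across calendar years.

```r
# Values copied from Table 2, already sorted by deaths in descending order
top20 <- data.frame(
  War    = c(rep("WW1", 19), "WW2"),
  Month  = c("1916-07", "1917-04", "1918-10", "1917-10", "1918-03",
             "1918-04", "1916-09", "1918-09", "1918-08", "1917-08",
             "1917-05", "1915-05", "1917-11", "1918-11", "1916-10",
             "1917-09", "1916-08", "1917-07", "1915-09", "1944-06"),
  Deaths = c(60806, 45611, 44437, 39594, 39532, 39017, 38545, 37306,
             31502, 28017, 27906, 27612, 26297, 25699, 24551, 24369,
             23727, 23592, 22212, 22134)
)

# Position of the only WW2 month in the ranking
ww2_rank <- which(top20$War == "WW2")
ww2_rank  # 20

# Number of top-twenty months falling in each calendar year
by_year <- table(substr(top20$Month, 1, 4))
by_year   # 1915: 2, 1916: 4, 1917: 7, 1918: 6, 1944: 1
```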

A box plot is a handy way to compare the distribution of the monthly deaths in each war. The vertical line inside the box represents the median value. Only a handful of months in WW2 had more deaths than the median value for WW1. The data point far out to the right in WW1 is the Battle of the Somme in July 1916.

# Box plot
plot_ly(cwgc_month,
        type = "box",
        boxpoints = "all",
        x = ~Deaths,
        color = ~War,
        hoverinfo = 'text',
        text = ~paste('</br> Month: ', Month,
                      '</br> Deaths: ', Deaths)) %>%
layout(yaxis = list(title = ""),
       xaxis = list(title = ""),
       showlegend = FALSE) %>%
config(displayModeBar = F)
Figure 7: Comparison of Monthly Deaths
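
For readers less familiar with box plots, the numbers behind one can be computed in base R with `boxplot.stats()`. This is a sketch on made-up monthly totals (hypothetical values, in thousands) with a single extreme month. Note that plotly may compute its quartiles slightly differently from base R.

```r
# Hypothetical monthly totals (in thousands) with one extreme month
monthly <- c(5, 7, 8, 9, 10, 12, 60)

stats <- boxplot.stats(monthly)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats  # 5.0 7.5 9.0 11.0 12.0

# Points lying more than 1.5 box lengths beyond the hinges
stats$out    # 60
```

The median is unaffected by the extreme month, which is why it is a more robust basis for comparing the two wars than the mean.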

Conclusion

This post has taken the first steps in exploring the CWGC data by plotting the deaths by year, month and service branch. The next post will drill down to further explore the Commonwealth deaths over time.

Licensed under CC BY-NC-SA 4.0
Last updated on Mar 28, 2022 00:00 UTC