“The Commonwealth War Graves Commission (CWGC) honours the 1.7 million men and women of the Commonwealth forces who died in the First and Second World Wars, and ensures they will never be forgotten. Our work commemorates the war dead, from building and maintaining our cemeteries and memorials at 23,000 locations in more than 150 countries and territories to preservation of our extensive records and archives. Our values and aims, laid out in 1917, are as relevant now as they were 100 years ago.” — https://www.cwgc.org
The website hosts a comprehensive, publicly available, searchable database containing records of all those who died in both wars, along with details of the memorials and cemeteries. For people who are starting their journey to trace a lost relative, the CWGC website has fundamentally improved the process. Using the database, it is a simple matter to trace the location of a grave or memorial, and in some cases additional documents covering exhumation and grave registration are available. These supplementary documents are not available for those who died in WWII, but I imagine they could be requested by a direct descendant.
I have a personal interest in the CWGC that pre-dates their website. One of the 1.7 million graves recorded and maintained by the CWGC belongs to my uncle, whom I was named after. Before the Internet and cheap international travel, the only link I had to my uncle’s grave was a book sent to my grandmother by the authorities in 1961, which detailed the cemeteries and grave locations in the country where he is buried. A tin box containing some medals and the brief listing of a distant cemetery plot haunted me for many years.
Target Audience & Tools
I am interested in datasets with a historical or military theme, and I was surprised to find very little analysis using this data. This project is a fusion of history and data science, and will hopefully be of historical interest or technical assistance to others. The posts are long-form, so if you are looking for a quick read you might be disappointed. I wanted to explore and discuss my solution as it unfolded, not just produce chunks of working code. If you are interested solely in the historical aspects, you might want to skip to the later posts or come back when the project is completed. The technical content is pitched at an intermediate level. If you want to follow along and execute the code yourself, you will need a recent version of the R language, RStudio and Google Chrome, all of which are freely available online. Ideally, you should be familiar with R and have a basic understanding of HTML and CSS.
Legal Stuff
According to the CWGC website:
“Copyright and database rights in all material on this site are the property of the Commonwealth War Graves Commission unless otherwise stated. This material may be reproduced free of charge in any format or medium for personal use or for internal circulation at an educational establishment, provided it is not altered or used in a misleading context and the Commonwealth War Graves Commission is acknowledged as the source of the material.”
Any data obtained from the CWGC website remains the property of the CWGC, and I hereby acknowledge the CWGC as the source of the material used in this work. Whilst collecting the data, I took particular care not to perform any actions that would impact the performance of the CWGC website, or that might impede the ability of other users to access their service.
Other Reading
Here are a few online resources I found useful:
- R Markdown: Good reference for generating content and publishing using the rmarkdown package.
- Scraping in R with rvest: Introduction to web scraping and using the rvest package.
- Historical Study: Great project that uses CWGC data to analyse WWI deaths from Co. Limerick in Ireland.
- Web Scraping Ethics: Some pointers on the ethics of web scraping.