Recently I captured 180,000 photos in 13 hours, of Dundee’s historical burial grounds. This was achieved with 8 continuously capturing Raspberry Pi cameras that were attached to a home made camera rig. These photos were then put through a 3D scanning software, 3DF Zephyr, to generate a virtual 3D model of the area. This model will be used in cartography, digital heritage, and virtual reality.
In this blog post I discuss:
- why I performed this experiment and what I hoped to learn;
- the equipment used;
- how the capture method evolved during the experiment and how it could be improved further;
- the concept of the 3D model and its relevance to subsequent Howff digitisation efforts.
Experiment history and aims
At the end of last year and the start of this year I spent some time in Dundee city centre using a handheld DSLR camera to capture 40,000 photos. These photos were put through a photogrammetry (3D-scanning) software, 3DF Zephyr, to generate a virtual model of the city centre, for example as shown in this flythrough of a point cloud made of 1 billion points:
Also generated were a series of orthographic projections, derived from solid textured meshes, for example this architectural elevation of 38 Seagate:
I am currently engaged in the architectural and geometrical analysis of the buildings in this area, working towards the reproduction of the city centre area as a full scale, high quality CAD (computer assisted design) model – the type of model that could be used in real time applications such as virtual reality (VR). The type of image shown directly above serves as a scale reference in authoring the building models that will populate that final VR environment.
The photo capture for the 40,000 DSLR photos was around 1,000 per hour, the total capture taking around 40 hours. So part of my motivation in performing the Howff Pi capture has been investigating how to increase capture rate and decrease capture duration (i.e. how to take more photos in less time). Straight off, there are two obvious ways to increase capture rate, being (1) use more cameras simultaneously, and (2) increase the capture rate per camera.
The very first day I tried photographing the Howff with a single handheld DSLR (26th July 2019) I captured 11,500 photos in 3 hours. These photos turned out to be non-viable due to poor choice of capture location (too great a distance between respective camera positions), so that was a 3 hour learning experience. But it showed that it was physically possible to sustain a DSLR capture rate of at least 3,500/hour (which would maybe realistically come down to 2,500/hour when accounting for time taken to change camera position).
Additionally, I was interested to see how viable Raspberry Pi camera photos would be for use in photogrammetry. It was clear that DSLR photos worked fine, and I had previously produced a model of Edinburgh’s New Town area derived from 15,000 photographs captured with a Samsung S7 smartphone (see https://sketchfab.com/DanielMuirhead/collections/edinburgh-city-centre-2018-q4):
I am very keen on using equipment that is financially accessible to as many citizens as possible, and a unit comprised of Raspberry Pi and camera is about as low cost as they come. The other comparable low budget option would be a cheap action camcorder, recording a 1080p video (that could be decomposed into a series of stills to serve as ‘photo’ inputs). A further consideration that guided me towards Raspberry Pi is that I am a STEM Ambassador. So anything involving Raspberry Pi hardware and coding that culminates in an experiment that may be replicated by a community or school group, gets the thumbs up. Also, prior to this experiment I had zero experience or training in either Raspberry Pi or coding, so this was a great learning opportunity!
The last issue I was seeking to explore through this experiment, and one that I am still actively researching and developing, is how to quantify an optimal capture trajectory for a given camera rig when encountering a given subject geometry. By way of explanation: photographing a spherical subject (for photogrammetry) will involve a different capture approach than photographing something shaped like a plane (e.g. a long and tall brick wall), and in turn, capturing a subject with multifaceted and complex geometry will demand a more complex camera trajectory.
If capturing photos for photogrammetry is a game, and we consider each photograph captured as a move in that game (whose capture involves expenditure of time and labour), and the winner is the player who makes the least moves, then how do we get better at playing that game (how do we spend as least time and energy in capturing all the required photos)?
I have been using Blender to run simulations of multi-camera rigs encountering subjects with simple geometry, tweaking the rig trajectory for each simulation, then quantifying the fidelity of the reconstruction as a function of that trajectory. Running this Raspberry Pi multi-cam capture experiment marks my transition from running a simulation to field-testing in the real world. This single experiment is not going to solve that problem of how to most efficiently perform photo capture for photogrammetry, but it is a step in that direction. Here is a screenshot of one of my simulations set up in Blender v2.79:
So to summarise, through this experiment I aimed to:
- explore the use of a multi-camera rig for faster photo acquisition (compared to single camera);
- gauge the viability of Raspberry Pi cameras for use in photogrammetry;
- start real world field-tests of optimal capture trajectory.
Here is a photo of the camera rig in its most recent iteration, taken after the last capture session:
- 8x Raspberry Pi Zero W
- 4x Raspberry Pi Camera Module V2 (max. resolution 3280*2464px/8MP, later reduced to 2592*1944px/5MP to match Wide Angle cameras)
- 4x Raspberry Pi Zero Camera Module, Wide Angle (by Zerocam) (resolution 2592*1944px/5MP)
- 8x 32GB MicroSD cards (later upgraded to 8x 64GB, prep. of larger sized microSD informed by this webpage)
- 3x portable USB battery charger packs (i.e. the type used to charge a mobile phone when out and about), each with 3 USB outputs
- 8x 1 metre USB to micro-USB cables (to connect Pi to battery) (later upgraded to 2 metre cables).
Each Pi was programmed with a timelapse photo capture that activated as soon as the Pi was powered on (i.e. when out in the field, as soon as it was plugged into the battery pack). My programming of the cameras was only made possible thanks to the vast amount of tutorials and guides other people have helpfully made available for free online. Here is a screenshot of my internal documentation regarding programming the Pi cameras:
After each capture session, the contents of each Pi card were transferred across to a Windows PC via a microSD card reader through the free software ‘ext2explore’ (which took approx. 40 mins per 32gb card, 120 mins per 64gb card). After that data had been copied over, and prior to the next capture session, each card was booted into a camera-less Pi (i.e. so the automatic timelapse capture could not run) for those image files to be manually deleted in order to free up storage space (which took approx. 5 mins per card). (I have no doubt that there will be faster and more efficient ways of achieving data transfer and wipe, e.g. via SSH, but it was beyond my skill and knowledge at the time.)
I needed a non-destructive method for attaching each Raspberry Pi to the rig, so I mounted each Pi onto a plastic board (a.k.a. a chopping board cut into smaller pieces). Then I stuck some sticky-backed hook+loop (i.e. velcro) onto the back of that plastic board, for attaching to the corresponding location on the rig head … however, the sticky-backed element kept on unsticking from the plastic backing, so I attached some card loops to the plastic backing, then stuck the hook+loop to that card. The whole thing is held together by friction, both in the sense of the hook+loop and the card loops – but after that initial revision it thankfully has maintained cohesion!
The standard v2 cameras were secured to the Pi Zero boards via zip ties (which is undoubtedly a terrible idea – please do not replicate this step as it will probably damage the Pi or camera parts!). The smaller headed wide-angle cameras were each sheathed in a piece of card that was folded over and stapled, again that card intervention was to facilitate the sticky-backed hook+loop attachment.
- polystyrene ball (the ‘head’), 15cm diameter, purchased from a craft store
- thin lengths of wood (the ‘handle’), purchased from a hardware store
- a knitting needle (courtesy of my Mum), skewering a plastic food lid, then skewering down through the centre of the ball and protruding below, for attaching the ‘head’ to the ‘handle’
- a selection of binds, including:
- zip ties, for holding the 2-piece handle together and attaching the knitting needle to the handle;
- rubber bands, for routing the power cables down the handle;
- double sided hook+loop (i.e. hooks on front, loops on back of same strip) for cable routing and for securing a cowl (a.k.a. a plastic bag) over the rig head while in transit
- lastly, a dynamic enclosure for the power units (a.k.a. a toiletries bag in which the batteries were stuffed, suspended from around my neck by a length of string).
The choice of a sphere for the head was intentional, so that it would be possible in theory to position a camera at any desired orientation. The flat-backed camera (a straight line) would intersect with the sphere (a circle) at its tangent, and the sphere is a circle on every axis so that intersection could in principle occur at any location on its surface. With the hook+loop attachment, this offers a (low-tech) ‘infinite variety’ of camera orientations.
The Raspberry Pi camera unit comprises a camera, attached via cable to a computer, attached via cable to a power source. I originally investigated the use of camera extension cables, namely 2 metre cables between each camera and its corresponding computer. All this so it would be possible to have a multi-cam head with small dimensions (e.g. 8 sensors on a 5cm sphere, preparing the ground for 24+ sensors attached to a 30cm sphere, mountable atop an autonomous terrestrial vehicle and/or underside an autonomous aerial vehicle to facilitate rapid surveying). This was fine for one camera, unfortunately for cable routing purposes when 8 of those 2m cables were clumped together this resulted in significant signal interference. It would be possible to counter that interference through superior cable management (i.e. not just bundling them all together and running them down the handle in a single clump). Due to time constraints I opted to directly attach the computer+cameras to the rig head, and instead employ lengthy power cables so at least the weighty power units could be decentralised from the rig head+handle.
For this experiment I was using a premium commercial software called 3DF Zephyr. This is the same software that was used to produce the 1 billion point cloud model referenced above. 3Dflow are the Italian company that produce this software, and every year they run a photogrammetry competition, called the 3Dflow World Cup. Last year (2019) I placed joint second in that competition, for which 3Dflow gifted me 1 year access to the premium version of their software.
The competition has come around again! And again I have been fortunate to progress to the Final Round. Some of my submissions to the previous round included using the Free version of their software (limited to 50 inputs), 3DF Zephyr Free, to generate high quality photogrammetry models from a low amount of photos, like the model shown below:
For my main entry to the Final Round I have gone in the other direction (from minimum input count to maximum) and am submitting a photogrammetry model generated in 3DF Zephyr from tens of thousands of photos:
Regardless of how I fare in the final results, it has been good fun to participate, so many thanks to 3Dflow for running the competition!
- power on all cameras (thereby starting continuous capture)
- keep cameras covered by cowl until arrive at first capture position, then remove cowl
- stand in first position, rig raised above head, internally count to 10 seconds, then advance to next capture position, count to 10 seconds, move to next position, etc
- advance across the area in a trajectory of rows+columns:
- if position01 is 0x0y, position02 will be 1x0y, position03 will be 2x0y etc.,
- then (eventually) take a sidestep and start moving backwards, so position11 will be 10x1y, position12 will be 9x1y, position13 will be 8x1y.
Attempt 2 (same as attempt 1 except):
- instead of counting to 10 internally, listen to looping audio track that emits a beep every 6 seconds (1 second to change position, 5 seconds to stand still)
- for every change in position, rotate the camera rig by +45d on the vertical axis, then by -45d for the subsequent position, alternating the rotation back and forth every second position (so that position01 has regular cameras oriented to N/E/S/W and wide angle cameras oriented to NE/SE/SW/NW, then position02 has regulars oriented to NE/SE/SW/NW and wides oriented to N/E/S/W, and so on).
The audio track mentioned above (comprised of a double-beep followed by 5 seconds of silence) was authored in Audacity (Generate>Tone, Sine/440/0.8/0.4s, then manually copy-pasted to form the double-beep), then exported to my Android smartphone as an MP3, where I used Vanilla Music Player to listen to that ‘6 second’ track in a continuous loop for the duration of each subsequent capture (so I continuously listened to that beep track for a cumulative ~10 hours). Here is a screenshot of the audio track being authored in Audacity:
Later attempts used pretty much the same method with a few amendments:
- supplanted 1m power cables with 2m, to give greater range to the camera rig and therefore greater flexibility in its articulation (otherwise the rig movement was constrained by being tethered, via the 1m cables, to the heavy power packs which I was carrying on my person)
- supplanted the 32GB microSD cards with 64GB, thereby making it possible to stay in the field for up to 7 hours, in the process capturing over 100,000 photos, up from the 3.5 hours/50,000 5MP photo capacity of the 32GB cards.
The Howff can be intuitively divided into 10 ‘blocks’ with reference to its network of paths (at least, that is what I have been doing!). Photo capture proceeded with reference to those blocks, as in, for a given capture session I tried to capture one or two discrete blocks. Efforts were made during each capture session to try to divide up the data into manageable chunks, specifically to divide up each block into a set of sub-blocks.
The problem is that my desktop computer (with spec. equivalent to a mid-range gaming PC but with added RAM) can manage maybe 10,000 photos per 3DF file. (This constraint is determined by the system resources, specifically the RAM, used during the SfM stage, since both the Dense and Mesh stage in 3DF as of v5.000 are performed out of core). Even that is pushing it because it is more efficient to process those 10,000 as a set of smaller files, for example 5x 2,000 … at which point the problem becomes how to efficiently merge those 5*2,000s back into a unitary 1*10,000 (which, to be clear, is entirely possible, just more manual-labour intensive compared to the 1x 10,000 option). The way I was surveying the Howff meant that I was generating more than 15,000 photos per block.
With the benefit of hindsight, what I should have done in advance was develop an efficient method for splitting up the data during capture, for example by dividing up the area into a series of overlapping 3,000 photo patches. But I never managed to efficiently achieve that, and ended up resorting to capturing each block as a >15,000 photo set, which I then let 3DF Zephyr chunter through. To be clear, 3DF Zephyr is very capable of working with huge data sets (e.g. tens of thousands of photos), provided the spec. of the processing computer is up to the task. However I was working to a deadline, and due to a combination of the SfM processing duration and my sub-optimal data collection/organisation, I ran out of time for processing all the >150,000 photos that were captured over the course of the week.
Improvements (if I was doing it all again, here are some obvious opportunities for improvement):
- photo capture activated by button press, not timelapse script;
- camera rig attached to gimbal;
- tight adherence to a rigorous fore-planned capture trajectory;
- integration and synchronisation with a Raspberry Pi GNSS unit (e.g. Aldebaran by Dr Franz Fasching) or integration of GPS at least;
- produce a solid/static camera rig (e.g. via CAD design in Blender then 3D printing), to explore the feasibility of exploiting 3DF Zephyr’s ‘Fixed Rig’ function for facilitating the SfM stage;
- there are no doubt many other opportunities for improvement!
The 3D model
I have two goals for the final 3D model:
- high resolution 2D ground plan, to serve as a base layer in a 2D visualisation of the location of the ~1,500 funerary monuments that are present in the Howff;
- medium resolution 3D model, to serve as a base layer (skeleton, canvas) for attaching individual high resolution DSLR+ scans of each monument.
The Howff is an old burial ground in Dundee city centre that was sepulchrally active from medieval times until fairly recently, and is populated by around 1,500 separate gravestones and tombs. A graveyard is a microcosm of the built human environment, incorporating a mixture of organic and human-made structures and materials. As a technical challenge, the prospect of the digitisation of the Howff is a tantalising opportunity to experiment, research and develop a set of production methods that could be scaled up and applied to other human-nature composites, for example whole towns or cities.
By ‘digitisation’ is meant the reproduction of the physical world as a 3D model, but also – more significantly – the analysis and indexing of all the data embedded into or associated with that physical artefact. For example: any given gravestone is built to a specific design, incorporating named architectural features; the gravestones at the Howff often incorporate symbols that signify which of the ‘nine trades’ the grave occupant was associated with; the gravestone is inscribed with biographical information; and so on. In this sense, the 3D scanning of a real world subject such as Dundee Howff is only one part of a larger complex process, but hopefully a useful part that can facilitate the wider effort!