GeoSpock: making extreme data easy

Steve Marsh, founder and CTO of GeoSpock, talks to Sam Fenwick about how extreme-scale data platforms can be used to allow humans and AI to make sense of smart city sensors and improve our society

February’s big interview briefly discussed the lessons we can learn from biology and apply to smart city projects. Here we explore another link between the two as Steve Marsh, founder and CTO of GeoSpock, began his career in academia where he worked to develop a custom supercomputer to carry out real-time simulation of brain-scale neural networks. As he and his team were up against heavyweights like IBM, whose resources permitted a brute force approach, Marsh chose a different strategy, seeing the challenge as a communication problem. Eventually they were able to simulate one second of neural activity in one second of computer time, compared with two weeks for the competition.

Marsh then tweaked the algorithms to allow the simulation to run on commodity hardware, but hit a brick wall. Enamoured of the rise of smartphones and their GPS functionality, Marsh expected that the world would make a “monumental shift” to machine-generated extreme-scale data, the majority of which would come from physical sensors, and that if all this data could be understood, we might use it to change the physical world for the better.

He and his colleagues then spent six years “building a bleeding-edge spatial big data platform and, coupled with [our] knowledge of how [to] get supercomputer performance out of commodity hardware, it was quite a unique advantage in the market”.

Marsh is a science fiction fan, and when it came to naming his company, he turned to Star Trek for inspiration. “It’s entirely logical, of course. Spock was a scientist, but he was also an explorer. He’d beam down to an alien world, pull out his mobile device [tricorder] and, using a few sensors, would take in all of the contextual data from that environment and would instantly be able to understand what was happening and be able to make the smartest decision he could. That seemed like a logical thing that we should all aspire to have.”

Marsh adds: “The core of GeoSpock’s innovation is the way we approach, index, store and query spatial big data. Regardless of the size of the dataset (we’re already operating at the petabyte scale, with the aim to [soon] operate at exabyte) we always get sub-second insights of it.” As it was difficult for potential investors and customers to get excited about “indexing strategies in next-generation spatial big data platforms”, GeoSpock hired a team of former computer game developers to give its platform the ability to visualise the data it curates and manages, allowing humans to validate it, derive insights and easily spot anomalies. The underlying engine runs on commodity hardware on Amazon Web Services. The system’s main appeal stems from “programmatic access” and how this is “going to be used by potentially thousands of applications across IoT, smart cities, maritime, logistics, [etc]”.

Smarts, not brawn
Marsh explains that current AI and data processing models are very naïve in that they look at all the data they are supplied with and brute force their way through it. However, this approach “doesn’t scale, because as the data size increases, the compute cost increases along with it”. GeoSpock has found a way around this by allowing AI developers to narrow down the data that their models will be trained on. “If you’re training an AI model, you may have the entire world and its history of data, but that AI may only really need to understand what’s going on at a certain place at a certain time. For example, if we are helping an autonomous vehicle company train an AI model, they may wish to just pull out the data of a certain crossroads in rain, hail and snow at dusk, in which case the input subset of data is significantly smaller than the total dataset.”
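The idea of narrowing the training input by place, time and conditions can be illustrated with a minimal Python sketch. It is not GeoSpock's implementation: the record fields, the bounding box and the function name are all hypothetical, and the plain list scan below only shows what the filter selects, whereas an indexed platform would aim to avoid touching the rest of the data at all.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One hypothetical sensor record; the field names are illustrative."""
    lat: float
    lon: float
    time: datetime
    weather: str


def select_subset(observations, south, west, north, east, hours, weather):
    """Keep records inside a bounding box, an hour-of-day range and a set of
    weather conditions. This is a linear scan used only to show the filter."""
    first_hour, last_hour = hours
    return [
        o for o in observations
        if south <= o.lat <= north
        and west <= o.lon <= east
        and first_hour <= o.time.hour < last_hour
        and o.weather in weather
    ]


# Made-up sample data: three records at one crossroads, one far away.
observations = [
    Observation(52.2051, 0.1192, datetime(2019, 1, 10, 16, 30), "snow"),
    Observation(52.2052, 0.1190, datetime(2019, 1, 10, 12, 0), "snow"),
    Observation(52.2051, 0.1192, datetime(2019, 1, 11, 16, 45), "clear"),
    Observation(51.5000, -0.1200, datetime(2019, 1, 10, 16, 30), "rain"),
    Observation(52.2050, 0.1191, datetime(2019, 3, 2, 17, 10), "rain"),
]

# "A certain crossroads in rain, hail and snow at dusk" (dusk assumed 16:00-18:00).
subset = select_subset(
    observations,
    south=52.2045, west=0.1185, north=52.2055, east=0.1195,
    hours=(16, 18),
    weather={"rain", "hail", "snow"},
)
print(len(subset), "of", len(observations), "records selected")
```

Only the records that match the place, the time of day and the weather reach the model, so the training input shrinks with the question being asked rather than growing with the total dataset.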

Marsh highlights the fact that GeoSpock has recently raised £10m in additional funding and adds: “We’re in the process of scaling up the team, so we doubled the size of the company over the past 12 months, we’re up to about 56 people now. We’re working [on] potentially some [of the most] interesting use-cases I’ve ever come across. We’re kind of the first platform that has a spatial big data platform with the ability to process petabyte and upwards-scale datasets, and the number of those companies that are ready to harness that power is quite small at the moment…[so] there’s [lots] of opportunities to be had. [GeoSpock doesn’t] generate data. We don’t even like housing data, we do that on behalf of some customers, but the data we house is generally encrypted, so even we can’t look at it – we’ve done some things around New York taxi analysis and open source datasets in the UK to demonstrate the capability.”

Marsh says GeoSpock is helping the city of Cambridge de-silo the data it is collecting from around 85 different data sources, including smart traffic lights and streetlights, Bluetooth flow analysis for traffic, along with microclimate and pollution sensors, plus crime and high-level public health data.

“The amount of insights you can get out of [any one of them] is fundamentally limited. [We] allow them to de-silo that data, we ingest it all and index it; it doesn’t have to be in our system, it could be in their own virtual private cloud. We [can then] start correlating across those datasets and [as] none of the data has to be re-indexed, you can keep growing each dataset independently and you can keep adding new layers and new sources of data. It’s only [when the system is queried that it goes] off and does the correlations, so we’re able to do quite complex root-cause analysis.”
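One way to picture correlation that happens only at query time is to give every dataset its own index over a shared space-and-time key, then join the indexes when a question is asked. The Python sketch below assumes a simple grid-cell-plus-hour key and made-up rain and traffic figures; the cell size, function names and data are all hypothetical, and nothing here describes how GeoSpock actually indexes data.

```python
import math
from collections import defaultdict
from statistics import mean

CELL_DEG = 0.01  # hypothetical grid cell size in degrees


def cell_key(lat, lon, hour):
    """Map a position and an hour to a shared (row, column, hour) key."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG), hour)


def build_index(records):
    """Index one dataset on its own; other datasets are never consulted."""
    index = defaultdict(list)
    for lat, lon, hour, value in records:
        index[cell_key(lat, lon, hour)].append(value)
    return index


def correlate(index_a, index_b):
    """Join two independently built indexes at query time on shared keys."""
    shared = index_a.keys() & index_b.keys()
    return {k: (mean(index_a[k]), mean(index_b[k])) for k in shared}


# Made-up records: (lat, lon, hour, value).
rain_mm = [(52.2053, 0.1218, 8, 4.0), (52.2553, 0.1218, 8, 0.0)]
vehicles = [
    (52.2053, 0.1218, 8, 40.0),
    (52.2057, 0.1213, 8, 60.0),
    (52.3053, 0.1218, 8, 10.0),
]

rain_index = build_index(rain_mm)
traffic_index = build_index(vehicles)

# The correlation work happens only here, when the question is asked.
joined = correlate(rain_index, traffic_index)
print(joined)
```

Because each index is built separately, a new source such as pollution readings could be added with another call to `build_index` without rebuilding the existing ones, which is the property the quote describes.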

Marsh gives an example of the kind of insight that the system can generate. “Cambridge is a cycling city, so we can say ‘OK, when it rains in Cambridge, traffic will go up 300 per cent because people in these particular regions switch mode of transport from bikes to cars’.” This increases congestion and pollution, and GeoSpock has found that extensive lines of trees by the roadside “exacerbate that situation because they can lock in pollution and people who live in those slow-moving traffic streets with a tree line are more likely to suffer respiratory illness. Each one of those datasets is completely uncorrelated, but we can bring them all together and do that root-cause analysis.”

He adds that it is possible to go beyond this kind of root-cause analysis to use such a system to optimise a city’s traffic flows, for example by making adjustments to mitigate rush hour congestion and in the future incentivise people to use shared autonomous vehicles. “Rather than trying to make the vehicle intelligent, we’re trying to send intelligence to the vehicle – [the historical and wider context]. We see ourselves as more of the hive mind, the queen bee [that makes] sure that all the drones are operating for the good of the whole colony rather than each one being selfish.”

More light!
In Cambridge, Marsh says GeoSpock has overlaid the city’s open source national crime data with the locations of its street lights and correlated the two – “You can see where a streetlight is present and on, which is quite important, especially at night; the rates of certain types of crime drop drastically – so a way to make areas safer, one really simple solution would be to put more lights up.”

Turning to the maritime sector, Marsh says GeoSpock can generate insights around things such as delays and piracy. “If you notice that [an oil tanker] is going to be delayed, you can [do] forward analysis [and work out the knock-on effects to things] like the manufacturing of certain plastics – perhaps that means the amount of shampoo bottles that get made, [and that] might affect P&G’s quarterly profit in six months’ time.”

Marsh says GeoSpock is working with location-based mobile advertising companies and that together they can determine the demographics that visit certain locations, and “start bringing the type of insights that companies like Amazon are generating online to bricks-and-mortar stores [and] allow them to cater to their audience more effectively”. He adds that it is also possible to use this approach to detect when places become “cool and hip” and to track changes in customer footfall across different zones of a city.

On the cellular network side, Marsh adds that GeoSpock’s platform could be used in combination with smart city data to help mobile network operators better optimise their use of spectrum and beamforming.

Here at Land Mobile, we often joke about solutions looking for problems. It’s refreshing then to hear about a solution to a problem we didn’t know existed. It will be interesting to see how our cities change in the years to come thanks to the insights gleaned from AI and the big data platforms they are trained on.

CV – Steve Marsh
A technology entrepreneur, Marsh graduated from The University of Cambridge with a PhD in Computer Science in 2013. Marsh’s PhD research led him to build custom supercomputer architectures for the real-time simulation of human brain function. It was Marsh’s extensive PhD research that inspired GeoSpock’s technology.

An Information Age UK Data Entrepreneur of the Year 2017 winner, Marsh is a member of Forbes’ 30 under 30 2016, and an alumnus of the Techstars Winter 2014 cohort in London.

While reading for his PhD at Cambridge University, Marsh founded Collide, a location-aware mobile application (iOS and Android), winning both the ‘Silicon Valley Comes to the UK’ Cambridge Appathon, and Cambridge University Entrepreneur of the Year 2012. Marsh also holds an MEng in Computer Science from The University of Manchester.