We talk with Alex Rutherford, freelance data scientist and researcher, about the work he developed at UNICEF Innovation. Alex works at the intersection of research, data science, development and data privacy. He was a physicist in the past, learnt Arabic and worked in the Middle East. He also spent several years researching how information spreads on social networks and worked on data science in the UN Secretary General’s executive office. As he describes himself: “I’m not exactly sure how to describe what I do beyond saying it’s strongly inter-disciplinary. What I do know is that I’m super lucky to be able to make a living by applying all of this knowledge to help vulnerable populations through the use of novel sources of data and technical methodologies”.
Alex always enjoyed technical work involving computer programming, statistics and AI, but he often felt that a lot of the world’s smartest people ended up working on things that had little immediate impact. “I wanted to see as much of the world as possible and hopefully leave it in a better state than I found it. The path to do that was not always clear to me, so I am very happy that I can work on using data science for development”.
1 Can you tell us how UNICEF’s Innovation Unit was born and what the idea is behind it?
The Innovation Unit exists to mainstream new disruptive technologies into UNICEF. This requires a lot of specific skillsets that aren’t always found within the UN system. In addition, the skillset that is required is constantly changing, so it’s a very agile institution. The Innovation Unit was founded back in 2007 and I joined in 2015.
2 How did you help identify, prototype, and scale technologies and practices that strengthen UNICEF’s work for children? Can you tell us any success stories?
My work as a research scientist involves using novel streams of data, invariably from people’s use of digital services such as social media or cell phones. This makes use of cutting-edge techniques from network science, artificial intelligence and data science that are well established in academia and used by businesses in the Global North, but have great potential to be repurposed to help UN agencies pursue their mandates more efficiently and in a more timely fashion.
A good example of this is the use of travel patterns captured by the travel intelligence company Amadeus. Air travel is the main way that diseases spread and lead to pandemics, as with the Zika virus. By combining data on previous confirmed cases and mosquito prevalence with these travel patterns, we can predict high-risk places through epidemiological models. Amadeus approached the Innovation Unit regarding a partnership.
While these partnerships between big organisations and UN agencies are very welcome, they can take a long time to set up and it can be hard to get ahead of a pandemic. In addition, donations of data require lots of physical, human and analytical infrastructure in the form of cloud servers to store data, technical personnel who can clean and analyse that data and find insights that make sense to program officers within affected countries.
More broadly, these kinds of partnerships raise a lot of questions about data privacy, the value of data and how to balance these through aggregation and access control. In order to make use of all of this data from private companies and scale data science within UNICEF beyond these one-off efforts, we began to develop a platform with the codename Magic Box. This is a system for storing, analysing, visualising and combining various streams of data in a privacy-preserving way. That effort is growing as more data and partners are added to the initiative and we find new areas of application.
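To make the idea of combining travel patterns, confirmed cases and mosquito prevalence more concrete, here is a minimal Python sketch. It is not the model used by UNICEF or Amadeus: the function name `importation_risk`, the scoring formula, the place names and all the numbers are assumptions made up purely for illustration.

```python
def importation_risk(cases, passengers, suitability):
    """Score each destination by expected infected arrivals, weighted by
    how suitable the destination is for the mosquito vector.

    cases:       {origin: (confirmed_cases, population)}
    passengers:  {(origin, destination): travellers per month}
    suitability: {destination: value between 0 and 1}
    All three inputs are hypothetical structures, not a real data format.
    """
    risk = {}
    for (origin, dest), travellers in passengers.items():
        confirmed, population = cases.get(origin, (0, 1))
        prevalence = confirmed / population  # share of people infected at origin
        contribution = prevalence * travellers * suitability.get(dest, 0.0)
        risk[dest] = risk.get(dest, 0.0) + contribution
    return risk


# Made-up toy numbers, for illustration only
cases = {"origin_a": (500, 1_000_000), "origin_b": (0, 2_000_000)}
passengers = {
    ("origin_a", "city_x"): 10_000,
    ("origin_a", "city_y"): 2_000,
    ("origin_b", "city_x"): 50_000,
}
suitability = {"city_x": 0.2, "city_y": 0.9}

scores = importation_risk(cases, passengers, suitability)
ranked = sorted(scores, key=scores.get, reverse=True)  # highest risk first
```

A real epidemiological model would be considerably richer (seasonality, incubation periods, under-reporting), but the sketch shows why a busy route from an unaffected origin adds nothing to the score, while a smaller route from an affected origin into a mosquito-friendly city does.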
3 You are a research scientist. How did you join the UNICEF team?
I was previously working for Global Pulse, an initiative in the Executive Office of the Secretary General (at the time the position was held by Ban Ki-moon, and presently by António Guterres). This was a wonderful experience that gave me great high-level, top-down exposure to the whole UN system, from the World Health Organisation to the Universal Postal Union: the UN agency for postal mail! However, I was attracted by the quality of the data science work set up by my colleague Manuel Garcia-Herranz over at UNICEF and by the opportunity to work for an organisation with such a strong reputation and mandate.
4 In your opinion, how can the scientific use of open and free technology (Data and Source) contribute to the protection of vulnerable populations?
The remarkable thing to me is the disparity between the sophistication of the tools and data available if you want to solve a problem like getting people to spend more time on your iPhone app, compared to what is available to a programme officer in the Ministry of Education in a low-income country looking to build a set of new schools in the best locations. Public sector entities see ‘data’ exclusively as manually collected survey data that takes a long time to compile and is expensive to collect.
It is vital that these two visions of data move closer to each other. To do that, we need to surface everything that we already know so others can build upon it to tackle the problems that matter for vulnerable populations. That means, for example, making the most up to date survey data available in a format (please no more data in PDFs!) that can be easily used to test remote approximation of census and poverty through satellite imagery.
5 You work with data for development. Which specific projects is the UNICEF team developing to address the Sustainable Development Goals?
One of my main directions has been looking at the data that UNICEF does have access to: documents! Lots of documents in the form of speeches, reports, memoranda and so on. These documents are written so that the UN can be transparent about how public money is spent, as it rightly should. However few people read more than one or two of them, let alone all of them to get a bird’s eye view of what is happening, even though they contain a lot of useful information.
This is where data science can really add value. Computational methods make it easy to get a computer to ‘read’ hundreds of documents very quickly and extract patterns that you wouldn’t necessarily be able to see by reading them yourself.
To demonstrate this idea, we took the constitutions of every country in the world and fed them into a computer to analyse how they are written, how they change and, most importantly, how laws affecting vulnerable children are incorporated. What we found was that social rights for marginalised groups are not adopted straight away; there is a logical and incremental ordering. For example, once adults are protected through trade unions, then child labour protections follow.
This has a lot of implications for how we think about advocacy of young countries and aligns with SDGs #4 and #12 (Quality education and Responsible consumption and production). Rather than giving young countries a full set of laws that should be incorporated from day one, we should recognise that rights tend to be adopted more naturally in a sequential order. We are also looking at how we can extend this idea to how companies encourage each other to adopt responsible policies towards employing children as the next level of analysis down from nations.
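One simple way to look for this kind of ordering, once adoption dates have been extracted from the texts, is to count how often one right is adopted before another across countries. The Python sketch below is only an illustration of that counting step, not the method used in the study; the data layout, the function name `precedence_counts` and the years shown are all hypothetical.

```python
from itertools import permutations


def precedence_counts(adoption_years):
    """For each ordered pair (a, b), count the countries in which
    right a was adopted strictly before right b.

    adoption_years: {country: {right: year_adopted}} (hypothetical layout)
    """
    counts = {}
    for rights in adoption_years.values():
        for a, b in permutations(rights, 2):
            if rights[a] < rights[b]:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


# Made-up toy data, for illustration only
adoption_years = {
    "country_1": {"trade_unions": 1950, "child_labour_protection": 1970},
    "country_2": {"trade_unions": 1962, "child_labour_protection": 1985},
    "country_3": {"trade_unions": 1990},  # second right not yet adopted
}

counts = precedence_counts(adoption_years)
```

If one direction of a pair is common and the reverse is rare or absent, that suggests a typical sequence in which the two rights are adopted.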
6 How can Big Data help identify innovative solutions that benefit disadvantaged child populations?
The main catch with ‘Big Data’ is the problem of representativity: if we measure the proportion of people talking about vaccine efficacy on social media, is that just the view of rich people? I think of Big Data sources on a spectrum of representativity. On one extreme there are social media platforms that are English-only and only work on smartphones; these only have penetration in rich populations in major cities.
Somewhere in the middle you have anonymised cell phone data that covers an ever increasing part of the population as prices for plans and handsets go down; now only parts of the country that lack coverage are excluded. Then somewhere at the other extreme you have satellite imagery and other remote sensing tools that really don’t discriminate based on wealth.
The point I am trying to make here is that we should approach these kinds of problems with humility and not impose technologies or solutions that made sense in another context. A big part of our philosophy, enshrined in the principles of innovation, is to design with the user and Big Data is no different.
7 Can you share with us some positive results your research has achieved so far?
When I was working at Global Pulse, part of my job was to take stock of the data available in the UN system. Not many people have heard of it but the Universal Postal Union is a UN agency dedicated to national postal systems; and it’s over 100 years old! As a result, UPU has a very rich dataset on postal flows between countries. We viewed this as a network of flows between countries and using concepts from network science, we were able to proxy socio-economic indicators like the Human Development Index.
This holds a huge amount of promise when we consider the huge challenge of measurement that the SDGs represent: 17 goals, 169 targets and 304 indicators to be measured per country. In particular, we need innovative ways of measuring important indicators, especially in data-poor countries, as I mentioned above.
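As an illustration of what viewing postal flows as a network can mean in practice, the Python sketch below computes two basic network measures per country: the number of distinct partners and the total volume exchanged. This is not the analysis carried out at Global Pulse; the record layout, the function name `network_features` and the figures are hypothetical, and the country labels are placeholders.

```python
from collections import defaultdict


def network_features(flows):
    """Compute simple per-country network measures from flow records.

    flows: iterable of (origin, destination, items_sent) tuples
           (a hypothetical layout, not the UPU data format)
    Returns {country: {"degree": distinct partners, "strength": total items}}.
    """
    partners = defaultdict(set)
    volume = defaultdict(float)
    for origin, dest, items in flows:
        partners[origin].add(dest)
        partners[dest].add(origin)
        volume[origin] += items
        volume[dest] += items
    return {
        country: {"degree": len(partners[country]), "strength": volume[country]}
        for country in partners
    }


# Made-up toy flows between placeholder countries A to D
flows = [("A", "B", 120.0), ("A", "C", 30.0), ("B", "C", 5.0), ("D", "A", 10.0)]
features = network_features(flows)
```

Measures like these could then be compared against a known indicator for countries where it is available, to check whether they carry enough signal to act as a proxy elsewhere.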
8 As an expert in Public Policy Research: Do you think Governments are ready for radical innovation?
If they aren’t ready now, they soon will be. One of the challenging aspects, as a scientist, of applying science to policy problems is that policy is changeable and sometimes unpredictable. A good idea could succeed or fail depending on electoral cycles, organisational timeframes, funding cycles or other factors, which is anathema to the idea of science being pure and objective; a good idea should always be a good idea! We have to accept that innovation means doing something new and that can be uncomfortable.
Moreover, data science requires us to adjust how we think about personal data, which can also challenge our norms. That said, governments increasingly recognise that they cannot bury their heads in the sand and that the benefits clearly outweigh the risks.
9 Invite citizens and governments to understand how data science can generate public good.
This is truly a wonderful time for data science; many of the barriers to learning have disappeared. Huge amounts of open data are freely available, online courses allow you to learn statistics or programming, and servers can be spun up for a few dollars. The main challenge is knowing where to start! My message to governments is to consider hiring more technical staff.
Policy makers have to make judgements on extremely complex issues, and it is time to bring that expertise in-house in the form of professional training in data science, Chief Data Officers and teams of data scientists. Data describes the world we live in, and having data literacy encourages us to be more inquisitive and responsible.
Notes and references
UNICEF magic box promo: https://www.youtube.com/watch?v=TF-1dP0IW6o