Big Data Economics asks questions that usually go unasked: who owns the data, who holds the rights to it, and how are these rights enforced or enforceable? What price can reasonably be attached to the use of data rights? How could platforms recognise and remunerate data owners?
When a decision is needed for a data-driven project to go ahead, a business plan is drawn up with:
-expected revenues from all services using the data input at stake
-a complete playout of the anticipated business launch and uptake, with scenarios, which determine the maximum acceptable price one would pay for the data input, either as:
-the cost of collecting the target data set:
specific sensors and infrastructure needed
services consumed (network, installation, site visits)
-or, alternatively, the price paid to a third party collecting, or having collected, the target data set
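The scenario logic above can be sketched as a small computation. This is an illustrative sketch only; the scenario names, probabilities, and figures are invented, not taken from the book:

```python
# Hypothetical sketch: deriving the maximum acceptable price for a data input
# from business-plan scenarios. All figures below are invented for illustration.

def max_acceptable_data_price(scenarios, hurdle_margin=0.0):
    """Each scenario is (probability, expected_revenue, non_data_costs).
    The maximum acceptable data price is the probability-weighted net
    revenue minus the required margin, floored at zero."""
    expected_net = sum(p * (rev - costs) for p, rev, costs in scenarios)
    return max(0.0, expected_net - hurdle_margin)

scenarios = [
    (0.5, 1_000_000, 600_000),   # base-case launch and uptake
    (0.3, 1_500_000, 700_000),   # optimistic uptake
    (0.2,   400_000, 500_000),   # pessimistic: service barely takes off
]
print(max_acceptable_data_price(scenarios))  # expected net over the three scenarios
```

Any price quoted for the data input above this threshold would make the project unprofitable in expectation, under these invented scenarios.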
Act now or wait?
If you are in no hurry, Moore's Law, which drives the evolution of capacity per price in the digital domain, will over time significantly reduce the digital electronics/Information Technology/network costs of obtaining the input data. In the long term this cost decreases exponentially, whereas the other, non-digital, brick-and-mortar, real-world costs involved may not decrease in the same way.
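This "act now or wait?" trade-off can be sketched minimally, assuming a halving period for the digital component of the cost; the halving period and all figures are invented:

```python
# Illustrative sketch of the act-now-or-wait trade-off: the digital/IT share of
# the data-acquisition cost halves every `halving_period` years (a Moore's-Law-
# style assumption), while physical, brick-and-mortar costs stay flat.

def data_acquisition_cost(t_years, digital_cost0, physical_cost,
                          halving_period=2.0):
    return digital_cost0 * 0.5 ** (t_years / halving_period) + physical_cost

# Cost now versus after waiting four years (invented numbers):
print(data_acquisition_cost(0, 100_000, 50_000))  # 150000.0
print(data_acquisition_cost(4, 100_000, 50_000))  # 75000.0
```

Note how the flat physical cost puts a floor under the total: waiting longer brings diminishing savings.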
In the heat of competition, or under the pressure of needs, one may not have the time to wait for the cost of the data input to decrease as above. Furthermore, competitors may not want to share critical data that, in their view, yields competitive advantage. I witnessed first-hand the competing French television channels refusing to share image capture and satellite uplink during the Euro92 event in Norway, leading to duplicated infrastructure and services operated for them by the leading French network operator.
In some cases, sharing is not allowed: competition lawyers advise firms on US antitrust rules for any business touching the territory of the USA, on EU competition rules, or on any other regime depending on the region where the distributed business takes place.
If data can be shared, and organisations are willing to share the same input data sets, the economic agents in the data value chain participate in a data marketplace.
This can take two forms in particular:
-Open Data: one collector collects data sets on behalf of all and grants access for free, possibly however with contractual conditions on the allowed domain and scope of use.
-Commercial Data: a licence is granted to data users by the data collector, for a fee and under contractual conditions of use.
This book addresses, with some equations and the needed ecosystem/value-chain descriptions, the microeconomics of data in the context of big data. Roles and relations in the data ecosystem, as well as business models for data transactions, are presented and discussed.
The book has been kept short, to allow for easier reading, and so that everyone can form a sound opinion on how they want to proceed with the economic aspects of the data they own, use, or could sell.
It all starts with maps, and continues with timetables of transport services (multi-modal...).
In other terms, geo-localised data and time-stamped data can give rise to location- and context-dependent services, which are very valuable to users experiencing constraints:
-being a passenger, bound to a train carriage, bus, or aircraft
-having to go from A to B, and reaching B before time tB
-being constrained by a budget, unable to spend more than b
-or, well, being tired and hungry...
There is definitely a lot of potential in data for/with transport:
-data supporting a smoother transport experience
-data generated with/in transport.
This leads back to the economics of data, framed here in the particular context of transport, where value may become more easily salient to suppliers of transport services and to users of transport, who become consumers of services in transport.
The expected output data of a transport big data system may include:
-for the Traveller: Quality of Experience and Safety
-for the Transport Operator: volume, cost and efficiency targets, in particular MTBF (Mean Time Between Failures), and the avoidance of delays due to "signal failure", i.e. poor preventive maintenance.
The expected usable inputs of a big-data-for-transport system include, for better operation, supply-side data (including trade unions' intentions to strike).
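The MTBF target mentioned above can be estimated directly from a failure log; a minimal sketch with invented timestamps:

```python
# Minimal sketch: Mean Time Between Failures from a time-ordered log of
# failure timestamps (in hours). The log below is invented for illustration.

def mtbf_hours(failure_times):
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)

log = [0, 100, 300, 600]   # hours at which a signal failure occurred
print(mtbf_hours(log))     # average gap between successive failures
```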
MIT professor Erik Demaine shows, in a public-domain curriculum, how forward time-travel projections interact with advanced data structures, and how past variations impact the present; the same holds in the space domain. This is a crucial variational view on big data representation in space and time. http://courses.csail.mit.edu/6.851/spring12/
It is worthwhile analysing this Information Science modelling from a big data economics perspective: how would the price of data represented in the structure considered evolve? How can the data structure support one business model or another over time? Over space? Over statistical populations? Imagine you want to test the stability of a pricing model against changes. You can introduce variations in the past versions of data structures, and see how they propagate and impact the present data structures. This is already a degree of abstraction: a layer of business and pricing model (linked to, or on top of, the data structure) and how it may evolve, subject to changes happening in the underlying data.
Worthwhile analysing further...
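As a toy illustration of this stability test (not Demaine's actual data structures), one can replay a timeline of price updates, retroactively alter one update, and compare the resulting present prices; all multipliers below are invented:

```python
# Toy sketch of the retroactivity idea applied to pricing: a timeline of past
# price updates is replayed to obtain the present price; a retroactive change
# is then inserted and the present price recomputed.

def present_price(updates, base=100.0):
    """Apply a time-ordered list of (time, multiplier) updates to a base price."""
    price = base
    for _, factor in sorted(updates):
        price *= factor
    return price

timeline = [(1, 1.10), (2, 0.95), (3, 1.20)]
before = present_price(timeline)

# Retroactively alter the update at t=2 and see how it propagates to the present
retro = [(t, f) if t != 2 else (t, 1.05) for t, f in timeline]
after = present_price(retro)
print(round(before, 2), round(after, 2))
```

A pricing model could be called stable if small retroactive variations produce only small changes in the present price.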
The MIT course is highly commendable; deep notions are presented in a very attractive way.
6.851: Advanced Data Structures (Spring'12)
TIME TRAVEL: We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible. MEMORY...
Avoiding or reducing the impact of human-made or human-linked (epidemics) catastrophes, like the Great Fire of London of 1666 or the last cholera epidemics in the same city, and protecting populations against known and monitored risks, depend on two data streams:
-monitoring and detecting events in real time
-warning in real time.
The value attached to these is the protection of lives and property.
Insurance companies have their methodology to evaluate risks
and acceptable costs to prevent or reduce those risks. They sell insurance
products to individuals and organisations.
Governments have their own policies to manage, eliminate or mitigate risks. Politicians are judged on their ability to manage risk-avoidance schemes and, when a catastrophe happens, on how they manage the crisis. Tuning the detection scheme pessimistically leads to overprotection, at high cost for no added benefit. Tuning the detection scheme optimistically may overlook risky situations and under-dimension the response scheme. Instead of a single mathematical model with an assumed probability distribution, multiple scenarios have to be considered, and risk must be bounded with a lower and an upper bound, leading to mathematical inequalities and multiple stochastic models.
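The bounding idea can be sketched by evaluating the same risk under several candidate stochastic models and taking the extreme expected losses as bounds; the probabilities and loss figures below are invented:

```python
# Illustrative sketch: bounding expected loss with multiple stochastic models
# instead of a single assumed distribution. All numbers are invented.

def expected_loss(outcomes):
    """outcomes: list of (probability, loss) pairs for one candidate model."""
    return sum(p * loss for p, loss in outcomes)

# Three candidate models of the same risk:
models = [
    [(0.99, 0), (0.01, 1_000)],   # optimistic tuning
    [(0.95, 0), (0.05, 2_000)],   # central estimate
    [(0.90, 0), (0.10, 5_000)],   # pessimistic tuning
]
losses = [expected_loss(m) for m in models]
print(round(min(losses), 2), round(max(losses), 2))  # lower and upper bound
```

Decisions can then be checked against both bounds, rather than against a single point estimate.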
Railway companies use a standard model, jointly developed by them at ISO, where risk is categorised by its potential impact, the highest category being many lives at risk.
They can build on a long history, which has led to safe railway journeys, with now very few accidents.
Here is a spectacular one, from 1895, which claimed only one life.
The article below addresses rights as they are observed in movies and works of art, exploring how the underlying concepts may apply to big data economics.
The movie business model can be summarised as a succession of windows of exploitation; within each window, rights can be sold and bought with conditions of use attached (time interval, geography, potentially the number of users, the rendering platform). Typically a movie is released in one or a handful of premiere cinema theatres, then exclusively in a number of cinema theatres in a given geography, then in all cinemas, then on pay TV and packaged media like Blu-ray or online pay services, then on commercial or public free-to-air broadcast TV.
[Of course it is more than this]
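This succession of windows can be modelled as a rights table queried per platform, date, and geography; the window names, dates, and country codes below are invented:

```python
# Toy model of exploitation windows: each window grants rights for a platform
# within a time interval and a set of territories. All entries are invented.

from datetime import date

windows = [
    ("premiere",   date(2024, 1, 1),  date(2024, 1, 14), {"FR"}),
    ("exclusive",  date(2024, 1, 15), date(2024, 3, 31), {"FR", "BE"}),
    ("all_cinema", date(2024, 4, 1),  date(2024, 9, 30), {"FR", "BE", "CH"}),
    ("pay_tv",     date(2024, 10, 1), date(2025, 3, 31), {"FR"}),
]

def allowed(platform, on, country):
    """Is exploitation on `platform` permitted on date `on` in `country`?"""
    return any(name == platform and start <= on <= end and country in geo
               for name, start, end, geo in windows)

print(allowed("pay_tv", date(2024, 11, 1), "FR"))   # True
print(allowed("pay_tv", date(2024, 2, 1),  "FR"))   # False: still in cinemas
```

The same table structure could carry rights conditions for data sets instead of movies.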
Assume an individual grants access to part of his/her personal data, say biological and health parameters. The data set could be accessed under a contract granting specified rights: a scope and time window of use, with robust anonymisation requirements (say, the data has to be used as part of a set comprising at least xxx other subjects at each processing step).
Is this "rights" business model, and its underlying organisation of the marketplace, robust to:
-reselling the data set for later use, within the agreed scope?
-going back to the subjects for later negotiation of a changed scope (e.g. a food & beverage company having an interest in accessing a data set previously used for health analysis)?
-auditing proper use by the processing companies and their clients?
-inserting mechanisms for deleting data sets after their "do not use after" date?
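Such a contract could be represented as a small data structure; the field names and the anonymity parameter k below are hypothetical illustrations of the clauses above, not a real scheme:

```python
# Hypothetical sketch of a personal-data licence: a scope, a time window, and
# a minimum anonymity-set size (the "at least xxx other subjects" clause,
# parameterised here as k). All field names and values are invented.

from dataclasses import dataclass
from datetime import date

@dataclass
class DataLicence:
    scope: set            # permitted purposes, e.g. {"health_analysis"}
    valid_from: date
    valid_until: date     # acts as the "do not use after" date
    min_anonymity_k: int  # each processing step must pool at least k subjects

    def permits(self, purpose, on, cohort_size):
        return (purpose in self.scope
                and self.valid_from <= on <= self.valid_until
                and cohort_size >= self.min_anonymity_k)

lic = DataLicence({"health_analysis"}, date(2024, 1, 1), date(2026, 12, 31), 100)
print(lic.permits("health_analysis", date(2025, 6, 1), 500))  # True
print(lic.permits("marketing", date(2025, 6, 1), 500))        # False: out of scope
```

Auditing and enforced deletion, of course, require mechanisms beyond this purely declarative check.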
In the digital world, data can be reproduced at negligible cost; hence what matters is not the instantiation of a data parameter but the source "blueprint": the equivalent of a manuscript, not of the thousands of printed books derived from it.
WORK OF ART
This leads us to the model of a work of art, say a Van Gogh painting.
The asset can be made available to museums for exhibition (use limited in time and geography, associated with an audience or number of visitors, targeted or recorded). It can also be "transcoded" into different representations, such as authorised photographs, reproductions, etc.
A single work of art can over time be valued as "junk", at zero or little, and up to enormous values.
As a category, the French Impressionists were often not valued in France, but had some early customers in the USA. Many years later, works despised earlier reached huge values at auctions.
However, data seen as information may be more valuable at an early stage of its
life than at a later stage. A bottle of milk loses its whole value on the
"best before" date: it usually gets heavily discounted on the day,
and discarded at the end of that day.
News is normally expected to be fresh. For instance, the current temperature is useful to me now; the 3-day weather forecast is of interest to choose clothes for a trip. After the trip, this past information has lost its value. However, the long-tail business model for the exploitation of entertainment content like movies or music recordings may also apply: the time series of the values of source data may be of interest as history, and based on history some forecast estimates can be proposed (with associated uncertainty).
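These two value profiles, perishable versus long tail, can be sketched as simple value-over-time functions; the curves and all parameters are invented illustrations:

```python
# Illustrative sketch: two invented value-over-time profiles for a data set.

def perishable_value(age_days, initial=100.0, best_before=3):
    """News-like data: full value while fresh, worthless past 'best before'."""
    return initial if age_days <= best_before else 0.0

def long_tail_value(age_days, initial=100.0, floor=5.0, half_life=30.0):
    """Entertainment-like data: decays, but retains residual archival value."""
    return floor + (initial - floor) * 0.5 ** (age_days / half_life)

print(perishable_value(2), perishable_value(4))   # fresh vs expired
print(long_tail_value(30))                        # one half-life later
```

The floor term stands for the archival value that justifies keeping old data at all.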
An electrocardiogram database of people living in the 1950s may be interesting to revisit in the 2050s, hence it should not be discarded.
There is probably a distinction to be made between the value of some freshly
acquired data, stored in cache memory, and the value of an archive.
Keeping and maintaining an archive has a cost, for instance transcoding from
legacy formats and systems to current ones for new use.