Data INPUT: Collect or Buy ?



DATA INPUT: is the business worth it?

When the go-ahead decision for a data-driven project is needed, a business plan is made with:
-the expected revenues from all services using the data input at stake
-a complete play-out of the anticipated business launch and uptake, with scenarios, which together determine the maximum acceptable price one would pay for the data input, either as:

-The cost of collecting the target data set
  • Specific sensors and infrastructure needed
  • Services consumed (network, installation, site visits)
-Alternatively, the price paid to a third party collecting, or having already collected, the target data set

Act now or wait?

If you are in no hurry, Moore's Law, which drives the evolution of capacity per price in the digital domain, will significantly reduce over time the digital electronics/Information Technology/network costs of getting the input data. In the long term, this cost decreases exponentially, whereas the other, non-digital, brick-and-mortar, real-world costs involved may not decrease in the same way.
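A minimal sketch of the act-now-or-wait trade-off, assuming purely illustrative figures: the digital part of the acquisition cost halves every few years (a Moore's-Law-style decline), while the non-digital part stays flat.

```python
# Sketch with hypothetical figures: digital acquisition costs halve every
# `halving_years` years, while non-digital (real-world) costs stay flat.

def acquisition_cost(year, digital_0=100_000.0, non_digital=40_000.0,
                     halving_years=2.0):
    """Total cost of collecting the data set `year` years from now."""
    digital = digital_0 * 0.5 ** (year / halving_years)
    return digital + non_digital

# Compare acting now with waiting four years:
now = acquisition_cost(0)     # 140,000: the digital part dominates
later = acquisition_cost(4)   # 65,000: the flat non-digital part dominates
print(now, later)
```

The floor set by the non-digital costs is why waiting eventually stops paying off.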

Competing firms

In the heat of competition, or under the pressure of needs, one may not have the time to wait for the cost of data input to decrease as above. Furthermore, competitors may not want to share critical data which, in their view, leads to competitive advantage. I saw first-hand the competing French television programmes refusing to share image capture and satellite uplink during the Euro92 event in Norway, leading to duplicated infrastructure and services operated for them by the leading French network operator.
In some cases, sharing is not allowed: competition lawyers advise firms on US antitrust rules for business touching the territory of the USA in any way, on EU competition rules, or on any other regime, depending on the region where the distributed business takes place.

Sharing data

If data can be shared, and organisations are willing to use the same input data sets, the economic agents in the data value chain participate in a data marketplace.
This can take, in particular, two forms:
-Open Data: one collector gathers data sets on behalf of all and grants access for free, possibly however with some contractual conditions on the allowed domain and scope of use.
-Commercial Data: a licence is granted to data users by the data collector, against a fee, with contractual conditions of use.
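The two forms can be sketched with a minimal licence record (field names and example data sets are illustrative, not from any standard): they differ only in whether a fee is charged, while both may carry conditions on the scope of use.

```python
# Minimal sketch of the two data-sharing forms (all names hypothetical).

from dataclasses import dataclass, field

@dataclass
class DataLicence:
    dataset: str
    fee: float = 0.0                          # 0.0 => Open Data
    allowed_scopes: list = field(default_factory=list)

    @property
    def kind(self):
        return "Open Data" if self.fee == 0.0 else "Commercial Data"

    def permits(self, scope):
        # An empty scope list is read here as "no restriction".
        return not self.allowed_scopes or scope in self.allowed_scopes

open_licence = DataLicence("city-air-quality", allowed_scopes=["research"])
paid_licence = DataLicence("road-traffic-feed", fee=1200.0)
print(open_licence.kind, paid_licence.kind)  # Open Data Commercial Data
```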

 Further reading:


 Several detailed models, with role descriptions and equations, can be found here


Microeconomics for big data: THE BOOK!

I have just uploaded my latest book to the Publisher's platform.
It will soon be available from Amazon/Kindle Editions.

This book addresses the microeconomics of data in the context of big data, with some equations and the needed ecosystem/value-chain descriptions. Roles and relations in the data ecosystem, as well as business models for data transactions, are presented and discussed.
The book has been kept short, to allow for easier reading, and so that everyone can form a sound opinion on how they want to proceed with the economic aspects of the data they own, use, or could sell.


Big Data Economics for Transport

Data and Transport?

It all starts with maps, and continues with timetables of transport services (multi-modal...).
In other terms, geo-localised data and time-stamped data can give rise to location- and context-dependent services, which are very valuable to users experiencing constraints:

  • being a passenger, you are bound to a train carriage, bus, aircraft
  • having to go from A to B, and reach B before time tB
  • constrained by a budget, not being able to spend more than b
  • well, being tired and hungry...

 There is definitely a lot of potential in data for/with transport:
-data supporting a smoother transport experience
-data generated with/in transport.

This leads back to the economics of data, framed here in the particular context of transport, where value may become more easily salient to suppliers of services of/in transport, and to users of transport who become consumers of services in transport.

The expected output data of a transport big data system may include:
-for the Traveller: Quality of Experience and Safety
-for the Transport Operator:

  • Safety
  • Volume, cost and efficiency targets, in particular MTBF (Mean Time Between Failures), and avoidance of delays due to “signal failure”, i.e. poor preventive maintenance
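As a reminder of what the MTBF target measures, here is a minimal sketch with hypothetical figures: MTBF is simply total operating time divided by the number of failures observed.

```python
# Minimal sketch: MTBF from an operator's failure log (hypothetical data).
# MTBF = total operating time / number of failures.

def mtbf(operating_hours, failures):
    if failures == 0:
        raise ValueError("no failures recorded: MTBF is undefined")
    return operating_hours / failures

# A signalling subsystem observed for one year (8,760 h) with 4 failures:
print(mtbf(8760, 4))  # 2190.0 hours between failures, on average
```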

The expected usable inputs of a big data for transport system include, for better operation:
-Traveller data (demand side)
-Operator(s) data (supply side, including trade unions' intentions to strike)


The Economics of Data in IoT?

Starting with very different cases:

and developing the approach towards a Data Market Place

Big Data Economics: the book


Advanced Data Structures:

representations & methods,

and their contribution to big data economics.

MIT professor Erik Demaine shows, in a publicly available curriculum, how advanced data structures can efficiently remember past versions (persistence), and how changes made to the past propagate to the present (retroactivity). The same ideas carry over to the space domain. This is a crucial variational view on big data representation in space and time. http://courses.csail.mit.edu/6.851/spring12/

It is worthwhile analysing this information-science modelling from a big data economics perspective: how would the price of data represented in the structure considered evolve? How can the data structure support one or another business model over time? Over space? Over statistical populations? Imagine you want to test the stability of a pricing model against changes: you can introduce variations in past versions of the data structures and see how they propagate and impact the present data structures. This is already a degree of abstraction, a layer of business and pricing models (linked to, or on top of, the data structure), and of how it may evolve, subject to changes happening in the environment.
Worthwhile analysing further...
The MIT course is highly commendable, deep notions are presented in a very attractive way.
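A minimal sketch of that stability experiment, assuming a toy multiplicative pricing rule (all event times and multipliers are hypothetical): the present price is obtained by replaying a time-ordered log of valuation events, and a retroactive insertion in the past is replayed forward to show its impact on the present.

```python
# Sketch of the "retroactive variation" experiment (hypothetical data):
# a dataset's current price replays a time-ordered log of valuation events;
# inserting an event in the past and replaying shows its effect today.

import bisect

class RetroactivePrice:
    def __init__(self, base_price):
        self.base = base_price
        self.events = []                  # sorted list of (time, multiplier)

    def insert(self, time, multiplier):
        """Insert a valuation event, possibly in the past (retroactive)."""
        bisect.insort(self.events, (time, multiplier))

    def price_now(self):
        """Replay the full history to obtain the present price."""
        p = self.base
        for _, m in self.events:
            p *= m
        return p

p = RetroactivePrice(100.0)
p.insert(2015, 1.5)          # demand surge
p.insert(2020, 0.5)          # partial obsolescence
before = p.price_now()       # 75.0
p.insert(2017, 0.5)          # retroactive change: a past data-quality issue
after = p.price_now()        # 37.5: the past variation propagates to today
print(before, after)
```

This is only a caricature of Demaine's retroactive data structures, which avoid full replay, but it captures the pricing question: how sensitive is the present value to edits of the past?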



Avoiding or reducing the impact of human-made or human-linked (epidemics) catastrophes, like the Great Fire of London of 1666 or the last cholera epidemics in the same city, and protecting populations against known and monitored risks, depend on two data streams:
-monitoring and detecting events in real time
-warning in real time.

The value attached to these is the protection of lives and assets.
Insurance companies have their methodology to evaluate risks and acceptable costs to prevent or reduce those risks. They sell insurance products to individuals and organisations.

Governments have their own policies to manage, eliminate or mitigate risks. Politicians are judged on their ability to manage risk-avoidance schemes and, when a catastrophe happens, on how they manage the crisis. Tuning the detection scheme pessimistically leads to over-protection and a high cost for no added benefit. Tuning the detection scheme optimistically may overlook risky situations and under-dimension the response scheme. Instead of a single mathematical model with an assumed probability distribution, multiple scenarios have to be considered, and risk must be bounded with a lower and an upper bound, leading to mathematical inequalities and multiple stochastic models.
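The bounding approach can be sketched as follows, with entirely hypothetical event probabilities and loss figures: instead of one assumed distribution, several stochastic scenarios are evaluated, and the expected loss is bounded between the most optimistic and the most pessimistic result.

```python
# Sketch with hypothetical figures: bound the expected annual loss between
# an optimistic and a pessimistic scenario, each a simple stochastic model.

import random

def expected_loss(event_prob, mean_loss, trials=100_000, seed=0):
    """Monte Carlo estimate of expected loss under one scenario."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        if rng.random() < event_prob:
            total += rng.expovariate(1.0 / mean_loss)  # random loss severity
    return total / trials

# Optimistic, central and pessimistic tunings of the detection scheme:
scenarios = [(0.001, 5e6), (0.005, 8e6), (0.02, 1e7)]
losses = [expected_loss(p, m) for p, m in scenarios]
lower, upper = min(losses), max(losses)
print(f"expected annual loss bounded in [{lower:,.0f}, {upper:,.0f}]")
```

The interval, not a single point estimate, is what the mathematical inequalities above would bound.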

Railway companies use a standard model, jointly developed by them at ISO, where risk is categorised by its potential impact, the highest category being many lives at risk.
They can build on a long history, which has led to safe railway journeys, with now very few accidents.
Here is a spectacular one, from 1895, which claimed only one life:


Source data: trading rights?

The article below addresses rights as they are handled for movies and works of art, exploring how the underlying concepts may apply to big data economics.


The movie business model can be summarised as a succession of exploitation windows; within each window, rights can be sold and bought with conditions of use attached (time interval, geography, potentially number of users, rendering platform). Typically, a movie is released in one or a handful of premiere cinema theatres, then with exclusivity to a number of cinemas in a given geography, then to all cinemas, then to pay TV and packaged media like Blu-ray or online pay services, and finally to free-to-air commercial or public broadcast TV.
[Of course it is more than this]
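The succession of windows can be sketched as a simple data structure (window names, durations and geographies are illustrative only):

```python
# Sketch of the windowing model as data (all names and figures hypothetical).
# Each window carries its own conditions of use.

from dataclasses import dataclass

@dataclass
class Window:
    name: str
    start_month: int       # months after initial release
    end_month: int
    geography: str
    exclusive: bool

release_plan = [
    Window("premiere theatres", 0, 1, "capital city", True),
    Window("exclusive theatrical", 1, 4, "national", True),
    Window("wide theatrical", 4, 8, "national", False),
    Window("pay TV / packaged media", 8, 24, "national", False),
    Window("free-to-air TV", 24, 60, "national", False),
]

def active_window(month):
    """Which window is a given month of the life cycle in?"""
    for w in release_plan:
        if w.start_month <= month < w.end_month:
            return w.name
    return None

print(active_window(6))   # wide theatrical
```

A data-rights marketplace could reuse exactly this shape, with data-use conditions in place of screening rights.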

Assume an individual grants access to part of his/her personal data, say biological and health parameters. The data set could be accessed under a contract granting specified rights: a scope and time window of use, with robust anonymisation requirements (say, the data has to be used as part of a set comprising at least xxx other subjects at each processing step).
Is this "rights" business model, and its underlying organisation of the marketplace, robust to:
-reselling the data set for later use, within the agreed scope?
-retrieving subjects for later negotiation of a changed scope (e.g. a food & beverage company having an interest in accessing a data set previously used for health analysis)?
-auditing the proper use by processing companies and their clients?
-inserting mechanisms for deleting data sets after a "do not use after" date?
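Such a contract could be made machine-checkable; here is a minimal sketch (field names are illustrative, and `min_group_size` stands in for the elided "xxx" minimum group size above):

```python
# Sketch of a machine-checkable personal-data grant (hypothetical fields):
# scope, "do not use after" date, and a minimum-group-size anonymisation rule.

from dataclasses import dataclass
from datetime import date

@dataclass
class PersonalDataGrant:
    scope: str                 # e.g. "health analysis"
    valid_until: date          # the "do not use after" date
    min_group_size: int        # minimum subjects per processing step

    def allows(self, requested_scope, on_date, group_size):
        return (requested_scope == self.scope
                and on_date <= self.valid_until
                and group_size >= self.min_group_size)

grant = PersonalDataGrant("health analysis", date(2030, 1, 1), min_group_size=100)
print(grant.allows("health analysis", date(2026, 6, 1), 250))            # True
print(grant.allows("food & beverage marketing", date(2026, 6, 1), 250))  # False
```

The open questions in the list above (auditing, deletion, scope renegotiation) are exactly what such a record does not enforce by itself.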

In the digital world, data can be reproduced at negligible cost, hence what matters is not an instantiation of a data parameter, but the source "blueprint": the equivalent of a manuscript, not of the thousands of printed books derived from that manuscript.


This leads us to the model of a work of art, say a Van Gogh painting.
The asset can be made available to museums for exhibition (use limited in time and geography, associated with an audience, or number of visitors, targeted or recorded). It can also be "transcoded" into different representations, as authorised photographs, reproductions, etc...
A single work of art can, over time, be valued as "junk" (zero or little) or at enormous values.
As a category, the French Impressionists were often not valued in France, but had some early customers in the USA. Many years later, works despised earlier reached huge values at auction.
However, data seen as information may be more valuable at an early stage of its life than at a later stage. A bottle of milk loses its whole value on its "best before" date: it usually gets heavily discounted on that day, and discarded at the end of it.
News is normally expected to be fresh. For instance, the current temperature is useful to me now; the 3-day weather forecast is of interest to choose clothes for a trip. After the trip, this past information has lost its value. However, the long-tail business model for the exploitation of entertainment content like movies or music recordings may also apply: the time series of the values of the source data may be of interest as history, and based on history, some forecast estimates can be proposed (with associated uncertainty).
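The two regimes can be sketched as one value function (all parameters hypothetical): a freshness value that decays quickly with age, plus a small persistent archive value that the time series retains as history.

```python
# Sketch of the two value regimes (hypothetical parameters): a fast-decaying
# freshness value plus a small persistent "long tail" archive value.

def data_value(age_days, fresh_value=100.0, half_life_days=3.0,
               archive_value=2.0):
    """Value of a data point as a function of its age."""
    freshness = fresh_value * 0.5 ** (age_days / half_life_days)
    return freshness + archive_value

print(round(data_value(0), 1))    # 102.0: fresh data dominates
print(round(data_value(30), 1))   # 2.1: only the archive value remains
```

Milk has `archive_value = 0`; an electrocardiogram database, as argued below, does not.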

An electrocardiogram database of people living in the 1950s may be interesting to revisit in the 2050s, hence it should not be discarded.
There is probably a distinction to be made between the value of freshly acquired data, stored in cache memory, and the value of an archive.
Keeping and maintaining an archive has a cost, for instance transcoding from legacy formats and systems to current ones for new uses.