Saturday 29 August 2015

Microeconomics for big data: THE BOOK!


I have just uploaded my latest book to the publisher's platform.
It will soon be available from Amazon/Kindle Editions.

This book addresses the microeconomics of data in the context of big data, with some equations and the necessary description of ecosystems and value chains. Roles and relations in the data ecosystem, as well as business models for data transactions, are presented and discussed.
The book has been kept short for easier reading, so that everyone can form a sound opinion on how they want to handle the economic aspects of the data they own, use, or could sell.

Thursday 16 April 2015

Big Data Economics for Transport


Data and Transport?

It all starts with maps, and continues with timetables of transport services (multi-modal...).
In other words, geo-localised and time-stamped data can give rise to location- and context-dependent services, which are very valuable to users experiencing constraints:

  • being a passenger, you are bound to a train carriage, bus, aircraft
  • having to go from A to B, and reach B before time tB
  • constrained by a budget, not being able to spend more than b
  • well, being tired and hungry...

 There is definitely a lot of potential in data for/with transport:
-data supporting a smoother transport experience
-data generated with/in transport.

This leads back to the economics of data, framed here in the particular context of transport, where value may become more readily apparent both to suppliers of services of/in transport and to users of transport, who become consumers of services in transport.

The expected output data of a transport big data system may include:
-for the Traveller, Quality of Experience and Safety
-for the Transport Operator

  • Safety
  • Volume, cost and efficiency targets, in particular MTBF (Mean Time Between Failures), and avoidance of delays due to “signal failure”, i.e. poor preventive maintenance
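To make the MTBF target concrete, here is a minimal sketch, assuming a hypothetical log of operating hours and failure counts per asset (the `AssetLog` record and all figures are illustrative, not operator data). MTBF is taken as total operating time divided by the number of failures over the period:

```python
from dataclasses import dataclass


@dataclass
class AssetLog:
    """Hypothetical operating record for one piece of equipment (e.g. a signal)."""
    asset_id: str
    operating_hours: float
    failures: int


def mtbf(logs):
    """Mean Time Between Failures over a fleet: total operating time / total failures."""
    total_hours = sum(log.operating_hours for log in logs)
    total_failures = sum(log.failures for log in logs)
    if total_failures == 0:
        return float("inf")  # no failure observed over the period
    return total_hours / total_failures


fleet = [
    AssetLog("signal-A", 8760.0, 2),
    AssetLog("signal-B", 8760.0, 1),
    AssetLog("signal-C", 4380.0, 0),
]
print(mtbf(fleet))  # 7300.0
```

Tracking this figure per asset class over time is one way the operator-side output above could be fed by data.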

The usable inputs of a big data system for transport, supporting better operation, include:
-Traveller data (demand side)
-Operator(s) data (supply side, including trade unions' intentions to strike)


Wednesday 25 March 2015

The Economics of Data in IoT?

Starting with very different cases:



and developing the approach towards a Data Market Place



Big Data Economics: the book

Friday 20 March 2015

Advanced Data Structures: representations & methods, and their contribution to big data economics.



MIT professor Erik Demaine shows, in a publicly available course, how advanced data structures behave under "time travel": how past versions can be remembered (persistence), and how variations introduced in the past impact the present (retroactivity). The same questions arise in the space domain. This is a crucial variational view on big data representation in space and time. http://courses.csail.mit.edu/6.851/spring12/

This Information Science modelling is worth analysing from a big data economics perspective: how would the price of data represented in the structures considered evolve? How can a data structure support one business model or another over time? Over space? Over statistical populations? Imagine you want to test the stability of a pricing model against changes. You can introduce variations in past versions of data structures, and see how they propagate and impact the present data structures. This already adds a degree of abstraction: a layer of business and pricing models (linked to, or on top of, the data structure), and how it may evolve subject to changes happening in the environment.
Worthwhile analysing further...
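One way to picture the pricing-stability test described above is a minimal sketch, assuming a data set represented as a log of timestamped updates and a hypothetical price that is linear in the number of records (`RetroactiveDataset`, `price` and all figures are invented for illustration). It uses naive full replay, not the efficient persistent or retroactive structures taught in the course:

```python
from bisect import insort


class RetroactiveDataset:
    """Naive retroactive log of updates to a data set.

    The state at time t is the replay of every operation with timestamp <= t,
    so inserting an operation in the past changes all later states. Efficient
    retroactive structures avoid this full replay; this sketch just recomputes.
    """

    def __init__(self):
        self.ops = []  # sorted list of (timestamp, change in number of records)

    def insert_op(self, t, delta):
        insort(self.ops, (t, delta))

    def size_at(self, t):
        return sum(delta for ts, delta in self.ops if ts <= t)


def price(size, unit_price=0.01):
    """Hypothetical pricing model: linear in the number of records."""
    return unit_price * size


def price_path(ds, times):
    return [price(ds.size_at(t)) for t in times]


ds = RetroactiveDataset()
for t, delta in [(1, 1000), (2, 500), (3, -200), (4, 700)]:
    ds.insert_op(t, delta)
times = [1, 2, 3, 4]
before = price_path(ds, times)

# Retroactive change: 300 records withdrawn at t=2 (e.g. consent revoked)
ds.insert_op(2, -300)
after = price_path(ds, times)
```

Only the prices from t=2 onward change; comparing the two price paths gives a crude measure of how sensitive the pricing model is to a change in the past.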
The MIT course is highly commendable, deep notions are presented in a very attractive way.


DETECTION



Avoiding or reducing the impact of human-made or human-linked catastrophes (such as epidemics), like the Great Fire of London of 1666 or the last cholera epidemics in the same city, and protecting populations against known and monitored risks, depend on two data streams:
-monitoring and detecting events in real time
-warning in real time.

The value attached to these is the protection of lives and assets.
Insurance companies have their own methodologies to evaluate risks and the acceptable costs of preventing or reducing them. They sell insurance products to individuals and organisations.

Governments have their own policies to manage, eliminate or mitigate risks. Politicians are judged on their ability to manage risk avoidance schemes and, when a catastrophe happens, on how they manage the crisis. Tuning the detection scheme pessimistically leads to overprotection and high cost for no added benefit. Tuning it optimistically may overlook risky situations and under-dimension the response scheme. Instead of relying on a single mathematical model with an assumed probability distribution, multiple scenarios have to be considered, and risk must be bounded from below and from above, leading to mathematical inequalities and multiple stochastic models.
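The bounded-risk approach can be illustrated with a minimal sketch, assuming three hypothetical scenarios, each listing annual event probabilities and losses (all numbers invented). Expected loss is computed per scenario, and the spread across scenarios gives the lower and upper bounds:

```python
# Hypothetical scenarios: event -> (annual probability, loss in millions of euros)
scenarios = {
    "optimistic":  {"minor": (0.10, 1.0), "major": (0.01, 50.0)},
    "central":     {"minor": (0.20, 1.0), "major": (0.02, 50.0)},
    "pessimistic": {"minor": (0.30, 1.5), "major": (0.05, 80.0)},
}


def expected_loss(events):
    """Expected annual loss under one scenario: sum of probability x loss."""
    return sum(p * loss for p, loss in events.values())


losses = {name: expected_loss(events) for name, events in scenarios.items()}
lower, upper = min(losses.values()), max(losses.values())  # bounds on the risk


def assess(measure_cost, reduction_fraction):
    """Compare a prevention measure's cost with the expected loss it avoids,
    assuming it removes the same fraction of loss in every scenario."""
    avoided_low = reduction_fraction * lower
    avoided_high = reduction_fraction * upper
    if measure_cost <= avoided_low:
        return "justified in all scenarios"
    if measure_cost > avoided_high:
        return "not justified in any scenario"
    return "depends on scenario: policy choice"
```

The fixed `reduction_fraction` is a simplifying assumption; between the two bounds, the decision remains a policy choice rather than a computation.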

Railway companies use a standard model, developed jointly by them at ISO, in which risk is categorised by potential impact, the highest category being many lives at risk.
They can build on a long history, which has led to safe railway journeys, with very few accidents today.
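A risk matrix of this kind can be sketched as follows. The frequency and severity levels and the score thresholds are illustrative assumptions made for this sketch, not the tables of any railway standard:

```python
# Illustrative levels, from least to most frequent / severe
FREQUENCY = ["improbable", "remote", "occasional", "probable", "frequent"]
SEVERITY = ["insignificant", "marginal", "critical", "catastrophic"]  # catastrophic: many lives


def risk_category(frequency, severity):
    """Combine frequency and severity indices into a score, then a category."""
    score = FREQUENCY.index(frequency) + SEVERITY.index(severity)
    if score >= 6:
        return "intolerable"
    if score >= 4:
        return "undesirable"
    if score >= 2:
        return "tolerable"
    return "negligible"


print(risk_category("frequent", "catastrophic"))  # intolerable
```

An additive score is only one way to fill such a matrix; real schemes usually fix each cell explicitly.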
Here is a spectacular one from 1895, which claimed only one life:


http://en.wikipedia.org/wiki/File:Train_wreck_at_Montparnasse_1895.jpg

Source data: trading rights?

The article below addresses rights as they are observed for movies and works of art, exploring how the underlying concepts may apply to big data economics.

MOVIE RIGHTS

The movie business model can be summarised as a succession of windows of exploitation; within each window, rights can be sold and bought with conditions of use attached (time interval, geography, potentially number of users, rendering platform). Typically a movie is released in one or a handful of premiere cinemas, then exclusively in a number of cinemas in a given geography, then in all cinemas, then on pay TV and packaged media like Blu-ray or online pay services, and finally on commercial or public free-to-air broadcast TV.
[Of course it is more than this]
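The succession of windows can be sketched as data: each window grants rights for a time interval, a set of territories and a set of platforms. This is a minimal sketch; the schedule, window names and dates below are hypothetical:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Window:
    """One exploitation window: where, on which platforms, and when a film may be shown."""
    name: str
    start: date
    end: date  # inclusive
    territories: frozenset
    platforms: frozenset


def allowed(windows, platform, territory, on):
    """Names of the windows authorising this platform, in this territory, on this day."""
    return [
        w.name
        for w in windows
        if w.start <= on <= w.end
        and territory in w.territories
        and platform in w.platforms
    ]


# Hypothetical release schedule for one film
windows = [
    Window("premiere", date(2015, 1, 7), date(2015, 1, 13),
           frozenset({"FR"}), frozenset({"premiere_cinema"})),
    Window("theatrical", date(2015, 1, 14), date(2015, 5, 31),
           frozenset({"FR", "BE"}), frozenset({"cinema"})),
    Window("pay_and_packaged", date(2015, 6, 1), date(2016, 5, 31),
           frozenset({"FR", "BE"}), frozenset({"pay_tv", "bluray", "vod"})),
    Window("free_tv", date(2016, 6, 1), date(2030, 12, 31),
           frozenset({"FR"}), frozenset({"free_tv"})),
]
print(allowed(windows, "vod", "FR", date(2015, 7, 1)))  # ['pay_and_packaged']
```

The same structure could describe rights on a data set, with "platform" replaced by a processing purpose.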

Assume an individual grants access to part of his/her personal data, say biological and health parameters. The data set could be accessed under a contract granting specified rights: scope and time window of use, with robust anonymisation requirements (say that the data has to be used as part of a set comprising at least xxx other subjects at each processing step).
Is this rights-based business model, with its underlying organisation of the marketplace, robust to:
-reselling data set for later use, within agreed scope?
-retrieving subjects for later negotiation of changed scope (e.g. a food&beverage company having interest to access a data set previously used for health analysis)
-auditing the proper use by processing companies and their clients
-inserting mechanisms for deleting data sets after a "do not use after" date?
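The contract terms above can be sketched as a check run at each processing step. This is a minimal sketch, assuming a hypothetical `DataUseContract` record; `min_other_subjects` stands for the "at least xxx other subjects" rule, and the value 99 in the example is purely illustrative:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataUseContract:
    """Hypothetical contract attached to a personal data set."""
    scopes: frozenset          # granted purposes, e.g. {"health_analysis"}
    valid_from: date
    do_not_use_after: date
    min_other_subjects: int    # the "at least xxx other subjects" anonymisation rule


def check_use(contract, scope, on, subjects_in_set):
    """List the contract violations for one processing step (empty = allowed)."""
    violations = []
    if scope not in contract.scopes:
        violations.append("scope not granted")
    if not contract.valid_from <= on <= contract.do_not_use_after:
        violations.append("outside time window")
    # The individual plus at least min_other_subjects others must be in the set
    if subjects_in_set < contract.min_other_subjects + 1:
        violations.append("cohort too small for anonymisation")
    return violations


contract = DataUseContract(frozenset({"health_analysis"}), date(2015, 1, 1),
                           date(2016, 12, 31), min_other_subjects=99)  # 99 is illustrative
print(check_use(contract, "food_marketing", date(2015, 6, 1), 5000))  # ['scope not granted']
```

Auditing could log every call to `check_use`, and the `do_not_use_after` date is the natural trigger for deletion; both are left out of this sketch.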

In the digital world, data can be reproduced at negligible cost, hence what matters is not the instantiation of a data parameter, but the source "blueprint", the equivalent of a manuscript and not of the thousands of printed books derived from this manuscript.


WORK OF ART


This leads us to the model of a work of art, say a Van Gogh painting.
The asset can be made available to museums for exhibition (use limited in time and geography, associated with an audience, or number of visitors, targeted or recorded). It can also be "transcoded" into different representations, as authorised photographs, reproductions, etc...
Over time, a single work of art can be valued anywhere from "junk" (little or nothing) up to enormous sums.
As a category, French Impressionists were often not valued in France, but had some early customers in the USA. Many years later, works despised earlier reached huge values at auctions.
However, data seen as information may be more valuable at an early stage of its life than at a later stage. A bottle of milk loses its whole value on the "best before" date: it usually gets heavily discounted on the day, and discarded at the end of that day.
News is normally expected to be fresh. For instance, the current temperature is useful to me now, and the 3-day weather forecast is of interest for choosing clothes for a trip. After the trip, this past information has lost its value. However, the long-tail business model used to exploit entertainment content like movies or music recordings may also apply: the time series of the values of source data may be of interest as history, and based on history, some forecast estimates can be proposed (with associated uncertainty).
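The two time profiles above, milk-like and news-like, can be sketched as value functions. The hard expiry, the half-life decay and the residual history value are modelling assumptions made for illustration, not measured behaviour:

```python
def perishable_value(v0, age_days, best_before_days):
    """Milk-like data: full value until the "best before" date, nothing after."""
    return v0 if age_days <= best_before_days else 0.0


def decaying_value(v0, age_days, half_life_days, history_floor=0.0):
    """News-like data: value halves every half_life_days, but never falls below
    a residual value as history (the long-tail use of past data)."""
    fresh = v0 * 0.5 ** (age_days / half_life_days)
    return max(fresh, history_floor)


# A hypothetical forecast worth 100 units when issued, with a one-day half-life
print(decaying_value(100.0, age_days=2, half_life_days=1))  # 25.0
print(decaying_value(100.0, age_days=20, half_life_days=1, history_floor=5.0))  # 5.0
```

The `history_floor` parameter is where the archive value of the data, as opposed to its freshness value, would enter the model.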

An electrocardiogram database of people living in the 1950s may be interesting to revisit in the 2050s, hence it should not be discarded.
There is probably a distinction to be made between the value of some freshly acquired data, stored in cache memory, and the value of an archive.
Keeping and maintaining an archive has a cost, for instance transcoding from legacy formats and systems to current ones for new use.

**When ownership gets transferred**


*Smooth case
I recently moved offices, and did not want to move my desktop displays: I had been offered new displays in the new location. I just had to administratively transfer the ownership of the displays I was leaving to another department at that location. Done, moved on.



*What silver coins teach us
My grandmother once gave me a silver coin worth 5 French francs. I thought she had given me 5 FR that I could spend. She got into the habit of giving me more such coins now and then, a few times a year.
I kept the coins in a drawer, thinking they were accumulated as pocket money does, and I could spend them when the occasion would occur. I remember that in those years you could get very nice vinyl records for 15 FR in a Montparnasse shop (central Paris).
Eventually we talked about it with my grandmother, and it became apparent that it was neither her view nor her intention that I should spend this pocket money. The coins were given to me for… KEEPING. She saw them as collection items. Silver coins with a currency value were ambivalent: they could be seen as 5 FR, or as a weight of silver valued as such. It was not meant as a coin like any other, but as a personally TRANSFERRED FROZEN ASSET from my grandmother as the GIVER to me as the KEEPER, not exactly the happy recipient.
Much later, I read about the Bretton Woods agreement, and the subsequent history of suspending the convertibility of currencies into gold, starting with the dollar in 1971. The veil of money, and the veil of the metal convertibility of money, are insightful explanations by economists of real microeconomic situations.
The main lesson I would take from this case for big data is that when source data is transferred from a producer or other seller to a buyer or user, the data is transferred as an IT record or file, but it also comes with economic and contractual/legal expectations and rules of use. This holds even if the data is open or free.
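The point that rules of use travel with the data can be sketched as a chain of transfers in which each receiver gets at most the rights held by the giver. This is a minimal sketch; the `Transfer` record, the `transfer` helper and the right names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical record of data passing from a giver to a keeper,
    carrying the rules of use along with the bytes."""
    giver: str
    keeper: str
    allowed_uses: frozenset


def transfer(chain, new_keeper, requested_uses):
    """Append a transfer to the chain; the new keeper receives at most the
    rights held by the current keeper (rights can narrow, never widen)."""
    current = chain[-1]
    granted = current.allowed_uses & frozenset(requested_uses)
    chain.append(Transfer(current.keeper, new_keeper, granted))
    return granted


chain = [Transfer("owner", "collector", frozenset({"analytics", "resale", "archive"}))]
transfer(chain, "user", {"analytics", "marketing"})  # only "analytics" survives
transfer(chain, "sub_user", {"resale"})              # nothing: "user" never had "resale"
```

The grandmother's coins would be a transfer whose only allowed use is "keep", however the keeper reads the face value.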

*From music
Recorded music has shown us different patterns of transfer:
-open market, with competing publishers, and users able to shop around as I did in Montparnasse
-direct peer to peer online, possibly illegal
-closed marketplaces such as iTunes, working as an integrated value chain.


*From organ donation
Organ donation brings us closer to personal data and parameters (such as biological measurements):
-one donates a part of their body
-one rightly expects this body part to be used very carefully, with a genuine best effort to save someone’s life.
To avoid the grandmother’s syndrome above, organ donations are anonymised, except in obvious cases such as direct and immediate donation to a family member.
Closer still to real-time, flowing big data, blood donations are also very carefully managed, end to end.

*And now, for big data?
The question this leads us to is: what happens to data once it is transferred from one economic agent to the next? What “ownership” with rights and responsibilities gets transferred? What is then a fair transfer price?

By the way, as long as the source data does not result (at the processing stage considered) in an end-user service being sold, it is free of VAT :)

Big Data: source data as a tradeable commodity 



Big Data business builds on data as THE key input. This raw material, data sets, is a new commodity, and it has received less attention from economists than physical raw materials, traditionally called commodities. What can we learn from commodity trading? Marketplaces for commodities, produced by farming or extracted by mining for instance, offer a much longer economic history than primary or source data.
Cocoa has been studied by economists, from its early use as a currency in pre-Columbian America, then as an energy drink used by Spanish explorers, brought back to Europe and enjoyed as a precious drink from the 16th to the 18th century, then mixed with milk in a Swiss process from the 19th century.

Minerals extracted through mining give a good analogy for data coming from sensors. By the way, the Schlumberger brothers were forerunners of exploratory big data targeting mineral resources.
The data quality, the authenticity of the source, the availability of data, are common features with raw minerals.

What can we learn from Commodity Industries and Commodity Trading? Extraction costs? Mechanisms determining their value?


Business books reviewed

 
Reviewing Economics & Business books as input for Big Data Economics



Let me share with you some titles:

-1) Planet Google by Randall Stross
This book reviews the growth of Google as a succession of goals pursued to completion, starting with indexing the information of the Internet to make it searchable, and going through YouTube, Google Maps, etc...
The book is quite well structured, allowing the reader to understand the systematic pursuit of business objectives rooted in facts (science & engineering, market). Google's exploration and charting of the world's information, as completely as possible, is an amazing piece of work, and this book describes it very well.
Interesting read for anyone interested in the economics of big data, naturally...

-2) Googled, the end of the world as we know it, by Ken Auletta
This book takes a very different approach from the one above. It is more of a classical business story, well told, with details of interest. I liked the beginning, where the author sketches a biography of the two founders and their family backgrounds: advanced mathematics for one (father lecturing on Riemannian geometry, mother with advanced mathematics & biology degrees) and computer science for the other (both parents university academics, a professor and a lecturer).

-3) An Introduction to Sustainable Development, by Peter P Rogers, Kazi F Jalal & John A Boyd
This book has a few chapters which connect well with the problems of starting economic analysis where market prices may not be available or not be the only criterion:
their chapter 9 on the economics of sustainability, chapter 10 on externalities, valuation and time externalities, and chapter 11 on natural resource accounting.
However, it is not a toolbox from which one can extract what we need for Big Data Economics, at best an eye opener, and an encouragement to develop models in certain directions, proven to be usable in a domain different from Big Data, with the commonality that it still has some "terra incognita" features yet to be explored and mapped.

-4) Fighting the banana wars and other Fairtrade battles by Harriet Lamb
This book may interest you because the Fairtrade scheme brings a new set of economic standards and criteria to the food market (and other) arena: respect for the planet, respect for people (producers or consumers), and the introduction of sustainability and risk reduction into an otherwise fierce competition marked by commodity price volatility.
Why is it relevant to the analysis of economics for big data? For the reasons above, but also and probably more importantly because big data as source data (source data sets, flow) is the commodity of the digital age, and it is interesting to build on the experience gained in the area of physical commodities and ways to address their price volatility (and potentially chaotic availability depending on crops, good or bad weather, natural disasters).

-5) Marx, the key ideas, by Gill Hands ("teach yourself" series)
Do not smile before you know why I put this book here.
I started from the reflection that today the economics of the digital markets is governed by a production equation adding the costs of software to the costs of networks, and most of the time ignoring data costs or not paying much attention to them.
Marx may be criticised: his ideas may have led to human catastrophes. However, as an economist, he managed to convince everyone that Labour aspects (labour costs, workers' conditions, etc.) needed extra care in the age of the industrial revolution. He added Labour as a key variable in the production equation, where other costs could be called "Kapital" and Assets.
Hence if we want to highlight "source data" in a context where "my software is valuable and your data needs to be free to me" or where yet another conflicting view says "my network is valuable, and your software needs to pay for consuming it", we may learn from how Labour as an economic parameter was recognised as a key driver of the coal & steam age.

Thursday 5 March 2015


DATA OWNER

This description of a data "input" value chain assumes that data is owned by someone or by an organisation. The ISO/IEC JTC 1 Study Group on Big Data has been very clear that there should be a universal attribute to data specifying its owner(s).

The data owner could be an individual: for instance, consider the case of personal data owned by a person. More broadly the data generated by objects owned by a person are likely to be owned by this person: for instance the current geographic position of my car. This means that there are expanding circles around people, with data in such circles. This creates a natural link across the areas of the Internet of People, where people communicate and interact with each other or with "the Internet", and the Internet of Things (IoT) with sensors and actuators, and machine intelligence all connected to serve (hopefully) the needs of humans.

It starts with the core, the body, with body area sensors, continues with anything wearable, and beyond to anything owned, within physical or virtual reach.
The data ownership could be shared by a group, call it social, with a defined aim, for instance Commons-Based Peer Production (CBPP), as in the world-famous Wikipedia. Note that the EU's P2P value project addresses the organisation and mechanisms at play in CBPP, with over 300 such social groups studied.
Another interesting case of data ownership is the Industrial Internet, where companies generate data for their own operations, which they (mostly) use internally, in schemes such as a supervised distributed system, typically with a control room. Today a telecom network, a transport network (railways in particular, but also metro, road, air and sea transport) and an energy network fall into this category. A company may choose to release some subsets of such operational data for specific uses.

Data generated by wearable devices is also a category of interest to the business and consumer communities, with multiple purposes being envisaged already (sports, well-being, health, new forms of communications) and many more to come.

DATA COLLECTOR

This can be one of the few large Internet brands. It can also be any company running operations such as the ones above. It can also be an individual aggregating their own data in multiple ways, for multiple purposes, current and future (forensic data, extending the collection of postcards and pictures into the general data domain).
Governments are data collectors. Organisations: public or private, acting in pursuit of business or social goals are data collectors.
Even when the data is accepted as not being subject to a price tag, its use must conform to established rules and laws.

A data collector builds consistent, structured sets from individual, potentially unstructured, data vectors.
This entails quality control of the source, or, to use other language, of the "veracity" of the information.
The aim is to prepare the input for efficient data processing.
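The collector's role can be sketched as a function turning heterogeneous raw records into a homogeneous table, with simple veracity checks. This is a minimal sketch; the field names and the valid range (temperature readings in °C) are assumptions chosen for illustration:

```python
import math


def collect(raw_records, required=("sensor_id", "timestamp", "value"),
            valid_range=(-50.0, 60.0)):
    """Turn heterogeneous raw records (dicts) into a homogeneous table of
    (sensor_id, timestamp, value) tuples, rejecting records that fail simple
    quality ("veracity") checks. Returns (table, rejected)."""
    low, high = valid_range
    table, rejected = [], []
    for rec in raw_records:
        if any(key not in rec for key in required):
            rejected.append((rec, "missing field"))
            continue
        try:
            value = float(rec["value"])
            timestamp = int(rec["timestamp"])
        except (TypeError, ValueError):
            rejected.append((rec, "non-numeric field"))
            continue
        if math.isnan(value) or not low <= value <= high:
            rejected.append((rec, "out of range"))
            continue
        table.append((str(rec["sensor_id"]), timestamp, value))
    return table, rejected


# Hypothetical raw readings from temperature sensors (degrees C)
raw = [
    {"sensor_id": "t1", "timestamp": 1420070400, "value": "12.5"},
    {"sensor_id": "t2", "timestamp": 1420070400, "value": 999},
    {"sensor_id": "t3", "value": 10},
    {"sensor_id": "t4", "timestamp": 1420070460, "value": "n/a"},
]
table, rejected = collect(raw)
```

Keeping the rejected records with a reason, rather than dropping them silently, lets the collector report on source quality.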

DATA USER

A data user is typically an organisation or an individual performing analytics on data sets. For this purpose they need either to collect data sets directly, or to buy rights to access such data sets, for their defined purpose and scope, from data suppliers: the data collectors, or data brokers acting on behalf of the data collectors (retail role).
Data users need data sets suited to their needs. This is the demand side in economic terms, and the data collector or data broker is the supply side.

Note that the use of data through analytics may lead to decisions, and such decisions in turn produce data sets in the command domain, for the remote and distributed execution of the commands implementing the decision taken.
For instance, real-time systems with a feedback loop, also called automated systems, or optimal control systems, do not only "observe the world" through IoT sensors: they act on the world through actuators and supervised control, typically with supervision in a control room, as explained above.
SCADA systems (supervisory control and data acquisition) are an important case of operational data use.
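The sense, decide, act loop can be sketched in a few lines. This assumes a toy plant model x[k+1] = x[k] + u[k] and uses a proportional controller purely as the simplest feedback law, not as an example of optimal control or of any SCADA product:

```python
def run_loop(setpoint, x0, kp, steps):
    """Minimal sense -> decide -> act loop with a proportional controller.

    Assumed toy plant: x[k+1] = x[k] + u[k]. Returns the final state and a log
    of (measurement, command) pairs: the observation data and the command data.
    """
    x = x0
    log = []
    for _ in range(steps):
        measurement = x                          # sensor reading (IoT side)
        command = kp * (setpoint - measurement)  # decision taken from the data
        log.append((measurement, command))
        x = x + command                          # actuator applies the command
    return x, log


final_state, log = run_loop(setpoint=10.0, x0=0.0, kp=0.5, steps=10)
```

With kp=0.5 the error halves at each step, so after 10 steps the state is 10 * (1 - 0.5**10), about 9.99. The `log` holds both data domains mentioned above: observations and commands.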


DATA PATH
Naturally, when the data collector gathers data and forms data sets, initial data D0 is transformed into D1, and the set accessed by a user for a specific purpose and scope is D1* (optimised or limited for this use).
This defines a data path: extraction, collection, shaping, homogenising, and fitting to the user's purpose.
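The path from D0 to D1 and D1* can be sketched as two functions. This is a minimal sketch; the record layout (regions, timestamps, mixed °C/°F readings) is a hypothetical example:

```python
def homogenise(d0):
    """D0 -> D1: unify units; readings in Fahrenheit are converted to Celsius."""
    d1 = []
    for rec in d0:
        value = rec["value"]
        celsius = (value - 32) * 5 / 9 if rec["unit"] == "F" else value
        d1.append({"region": rec["region"], "t": rec["t"], "celsius": round(celsius, 2)})
    return d1


def fit_to_purpose(d1, regions, fields):
    """D1 -> D1*: keep only the rows (regions) and columns (fields) the user is entitled to."""
    return [{f: rec[f] for f in fields} for rec in d1 if rec["region"] in regions]


# Hypothetical raw data D0, as gathered from heterogeneous sources
d0 = [
    {"region": "north", "t": 1, "value": 50.0, "unit": "F"},
    {"region": "south", "t": 1, "value": 20.0, "unit": "C"},
    {"region": "north", "t": 2, "value": 12.0, "unit": "C"},
]
d1 = homogenise(d0)
d1_star = fit_to_purpose(d1, regions={"north"}, fields=("t", "celsius"))
```

D1 is built once and can serve many users; each D1* is a cheap projection of it, which is where the economies of re-use come from.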

PAYMENT PATH
Users purchase rights to use the data for their own purpose and scope, and payment flows possibly through the data collector, with part of the payment remunerating the data owners.
The organisation of payment and retail is being studied, and a publication addressing this subject is being prepared.
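As the organisation of payment is still being studied, the following is only a minimal sketch of one possible rule: the collector keeps a fixed share (30% here, an arbitrary assumption) and the remainder is split among data owners in proportion to the records they contributed, in integer cents so that nothing is lost to rounding. The function name and figures are hypothetical:

```python
def split_payment(amount_cents, collector_share, contributions):
    """Split one payment between the data collector and the data owners.

    The collector keeps collector_share (between 0 and 1); the rest is shared
    among owners in proportion to the records each contributed. Integer cents
    are used, and leftover cents go to the largest contributors.
    """
    collector = round(amount_cents * collector_share)
    pool = amount_cents - collector
    total = sum(contributions.values())
    shares = {owner: pool * n // total for owner, n in contributions.items()}
    leftover = pool - sum(shares.values())
    for owner in sorted(contributions, key=contributions.get, reverse=True)[:leftover]:
        shares[owner] += 1
    return collector, shares


# Hypothetical: a 100.00 EUR payment, 30% kept by the collector, two data owners
collector, shares = split_payment(10_000, 0.3, {"alice": 1, "bob": 2})
```

Proportional-to-records is only one candidate rule; weighting by data quality or by actual use would fit the same structure.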

Copyright R. Di Francesco, 2015 

Sunday 1 February 2015

VALUE: Progressing the economics of source data

I will discuss briefly here how the fuel of any big data system, source data, can receive the needed attention, from an economic modelling perspective.
Interested readers are referred to:
Naturally, big data systems are engineered as Information Technology solutions, with the associated cost engineering, on a project by project basis. My assumption is that as big data becomes pervasive, the re-use of data and sharing across multiple use ranges, will make a lot of sense, and let the suppliers and users benefit from economies of scale and critical mass effects.

QUESTION 1: THE VALUE of data
The first question to ask is about the value: are these data I am using of value? What is the value of the data I have extracted? What would be the value of additional data, and where could I get them?
Economists following in the steps of Adam Smith distinguish two types of value:
  • the value in use (use value)
  • the value in exchange (exchange value)

Obviously, a good which you need for a certain purpose has a use value to you. If you are thirsty, you need water, for instance.
As for exchange value, it belongs to a good which you can sell or buy, because it is traded and comes with a price on a market. For instance, cocoa was an asset used as money in the pre-Columbian civilisations of Latin America. The classical example is the diamond, which has little use value but a high exchange value, in contrast with water (Adam Smith's water-diamond paradox).
For data, let us give examples in each of these two categories of value:
-Use value
An automated system supervised from a control room (say a train network, an electrical grid, a telecommunication network) uses industrial data for its own purpose. This data has an obvious use value; however, the data owners seem reluctant to release such data to third parties, even for a price. This data use case is perceived to fall under a dominant "use value", with no identified "exchange value" yet.
-Exchange value
An entertainment content item such as a movie materialises (virtually :) ) as a file, which is a data set. It has an exchange value: rights are sold to cinemas, TV channels and end users for viewing this content. In packaged media form (Blu-ray, DVD), it can even be resold by consumers.
Naturally there are many more questions regarding "big data economics" and leading "towards data market places". The following book I have published recently on Amazon/Kindle Editions addresses some...

Wednesday 14 January 2015

Debate on the Economics of Data, at TELECOM PARIS workshop on 12 January 2015

Here is a summary of the lively Debate on the Economics of Data hosted by Telecom Paris university on 12 January:



Industrial data?
Currently, industrial data is part of closed systems, and the suppliers and users of such systems are very protective about sharing it with third parties.
However, some attempts are being made.

Luxury data?
Some data can be seen as a Veblen good, where a high price is expected and desired as part of the value proposition (luxury car, etc.). Some "gem" data exist.
High value financial information is part of this category.
Beyond any open economy, State Security data has a somewhat comparable exclusive status.

Key sectors?
Business intelligence is a very active market for big data solutions already.
Health care and care for the ageing population is another area, where big data solutions could:
-support the people in care
-support the carers
especially in the Ambient Assisted Living framework.

Data management?
This is a key question. In particular ensuring that data owners keep control of multiple, possibly cascaded use.

Tuesday 13 January 2015

Big data economics: workshop at Telecom Paris Tech, 12 January 2015, Paris, France



Event
100+ registered attendees
Introduction by Patrick Duvaut, director of research at Telecom Paris-Tech
Presentation by Renaud Di Francesco
Speech by Pierre-Jean Benghozi
Participation of Yves Poilane, director of Telecom Paris-Tech




Presentation summary

The scope of big data is broader than business intelligence, extending towards:
-real world to digital, analytics AND decision, feedback to real world
-real time

The required technology portfolio is changing: beyond NoSQL and search technologies, other technologies are determining success:
-signal processing
-maximum likelihood decision methods
-optimal control
-real time system engineering

The digital economy relies on three pillars, two of which have identified pricing schemes and economic mechanisms:
-software
-network
however, the third one, data, does not always have a recognised value or economic mechanisms.
For instance, what is the price of an electrocardiogram as usable data? What is the price of my geographic position?

Nevertheless in some sectors and categories, data can have pricing schemes and economic mechanisms:
-content (e.g. movie) industry
-news
-loyalty schemes
-etc...

Starting from these charted territories of big data, one can consider adapted economic schemes for new data categories, which are not yet priced or covered by economic schemes.

The software licensing scheme offers a starting framework for data contracts, which cover rights on data.
The enforcement of rights is helped by Digital Rights Management (DRM) systems granting authorised access to the data.
The target for an efficient data economy is the development of data marketplaces, where data collectors, data owners, data users and data processors meet, so that data supply can meet data demand.
The raw material or commodity market places established for physical goods give a reference framework from which data market places can be derived.
Moreover, in some categories, digital data market places are already in operation. For instance Getty Images buys and sells pictures, which are a special case of data.
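As an illustration of supply meeting demand, here is a minimal sketch of one simple matching rule among many possible ones: buyers' bids and sellers' asks for access to comparable data sets are paired, highest bid with lowest ask, while the bid covers the ask, and each pair trades at the midpoint. Participant names and prices are hypothetical:

```python
def match(bids, asks):
    """Pair buyers and sellers of access licences to comparable data sets.

    bids/asks: lists of (name, price). Highest bids meet lowest asks while
    the bid covers the ask; each pair trades at the midpoint price.
    """
    bids = sorted(bids, key=lambda b: b[1], reverse=True)
    asks = sorted(asks, key=lambda a: a[1])
    trades = []
    for (buyer, bid), (seller, ask) in zip(bids, asks):
        if bid < ask:
            break
        trades.append((buyer, seller, (bid + ask) / 2))
    return trades


trades = match([("u1", 120), ("u2", 80), ("u3", 100)],
               [("c1", 90), ("c2", 110), ("c3", 70)])
print(trades)  # [('u1', 'c3', 95.0), ('u3', 'c1', 95.0)]
```

Commodity exchanges use more elaborate clearing rules; this sketch only shows the basic principle of an offer meeting a demand at a price.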