“If you torture the data long enough, it will confess to anything.” Ronald H. Coase – Essays on Economics and Economists. I also found this phrase listed as an idiom rather than a direct quote. For a fuller dive into Coase I can recommend reading up on the Coase theorem.
“Because our educational system is hung up on precision, the art of being good at approximations is insufficiently valued. This impedes conceptual thinking.” Ray Dalio – writing in Principles: Life and Work, a book of tremendous value and boundless wisdom. In my past year commuting on the London Underground I have seen only one person immersed in a copy, which seems at odds with my perception of the book’s value. Perhaps I’m on the wrong train.
“War is ninety percent information.” Napoleon Bonaparte – French military and political leader. I’ve struggled to find a direct attribution for this popular quotation, though I have happened upon a wonderful collection of his misquotations, including the following sentences that are often wrongly attributed to the man: “An army travels on its stomach.”, “No plan survives contact with the enemy.”, “An army of sheep, led by a lion, is better than an army of lions, led by a sheep.” and “Never ascribe to malice that which is adequately explained by incompetence.”
“Errors using inadequate data are much less than those using no data at all.” Charles Babbage – inventor of the difference engine. This genius and father of computing also has the following, equally suitable, quotations: “At each increase of knowledge, as well as on the contrivance of every new tool, human labour becomes abridged”, “The successful construction of all machinery depends on the perfection of the tools employed; and whoever is a master in the arts of tool-making possesses the key to the construction of all machines… The contrivance and construction of tools must therefore ever stand at the head of the industrial arts” and (brilliantly) “On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question”.
“The only people who see the whole picture are the ones who step outside the frame.” Salman Rushdie – from The Ground Beneath Her Feet. A rather excellent and beautifully pithy expansion on the ‘think outside the box’ and ‘the spectator sees more of the game’ line of thinking. I was lucky enough to spend a couple of hours listening to Mr Rushdie talk on his most recent book. The Q&A was mostly fatwarian in nature, as you might expect.
“Throughout much of his career, he led a double life: as an intellectual leader in the ivory tower of pure mathematics and as a man of action, in constant demand as an advisor, consultant and decision-maker to what is sometimes called the military-industrial complex of the United States. My own belief is that these two aspects of his double life, his wide-ranging activities as well as his strictly intellectual pursuits, were motivated by two profound convictions. The first was the overriding responsibility that each of us has to make full use of whatever intellectual capabilities we were endowed with. He had the scientist’s passion for learning and discovery for its own sake and the genius’s ego-driven concern for the significance and durability of his own contributions. The second was the critical importance of an environment of political freedom for the pursuit of the first, and for the welfare of mankind in general.
I’m convinced, in fact, that all his involvements with the halls of power were driven by his sense of the fragility of that freedom. By the beginning of the 1930s, if not even earlier, he became convinced that the lights of civilization would be snuffed out all over Europe by the spread of totalitarianism from the right: Nazism and Fascism. So he made an unequivocal commitment to his home in the new world and to fight to preserve and re-establish freedom from that new beachhead.
In the 1940s and 1950s, he was equally convinced that the threat to civilization now came from totalitarianism on the left, that is, Soviet Communism, and his commitment was just as unequivocal to fighting it with whatever weapons lay at hand, scientific and economic as well as military. It was a matter of utter indifference to him, I believe, whether the threat came from the right or from the left. What motivated both his intense involvement in the issues of the day and his uncompromisingly hardline attitude was his belief in the overriding importance of political freedom, his strong sense of its continuing fragility, and his conviction that it was in the United States, and the passionate defence of the United States, that its best hope lay.” Marina von Neumann Whitman on John von Neumann.
What data analysis is
Data analysis has been given an elevated position amongst the contemporary commercial disciplines. Though the analysts, data and questions are numerous, the field itself seems oddly ill-defined. If two people were asked to ‘analyse data’, it could not reasonably be expected that the outputs of their labour would be similar or comparable. This seems a strange state given the self-evident opportunities present in the conversion of data to information (and perhaps to insight) and the legions employed in this amorphous endeavour.
The ambiguity within the commercial ‘analytic’ world can be evidenced with only a brief search for ‘analyst’ job postings in, say, London, wherein the analytical undertaking can mean anything from clerical work to large-scale strategic overhaul. Naturally there is a conflation between the convenience of the job title “analyst” and the actual act of analysing. These two elements do not necessarily overlap, though the disparity and variation in the space seem to support the notion of widespread dubiety.
Given these starting parameters it seems reasonable to assert the primary models of data analysis within commerce (possibly excluding R&D) so that organisations might begin to share a common understanding of the work. The models proposed in this essay are all original and aim, in three parts, to address the role in generic terms. Just as a bricklayer might expect to arrive on site and contribute in a similar pattern regardless of the structure, substrate or supervisor, these models should allow a data analyst to turn up, fill herself with coffee, and work uninterrupted with the benefit of a common language of expectations between her and her commissioner.
In order for these models to function we must establish a quick summary of what we mean by ‘data analysis’. This list is shown below and has been kept as generic as possible to facilitate a wide-ranging application:
- Data analysis is the conversion of the unknown into the known.
- Data analysis is the aggregation of information to facilitate understanding.
- Data analysis is a support structure to decision-making.
The three models
The models presented here each provide a different view of data analysis and are to be treated as overlays of each other as opposed to alternatives to each other. Each model presents the task of analysing data from a different perspective and therefore the adoption of all three should provide a comprehensive approach to the domain.
Each model is independent and users are free to deploy and discard models as they see fit. Though each one takes aim at a different element of data analysis, it is natural that situational necessity will dictate the utility of each model. These models are, furthermore, likely incomplete and should be included within strategy as part of an array of methods and expectations.
The three models are as follows:
- The Machine Understanding Model
  - This model provides the analyst with a sense of purpose and reduces the total uncertainty experienced throughout the whole process.
- The Analysis Pathway Model
  - This model provides the analyst with a checklist-style output register that will assist in the provision of complete analysis.
- The Findings Defence Model
  - This model asks the analyst to consider the output within the context of its presentation to the business in order to maximise robustness and utility.
Each model is explained in detail below with reference made to the diagrams provided. The diagrams are generic and serve as illustrations and aids to communication rather than strict technical blueprints.
The Machine Understanding Model
The Machine Understanding Model proposes a five-step deepening of analytical effort that carries the analytical endeavour from the commissioning of analysis through to the monitoring of the altered state. In this case we imagine that there exists an element within an organisation, market or space that can be thought of as a machine with inputs and outputs. This machine may be improved in some way and the analysis is deployed in service to this improvement.
The ‘Investments’ shown here are simply the system inputs. These may be informational or financial but ultimately serve as a commitment of resources to an end. The ‘Outcomes’ are the desired outputs of the system. These may be revenue, customers, savings and so on. The diagrams show a flow of activity from left to right as the investments are transformed into desirable outcomes. The first diagram shows the initial conditions and the subsequent steps are labelled A through E.
A) Find the machine
Organisations are incredibly complicated. First and second order emergent properties are not only common but form the very bedrock of what we would usually perceive as simple. As a result the data analyst must first work to find the machine that she has been asked to study, understand where the edges of the machine are and understand the elements of the organisation or broader system that are not within this particular machine. This is, of course, a difficult and demanding occupation and should not be underappreciated; without a clear definition of the mechanistic scope, a causal analysis cannot be undertaken within the practical limits of the available resources.
B) Identify known and unknown components
Within any complex system it is entirely possible that there exist multiple elements that must be considered by the analyst in order to fully understand the mechanism of the machine. It is, furthermore, quite likely that not all of these subcomponents will be fully explorable through data. This then establishes a plane of explorability throughout the subcomponent mix. Expression and quantification of this landscape is crucial at this stage as it will directly influence the viability of analysis and confidence in the remaining analysis.
C) Quantify component interaction
We now enter the exploration of causal influence and dependence. Provable, quantifiable interactions exist between subsystems and it is the analyst’s responsibility to define these relationships. Much of the analytical workload will focus here as the demonstration of influence is fundamental to an understanding of the mechanism in totality.
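As a first, rough screen for interaction between two components, the analyst might compute a simple correlation between their measurements. The sketch below is a minimal illustration only: the function name `pearson` and the two series are hypothetical, and a strong correlation suggests a relationship worth investigating rather than proving causal influence or its direction.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products around the means.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly figures for two subcomponents of the machine.
marketing_spend = [10, 12, 15, 11, 18, 20]
new_customers = [100, 115, 150, 108, 170, 195]
strength = pearson(marketing_spend, new_customers)
```

A value near +1 or -1 marks a pair of components that deserves closer, properly causal, study; a value near zero does not rule out a non-linear relationship.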
D) Model creative and exploitive alterations
Now that the machine is understood and its mechanisms are quantified, the analyst works creatively and collaboratively to model potential futures given certain manipulations of the machine. This process may be explorative or directed but will ultimately converge on an assertion of the optimum state given the potential alterations and known operation.
E) Measure machine performance
The final step assumes adoption of the most advantageous alteration. Within this element of the model the analyst is asked to monitor the actual activity of the mechanism against the forecast to derive both the efficacy of the implementation and the verisimilitude of the model. Within this model the analyst’s work is considered complete once an evidence-based change has been enacted within the organisation.
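Monitoring actual activity against the forecast can be as simple as tracking a handful of error measures over time. The following is a minimal sketch under stated assumptions: the function name `forecast_errors` and the figures are hypothetical, and the percentage error assumes that no actual value is zero.

```python
def forecast_errors(forecast, actual):
    """Compare forecast values with observed values, period by period."""
    if len(forecast) != len(actual) or not forecast:
        raise ValueError("forecast and actual must be non-empty and equally long")
    errors = [a - f for f, a in zip(forecast, actual)]
    n = len(errors)
    return {
        # Positive bias: the machine outperformed the model on average.
        "bias": sum(errors) / n,
        # Mean absolute error, in the units of the outcome.
        "mae": sum(abs(e) for e in errors) / n,
        # Mean absolute percentage error (assumes non-zero actuals).
        "mape": sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
    }

# Hypothetical monthly revenue: modelled before the change, observed after it.
report = forecast_errors(forecast=[100, 110, 120], actual=[90, 110, 130])
```

A bias near zero alongside a large absolute error would suggest that the model is right on average but poor period by period, which speaks to its verisimilitude rather than to the efficacy of the change.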
The Analysis Pathway Model
The Analysis Pathway Model charts an evolving list of outputs that the data analyst could reasonably be expected to produce. Every element of the model, A through H, is necessary for the analysis to be considered in any way complete. This is a model that has been in development for some time and should form the basis for most commercial analyses.
Just as a house of cards collapses with the withdrawal of a single strut, the model proposes a list of interdependent elements that span most analytical activity, and the removal of any one leaves a product that is merely theoretical. To remove a single point here is to introduce a potential instability that may go unaddressed by the remaining elements.
The analyst deploying this model may wish to use this structure as both a backbone for the project plan and a delivery format for the final product, depending on the environment and audience, as the information flow has a natural divergent-then-convergent format that facilitates productive communication.
A) Describe the data
The data landscape must be articulated by the analyst so that subsequent deductions are conducted within a clear and common field of understanding. It is crucial that the available, and unavailable, data are described so that the following analytical effort can be fully understood. The analyst might, at this point, describe the (say) four data sets that will be utilised, where the data come from, the flaws within the data and the eccentricities of the information. Through this activity the remaining analytical effort is grounded within a known inertial frame of reference.
B) Show the aggregate
This stage represents the total analytical effort in too many cases. The statement of the mean, mode, harmonic mean… and so on, is the culmination of the thrifty analytical glimpse. Though the assertion of the aggregate, of one form or another, is plainly not without value, this step must be deliberate in its choice of descriptive statistics and in its contextualisation within the subsequent steps in the model.
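To illustrate why the choice of aggregate should be deliberate, the sketch below places several of them side by side using Python's standard `statistics` module. The function name `summarise` and the order values are hypothetical.

```python
from statistics import harmonic_mean, mean, median, multimode

def summarise(values):
    """Return several aggregate statistics side by side for one data set."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # every most-common value, not just the first
        "harmonic_mean": harmonic_mean(values),  # rejects negative values
    }

# Hypothetical order values; the single large order drags the mean upwards.
order_values = [20, 25, 25, 30, 400]
summary = summarise(order_values)
```

Here the mean is 100 while the median and mode are both 25, so the ‘typical’ order depends entirely on which aggregate is reported.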
C) Show distribution
So the data collect around some point or another, but to what extent? This stage asks the analyst to describe the form of the distribution. This is, within the context of the model, not just the step after B but rather a necessary component of B. Truly, the suggestion of an aggregate statistic must be balanced with some description of distribution in order to be considered useful. The challenge here is not the generation of the statistic but rather the communication of the significance of the measure.
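A short sketch of what describing the distribution might involve, again using only the standard library. The function name `describe_spread` and both data sets are hypothetical; they are chosen so that the aggregate is identical while the spread is not.

```python
from statistics import mean, quantiles, stdev

def describe_spread(values):
    """Summarise how widely the values are scattered around their centre."""
    q1, q2, q3 = quantiles(values, n=4)  # the three quartile cut points
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,            # spread of the middle half of the data
        "stdev": stdev(values),    # sample standard deviation
    }

# Two hypothetical data sets with the same mean but very different spreads.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]
tight_spread = describe_spread(tight)
wide_spread = describe_spread(wide)
```

Both sets have a mean of 50, yet a decision based on the second carries far more uncertainty; communicating that difference is the substance of this step.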
D) Show comparisons
We now evolve the analysis beyond the first set of observations and contextualise our findings within comparable data points. This is where the analyst seeks a dimensionalisation of the initial findings. Examples of these dimensions might be temporal, geographical or segmental in some other fashion. I shall explore the dimensional analytical space in future articles as this topic demands a particular attention.
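As a small illustration of dimensionalisation, the same measure can be aggregated along different dimensions and the results compared. The function name `mean_by`, the field names and the records below are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_by(records, dimension, measure):
    """Average a measure within each distinct value of a dimension."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[measure])
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical revenue records with a geographical and a temporal dimension.
records = [
    {"region": "North", "quarter": "Q1", "revenue": 100},
    {"region": "North", "quarter": "Q2", "revenue": 120},
    {"region": "South", "quarter": "Q1", "revenue": 80},
    {"region": "South", "quarter": "Q2", "revenue": 60},
]
by_region = mean_by(records, "region", "revenue")
by_quarter = mean_by(records, "quarter", "revenue")
```

In this example the quarterly view is flat (90 in both quarters) while the regional view reveals a gap between North and South, so the choice of dimension determines what the comparison is able to show.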
E) Show confidence
Assertion of confidence is necessary at this juncture as we are about to begin to assemble our findings. This step represents a moment to ensure that the error margins are established and quantified such that subsequent assertions are moderated in their potency given the analytical landscape. Without this step the recipients of the analytical products would be unable to ‘hedge their bets’ as is appropriate.
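One common way to quantify an error margin is a confidence interval around a mean. The sketch below uses a normal approximation, which is an assumption on my part and not the only reasonable choice; for small samples a t-distribution would give a somewhat wider interval. The function name `mean_confidence_interval` and the sample are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(values)
    centre = mean(values)
    standard_error = stdev(values) / sqrt(n)
    # Two-sided critical value, e.g. roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error

# Hypothetical sample of a measured outcome.
sample = [48, 49, 50, 51, 52]
low_95, high_95 = mean_confidence_interval(sample, confidence=0.95)
low_99, high_99 = mean_confidence_interval(sample, confidence=0.99)
```

Asking for more confidence widens the interval, which is precisely the trade-off the recipients need to see before acting.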
F) Derive meaning
It is at this point that the analyst is asked to extend their capacity beyond the mere presentation of fact and to synthesise the information into a coherent narrative. Within many organisations this step is regretfully ignored as analysts present data visualisations and ask the commissioners of the analysis to draw their own conclusions. This model calls for the analyst to pursue the truth through the presentation of facts and to present to their audience an assertion of meaning. It should be noted that these assertions are, of course, moderated by the confidence assertions made in the previous step.
G) Extrapolate future
Most analysis within commercial (and governmental perhaps) enterprises is deployed in decision-making processes. This then means that the findings produced by an analyst are indulged insofar as they are useful in the prediction of future conditions. The analyst must then take the opportunity to extrapolate the findings into the future so that the organisation is able to make optimal choices.
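The simplest possible extrapolation is a straight-line trend fitted by least squares and projected forward. This is a minimal sketch only, assuming evenly spaced periods and a roughly linear history; the function names `fit_trend` and `extrapolate` and the figures are hypothetical, and real forecasting would usually need to account for seasonality and for the confidence established earlier.

```python
def fit_trend(ys):
    """Least-squares straight line through evenly spaced observations."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)  # periods are numbered 0, 1, 2, ...
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def extrapolate(ys, periods_ahead):
    """Project the fitted line over the next few periods."""
    slope, intercept = fit_trend(ys)
    n = len(ys)
    return [intercept + slope * (n + k) for k in range(periods_ahead)]

# Hypothetical quarterly customer counts, projected two quarters ahead.
history = [10, 12, 14, 16]
projection = extrapolate(history, periods_ahead=2)
```

The further the projection runs beyond the observed history, the less weight it deserves, so any extrapolation should be presented alongside its error margins.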
H) Articulate analytical opportunity
The model calls, lastly, for an articulation of the subsequent analytical opportunity. This is the moment at which the analyst can pitch the absences and weaknesses of their own products in the context of the total business opportunity. If the analyst has evidence of lacking data or transformational capacity, then this moment can be utilised to build contextual requests.
The Findings Defence Model
The Findings Defence Model suggests that the analyst reflect on the way in which the analytical product is received at the conclusion of the activity. It is quite normal for data analysis to be greeted with a barrage of concerns and questions, and this model isolates three areas of these for particular attention.
In this model we imagine that the analysis and findings exist as a central castle walled in by three defensive structures. Each structure is assaulted by a particular concern and the analyst is asked to build the product in such a way as to either defend against these missiles or deter the attack entirely.
It should be noted that this model posits the combative environment purely as metaphor and the manifest conflict is left to the users to determine. Just as the human eye finds its resting place between two opposing forces, this model hopes to balance the reasonable concerns of the organisation against the reasonable propositions of the analyst.
A) People looking for answers asking “So what?”
These people are keen to find the meaning in the analysis. The need here is to ensure that the synthesis of the data is meaningful and deployable. It is a shame that too many analysts present a selection of graphs to an audience without a series of express conclusions. The model suggests that unfulfilled recipients will require a conclusion from the analyst; this is entirely reasonable if the analyst has not taken ownership of the conclusion and considered the final empirical judgement to be part of their output.
B) People looking for problems asking “What about?”
“What about” is something of a canary in the dysfunctional coal mine. I would suggest that if you encounter such a conversation you refer to the essay on the dysfunctional measurement. Although this line of questioning is normal within most organisations, it illuminates either a failing in the analytical scope or a communication problem throughout the organisation. If, at the moment of conclusion, the analyst is presented with a new parameter… something has gone wrong.
C) People looking for action asking “Are you sure?”
Finally, we arrive at a relatively simple yet pervasive issue: the problem of confidence. Simply put, it is unreasonable for organisations to act on data without some assertion of confidence. As a result, if these questions are being asked of the analyst, it is clear that the confidence has not been articulated clearly.
The models provided here may assist in the formation of a general understanding of the nature and expectations of the discipline of data analysis within contemporary commerce. Though these models do not represent a comprehensive articulation of the discipline it is hoped that these three ideas might be humbly offered to the analytical masses and deployed to whatever degree is most appropriate.