There is huge interest in data mesh right now. In February 2022, I was at the Starburst virtual event and the Thoughtworks State of Data Mesh 2022 virtual event. They were both interesting and well-produced, looking at data mesh from different viewpoints. Here's one of my conclusions, with a historical parallel.
Data Mesh
Zhamak Dehghani, the originator of Data Mesh, gave the first presentation of State of Data Mesh 2022. She stated its four principles: domain ownership, data as a product, self-serve data infrastructure, and federated computational governance. Domain ownership puts the technology – particularly the data technology – in the business units. This can result in data siloing, which leads to the second principle: data as a product. The third principle – self-serve data infrastructure – addresses the problem of cost that arises from having to have technical specialists in the business teams. The final principle, federated governance, is needed so that all this can be done without compromising security and privacy.
Although it is often promoted as a technical solution, Data Mesh is really about the people, not the technology. Emily Gorcenski, Head of Data for Thoughtworks Germany, put this very well in a panel discussion. She said that everyone has bottlenecks in their centralised platforms, but these are not due to the technology; they are due to the ability of people to do the work. Data Mesh can solve the bottleneck issues, not by decentralising the platform, but by shifting the responsibilities and the burden of ownership to the right parties.
Data Products
Let's look more closely at what this means in practice. It's all very well to talk about shifting the burden, but will the business departments accept it and be happy to take it on?
I am a data producer on a very small scale. Some years ago, my wife and I bought a field. It is a pleasure and privilege to be part of the rural ecosystem, but we don't do it for fun; it is a part of our pension arrangements, and we expect a return on our investment. I can't drive a tractor, but I do have one essential agricultural skill: the ability to fill in forms. I'm fortunate to know people with farming expertise, who can do the other jobs.
One of the forms I fill in records the nitrogenous fertilizer applied to the field. These records help the government monitor and control the build-up of potentially carcinogenic nitrates in tap water. I fully sympathize with this, but it calls for effort that doesn't contribute to my bottom line. I would in any case record purchases of fertilizer and the cost of applying it. I wouldn't record which packs of fertilizer are applied when, or how much nitrogen they contain, if it weren't for the Nitrate Vulnerable Zone (NVZ) regulations. They require a data model that I wouldn't otherwise use, and data that I wouldn't otherwise collect.
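To make the difference concrete, here is a rough Python sketch of the two kinds of record. It is only an illustration: the class names, field names, and figures are hypothetical, and are not taken from the NVZ regulations or from any real form.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FertilizerPurchase:
    """The record I would keep anyway: what was bought and what it cost."""
    purchased_on: date
    product: str
    cost: float  # purchase cost, in whatever currency the accounts use


@dataclass(frozen=True)
class NitrogenApplication:
    """The extra record: which pack was applied when, and its nitrogen content."""
    applied_on: date
    pack_id: str               # hypothetical identifier for a pack of fertilizer
    pack_weight_kg: float
    nitrogen_fraction: float   # share of the pack weight that is nitrogen

    def nitrogen_kg(self) -> float:
        return self.pack_weight_kg * self.nitrogen_fraction


def total_nitrogen_kg(applications, year):
    """Sum the nitrogen applied in one calendar year."""
    return sum(a.nitrogen_kg() for a in applications if a.applied_on.year == year)


# Made-up example data, for illustration only.
applications = [
    NitrogenApplication(date(2021, 3, 15), "pack-001", 500.0, 0.25),
    NitrogenApplication(date(2021, 5, 2), "pack-002", 200.0, 0.5),
    NitrogenApplication(date(2022, 3, 20), "pack-003", 500.0, 0.25),
]
print(total_nitrogen_kg(applications, 2021))
```

The purchase record serves my own accounts. The application record exists only for the outside consumer, and it needs fields, such as the nitrogen content per pack, that my accounts would never ask for.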
So I can sympathize with a department manager who is asked to take responsibility for a "data product" that will be used outside the department. It sounds harmless, but is likely to lead to requests for "Could you please include so-and-so?" and "Why can't you present it this way?" After all, if you supply a product, you should make your customers happy. That's part of the point of Data Mesh.
Product Responsibility
Most farmers think of NVZ records as a chore, and wouldn't produce them if it weren't a legal requirement. We do produce them, but not because we are scared of being punished. I have never heard of a prosecution for breaking the regulations, and suspect that fines are very rare. We produce them because we understand that it is part of our job. Making it a legal requirement removes any confusion about this and ensures that, although there are always lots of other things to think about, the NVZ records actually get done.
In a corporate environment, it has to be clear that delivery of a data product is part of the responsibility of the department that owns the data. How this is made clear depends on the corporate culture. It could be a formal part of the salary reviews of the department managers. It could be a decision reached at an inter-departmental meeting. It could be an informal word from the CEO. It can't be just a request from the person or group that wants to use the data.
Central Governance
Data Mesh needs some form of central co-ordination, to establish responsibility for delivering data products, but data mesh enthusiasts keep talking about the need to get rid of central governance. What's the real issue here?
It's the data management equivalent of "No taxation without representation!"
On December 16, 1773, at Griffin’s Wharf in Boston, Massachusetts, British colonists in America dumped 342 chests of tea, on which they were expected to pay tax, into the harbor. They were actually paying lower taxes than people in Britain, but the taxes were set by a parliament in London where they were not represented. They were protesting, not about having to pay tax on the tea, but about not being consulted on decisions that affected them.
Taking decisions without talking to affected stakeholders is never a good plan.
- It leads to bad decisions. The result of the decisions taken by the London parliament was that no tax was collected, as the Americans refused to import the tea. If they had been represented, the tax on tea might have been designed in a fairer way, and the British government might have collected some of it.
- It leads to decisions being ignored. The colonists could do without the British tea; they were drinking tea smuggled from Holland.
In data management, a decision that leads to a department spending many hours to produce data for a report that isn't read is a bad decision. A decision that leads to incorrect data being produced because the department doesn't spend enough effort on it could be worse. Shared understanding of the concerns of the producers and consumers of data helps enterprises avoid these mistakes.
Stakeholder Representation
The data mesh federated governance model includes representatives of affected departments in the decision process. It is in some ways similar to the idea of an Architecture Board, which has proved successful in many enterprises. Depending on the enterprise culture, there may be other organizational structures that work. The key point is that stakeholder concerns are addressed in taking the decision. This leads to better decisions, and to commitment by the people who carry them out.
Data products are a great idea for managing enterprise data but, while they may be free to their consumers, they aren't easy. Imposing them without consultation will lead to trouble.