Pioneering the Data Production Line
Data is run like a factory in the pre-Ford era. What can we learn from how Ford changed manufacturing?
Last year, I commented in a LinkedIn post that I didn't understand why the data profession wasn't maturing as fast as comparable areas like software development. The responses showed that I wasn't the only person puzzled by this.
As a profession, data struggles with a lack of standard definitions for essential concepts like data product and data literacy, and with the absence of a recognised taxonomy of data roles (one person's data scientist is another person's statistician). In a quick LinkedIn poll, 90% of respondents said this is a problem.
Given how much time data professionals spend discussing the importance of data quality, we ought to be as embarrassed by this as a finance function that didn't know it had overspent its budget, or an HR function that had no training plan or job framework.
So what's the cause? Why is data behind software development, which has recognised standards, ways of working and team structures (or team topologies)?
The starting point is to recognise that data didn't start on a greenfield site like software did. People have always processed, analysed and managed data in companies, even before computing. Back then it was all done on paper records, and businesses of any size and complexity depended on proper filing and record-management systems. Sadly, as computing took off, many of those disciplines disappeared and weren't replaced. As a result, most data sets today are largely unmanaged. There are ERPs and other well-managed systems, but they are the exception, and most data sets sit outside them.
With this starting point, manufacturing is the best area for data to learn from. After all, there is a lot of discussion in data about moving to data products, so let's take lessons from more sophisticated production environments.
In manufacturing terms, data products are most definitely produced in the craft method, one by one, with a high degree of variability. That might be OK if we only want a small number of data teams producing a small number of data products. However, that isn't the case. As we enter the AI era, data teams in all business functions need to increase throughput and deliver against a growing range of use cases.
This brings us to this article's title: data is in a pre-Ford era. Hence the header image (courtesy of Midjourney) of data engineers and scientists working in a 1920s factory.
Henry Ford moved automotive production out of craft methods and into mass production. He said that at the start, they "simply started to put a car together at a spot on the floor. Workmen brought parts to it as they were needed... The undirected worker spends more of his time walking about for materials and tools than he does in working."
Compare this to the data situation most data workers find themselves in. The majority of data sets they need are undocumented and often hard to liberate from out-of-date systems or, worse, spreadsheets. To rephrase Ford, the undirected data worker spends more of their time going between functions to find data or access the proper tooling than they do working.
Back in the pre-Ford era, workers also spent time filing and grinding parts so that they would fit together; the parts weren't engineered and manufactured to a high enough quality. Ford invested in precision tooling to solve this. Not only did it speed up production on the line by removing the need to file parts, but it was so revolutionary that Ford used it in marketing campaigns: "You might travel around the world in a Model T and exchange crankshafts with any Model T that you met en route." The idea of interchangeable parts was a revolution.
In what way are these issues different from today's data quality challenges? Instead of filing and grinding parts, data workers spend their time wrangling data to make it fit together and suit the intended use case.
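To make the analogy concrete, here is a minimal, hypothetical sketch of what precision tooling for data might look like (the dataset, field names and checks below are all invented for illustration): instead of every downstream team hand-filing records to fit, producing teams validate records against a shared specification at the source.

```python
# A minimal, hypothetical sketch of "interchangeable parts" for data:
# datasets are checked against a shared, agreed specification before
# they enter production, rather than being hand-wrangled downstream.
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerRecord:
    """An agreed 'part specification' every producing team must meet."""
    customer_id: str
    signup_date: date
    country_code: str  # ISO 3166-1 alpha-2


def validate(raw: dict) -> CustomerRecord:
    """Reject out-of-spec records at the source, not downstream."""
    if len(raw.get("country_code", "")) != 2:
        raise ValueError(f"country_code must be 2 letters: {raw!r}")
    return CustomerRecord(
        customer_id=str(raw["customer_id"]),
        signup_date=date.fromisoformat(raw["signup_date"]),
        country_code=raw["country_code"].upper(),
    )


# An in-spec record passes; anything else fails fast at the boundary.
print(validate({"customer_id": "42", "signup_date": "2023-05-01", "country_code": "gb"}))
```

Whatever the specific tooling, the design choice is the same one Ford made: invest in making the parts to spec, so that downstream assembly no longer depends on filing and grinding.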
So how did Ford progress? What he didn't do was rely on the manufacturers of tooling to come up with all the answers. He took advantage of the improved quality of the parts those tools produced to rethink and redesign the manufacturing processes completely. He experimented with different ideas. He tried moving from a team of 15 making a car to one person dedicated to making a car. Eventually, he landed on the innovation we are all familiar with, the production line and conveyors.
The changes Ford made reduced the time to build a Model T from twelve and a half hours (750 minutes) to 93 minutes, roughly an eightfold improvement in productivity.
What lessons should we take from this? We can't rely on the data technology vendors to create our production processes; it isn't their job. We need CEOs brave enough to experiment radically with different ways of structuring the data workforce and running the data processes in their organisations. The prize for the company that succeeds is huge: it will become the leader in AI adoption and achieve the unparalleled efficiency and agility needed to lead its sector.
The role of the CEO in data won't end there, either. If we look at modern manufacturing organisations, the board remains a sponsor of manufacturing efficiency. The board discuss productivity metrics, and board members are often seen on factory tours where they ask the workers if they have the tooling they need, if there are issues with the processes and if they have had sufficient training.
If we professionalise data, all of this will need to be true for data too. We can start by inviting board members to visit our data teams in the same way: not to attend glamorised presentations showing off overly polished products and visualisations, but to drop in on a sprint review and take part in honest discussions about challenges with data quality, tooling and investment in training.
Another lesson we can take from manufacturing is to involve analysts, data engineers and others in selecting and refining data tooling, in the same way that factories test tooling on the line and involve factory workers in decisions. At the moment, data technology decisions are too heavily influenced by IT. IT has an absolutely vital role to play in the development of data production systems; I'll write more about this in future articles, but for the purposes of this article, its role is analogous to that of logistics and facilities in manufacturing. These are essential functions: they run the warehouse, ensure data availability from core systems, and set up and maintain the technology used to develop data products. However, in too many companies, IT is the only voice in procurement discussions on data technology. That is like letting facilities and maintenance teams decide what tools are used in the factory. It just doesn't make sense.
Hopefully, this analogy helps to explain why data is still such an immature profession. To mature the profession faster, we will need two things. We will need brave CEOs competing to become the Henry Ford of the data and AI era by experimenting with different processes and org designs for their data teams. And we will need more open innovation in the data profession: openly sharing learnings on data processes and roles, and driving standards and common definitions. To move fast and incentivise collaboration, this needs to be led by a non-profit with no purpose other than professionalising data.
Please let me know your thoughts in the comments. And please get in touch if you know any CEOs who want to be the Henry Ford of the data and AI era, or who believe that data needs a non-profit to drive collaboration and help professionalise the field.