Why Managing Data Science Like Engineering Leads to Failure
In many organizations, there’s a tendency to manage data science projects using the same frameworks and expectations as engineering projects. This approach, while tempting, often leads to failure and frustration. The core issue stems from fundamental differences between these two disciplines. While engineering focuses on executing known solutions within defined boundaries, data science is rooted in experimentation, discovery, and often, unpredictable outcomes.
Here’s a deeper look at why these two approaches are incompatible and how to rethink your strategy when managing data science initiatives.
The Nature of Data Science: Known Scope, Unknown Solution
Experimentation and Discovery
At its core, data science involves solving problems using data. However, unlike engineering, where the solution is often clear from the outset, data science is inherently exploratory. In many cases, the exact path to solving a problem is unknown at the beginning. Data scientists need to experiment with different techniques, models, and datasets to discover what works. The process is iterative, and even when a model is found, it may need continuous tuning and validation before yielding useful results.
- Unpredictable timelines: Because the solution is unknown, predicting how long it will take to find a viable outcome can be extremely difficult. You might spend weeks testing a hypothesis, only to realize it’s not viable and have to pivot to a new approach.
- High uncertainty: Data science projects often start with a clear question but little idea of how that question will be answered. For example, predicting customer churn might be the goal, but the best algorithm or the features required may not be clear until the data has been explored extensively.
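To make the timeline problem concrete, here is a small Monte Carlo sketch. It assumes, purely for illustration, that each hypothesis takes a fixed two weeks to test and independently turns out to be viable with probability 0.3; real projects have neither fixed costs nor independent attempts. The function names and parameter values are hypothetical.

```python
import random
import statistics


def weeks_until_viable(p_success, weeks_per_attempt, rng):
    """Simulate one project: test hypotheses until one turns out to be viable.

    Assumes each attempt costs a fixed number of weeks and succeeds
    independently with probability p_success (illustrative only).
    """
    weeks = 0
    while True:
        weeks += weeks_per_attempt
        if rng.random() < p_success:
            return weeks


def simulate(p_success=0.3, weeks_per_attempt=2, runs=10_000, seed=42):
    """Summarize the spread of completion times across many simulated projects."""
    rng = random.Random(seed)
    outcomes = sorted(
        weeks_until_viable(p_success, weeks_per_attempt, rng) for _ in range(runs)
    )
    return {
        "best_case_weeks": outcomes[0],
        "median_weeks": statistics.median(outcomes),
        "p90_weeks": outcomes[int(0.9 * runs) - 1],  # 90th-percentile project
        "worst_seen_weeks": outcomes[-1],
    }


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the best case is a single two-week attempt, while the slow tail of projects takes several times longer than the typical one. That spread, not any single number, is what an honest up-front estimate has to communicate.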
The Iterative Nature of Data Science
Unlike engineering, where the path to a solution is generally straightforward once the constraints are understood, data science is non-linear. It often requires multiple iterations of data cleaning, feature engineering, and model development. As a result, time estimates for completing a project can feel like “throwing darts in the dark.” You might hit the mark early, or you might need to explore numerous dead-ends before arriving at a functional solution.
- Testing multiple models: A key part of data science is experimentation. Data scientists may test many algorithms (e.g., decision trees, neural networks, random forests) before finding one that performs well enough on the problem at hand.
- Feature engineering cycles: Data scientists also spend a large amount of time identifying and creating useful features, which may involve many cycles of trial and error.
The unpredictable nature of these tasks means that project timelines are fluid and hard to pin down at the outset.
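The model-testing loop described above can be sketched in plain Python. This is a toy, standard-library-only illustration: the churn dataset is synthetic (the `tenure_months` and `support_tickets` features and the labelling rule are assumptions), and simple threshold rules stand in for the decision trees, neural networks, and random forests a real team would try. Each candidate is scored with 5-fold cross-validation so the comparison uses held-out data rather than training accuracy.

```python
import random
from statistics import mean

# Synthetic churn data (hypothetical): x = (tenure_months, support_tickets), y = 1 if churned.
TENURE, TICKETS = 0, 1


def make_data(n=500, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = (rng.randint(1, 60), rng.randint(0, 10))
        # Assumed ground truth: short tenure AND many tickets -> churn, plus 10% label noise.
        y = int(x[TENURE] < 24 and x[TICKETS] >= 6)
        if rng.random() < 0.1:
            y = 1 - y
        rows.append((x, y))
    return rows


def accuracy(predict, rows):
    return mean(int(predict(x) == y) for x, y in rows)


def fit_majority(train):
    """Baseline: always predict the most common label in the training fold."""
    label = int(mean(y for _, y in train) >= 0.5)
    return lambda x: label


def fit_threshold(feature, churn_if_high):
    """Single-feature cutoff, with the cutoff chosen on the training fold."""
    def fit(train):
        def rule(t):
            return lambda x: int((x[feature] >= t) == churn_if_high)
        cutoffs = sorted({x[feature] for x, _ in train})
        return rule(max(cutoffs, key=lambda t: accuracy(rule(t), train)))
    return fit


def fit_two_feature_rule(train):
    """Grid-search a 'tickets >= a and tenure < b' rule on the training fold."""
    def rule(a, b):
        return lambda x: int(x[TICKETS] >= a and x[TENURE] < b)
    grid = [(a, b) for a in range(11) for b in range(6, 61, 6)]
    return rule(*max(grid, key=lambda ab: accuracy(rule(*ab), train)))


def cross_val_accuracy(fit, rows, k=5):
    """Average held-out accuracy over k folds."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(fit(train), folds[i]))
    return mean(scores)


CANDIDATES = {
    "majority_class": fit_majority,
    "tickets_threshold": fit_threshold(TICKETS, churn_if_high=True),
    "tenure_threshold": fit_threshold(TENURE, churn_if_high=False),
    "tickets_and_tenure_rule": fit_two_feature_rule,
}

if __name__ == "__main__":
    data = make_data()
    scores = {name: cross_val_accuracy(fit, data) for name, fit in CANDIDATES.items()}
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:<24} {score:.3f}")
```

In a real project the candidate list, the engineered features, and sometimes the evaluation metric itself change between iterations, which is why a loop like this usually runs many times before a model is chosen.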
The Nature of Engineering: Known Solution, Unknown Constraints
Structured and Predictable Processes
Engineering, in contrast, operates on a more structured and predictable model. Once the scope of a project is well-defined, the solution is typically known. For example, if you’re building a web application or setting up an infrastructure pipeline, the architecture, tools, and technologies are usually pre-determined based on industry standards and best practices.
- Known outcomes: In engineering, there is typically a clear understanding of the desired outcome. Even if challenges arise during the project (e.g., scaling issues, infrastructure limitations), the path forward is generally understood, and solutions can be quickly identified and implemented.
- Clear timelines: Since the solution is known, engineering teams can create clear roadmaps, breaking down tasks into milestones. These milestones help maintain progress and ensure deadlines are met.
Unknown Constraints in Engineering
While engineering solutions are often well-defined, the constraints are frequently unknown. For example, an application might need to handle more traffic than anticipated, or certain integrations might present unexpected challenges. However, because the process is structured and the solution is defined, these constraints can typically be addressed as they arise without fundamentally changing the direction of the project.
In short, engineering focuses on executing a known solution and managing the constraints that arise during that process.
Key Differences Between Data Science and Engineering Projects
| Aspect | Data Science | Engineering |
|---|---|---|
| Scope | Known scope, but unknown solution | Known solution, but unknown constraints |
| Nature of work | Experimentation, discovery, and iteration | Structured, planned, and predictable |
| Timelines | Fluid and unpredictable | Clear and measurable |
| Process | Non-linear and exploratory | Linear, with well-defined steps |
| Goal | Create insights, predictions, and models | Build systems, infrastructure, and applications |
| Project management | Flexible, adaptive, and iterative | Rigid and structured with clear milestones |
Why Applying Engineering Management to Data Science Sets Projects Up to Fail
Misaligned Expectations
Managing data science with an engineering mindset often leads to frustration due to misaligned expectations. Project managers may expect clear timelines, structured tasks, and predictable results—characteristics of traditional engineering workflows. However, in data science, projects are often subject to change as new insights emerge and new hypotheses are tested. This misalignment can create a cycle of unrealistic deadlines, scope creep, and project overruns.
- Pressure for early results: In engineering, it’s common to show early progress through clear deliverables. However, in data science, the first few weeks or months of a project might yield little in terms of tangible results, as the team is exploring data, testing hypotheses, and refining models.
- Constantly shifting goals: Data science projects often require adjusting goals based on intermediate findings. This can create confusion if stakeholders expect a rigid adherence to initial plans.
Resource Misallocation
Applying engineering timelines to data science can also lead to resource misallocation. Data science projects often need to pivot quickly based on experimental results, so resources committed up front to a particular approach may be wasted if those experiments fail. In engineering projects, by contrast, resource allocation is easier to plan because both the outcome and the process are well understood.
- Over-planning: Trying to plan every detail in advance for a data science project is counterproductive. Instead, the focus should be on creating short, iterative cycles where feedback and new discoveries are rapidly incorporated into the process.
- Fixed deadlines vs. flexible timelines: While fixed deadlines work well in engineering projects, they can be damaging to data science efforts, leading to incomplete or suboptimal models being rushed into production.
How to Manage Data Science Projects Effectively
Embrace Flexibility
Data science projects should be managed with flexibility at their core. Since the outcome is unknown, it’s important to adopt an iterative approach, allowing the team to explore different solutions and adapt as they learn from the data. Instead of rigid timelines, adopt a more agile methodology, where progress is measured in terms of experiments completed or insights gained rather than fixed milestones.
- Use short sprints: Break the project into smaller, manageable sprints with clear goals for each iteration. These sprints might involve testing new algorithms, refining data pipelines, or experimenting with feature sets.
- Regular check-ins: Frequent check-ins with stakeholders help manage expectations and keep them informed about the progress of experimentation.
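One way to report progress in terms of experiments rather than fixed milestones is to keep a simple experiment log per sprint. The sketch below is a hypothetical structure, not a standard tool: each entry records a hypothesis, the sprint it belongs to, and its outcome, and a rejected hypothesis counts as an answered question rather than a failure. The example hypotheses are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    PENDING = "pending"
    SUPPORTED = "supported"        # evidence backs the hypothesis
    REJECTED = "rejected"          # hypothesis ruled out: still a useful result
    INCONCLUSIVE = "inconclusive"  # needs a better-designed follow-up


@dataclass
class Experiment:
    hypothesis: str
    sprint: int
    outcome: Outcome = Outcome.PENDING
    learning: str = ""


@dataclass
class ExperimentLog:
    experiments: list[Experiment] = field(default_factory=list)

    def add(self, hypothesis: str, sprint: int) -> Experiment:
        """Register a new hypothesis to be tested in the given sprint."""
        exp = Experiment(hypothesis, sprint)
        self.experiments.append(exp)
        return exp

    def resolve(self, exp: Experiment, outcome: Outcome, learning: str) -> None:
        """Record the result of an experiment and what the team learned."""
        exp.outcome = outcome
        exp.learning = learning

    def sprint_summary(self, sprint: int) -> dict:
        """Count experiments by outcome; 'answered' = supported + rejected."""
        in_sprint = [e for e in self.experiments if e.sprint == sprint]
        counts = {o.value: 0 for o in Outcome}
        for e in in_sprint:
            counts[e.outcome.value] += 1
        answered = counts["supported"] + counts["rejected"]
        return {"sprint": sprint, "run": len(in_sprint), "answered": answered, **counts}


if __name__ == "__main__":
    log = ExperimentLog()
    a = log.add("Support tickets in the last 30 days predict churn", sprint=1)
    b = log.add("A more complex model beats the simple baseline", sprint=1)
    log.add("Billing history improves recall", sprint=1)
    log.resolve(a, Outcome.SUPPORTED, "Recent ticket count is a strong feature")
    log.resolve(b, Outcome.REJECTED, "No lift over the baseline; keep the simpler model")
    print(log.sprint_summary(1))
    # {'sprint': 1, 'run': 3, 'answered': 2, 'pending': 1, 'supported': 1, 'rejected': 1, 'inconclusive': 0}
```

A per-sprint summary like this gives stakeholders something concrete to review at each check-in, even in sprints where no model is ready to ship.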
Focus on Discovery, Not Just Deliverables
Data science projects should emphasize the process of discovery, not just the final deliverable. Stakeholders need to understand that the path to success is not always linear, and that the value of data science often lies in the insights gained during the experimentation process.
- Celebrate intermediate wins: Encourage the recognition of intermediate results, such as identifying key features or eliminating unviable models, even if the final solution is still in development.
Align Expectations Early
It’s essential to set realistic expectations with stakeholders at the start of any data science project. Make it clear that while the problem may be well-defined, the exact solution may take time to discover. Explain the iterative nature of the work and that adjustments will likely need to be made as the project progresses.
Conclusion
Treating data science like engineering is a recipe for failure. While engineering projects thrive on structure, known solutions, and clear timelines, data science projects require flexibility, experimentation, and a willingness to embrace uncertainty. By adopting an adaptive, discovery-focused approach to managing data science efforts, organizations can set their data teams up for success and unlock the full potential of their data-driven initiatives.