Our common approach to exploring a data science opportunity begins with an idea from a business or technology leader. The business has identified a use case and, often, a source of data. The objective is to ‘build a model’ that demonstrates enough potential value, with the long-term goal of establishing a business case for further investment. Currently, over 75 percent of data science proofs of concept do not reach a production implementation. The reasons for this failure rate vary. Our process for executing a proof of concept is designed to align the business objectives with the four underlying starting components: Use Case, Data, Talent, and Technology. Each is oriented toward a fixed duration, fast delivery, and simple technology to expose the potential.
An effective data science process is one that maximizes the likelihood of success for a data science project. To achieve this, it borrows from two methodologies: agile software development and the scientific method.
Define Use Case, Define Data Source, and Ingest Data – 1 week
- The first step in the data science process is to develop context around the problem and begin to shape viable use cases. This part should be deeply business focused: it creates awareness around the problem and defines what a successful outcome is.
- Deliverables: 9-Step Use Case Definition, data dictionary, static data pipeline to ingest data, basic visualization
Explore Data and Hands-on Workshop – 1 week
- Exploratory Data Analysis (EDA) is where the data scientist will typically spend most of their time on any project. The goal is to develop a firm understanding of the data and its nuances, and to begin considering how the data can be further enriched so that it is ready for predictive modeling. Raw data fed into a model will often produce spurious or, even worse, misleading results.
- Deliverables: A data dictionary that captures the detail and distribution of each feature as well as the target objective, plus an exploration of the correlations and relationships between data elements.
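As a rough illustration of the EDA deliverable described above, the sketch below builds a small data dictionary (count, missing values, and distribution summary per feature) and computes each feature's correlation with the target. The column names (`monthly_visits`, `support_tickets`, `churned`), the sample values, and the helper functions `describe_feature` and `pearson` are all hypothetical and exist only for this example. A real engagement would work from the client's ingested data and would likely use a dataframe library rather than plain Python.

```python
import math
import statistics

def describe_feature(name, values):
    """Summarize one numeric column as a data dictionary entry."""
    present = [v for v in values if v is not None]
    return {
        "feature": name,
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
    }

def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.mean(x for x, _ in pairs)
    my = statistics.mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy) if sx and sy else float("nan")

# Hypothetical sample data; None marks a missing value.
columns = {
    "monthly_visits": [12, 3, 8, None, 1, 15, 7, 2],
    "support_tickets": [0, 4, 1, 2, 5, 0, 1, 3],
    "churned": [0, 1, 0, 0, 1, 0, 0, 1],
}
TARGET = "churned"  # assumed target objective for this illustration

# One dictionary entry per column, including the target.
data_dictionary = [describe_feature(name, vals) for name, vals in columns.items()]

# Relationship of each feature to the target.
target_correlations = {
    name: pearson(vals, columns[TARGET])
    for name, vals in columns.items()
    if name != TARGET
}

for entry in data_dictionary:
    print(entry)
print(target_correlations)
```

Even on a toy table like this, the summary surfaces the kind of nuance the EDA phase is meant to catch, such as a missing value in one feature and features that move in opposite directions relative to the target.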
Develop Preliminary Model – 2 weeks
- This is where the predictive modeling effort starts. The team builds, analyzes, and modifies several models until the desired outcome is reached or the best possible result is achieved under the current data conditions. It is expected that the data science team will iterate between EDA, Data Transformation, and Modeling several times during this phase of the process.
- Deliverables: An in-depth presentation of the code and results from the candidate model object, and documentation outlining the rationale behind its selection, assessment, and development.
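The build-and-compare loop in this phase can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the method used on any particular engagement: it generates synthetic data, fits two hypothetical candidates (a mean-only baseline and a one-variable least-squares line), scores both on a held-out split using mean absolute error, and keeps the better one. The function names, the 80/20 split, and the choice of metric are all assumptions made for the example.

```python
import random
import statistics

def fit_mean_baseline(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = statistics.mean(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Second candidate: one-variable ordinary least squares."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    return statistics.mean(abs(model(x) - y) for x, y in zip(xs, ys))

# Synthetic data for illustration only: a noisy linear relationship.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 2.0 + rng.gauss(0, 1.0) for x in xs]

# Hold out the last 20% of rows so candidates are judged on unseen data.
split = int(0.8 * len(xs))
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

candidates = {"mean_baseline": fit_mean_baseline, "linear": fit_linear}
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = mean_absolute_error(model, test_x, test_y)

# The candidate with the lowest held-out error is carried forward.
best = min(scores, key=scores.get)
print(scores, "selected:", best)
```

Keeping a simple baseline among the candidates gives the selection rationale a concrete reference point: the documentation can state how much the chosen model improves on the simplest possible prediction.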
We look forward to helping you implement new, positive initiatives in your company.