Microsoft is releasing the Team Data Science Process (TDSP), an "agile, iterative, data science methodology to improve collaboration and team learning." The launch of the methodology is accompanied by a set of utilities meant to help companies better organize their data. Microsoft's blog post covers TDSP in detail, but these key points give a pretty good idea of what the process is about:
- A data science lifecycle definition.
- A standard project structure, including a well-defined directory hierarchy and a list of output artifacts in a standard document template structure that are stored in a versioned repository.
- A shared and distributed analytics infrastructure.
- Productivity tools and utilities for data scientists. These simplify adherence to the process by automatically producing project artifacts and providing scripts for common tasks such as the creation and management of repositories and shared analytics resources.
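To make the "standard project structure" and "scripts for common tasks" points more concrete, here is a minimal sketch of what a scaffolding utility of this kind might look like. The directory names and the `scaffold` function are hypothetical, chosen only for illustration; they are not taken from TDSP's actual template or tooling.

```python
from pathlib import Path
from typing import Dict, List
import tempfile

# Hypothetical layout for illustration only; NOT TDSP's actual template.
LAYOUT: Dict[str, List[str]] = {
    "code": ["data_prep", "modeling", "deployment"],
    "docs": ["project", "data_report", "model_report"],
    "sample_data": [],
}


def scaffold(root: Path, layout: Dict[str, List[str]] = LAYOUT) -> List[Path]:
    """Create the directory tree under `root` and return the paths, sorted."""
    created: List[Path] = []
    for top, children in layout.items():
        top_dir = root / top
        # exist_ok=True makes the call safe to repeat on an existing project.
        top_dir.mkdir(parents=True, exist_ok=True)
        created.append(top_dir)
        for child in children:
            sub_dir = top_dir / child
            sub_dir.mkdir(exist_ok=True)
            created.append(sub_dir)
    return sorted(created)


if __name__ == "__main__":
    # Build the tree in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp)):
            print(path.relative_to(tmp))
```

The idea is simply that every project starts from the same tree, so teammates know where code, documents, and sample data live without having to ask.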
To put things simply, this methodology and the tools that go along with it are meant to help teams organize data in a way that makes sense. Should TDSP prove to be effective, it could put an end to a lot of employee confusion about where and how data is accessed. Unsurprisingly, TDSP is entirely open source and available right now on GitHub, so interested parties should hop over to the official page and see whether they can make use of it.