This document describes the principles of data sharing held by the FAASG consortium. This document is subject to approval by the FAASG steering committee. Any queries about this document should be sent to firstname.lastname@example.org.
FAASG recognizes that quickly sharing the data generated by the consortium with the wider community is a priority. Rapid data sharing before publication ensures that everyone can benefit from the data created by FAASG and can take advantage of improved understanding of the functional elements in these animal genomes to aid their own research.
All raw data produced for a FAASG associated project will be submitted to the archives with no hold-until-publication date, so the data is publicly available immediately after successful archive submission and useful to the community as soon as possible.
The FAASG analysis group will turn the raw data into primary and integrated analysis results. Primary analysis results consist of sample-level analyses, such as alignment to a reference genome or quantification of signal in the assay. Integrated analysis results are analyses that draw together data from multiple samples and/or experiments, such as genome segmentation or differential analysis.
The majority of these analysis results will not be archived before publication but FAASG recognizes the need to share them both within the consortium and with the community. Initially all files that are not archived will be shared between FAASG members in private shared storage hosted at the EMBL-EBI. Any individual who signs up to FAASG and agrees to the Toronto principles 1 will be allowed access to this. There will be metadata files in the private data sharing area, which make credit for different datasets as clear as possible.
FAASG expects to make multiple releases each year. A data release will involve declaring a data freeze and copying all files associated with that data freeze from the private shared storage to the public FTP site. In the first instance these data freezes will contain the primary analysis results. As FAASG's analyses progress, the data freeze will be expanded to include integrated analysis results too. The data freeze process will be coordinated by the FAASG Data Coordination Centre and will be based on consultation with FAASG members. FAASG will also aim to release all data associated with a paper before publication even if it lies outside this standard freeze cycle. The public data will be available to the whole community.
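As an illustration only, the copy step of a data freeze could be sketched as below. This is a minimal sketch and not the Data Coordination Centre's actual tooling: the function name `release_freeze`, the directory layout, and the idea of a manifest listing the files in a freeze are all assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import List


def release_freeze(private_root: Path, public_root: Path,
                   freeze_name: str, manifest: List[str]) -> List[Path]:
    """Copy the files listed in a freeze manifest to a public release directory.

    Hypothetical helper: `manifest` holds paths relative to `private_root`,
    and the release is written under `public_root / freeze_name`.
    """
    release_dir = public_root / freeze_name

    # Check that every listed file exists before copying anything.
    missing = [rel for rel in manifest if not (private_root / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"files missing from private storage: {missing}")

    copied = []
    for rel in manifest:
        dest = release_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves file timestamps where the platform allows it.
        shutil.copy2(private_root / rel, dest)
        copied.append(dest)
    return copied
```

Only files named in the manifest are copied, so anything else in the private area stays private until a later freeze includes it.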
All FAASG public data is released under Fort Lauderdale principles 2. The FAASG website, data portal and FTP site will all have clear data reuse statements on them.
When considering internal FAASG data, if one FAASG member wishes to publish using data generated by another FAASG member they should first contact the data generator and clarify the member's publication strategy. Collaboration is for everyone's benefit and is strongly encouraged. The FAASG Steering Committee commits to report to journal editors and the laboratories involved any event that disregards the rights of data creators (including biological measurements as well as analysis of such measurements).
All members of FAASG can and will continue to do experimental and analysis work outside of FAASG, and data generated in that work is not required to meet the same data sharing expectations.
Only FAASG data can be distributed through the private storage and public FTP site.
- Toronto International Data Release Workshop: Rapid release of prepublication data has served the field of genomics well. Attendees at a workshop in Toronto recommend extending the practice to other biological data sets.
- Fort Lauderdale principles: Reaffirmation and Extension of NHGRI Rapid Data Release Policies: Large-scale Sequencing and Other Community Resource Projects.