
Vegfunction data sharing

From BluWiki


WIKI FOR DISCUSSIONS ABOUT DATA SHARING PRACTICES IN THE COURSE OF RESEARCH COLLABORATION

ARC-NZ RESEARCH NETWORK FOR VEGETATION FUNCTION website here

Members of the Vegetation Function Network are welcome to add opinions and observations. Please don't alter contributions made by others, and please label your contributions with name and date.

Advice on how to work with wikis can be found here.

BACKGROUND

Working groups of the Vegetation Function Network often compile datasets.

In principle, most scientists are in favour of research data becoming public as soon as possible, because this moves research forward faster. On the other hand, data cost time and money to acquire in the first place, or to compile from multiple sources, or to curate (error-check, provide metadata, answer enquiries, top up with alcohol). These investments of effort will only be made if there is an incentive to do so. In the research world, the incentive is usually authorship of publications, this being the main currency of academic careers. So the question is how to strike a satisfactory balance. How do we encourage pooling of data, while at the same time maintaining the incentive for the work of developing, compiling and curating data?

This wiki does not aim to lay down rigid rules about how to strike that balance. Rather, its purpose is to collect opinions, practices and experiences in one place, in the hope that this will help people to find the appropriate middle ground more quickly and with less angst. It concentrates on experiences within working groups of the Vegetation Function Network, but also draws on some experiences from elsewhere.

Link to wiki for VFN Database Working Bee

SOME SOURCE DOCUMENTS

Vegetation Function Network's first-draft data sharing policy (Aug 2008) link here

TRY Guidelines for Intellectual Property (March 2008) link here. (TRY's full title is “Plant functional types: Refining plant functional classifications for Earth system modeling”. It is an IGBP-QUEST-DIVERSITAS Fast-Track Initiative, in coordination with the Max Planck Institute for Biogeochemistry at Jena, Germany.)

GENERAL COMMENTS AND QUESTIONS

Comment from Lou Santiago, convenor of WG21 Water use efficiency: relating d13C to instantaneous measures.

"our group is in complete agreement with [the draft policy circulated August 2008]. Indeed, this discussion came up during our working group, so we basically understood much of these matters from the outset and everyone is willing to work within this data-sharing environment ... there seemed to be agreement within our group that when the main data paper that includes our database is published, that the carbon isotope composition and climatic variables that accompany the sites would be published as an online appendix. For this line of thinking, we are using the example of the 2004 leaf economics spectrum paper."

Comment from Yusuke Onoda, convenor of WG38 Leaf Biomechanics:

"In the case of leaf biomechanics database, data owners have agreed to share the data with collaborators of leaf biomechanics project, but we don't have much discussion when and how the database is available for public."

Comment from Peter van Bodegom, convenor of WG39 Wetland Plant Traits, 4 Sept 2008

"It is great to read that the Network is developing a policy for data sharing. Personally, I think that data sharing is very important and should be encouraged and I agree with the policy outlined."

Comment from Nick Williams, convenor of WG22 Urbanization and Plant Functional Traits, email of 19 Sept 2008

"... to me the policy seems reasonable and sensible and there is a need for it"

Comment from Sandy Harrison, convenor of WG24 Fire, Vegetation and Climate Change in Australasia, email of 24 Sept 2008

"the group discussed data policy right at the beginning and adopted a policy (link here) that has been used by other groups working in palaeo-synthesis"

Question from Sandy Harrison, convenor of WG24 Fire, Vegetation and Climate Change in Australasia, email of 24 Sept 2008

"I assume that we will need to update these files from time to time. Do you have a strategy for that?"

ISSUE 1: TIMELINE

The Network's draft policy (August 2008) suggested the following:

1. A “clock” should be considered as starting when a dataset is complete, in the sense of being ready to analyse and write papers from. It is normally expected that papers should be submitted within one year from this date. Questions should be asked if papers are not submitted within two years. If papers have not been submitted after two years, then the dataset should normally be made generally available for others to work with at that time. There can, of course, be exceptions made for illness or similar misadventure.
2. It is similarly expected that adequate metadata and documentation will have been prepared within 2 years from when the dataset was ready to be analysed. (Really, it should be prepared at the same time as the dataset.)
3. Initially, data availability would be within the Network, that is, data would be available for projects organized through the Network and giving credit to the Network.
.....
5. Provided a paper has been submitted within 2 yr, the Network is willing to wait for it to be accepted somewhere before the data are made available.
6. Sometimes more than one analysis, and more than one paper, are planned to flow from a single database compilation. It is reasonable for only the data used in the first paper to be made available once that paper has been accepted, and for some other data to be kept out of circulation until papers dealing with it have been accepted. After the two year expectation for the first paper, a further year can be allowed for a second paper, a further year again for a third paper, and so forth.
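
The arithmetic implied by points 1, 5 and 6 can be sketched in a few lines of Python. This is only an illustration of one reading of the draft policy, not part of the policy itself: the function names (`add_years`, `submission_deadlines`, `release_due`) are hypothetical, and the discretionary exceptions mentioned in point 1 (illness or similar misadventure) are deliberately left out.

```python
from datetime import date
from typing import List


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for 29 Feb when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def submission_deadlines(dataset_complete: date, planned_papers: int) -> List[date]:
    """Latest expected submission date for each planned paper.

    Assumed reading of points 1 and 6: the first paper is expected within
    two years of the dataset being complete, and each further paper gets
    one additional year.
    """
    if planned_papers < 1:
        raise ValueError("planned_papers must be at least 1")
    return [add_years(dataset_complete, 2 + i) for i in range(planned_papers)]


def release_due(dataset_complete: date, today: date,
                paper_submitted_in_time: bool) -> bool:
    """True if the dataset would normally be made generally available.

    Assumed reading of points 1 and 5: release is due once two years have
    passed with no paper submitted; if a paper was submitted within two
    years, the Network waits for acceptance instead.
    """
    if paper_submitted_in_time:
        return False
    return today > add_years(dataset_complete, 2)


if __name__ == "__main__":
    clock_start = date(2008, 8, 1)  # hypothetical completion date
    for n, deadline in enumerate(submission_deadlines(clock_start, 3), start=1):
        print(f"paper {n}: submit by {deadline.isoformat()}")
```

Under this reading the clock starts when the dataset is complete, not when the working group first meets, so a group that needs 18 months to assemble its data is not yet on the clock during that time.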


Comment from Margie Mayfield, convenor of WG31. Human-influenced countrysides and plant traits (from email of 2 Sept 2008):

"The only thing that I might change is the time line for making data sets available. 1 year seems too short to me. I would change the time limits to submission within 2 years. I know that one of the major goals of the Network is to encourage rapid publication of synthetic work. The reality is, however, that the demands on most Network participants' time when not at a meeting make it very difficult to work at that pace. In our group we have run into trouble in that many contributors have no money to hire someone to help compile their data into our format and also have no time to do it themselves. The result is that they are very slow getting the data to us. All of these contributors are very dedicated to the project but simply don't have the time or support to move on such short time scales. The lack of resources is particularly a problem for participants from developing nations where academic funding is scarce to non-existent and where teaching demands are much higher. Even within Australia, we have been running into an almost complete halt in progress during semesters because of the teaching demands that people face. An indication that people really are dedicated to this working group is that progress is immediately made in school breaks - short and long. Despite limitations on our time, I feel we have been doing a pretty good job keeping within a few months of being on schedule but I do think a year is too short particularly when trying to compile very large amounts of data from all over the world. Just for comparison - I have been involved in 3 NCEAS working groups and multiple publications came out of all three groups but the only papers that were submitted within a year of the first meetings were literature reviews - all of the synthetic data papers took two years or more."


Comment from Fred Gurgel, convenor of WG47. Marine plant phylogeography and bioregionalization (Email of 28 Aug 2008):

"In theory I do not have any disagreements. However, like people say 'the devil lives in the details'. For example, some datasets can be mined by the same working group (or custodians) for more than the 2 years (certainly my case) thus passing the proposed deadline for public release, as currently drafted. This in my view is/will be the most controversial point. Many like me risk their lives by diving in places infested with white sharks in order to get or improve data so it should be understandable that we get a little bit possessive and want to get the very most out of them. For us the idea of a 'clock' or a '2 year mark' for dataset release is uncomfortable but this refers to biogeographical, distributional and ecological data only.
I did not check within my WG before replying to this email but, coincidentally, during our first meeting it became very clear that dataset release and their accessibility by other labs is a big concern and should be avoided at all costs as long as we have plans and ideas on how to squeeze manuscripts out of them. Especially because we know there are people out there salivating to access what we have to run their own analyses and studies. So short timelines for public release of big and rich distributional and ecological datasets (from which many papers can be produced in the following 4+ years) was not a very popular idea within the WG. However everybody agreed that they should eventually be released once we are done with them."

Clarification from Mark Westoby, 3 Sept 2008:

"The issue about timelines is what would be an appropriate balance. Under the draft policy of Aug 08, the clock would start from when the dataset was complete (complete enough to start serious data analysis, anyhow), not from the first meeting of the working group. Then we would want to see something submitted within two years after that. It's not being suggested that all possible papers should be out within two years, rather that at least one should be submitted within two years, then continuing at a rate of at least one per year until the group feels they have made use of the main opportunities."


From report following first meeting of WG30 Plant Population Syndromes Authorship, March 2008:

"• All participants in the workshop can obtain a copy of the data-base from the data-base managers ...
• If one of the participants wishes to share the data-base with a student or other collaborator agreement of the group must be sought first. We propose to maintain the data-base within the participant group until the next workshop when we will discuss the possibility of making the database more generally available."

Comments from Peter van Bodegom, convenor of WG39 Wetland Plant Traits, 4 Sept 2008:

"Datasets are compiled, but also increasingly combined and structured. A logical next step is indeed to set up a clear formal data sharing policy. However, as some datasets are used and added to more than one of those databases, it might be good to avoid duplication (or worse differences) in policy. It seems to me, that the TRY initiative is presently the biggest compilation (although it seems partly duplicated in the LifeWatch initiative) and many of the separate databases I know of seem to be ending up in TRY in the coming years. Therefore, I guess that in the long run most users might want to use the TRY database. Would it be an idea to integrally use the TRY agreements on data sharing for the Network databases (to make sure that the same sets of rules are used everywhere)? I don't know whether the Network has a formal agreement with TRY (I know that LEDA has), but if so, then it might even be worthwhile to consider to deposit all Network databases in the TRY database (with a certain delay to allow first use by the Network). This would ensure the biggest users community."
"To facilitate combining existing databases and databases still to be developed, another next step might be to come to a common template or toolkit that allows to recombine datasets easily. Again, TRY has such a template and uploads everything through Excel-sheets. In the end, a more sophisticated tool might be needed. For WG39 we use Access with all its imperfections, but I know that there are several developments within bioinformatics that might provide better long-term solutions. Could the Network anticipate on such development or come up with some guidelines/templates to make future compilations easier?"

Response from Mark Westoby, 5 Sept 2008:

"I certainly hope we can collaborate and stay in effective communication with TRY. But there would be complications simply adopting TRY. These include (1) our databases cover many matters other than plant traits, e.g. phylogenetic divergences (2) TRY's current policy does not actually resolve any of our issues, it just says that use is free for earth system modelling but after that everything is to be negotiated with the data contributors (3) TRY's policy does not (in my view) deal adequately with issues of large databases that have many sources, it just takes the attitude that whoever sent the spreadsheet to TRY is entitled to make all decisions."


Comment from Nick Williams, convenor of WG22 Urbanization and Plant Functional Traits, email of 19 Sept 2008

"The 2 yr clock may not be long enough for complicated datasets where the collaborators meet infrequently. For example our datasets are still evolving and we are almost ready to do a major analysis but it is 18 months since we first met. The data may throw up interesting research questions additional to the original purpose of the dataset which the working group collaborators should have first rights to explore. This may be difficult unless we can secure funds etc"

ISSUE 2: AUTHORSHIP

From report following first meeting of WG30 Plant Population Syndromes Authorship, March 2008:

"Project leaders will be the first authors on their respective publications (as outlined in Appendix 2). Other authors will be self-nominated according to the 10% rule of at least a 10% contribution to at least two of the following: problem outline, methods, analysis or writing. Contribution of data alone is not sufficient for authorship."
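
As an illustration only, the "10% rule" quoted above can be written as a simple check. This is a sketch under stated assumptions: WG30's report does not say how contributions are measured, so representing them as fractions between 0 and 1 per area is an assumption, and the names `AREAS` and `qualifies_for_authorship` are hypothetical.

```python
from typing import Dict

# The four areas named in the WG30 report. Data contribution is
# intentionally not one of them, so data alone can never qualify.
AREAS = ("problem outline", "methods", "analysis", "writing")


def qualifies_for_authorship(contributions: Dict[str, float],
                             threshold: float = 0.10,
                             min_areas: int = 2) -> bool:
    """Apply the 10% rule to one person's self-assessed contributions.

    `contributions` maps an area name to the fraction (0 to 1) of that
    area's work the person did; this representation is an assumption.
    Keys outside AREAS (for example "data") are ignored.
    """
    counted = sum(1 for area in AREAS
                  if contributions.get(area, 0.0) >= threshold)
    return counted >= min_areas


if __name__ == "__main__":
    print(qualifies_for_authorship({"data": 1.0}))
    print(qualifies_for_authorship({"methods": 0.10, "writing": 0.25}))
```

The contrast with the WG47 arrangement described in the next comment is that there presence at the meeting, rather than a measured share of the work, determines authorship.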

Comment from Fred Gurgel, convenor of WG47 Marine plant phylogeography and bioregionalization (Email of 28 Aug 2008):

"... the way we decided who are authors. All members are automatically authors of any papers regardless of the degree of input or who compiled the dataset(s) as long as they stayed during the meeting to work on them. Thus, after the first 5 days of our first meeting, only 3 of us stayed afterwards. During that time we were able to draft a third paper where only those three who stayed are authors."

Comment from Nick Williams, convenor of WG22 Urbanization and Plant Functional Traits, email of 19 Sept 2008

"We haven't negotiated authorship arrangements for data brought to the working group by participants that was compiled by them from other sources. In most cases participants had a major role in generating the data for each of their cities but other parties would also have been involved."

ISSUE 3: CONTACT PEOPLE (AKA CURATORS, CUSTODIANS) AND ACCESS TO DATA

From Hans Cornelissen and Will Cornwell, convenors of WG17 Global meta-analysis of plant trait and plant type control over litter decomposition rates, email 1 Sept 2008

"The only real caveat with our ARTDECO database is that the actual k values (leaf litter decomposition rates) for each species mean nothing unless coupled to the specific conditions of each of the litterbeds, time periods and climatic conditions during incubation, etc. etc. This is fundamentally different from most other traits, for which one can have a mean species value that means at least something in itself. Will (and I) as the custodian would appreciate being consulted by any future users to avoid misinterpretation of the data."

Form of words used when data were published as supplementary online material in connection with the first "leaf economic spectrum" paper (Wright, I. J., P. B. Reich, M. Westoby, D. D. Ackerly, Z. Baruch, F. Bongers, J. Cavender-Bares et al. 2004. The worldwide leaf economics spectrum. Nature 428:821-827):

"As a condition for use of the Glopnet dataset, we request that users agree (1) To notify the main Glopnet organisers (Ian Wright and Peter Reich) if the dataset is to be used in any publication; (2) To provide the main Glopnet organisers with formal recognition that, at their discretion, may include co-authorship or acknowledgements on publications; (3) To recognize that the researchers who gathered these data may be using them for scientific analyses, papers or publications that are currently planned or in preparation, and that such activities have precedence over any that you might wish to prepare."


Comment from Nick Williams, convenor of WG22 Urbanization and Plant Functional Traits, email of 19 Sept 2008

"We have been approached by people outside the working group who are interested in using the data generated for a NCEAS proposal/working group but we said it was too early to consider this and would need agreement from all participants"

GNU Free Documentation License 1.2
This page was last modified on 29 June 2009, at 03:36.