Technical Infrastructure
Data Policy and Planning
Are you involved in planning technical infrastructure for DIR? Is this at the research group, department, institution or discipline level?
How do you review and update technological requirements?
Are you subject to a data policy? e.g. an institutional or funder policy
What facilities do you use for managing research grants and applications? e.g. a CRIS (Current Research Information System)?
Do these facilities support CERIF (Common European Research Information Format)?
Notes
Is there more to do in this area?
Yes
No
Capture and Collection (Primary Data)
Are you making use of pre-existing data? If so, where does the data come from?
Which tools and facilities do you use to acquire or collect data? E.g. instruments, surveys, STFC DIAMOND/ISIS
What forms of research data do you work with? e.g. documents, notebooks, spreadsheets, databases, images, audio, video, websites, emails, physical samples etc.
Are the file formats you use during data collection open, standard, or proprietary?
What are typical data volumes that you work with?
Which long-term storage facilities do you use for primary data? e.g. hard drive, CDs or DVDs, storage or filing cabinet, repository (institutional, publisher, disciplinary, Dryad, Figshare), data centre, cloud storage. To what extent can they be "trusted" for long-term access?
At what level of granularity are digital identifiers assigned for primary data? Are they locally allocated, discipline specific or internationally accepted? Are they persistent?
How appropriate is the version control system that is in use?
Which quality control, security, validation and integrity checks do you perform when collecting primary data?
How do you describe primary datasets? e.g. through textual documentation; a metadata schema; a data catalogue
What types of metadata are recorded? e.g. descriptive, reference, context, provenance, calibration etc. Are these captured automatically?
List any validity or integrity checks performed on the metadata
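For illustration only, a minimal sketch in Python of how a primary dataset might be described and checked for completeness. The field names, instrument, and identifier are hypothetical placeholders, not drawn from any particular metadata schema or facility record.

    # Minimal, hypothetical metadata record for a primary dataset.
    # Field names are illustrative only, not taken from any specific schema.
    record = {
        "title": "Powder diffraction scans, sample batch 12",
        "creator": "A. Researcher",
        "created": "2014-03-07",
        "instrument": "Diamond I11",           # provenance/calibration context
        "format": "NeXus (HDF5)",
        "identifier": "doi:10.xxxx/example",   # placeholder, not a real DOI
    }

    # A basic integrity check: confirm mandatory fields are present and non-empty.
    REQUIRED = {"title", "creator", "created", "identifier"}
    missing = [field for field in REQUIRED if not record.get(field)]
    if missing:
        raise ValueError("Incomplete metadata record, missing: " + ", ".join(missing))

In practice such checks would be driven by whichever schema or catalogue the group has adopted; the sketch only shows the kind of validity check the questions above refer to.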
Who is responsible for the day-to-day management, storage and backup of primary data?
Do you make use of shared technical facilities? If so, with whom and how? (e.g. STFC DIAMOND/ISIS)
Are primary data managed according to a DMP? If so, how often is the DMP reviewed and by whom?
Notes
Is there more to do in this area?
Yes
No
Processing and Analysis (Intermediate Data)
How do you process and analyse your data? e.g. normalisation or anonymisation; crystal structure determination; Reverse Monte Carlo techniques etc.
Which file formats are used during data processing and analysis? Are these open, standard, or proprietary?
What are typical data volumes that you work with?
Which long-term storage facilities do you use for intermediate data? e.g. hard drive, CDs or DVDs, storage or filing cabinet, repository (institutional, publisher, disciplinary, Dryad, Figshare), data centre, cloud storage. To what extent can these be "trusted" for long-term access?
At what level of granularity are digital identifiers assigned for intermediate data? Are they locally allocated, discipline specific or internationally accepted? Are they persistent?
How appropriate is the version control system that is in use?
Which quality control, security, validation and integrity checks do you perform when processing research data?
How do you describe intermediate datasets? e.g. through textual documentation; a metadata schema; a data catalogue
List the types of metadata that are recorded e.g. descriptive, reference, context, provenance, calibration etc.
How do you capture metadata for intermediate data? List any validity or integrity checks performed on the metadata
Who is responsible for the day-to-day management, storage and backup of intermediate data?
Do you make use of shared technical facilities? If so, with whom and how? (e.g. EPSRC NCS)
Are intermediate data managed according to a DMP? If so, how often is the DMP reviewed and by whom?
Notes
Is there more to do in this area?
Yes
No
Data Curation
Which long-term storage facilities do you use? e.g. hard drive, CDs or DVDs, storage or filing cabinet, repository (institutional, publisher, disciplinary, Dryad, Figshare), data centre, cloud storage. To what extent are they "trusted" for curatorial purposes?
If using a centralised storage facility, which file formats are accepted during data deposit? Are these widely accepted, standard, proprietary, or open?
What quality control, security, validation and integrity checks are performed before making data available for long-term access? By whom are the checks undertaken?
What documentation and metadata are required at the time of deposit?
At what level of granularity are digital identifiers assigned to results data? Are they locally allocated, discipline specific or internationally accepted? Are they persistent?
How appropriate is the version control system that is in use?
Is support available for long-term management of your data? If so, in what form? e.g. metadata, curation. Who provides these services?
Who is responsible for the day-to-day management, storage and backup of results data?
Do you have adequate provision for managing results data?
Are results data managed according to a DMP? If so, how often is the DMP reviewed and by whom?
Notes
Is there more to do in this area?
Yes
No
Publication and Sharing
How do you publish and share your data? e.g. institutional repository; social media networks; tools for anonymising sensitive data; tools for normalising data; electronic lab notebooks etc.
Which file formats are used when sharing data? Are these accepted as best practice in your community? Widely used? Endorsed by a national or international standards agency? Are they open?
Are stable and documented APIs made available with published datasets?
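As a sketch of what a "stable and documented API" can mean in practice, the Python fragment below retrieves a dataset record over HTTP and reads it programmatically. The endpoint URL and response field names are hypothetical placeholders, not a real repository service; substitute whatever your repository documents.

    import requests  # third-party HTTP client

    # Hypothetical repository endpoint; replace with the URL your repository documents.
    url = "https://repository.example.org/api/datasets/1234"
    response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
    response.raise_for_status()

    dataset = response.json()
    print(dataset.get("title"))   # dataset title, assuming the response carries one
    print(dataset.get("files"))   # e.g. a list of downloadable files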
How much network bandwidth will be required to access your data? Is this likely to pose problems for others?
Do the data require any specialised facilities for access? e.g. instrumentation, hardware or software. If so, are these made available with the data?
How do you link scholarly publications to datasets?
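One common pattern, shown here only as an illustration using DataCite-style vocabulary, is to record the link in the dataset's metadata as a related identifier pointing at the article. Both DOIs below are placeholders.

    # Illustrative fragment of dataset metadata linking the dataset to an article.
    # "IsSupplementTo" is a DataCite relation type; the DOI value is a placeholder.
    related_identifier = {
        "relatedIdentifier": "10.xxxx/article",
        "relatedIdentifierType": "DOI",
        "relationType": "IsSupplementTo",
    }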
Notes
Is there more to do in this area?
Yes
No
Reuse and Impact
How do you search for and retrieve existing datasets?
How do you reference and cite data?
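As one concrete (though not universal) approach: DOI registration agencies such as DataCite support content negotiation, so a formatted citation can be requested directly from the DOI resolver. A sketch in Python, using a placeholder DOI:

    import requests

    doi = "10.xxxx/example"  # placeholder, not a real dataset DOI
    # Request a formatted citation from the DOI resolver via content negotiation.
    response = requests.get(
        "https://doi.org/" + doi,
        headers={"Accept": "text/x-bibliography; style=apa"},
        timeout=30,
    )
    response.raise_for_status()
    print(response.text)  # an APA-style citation string for the dataset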
Which data integration platforms do you use? Are these home-grown?
List any specialised tools you use for accessing or visualising pre-existing data.
Which platforms do you use for engaging with the public (e.g. for citizen science)?
Are any of your own tools used in other disciplines? If so how have they been adapted?
Notes
Is there more to do in this area?
Yes
No