Borealis, the Canadian Dataverse Repository, is a bilingual, multidisciplinary, secure Canadian research data repository. Borealis repository infrastructure is a shared academic service provided in partnership with academic library consortia, participating institutions, and the Digital Research Alliance of Canada. Technical infrastructure hosting and service operations are provided by Scholars Portal and the University of Toronto Libraries (UTL). The Borealis Steering Committee, established in 2024, brings together regional library consortia (OCUL, COPPUL, CAAL, PBUQ) to support national governance and ongoing commitments to data stewardship. Learn more about Borealis academic library partners, services, and governance.
Borealis uses the open-source Dataverse software, which is developed and maintained by the Institute for Quantitative Social Science (IQSS) at Harvard University together with community members, users, and the Global Dataverse Community Consortium (GDCC).
The Borealis Preservation Plan outlines the primary objectives, roles and responsibilities, strategies, and actions for preserving the digital files uploaded by users and stored in the repository, and complements digital preservation activities and other services provided by academic libraries and institutions.
The objectives of the preservation plan activities for the Borealis repository are as follows:
Description: The first level of preservation combines two broad sets of activities: (1) bit-level preservation through regular, independent file fixity checking and safe storage in the Ontario Library Research Cloud (OLRC), and (2) maintenance and improvement of the preservation features that are part of the Borealis repository. As the technical service provider, Borealis is not directly responsible for validating the contents or quality of user-uploaded files.
Level 1 preservation addresses Objectives 1, 2 and 3 (noted above): that user-uploaded files are safe from loss and that minimum preservation functions are run as a necessary precursor to additional preservation strategies.
Scope: Bit-level preservation is conducted for all user-uploaded files in Borealis. This includes files associated with all versions of draft and published datasets (open or restricted). It does not include files generated by the Dataverse application itself, such as derivatives, thumbnails, or system-generated metadata. Basic preservation activities are largely managed by the Borealis service team on behalf of Participating Institutions, except in cases of fixity check failures and when intervention and remediation are required.
Term: Borealis will maintain Level 1 preservation activities for all user data as long as an institution is a subscriber to the Borealis service. In the event that the service agreement between Borealis and a Participating Institution is terminated, or stewardship for a sub-collection is no longer viable, Borealis will support transfer processes, such as data exports, to facilitate external collection management as required by the Institution to implement their plans. This may include return of data to the Institution or succession by another party identified by the Institution. Data will not be deleted except at the express request of the Institution or if reasonable attempts to ensure ongoing stewardship have not succeeded.
Activities:
Primary storage of all Borealis data files in the OLRC, with replicated copies stored at three of the five institutional partners' storage nodes located in Ontario, Canada
Daily export and backup of all files to local disk storage and tape using industry-standard tape backup software
Regular independent fixity validation checks
Maintenance of additional preservation-supporting functionality available as part of the Dataverse application:
Description: This level of preservation is intended for Participating Institutions implementing active preservation through advanced preservation processing and/or the export of independent packages for management in external/institutional preservation systems.
Together with institutional preservation policies and strategies, Borealis technical preservation workflows, such as those noted above, can support a Participating Institution’s application for trustworthy repository certification, such as CoreTrustSeal Repository Certification. Additional information about these preservation features and strategies is described below.
Scope: Participating Institutions are responsible for defining their preservation policies, approaches, and activities, and determining which datasets are eligible for additional processing, export, and long-term managed preservation. Administrators, curators, or other preservation staff and designates at Participating Institutions may select the complete contents of their institutional collections or a subset as guided by internal appraisals, selection criteria, and preservation policies. Borealis can provide technical support and setup as requested by institutions.
Activities:
BagIt packages produced by the Dataverse application are conformant with the RDA-endorsed BagIt profile, and contain:
Note: for tabular data uploads, only the original version of the file is retained in the BagIt package. The export does not include the tabular derivative files (.tab) or variable-level metadata (DDI variable metadata in XML format) unless these were uploaded with the deposited dataset.
BagIt exports are conducted by the Borealis team at the request of a Participating Institution. Exports are conducted at the dataset level and require a structured list with the DOI and Version Number of each requested dataset from the Participating Institution.
Bags may be transferred to external S3 storage managed by the Participating Institution.
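The structured list of DOIs and version numbers described above could be checked before it is submitted. The following is a minimal sketch only: the plan does not prescribe a file layout, so the CSV format, the column names `doi` and `version`, the "major.minor" version pattern, and the function name `parse_export_request` are all assumptions made for illustration.

```python
import csv
import io
import re

# Hypothetical formats: the plan does not define how the list is structured.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
VERSION_PATTERN = re.compile(r"^\d+\.\d+$")  # assumed "major.minor", e.g. 1.0


def parse_export_request(csv_text):
    """Split a DOI/version request list into valid rows and error messages."""
    valid, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        doi = (row.get("doi") or "").strip()
        version = (row.get("version") or "").strip()
        if not DOI_PATTERN.match(doi):
            errors.append(f"line {line_no}: invalid DOI {doi!r}")
        elif not VERSION_PATTERN.match(version):
            errors.append(f"line {line_no}: invalid version {version!r}")
        else:
            valid.append((doi, version))
    return valid, errors
```

A list that produces no error messages contains one dataset-level entry per row, which matches the dataset-level granularity of the exports.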
Users: responsible for uploading data files and metadata to the Borealis repository, as well as viewing, downloading, and accessing data files and metadata in the repository. Users create an account and must adhere to the Borealis Terms of Use as well as any policies and procedures governing their use of the service as set by Participating Institutions.
Participating Institutions: responsible for administering the collections and use of Borealis at their institution. Institutions subscribe to Borealis via consortial agreements and are allocated storage space and administrative rights for local staff to manage their institutional collections within the Borealis repository. Institutions are responsible for oversight and stewardship of the data uploaded to their institutional collections by setting policies and deposit guidelines, administering users and user rights, and handling takedown and copyright decisions. Institutions may also validate data deposits for quality and completeness via curation and preservation activities, or provide guidance to depositors about preferred file formats and data documentation for deposit, sharing, and long-term preservation, as well as metadata to support discovery, understandability, reproducibility, and FAIR data now and in the future.
Preservation policies and managed preservation activities are also defined by Participating Institutions for their collections, or selected sub-collections or datasets, to facilitate long-term preservation and access. In the event that the agreement between Borealis and Participating Institutions is terminated, or that an institution is otherwise no longer able to steward some or all of their data, Institutions are responsible for determining exit strategies and succession plans for their data.
Borealis: responsible for the technical repository and storage infrastructure, including maintenance, client support, and administration of the Dataverse repository software and service. Borealis ensures the Dataverse application is functional, secure, and updated. Borealis maintains the connected components, including storage infrastructure for files and data in the repository, integrated applications, such as Data Explorer, and customizations. Borealis supports administrators and users at Participating Institutions through training, documentation, guides, and administrator community calls. Borealis maintains no oversight over the quality, completeness, or format of files uploaded by users but will assist in identifying and remediating fixity issues in collaboration with Participating Institutions as they arise.
Archivematica: an open source, standards-based processing tool for creating well-formed packages for preservation storage. Archivematica performs signature-based file format identification, validation and characterization functions; can normalize copies of files to preservation and access formats; and creates preservation metadata files using the METS and PREMIS standards. The Dataverse - Archivematica Integration supports processing of data packages from a Dataverse instance using the Archivematica UI tools, workflows, and Dataverse APIs.
BagIt: a set of formatting conventions that guide creating checksums for, and verifying the fixity of, collections of files. Files contained in a BagIt-formatted directory (commonly called a “bag”) include a manifest of checksums that can be used to ensure that the contents of the directory have retained fixity after transfer or in storage.
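As an illustration of how a bag's manifest supports fixity verification, the sketch below recomputes the checksum of each listed file and compares it with the recorded value. It assumes a bag that carries a `manifest-sha256.txt` file with one `checksum filepath` entry per line. The function names are hypothetical, and this is not the validation routine used by Borealis or Dataverse.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_dir):
    """Return the manifest paths that are missing or fail their checksum."""
    bag = Path(bag_dir)
    failures = []
    manifest = bag / "manifest-sha256.txt"  # assumed manifest algorithm
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag / rel_path
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty result means every file listed in the manifest is present and still matches its recorded checksum.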
Bit-level preservation: one type of digital preservation strategy, focused on ensuring that files retain fixity in storage through checksum validation and backup of multiple copies to multiple locations, to protect against accidental loss or corruption and to support disaster recovery. Bit-level preservation does not guarantee any form of future usability/accessibility based on the contents or format of the files in question.
Checksum: a unique numeric or alphanumeric string produced by running a checksum-generating algorithm against a file. When the contents of the file are altered in any way, the checksum value will change, indicating that the file no longer has fixity and therefore should be replaced from a good copy. Checksum algorithms include MD5, SHA-1 and SHA-256.
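To make the definition concrete, the following minimal sketch uses Python's standard `hashlib` module to generate a checksum and compare it with a recorded value. The function names `checksum` and `has_fixity` are hypothetical, and the example works on in-memory bytes for brevity; it is not a description of the fixity tooling used by Borealis.

```python
import hashlib


def checksum(data, algorithm="sha256"):
    """Return the hexadecimal checksum of the given bytes."""
    # Accepts algorithm names known to hashlib, e.g. "md5", "sha1", "sha256".
    return hashlib.new(algorithm, data).hexdigest()


def has_fixity(data, recorded, algorithm="sha256"):
    """True if the content still matches the checksum recorded earlier."""
    return checksum(data, algorithm) == recorded.lower()


original = b"temperature,depth\n4.1,10\n"
recorded = checksum(original)                  # stored at deposit time
altered = original.replace(b"4.1", b"4.2")     # a single changed character

print(has_fixity(original, recorded))  # unchanged content matches
print(has_fixity(altered, recorded))   # altered content does not match
```

The same comparison applies to any of the algorithms named above, provided the recorded value and the recomputed value use the same algorithm.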
CoreTrustSeal: an international, community-based, non-governmental, and non-profit organization promoting sustainable and trustworthy data repositories. Certified repositories are recognized as being sustainable, transparent, and trustworthy from organizational, resourcing, and technical standpoints. Requirements for certification are based on the OAIS (Open Archival Information System) Reference Model for preserving and making available digital information.
Dataset: a container for a group of related files. For example, a dataset can include the original source data, code, and/or documentation related to a single study or publication. A dataset must also include metadata added by the user to describe the files, including a title, author(s), description and subject.
Dataverse: the open-source research data repository software application with which the Borealis repository is hosted and operated. Dataverse is developed by the Institute for Quantitative Social Science (IQSS) at Harvard University. Borealis infrastructure is based on a locally hosted instance of Dataverse.
Digital preservation: “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary” (DPC Glossary). Digital preservation activities can include active and ongoing monitoring of files and formats, regular fixity checks, and refreshing of storage media.
Fixity: the assurance that a digital file has not been altered or changed. Fixity is established by computing a checksum and comparing it with a previously recorded value. Fixity information can help establish the integrity of files by providing evidence that they have remained unchanged over time.
Ontario Library Research Cloud (OLRC): a five-node academic library community cloud storage network maintained by Scholars Portal and institutional partners. The OLRC uses the OpenStack Swift software and the ORION network to manage and connect five storage nodes located at the University of Toronto, the University of Guelph, the University of Ottawa, York University, and Queen’s University, in Ontario, Canada. Borealis uses the OLRC as its repository storage; at any given time, deposited files are replicated across three of the five nodes for reliability and integrity. If one of these copies becomes unreadable, a new copy is created by the system from the two remaining good copies. The OLRC is integrated with DuraCloud, an open-source application for advanced file preservation management of packages. Information about the technology and security of the OLRC is contained in the Borealis Technology Infrastructure and Security Information.
Permafrost: a hosted digital preservation service offered by Scholars Portal to members of the Ontario Council of University Libraries (OCUL). Permafrost pairs Archivematica with the OLRC to provide access to technical infrastructure, support, and training to enable OCUL members to actively process digital objects for long-term preservation and access.
The Preservation Plan is updated and maintained by Borealis, Scholars Portal, and the University of Toronto Libraries, in collaboration with its national governance and participating institutions. Thank you to the Alliance’s former Dataverse North Policy Working Group for creating the initial policy framework for this document. The Alliance’s RDM Preservation Expert Group’s report Preservation for Dataverse in Canada: Recommendations provides key requirements for the preservation strategies outlined above. Additional sources of inspiration were the Texas Digital Library Digital Preservation Policy and the Harvard Dataverse Preservation Policy.