If a publication platform enables user-contributed content, such as annotations or comments, and that content is managed by the platform, the platform’s Terms of Use should clearly define the rights related to that content, especially if the publisher may wish to preserve or migrate it as part of the context of the publication. If a publication is likely to be archived with this context intact, the implementation of these features and their associated terms should take into account the ethics of how a user’s information is displayed on the platform, and how users are informed about and consent to the use of their content.
See also:
55. Ethical concerns of user-contributed content
70. Consider systematically tagging material that should be excluded from preservation
PubPub supports features that allow users to contribute content through annotations and comments. This content is integrated into the page and can’t be excluded from web crawls. The default PubPub Terms of Service template includes language that covers User-Generated Content under a Creative Commons Attribution 4.0 License:
By submitting User-Generated Content, you hereby make that User-Generated Content available under the Creative Commons Attribution 4.0 License, and you represent and warrant that you have the right to provide your User Generated Content under that license, that all of that User Generated Content is either authored by you, or provided by third parties under the Creative Commons Attribution 4.0 License or in the public domain, and that your User Generated Content contains no personally identifiable information of third parties who have not expressly authorized you to provide it as part of your User Generated Content. All of your User-Generated Content must be appropriately marked with licensing and attribution information.
These terms allow for preservation of User-Generated Content on PubPub.
If a publication platform integrates third-party applications for features such as annotations or comments, the publisher should ensure that the terms of service for each application provide appropriate permission for preserving and migrating that content over time.
See also:
14. Avoid being dependent on third party services for core features
15. Plan a strategy for preservation when third party dependencies exist
Some third-party annotation services have restrictive default terms of service or do not define their terms of service. Hypothesis, an annotation tool that can be added to or used with most websites, grants a CC0 license for all annotation data stored on their servers. This means you don’t need to seek special permission to preserve the annotation data.
The Library of Congress regularly updates its Recommended Formats Statement, which is a helpful quick reference for selecting a stable format when there is an opportunity to choose. If converting data from a proprietary format to an open file format results in some data loss, consider saving both versions. For less established or proprietary formats, consider recording the type, version, and software used to generate and play the file; this information can be included in the metadata or documentation.
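As a rough illustration of recording the type, version, and software for a less established format, the following minimal sketch serializes those details as JSON so they can travel with the file's metadata or documentation. The field names and every value in the example record are invented for illustration; they do not come from any particular metadata standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FormatRecord:
    """Technical details worth keeping for a less established or proprietary file.

    The field names are hypothetical and only illustrate the idea.
    """
    filename: str
    format_name: str      # the type of file
    format_version: str   # version of the format
    created_with: str     # software used to generate the file
    opens_with: str       # software needed to play or render the file


# Every value below is invented for illustration.
record = FormatRecord(
    filename="timeline.xyz",
    format_name="Example Timeline Format",
    format_version="2.1",
    created_with="ExampleAuthoringTool 5.0",
    opens_with="ExampleViewer 5.x",
)

# Serialize the record so it can be stored next to the file or in its metadata.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

A record like this could be stored alongside the file in the export package or merged into an existing metadata schema that has fields for format information.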
These guidelines may also be considered during file format selection:
13. Acquire the highest quality version of media to use for preservation
34. For EPUBs, opt for core media types, as defined by the EPUB specification
Sometimes it is necessary or preferable to reference or embed third-party content that is outside the publisher’s control but integral to the understanding of the work. For these features, anticipate that their availability may be temporary and plan to ensure that they are not only preserved but also sustained in some form as part of the publication while they are on the publisher’s platform. In the case of an embedded YouTube video, for example, options to support preservation might include: retaining or requesting a copy of the video file; getting permission to copy the content directly from YouTube using a downloader tool in order to bring it into the local publication; or web archiving the video page and linking to the archived copy, e.g. on the Internet Archive. An informative caption can help future readers if the content becomes unavailable.
These guidelines may also improve the preservability of third-party hosted media:
12. Start discussions about multimedia early in the project
14. Avoid externally hosted media
16. Captions for non-text features add meaningful context
20. Ensure all core intellectual components of a work are reflected in the export package
39. Avoid the use of iframes to embed multimedia
42. Facilitate a local web archive workflow for iframe content
Owning My Masters (Mastered): The Rhetorics of Rhymes & Revolutions by A.D. Carson includes an annotated interactive timeline created using the Northwestern University Knight Lab’s TimelineJS. A simplified text representation of this timeline is included in the EPUB on the Fulcrum publishing platform. The interactive version, hosted at the University of Virginia and embedded on the author's website using an iframe, is linked as an external resource. The timeline is configured from data stored in a Google Sheet owned by the author. A web archive file (WARC) of the interactive timeline site and a CSV of the Google Sheet are included as hosted resources on Fulcrum and are available for download. Since Fulcrum resources are included in the export, the archived web page (WARC file) and the text version are both part of the preserved copy.
When a publisher acquires rights for resources that are part of the publication, these should include rights pertaining to the preservation of those resources. Express these rights in the metadata in a way that allows a preservation service to determine what it has permission to preserve and to relate the rights to the relevant material.
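One way to picture rights metadata that a preservation service can act on is a per-resource record that ties a rights statement to a resource identifier. The sketch below is only an illustration under assumed field names (`identifier`, `license_url`, `preservation_permitted`) and invented sample data; a real workflow would map these to whatever schema the publisher and preservation service have agreed on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceRights:
    """Hypothetical rights record linking a rights statement to one resource."""
    identifier: str                          # ID of the resource within the publication
    license_url: Optional[str]               # None when no license has been recorded
    preservation_permitted: Optional[bool]   # None when the rights are unknown


def cleared_for_preservation(records: List[ResourceRights]) -> List[str]:
    """Identifiers of resources with an explicit permission to preserve."""
    return [r.identifier for r in records if r.preservation_permitted is True]


def needs_rights_review(records: List[ResourceRights]) -> List[str]:
    """Identifiers of resources whose preservation rights were never recorded."""
    return [r.identifier for r in records if r.preservation_permitted is None]


# Invented sample data for illustration only.
records = [
    ResourceRights("fig-01", "https://creativecommons.org/licenses/by/4.0/", True),
    ResourceRights("video-03", None, False),
    ResourceRights("map-07", None, None),
]

print(cleared_for_preservation(records))
print(needs_rights_review(records))
```

Keeping "not permitted" and "unknown" as separate states is a deliberate choice in this sketch: it lets unresolved rights be flagged for follow-up rather than silently excluded.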
These guidelines may also support the creation of license metadata:
8. Clarify the license related to preserving third party web resources
24. Create descriptive metadata for each publication resource
40. Embed license information in the HTML
Some publishers use copyrighted fonts and obfuscate them in order to protect the associated rights. Because obfuscated fonts create both a technical and a copyright challenge for preservation, open fonts should be used instead. Font files should also be embedded within the publication in which they are used. This is especially important if the font selection is non-standard or includes special characters that are important for the specific presentation of the work. For an EPUB, the font files should be included in the EPUB package. Similarly, for a website, they should be hosted locally as part of the web application rather than depending on an external font provider to keep the link working over the long term.
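A quick check of an EPUB package can show whether font files are embedded and whether obfuscation is declared. The sketch below is a minimal illustration, not a validator: it assumes fonts can be recognized by file extension, and it looks for the two obfuscation algorithm identifiers commonly found in `META-INF/encryption.xml` using a plain string search rather than full XML parsing. The function name `inspect_epub_fonts` and the in-memory demo package are made up for this example.

```python
import io
import zipfile

FONT_EXTENSIONS = (".otf", ".ttf", ".woff", ".woff2")

# Algorithm identifiers commonly used to declare font obfuscation.
OBFUSCATION_ALGORITHMS = (
    "http://www.idpf.org/2008/embedding",  # IDPF (EPUB) font obfuscation
    "http://ns.adobe.com/pdf/enc#RC",      # Adobe font obfuscation
)


def inspect_epub_fonts(epub_file):
    """Report embedded font files and whether font obfuscation is declared.

    `epub_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as epub:
        names = epub.namelist()
        fonts = [n for n in names if n.lower().endswith(FONT_EXTENSIONS)]
        obfuscated = False
        if "META-INF/encryption.xml" in names:
            raw = epub.read("META-INF/encryption.xml")
            text = raw.decode("utf-8", errors="replace")
            # Simplification: a substring match instead of parsing the XML.
            obfuscated = any(alg in text for alg in OBFUSCATION_ALGORITHMS)
    return {"embedded_fonts": fonts, "obfuscation_declared": obfuscated}


# Demo with a tiny in-memory package (not a valid EPUB, illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("mimetype", "application/epub+zip")
    demo.writestr("OEBPS/fonts/Example-Regular.otf", b"not a real font")
buffer.seek(0)

report = inspect_epub_fonts(buffer)
print(report)
```

An empty `embedded_fonts` list suggests the publication relies on reader or externally hosted fonts, while `obfuscation_declared` being true signals the technical and copyright challenge described above.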
Some preservation services will not collect web content outside the agreed-upon domain names unless the copyright status of the content being harvested is clear. If third-party pages and features that are visually embedded in an EPUB or a web-based publication are meant to be preserved, it should be possible to identify which content the publisher has the right to collect, so that a web crawler can be configured to include or exclude it. One way to communicate these rights is to express them in the metadata supplied to the preservation service. Another option is to apply structured metadata describing the rights status to the HTML. The Creative Commons Rights Expression Language (ccREL) documentation includes examples of this that cover both page- and object-level licenses. This approach could support automated harvesting decisions at either level. Alternatively, a publisher could supply a list of domain names to include in the harvest during the initial preservation workflow configuration.
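To make the two machine-readable options more concrete, the sketch below shows how a harvesting process might read license links from HTML and check a URL against a supplied domain list. It is a simplified illustration using only the Python standard library: it collects `href` values from `<a>` and `<link>` elements marked `rel="license"` and does not resolve object-level RDFa subjects, which would need a full RDFa processor. The class and function names, the sample HTML, and the `example.org` domains are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LicenseLinkFinder(HTMLParser):
    """Collect href values from <a> and <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if tag in ("a", "link") and "license" in rel_values and href:
            self.licenses.append(href)


def in_harvest_scope(url, allowed_domains):
    """True when the URL's host is an allowed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Invented sample page for illustration.
sample_html = """
<html><head>
<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">
</head><body>
<a href="https://example.org/about">About</a>
</body></html>
"""

finder = LicenseLinkFinder()
finder.feed(sample_html)
print(finder.licenses)
print(in_harvest_scope("https://media.example.org/video/1", ["example.org"]))
```

A crawler configuration could combine the two checks, for instance by harvesting an out-of-scope page only when it declares a license the preservation service accepts.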
These guidelines may also be useful to consider when embedding external web content:
25. Add license information to resource-level metadata
38. List the URLs for external web content in the metadata
45. Embed metadata that includes a license in the <head> of a web page
70. Consider systematically tagging components that should be excluded from preservation
For publications where some content should not be preserved, consider tagging what can be preserved in a consistent way, so that preservation export or harvesting processes can use the tags to exclude everything else. Platforms may want to facilitate this tagging.
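As a small sketch of how consistent tagging could drive an export process, the example below splits a list of content items into those tagged as preservable and those left out. The tag name `preserve`, the item structure, and the sample items are assumptions made for illustration; an actual platform would use its own content model and tag vocabulary.

```python
def split_for_preservation(items, preserve_tag="preserve"):
    """Separate items carrying the preservation tag from those without it.

    Each item is assumed to be a dict with an "id" and an optional "tags" list.
    """
    included, excluded = [], []
    for item in items:
        if preserve_tag in item.get("tags", []):
            included.append(item["id"])
        else:
            excluded.append(item["id"])
    return included, excluded


# Invented sample content for illustration.
items = [
    {"id": "chapter-1", "tags": ["preserve"]},
    {"id": "reader-comments", "tags": ["user-generated"]},
    {"id": "appendix-a", "tags": ["preserve", "supplementary"]},
    {"id": "draft-notes"},  # untagged items are excluded by default
]

included, excluded = split_for_preservation(items)
print("include:", included)
print("exclude:", excluded)
```

Treating untagged items as excluded is one possible default; a platform could just as reasonably report them for review before the export runs.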
These guidelines also concern the inclusion and exclusion of content in the preservation process:
10. Define and document core intellectual components that need to be preserved
20. Represent all core intellectual components of the work in the export package
40. Identify the rights for external web content
55. Consider whether it is ethical/appropriate to preserve social media
65. Ensure irrelevant or private administrative data is removed from data exports