diff --git a/index.md b/index.md
index 5619fd38a72fb0ff6a34aeb221fd2373306ef59e..a4ea5fda3178fd3292b6f5ccbc9f1df57e5a3050 100644
--- a/index.md
+++ b/index.md
@@ -16,10 +16,10 @@ Data Management Services is a portfolio of services allowing to facilitate the w
 **S3** is a general service suitable for most of the usecases (archives, backups, special applications...). It also allows to share your data with other users or publicly via link.
- [:octicons-arrow-right-24: Overview of S3 Service](./object-storage/s3-service.md)
- [:octicons-arrow-right-24: Favourite S3 Clients](./object-storage/rclone.md)
- [:octicons-arrow-right-24: Advanced S3 Functions](./object-storage/rclone.md)
- [:octicons-arrow-right-24: Veeam stup against S3](./object-storage/rclone.md)
+ [:octicons-arrow-right-24: Overview of S3 Service](./object-storage/s3-service.md)<br/>
+ [:octicons-arrow-right-24: Favourite S3 Clients](./object-storage/s3-clients.md)<br/>
+ [:octicons-arrow-right-24: Advanced S3 Functions](./object-storage/s3-features.md)<br/>
+ [:octicons-arrow-right-24: Veeam setup against S3](./object-storage/veeam-backup.md)<br/>
diff --git a/mkdocs.yml b/mkdocs.yml
index 66f5d12e0b2d30ff7e69665531dd6cf66e89e1c9..05568f7d2a966c2a84aca77cfc2c63c6ab97a8ed 100755
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -2,6 +2,10 @@ site_name: "storage"
 nav:
   - Data Storage Services: index.md
-  - Object Storage Services:
+  - S3 Service:
     - S3 Overview: object-storage/s3-service.md
-    - General Storage Guides: object-storage.md
+    - Favourite S3 clients: object-storage/s3-clients.md
+    - Advanced S3 features: object-storage/s3-features.md
+    - Veeam backup over S3: object-storage/veeam-backup.md
+  - RBD Service:
+    - RBD Service: object-storage.md
diff --git a/object-storage/s3-clients.md b/object-storage/s3-clients.md
new file mode 100644
index 0000000000000000000000000000000000000000..8ca4c2789df60fa1685aaed7bd29e84293f71eaf
--- /dev/null
+++ b/object-storage/s3-clients.md
@@ -0,0 +1,14 @@
+---
+languages:
+  - en
+  - cs
+---
+# Favourite S3 service clients
+In the following section you can find recommended S3 clients.
+
+## AWS-CLI (Linux, Windows)
+[AWS CLI](https://aws.amazon.com/cli/) - the Amazon Web Services Command Line Interface - is a standardized tool supporting the S3 interface. Using this tool you can handle your data and set up your S3 data storage. You can use it interactively from the command line, or incorporate AWS CLI into your automated scripts. [Tutorial for AWS CLI](aws-cli.md).
+
+## Rclone (Linux, Windows)
+The tool [Rclone](https://rclone.org/downloads/) is suitable for data synchronization and data migration between multiple endpoints (even between different data storage providers). Rclone preserves time stamps and verifies checksums. It is written in the Go language and is available for multiple platforms (GNU/Linux, Windows, macOS, BSD and Solaris). In the following guide, we demonstrate its usage on Linux and Windows systems. [Rclone guide](rclone.md).
+
diff --git a/object-storage/s3-service-screeshots/direct_upload.png b/object-storage/s3-service-screeshots/direct_upload.png
new file mode 100644
index 0000000000000000000000000000000000000000..2a1616aedd12e904a8aab58f1ed580d49f7a28c5
Binary files /dev/null and b/object-storage/s3-service-screeshots/direct_upload.png differ
diff --git a/object-storage/s3-service-screeshots/s3_backup.png b/object-storage/s3-service-screeshots/s3_backup.png
new file mode 100644
index 0000000000000000000000000000000000000000..9f64fdfad9893dbdcc15a619e6be05f0b0f24390
Binary files /dev/null and b/object-storage/s3-service-screeshots/s3_backup.png differ
diff --git a/object-storage/s3-service-screeshots/s3_distribution.png b/object-storage/s3-service-screeshots/s3_distribution.png
new file mode 100644
index 0000000000000000000000000000000000000000..9ddab8443c98f035f8ec0bfe724e33b2c8d33171
Binary files /dev/null and b/object-storage/s3-service-screeshots/s3_distribution.png differ
diff --git a/object-storage/s3-service.md b/object-storage/s3-service.md
index f5ab75fa2bb0555a23f4fe378cc1090a6e8f4db1..28887429f000d675657157b233f4a2f3ccc5b11f 100644
--- a/object-storage/s3-service.md
+++ b/object-storage/s3-service.md
@@ -7,21 +7,43 @@ languages:
 S3 service is a general service suited for most of the use cases. S3 service can be used for elementary data storing, automated backups, or various types of data handling applications.
-S3 service utilizes similar name convention as AWS S3. The convention is “bucket.domain.cz”. The tenant is the unique identificator and domain is s3.clX.du.cesnet.cz. If you will not explicitly mention the tenant it should be recognized automatically. The recognition is being performed based on the access key and secret key. So it should be sufficient to use the format as follows: s3.clX.du.cesnet.cz/bucket
-
-In case your client considers the endpoint as native AWS you have to switch to S3 compatible endpoint. Most of the clients can automatically process both formattings. However, in some cases is necessary to specify the format explicitly.
+Access to the service is controlled by virtual organizations and corresponding groups. S3 is suitable for sharing data between individual users and groups that may have members from different institutions. Tools for managing groups and users are provided by the e-infrastructure. Users with access to S3 can be people as well as "service accounts", for example for backup machines (a number of modern backup tools natively support S3 connections). In S3, data is organized into buckets. It is usually appropriate to link individual buckets to the logical structure of your data workflow, for example to different stages of data processing. Data can be stored in the service in plain form, or, in the case of sensitive data, in buckets encrypted on the client side, where even the storage administrator has no access to the data. Client-side encryption also means that the transmission of data over the network is encrypted, and in case of eavesdropping during transmission, the data cannot be decrypted.
 ???+ note "How to get S3 service?"
     To connect to S3 service you have to contact Data Storage support at: `du-support@cesnet.cz`
-Once you obtain your credentials you can continue to connection itself using one of the following S3 client.
+----
+## S3 Elementary use cases
+In the following section you can find the description of elementary use cases related to S3 service.
+
+### Automated backup of large datasets using tools natively supporting the S3 service
+If you use specialized automated backup tools such as Veeam, Bacula, or restic, most of them allow native use of the S3 service for backup, so you do not have to deal with connecting block devices and the like to your infrastructure. You only need to request an S3 storage setup and reconfigure your backup. Backups can be combined with the WORM model as protection against unwanted overwriting or ransomware attacks.
+
+{ style="display: block; margin: 0 auto" }
+
+### Data sharing across your laboratory or over multiple institutions
+If you manage multiple research groups where users need to share data, such as data collection and its post-processing, you can use S3. The S3 service allows you to share data within a group or between users. This use case assumes that each user has their own access to the repository. It is also suitable if you need to share sensitive data between organizations and do not have a secure VPN. You can use encrypted buckets (client-side encryption) within the S3 service. Client-side encryption also means that the transmission of data over the network is encrypted, and in case of eavesdropping during transmission, the data cannot be decrypted.
+
+{ style="display: block; margin: 0 auto" }
+
+### Live systems handling data - Learning Management Systems, Catalogues, Repositories
+You have large data and operate an application in the e-infrastructure that delivers data to your users. This use case is particularly relevant to applications that distribute large data (raw scans, large videos, large scientific data sets for computing environments...) to end users. For this use case, it is again possible to use the S3 service. The advantage of using S3 for these applications is that there is no need to upload data to the application server; the end user can upload/download data directly to/from the object storage using S3 presigned requests.
+
+{ style="display: block; margin: 0 auto" }
+
+## S3 Data Reliability (Data Redundancy) - replicated vs erasure coding
+The section below describes two approaches to data redundancy applied to the object storage pool. The S3 service can be equipped with **replicated** or **erasure code (EC)** redundancy.
+### Replicated
+Your data is stored in three copies in the data center. If one copy is corrupted, the original data is still readable in an undamaged form, and the damaged copy is restored in the background. Using a service with the replicated flag also allows for faster reads, as it is possible to read from all replicas at the same time. On the other hand, it reduces write speed, because each write operation waits for confirmation from all three replicas.
+
+???+ note "Suitable for?"
+    Suitable for smaller volumes of live data with a preference for reading speed (not very suitable for large data volumes).
+
+### Erasure Coding (EC)
+Erasure coding (EC) is a data protection method, similar to the dynamic RAID known from disk arrays. With EC, data is divided into individual fragments, which are then stored with some redundancy across the data storage. Therefore, if some disks (or an entire storage server) fail, the data is still accessible and will be restored in the background. Your data thus never resides on a single disk whose failure would mean losing it.
+
+
-## S3 service clients
-In the following section you can find recommended S3 clients.
-### AWS-CLI (Linux, Windows)
-[AWS CLI](https://aws.amazon.com/cli/) - Amazon Web Services Command Line Interface - is standardized too; supporting S3 interface. Using this tool you can handle your data and set up your S3 data storage. You can used the command line control or you can incorporate AWS CLI into your automated scripts. [Tutorial for AWS CLI](aws-cli.md).
-### Rclone (Linux, Windows)
-The tool [Rclone](https://rclone.org/downloads/) is suitable for data synchronization and data migration between more endpoints (even between different data storage providers). Rclone preserves the time stamps and checks the checksums. It is written in Go language. Rclone is available for multiple platforms (GNU/Linux, Windows, macOS, BSD and Solaris). In the following guide, we will demonstrate the usage in Linux and Windows systems. [Rclone guide](rclone.md).
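Reviewer note, outside the patch itself: the added pages mention the AWS CLI client and S3 presigned requests, so a short worked example may help readers of the new docs. This is a minimal sketch only — the profile name `cesnet`, the bucket `my-bucket`, and the `clX` placeholder in the endpoint are illustrative assumptions, not values fixed by this PR; users substitute the credentials and cluster obtained from `du-support@cesnet.cz`.

```shell
# Hypothetical example; profile, bucket, and clX are placeholders.
# Store the access key and secret key under a named profile:
aws configure --profile cesnet

# List buckets on the S3-compatible endpoint:
aws s3 ls --profile cesnet --endpoint-url https://s3.clX.du.cesnet.cz

# Upload a file into a bucket:
aws s3 cp ./backup.tar.gz s3://my-bucket/ \
    --profile cesnet --endpoint-url https://s3.clX.du.cesnet.cz

# Generate a presigned URL (valid 1 hour) so an end user can
# download directly from object storage, bypassing the app server:
aws s3 presign s3://my-bucket/backup.tar.gz --expires-in 3600 \
    --profile cesnet --endpoint-url https://s3.clX.du.cesnet.cz
```

The `--endpoint-url` flag is what redirects the AWS CLI from native AWS to an S3-compatible service; without it, every command would target amazonaws.com.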