```r
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, dpi = 60, out.width = "100%")
options(scipen = 999)
options(knitr.kable.NA = '--')
options(knitr.kable.NaN = '--')
```
al
May 23, 2024
May 24, 2024
Inspired by https://blog.djnavarro.net/posts/2022-03-17_using-aws-s3-in-r/ by Danielle Navarro.
Note to self - /Users/airvine/Projects/repo/new_graphiti/_freeze/posts/aws-storage-processx/index/execute-results/html.json is created when I render this document. It seems to be what is published to the website after (1) the github_actions workflow is run (on a github runner) to generate the gh-pages branch, and (2) the site is published from that branch with github pages.
“Quick” post to document where I got to with syncing files to aws with R. I didn’t love the aws.s3::s3sync function because, from what I could tell, there was no way to tell it to delete files when they were no longer present locally or in the bucket (I could be wrong).

Then I climbed into s3fs, which mirrors the fs package and seems a bit more user friendly than the aws.s3 package for managing files. It is created by Dyfan Jones, who is also the top contributor to paws!! He seems like perhaps as much of a beast as one of the contributors to s3fs, Scott Chamberlain.

For the sync issue I figured why not just call the aws command line tool from R. processx is an insane package that might be the mother of all packages. It allows you to run command line tools from R, with flexibility for things like setting the directory the command is called from (a big deal as far as I can tell).
We need to set up our aws account online. The blog above from Danielle Navarro covers that, I believe (I struggled through it a long time ago). I should use a ~/.aws/credentials file but don’t yet. I have my credentials in my ~/.Renviron file as well as in my ~/.bash_profile (probably a ridiculous setup). They are:
```
AWS_ACCESS_KEY_ID='my_key'
AWS_DEFAULT_REGION='my_region'
AWS_SECRET_ACCESS_KEY='my_secret_key'
```
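A quick sanity check that the R session can actually see them (just an optional check, not part of the workflow):

```r
# confirm the credentials are visible to this R session (values not printed)
nchar(Sys.getenv(c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"))) > 0
```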
Now we use the s3fs package. Current buckets are:
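A sketch of the listing call (I'm assuming that calling s3_dir_ls() without a path lists the buckets at the account root; the exact call isn't captured here):

```r
# assumed: s3_dir_ls() with no path lists the buckets at the account root
s3fs::s3_dir_ls()
```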
Let’s generate the name of the bucket based on the name of the repo but due to aws
bucket naming rules we need to swap out our underscores for hyphens! Maybe a good enough reason to change our naming conventions for our repos on github!!
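A minimal sketch of what that could look like (the repo-name lookup and the s3fs bucket-creation call are my assumptions here):

```r
# derive the bucket name from the repo directory name, swapping underscores for hyphens
repo_name <- basename(getwd())              # e.g. "new_graphiti"
bucket_name <- gsub("_", "-", repo_name)    # aws bucket names cannot contain underscores
bucket_path <- paste0("s3://", bucket_name) # "s3://new-graphiti"

# create the bucket (assumed s3fs call; settings left at the defaults)
s3fs::s3_bucket_create(bucket_path)
```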
We build a little wrapper function to help us debug issues when running system commands with processx.
```r
# Wrapper around processx::run() to help debug system commands.
# Note: `command`, `args`, and `working_directory` are not arguments -
# they are picked up from the calling environment, so we define them
# before calling sys_call().
sys_call <- function() {
  result <- tryCatch({
    processx::run(
      command,
      args = args,
      echo = TRUE,             # print the command output live
      wd = working_directory,  # set the working directory
      spinner = TRUE,          # show a spinner
      timeout = 60             # time out after 60 seconds
    )
  }, error = function(e) {
    # Handle errors: e.g., print a custom error message
    cat("An error occurred: ", e$message, "\n")
    NULL # return NULL or another appropriate value
  })

  # Check if the command was successful
  if (!is.null(result)) {
    cat("Exit status:", result$status, "\n")
    cat("Output:\n", result$stdout)
  } else {
    cat("Failed to execute the command properly.\n")
  }
}
```
Then we specify our command and arguments. To achieve the desired behavior of including only the files we want (here, everything under posts/), we need to order the --exclude and --include flags appropriately: exclude everything first, then include what we want. A sketch of what that looks like is below:
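The exact chunk isn't captured here, so treat this as a sketch; the bucket name and include pattern are taken from the output further down, and the working directory is my assumption:

```r
command <- "aws"
args <- c(
  "s3", "sync",
  ".",                    # sync from the working directory set below...
  "s3://new-graphiti",    # ...to the bucket
  "--delete",             # delete bucket objects that no longer exist locally
  "--exclude", "*",       # exclude everything first...
  "--include", "posts/*"  # ...then include only what we want
)
working_directory <- getwd()  # assumed: the repo root
```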
Now let’s put a tester file in our directory and sync it to our bucket. We will remove it later to test whether it gets removed on sync.
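For example (the path matches the upload listing below):

```r
# create an empty tester file under posts/
file.create("posts/test.txt")
```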
Run our little function to sync the files to the bucket.
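That is just:

```r
sys_call()
```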
Completed 237 Bytes/511.7 KiB (3.0 KiB/s) with 12 file(s) remaining
upload: posts/_metadata.yml to s3://new-graphiti/posts/_metadata.yml
Completed 237 Bytes/511.7 KiB (3.0 KiB/s) with 11 file(s) remaining
Completed 2.0 KiB/511.7 KiB (11.9 KiB/s) with 11 file(s) remaining
upload: posts/logos-equipment/index.qmd to s3://new-graphiti/posts/logos-equipment/index.qmd
Completed 2.0 KiB/511.7 KiB (11.9 KiB/s) with 10 file(s) remaining
Completed 3.6 KiB/511.7 KiB (20.8 KiB/s) with 10 file(s) remaining
upload: posts/snakecase/index.qmd to s3://new-graphiti/posts/snakecase/index.qmd
Completed 3.6 KiB/511.7 KiB (20.8 KiB/s) with 9 file(s) remaining
Completed 8.6 KiB/511.7 KiB (48.5 KiB/s) with 9 file(s) remaining
upload: posts/snakecase/thumbnail.jpg to s3://new-graphiti/posts/snakecase/thumbnail.jpg
Completed 8.6 KiB/511.7 KiB (48.5 KiB/s) with 8 file(s) remaining
Completed 13.9 KiB/511.7 KiB (74.3 KiB/s) with 8 file(s) remaining
upload: posts/aws-storage-permissions/index.qmd to s3://new-graphiti/posts/aws-storage-permissions/index.qmd
Completed 13.9 KiB/511.7 KiB (74.3 KiB/s) with 7 file(s) remaining
Completed 17.8 KiB/511.7 KiB (82.4 KiB/s) with 7 file(s) remaining
upload: posts/aws-storage-processx/image.jpg to s3://new-graphiti/posts/aws-storage-processx/image.jpg
Completed 17.8 KiB/511.7 KiB (82.4 KiB/s) with 6 file(s) remaining
upload: posts/test.txt to s3://new-graphiti/posts/test.txt
Completed 17.8 KiB/511.7 KiB (82.4 KiB/s) with 5 file(s) remaining
Completed 26.0 KiB/511.7 KiB (119.1 KiB/s) with 5 file(s) remaining
upload: posts/aws-storage-processx/index.rmarkdown to s3://new-graphiti/posts/aws-storage-processx/index.rmarkdown
Completed 26.0 KiB/511.7 KiB (119.1 KiB/s) with 4 file(s) remaining
Completed 34.0 KiB/511.7 KiB (152.0 KiB/s) with 4 file(s) remaining
upload: posts/aws-storage-permissions/image.jpg to s3://new-graphiti/posts/aws-storage-permissions/image.jpg
Completed 34.0 KiB/511.7 KiB (152.0 KiB/s) with 3 file(s) remaining
Completed 42.2 KiB/511.7 KiB (180.5 KiB/s) with 3 file(s) remaining
upload: posts/aws-storage-processx/index.qmd to s3://new-graphiti/posts/aws-storage-processx/index.qmd
Completed 42.2 KiB/511.7 KiB (180.5 KiB/s) with 2 file(s) remaining
Completed 83.0 KiB/511.7 KiB (256.0 KiB/s) with 2 file(s) remaining
upload: posts/logos-equipment/image.jpg to s3://new-graphiti/posts/logos-equipment/image.jpg
Completed 83.0 KiB/511.7 KiB (256.0 KiB/s) with 1 file(s) remaining
Completed 339.0 KiB/511.7 KiB (830.9 KiB/s) with 1 file(s) remaining
Completed 511.7 KiB/511.7 KiB (654.4 KiB/s) with 1 file(s) remaining
upload: posts/snakecase/all.jpeg to s3://new-graphiti/posts/snakecase/all.jpeg
Exit status: 0
Output:
Completed 237 Bytes/511.7 KiB (3.0 KiB/s) with 12 file(s) remaining
upload: posts/_metadata.yml to s3://new-graphiti/posts/_metadata.yml
Completed 237 Bytes/511.7 KiB (3.0 KiB/s) with 11 file(s) remaining
Completed 2.0 KiB/511.7 KiB (11.9 KiB/s) with 11 file(s) remaining
upload: posts/logos-equipment/index.qmd to s3://new-graphiti/posts/logos-equipment/index.qmd
Completed 2.0 KiB/511.7 KiB (11.9 KiB/s) with 10 file(s) remaining
Completed 3.6 KiB/511.7 KiB (20.8 KiB/s) with 10 file(s) remaining
upload: posts/snakecase/index.qmd to s3://new-graphiti/posts/snakecase/index.qmd
Completed 3.6 KiB/511.7 KiB (20.8 KiB/s) with 9 file(s) remaining
Completed 8.6 KiB/511.7 KiB (48.5 KiB/s) with 9 file(s) remaining
upload: posts/snakecase/thumbnail.jpg to s3://new-graphiti/posts/snakecase/thumbnail.jpg
Completed 8.6 KiB/511.7 KiB (48.5 KiB/s) with 8 file(s) remaining
Completed 13.9 KiB/511.7 KiB (74.3 KiB/s) with 8 file(s) remaining
upload: posts/aws-storage-permissions/index.qmd to s3://new-graphiti/posts/aws-storage-permissions/index.qmd
Completed 13.9 KiB/511.7 KiB (74.3 KiB/s) with 7 file(s) remaining
Completed 17.8 KiB/511.7 KiB (82.4 KiB/s) with 7 file(s) remaining
upload: posts/aws-storage-processx/image.jpg to s3://new-graphiti/posts/aws-storage-processx/image.jpg
Completed 17.8 KiB/511.7 KiB (82.4 KiB/s) with 6 file(s) remaining
upload: posts/test.txt to s3://new-graphiti/posts/test.txt
Completed 17.8 KiB/511.7 KiB (82.4 KiB/s) with 5 file(s) remaining
Completed 26.0 KiB/511.7 KiB (119.1 KiB/s) with 5 file(s) remaining
upload: posts/aws-storage-processx/index.rmarkdown to s3://new-graphiti/posts/aws-storage-processx/index.rmarkdown
Completed 26.0 KiB/511.7 KiB (119.1 KiB/s) with 4 file(s) remaining
Completed 34.0 KiB/511.7 KiB (152.0 KiB/s) with 4 file(s) remaining
upload: posts/aws-storage-permissions/image.jpg to s3://new-graphiti/posts/aws-storage-permissions/image.jpg
Completed 34.0 KiB/511.7 KiB (152.0 KiB/s) with 3 file(s) remaining
Completed 42.2 KiB/511.7 KiB (180.5 KiB/s) with 3 file(s) remaining
upload: posts/aws-storage-processx/index.qmd to s3://new-graphiti/posts/aws-storage-processx/index.qmd
Completed 42.2 KiB/511.7 KiB (180.5 KiB/s) with 2 file(s) remaining
Completed 83.0 KiB/511.7 KiB (256.0 KiB/s) with 2 file(s) remaining
upload: posts/logos-equipment/image.jpg to s3://new-graphiti/posts/logos-equipment/image.jpg
Completed 83.0 KiB/511.7 KiB (256.0 KiB/s) with 1 file(s) remaining
Completed 339.0 KiB/511.7 KiB (830.9 KiB/s) with 1 file(s) remaining
Completed 511.7 KiB/511.7 KiB (654.4 KiB/s) with 1 file(s) remaining
upload: posts/snakecase/all.jpeg to s3://new-graphiti/posts/snakecase/all.jpeg
Then we can view our bucket contents as a tree, as well as list them and capture the result.
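A sketch with s3fs (I'm assuming s3_dir_tree() and a recursive s3_dir_ls() here, mirroring their fs counterparts):

```r
# print the bucket contents as a tree
s3fs::s3_dir_tree("s3://new-graphiti")

# capture the object keys so we can compare before/after the next sync
files_old <- s3fs::s3_dir_ls("s3://new-graphiti", recurse = TRUE)
```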
s3://new-graphiti
└── posts
├── _metadata.yml
├── test.txt
├── aws-storage-permissions
│ ├── image.jpg
│ └── index.qmd
├── aws-storage-processx
│ ├── image.jpg
│ ├── index.qmd
│ └── index.rmarkdown
├── logos-equipment
│ ├── image.jpg
│ └── index.qmd
└── snakecase
├── all.jpeg
├── index.qmd
└── thumbnail.jpg
Now we will remove test.txt locally.
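For example:

```r
# delete the local tester file; the next sync (with --delete) should drop it from the bucket too
fs::file_delete("posts/test.txt")
```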
Now we sync again.
delete: s3://new-graphiti/posts/test.txt
Exit status: 0
Output:
delete: s3://new-graphiti/posts/test.txt
List our bucket contents and capture them again.
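Same sketch as before, captured into a new object:

```r
# list the bucket again after the sync
s3fs::s3_dir_tree("s3://new-graphiti")
files_new <- s3fs::s3_dir_ls("s3://new-graphiti", recurse = TRUE)
```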
s3://new-graphiti
└── posts
├── _metadata.yml
├── aws-storage-permissions
│ ├── image.jpg
│ └── index.qmd
├── aws-storage-processx
│ ├── image.jpg
│ ├── index.qmd
│ └── index.rmarkdown
├── logos-equipment
│ ├── image.jpg
│ └── index.qmd
└── snakecase
├── all.jpeg
├── index.qmd
└── thumbnail.jpg
Compare the file structure before and after our sync.
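The side-by-side output below looks like waldo::compare(), so the comparison was presumably along these lines (a sketch; any path cleaning of the captured keys is assumed):

```r
# diff the captured key listings; test.txt should show up as removed
waldo::compare(files_old, files_new)
```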
old | new
[9] "posts/snakecase/all.jpeg" | "posts/snakecase/all.jpeg" [9]
[10] "posts/snakecase/index.qmd" | "posts/snakecase/index.qmd" [10]
[11] "posts/snakecase/thumbnail.jpg" | "posts/snakecase/thumbnail.jpg" [11]
[12] "posts/test.txt" -
Success!!
We need to build the call to sync the other way (cloud to local) in a way that perhaps nukes local files if they are not in the cloud. This is because we need to collaborate within our team, so we do things like one person changing the name of images; when the other person syncs, they should have only the newly named image in their local directory.

This all deserves consideration as it could get really messy from a few different angles (i.e., one person adds files they don’t want nuked and then they get nuked). There are lots of different options for doing things, so we will get there.
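For reference, a sketch of what the reverse call could look like (same bucket and include pattern assumed as above; note that --delete here removes local files not present in the bucket, so handle with care):

```r
# cloud -> local sync; --delete removes local files that are no longer in the bucket
args_down <- c(
  "s3", "sync",
  "s3://new-graphiti", ".",  # from the bucket to the working directory
  "--delete",
  "--exclude", "*",
  "--include", "posts/*"
)
```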
Let’s delete the bucket.
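For an unversioned (or already emptied) bucket, something like this should do it (assumed s3fs call; see the versioning caveats below):

```r
# remove the bucket itself
s3fs::s3_bucket_delete("s3://new-graphiti")
```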
As we have tried this before, we know that if we want to delete a bucket with versioned files in it, we need to empty the bucket first, including the delete_markers. That is easy in the aws console with the UI but seems tricky from the command line. There is a bunch of discussion of options for this at https://stackoverflow.com/questions/29809105/how-do-i-delete-a-versioned-bucket-in-aws-s3-using-the-cli . Thinking a good way around it (and a topic for another post) would be to apply a lifecycle-configuration to the bucket that deletes all versions of files after a day, allowing you to delete the bucket after they expire (as per the above post). Really, we may want a lifecycle-configuration on all our versioned buckets to keep costs down anyway, but that deserves more thought and perhaps another post.
# old notes
# We are going to test creating a bucket with versioning on. This has large implications for billing with some details
# of how it works [here](https://aws.amazon.com/blogs/aws/amazon-s3-enhancement-versioning/) with example of costs [here](https://aws.amazon.com/s3/faqs/?nc1=h_ls). Thinking we may want versioned buckets for things like `sqlite`
# "snapshot" databases but definitely not for things like images.