Google Cloud
GCP for Archive and Analysis
National PAM SI Cloud Repo: more details on cloud storage and processing from the cloud team
Link to SWFSC GCP Bucket: our data storage site
Accessing Data
Check the archive spreadsheet to see whether the dataset you want to access is available in the SWFSC-SAEL bucket. Note: only the raw audio data is uploaded here; see our NCEI archive for additional ancillary data.
The bucket is broken down by platform -> project -> deployment. You can select all data within a platform, or any subfolder within that directory, and click download.
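For scripted access, the same platform -> project -> deployment layout can also be addressed from the command line. The following is a minimal sketch, assuming a bash-compatible shell and an installed, authenticated Cloud SDK. The bucket name swfsc-1 and the folder names are taken from the upload example in this guide; the helper names (bucket_url, list_folder, download_folder) are hypothetical.

```shell
# Build a gs:// URL from path parts (platform, project, deployment).
# Assumption: the bucket is named "swfsc-1", as in this guide's upload example.
bucket_url() {
  url="gs://swfsc-1"
  for part in "$@"; do
    url="${url}/${part}"
  done
  printf '%s\n' "$url"
}

# List the contents of a platform, project, or deployment folder.
# Requires an authenticated gsutil.
list_folder() {
  gsutil ls "$(bucket_url "$@")/"
}

# Download a folder and everything under it into the current directory.
# Requires an authenticated gsutil.
download_folder() {
  gsutil -m cp -r "$(bucket_url "$@")" .
}
```

For example, `list_folder drifting_recorder CalCurCEAS_2024` would list the deployments in one project, and `download_folder drifting_recorder CalCurCEAS_2024 CalCurCEAS_004` would download a single deployment, assuming that folder exists in the bucket.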
Cloud Storage
Manual Upload
Open SWFSC bucket in GCP console
Select Upload (Upload files or Upload folder) and navigate to the data you want to upload
Click “Upload”
Large data files or datasets may crash the GCP console with this upload method. To upload large amounts of data, use the GCP sdk/gsutil options below.
sdk/gsutil Set-Up
- Download the Google Cloud CLI installer and install Cloud SDK on your computer
- Open the ‘Google Cloud SDK Shell Terminal’, follow the prompts, and select ‘ggn-nmfs-pamdata-prod-1’ as your project
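After set-up, it can be useful to confirm that the SDK is pointed at the right project before uploading. The sketch below is one way to do that, assuming a bash-compatible shell; check_project is a hypothetical helper, while `gcloud config get-value project` and `gcloud config set project` are standard Cloud SDK commands.

```shell
# Project ID taken from the set-up step in this guide.
EXPECTED_PROJECT="ggn-nmfs-pamdata-prod-1"

# Compare an active project ID ($1) against the expected one.
check_project() {
  if [ "$1" = "$EXPECTED_PROJECT" ]; then
    echo "project ok"
  else
    echo "project mismatch: run 'gcloud config set project $EXPECTED_PROJECT'"
    return 1
  fi
}

# Typical call (requires the Cloud SDK to be installed and initialized):
#   check_project "$(gcloud config get-value project 2>/dev/null)"
```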
sdk/gsutil Upload
- Open the ‘Google Cloud SDK Shell Terminal’
- Enter the following command to upload files to the GCP bucket:
gsutil -m cp -r [source path to files to be uploaded/] gs://[destination path to folder on bucket/]
- SAEL ADRIFT Example
gsutil -m cp -r E:\DATA\LEG_2\RECORDINGS\CalCurCEAS_004 gs://swfsc-1/drifting_recorder/CalCurCEAS_2024
- The transfer should start. Wait for the terminal prompt to return, then refresh the SWFSC bucket page to see your uploaded data
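Beyond refreshing the bucket page, a file count comparison gives a quick sanity check that an upload completed. This is a rough sketch, assuming a bash-compatible shell, an authenticated gsutil, and .wav files sitting directly inside the folder being checked (nested subfolders would need a recursive `**` wildcard instead). The helper names are hypothetical.

```shell
# Count .wav files directly inside a local folder ($1).
count_local_wavs() {
  find "$1" -maxdepth 1 -type f -name '*.wav' | wc -l | tr -d ' '
}

# Count .wav objects directly inside a bucket folder ($1, a gs:// URL).
# Requires an authenticated gsutil; only runs when called.
count_remote_wavs() {
  gsutil ls "$1/*.wav" | wc -l | tr -d ' '
}

# Report whether a local count ($1) and a bucket count ($2) agree.
compare_counts() {
  if [ "$1" -eq "$2" ]; then
    echo "match: $1 files"
  else
    echo "mismatch: $1 local, $2 in bucket"
    return 1
  fi
}
```

A typical call would be `compare_counts "$(count_local_wavs /path/to/deployment)" "$(count_remote_wavs gs://swfsc-1/path/to/deployment)"`, with both paths replaced by your own. Matching counts do not prove the files are identical, only that none were skipped.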
Automated sdk/gsutil Upload
You can use the following R script to automatically run the above command and bulk upload data from a source. R script available here. This script is set up to only run from 6 pm to 6 am on weekdays and full time on weekends, to avoid excess network traffic.
# Check for weekday and hour
# Note: weekdays() returns locale-dependent names; this assumes an English locale
is_weekday = function(x){
  weekday_names = c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')
  return(weekdays(x) %in% weekday_names)
}
is_after_hours = function(x){
  hour = as.integer(format(x, "%H"))
  return(hour < 6 | hour >= 18) # before 6 am, or 6 pm and later
}
# Google cloud bucket to transfer to
my_fmc_bucket_name = 'swfsc-1/drifting_recorder/ADRIFT'
# List deployment IDs to copy to bucket
deployment_dir <- "Z:/RECORDINGS/DRIFTERS/ADRIFT/RAW" #path to top level directory
all_deployments <- list.dirs(deployment_dir, full.names = FALSE, recursive = FALSE) #list deployment IDs based on folder names within top level directory
ADRIFT_deployments <- grep("^ADRIFT_\\d{3}", all_deployments, value = TRUE) #keep only ADRIFT_### folders (removes Opps deployments)
# Copy wav files
for (folder in ADRIFT_deployments) {
  #check day and time; pause during weekday working hours
  while (is_weekday(Sys.time()) && !is_after_hours(Sys.time())) {
    cat(paste('waiting...', Sys.time(), "\n"))
    Sys.sleep(600) #wait 10 minutes until trying again
  }
  LocalDir <- file.path(deployment_dir, folder) #local folder with wav files
  wav <- list.files(LocalDir, pattern = "\\.wav$", full.names = TRUE) #list of wav files to copy
  GCP_FolderName <- sub("_CENSOR$", "", folder) # create folder name for GCP bucket, removing CENSOR from folder name
  GCP_path <- file.path(my_fmc_bucket_name, GCP_FolderName) # create path to GCP folder
  for (file in wav) {
    string <- paste("gsutil -m cp -r ", file, " ", "gs://", GCP_path, "/", sep="") # create command to pass to google cloud sdk command window
    message("Uploading: ", file, " -> ", GCP_path) # print a progress message in R console
    system(string) # pass the command for processing and wav file transfer
  }
}
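The scheduling rule described above (weekdays from 6 pm to 6 am, weekends at any time) can also be expressed directly in a shell script. The following is a minimal sketch of that rule, not a port of the R script, assuming a bash-compatible shell; in_upload_window and wait_for_upload_window are hypothetical names.

```shell
# Succeed (exit status 0) when uploads are allowed.
#   $1: ISO day of week, 1 = Monday ... 7 = Sunday (as printed by `date +%u`)
#   $2: hour of day, 00-23 (as printed by `date +%H`)
in_upload_window() {
  dow=$1
  hour=${2#0}      # drop one leading zero, e.g. "08" -> "8"
  hour=${hour:-0}  # a bare "0" becomes empty above; restore it
  if [ "$dow" -ge 6 ]; then
    return 0       # Saturday or Sunday: always allowed
  fi
  if [ "$hour" -lt 6 ] || [ "$hour" -ge 18 ]; then
    return 0       # weekday before 6 am, or from 6 pm onward
  fi
  return 1
}

# Block until the window opens, checking every 10 minutes.
wait_for_upload_window() {
  until in_upload_window "$(date +%u)" "$(date +%H)"; do
    echo "waiting... $(date)"
    sleep 600
  done
}
```

Calling `wait_for_upload_window` before a gsutil upload command would hold the transfer until the window opens. Passing the day and hour as arguments keeps the rule easy to check without waiting for the clock.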
More information on Cloud Storage
See the PAM SI Cloud Storage website for more information and troubleshooting tips
Cloud Processing
work in progress
Cloud-based processing is now available through Google workstations using the data you upload to GCP. SAEL has not begun processing in the cloud, but you can follow the steps below to establish a Windows virtual machine (workstation) and access the existing cloud-based software.
PAM Windows Workstation
Follow the detailed directions here to establish a Windows workstation
More Information on Cloud Processing
See the PAM SI Cloud Processing website for more information and troubleshooting tips