Google Cloud

GCP for Archive and Analysis

National PAM SI Cloud Repo: more details on cloud storage and processing from the cloud team

Accessing Data

Check the archive spreadsheet to see whether the dataset you want to access is available in the SWFSC-SAEL bucket. Note: only the raw audio data is uploaded here; see our NCEI archive for ancillary data.

The bucket is broken down by platform -> project -> deployment. You can select all data within a platform, or any subfolder within that directory, and click download.
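
The platform -> project -> deployment layout maps directly onto a storage path, so a download can also be scripted. The sketch below is a minimal R example, not an official workflow. It assumes the bucket is named swfsc-1 (the name used in the upload examples on this page; confirm it in the GCP console) and that the Google Cloud SDK is installed. The helper names build_gcs_uri and build_download_command are hypothetical, the deployment path is illustrative, and the local destination folder is a placeholder.

```r
# Build a gs:// URI from the bucket hierarchy (platform -> project -> deployment).
# Hypothetical helper; deployment is optional so a whole project can be selected.
build_gcs_uri <- function(bucket, platform, project, deployment = NULL) {
  parts <- c(bucket, platform, project, deployment)  # a NULL deployment is dropped
  paste0("gs://", paste(parts, collapse = "/"))
}

# Build (but do not run) a gsutil command that copies a folder to a local directory.
build_download_command <- function(uri, local_dir) {
  paste("gsutil -m cp -r", shQuote(uri), shQuote(local_dir))
}

uri <- build_gcs_uri("swfsc-1", "drifting_recorder", "CalCurCEAS_2024", "CalCurCEAS_004")
cmd <- build_download_command(uri, "D:/downloads")  # placeholder destination
cat(cmd, "\n")
# system(cmd)  # uncomment to start the download
```

Building the command as a string first makes it easy to inspect before any data is transferred.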

Cloud Storage

Manual Upload

  1. Open the SWFSC bucket in the GCP console

  2. Select Upload (Upload files or Upload folder) and navigate to the data you want to upload

  3. Click “Upload”

    Large files or datasets may crash the GCP console with this upload method. To upload large amounts of data, use the GCP sdk/gsutil options below.

sdk/gsutil Set-Up

  1. Download the Google Cloud CLI installer and install Cloud SDK on your computer
  2. Open the ‘Google Cloud SDK Shell Terminal’, follow the prompts, and select ‘ggn-nmfs-pamdata-prod-1’ as your project

sdk/gsutil Upload

  1. Open the ‘Google Cloud SDK Shell Terminal’
  2. Enter the following command to upload files to GCP: gsutil -m cp -r [source path to the files to be uploaded/] gs://[destination path to the folder in the bucket/]
    • SAEL ADRIFT Example "gsutil -m cp -r E:\DATA\LEG_2\RECORDINGS\CalCurCEAS_004 gs://swfsc-1/drifting_recorder/CalCurCEAS_2024"
  3. Processing should start. Wait for the terminal prompt to return, then refresh the SWFSC bucket page to see your uploaded data
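
As an alternative to refreshing the bucket page, an upload can be checked from R by comparing local file names with a bucket listing. This is a rough sketch under stated assumptions: find_missing_uploads is a hypothetical helper, the file names are made up, and the commented-out listing call assumes gsutil is on the system PATH and that the files landed under the prefix shown. It compares file names only, not file sizes or checksums.

```r
# Return the local files whose names do not appear in a bucket listing.
# local_files: character vector of local paths
# remote_uris: character vector of gs:// URIs, e.g. the output of `gsutil ls`
find_missing_uploads <- function(local_files, remote_uris) {
  local_files[!(basename(local_files) %in% basename(remote_uris))]
}

# Example with made-up file names
local_files <- c("E:/DATA/a_001.wav", "E:/DATA/a_002.wav")
remote_uris <- c("gs://swfsc-1/example/a_001.wav")
print(find_missing_uploads(local_files, remote_uris))

# With the Cloud SDK installed, a real listing could be captured like this
# (the prefix is an assumption; adjust it to the folder you uploaded to):
# remote_uris <- system(
#   "gsutil ls gs://swfsc-1/drifting_recorder/CalCurCEAS_2024/CalCurCEAS_004",
#   intern = TRUE
# )
```

An empty result suggests every local file name has a match in the listing.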

Automated sdk/gsutil Upload

You can use the following R script to automatically run the above command and bulk upload data from a source. R script available here. This script is set up to only run from 6 pm to 6 am on weekdays and full time on weekends, to avoid excess network traffic.

# Check for weekday and hour
is_weekday <- function(x) {
  # weekdays() returns day names in the current locale; these are the English names
  weekday_names <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')
  return(weekdays(x) %in% weekday_names)
}

is_after_hours <- function(x) {
  hour <- as.integer(format(x, "%H"))
  return(hour < 6 | hour >= 18) # before 6 am or from 6 pm onward
}

# Google Cloud bucket path to transfer to
my_fmc_bucket_name = 'swfsc-1/drifting_recorder/ADRIFT'

# List deployment IDs to copy to bucket
deployment_dir <- "Z:/RECORDINGS/DRIFTERS/ADRIFT/RAW" #path to top level directory
all_deployments <- list.dirs(deployment_dir, full.names = FALSE, recursive = FALSE) #list deployment IDs based on folder names within top level directory
ADRIFT_deployments <- grep("^ADRIFT_\\d{3}", all_deployments, value = TRUE) #remove Opps deployments

# Copy wav files
for (folder in ADRIFT_deployments) {
  #check day and time
  while (is_weekday(Sys.time()) & !is_after_hours(Sys.time())) {
    cat(paste('waiting...',Sys.time(), "\n"))
    Sys.sleep(600) #wait 10 minutes until trying again
  }
  
  LocalDir <- file.path(deployment_dir, folder) #local folder with wav files
  
  wav <- list.files(LocalDir, pattern = "\\.wav$", full.names = TRUE) #list of wav files to copy

  GCP_FolderName <- sub("_CENSOR$", "", folder) # create folder name for GCP bucket, removing CENSOR from folder name

  GCP_path <- file.path(my_fmc_bucket_name, GCP_FolderName) # create path to GCP folder
  
  for (file in wav) {
    string <- paste0("gsutil -m cp -r ", shQuote(file), " ", shQuote(paste0("gs://", GCP_path, "/"))) # create command for the Google Cloud SDK; paths are quoted so folders with spaces work
    
    message("Uploading: ", file, " -> ", GCP_path) # print a progress message in R console
    
    system(string) # pass the command for processing and wav file transfer
  }
}
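
The scheduling rule described above (uploads run overnight on weekdays and at any time on weekends) can be checked in isolation before starting a long transfer. The sketch below is a restatement under stated assumptions rather than the script's own code: upload_allowed is a hypothetical helper, it treats 6 pm as hour 18 inclusive, and it uses the numeric day of week from as.POSIXlt so the result does not depend on the system language.

```r
# TRUE when an upload may run: any time on weekends, or 6 pm to 6 am on weekdays.
upload_allowed <- function(x) {
  lt <- as.POSIXlt(x)
  is_weekend <- lt$wday %in% c(0, 6)          # 0 = Sunday, 6 = Saturday
  after_hours <- lt$hour < 6 | lt$hour >= 18  # before 6 am or from 6 pm onward
  is_weekend | after_hours
}

# Sample times (2024-06-05 is a Wednesday, 2024-06-08 is a Saturday)
wed_noon    <- as.POSIXct("2024-06-05 12:00:00", tz = "UTC")
wed_evening <- as.POSIXct("2024-06-05 18:30:00", tz = "UTC")
sat_noon    <- as.POSIXct("2024-06-08 12:00:00", tz = "UTC")

print(upload_allowed(wed_noon))     # FALSE
print(upload_allowed(wed_evening))  # TRUE
print(upload_allowed(sat_noon))     # TRUE
```

Calling upload_allowed(Sys.time()) shows whether a transfer would start immediately or wait.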

More information on Cloud Storage

See the PAM SI Cloud Storage website for more information and troubleshooting tips.

Cloud Processing

Note: this section is a work in progress.

Cloud-based processing is now available through Google workstations using the data you upload to GCP. SAEL has not yet begun processing in the cloud, but you can follow the steps below to set up a Windows virtual machine (workstation) and access the existing cloud-based software.

PAM Windows Workstation

Follow the detailed directions here to set up a Windows workstation.

More Information on Cloud Processing

See the PAM SI Cloud Processing website for more information and troubleshooting tips.