Background: Simulating input data from an ecosystem model

We use existing Atlantis ecosystem model output to generate input datasets for a variety of multispecies models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis models simulate a wide range of physical and ecological processes, include a full food web, and can be run using different climate forcing, fishing, and other scenarios.

We extract simulated data using the R package atlantisom. The purpose of atlantisom is to use existing Atlantis model output to generate input datasets for a variety of models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis models can be run using different climate forcing, fishing, and other scenarios. Users of atlantisom will be able to specify fishery independent and fishery dependent sampling in space and time, as well as species-specific catchability, selectivty, and other observation processes for any Atlantis scenario. Internally consistent multispecies and ecosystem datasets with known observation error characteristics will be the atlantisom outputs, for use in individual model performance testing, comparing performance of alternative models, and performance testing of model ensembles against “true” Atlantis outputs.

Species in the ms-keyrun dataset

Our initial species selection includes 11 single species groups from the Norwegian Barents Sea (NOBA) Atlantis model [1,2]. These groups are fully age structured. All but two of them are fished.

spptable <- mskeyrun::simFocalSpecies %>% 
  select(Name, SciName)

knitr::kable(spptable, col.names = c("Model name", "Latin name"))
Model name Latin name
Long_rough_dab Hippoglossoides platessoides
Green_halibut Reinhardtius hippoglossoides
Mackerel Scomber scombrus
Haddock Melongrammus aeglefinus
Saithe Pollachius virens
Redfish Sebastes mentella
Blue_whiting Micromesistius poutassou
Norwegian_ssh Clupea harengus
North_atl_cod Gadus morhua
Polar_cod Boreogadus saida
Capelin Mallotus villosus

The WGSAM skill assessment subgroup generated initial survey outputs that reflect initial group decisions. This first dataset will be near-perfect (no additional observation error or bias) and all modelers will know this. Dataset will include survey biomass and catch indices, survey and fishery length and age compositions, and diet information, and possibly other input parameters (VBGF parameters, etc).

We will use this same dataset to make inputs for Hydra and MSCAA. Hydra inputs are described at the link above and reside in the Hydra-sim repository where the estimation version is being developed.

A basic simulated dataset is available in the mskeyrun R package and we will use those outputs to generate the inputs for MSCAA below, avoiding the direct use of atlantisom output files if possible.

Create input files for each multispecies model

The time series have already been dimensioned in the mskeyrun R data package, so we focus on functions to generate input files that can eventually be collected in a package.

Multispecies statistical catch at age model, MSCAA

Similar to Hydra, MSCAA requires .dat and .pin files as inputs. Kiersten’s file [9Species_Input_MspModel.R](https://github.com/thefaylab/MS-SCAA/blob/master/9Species/R.programs.Inputs/9Species_Input_MspModel.R) outlines the steps for creating these files and is used as a basis here as much as possible.

An important note for MSCAA is that the ragged arrays of age compositions require that the species order be from most to least age classes.

Therefore we first order the species for use in the rest of the functions.

The number of age classes for each species is found in the mskeyrun::simSurveyAgeComp data object. (To be safe we could also check the mskeyrun::simFisheryAgeComp object to ensure we haven’t missed the max age).

# Identify species information
nsp <- length(mskeyrun::simFocalSpecies$Name)

# Sp order by descending number of age-classes
Nages <- mskeyrun::simSurveyAgecomp %>%
  select(Name, age) %>%
  group_by(Name) %>%
  summarise(maxage = max(age)) %>%
  arrange(desc(maxage))
  
sp.names <- Nages$Name
Nage <- as.vector(Nages$maxage)

sp.order <- 1:nsp
names(sp.order) <- sp.names
mod.sp.names <- sp.names
  names(mod.sp.names) <- sp.names
#  mod.sp.names['SH'] <- 'SHake'
#  mod.sp.names['WH'] <- 'WHake'
# ID'g which species had length structured ssp runs (bc inputs differ)
length.struct <- rep(0,length(sp.names))
  names(length.struct) <- sp.names
#length.struct[c('SDog','WSk')] <- 1  # all sim species are age structured; could change
length.struct.sp <- names(length.struct[length.struct==1])

To get model years we will use the earliest year of either the mskeyrun survey index or fishery catch files: mskeyrun::simSurveyIndex, mskeyrun::simCatchIndex. Similarly we take the initial and last years of food habits data from mskeyrun::simSurveyDietcomp.

# Identify year information

fyr<-rep(min(mskeyrun::simSurveyIndex$year, mskeyrun::simCatchIndex$year),nsp)
lyr<-rep(max(mskeyrun::simSurveyIndex$year, mskeyrun::simCatchIndex$year),nsp)
  names(fyr) <- sp.names
  names(lyr) <- sp.names
  
# Food habits data
FH.fyr<-min(mskeyrun::simSurveyDietcomp$year)
FH.lyr<-max(mskeyrun::simSurveyDietcomp$year)
binsize <- 5

Constants to be left alone

# Ecosystem biomass
eco.b = 15;                        ##Original units = million metric tons
eco.b = eco.b*1000;           ##Units = 10^6 kg;  Conversion:  million metric tons * 1e6(=million) * 1000(kg=metric ton) / 1e6(=million)

# Constants
eof <- 54321
o.constant <- 1e-3
p.constant <- 1e-30

# Phases: 
aAge1.ph <- rep(1,nsp)
  names(aAge1.ph) <- sp.names
  aAge1.ph[length.struct.sp] <- -1
aFt.ph <- aAge1.ph
dAge1.ph <- aAge1.ph
  dAge1.ph[dAge1.ph>0] <- 1 # 3,1 
dFt.ph <- dAge1.ph
Yr1.ph <- aAge1.ph
  Yr1.ph[Yr1.ph>0] <- 1 # 2,1 
fic.ph <- aAge1.ph
  fic.ph[fic.ph>0] <- 1 # 4,1
fish.ph <- fic.ph

Identifying species interactions is done similarly to the predprey matrix in hydra, but using themskeyrun::simSurveyDietcomp data. This code keeps all pred-prey interactions among the species no matter how small.

predprey <- mskeyrun::simSurveyDietcomp %>%
  filter(prey %in% unique(Name)) %>%
  select(Name, agecl, year, prey, value) %>%
  group_by(Name, prey) %>%
  summarise(avgpreyprop = mean(value)) %>%
  unite('Pred-Prey', Name:prey, sep = "-")

int.names <- as.vector(predprey$`Pred-Prey`)

# int.names <- c( #'Pred-Prey'
#       'Cod-Cod', # high stdev
#       'Cod-SH',
#       'Cod-Her',
#       'Cod-Mack',   # high stdev   
#       # 'Cod-WH',  #???    
# 
#       'SH-SH',
#       'SH-Her',
#       'SH-Mack', #???
# 
#       ## 'Goose-SDog',  #???
#       'Goose-Cod',
#       'Goose-SH', 
#       'Goose-Her',
#       'Goose-Mack', 
#       ## ?Goose-Goose?, #???
# 
#       # 'WH-WH', 
#        'WH-SH',
#        'WH-Her', 
#       # 'WH-Mack', #??? # high stdev
#  
#       # 'Pol-Mack', #???
#        'Pol-SH',
#        'Pol-Her', 
# 
#       'SDog-SH', 
#       'SDog-Her',
#       'SDog-Mack', 
#       'SDog-Cod', 
#       # 'SDog-Goose', #???
# 
#        'WSk-SH', 
#        'WSk-Her'
#       # 'WSk-Mack'
# )
#print('Remember to modify species interactions if necessary')

The next step is reading in single species model outputs to initialize the multispeces model. We need to think about whether we want to simulate this whole process, and if not, how best to initialize the multispecies model. Or perhaps it is the same input data no matter what.

Things to confirm for generating data:

  1. MS-SCAA fits to only one survey index per year for each species? No! there are multiple surveys–get number below
    • If so pick fall simulated data to include NOBA mackerel. Note that each survey gets different ages of NOBA blue whiting.
    • If not how to input multiple survey time series? also different recruitment-to-survey parameters for each
#Number of trawl survey datasets
nFIC <- length(unique(mskeyrun::simSurveyIndex$survey)) #WARNING not safe for multiple surveys in the dataset

While there are multiple surveys, the survey age at first recruitment is dimensioned only per species, not survey, so if they are different we will need to make a decision which survey data to use. In the initial dataset there is no survey selectivity so they are the same anyway.

  1. Single fishing fleet per species? I think correct but if not, same questions as above

Some default assumptions matching the current 9species and appropriate for simulated data as well to go directly into dat file:

# Number of segments of different natural mortality (M1) rates, called nMseg in tpl, Mnseg in dat, M1nseg in singlespp
Mnseg <- 1 

# Number of segments for the FIC data; for each segment, a separate q is estimated. call all one survey, therefore 1
Nseg <- rep(1, nsp) 

# First age when partially recruited to the fishery; used to avoid hitting parameter bounds
agePR <- rep(1, nsp) # placeholder, use below to estimate full rec - 1 or -2?
  
# Age of full recruitment to the fishery
ageFR <- mskeyrun::simFisheryAgecomp %>%
  select(Name, fishery, age) %>%
  group_by(Name, fishery) %>%
  summarise(minage = min(age)) %>%
  pivot_wider(names_from = fishery, values_from = minage) %>%
  arrange(factor(Name, levels =sp.names)) %>%
  ungroup()
  
# Age of full recruitment to the fishery-independent survey, FIC
    # ficFR == 0 for species that never become fully recruited to the survey
    # All NOBA species have age 1 in first dataset, to be safe pull from age dat:
ficFR <- mskeyrun::simSurveyAgecomp %>%
  select(Name, survey, age) %>%
  group_by(Name, survey) %>%
  summarise(minage = min(age)) %>%
  pivot_wider(names_from = survey, values_from = minage) %>%
  arrange(factor(Name, levels =sp.names))
  
# Survey selectivity coefficient for the last age class; used to anchor the curve
  # FICs_lage == 1 for species that become fully recruited to the survey
FICs_lage <- rep(1, nsp)

Peek at a single species rep file for structure:

reptoRlist<-dget(here("9species/R.functions/reptoRlist.R"))

Cod.rep<-reptoRlist(here("Single.Species/Cod.Run119/Cod.rep"))
Cod.rep$ofv
## [1] 5050.31
Cod.rep$M1.1
##  [1] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2

Since the multispecies simulated data is already together in mskeyrun, I may bypass this step for now and build a .dat file directly.

Kiersten’s workflow below for reference

# Organize data for each individual species into lists
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Organize_Individ_SpData.R")

# Organize food habits data
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Organize_FH_Data.R")

# Concatenate size preference parameters;  Size preference parameters were initially estimated in SAS
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Concatenate_SizePref_Parameters.R")

# Modify FH data to remove any observed %Wts representing species interactions that are not explicitly modeled
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Modify_Sp9_FH_ForSpInts.R")

# Import consumption estimates; Modify to accomodate low sample sizes; Consumption estimates were initially estimated in SAS
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Import_Modify_Consumption_Rates.R")

# Organize consumption estimates
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Organize_Consumption_Estimates.R")

# Rho
#  Initial parameter estimates
iRho <- rep(1,nint)
# Phase
Rho.ph <- rep(1,nint)# rep(1,nint) # 

# Obj fx weights for FH components
Owt <- 10
FHwt <- rep(10,length(sp.order))
  names(FHwt) <- names(sp.order)


ssp.CPwt[['Her']] <- 3

ssp.SPwt[['Mack']][1] <- 5
ssp.SPwt[['Her']][1] <- 10

# Create list for dat file
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Create_DatFile_List.R")
# Create list for pin file
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Create_PinFile_List.R")
# Create modified list for pin file (using the estimates that went *into* the ssp runs
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Create_PinFile_With.SSPInputs.R")

# Write dat and pin files
setwd(sp9.dir)
Kdat_write("9Species",Sp9.dat)
Kpin_write("9Species",Sp9.pin)
# Kpin_write("9Species",Sp9.SSp.pin)

Try going straight to Create_DatFile_List.R

Run <- c("Test NOBA")

##########################################
#Organizes food habits data for input into msp model
###########################################

# Organize species interactions; Determine pred, prey and nint

split.ints <- strsplit(int.names,"-")
pred.names <- unlist(lapply(split.ints,function(x){x[1]}))
prey.names <- unlist(lapply(split.ints,function(x){x[2]}))

pred <- sp.order[pred.names]
prey <- sp.order[prey.names]
nint<-length(pred)

# Rho
#  Initial parameter estimates
iRho <- rep(1,nint)
# Phase
Rho.ph <- rep(1,nint)# rep(1,nint) # 

# Obj fx weights for FH components
Owt <- 10
FHwt <- rep(10,length(sp.order))
  names(FHwt) <- names(sp.order)

## Make the observed data ##

## List structure by species

list.bysp <- vector('list',nsp)
  names(list.bysp) <- sp.names
  
seasons <- c('Spring','Fall')
list.bysp.season <- lapply(list.bysp,function(x){
                                     x<-vector('list',length(seasons))
                                     names(x) <- seasons
                                     x  })

ssp.CAA <- list.bysp
ssp.pred.cprop <- list.bysp
ssp.N <- list.bysp
ssp.Wt <- list.bysp
ssp.TotC <- list.bysp
ssp.TotFIC <- list.bysp
ssp.M1seg <- list.bysp

ssp.FIC <- list.bysp.season
ssp.pred.sprop <- list.bysp.season

## read in NOBA data to structures

# handy function see https://github.com/tidyverse/dplyr/issues/4223#issuecomment-469269857
named_group_split <- function(.tbl, ...) {
  grouped <- group_by(.tbl, ...)
  #names <- rlang::inject(paste(!!!group_keys(grouped), sep = " / "))
  names <- rlang::inject(paste(!!!group_keys(grouped), sep = "$"))
  
  grouped %>% 
    group_split() %>% 
    rlang::set_names(names)
}

# total catch, vector for each species (thousand ? tons)
TotClist <- named_group_split(mskeyrun::simCatchIndex %>% 
                                filter(variable=="catch") %>% 
                                mutate(value = value/1000) %>%
                                select(Name, year, value), Name) %>%
  map("value")
# need to order by sp.order
 ssp.TotC  <-  TotClist[names(ssp.TotC)]
 
# WT AGE INTERPOLATION BREAKS FOR FIRST FISHERY AGE IF NOT AGE 1 
# fix bug in atlantisom--done and mskeyrun updated
 
# fishery weight at age (kg), years in rows, ages in columns
sspWtlist <- named_group_split(mskeyrun::simFisheryWtatAge %>% 
                                filter(variable=="Wtage") %>% 
                                mutate(value = value/1000) %>%
                                select(Name, year, value), Name) %>%
  map("value")
 
 
# calculate expansion factor for CAA from sample
# weight at age times number sampled at age
fishwtagesamp <- mskeyrun::simFisheryAgecomp %>%
  dplyr::mutate(Natage = value) %>%
  dplyr::select(-variable, -value, -units) %>%
  dplyr::left_join(mskeyrun::simFisheryWtatAge) %>%
  dplyr::mutate(Wtage = value) %>%
  dplyr::select(-variable, -value, -units) %>%
  # fill missing wtages with average survey wtage, otherwise keep fishery wtage
  dplyr::left_join(mskeyrun::simSurveyWtatAge) %>%
  dplyr::select(-units) %>%
  tidyr::pivot_wider(names_from = c("survey", "variable"), values_from = "value") %>%
  dplyr::ungroup()%>%
  dplyr::mutate(meanSvWtage =  select(., tail(names(.), 2)) %>% 
                         pmap(~ mean(c(...))) %>% unlist()) %>%
  dplyr::mutate(Wtage = ifelse(is.na(Wtage), meanSvWtage, Wtage)) %>%
  dplyr::mutate(wtagesamp_g = Wtage*Natage) %>%
  dplyr::group_by(ModSim, year, Code, Name, fishery) %>%
  dplyr::mutate(sumsamp_g = sum(wtagesamp_g),
                propsampwt = wtagesamp_g/sumsamp_g)

fishtotc <- mskeyrun::simCatchIndex %>%
  dplyr::filter(variable == "catch") %>%
  dplyr::mutate(catch = value) %>%
  dplyr::select(-variable, -value, -units)

# SOLVED: DIAGNOSE WHY FISHERY AGE SAMPLE ONE YEAR EARLIER THAN TOTAL CATCH
# Fixed in mskeyrun dataset June 9 2022
# total catch from Catch.txt, includes 0 year
# Natage in catch from ANNAGECATCH.nc
# also mismatches lengths derived from CATCH.nc, same direction

# July 2022
# NEXT PROBLEM: annagecatch.nc and catch.nc have different age classes caught
# annagecatch.nc has all age classes of North_atl_cod in the catch
# catch.nc does not have age classes 1 and 2 (annual age 1-4)
# weight at age extrapolation is based on wt at age derived from catch.nc
# so fishery weight at age file is missing wt at age for ages present in annagecatch
# This is an Atlantis output comparison problem and nothing I can fix
# For our purposes, I will use the weight at age from surveys for missing catch weight at age
fishCAAest <- fishwtagesamp %>%
  dplyr::left_join(fishtotc)

# testing
# problem is only cod
# fishCAAest %>% filter(is.na(Wtage)) %>% ungroup() %>% select(Name) %>% distinct()
# but it is all years
# fishCAAest %>% filter(is.na(Wtage)) %>% ungroup() %>% select(year) %>% distinct()

# # this works so integrating above
# fishCAAestfix <- fishCAAest %>%
#   dplyr::filter(is.na(Wtage)) %>%
#   dplyr::left_join(mskeyrun::simSurveyWtatAge) %>%
#   dplyr::select(-units) %>%
#   tidyr::pivot_wider(names_from = c("survey", "variable"), values_from = "value") %>%
#   dplyr::ungroup()%>%
#   dplyr::mutate(meanSvWtage =  select(., tail(names(.), 2)) %>% 
#                          pmap(~ mean(c(...))) %>% unlist()) %>%
#   dplyr::mutate(Wtage = meanSvWtage,
#                 wtagesamp_g = Wtage*Natage)

# sum sampled weight
# catch numbers at age (millions)
sspCAAlist <- named_group_split(mskeyrun::simFisheryAgecomp %>% 
                                filter(variable=="Natage") %>% 
                                #mutate(value = value/1000000) %>%
                                select(Name, year, value), Name) %>%
  map("value")

 
 
# total FIC is two vectors 
 
 
 
 
# seasonal structure is different, season is one more list object down 
 # but writes to dat file as two vectors without season header
 
# MSCAA doesn't use biomass based index, uses numbers based so sum FIC CAA instead
 # this does work though to split the surveys into species and season list objects
# surveys, vector for each species and season
sspFICblist <- named_group_split(mskeyrun::simSurveyIndex %>% 
                                mutate(season = case_when(survey == "BTS_fall_allbox_effic1" ~ "Fall",
                                                          survey == "BTS_spring_allbox_effic1" ~ "Spring")) %>%
                                filter(variable=="biomass") %>% 
                                select(Name, season, year, value), Name, season) %>%
  map("value") 







## make the dat list
  
Sp9.dat <- c(
  list (
    Run = Run,
    Simulation = 1,
    Trophic = 1,
    nsp = nsp,
    nFIC = nFIC.num,
    o = o.constant,
    p = p.constant,
    Nint = nint,
    Binsize = binsize,
    EcoB.mil.kg = eco.b,
    Owt = Owt,
    Fyr=fyr, Lyr=lyr,
    FHfyr=FH.fyr,FHlyr=FH.lyr,
    Nage = Nage, #unlist(ssp.nage),
    Mnseg = M1nseg.num, 
    Nseg = Nseg, #unlist(ssp.FICnseg),
    agePR = agePR, #unlist(ssp.agePR),
    ageFR = as.vector(t(ageFR[,-1])), #unlist(ssp.ageFR),
    ficFR = #unlist(ssp.ficFR),
    FICs_lage = FICs_lage, #unlist(ssp.FICs.lage),

    aAge1ph = aAge1.ph,
    aFtph = aFt.ph,
    dAge1ph = dAge1.ph,
    dFtph = dFt.ph,
    ficph = fic.ph,
    fishph = fish.ph,
    Yr1ph = Yr1.ph,  

    Pred = pred,
    Prey = prey,
    Rhoph = Rho.ph,
    M1yr = M1yr.num,
    TCwt = unlist(ssp.TCwt),
    CPwt = unlist(ssp.CPwt),
    Bwt = unlist(ssp.Bwt),
    Ywt = unlist(ssp.Ywt),
    Rwt = unlist(ssp.Rwt),
    FHwt = FHwt,
    Bthres = unlist(ssp.Bthres),
    Rthres = unlist(ssp.Rthres),
    TSwt = do.call(rbind,ssp.TSwt), # weights for survey, weights at 0 if not used
    SPwt = do.call(rbind,ssp.SPwt),
    iM2 = rep(0,sum(unlist(ssp.nage))),
    Eta = Eta,
    Sigma1 = Sig1.out,
    Sigma2 = Sig2.out,
    fic_fage = do.call(rbind,ssp.FICfage),
    fic_lage = do.call(rbind,ssp.FIClage),
    FICmonth = do.call(rbind,ssp.FICmon),
    FICyr = unlist(ssp.FICyr)
  ),

  TotC.tmt = ssp.TotC,  # thousands metric tons fishery catch
  WeightAtAge.kg=ssp.Wt, # fishery catch weight at age or survey? start with fishery
  CatchAtAge.mil = ssp.CAA, # fishery catch at age millions
  TotFIC = ssp.TotFIC, # survey index spring and fall surveys summed over N at age NOT a biomass index
  M1 = Sp9.M1seg, # outside the model mort by age
  CBratio = CB.list, # consumption to biomass ratio

  # survey catch at age number/tow
   FIC.SDog.1  = lapply(ssp.FIC[[1]],as.matrix),
   FIC.WSk.2   = lapply(ssp.FIC[[2]],as.matrix), 
   FIC.Goose.3 = lapply(ssp.FIC[[3]],as.matrix), 
   FIC.Cod.4    = lapply(ssp.FIC[[4]],as.matrix),
   FIC.Mack.5 = lapply(ssp.FIC[[5]],as.matrix),  
   FIC.Pol.6     = lapply(ssp.FIC[[6]],as.matrix), 
   FIC.WH.7   = lapply(ssp.FIC[[7]],as.matrix),    
   FIC.SH.8    = lapply(ssp.FIC[[8]],as.matrix), 
   FIC.Her.9   = lapply(ssp.FIC[[9]],as.matrix),  
  FH = Sp9.ModFH.list, #nbins year blocks, 
  Eof = list(eof)
 )

References

1.
Hansen C, Skern-Mauritzen M, Meeren G van der, Jähkel A, Drinkwater KF. Set-up of the Nordic and Barents Seas (NoBa) Atlantis model [Internet]. Havforskningsinstituttet; 2016. Available: https://imr.brage.unit.no/imr-xmlui/bitstream/handle/11250/2408609/FoH_2-2016.pdf?sequence=1&isAllowed=y
2.
Hansen C, Drinkwater KF, Jähkel A, Fulton EA, Gorton R, Skern-Mauritzen M. Sensitivity of the Norwegian and Barents Sea Atlantis end-to-end ecosystem model to parameter perturbations of key species. Tsikliras AC, editor. PLOS ONE. 2019;14: e0210419. doi:10.1371/journal.pone.0210419