We use existing Atlantis ecosystem model output to generate input datasets for a variety of multispecies models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis models simulate a wide range of physical and ecological processes, include a full food web, and can be run using different climate forcing, fishing, and other scenarios.
We extract simulated data using the R package atlantisom. The purpose of atlantisom is to use existing Atlantis model output to generate input datasets for a variety of models, so that the performance of these models can be evaluated against known (simulated) ecosystem dynamics. Atlantis models can be run using different climate forcing, fishing, and other scenarios. Users of atlantisom will be able to specify fishery independent and fishery dependent sampling in space and time, as well as species-specific catchability, selectivty, and other observation processes for any Atlantis scenario. Internally consistent multispecies and ecosystem datasets with known observation error characteristics will be the atlantisom outputs, for use in individual model performance testing, comparing performance of alternative models, and performance testing of model ensembles against “true” Atlantis outputs.
Our initial species selection includes 11 single species groups from the Norwegian Barents Sea (NOBA) Atlantis model [1,2]. These groups are fully age structured. All but two of them are fished.
spptable <- mskeyrun::simFocalSpecies %>%
select(Name, SciName)
knitr::kable(spptable, col.names = c("Model name", "Latin name"))
Model name | Latin name |
---|---|
Long_rough_dab | Hippoglossoides platessoides |
Green_halibut | Reinhardtius hippoglossoides |
Mackerel | Scomber scombrus |
Haddock | Melongrammus aeglefinus |
Saithe | Pollachius virens |
Redfish | Sebastes mentella |
Blue_whiting | Micromesistius poutassou |
Norwegian_ssh | Clupea harengus |
North_atl_cod | Gadus morhua |
Polar_cod | Boreogadus saida |
Capelin | Mallotus villosus |
The WGSAM skill assessment subgroup generated initial survey outputs that reflect initial group decisions. This first dataset will be near-perfect (no additional observation error or bias) and all modelers will know this. Dataset will include survey biomass and catch indices, survey and fishery length and age compositions, and diet information, and possibly other input parameters (VBGF parameters, etc).
We will use this same dataset to make inputs for Hydra and MSCAA. Hydra inputs are described at the link above and reside in the Hydra-sim repository where the estimation version is being developed.
A basic simulated dataset is available in the mskeyrun R package and we will use those outputs to generate the inputs for MSCAA below, avoiding the direct use of atlantisom output files if possible.
The time series have already been dimensioned in the mskeyrun R data package, so we focus on functions to generate input files that can eventually be collected in a package.
Similar to Hydra, MSCAA requires .dat and .pin files as inputs.
Kiersten’s file
[9Species_Input_MspModel.R](https://github.com/thefaylab/MS-SCAA/blob/master/9Species/R.programs.Inputs/9Species_Input_MspModel.R)
outlines the steps for creating these files and is used as a basis here
as much as possible.
An important note for MSCAA is that the ragged arrays of age compositions require that the species order be from most to least age classes.
Therefore we first order the species for use in the rest of the functions.
The number of age classes for each species is found in the
mskeyrun::simSurveyAgeComp
data object. (To be safe we
could also check the mskeyrun::simFisheryAgeComp
object to
ensure we haven’t missed the max age).
# Identify species information
nsp <- length(mskeyrun::simFocalSpecies$Name)
# Sp order by descending number of age-classes
Nages <- mskeyrun::simSurveyAgecomp %>%
select(Name, age) %>%
group_by(Name) %>%
summarise(maxage = max(age)) %>%
arrange(desc(maxage))
sp.names <- Nages$Name
Nage <- as.vector(Nages$maxage)
sp.order <- 1:nsp
names(sp.order) <- sp.names
mod.sp.names <- sp.names
names(mod.sp.names) <- sp.names
# mod.sp.names['SH'] <- 'SHake'
# mod.sp.names['WH'] <- 'WHake'
# ID'g which species had length structured ssp runs (bc inputs differ)
length.struct <- rep(0,length(sp.names))
names(length.struct) <- sp.names
#length.struct[c('SDog','WSk')] <- 1 # all sim species are age structured; could change
length.struct.sp <- names(length.struct[length.struct==1])
To get model years we will use the earliest year of either the
mskeyrun survey index or fishery catch files:
mskeyrun::simSurveyIndex
,
mskeyrun::simCatchIndex
. Similarly we take the initial and
last years of food habits data from
mskeyrun::simSurveyDietcomp
.
# Identify year information
fyr<-rep(min(mskeyrun::simSurveyIndex$year, mskeyrun::simCatchIndex$year),nsp)
lyr<-rep(max(mskeyrun::simSurveyIndex$year, mskeyrun::simCatchIndex$year),nsp)
names(fyr) <- sp.names
names(lyr) <- sp.names
# Food habits data
FH.fyr<-min(mskeyrun::simSurveyDietcomp$year)
FH.lyr<-max(mskeyrun::simSurveyDietcomp$year)
binsize <- 5
Constants to be left alone
# Ecosystem biomass
eco.b = 15; ##Original units = million metric tons
eco.b = eco.b*1000; ##Units = 10^6 kg; Conversion: million metric tons * 1e6(=million) * 1000(kg=metric ton) / 1e6(=million)
# Constants
eof <- 54321
o.constant <- 1e-3
p.constant <- 1e-30
# Phases:
aAge1.ph <- rep(1,nsp)
names(aAge1.ph) <- sp.names
aAge1.ph[length.struct.sp] <- -1
aFt.ph <- aAge1.ph
dAge1.ph <- aAge1.ph
dAge1.ph[dAge1.ph>0] <- 1 # 3,1
dFt.ph <- dAge1.ph
Yr1.ph <- aAge1.ph
Yr1.ph[Yr1.ph>0] <- 1 # 2,1
fic.ph <- aAge1.ph
fic.ph[fic.ph>0] <- 1 # 4,1
fish.ph <- fic.ph
Identifying species interactions is done similarly to the predprey
matrix in hydra, but using themskeyrun::simSurveyDietcomp
data. This code keeps all pred-prey interactions among the species no
matter how small.
predprey <- mskeyrun::simSurveyDietcomp %>%
filter(prey %in% unique(Name)) %>%
select(Name, agecl, year, prey, value) %>%
group_by(Name, prey) %>%
summarise(avgpreyprop = mean(value)) %>%
unite('Pred-Prey', Name:prey, sep = "-")
int.names <- as.vector(predprey$`Pred-Prey`)
# int.names <- c( #'Pred-Prey'
# 'Cod-Cod', # high stdev
# 'Cod-SH',
# 'Cod-Her',
# 'Cod-Mack', # high stdev
# # 'Cod-WH', #???
#
# 'SH-SH',
# 'SH-Her',
# 'SH-Mack', #???
#
# ## 'Goose-SDog', #???
# 'Goose-Cod',
# 'Goose-SH',
# 'Goose-Her',
# 'Goose-Mack',
# ## ?Goose-Goose?, #???
#
# # 'WH-WH',
# 'WH-SH',
# 'WH-Her',
# # 'WH-Mack', #??? # high stdev
#
# # 'Pol-Mack', #???
# 'Pol-SH',
# 'Pol-Her',
#
# 'SDog-SH',
# 'SDog-Her',
# 'SDog-Mack',
# 'SDog-Cod',
# # 'SDog-Goose', #???
#
# 'WSk-SH',
# 'WSk-Her'
# # 'WSk-Mack'
# )
#print('Remember to modify species interactions if necessary')
The next step is reading in single species model outputs to initialize the multispeces model. We need to think about whether we want to simulate this whole process, and if not, how best to initialize the multispecies model. Or perhaps it is the same input data no matter what.
Things to confirm for generating data:
#Number of trawl survey datasets
nFIC <- length(unique(mskeyrun::simSurveyIndex$survey)) #WARNING not safe for multiple surveys in the dataset
While there are multiple surveys, the survey age at first recruitment is dimensioned only per species, not survey, so if they are different we will need to make a decision which survey data to use. In the initial dataset there is no survey selectivity so they are the same anyway.
Some default assumptions matching the current 9species and appropriate for simulated data as well to go directly into dat file:
# Number of segments of different natural mortality (M1) rates, called nMseg in tpl, Mnseg in dat, M1nseg in singlespp
Mnseg <- 1
# Number of segments for the FIC data; for each segment, a separate q is estimated. call all one survey, therefore 1
Nseg <- rep(1, nsp)
# First age when partially recruited to the fishery; used to avoid hitting parameter bounds
agePR <- rep(1, nsp) # placeholder, use below to estimate full rec - 1 or -2?
# Age of full recruitment to the fishery
ageFR <- mskeyrun::simFisheryAgecomp %>%
select(Name, fishery, age) %>%
group_by(Name, fishery) %>%
summarise(minage = min(age)) %>%
pivot_wider(names_from = fishery, values_from = minage) %>%
arrange(factor(Name, levels =sp.names)) %>%
ungroup()
# Age of full recruitment to the fishery-independent survey, FIC
# ficFR == 0 for species that never become fully recruited to the survey
# All NOBA species have age 1 in first dataset, to be safe pull from age dat:
ficFR <- mskeyrun::simSurveyAgecomp %>%
select(Name, survey, age) %>%
group_by(Name, survey) %>%
summarise(minage = min(age)) %>%
pivot_wider(names_from = survey, values_from = minage) %>%
arrange(factor(Name, levels =sp.names))
# Survey selectivity coefficient for the last age class; used to anchor the curve
# FICs_lage == 1 for species that become fully recruited to the survey
FICs_lage <- rep(1, nsp)
Peek at a single species rep file for structure:
reptoRlist<-dget(here("9species/R.functions/reptoRlist.R"))
Cod.rep<-reptoRlist(here("Single.Species/Cod.Run119/Cod.rep"))
Cod.rep$ofv
## [1] 5050.31
Cod.rep$M1.1
## [1] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
Since the multispecies simulated data is already together in mskeyrun, I may bypass this step for now and build a .dat file directly.
Kiersten’s workflow below for reference
# Organize data for each individual species into lists
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Organize_Individ_SpData.R")
# Organize food habits data
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Organize_FH_Data.R")
# Concatenate size preference parameters; Size preference parameters were initially estimated in SAS
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Concatenate_SizePref_Parameters.R")
# Modify FH data to remove any observed %Wts representing species interactions that are not explicitly modeled
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Modify_Sp9_FH_ForSpInts.R")
# Import consumption estimates; Modify to accomodate low sample sizes; Consumption estimates were initially estimated in SAS
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Import_Modify_Consumption_Rates.R")
# Organize consumption estimates
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Organize_Consumption_Estimates.R")
# Rho
# Initial parameter estimates
iRho <- rep(1,nint)
# Phase
Rho.ph <- rep(1,nint)# rep(1,nint) #
# Obj fx weights for FH components
Owt <- 10
FHwt <- rep(10,length(sp.order))
names(FHwt) <- names(sp.order)
ssp.CPwt[['Her']] <- 3
ssp.SPwt[['Mack']][1] <- 5
ssp.SPwt[['Her']][1] <- 10
# Create list for dat file
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Create_DatFile_List.R")
# Create list for pin file
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Create_PinFile_List.R")
# Create modified list for pin file (using the estimates that went *into* the ssp runs
source("C:\\My_Programs\\R\\9Species Model\\Input Files\\Create_PinFile_With.SSPInputs.R")
# Write dat and pin files
setwd(sp9.dir)
Kdat_write("9Species",Sp9.dat)
Kpin_write("9Species",Sp9.pin)
# Kpin_write("9Species",Sp9.SSp.pin)
Try going straight to Create_DatFile_List.R
Run <- c("Test NOBA")
##########################################
#Organizes food habits data for input into msp model
###########################################
# Organize species interactions; Determine pred, prey and nint
split.ints <- strsplit(int.names,"-")
pred.names <- unlist(lapply(split.ints,function(x){x[1]}))
prey.names <- unlist(lapply(split.ints,function(x){x[2]}))
pred <- sp.order[pred.names]
prey <- sp.order[prey.names]
nint<-length(pred)
# Rho
# Initial parameter estimates
iRho <- rep(1,nint)
# Phase
Rho.ph <- rep(1,nint)# rep(1,nint) #
# Obj fx weights for FH components
Owt <- 10
FHwt <- rep(10,length(sp.order))
names(FHwt) <- names(sp.order)
## Make the observed data ##
## List structure by species
list.bysp <- vector('list',nsp)
names(list.bysp) <- sp.names
seasons <- c('Spring','Fall')
list.bysp.season <- lapply(list.bysp,function(x){
x<-vector('list',length(seasons))
names(x) <- seasons
x })
ssp.CAA <- list.bysp
ssp.pred.cprop <- list.bysp
ssp.N <- list.bysp
ssp.Wt <- list.bysp
ssp.TotC <- list.bysp
ssp.TotFIC <- list.bysp
ssp.M1seg <- list.bysp
ssp.FIC <- list.bysp.season
ssp.pred.sprop <- list.bysp.season
## read in NOBA data to structures
# handy function see https://github.com/tidyverse/dplyr/issues/4223#issuecomment-469269857
named_group_split <- function(.tbl, ...) {
grouped <- group_by(.tbl, ...)
#names <- rlang::inject(paste(!!!group_keys(grouped), sep = " / "))
names <- rlang::inject(paste(!!!group_keys(grouped), sep = "$"))
grouped %>%
group_split() %>%
rlang::set_names(names)
}
# total catch, vector for each species (thousand ? tons)
TotClist <- named_group_split(mskeyrun::simCatchIndex %>%
filter(variable=="catch") %>%
mutate(value = value/1000) %>%
select(Name, year, value), Name) %>%
map("value")
# need to order by sp.order
ssp.TotC <- TotClist[names(ssp.TotC)]
# WT AGE INTERPOLATION BREAKS FOR FIRST FISHERY AGE IF NOT AGE 1
# fix bug in atlantisom--done and mskeyrun updated
# fishery weight at age (kg), years in rows, ages in columns
sspWtlist <- named_group_split(mskeyrun::simFisheryWtatAge %>%
filter(variable=="Wtage") %>%
mutate(value = value/1000) %>%
select(Name, year, value), Name) %>%
map("value")
# calculate expansion factor for CAA from sample
# weight at age times number sampled at age
fishwtagesamp <- mskeyrun::simFisheryAgecomp %>%
dplyr::mutate(Natage = value) %>%
dplyr::select(-variable, -value, -units) %>%
dplyr::left_join(mskeyrun::simFisheryWtatAge) %>%
dplyr::mutate(Wtage = value) %>%
dplyr::select(-variable, -value, -units) %>%
# fill missing wtages with average survey wtage, otherwise keep fishery wtage
dplyr::left_join(mskeyrun::simSurveyWtatAge) %>%
dplyr::select(-units) %>%
tidyr::pivot_wider(names_from = c("survey", "variable"), values_from = "value") %>%
dplyr::ungroup()%>%
dplyr::mutate(meanSvWtage = select(., tail(names(.), 2)) %>%
pmap(~ mean(c(...))) %>% unlist()) %>%
dplyr::mutate(Wtage = ifelse(is.na(Wtage), meanSvWtage, Wtage)) %>%
dplyr::mutate(wtagesamp_g = Wtage*Natage) %>%
dplyr::group_by(ModSim, year, Code, Name, fishery) %>%
dplyr::mutate(sumsamp_g = sum(wtagesamp_g),
propsampwt = wtagesamp_g/sumsamp_g)
fishtotc <- mskeyrun::simCatchIndex %>%
dplyr::filter(variable == "catch") %>%
dplyr::mutate(catch = value) %>%
dplyr::select(-variable, -value, -units)
# SOLVED: DIAGNOSE WHY FISHERY AGE SAMPLE ONE YEAR EARLIER THAN TOTAL CATCH
# Fixed in mskeyrun dataset June 9 2022
# total catch from Catch.txt, includes 0 year
# Natage in catch from ANNAGECATCH.nc
# also mismatches lengths derived from CATCH.nc, same direction
# July 2022
# NEXT PROBLEM: annagecatch.nc and catch.nc have different age classes caught
# annagecatch.nc has all age classes of North_atl_cod in the catch
# catch.nc does not have age classes 1 and 2 (annual age 1-4)
# weight at age extrapolation is based on wt at age derived from catch.nc
# so fishery weight at age file is missing wt at age for ages present in annagecatch
# This is an Atlantis output comparison problem and nothing I can fix
# For our purposes, I will use the weight at age from surveys for missing catch weight at age
fishCAAest <- fishwtagesamp %>%
dplyr::left_join(fishtotc)
# testing
# problem is only cod
# fishCAAest %>% filter(is.na(Wtage)) %>% ungroup() %>% select(Name) %>% distinct()
# but it is all years
# fishCAAest %>% filter(is.na(Wtage)) %>% ungroup() %>% select(year) %>% distinct()
# # this works so integrating above
# fishCAAestfix <- fishCAAest %>%
# dplyr::filter(is.na(Wtage)) %>%
# dplyr::left_join(mskeyrun::simSurveyWtatAge) %>%
# dplyr::select(-units) %>%
# tidyr::pivot_wider(names_from = c("survey", "variable"), values_from = "value") %>%
# dplyr::ungroup()%>%
# dplyr::mutate(meanSvWtage = select(., tail(names(.), 2)) %>%
# pmap(~ mean(c(...))) %>% unlist()) %>%
# dplyr::mutate(Wtage = meanSvWtage,
# wtagesamp_g = Wtage*Natage)
# sum sampled weight
# catch numbers at age (millions)
sspCAAlist <- named_group_split(mskeyrun::simFisheryAgecomp %>%
filter(variable=="Natage") %>%
#mutate(value = value/1000000) %>%
select(Name, year, value), Name) %>%
map("value")
# total FIC is two vectors
# seasonal structure is different, season is one more list object down
# but writes to dat file as two vectors without season header
# MSCAA doesn't use biomass based index, uses numbers based so sum FIC CAA instead
# this does work though to split the surveys into species and season list objects
# surveys, vector for each species and season
sspFICblist <- named_group_split(mskeyrun::simSurveyIndex %>%
mutate(season = case_when(survey == "BTS_fall_allbox_effic1" ~ "Fall",
survey == "BTS_spring_allbox_effic1" ~ "Spring")) %>%
filter(variable=="biomass") %>%
select(Name, season, year, value), Name, season) %>%
map("value")
## make the dat list
Sp9.dat <- c(
list (
Run = Run,
Simulation = 1,
Trophic = 1,
nsp = nsp,
nFIC = nFIC.num,
o = o.constant,
p = p.constant,
Nint = nint,
Binsize = binsize,
EcoB.mil.kg = eco.b,
Owt = Owt,
Fyr=fyr, Lyr=lyr,
FHfyr=FH.fyr,FHlyr=FH.lyr,
Nage = Nage, #unlist(ssp.nage),
Mnseg = M1nseg.num,
Nseg = Nseg, #unlist(ssp.FICnseg),
agePR = agePR, #unlist(ssp.agePR),
ageFR = as.vector(t(ageFR[,-1])), #unlist(ssp.ageFR),
ficFR = #unlist(ssp.ficFR),
FICs_lage = FICs_lage, #unlist(ssp.FICs.lage),
aAge1ph = aAge1.ph,
aFtph = aFt.ph,
dAge1ph = dAge1.ph,
dFtph = dFt.ph,
ficph = fic.ph,
fishph = fish.ph,
Yr1ph = Yr1.ph,
Pred = pred,
Prey = prey,
Rhoph = Rho.ph,
M1yr = M1yr.num,
TCwt = unlist(ssp.TCwt),
CPwt = unlist(ssp.CPwt),
Bwt = unlist(ssp.Bwt),
Ywt = unlist(ssp.Ywt),
Rwt = unlist(ssp.Rwt),
FHwt = FHwt,
Bthres = unlist(ssp.Bthres),
Rthres = unlist(ssp.Rthres),
TSwt = do.call(rbind,ssp.TSwt), # weights for survey, weights at 0 if not used
SPwt = do.call(rbind,ssp.SPwt),
iM2 = rep(0,sum(unlist(ssp.nage))),
Eta = Eta,
Sigma1 = Sig1.out,
Sigma2 = Sig2.out,
fic_fage = do.call(rbind,ssp.FICfage),
fic_lage = do.call(rbind,ssp.FIClage),
FICmonth = do.call(rbind,ssp.FICmon),
FICyr = unlist(ssp.FICyr)
),
TotC.tmt = ssp.TotC, # thousands metric tons fishery catch
WeightAtAge.kg=ssp.Wt, # fishery catch weight at age or survey? start with fishery
CatchAtAge.mil = ssp.CAA, # fishery catch at age millions
TotFIC = ssp.TotFIC, # survey index spring and fall surveys summed over N at age NOT a biomass index
M1 = Sp9.M1seg, # outside the model mort by age
CBratio = CB.list, # consumption to biomass ratio
# survey catch at age number/tow
FIC.SDog.1 = lapply(ssp.FIC[[1]],as.matrix),
FIC.WSk.2 = lapply(ssp.FIC[[2]],as.matrix),
FIC.Goose.3 = lapply(ssp.FIC[[3]],as.matrix),
FIC.Cod.4 = lapply(ssp.FIC[[4]],as.matrix),
FIC.Mack.5 = lapply(ssp.FIC[[5]],as.matrix),
FIC.Pol.6 = lapply(ssp.FIC[[6]],as.matrix),
FIC.WH.7 = lapply(ssp.FIC[[7]],as.matrix),
FIC.SH.8 = lapply(ssp.FIC[[8]],as.matrix),
FIC.Her.9 = lapply(ssp.FIC[[9]],as.matrix),
FH = Sp9.ModFH.list, #nbins year blocks,
Eof = list(eof)
)